CN117947000A - CRISPR-Cas system and method - Google Patents

CRISPR-Cas system and method

Info

Publication number
CN117947000A
Authority
CN
China
Prior art keywords
sequence
abcas
seq
crispr
protein
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211327698.9A
Other languages
Chinese (zh)
Inventor
刘俊杰
李承平
孙奥
陈之航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202211327698.9A
Publication of CN117947000A
Legal status: Pending

Landscapes

  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Abstract

The invention discloses a CRISPR-Cas system and a CRISPR-Cas method. The present invention provides a CRISPR-Cas system comprising: (i) an AbCas pi protein; and (ii) an AbCas pi guide RNA, said AbCas pi guide RNA forming a complex with said AbCas pi protein and comprising a guide sequence that hybridizes to a target sequence in a target nucleic acid; wherein the AbCas pi protein recognizes a C-rich protospacer adjacent motif. The system provided by the invention achieves an intracellular editing activity of more than 10%, and therefore has good prospects for translation and clinical application.

Description

CRISPR-Cas system and method
Technical Field
The invention belongs to the technical field of gene editing, and relates to a CRISPR-Cas system and a CRISPR-Cas method.
Background
Clustered regularly interspaced short palindromic repeats (CRISPR)-Cas is an adaptive immune system that bacteria and archaea evolved to protect against invading viruses and plasmids, and it has shown great clinical therapeutic potential. By combining a guide RNA (gRNA) with a Cas protein to target a sequence of interest, the system enables operations ranging from gene editing to regulation of gene expression, and provides a safe and effective basis for treating pathogenic gene mutations. CRISPR-Cas-based therapies for a variety of diseases have now reached the preclinical and clinical testing stages. However, the CRISPR-Cas systems developed so far, such as the Cas9, Cas12a and Cas12e systems, still face several bottlenecks as gene therapy tools, the most significant of which concern the accuracy and safety of the commonly used editing tools. These bottlenecks include:
1. Activity window restriction of CRISPR-Cas systems
In addition to being guided by the gRNA through complementary base pairing, each Cas protein must recognize a specific protospacer adjacent motif (PAM) sequence near the editing site. For example, SpyCas9 recognizes NGG sequences and AsCas12a recognizes TTTN sequences. About half of the known pathogenic genetic variations are caused by single nucleotide variants (SNVs), and editing different SNVs requires matching whatever potential PAM sequences lie nearby. Therefore, to meet the gene editing needs of different clinical diseases, more new Cas proteins that recognize different PAM sequences must be found and characterized, so as to expand the applicable editing activity window of CRISPR-Cas systems in the genome.
2. Off-target effects of CRISPR-Cas systems
Off-target editing can occur while the gRNA guides the Cas protein to perform gene editing, so off-target effects should be reduced as far as possible when a CRISPR-Cas system is developed for gene editing. Off-target effects of CRISPR-Cas systems arise mainly for two reasons: mismatches between the gRNA and non-target DNA sequences, and gRNA-independent off-target effects. Different Cas protein families differ in their mismatch-induced off-target effects, so further screening is required to find Cas proteins with better clinical performance for constructing gene editing systems.
3. Size restriction of CRISPR-Cas systems
Adeno-associated virus (AAV) vectors are widely used to deliver CRISPR-Cas-based gene therapies in clinical applications. Although AAV delivery is relatively mature clinically, the packaging capacity of AAV vectors remains the biggest limitation for CRISPR-Cas systems. An AAV vector can currently carry at most about 4700 base pairs of DNA. The most commonly used SpyCas9 alone takes up 4300 base pairs, which limits the length of the remaining module sequences. As a result, only compact promoters can be selected; tissue-specific promoters and regulatory elements for conditionally inducible expression cannot be used, nor can dCas9-based single-base editing systems, whose complete sequences are too long to package.
4. Immunogenicity of CRISPR-Cas systems
In addition, immunogenicity has been observed for gene editing systems. CRISPR-Cas systems are protein complexes derived from bacteria and archaea, and some of them come from bacteria that are common human pathogens, such as Staphylococcus aureus (SaCas9) or Streptococcus pyogenes (SpyCas9). There is therefore a risk of eliciting an immune response in humans in clinical trials.
In conclusion, current clinical gene editing lacks a gene editing enzyme that has a wider activity window, a smaller protein molecular weight, a non-pathogenic bacterial source and better activity.
Disclosure of Invention
Problems to be solved by the invention
In view of the above problems in the prior art, the present invention provides a novel CRISPR-Cas system in which the Cas protein has only 850-867 amino acids, which is smaller than the known Cas12 family proteins. The PAM sequence recognized by the system differs from those of all known CRISPR-Cas12 systems, which expands the editing activity window of CRISPR-Cas12 systems in the genome. In addition, the system of the invention originates from a bacterial strain obtained from municipal sewage and has lower immunogenicity than CRISPR-Cas systems derived from human pathogenic bacteria such as Staphylococcus aureus (SaCas9) or Streptococcus pyogenes (SpCas9).
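The size argument can be made concrete with a little arithmetic. The sketch below is an illustration only: it uses the figures quoted in this document (an AAV capacity of about 4700 base pairs, 4300 base pairs for SpyCas9, 850-867 amino acids for AbCas pi) and assumes three nucleotides per amino acid plus one stop codon. The function and constant names are hypothetical.

```python
# Figures quoted in this document; all names are illustrative assumptions.
AAV_CAPACITY_BP = 4700
SPYCAS9_BP = 4300


def coding_length_bp(n_amino_acids: int) -> int:
    """Open reading frame length: 3 nt per residue plus one stop codon."""
    return 3 * n_amino_acids + 3


def remaining_budget_bp(cas_bp: int, capacity_bp: int = AAV_CAPACITY_BP) -> int:
    """Base pairs left for the promoter, guide RNA cassette and other elements."""
    return capacity_bp - cas_bp


if __name__ == "__main__":
    print("SpyCas9 leaves", remaining_budget_bp(SPYCAS9_BP), "bp")
    for n_aa in (850, 867):
        orf = coding_length_bp(n_aa)
        print(f"{n_aa} aa -> {orf} bp ORF, leaves {remaining_budget_bp(orf)} bp")
```

Under these assumptions an 867-residue protein needs a 2604 bp reading frame, leaving roughly 2100 bp of the quoted capacity for other elements, compared with 400 bp for SpyCas9.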
Solution for solving the problem
A first aspect of the present invention provides a CRISPR-Cas system comprising:
(i) an AbCas pi protein; and
(ii) an AbCas pi guide RNA, wherein the AbCas pi guide RNA forms a complex with the AbCas pi protein and comprises a guide sequence that hybridizes to a target sequence in a target nucleic acid;
wherein the AbCas pi protein recognizes a C-rich protospacer adjacent motif.
In some embodiments, the AbCas pi protein is derived from Armatimonadetes bacterium.
In some embodiments, the protospacer adjacent motif comprises 5'-CCN-3', wherein N is A, T, G or C.
In some preferred embodiments, the protospacer adjacent motif comprises any one selected from TGCCC, CACCT, TTCCC, TCCCC, CTCCC and CTCCT.
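To illustrate what recognizing a 5'-CCN-3' motif means when candidate sites are searched for, the following minimal sketch scans both strands of a DNA sequence for the motif. It makes no assumption about where the protospacer lies relative to the motif, since the text above does not state it; the function names and the coordinate convention (0-based position of the leftmost base on the given strand) are illustrative choices.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# The six preferred motifs listed above.
PREFERRED_PAMS = ["TGCCC", "CACCT", "TTCCC", "TCCCC", "CTCCC", "CTCCT"]
# A lookahead is used so that overlapping motifs (e.g. in "CCCC") are all found.
CCN = re.compile(r"(?=(CC[ACGT]))")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_ccn_pams(seq: str) -> list[tuple[int, str, str]]:
    """Return (start, strand, motif) for each 5'-CCN-3' on either strand.

    start is the 0-based index, on the given sequence, of the leftmost base
    covered by the motif; strand is "+" (given strand) or "-" (reverse strand).
    """
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in CCN.finditer(seq)]
    for m in CCN.finditer(reverse_complement(seq)):
        # Map the reverse-strand coordinate back onto the given strand.
        hits.append((len(seq) - m.start() - 3, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    for hit in find_ccn_pams("ATCCGTAGGT"):
        print(hit)
```

Each of the six preferred five-base motifs ends in CCN (positions 3 to 5), which is consistent with the more general 5'-CCN-3' definition.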
In some specific embodiments, the AbCas pi protein comprises one or more of the following sequences:
1) an amino acid sequence shown in any one of SEQ ID NO. 1 to SEQ ID NO. 4;
2) an amino acid sequence that has at least 80%, 82%, 85%, 87%, 90%, 92%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence shown in any one of SEQ ID NO. 1 to SEQ ID NO. 4 and that retains the AbCas pi guide RNA binding activity and/or nuclease activity of the amino acid sequence shown in any one of SEQ ID NO. 1 to SEQ ID NO. 4;
3) an amino acid sequence that is obtained by adding, substituting, deleting or inserting one or more amino acid residues in the amino acid sequence shown in any one of SEQ ID NO. 1 to SEQ ID NO. 4 and that retains the AbCas pi guide RNA binding activity and/or nuclease activity of the amino acid sequence shown in any one of SEQ ID NO. 1 to SEQ ID NO. 4;
4) an amino acid sequence that is encoded by a nucleotide sequence which hybridizes under stringent conditions with a polynucleotide sequence encoding the amino acid sequence shown in any one of SEQ ID NO. 1 to SEQ ID NO. 4, and that retains the AbCas pi guide RNA binding activity and/or nuclease activity of the amino acid sequence shown in any one of SEQ ID NO. 1 to SEQ ID NO. 4, said stringent conditions being medium stringency conditions, medium-high stringency conditions, high stringency conditions or very high stringency conditions.
In some embodiments, the AbCas pi guide RNA is a double guide RNA.
In some alternative embodiments, the AbCas pi guide RNAs comprise any one selected from the group consisting of:
a) an activator RNA comprising the nucleotide sequence shown in SEQ ID NO. 5 or a nucleotide sequence having 80% or more identity to the sequence shown in SEQ ID NO. 5, and
a targeting factor RNA comprising the nucleotide sequence shown in SEQ ID NO. 6 or a nucleotide sequence having 80% or more identity to the sequence shown in SEQ ID NO. 6, and said guide sequence;
b) an activator RNA comprising the nucleotide sequence shown in SEQ ID NO. 8 or a nucleotide sequence having 80% or more identity to the sequence shown in SEQ ID NO. 8, and
a targeting factor RNA comprising the nucleotide sequence shown in SEQ ID NO. 9 or a nucleotide sequence having 80% or more identity to the sequence shown in SEQ ID NO. 9, and said guide sequence;
c) an activator RNA comprising the nucleotide sequence shown in SEQ ID NO. 11 or a nucleotide sequence having 80% or more identity to the sequence shown in SEQ ID NO. 11, and
a targeting factor RNA comprising the nucleotide sequence shown in SEQ ID NO. 12 or a nucleotide sequence having 80% or more identity to the sequence shown in SEQ ID NO. 12, and said guide sequence;
d) an activator RNA comprising the nucleotide sequence shown in SEQ ID NO. 13 or a nucleotide sequence having 80% or more identity to the sequence shown in SEQ ID NO. 13, and
a targeting factor RNA comprising the nucleotide sequence shown in SEQ ID NO. 14 or a nucleotide sequence having 80% or more identity to the sequence shown in SEQ ID NO. 14, and said guide sequence.
In some embodiments, the AbCas pi guide RNA is a single guide RNA.
In some alternative embodiments, the AbCas pi guide RNA comprises the nucleotide sequence shown in SEQ ID NO. 7 or SEQ ID NO. 10, or a nucleotide sequence having 80% or more identity to the sequence shown in SEQ ID NO. 7 or SEQ ID NO. 10, and the guide sequence.
In some embodiments, the CRISPR-Cas system further comprises: (iii) a donor polynucleotide.
In a second aspect the invention provides a polynucleotide comprising one or more of the following:
i) A nucleotide sequence encoding AbCas pi protein in a CRISPR-Cas system according to the first aspect of the invention;
ii) a nucleotide sequence encoding AbCas pi guide RNA in a CRISPR-Cas system according to the first aspect of the invention; and
iii) a donor polynucleotide sequence.
In a third aspect the invention provides a vector comprising a polynucleotide according to the second aspect of the invention.
In some preferred embodiments, the vector is an expression vector.
In a fourth aspect the invention provides a cell comprising one or more of the following:
(a) a CRISPR-Cas system according to the first aspect of the present invention;
(b) a polynucleotide according to the second aspect of the invention; and
(c) a vector according to the third aspect of the invention.
In a fifth aspect the invention provides a kit comprising one or more of the following:
(A) a CRISPR-Cas system according to the first aspect of the present invention;
(B) a polynucleotide according to the second aspect of the invention;
(C) a vector according to the third aspect of the invention; and
(D) a cell according to the fourth aspect of the invention.
A sixth aspect of the invention provides a method of modifying a target nucleic acid comprising the step of contacting the target nucleic acid with a CRISPR-Cas system according to the first aspect of the invention.
Advantageous effects of the invention
Through implementation of the technical scheme, the invention can obtain the following technical effects:
1. In the locus of the CRISPR-AbCas pi system, the genes are arranged in the order AbCas pi, tracrRNA, cas1, cas2, cas4 and CRISPR array. This arrangement differs from existing type V family systems, so the system is a novel CRISPR-Cas system;
2. The homologous AbCas pi proteins of this system are 850-867 amino acids in size, which makes them the smallest proteins of the CRISPR-Cas12 family currently known to function as monomers. All AbCas pi proteins are smaller than the known compact Cas protein CasX (986 amino acids), which facilitates subsequent protein synthesis and in vivo and in vitro delivery of the system, and offers better safety;
3. The PAM sequence recognized by the system contains CCN, which differs from the PAM sequences of all known CRISPR-Cas12 systems and effectively expands the editing activity window of CRISPR-Cas12 systems in the genome;
4. The strain from which the system originates was obtained from municipal sewage, so the system carries a lower risk of eliciting a human immune response in later applications, that is, it has lower immunogenicity, than CRISPR-Cas systems derived from Staphylococcus aureus (SaCas9) or Streptococcus pyogenes (SpCas9);
5. Preliminary verification shows that the intracellular editing activity of the system can exceed 10%, so the system has good prospects for translation and clinical application.
Drawings
FIG. 1 is a schematic diagram of the gene sequence distribution of the CRISPR-AbCas pi system.
FIG. 2 shows the results of in vitro purification of the AbCas pi_1 protein, wherein A in FIG. 2 is an SDS-PAGE gel of the AbCas pi_1 protein purification, and B in FIG. 2 is the elution profile of the protein purified by molecular sieve (size-exclusion) chromatography.
FIG. 3 shows the results of in vitro purification of the AbCas pi_2 protein, wherein A in FIG. 3 is an SDS-PAGE gel of the AbCas pi_2 protein purification, and B in FIG. 3 is the elution profile of the protein purified by molecular sieve (size-exclusion) chromatography.
FIG. 4 shows the tracrRNA sequence prediction results, wherein A in FIG. 4 is the tracrRNA prediction for the AbCas pi_1 system, and B in FIG. 4 is the tracrRNA prediction for the AbCas pi_2 system. Here, tracr indicates the promoter position predicted by promoter prediction software; anti-repeat indicates the positions of sequences complementary to the repeat, obtained by a BLAST search with the repeat sequences of the CRISPR array; and the expression track at the top shows the expression level of the tracrRNA region obtained from metatranscriptome analysis.
FIG. 5 shows the purification results of the AbCas pi_1 sgRNA after in vitro transcription.
FIG. 6 shows the purification results of the AbCas pi_2 sgRNA after in vitro transcription.
FIG. 7 shows the recognition specificity of the AbCas pi_1 protein for PAM sequences at different salt ion concentrations, wherein A in FIG. 7 is the PAM sequence preference of the AbCas pi_1 protein at 50 mM NaCl; B in FIG. 7 is the PAM sequence preference of the AbCas pi_1 protein at 150 mM NaCl; and C in FIG. 7 is the PAM sequence preference of the AbCas pi_1 protein at 300 mM NaCl.
FIG. 8 shows the recognition specificity of the AbCas pi_2 protein for PAM sequences at different salt ion concentrations, wherein A in FIG. 8 is the PAM sequence preference of the AbCas pi_2 protein at 50 mM NaCl; B in FIG. 8 is the PAM sequence preference of the AbCas pi_2 protein at 150 mM NaCl; and C in FIG. 8 is the PAM sequence preference of the AbCas pi_2 protein at 300 mM NaCl.
FIG. 9 compares the efficiency with which the AbCas pi_1 protein cleaves 5'-Cy5 fluorescently labeled double-stranded DNA (dsDNA) at different salt ion concentrations. A in FIG. 9 compares the cleavage gel images of the AbCas pi_1 protein on dsDNA substrates at 50 mM, 150 mM and 300 mM NaCl; B in FIG. 9 shows the calculated cleavage efficiencies and fitted curves at the three salt concentrations.
FIG. 10 compares the efficiency with which the AbCas pi_2 protein cleaves 5'-Cy5 fluorescently labeled double-stranded DNA (dsDNA) at different salt ion concentrations. A in FIG. 10 compares the cleavage gel images of the AbCas pi_2 protein on dsDNA substrates at 50 mM, 150 mM and 300 mM NaCl; B in FIG. 10 shows the calculated cleavage efficiencies and fitted curves at the three salt concentrations.
FIG. 11 shows the green fluorescence (GFP) produced after the genome of a HEK293T-EGFP frameshift mutant cell line was edited by systems based on the AbCas pi_1/2 proteins; wherein the Guide- group is the negative control; the Guide+ group is the experimental group; the bright-field images show the morphology of the cells in the displayed region; and the fluorescence images show GFP expression of the cells in the same region.
FIG. 12 compares the cleavage efficiencies measured by T7E1 assay on the genome of the HEK293T-EGFP frameshift mutant cell line edited by systems based on the AbCas pi_1/2 proteins. A in FIG. 12 is the T7E1 cleavage gel image for AbCas pi_1 editing at five loci; B in FIG. 12 is the T7E1 cleavage gel image for AbCas pi_1 editing at five loci; C in FIG. 12 is the Cas9 group, a positive control in which the sequences were edited using SpCas9. In each panel, NC denotes the negative control group, and 1, 2, 3, 4 and 5 denote the different target gene sequence numbers.
FIG. 13 is a histogram comparing the editing efficiencies calculated from the T7E1 cleavage assay.
Detailed Description
In order that the invention may be more readily understood, certain technical and scientific terms are defined below. Unless defined otherwise herein, all other technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
In the present specification, a numerical range written as "numerical value A to numerical value B" means a range that includes the endpoint values A and B.
In the present specification, the use of "substantially" or "essentially" means that the standard deviation from the theoretical model or theoretical data is within 5%, preferably 3%, more preferably 1%.
In the present specification, "can" covers both the case in which a certain process is performed and the case in which it is not performed.
In this specification, "optional" or "optionally" means that the subsequently described event or circumstance may or may not occur, and that the description includes instances where the event occurs and instances where it does not.
Reference throughout this specification to "some specific/preferred embodiments," "other specific/preferred embodiments," "an embodiment," and so forth, means that a particular element (e.g., feature, structure, property, and/or characteristic) described in connection with the embodiment is included in at least one embodiment described herein, and may or may not be present in other embodiments. In addition, it is to be understood that the elements may be combined in any suitable manner in the various embodiments.
In this specification, the term "CRISPR" refers to clustered, regularly interspaced short palindromic repeats (Clustered regularly interspaced short palindromic repeats) derived from the immune system of a microorganism.
In this specification, the term "Cas protein" refers to a CRISPR-associated protein which, together with a CRISPR sequence, constitutes a CRISPR-Cas system. The Cas protein has a nuclease-related functional domain and cleaves a target sequence at a specific position by recognizing a protospacer adjacent motif (PAM).
In this specification, the terms "polynucleotide" and "nucleic acid" are used interchangeably to refer to polymeric forms of nucleotides (ribonucleotides or deoxyribonucleotides) of any length. Thus, this term includes, but is not limited to, single-stranded, double-stranded or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or polymers comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural or derivatized nucleotide bases.
In this specification, "hybridizable" or "complementary" or "substantially complementary" means that a nucleic acid (e.g., RNA, DNA) comprises a nucleotide sequence that enables it to non-covalently bind (i.e., form Watson-Crick base pairs and/or G/U base pairs with, "anneal" to or "hybridize" to) another nucleic acid in a sequence-specific, antiparallel manner (i.e., the nucleic acid specifically binds to the complementary nucleic acid) under appropriate in vitro and/or in vivo conditions of temperature and solution ionic strength. Standard Watson-Crick base pairing includes: adenine (A) pairing with thymine (T), adenine (A) pairing with uracil (U), and guanine (G) pairing with cytosine (C). Furthermore, for hybridization between two RNA molecules (e.g., dsRNA), and for hybridization of a DNA molecule to an RNA molecule (e.g., when a DNA target nucleic acid base pairs with a guide RNA), guanine (G) can also pair with uracil (U). For example, G/U base pairing is at least partially responsible for the degeneracy of the genetic code in the base pairing of a tRNA anticodon with a codon in an mRNA.
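The pairing rules in this definition can be expressed as a short check. The sketch below is illustrative only: it treats two equal-length strands, both written 5' to 3', as antiparallel partners and counts the positions that form neither a Watson-Crick pair nor, optionally, a G/U pair. The function names are hypothetical.

```python
# Pairs named in the definition above.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
GU_WOBBLE = {("G", "U"), ("U", "G")}


def can_pair(x: str, y: str, allow_gu: bool = True) -> bool:
    """True if bases x and y form a Watson-Crick pair or (optionally) a G/U pair."""
    pair = (x.upper(), y.upper())
    return pair in WATSON_CRICK or (allow_gu and pair in GU_WOBBLE)


def count_mismatches(strand_a: str, strand_b: str, allow_gu: bool = True) -> int:
    """Count unpaired positions between two strands, both written 5'->3'.

    The strands are antiparallel, so the first base of strand_a faces the
    last base of strand_b.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have the same length")
    return sum(
        not can_pair(a, b, allow_gu) for a, b in zip(strand_a, reversed(strand_b))
    )


if __name__ == "__main__":
    print(count_mismatches("ACGT", "ACGU"))           # DNA target vs. RNA guide
    print(count_mismatches("GGCU", "AGUC"))           # contains one G/U pair
    print(count_mismatches("GGCU", "AGUC", False))    # G/U pair not allowed
```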
Hybridization and wash conditions are well known and are described in Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular Cloning: A Laboratory Manual, second edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (1989), in particular chapter 11 and Table 11.1 therein; and in Sambrook, J. and Russell, W., Molecular Cloning: A Laboratory Manual, third edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (2001). The conditions of temperature and ionic strength determine the "stringency" of hybridization.
In the present invention, "medium stringency conditions", "medium-high stringency conditions", "high stringency conditions" or "very high stringency conditions" describe conditions for nucleic acid hybridization and washing. Guidance for performing hybridization reactions can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6, incorporated herein by reference. Aqueous and non-aqueous methods are described in that reference, and either may be used. For example, specific hybridization conditions are as follows: (1) low stringency hybridization conditions: hybridization in 6× sodium chloride/sodium citrate (SSC) at about 45 °C, followed by two washes in 0.2× SSC, 0.1% SDS at a temperature of at least 50 °C (for low stringency conditions the wash temperature can be raised to 55 °C); (2) medium stringency hybridization conditions: hybridization in 6× SSC at about 45 °C, followed by one or more washes in 0.2× SSC, 0.1% SDS at 60 °C; (3) high stringency hybridization conditions: hybridization in 6× SSC at about 45 °C, followed by one or more washes in 0.2× SSC, 0.1% SDS at 65 °C; and preferably (4) very high stringency hybridization conditions: hybridization in 0.5 M sodium phosphate, 7% SDS at 65 °C, followed by one or more washes in 0.2× SSC, 1% SDS at 65 °C.
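For readers who want the four example conditions in a form that can be looked up or compared, the following sketch records them as a small table. It is only one reading of the paragraph above (hybridization step followed by washes), not an additional protocol; the class and field names are hypothetical, and "medium-high" is omitted because the text names it without giving specific conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stringency:
    hybridization: str   # hybridization buffer and temperature
    wash_buffer: str     # wash buffer composition
    wash_temp_c: int     # wash temperature in degrees Celsius
    min_washes: int      # minimum number of washes


# One reading of the example conditions (1) to (4) given above.
CONDITIONS = {
    "low": Stringency("6x SSC, about 45 C", "0.2x SSC, 0.1% SDS", 50, 2),
    "medium": Stringency("6x SSC, about 45 C", "0.2x SSC, 0.1% SDS", 60, 1),
    "high": Stringency("6x SSC, about 45 C", "0.2x SSC, 0.1% SDS", 65, 1),
    "very high": Stringency("0.5 M sodium phosphate, 7% SDS, 65 C",
                            "0.2x SSC, 1% SDS", 65, 1),
}

# Levels in increasing order of stringency, as listed in the text.
ORDER = ["low", "medium", "high", "very high"]


def at_least(level: str, minimum: str) -> bool:
    """True if `level` is at least as stringent as `minimum`."""
    return ORDER.index(level) >= ORDER.index(minimum)


if __name__ == "__main__":
    for name in ORDER:
        print(name, CONDITIONS[name])
```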
Hybridization requires that the two nucleic acids contain complementary sequences, but mismatches between bases are possible. The conditions suitable for hybridization between two nucleic acids depend on the length and degree of complementarity of the nucleic acids, which are well known variables in the art. The greater the degree of complementarity between two nucleotide sequences, the greater the melting temperature (Tm) value of the hybrids of nucleic acids having these sequences. For hybridization between nucleic acids with short-chain complementarity (e.g., complementarity of 35 or fewer, 30 or fewer, 25 or fewer, 22 or fewer, 20 or fewer, or 18 or fewer nucleotides), the mismatch position may become important (see Sambrook et al, supra, 11.7-11.8). Typically, the length of the hybridizable nucleic acid is 8 nucleotides or more (e.g., 10 nucleotides or more, 12 nucleotides or more, 15 nucleotides or more, 20 nucleotides or more, 22 nucleotides or more, 25 nucleotides or more, or 30 nucleotides or more). The temperature, wash solution salt concentration, and other conditions may be adjusted as desired, depending on factors such as the length of the complementary region and the degree of complementarity.
In the present invention, the terms "peptide", "polypeptide" and "protein" are used interchangeably herein and refer to polymeric forms of amino acids of any length, which may include encoded and non-encoded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones.
As used herein, "binding" (e.g., RNA binding domain, binding to a target nucleic acid, etc.) refers to a non-covalent interaction between macromolecules (e.g., a non-covalent interaction between a protein and a nucleic acid; a non-covalent interaction between a Cas protein/guide RNA complex and a target nucleic acid; etc.). When in a state of non-covalent interaction, the macromolecules are said to be "associated" or "interacting" or "bound" (e.g., when molecule X is said to interact with molecule Y, it means that molecule X binds to molecule Y in a non-covalent manner). The binding interaction is generally characterized by a dissociation constant (K_D) of less than 10⁻⁶ M, less than 10⁻⁷ M, less than 10⁻⁸ M, less than 10⁻⁹ M, less than 10⁻¹⁰ M, less than 10⁻¹¹ M, less than 10⁻¹² M, less than 10⁻¹³ M, less than 10⁻¹⁴ M, or less than 10⁻¹⁵ M. "Affinity" refers to the binding strength, with increased binding affinity corresponding to a lower K_D.
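The relationship between the dissociation constant and binding strength can be illustrated numerically. The sketch below assumes the simplest textbook model, which is not stated in this document: 1:1 equilibrium binding in which the free concentration of one partner, [L], is known, so that the bound fraction of the other partner is [L] / (K_D + [L]). The function name is hypothetical.

```python
def fraction_bound(free_conc_m: float, kd_m: float) -> float:
    """Bound fraction for simple 1:1 equilibrium binding.

    free_conc_m: free concentration of the binding partner, in mol/L
    kd_m: dissociation constant K_D, in mol/L
    """
    if free_conc_m < 0 or kd_m <= 0:
        raise ValueError("concentration must be >= 0 and K_D must be > 0")
    return free_conc_m / (kd_m + free_conc_m)


if __name__ == "__main__":
    # At a fixed 10 nM partner concentration, a lower K_D gives a higher bound fraction.
    for kd in (1e-6, 1e-7, 1e-8, 1e-9):
        print(f"K_D = {kd:.0e} M -> fraction bound = {fraction_bound(1e-8, kd):.3f}")
```

In this model half of the molecules are bound when the free partner concentration equals K_D, which is why a lower K_D corresponds to a higher affinity.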
In the present invention, the term "conservative amino acid substitution" refers to the interchangeability of amino acid residues in proteins having similar side chains. For example, a group of amino acids having aliphatic side chains consists of glycine, alanine, valine, leucine and isoleucine; a group of amino acids with aliphatic-hydroxyl side chains consists of serine and threonine; a group of amino acids having amide-containing side chains consists of asparagine and glutamine; a group of amino acids with aromatic side chains consists of phenylalanine, tyrosine and tryptophan; a group of amino acids with basic side chains consists of lysine, arginine and histidine; a group of amino acids with acidic side chains consists of glutamic acid and aspartic acid; and a group of amino acids with sulfur-containing side chains consists of cysteine and methionine. Exemplary conservative amino acid substitutions are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine-glycine and asparagine-glutamine.
In the present invention, a polynucleotide or polypeptide has a certain percentage of "sequence identity" with another polynucleotide or polypeptide, which means that, when the two sequences are aligned and compared, that percentage of bases or amino acids are identical and occupy the same relative positions. Sequence identity can be determined in many different ways. To determine sequence identity, sequences can be aligned using a variety of convenient methods and computer programs (e.g., BLAST, T-COFFEE, MUSCLE, MAFFT, etc.), which are available on the World Wide Web at websites including ncbi.nlm.nih.gov/BLAST, ebi.ac.uk/Tools/msa/tcoffee/, ebi.ac.uk/Tools/msa/muscle/ and mafft.cbrc.jp/alignment/software/. See, e.g., Altschul et al. (1990), J. Mol. Biol. 215:403-10.
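As a worked illustration of this definition, the sketch below computes percent identity from two sequences that have already been aligned (for example by one of the programs named above). The choice of denominator is an assumption made here, since alignment tools differ on it: every alignment column except columns where both sequences have a gap is counted, and a gap opposite a residue counts as non-identical. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences ('-' marks a gap).

    Assumed convention: identical residues divided by the number of alignment
    columns, ignoring columns in which both sequences have a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    identity = percent_identity("MKT-LLVA", "MKTALLIA")
    print(f"{identity:.1f}% identity; meets an 80% threshold: {identity >= 80.0}")
```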
In the present invention, a DNA sequence "encoding" a particular RNA is a DNA nucleotide sequence that is transcribed into that RNA. The DNA polynucleotide can encode an RNA (mRNA) that is translated into a protein (thus both the DNA and the mRNA encode the protein), or the DNA polynucleotide can encode an RNA that is not translated into a protein (e.g., tRNA, rRNA, microRNA (miRNA), "non-coding" RNA (ncRNA), guide RNA, etc.).
In the present invention, a "protein coding sequence" or a sequence encoding a particular protein or polypeptide is a nucleotide sequence that, when placed under the control of appropriate regulatory sequences, is transcribed into mRNA (in the case of DNA) and translated into a polypeptide in vitro or in vivo (in the case of mRNA).
In the present invention, the term "naturally occurring" or "unmodified" or "wild-type" as applied to a nucleic acid, polypeptide, cell or organism refers to a nucleic acid, polypeptide, cell or organism that is found in nature. For example, a polypeptide or polynucleotide sequence that is present in an organism that can be isolated from a source in nature is naturally occurring.
In the present invention, "recombinant" means that a particular nucleic acid (DNA or RNA) is the product of various combinations of cloning, restriction, polymerase chain reaction (PCR) and/or ligation steps that result in a construct having a structural coding or non-coding sequence distinguishable from endogenous nucleic acids found in natural systems. The DNA sequence encoding the polypeptide may be assembled from cDNA fragments or from a series of synthetic oligonucleotides to provide a synthetic nucleic acid capable of being expressed from recombinant transcription units contained in cells or in cell-free transcription and translation systems. Genomic DNA comprising the relevant sequences may also be used in the formation of recombinant genes or transcriptional units. Sequences of non-translated DNA may be present at the 5' or 3' end of the open reading frame, where such sequences do not interfere with manipulation or expression of the coding region, and may in fact act to regulate production of the desired product by a variety of mechanisms (see "DNA regulatory sequences"). Alternatively, a DNA sequence encoding an untranslated RNA (e.g., a guide RNA) may also be considered recombinant. Thus, for example, the term "recombinant" nucleic acid refers to a non-naturally occurring polynucleotide or nucleic acid, for example, one made by human intervention through the artificial combination of two otherwise separate segments of sequence. Such artificial combination is often accomplished by chemical synthesis or by manual manipulation of isolated segments of nucleic acids (e.g., by genetic engineering techniques). This is typically done to replace a codon with a codon encoding the same amino acid, a conservative amino acid or a non-conservative amino acid. Alternatively, such operations are performed to join nucleic acid segments having desired functions together to produce a desired combination of functions.
When a recombinant polynucleotide encodes a polypeptide, the sequence of the encoded polypeptide may be naturally occurring ("wild-type") or may be a variant (e.g., mutant) of the naturally occurring sequence. An example of this is a (recombinant) DNA encoding a wild-type protein, wherein the DNA sequence is codon-optimized for expression of the protein in a cell in which the protein is not naturally found (e.g., expression of a Cas protein in a eukaryotic cell). Thus, codon-optimized DNA may be recombinant and non-naturally occurring, while the protein encoded by the DNA may have a wild-type amino acid sequence.
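The point that a codon-changed DNA can still encode a wild-type amino acid sequence can be checked directly. The sketch below builds the standard genetic code and translates two different DNA sequences into the same peptide. It is not a codon-optimization procedure, and the two short sequences are made-up examples rather than sequences from this document.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence, stopping at the first stop codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    protein = []
    for i in range(0, len(dna), 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    original = "ATGGCTAAATAA"   # made-up example sequence
    recoded = "ATGGCCAAGTGA"    # synonymous codons, different DNA
    print(translate(original), translate(recoded), original != recoded)
```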
Thus, the term "recombinant" polypeptide does not necessarily refer to a polypeptide whose amino acid sequence is not naturally occurring. In contrast, a "recombinant" polypeptide is encoded by a recombinant non-naturally occurring DNA sequence, but the amino acid sequence of the polypeptide may be naturally occurring ("wild-type") or non-naturally occurring (e.g., variants, mutants, etc.). Thus, a "recombinant" polypeptide is the result of human intervention, but may have a naturally occurring amino acid sequence.
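The point that different DNA sequences can encode the same amino acids may be illustrated with a minimal Python sketch. It assumes the standard genetic code; the function name `translate` and the variable names are illustrative only and do not appear in the specification. The two inputs are the first six codons of SEQ ID NO: 16 and SEQ ID NO: 17 as listed in this document. These encode two different proteins (AbCas pi_2 and AbCas pi_3) that happen to share their first six residues, so the fragments serve only to show synonymous codon usage.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence codon by codon ('*' marks a stop codon)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# First six codons of SEQ ID NO: 16 and SEQ ID NO: 17 (copied from this document).
seq16_start = "ATGGGCAAGAACCGCAGC"
seq17_start = "ATGGGTAAGAATCGGTCC"

print(translate(seq16_start))  # MGKNRS
print(translate(seq17_start))  # MGKNRS
```

The two DNA fragments differ at four of six codons yet translate to the same peptide, which is why a codon-optimized coding sequence can be recombinant while the encoded amino acid sequence remains wild-type.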
In the present invention, a "vector" or "expression vector" is a replicon, such as a plasmid, phage, virus, artificial chromosome, or cosmid, to which another DNA segment (i.e., an "insert") may be attached in order to cause replication of the attached segment in a cell.
In the present invention, an "expression cassette" comprises a DNA coding sequence operably linked to a promoter. "Operably linked" refers to a juxtaposition in which the components so described are in a relationship permitting them to function in their intended manner. For example, a promoter is operably linked to a coding sequence (or the coding sequence may be said to be operably linked to the promoter) if the promoter affects the transcription or expression of the coding sequence.
In the present invention, the terms "recombinant expression vector" and "DNA construct" are used interchangeably to refer to a DNA molecule comprising a vector and an insert. Recombinant expression vectors are typically produced for the purpose of expressing and/or propagating one or more inserts, or for the purpose of constructing other recombinant nucleotide sequences. The one or more inserts may or may not be operably linked to a promoter sequence, and may or may not be operably linked to a DNA regulatory sequence.
A cell has been "genetically modified", "transformed" or "transfected" with exogenous DNA or exogenous RNA, such as a recombinant expression vector, when such nucleic acid has been introduced into the interior of the cell. The presence of the exogenous DNA results in a permanent or transient genetic change. The transforming DNA may or may not be integrated (covalently linked) into the genome of the cell. In prokaryotes, yeast, and mammalian cells, for example, the transforming DNA may be maintained on an episomal element such as a plasmid. With respect to eukaryotic cells, a stably transformed cell is one in which the transforming DNA has become integrated into a chromosome so that it is inherited by daughter cells through chromosome replication. This stability is demonstrated by the ability of the eukaryotic cell to establish cell lines or clones comprising a population of daughter cells that contain the transforming DNA. A "clone" is a population of cells derived from a single cell or a common ancestor by mitosis. A "cell line" is a clone of a primary cell that is capable of stable growth in vitro for many generations.
Suitable methods of genetic modification (also referred to as "transformation") include, for example, viral or phage infection, transfection, conjugation, protoplast fusion, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran-mediated transfection, liposome-mediated transfection, particle gun technology, direct microinjection, nanoparticle-mediated nucleic acid delivery (see, e.g., Panyam et al., Adv Drug Deliv Rev. 2012 Sep 13. pii: S0169-409X(12)00283-9. doi: 10.1016/j.addr.2012.09.23), and the like.
The choice of genetic modification method generally depends on the type of cell to be transformed and the environment in which the transformation occurs (e.g., in vitro, ex vivo, or in vivo). A general discussion of these methods can be found in Ausubel et al., Short Protocols in Molecular Biology, 3rd edition, Wiley & Sons, 1995.
In the present invention, a "target nucleic acid" is a polynucleotide (e.g., DNA such as genomic DNA) that includes a site (a "target site" or "target sequence") targeted by an RNA-guided endonuclease polypeptide (e.g., a Cas protein). The target sequence is the sequence to which the guide sequence of the guide RNA will hybridize. For example, a target site (or target sequence) 5'-GAGCAUAUC-3' within a target nucleic acid is targeted by (or is bound by, hybridizes with, or is complementary to) the sequence 5'-GAUAUGCUC-3'. Suitable hybridization conditions include physiological conditions normally present in cells. For a double-stranded target nucleic acid, the strand that is complementary to and hybridizes with the guide RNA is referred to as the "complementary strand" or "target strand", while the strand that is complementary to the "target strand" (and is therefore not complementary to the guide RNA) is referred to as the "non-target strand" or "non-complementary strand".
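The complementarity in the example above can be checked with a short Python sketch. The helper name `reverse_complement_rna` is illustrative and not part of the specification; only Watson-Crick pairing (A-U, G-C) is assumed.

```python
# Watson-Crick complement for RNA bases.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return seq.upper().translate(RNA_COMPLEMENT)[::-1]


target_site = "GAGCAUAUC"     # 5'-GAGCAUAUC-3' from the example above
guide_sequence = "GAUAUGCUC"  # 5'-GAUAUGCUC-3' from the example above

print(reverse_complement_rna(target_site))                    # GAUAUGCUC
print(reverse_complement_rna(target_site) == guide_sequence)  # True
```

Because the two sequences are exact reverse complements, they can pair antiparallel along their full length, which is the sense in which the guide sequence "hybridizes to" the target sequence.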
In the present invention, "cleavage" means cleavage of the covalent backbone of a target nucleic acid molecule (e.g., RNA, DNA). Cleavage may be initiated by a variety of methods including, but not limited to, enzymatic or chemical hydrolysis of phosphodiester bonds. Both single strand cleavage and double strand cleavage are possible, and double strand cleavage may occur due to two distinct single strand cleavage events.
In the present invention, "nuclease" and "endonuclease" are used interchangeably to mean an enzyme having catalytic activity for nucleic acid cleavage (e.g., ribonuclease activity (cleavage of ribonucleic acid), deoxyribonuclease activity (cleavage of deoxyribonucleic acid), etc.).
The following describes the technical scheme of the present invention in detail.
< CRISPR-Cas System >
The present invention provides a novel CRISPR-Cas system comprising a Cas protein and a corresponding guide RNA. The Cas protein (e.g., AbCas pi protein) interacts with (binds to) the corresponding guide RNA (e.g., AbCas pi guide RNA) to form a ribonucleoprotein (RNP) complex that is targeted to a specific site in a target nucleic acid by base pairing between the guide RNA and a target sequence within the target nucleic acid molecule. The guide RNA includes a nucleotide sequence (a guide sequence) complementary to a sequence of the target nucleic acid (the target site/target sequence). Thus, in the present invention, the AbCas pi protein forms a complex with the AbCas pi guide RNA, and the guide RNA provides sequence specificity to the RNP complex through the guide sequence. The AbCas pi protein of the complex provides site-specific activity. In other words, the AbCas pi protein is directed to (e.g., stabilized at) a target site within a target nucleic acid sequence (e.g., a chromosomal sequence or an extrachromosomal sequence, such as an episomal sequence, a minicircle sequence, a mitochondrial sequence, or a chloroplast sequence) by virtue of its association with the guide RNA.
In some embodiments, the present invention provides a CRISPR-Cas system comprising: (i) an AbCas pi polypeptide (and/or a nucleic acid encoding the AbCas pi polypeptide) and (ii) an AbCas pi guide RNA (and/or a nucleic acid encoding the AbCas pi guide RNA) (e.g., where the AbCas pi guide RNA can be in the form of a dual guide RNA or a single guide RNA). In other embodiments, the CRISPR-Cas system further comprises: (iii) a donor polynucleotide.
In this specification, the CRISPR-Cas system provided by the present invention is also referred to as the CRISPR-AbCas pi system or the AbCas pi protein-based system.
In other embodiments, the invention provides a nucleic acid/protein complex (RNP complex) comprising: (i) an AbCas pi polypeptide of the invention; and (ii) an AbCas pi guide RNA (e.g., wherein the AbCas pi guide RNA may be in the form of a dual guide RNA or a single guide RNA).
(AbCas pi protein)
An AbCas pi polypeptide (used interchangeably with the term "AbCas pi protein") can bind to and/or modify (e.g., cleave) a target nucleic acid. In some embodiments, the AbCas pi protein is a naturally occurring protein (e.g., one that occurs naturally in a prokaryotic cell). In other embodiments, the AbCas pi protein is not a naturally occurring polypeptide (e.g., the AbCas pi protein is a variant AbCas pi protein, a chimeric protein, etc.).
The assay used to determine whether a given protein interacts with an AbCas pi guide RNA can be any convenient binding assay that tests for binding between a protein and a nucleic acid. Suitable binding assays (e.g., gel shift assays) are known to those of ordinary skill in the art (e.g., assays that include adding an AbCas pi guide RNA and the protein to a target nucleic acid). The assay used to determine whether the protein has activity (e.g., whether the protein has nuclease activity that cleaves a target nucleic acid) can be any convenient assay (e.g., any convenient nucleic acid cleavage assay). Suitable assays (e.g., cleavage assays) are known to those of ordinary skill in the art.
AbCas pi proteins act as endonucleases that catalyze double strand breaks at specific sequences in targeted double strand DNA (dsDNA). Sequence specificity is provided by the associated guide RNAs that hybridize to target sequences within the target DNA.
In some embodiments, the AbCas pi proteins provided herein are (or are derived from) naturally occurring (wild-type) proteins, which are derived from Armatimonadetes bacterium.
In some specific embodiments, the AbCas pi protein is selected from the AbCas pi_1 protein, the AbCas pi_2 protein, the AbCas pi_3 protein, and the AbCas pi_4 protein. That is, the AbCas pi protein comprises: 1) the amino acid sequence shown in any one of SEQ ID NO: 1 to SEQ ID NO: 4.
The amino acid sequence of AbCas pi_1 protein is shown below (SEQ ID NO: 1):
MAKATKEVKSKRVEALRQVAYQRLERLERKAQKIGAHLRKPGKAADLQSLHYLLHKVEVEYHDIARNLEKDPTWTPKPKMRREKRAIVPESGPAAPLPTTAKGEPGRPANRHIPPPVPLDSARIPEDQQSMGQGSGGRSWCSAPFVEVKLPPTQWSNVREKLLKFRIEDDADIVRRWAEAKFGSIETARDGLRASAEIGTSPDVWRSFISRAISNGKKDFEPLLSLDDDELTADATAERVVRRWHQIDWVGRMLDSILETVPSGVSKDTFRSRVESRLKTFHSSVNSFELKKRKDGTVERKRKHTNPQFPYLSPSAVSIDPDVVTMEAVELLQMQPEERFAKDPNDANGRMRLRVLQAELGKARREALGRRGEKAPPWSGRKVFRGTTTRKREACLVWDKEAQADGLYFALVMSGGPKIDDKRFVYMDGQPLQSDWQLHNGVAGKAKSCRAMPLILKHDFLRWYHRHIKNHDVNAPLEKRCVHTTTQFVFVEPDEKKGLQPRLFIRPVFKFYDPVYEVPDSHSIDKKPDCRYLIGIDRGVNYPYRAAVYDCETNSIIADKFVDGRKADWERIRNELAYHQRRRDLLRNSRASSAAIQREIRAIARIRKRERGLNKVETVESIARLVDWAEENLGKCNYCFVLEDLSSNLNLGRNNRVKHIAAIKEALINQMRKRGYRFKKSGKVDGVREESAWYTSAVAPSGWWAKKEEVDGAWKADKTRPLARKIGSYYCCEEIDGLHLRGVLKGLGRAKRLVLQSDDPSAPTRRRGFGSELFWDPYCTELCGHAFPQGVVLDADFIGAFNIALRPLVREELGKKAKAVDLADRHQTLNPTVALRCGVTAYEFVEVGGDPRGGLRKILLNPAEAVI*
* Indicating the corresponding position of the stop codon
Nucleotide sequence encoding the AbCas pi_1 protein (SEQ ID NO: 15):
ATGGCCAAGGCCACTAAAGAGGTGAAGTCTAAGCGCGTCGAGGCCCTGCGCCAGGTGGCATATCAGCGCCTGGAGCGGCTGGAGCGCAAGGCCCAGAAGATTGGCGCCCACCTCCGCAAGCCAGGCAAGGCCGCCGACCTGCAGAGCCTGCACTACCTGCTGCACAAGGTGGAAGTGGAGTACCACGATATCGCCCGCAACCTCGAGAAGGACCCTACCTGGACCCCTAAGCCTAAGATGCGCCGCGAGAAGCGGGCCATTGTCCCTGAGAGCGGCCCTGCCGCCCCACTGCCAACCACCGCCAAGGGCGAGCCTGGCCGCCCAGCCAACCGCCATATCCCACCTCCCGTGCCACTGGACTCCGCCCGCATCCCTGAGGATCAGCAGAGCATGGGCCAGGGCTCCGGCGGCCGCAGCTGGTGCTCCGCCCCATTTGTGGAAGTGAAGCTGCCACCAACCCAGTGGTCCAACGTCCGCGAGAAGCTGCTGAAGTTCCGCATTGAGGATGACGCCGACATCGTGCGCCGGTGGGCTGAGGCCAAGTTCGGCAGCATCGAGACTGCCCGTGATGGCCTGCGCGCTAGCGCTGAGATTGGCACCTCTCCTGATGTGTGGCGCTCCTTTATCTCCCGCGCCATTAGTAACGGGAAGAAGGACTTCGAGCCACTGCTGAGCCTGGATGATGATGAGCTTACTGCCGACGCCACTGCCGAGCGCGTGGTCCGCCGCTGGCATCAGATTGACTGGGTGGGCCGCATGCTTGATAGCATCCTCGAGACCGTCCCTTCCGGCGTGTCCAAGGACACTTTCCGCAGCCGCGTGGAGAGCCGGCTGAAAACTTTCCATAGCTCCGTGAATAGCTTCGAGCTGAAGAAGAGAAAGGACGGCACCGTGGAACGCAAGCGCAAGCACACTAACCCACAGTTTCCATACCTGAGCCCAAGTGCCGTCAGCATTGACCCAGACGTGGTGACTATGGAGGCCGTCGAGCTTCTGCAGATGCAGCCTGAGGAGCGCTTCGCCAAGGACCCAAACGACGCTAATGGCCGCATGAGACTGCGCGTGCTGCAGGCCGAGCTGGGCAAGGCCCGCCGCGAGGCCCTGGGCCGCCGAGGCGAGAAGGCCCCTCCTTGGTCTGGCCGCAAGGTCTTTCGCGGCACCACCACCCGCAAGCGCGAGGCTTGCCTGGTCTGGGACAAAGAGGCCCAGGCCGACGGCCTCTACTTTGCCCTCGTGATGAGCGGCGGCCCTAAGATTGACGATAAGCGCTTCGTGTACATGGATGGGCAGCCTTTGCAGAGCGATTGGCAGCTGCACAACGGCGTCGCCGGCAAGGCAAAGAGCTGCCGCGCCATGCCTTTGATTCTGAAGCATGATTTCCTGCGGTGGTACCACCGCCACATTAAGAACCATGATGTGAACGCCCCACTGGAGAAGCGCTGCGTCCACACCACCACCCAGTTCGTCTTTGTGGAGCCTGATGAGAAGAAGGGCCTGCAGCCACGCCTGTTCATTCGCCCTGTCTTTAAGTTTTACGATCCTGTCTACGAGGTCCCTGATAGCCACTCAATCGATAAGAAGCCTGACTGCAGATACCTGATCGGCATTGACCGCGGCGTGAACTACCCATATCGGGCCGCCGTGTACGATTGCGAGACCAACAGCATCATCGCCGACAAGTTTGTGGACGGCCGCAAGGCCGACTGGGAGCGCATTCGTAACGAGCTGGCCTACCACCAACGGCGCCGCGACCTGCTGCGCAACTCCCGTGCTAGCTCCGCCGCCATCCAGCGCGAGATTCGCGCCATCGCCCGCATCAGGAAACGTGAGCGGGGCCTGAACAAGGTGGAGACCGTCGAGAGCATTGCCCGCCTGGTCGACTGGGCCGAAGAGAACCTCGGCAAGTGCAACTACTGTTTCGTGCTCGAGGATTTGTCCTCCAACCTGAACCTGGGGCGCAACAACCGCGTGAAGCACATCGCCGCCATCAAAGAGGCCCT
GATCAACCAGATGCGCAAGCGCGGCTACCGCTTTAAGAAGTCCGGCAAGGTGGACGGCGTGCGCGAGGAGAGCGCCTGGTACACTTCCGCCGTGGCCCCATCCGGGTGGTGGGCCAAGAAGGAAGAGGTGGACGGCGCCTGGAAGGCCGATAAGACTCGGCCACTGGCCCGCAAGATCGGCAGCTACTACTGCTGTGAAGAAATCGACGGCCTGCACCTGCGCGGCGTGCTGAAGGGCCTGGGCCGCGCCAAGCGGCTGGTCCTGCAGAGCGATGACCCATCCGCCCCTACCCGCCGGCGCGGCTTTGGCTCCGAGCTGTTCTGGGACCCTTACTGCACCGAACTGTGCGGCCACGCCTTCCCTCAGGGCGTCGTCCTCGATGCTGACTTTATTGGCGCCTTCAACATTGCCCTGCGCCCACTGGTGCGCGAGGAGCTGGGCAAAAAGGCCAAGGCCGTGGATCTGGCTGACCGCCACCAGACCCTGAACCCAACCGTCGCCCTGCGCTGCGGCGTGACCGCCTACGAGTTCGTCGAGGTGGGCGGCGATCCTCGCGGCGGCCTGCGCAAGATCCTGCTGAACCCTGCCGAGGCCGTGATTTGA
The amino acid sequence of AbCas pi_2 protein is shown below (SEQ ID NO: 2):
MGKNRSSSSDLSPLERSLRKVGENRLERLRVREEKIRKHIEQHPRGKNDHQALHFLLHQIEVERNDLYRNLKDPEYVPKPAKQRRERRQINVAKPPTRPKKEKGPQPESTKYVIRPPVPGKNLPAFASKYEARDTRDDSYQDGRSWTSAPYVEVELPILGADKVIQKLMKFVQKDERSIVRDWATKTYSSIEAAREALLVGAQVSEDVSVWRGLLAETKNAQNFAALSDDQIEAAMSKEAKGADLRPRRAALLVAQRHWVDQTVKAIKESAPSGVDKDTLDRRLRAGLRGFHTAANSGKHTNPQFPYLTAEKPVVPMESVVQSVLAFLDDPDDQRYTKDKEDDKKRHRVTVLQKELGKARPRKRLELQTPKWAGRPTVKGTISKRRDAALVWDTSKEANGLCLALPIGGMPKIDVEQFIYQDGTSLLSDCQIASKTTKKGAACAVLPLKPKHDFLRWFTKHVENHNPDAPLERRCLHNTTQFVIVDPEGPRPRLFVRPVFKFYDPGKTVPNTHETWKKPDCRYLVGIDRGINYVLRAVVVDTEEKKVIADIGLPGRKHEWRMIRDEIAYHQQMRDLARNTGKHASVVAKHVRALALARKKDRALGKFATVEAVAELVKKCEQDYGSGNYCFVLEDLDMGAMNLKRNNRVKHMAVMEEALVNQMRKQGYAYDGRRGRVDGVRHEGAWYTSQVSPFGWWAKRDEVEEAWKRDKTRPIGRKVGNWYEMPEPGQDGDRPDTYRKGYWSKPKNAEGKPYGRNRFSVEPGDEKPDAERRFCWGSELFWDPNVKSFKGKEFPEGVVLDADFVGALNIALRPLVNDGQGKGFKAEDMAREHTILNPQFKIACQIPVYEFVEEDGDKWAALRRIML*
* Indicating the corresponding position of the stop codon
Nucleotide sequence encoding the AbCas pi_2 protein (SEQ ID NO: 16):
ATGGGCAAGAACCGCAGCTCTAGCTCTGATCTGTCCCCTCTGGAGCGCTCCCTGCGCAAGGTGGGCGAGAACCGCCTGGAGCGGCTGCGCGTGCGCGAGGAGAAGATTCGCAAGCACATCGAGCAACACCCACGCGGCAAGAACGATCACCAGGCCCTGCACTTCCTGCTGCACCAGATCGAGGTGGAGCGCAACGATCTGTACAGAAACCTGAAGGACCCTGAGTACGTGCCTAAGCCTGCCAAGCAGAGGCGCGAGCGCCGCCAGATTAACGTCGCCAAGCCTCCTACCAGACCTAAGAAAGAGAAGGGCCCACAGCCAGAGTCCACTAAATACGTCATTCGCCCTCCAGTGCCAGGCAAAAACCTGCCTGCCTTTGCCTCCAAGTACGAGGCCCGTGACACCCGCGATGACTCCTACCAGGATGGCCGGAGCTGGACCAGCGCCCCATACGTCGAGGTGGAGCTGCCAATTTTGGGGGCCGACAAGGTCATTCAGAAGCTGATGAAGTTTGTGCAGAAGGACGAGCGCTCCATTGTGCGCGACTGGGCCACCAAGACCTACTCCTCAATTGAGGCCGCACGCGAGGCCCTCCTGGTGGGCGCCCAGGTGAGCGAGGATGTGAGCGTGTGGCGCGGCCTGCTGGCAGAGACCAAGAATGCCCAGAACTTCGCCGCTCTCTCTGACGACCAGATCGAGGCAGCTATGAGCAAAGAGGCCAAGGGCGCCGACCTGCGCCCACGGCGCGCCGCCCTGCTGGTCGCCCAGCGCCACTGGGTGGATCAGACCGTCAAAGCTATTAAGGAGAGCGCCCCAAGCGGCGTCGATAAGGATACCCTGGACCGCCGCCTTCGGGCCGGCCTCCGCGGCTTCCATACCGCCGCTAACTCCGGCAAACATACTAACCCTCAGTTCCCATACCTGACAGCCGAGAAGCCAGTCGTGCCAATGGAGAGCGTCGTCCAGAGCGTGCTGGCATTTTTGGACGATCCAGATGATCAGAGATACACCAAGGACAAGGAAGATGATAAGAAGCGCCACCGCGTGACTGTCCTGCAGAAGGAGCTGGGCAAGGCCCGCCCTCGTAAGCGCCTGGAACTGCAGACCCCTAAGTGGGCCGGCCGCCCAACTGTGAAGGGCACTATCTCCAAACGCCGCGACGCCGCACTGGTCTGGGATACTAGCAAAGAGGCCAACGGCCTGTGCCTTGCCCTGCCAATCGGCGGCATGCCTAAGATCGACGTGGAGCAGTTCATCTACCAGGATGGCACCTCCCTGCTCAGCGATTGCCAGATCGCATCCAAGACTACCAAGAAGGGCGCCGCCTGTGCTGTCCTCCCTCTGAAGCCTAAGCACGACTTCCTGAGGTGGTTCACTAAGCACGTCGAGAACCATAACCCTGATGCCCCTCTCGAGCGCCGGTGCCTGCATAACACCACTCAGTTCGTGATTGTGGACCCTGAGGGCCCACGCCCACGCCTGTTCGTGCGCCCTGTGTTTAAGTTCTACGACCCAGGGAAAACCGTCCCTAACACCCACGAGACTTGGAAGAAGCCTGATTGCCGCTACCTTGTGGGCATCGACCGCGGCATTAACTACGTCCTGCGCGCCGTGGTGGTGGACACTGAGGAGAAGAAGGTCATCGCCGACATCGGCCTGCCAGGCCGCAAGCACGAGTGGAGGATGATTCGCGATGAGATTGCCTACCACCAGCAGATGCGCGACCTGGCCCGCAACACCGGCAAGCACGCCTCCGTCGTCGCTAAGCACGTCCGCGCCCTGGCCCTGGCCCGCAAGAAGGATCGCGCCCTTGGCAAGTTTGCCACCGTGGAGGCCGTGGCCGAGCTGGTGAAGAAGTGTGAGCAGGACTACGGCAGCGGGAACTACTGCTTCGTGCTGGAGGACCTGGACATGGGCGCCATGAACCTGAAGCGCAACAACCGCGTGAAGCACATGGCCGTGATGGAAGAGGCCCTGGTGAACCAGATGCGAAAGCAGGG
CTACGCCTACGATGGCAGACGGGGCCGCGTCGACGGCGTGCGCCATGAGGGCGCCTGGTACACCTCCCAGGTGAGCCCATTCGGGTGGTGGGCCAAGCGCGATGAGGTGGAAGAGGCCTGGAAGCGCGATAAGACTCGCCCTATCGGCCGCAAGGTGGGCAACTGGTACGAGATGCCTGAGCCAGGCCAGGACGGCGATCGCCCTGACACTTACCGTAAGGGCTATTGGTCTAAGCCTAAGAACGCTGAGGGCAAGCCATACGGCCGCAACCGCTTTAGCGTGGAACCAGGCGATGAGAAGCCTGATGCCGAGCGCCGCTTTTGCTGGGGCAGCGAGCTGTTCTGGGACCCTAACGTCAAGTCCTTTAAAGGCAAGGAGTTCCCTGAGGGCGTGGTGCTGGACGCCGACTTTGTGGGCGCCCTGAACATCGCCCTGCGCCCACTTGTCAACGACGGCCAGGGCAAGGGCTTTAAGGCCGAGGACATGGCCCGCGAGCATACCATTCTGAACCCACAGTTTAAGATCGCCTGTCAGATCCCTGTGTACGAGTTCGTGGAGGAAGATGGCGATAAGTGGGCCGCCCTGCGCCGCATCATGCTGTGA
The amino acid sequence of AbCas pi_3 protein is shown below (SEQ ID NO: 3):
MGKNRSSSSDLSQLERSLRKVGENRLERLRVRGQKIRKHLEQHPRGKNDHQALHFLLHQIEVERNDLYRNLKDPEYVPKPAKRRRERRQINVAQPPTRPTKSVGPKPAPTTYVIPRPEPGRDLPAFASRYKASDSRGEDDQDGRSWTAAPFVEVELPIQIAGKILEKLRKYVQKDEREIVREWAVKTYGSIEAAREPLLIGAQVSEDVSVWRGLLAETKNAQDFAALSDDQIEAAMSKEAKGSDLRPRRAALLVAQRHWVDQTVKAIKESAPKGVDKDTLDRRLRAGLRGFHTAANSGKHTNPQFPYLTPKEAKVPLESVVNQVLEFLDDADDQRYVQVKVDDKKRHRVSHLQKELGKARPRKRLELQRPKWAGRPTVQGTISKRRDAALVWDTSKKENGLCLALPLGGLQKIDVERFIYQDGTSLLSDCQIASKTSKKGAACALMPLKPKHDFLRWYTKHVENHNADAPLERRCLHNTTQFVIVDPEGQRPRLFIRPVFKFYDPGKAVPNTHETWKKPDCRYLVGIDRGINYVLRAVVVDIEKKEVIADIHLQGDKHKWRMIRDEIAYHQQMRDLASNTGKHPSVVARHVRALALARKKDRALGRFTTVKAVADIVMQCENDYGSGNYCFVLEDLDMGKMNLKRNNRVKHMAVMKEALVNQMRKRGYAYDGRRGRADGVRYEGAWYTSQVSPFGWWAKREEVEEAWKKDTSRPIGRKVGNWYEMPDPNEEGKRSDVYRKGCWKKPQNASGKPYGRNRFCVEPGDEKPDAQRRFSWGSELFWDPNVKSFKGKEFPEGVVLDADFVGALNIALRPLVNDGQGRGFTADKMAEAHTRLNPQFEIVCKIPVYEFIEEHGDKRAKLRRIVL*
* Indicating the corresponding position of the stop codon
Nucleotide sequence encoding the AbCas pi_3 protein (SEQ ID NO: 17):
ATGGGTAAGAATCGGTCCTCGTCCTCGGATTTGAGCCAGCTCGAACGATCCTTACGGAAAGTCGGTGAGAATCGCCTTGAGCGGCTGCGGGTGCGTGGGCAGAAGATTAGGAAGCACCTTGAACAGCACCCCCGAGGTAAGAACGATCATCAGGCCCTCCACTTTCTGCTCCACCAGATCGAGGTCGAACGGAATGACCTGTACCGAAACCTCAAAGACCCCGAATACGTGCCCAAGCCAGCGAAACGGCGGCGAGAAAGACGGCAGATCAACGTCGCCCAACCGCCGACCCGACCCACCAAGAGTGTGGGGCCGAAACCAGCGCCGACGACATACGTGATCCCGCGCCCCGAGCCAGGCCGTGACCTACCAGCATTCGCGAGCAGGTACAAGGCAAGTGACTCGAGAGGCGAGGACGACCAAGACGGTCGGTCATGGACTGCCGCGCCCTTTGTCGAAGTCGAGCTGCCGATACAAATTGCCGGCAAGATCCTCGAGAAACTCCGTAAGTACGTGCAAAAGGACGAACGGGAGATCGTTCGCGAGTGGGCTGTCAAGACCTATGGCTCGATCGAAGCCGCAAGAGAACCACTTCTTATCGGGGCACAAGTCTCGGAAGACGTCTCGGTCTGGCGCGGACTCCTCGCAGAAACGAAAAACGCACAGGACTTCGCCGCCCTCTCCGACGATCAGATCGAAGCAGCGATGTCGAAGGAGGCGAAGGGGTCAGACCTGCGTCCGAGGCGCGCCGCACTGCTAGTCGCACAGCGCCACTGGGTGGATCAGACCGTCAAGGCAATCAAGGAGTCGGCCCCGAAAGGCGTCGATAAGGACACACTCGATCGCCGTTTGCGCGCTGGCCTAAGGGGGTTTCATACAGCAGCTAATTCGGGTAAGCACACGAACCCACAGTTCCCATACTTGACGCCGAAAGAGGCAAAGGTGCCGTTAGAATCGGTCGTCAATCAGGTCTTAGAGTTCCTCGACGACGCGGACGACCAGCGCTACGTCCAGGTCAAGGTTGACGACAAGAAGCGCCACAGAGTCAGTCATCTCCAGAAGGAACTCGGGAAGGCGAGGCCGCGCAAGCGACTGGAGCTTCAGAGGCCAAAGTGGGCGGGTAGGCCTACAGTGCAAGGAACGATCAGCAAACGGCGCGACGCCGCACTCGTGTGGGACACGAGCAAGAAGGAAAACGGCCTCTGCCTCGCGCTCCCGCTCGGGGGTTTGCAGAAGATAGATGTTGAGCGGTTCATCTACCAAGACGGCACGTCACTATTGTCGGACTGCCAGATCGCGTCGAAGACCTCCAAGAAAGGTGCGGCGTGCGCGCTCATGCCGCTCAAGCCCAAGCACGACTTCCTGCGTTGGTACACCAAACACGTCGAGAACCACAACGCAGACGCGCCGCTCGAGCGCCGCTGTCTGCACAACACGACCCAGTTCGTGATCGTGGATCCAGAGGGGCAGCGCCCGCGTCTCTTCATCCGCCCCGTCTTCAAGTTCTACGACCCCGGCAAGGCAGTGCCGAACACGCACGAAACTTGGAAGAAGCCGGACTGCCGCTACCTGGTAGGGATCGACCGAGGTATCAACTACGTTCTGCGCGCTGTCGTTGTGGACATCGAAAAGAAGGAAGTCATCGCTGACATCCACCTACAAGGCGACAAGCACAAATGGAGGATGATCCGCGACGAGATCGCCTACCACCAACAGATGCGTGATCTTGCCAGCAACACAGGCAAACACCCGAGCGTCGTGGCGAGGCACGTCCGCGCACTCGCCCTCGCCCGCAAGAAGGATCGCGCGCTCGGCAGGTTTACGACGGTCAAGGCTGTCGCAGATATCGTCATGCAATGCGAAAACGACTACGGAAGCGGTAACTACTGCTTCGTGCTCGAAGACCTCGACATGGGCAAGATGAATCTCAAGCGCAACAACCGCGTGAAGCACATGGCCGTCATGAAGGAAGCGCTTGTCAATCAAATGCGCAAGCGCGG
CTATGCCTACGACGGTCGCCGCGGCCGGGCGGACGGCGTCAGGTACGAGGGCGCATGGTACACGAGCCAAGTGTCCCCCTTCGGCTGGTGGGCCAAGCGTGAAGAGGTGGAGGAGGCGTGGAAGAAGGACACGTCGCGCCCGATCGGTCGCAAGGTCGGCAACTGGTACGAGATGCCAGATCCGAACGAAGAAGGAAAGCGGTCAGACGTGTATCGGAAGGGCTGCTGGAAGAAACCGCAGAACGCAAGCGGAAAGCCATACGGGCGGAACCGCTTCTGTGTGGAACCTGGCGACGAGAAGCCGGACGCTCAGCGGCGCTTCTCCTGGGGGAGCGAGCTGTTCTGGGACCCGAACGTGAAGTCCTTCAAGGGCAAAGAGTTTCCCGAGGGGGTCGTGCTGGACGCCGACTTCGTAGGAGCGCTCAACATCGCCCTTCGCCCACTCGTCAACGACGGTCAGGGCAGGGGCTTCACGGCAGACAAGATGGCCGAAGCGCATACGAGACTCAACCCGCAGTTCGAGATCGTTTGCAAAATCCCCGTTTATGAGTTCATCGAAGAGCACGGTGACAAGAGGGCAAAACTCAGGCGGATCGTGCTATAG
The amino acid sequence of AbCas pi_4 protein is shown below (SEQ ID NO: 4):
MPKKTSTVALSPRDIRLRELGEKRLQRLRQREEKIRRHLESERGRRDFQSLHFLLHKIEVERNDLYRNLYQNEGHESYVPKPGKTKHRKELSLPSTELPSPPDEKKGPRPKKSRYVIPQPVPGINLPRLINRFGKSDQKSESDQEGRFWTSAPFIEVELPMLNAHRVIKALMRFVQKDERSVVRTWAVTKFGSIEAAREVLLAGALLQREPEIMRGFLQNIDPWGSLSDEELIRDEKAWRTVKLLAQKNWVDQIAKSIKDSAPKGVDKDTLDRRLRSGLKAFHSAANSGKHTNPQFPYLTSEKPSANFESVVDSVLEFLDLEDKDRYTIAKVDDKKRHRVTALQKELGQAKPRVRLEQERSRWAGHSYLQGTITRKRQASLVWDGHRTENGLALAIPLDGMPKIDVQRYMYQDGTSLLSDRQITSKTKSEGKDCALMPLRFKHAFLRWYTKHVENHVAEAPLERRCIHNTTQFVIVDPEGKHPRLFIRPVFKFYDSNKTIQNSNAPWCKPQCRYLIGIDRGINYVLRAVVVDTEEKAVIDDIPLPGRKREWRAIRQEIAYFQRMRDLSKSAQERNRYVVALAKARRKDRSLGKTETVEAVAKLVQNCSERFGEGNYCFVLENLELGALNLKRNNRVKHLASMEEALIYQMRKRGYFYNSRSNRVDGVRWEAARYTSQVSPFGWWAKRDEVEKAKKQDKSMAIGRKIGEGYEGPQDDEIESHSTIYRQGRWMKLRNEEGKAYGRSRFVVQPEDLDPAQPRRFSWGSELFWDPYQKEFKGKSFSQGVVLDADFVGALNIALRPLVNDGKGKGFTTAMMAEAHVKLNPTFEIRCKIPVYEFIAENDNSRAALRRIVI*
* Indicating the corresponding position of the stop codon
Nucleotide sequence encoding the AbCas pi_4 protein (SEQ ID NO: 18):
ATGCCTAAAAAGACCAGCACCGTGGCTCTGAGCCCACGCGACATCCGCCTGCGCGAACTGGGCGAAAAACGCCTGCAGCGCCTGCGCCAGCGCGAAGAAAAAATCCGCCGCCACCTGGAATCTGAACGCGGCCGCCGCGATTTCCAAAGCCTGCACTTCCTGCTGCACAAAATTGAAGTGGAACGCAACGACCTGTACCGCAACCTGTATCAGAACGAAGGCCACGAAAGCTACGTGCCAAAACCAGGCAAAACTAAACACCGCAAAGAACTCTCTCTGCCCTCTACCGAACTGCCTAGCCCACCTGATGAGAAAAAAGGCCCTCGCCCTAAAAAATCTCGCTACGTCATCCCACAGCCAGTGCCTGGCATCAACCTGCCTCGCCTGATCAATCGCTTCGGCAAGAGCGACCAGAAAAGCGAATCTGACCAGGAAGGCCGCTTCTGGACCTCTGCTCCTTTCATTGAAGTGGAACTGCCAATGCTGAACGCTCACCGCGTGATCAAAGCTCTGATGCGCTTCGTGCAGAAAGATGAACGCAGCGTCGTGCGAACTTGGGCAGTGACTAAATTCGGCTCTATCGAGGCCGCTCGCGAAGTGCTGTTGGCAGGCGCACTGCTGCAGAGAGAACCTGAAATTATGCGCGGCTTTCTGCAGAACATTGACCCATGGGGCAGTCTCAGCGACGAAGAACTGATCCGCGACGAAAAAGCCTGGCGCACCGTGAAACTGCTGGCACAGAAAAACTGGGTCGATCAGATCGCTAAATCCATCAAGGACAGCGCTCCTAAAGGCGTCGATAAAGATACCCTGGACCGCCGCCTGCGCTCTGGCCTGAAAGCATTCCACTCCGCTGCTAACTCCGGCAAACACACTAATCCACAGTTTCCTTACCTGACCTCTGAAAAACCTTCTGCTAACTTCGAATCTGTGGTCGACTCTGTCCTGGAATTCCTGGATCTGGAGGACAAAGACCGCTATACTATCGCAAAAGTGGATGACAAAAAGCGCCACCGCGTGACCGCCCTGCAGAAAGAACTGGGCCAGGCTAAGCCACGCGTGCGCCTGGAACAGGAACGCTCTCGCTGGGCAGGCCATAGCTACCTGCAGGGCACTATCACCCGGAAACGCCAGGCTTCCCTGGTCTGGGATGGCCATCGCACCGAAAACGGCCTGGCCCTGGCCATCCCTCTGGACGGCATGCCTAAAATTGATGTCCAGCGCTATATGTACCAGGACGGCACCTCCCTGCTGTCCGACCGCCAAATCACTAGCAAAACCAAATCCGAAGGCAAAGATTGTGCTCTCATGCCACTGCGCTTTAAACACGCATTCCTGCGCTGGTACACTAAACACGTGGAGAACCACGTGGCAGAAGCACCACTGGAACGCCGCTGCATTCATAACACCACCCAATTCGTGATTGTGGATCCAGAAGGAAAGCACCCTCGCCTGTTCATCCGCCCTGTGTTCAAGTTCTACGACTCTAACAAGACCATTCAGAACTCTAACGCCCCATGGTGTAAACCACAGTGCCGCTACCTGATTGGCATCGACCGCGGCATTAATTACGTGCTGCGCGCAGTCGTGGTGGATACTGAAGAAAAAGCTGTGATCGACGACATCCCACTGCCAGGCCGCAAACGGGAATGGCGTGCAATCCGCCAGGAAATCGCATACTTCCAGCGCATGCGCGATCTGAGCAAATCTGCCCAGGAACGCAACCGCTACGTGGTGGCCCTGGCTAAGGCTCGCCGCAAAGACCGCTCCCTGGGCAAAACCGAAACCGTGGAAGCTGTGGCTAAACTGGTCCAGAACTGCTCTGAACGCTTCGGCGAAGGCAACTACTGTTTCGTCCTCGAAAACCTGGAACTGGGCGCACTGAACCTGAAACGCAACAACCGCGTGAAACACCTGGCATCCATGGAAGAAGCTCTGATCTATCAGATGCGCAAACGCGGCTACTTTTATAACAGCCGCTCTAACCGCGTGGACGGCGT
GCGCTGGGAAGCTGCTCGCTACACCAGCCAGGTGTCCCCTTTCGGGTGGTGGGCCAAACGCGACGAGGTGGAGAAAGCCAAAAAACAGGATAAATCGATGGCAATCGGCCGCAAAATTGGCGAAGGCTACGAAGGGCCACAGGATGATGAAATTGAATCCCATTCCACCATCTACCGCCAGGGCCGCTGGATGAAATTGCGCAACGAAGAGGGCAAAGCATATGGCCGCTCCCGCTTCGTGGTGCAGCCTGAAGATCTGGATCCAGCTCAGCCACGCCGCTTCAGCTGGGGCTCTGAACTGTTCTGGGACCCATACCAGAAAGAGTTCAAAGGCAAATCCTTCTCTCAGGGCGTGGTGCTGGACGCCGATTTCGTGGGCGCCCTGAACATCGCTCTGCGCCCACTGGTGAACGACGGCAAAGGCAAAGGCTTTACCACCGCTATGATGGCTGAAGCACACGTGAAACTGAACCCAACTTTTGAAATCCGCTGTAAAATCCCAGTGTATGAATTCATCGCTGAAAACGATAACTCTCGCGCAGCACTGCGCCGCATCGTGATCTGA
In other specific embodiments, the AbCas pi protein comprises one or more of the following sequences:
2) An amino acid sequence that has at least 80%, 82%, 85%, 87%, 90%, 92%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence shown in any one of SEQ ID NO: 1 to SEQ ID NO: 4 and that retains the AbCas pi guide RNA-binding activity and/or nuclease activity of the amino acid sequence shown in any one of SEQ ID NO: 1 to SEQ ID NO: 4;
3) An amino acid sequence that is obtained by adding, substituting, deleting or inserting one or more amino acid residues in the amino acid sequence shown in any one of SEQ ID NO: 1 to SEQ ID NO: 4 and that retains the AbCas pi guide RNA-binding activity and/or nuclease activity of the amino acid sequence shown in any one of SEQ ID NO: 1 to SEQ ID NO: 4; or
4) An amino acid sequence that is encoded by a nucleotide sequence which hybridizes under stringent conditions with a polynucleotide sequence encoding the amino acid sequence shown in any one of SEQ ID NO: 1 to SEQ ID NO: 4 and that retains the AbCas pi guide RNA-binding activity and/or nuclease activity of the amino acid sequence shown in any one of SEQ ID NO: 1 to SEQ ID NO: 4, said stringent conditions being medium-stringency, medium-high-stringency, high-stringency or very-high-stringency conditions.
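As a rough illustration of the percent-identity criterion in item 2), the following Python sketch compares two equal-length, ungapped fragments position by position. This is a simplification adopted for illustration: the specification does not state how identity is computed, and in practice identity is normally determined from a sequence alignment that may contain gaps. The function name `percent_identity` is hypothetical. The two inputs are the first 40 residues of SEQ ID NO: 2 and SEQ ID NO: 3 as listed in this document.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# First 40 residues of SEQ ID NO: 2 and SEQ ID NO: 3 (copied from this document).
seq2_nterm = "MGKNRSSSSDLSPLERSLRKVGENRLERLRVREEKIRKHI"
seq3_nterm = "MGKNRSSSSDLSQLERSLRKVGENRLERLRVRGQKIRKHL"

identity = percent_identity(seq2_nterm, seq3_nterm)
print(f"{identity:.1f}% identity")  # 90.0% identity
print(identity >= 80.0)             # True
```

The value applies only to this 40-residue window; it says nothing about the identity of the full-length proteins, which would require aligning the complete sequences.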
It will be appreciated that the AbCas pi proteins provided by the present invention may include one or more additional features. For example, in some embodiments, the AbCas pi protein may comprise inhibitors, cytoplasmic localization sequences, export sequences such as nuclear export sequences, or other localization sequences, as well as sequence tags useful for solubilization, purification, or detection of the protein. Suitable tags provided herein include, but are not limited to, biotin carboxylase carrier protein (BCCP) tags, myc-tags, calmodulin-tags, FLAG-tags, hemagglutinin (HA)-tags, polyhistidine tags (also known as histidine tags or His-tags), maltose binding protein (MBP)-tags, nus-tags, glutathione-S-transferase (GST)-tags, green fluorescent protein (GFP)-tags, thioredoxin-tags, S-tags, Softags (e.g., Softag 1, Softag 3), strep-tags, biotin ligase tags, FlAsH tags, V5 tags, and SBP-tags. Other suitable sequences will be apparent to those skilled in the art. In some embodiments, the AbCas pi protein comprises one or more His tags.
In other embodiments, the AbCas pi protein can also be fused (conjugated) to a heterologous polypeptide having an activity of interest to form a fusion protein (a chimeric AbCas pi protein) that has an additional function, such as an activity that modulates transcription of a target DNA, an enzymatic activity that modifies a target nucleic acid, an enzymatic activity that modifies a polypeptide associated with a target nucleic acid, and the like.
The AbCas pi proteins provided by the present invention are short compared with previously identified CRISPR-Cas endonucleases, so using these proteins as an alternative has the advantage that the nucleotide sequence encoding the protein is relatively short. This is useful, for example, where a nucleic acid encoding the AbCas pi protein is desired, such as where a viral vector (e.g., an AAV vector) is used for delivery to cells such as eukaryotic cells (e.g., mammalian cells, human cells, mouse cells; in vitro, ex vivo, or in vivo) for research and/or clinical applications. In addition, the strain in which the AbCas pi proteins are found was obtained from municipal sewage, so the proteins carry a lower risk of eliciting a human immune response in later applications than CRISPR-Cas systems derived from Staphylococcus aureus (SaCas9) or Streptococcus pyogenes (SpCas9).
(Protospacer adjacent motif (PAM))
The AbCas pi protein binds to target DNA at a target sequence defined by the region of complementarity between the guide RNA and the target DNA. As with many Cas proteins, site-specific binding (and/or cleavage) of double-stranded target DNA occurs at a location determined by both: (i) base-pairing complementarity between the guide RNA and the target DNA; and (ii) a short motif in the target DNA (i.e., a protospacer adjacent motif (PAM)).
In some embodiments, the PAM of the AbCas pi protein is located immediately 5' of the target sequence on the non-complementary strand of the target DNA (the complementary strand hybridizes to the guide sequence of the guide RNA, whereas the non-complementary strand does not hybridize directly to the guide RNA and is the reverse complement of the complementary strand).
The PAM sequence recognized by the AbCas pi protein-based system provided by the invention is a C-rich motif; that is, the AbCas pi protein in the system recognizes a PAM sequence rich in C bases. In some specific embodiments, the PAM sequence comprises 5'-CCN-3' (where N is any DNA nucleotide), which differs from the PAM sequences of all known CRISPR-Cas12 systems and effectively expands the editing window available to CRISPR-Cas12 systems in the genome. In some specific embodiments, the PAM sequences recognized by the AbCas pi protein in the system are selected from the group consisting of: TGCCC, CACCT, TTCCC, TCCCC, CTCCC, CTCCT.
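The PAM rule described above (a 5'-CCN-3' motif immediately 5' of the target sequence on the non-complementary strand) can be sketched in Python as a simple scan for candidate sites. Several elements are assumptions made only for illustration and are not taken from the specification: the function names, the default spacer length of 20 nucleotides, the toy input sequence, and the convention that the guide sequence equals the non-complementary-strand protospacer with T replaced by U (which follows from the guide being complementary to the complementary strand).

```python
import re


def find_ccn_sites(non_target_strand: str, spacer_len: int = 20):
    """List (pam_start, pam, protospacer) for each 5'-CCN-3' PAM on the given strand.

    spacer_len is a hypothetical parameter; the protospacer is taken as the
    spacer_len nucleotides immediately 3' of the PAM on the same strand.
    """
    seq = non_target_strand.upper()
    sites = []
    # A lookahead is used so that overlapping PAMs (e.g., within "CCCC") are all reported.
    for match in re.finditer(r"(?=(CC[ACGT]))", seq):
        proto_start = match.start() + 3
        protospacer = seq[proto_start:proto_start + spacer_len]
        if len(protospacer) == spacer_len:  # skip PAMs too close to the 3' end
            sites.append((match.start(), match.group(1), protospacer))
    return sites


def guide_from_protospacer(protospacer: str) -> str:
    """RNA guide sequence matching the non-complementary-strand protospacer (T -> U)."""
    return protospacer.upper().replace("T", "U")


toy_dna = "AACCTGATTACA"  # made-up sequence, not from the specification
for start, pam, proto in find_ccn_sites(toy_dna, spacer_len=6):
    print(start, pam, proto, guide_from_protospacer(proto))
# 2 CCT GATTAC GAUUAC
```

The sketch scans one strand only; a genome-wide search would also scan the reverse complement, and any real guide design would need the experimentally supported spacer length and PAM preferences.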
(AbCas pi guide RNA)
Nucleic acid molecules that bind to AbCas pi proteins to form ribonucleoprotein complexes (RNPs) and target the complexes to specific locations within a target nucleic acid (e.g., target DNA) are referred to herein as "AbCas pi guide RNAs" or simply "guide RNAs". It is understood that in some embodiments, hybrid DNA/RNA may be prepared such that AbCas pi guide RNA comprises DNA bases in addition to RNA bases, but the term "AbCas pi guide RNA" is still used to encompass such molecules herein.
In some embodiments, the AbCas pi guide RNA comprises two segments, a targeting segment and a protein-binding segment. The targeting segment of the AbCas pi guide RNA comprises a nucleotide sequence (a guide sequence) that is complementary to (and thus hybridizes with) a particular sequence (a target site) within a target nucleic acid (e.g., a target ssRNA, a target ssDNA, the complementary strand of a double-stranded target DNA, etc.). The protein-binding segment (or "protein-binding sequence") interacts with (binds to) the AbCas pi polypeptide. The protein-binding segment of the AbCas pi guide RNA comprises two complementary stretches of nucleotides that hybridize to each other to form a double-stranded RNA duplex (dsRNA duplex). Site-specific binding and/or cleavage of a target nucleic acid (e.g., genomic DNA) can occur at a location (e.g., the target sequence of a target locus) determined by base-pairing complementarity between the AbCas pi guide RNA (the guide sequence of the AbCas pi guide RNA) and the target nucleic acid.
The AbCas pi guide RNA and the AbCas pi protein form a complex (e.g., bind via non-covalent interactions). The AbCas pi guide RNA provides target specificity to the complex by comprising a targeting segment that comprises a guide sequence (a nucleotide sequence complementary to a sequence of the target nucleic acid). The AbCas pi protein of the complex provides site-specific activity (e.g., cleavage activity provided by the AbCas pi protein). In other words, the AbCas pi protein is directed to a target nucleic acid sequence (e.g., a target sequence) by virtue of its association with the AbCas pi guide RNA.
The "guide sequence" (also referred to as the "targeting sequence") of an AbCas pi guide RNA can be modified so that the AbCas pi guide RNA can target an AbCas pi protein (e.g., a naturally occurring AbCas pi protein) to any desired sequence of any desired target nucleic acid, provided that the PAM sequence described above is taken into account. Thus, for example, an AbCas pi guide RNA can have a guide sequence that is complementary to (e.g., can hybridize to) a sequence in a nucleic acid in a eukaryotic cell, e.g., a viral nucleic acid, a eukaryotic nucleic acid (e.g., a eukaryotic chromosome, a chromosomal sequence, a eukaryotic RNA, etc.), and the like.
In some embodiments, an AbCas pi guide RNA can also be said to include an "activator" and a "targeting factor" (e.g., an activator RNA (e.g., tracrRNA) and a targeting factor RNA (e.g., crRNA), respectively). When the "activator" and the "targeting factor" are two separate molecules, the guide RNA is referred to herein as a "dual guide RNA" ("dgRNA"), a "double guide RNA", or a "two-molecule guide RNA" (e.g., an "AbCas pi dual guide RNA").
In the present invention, the term "activator" or "activator RNA" refers to the tracrRNA-like molecule (tracrRNA: "trans-acting CRISPR RNA") of an AbCas pi double guide RNA (and thus also of an AbCas pi single guide RNA, in which the "activator" and the "targeting factor" are linked together, e.g., by intervening nucleotides). Thus, for example, an AbCas pi guide RNA (dgRNA or sgRNA) comprises an activator sequence (e.g., a tracrRNA sequence). A tracr molecule (tracrRNA) is a naturally occurring molecule that hybridizes with a CRISPR RNA molecule (crRNA) to form an AbCas pi double guide RNA. The term "activator" is used herein to encompass not only naturally occurring tracrRNAs but also tracrRNAs with modifications (e.g., truncations, extensions, sequence variations, base modifications, backbone modifications, bond modifications, etc.), wherein the activator retains at least one function of the tracrRNA (e.g., contributing to the dsRNA duplex to which the AbCas pi protein binds). In some embodiments, the activator provides one or more stem loops that can interact with the AbCas pi protein. The activator may be referred to as having a tracr sequence (tracrRNA sequence) and in some embodiments is a tracrRNA, although the term "activator" is not limited to naturally occurring tracrRNAs.
In the present invention, the term "targeting factor" or "targeting factor RNA" refers to the crRNA-like molecule (crRNA: "CRISPR RNA") of an AbCas pi double guide RNA (and thus also of an AbCas pi single guide RNA, in which the "activator" and the "targeting factor" are linked together, e.g., by intervening nucleotides). Thus, for example, an AbCas pi guide RNA (dgRNA or sgRNA) comprises a guide sequence and a duplex-forming segment (e.g., the duplex-forming segment of a crRNA, which may also be referred to as a crRNA repeat sequence). Because the sequence of the targeting segment of the targeting factor (the segment that hybridizes to the target sequence of the target nucleic acid) is modified by the user to hybridize to the desired target nucleic acid, the sequence of the targeting factor will typically be a non-naturally occurring sequence. However, the duplex-forming segment of the targeting factor (described in more detail herein), which hybridizes to the duplex-forming segment of the activator, may comprise a naturally occurring sequence (e.g., the duplex-forming segment of a naturally occurring crRNA, which may also be referred to as a crRNA repeat sequence). Thus, the term "targeting factor" is used herein to distinguish the molecule from naturally occurring crRNAs, despite the fact that a portion of the targeting factor (e.g., the duplex-forming segment) typically comprises a naturally occurring sequence from a crRNA. However, the term "targeting factor" also encompasses naturally occurring crRNAs.
In some embodiments, the activator and the targeting factor are covalently linked to each other (e.g., by intervening nucleotides), and the guide RNA is referred to herein as a "single guide RNA" ("sgRNA"), a "single-molecule guide RNA", or a "one-molecule guide RNA" (e.g., an "AbCas pi single guide RNA"). Thus, an AbCas pi single guide RNA comprises a targeting factor (e.g., a targeting factor RNA) and an activator (e.g., an activator RNA) that are linked to each other (e.g., by intervening nucleotides) and hybridize to each other to form the double-stranded RNA duplex (dsRNA duplex) of the protein-binding segment of the guide RNA, thereby producing a stem-loop structure. Accordingly, the targeting factor and the activator each have a duplex-forming segment, and the duplex-forming segment of the targeting factor and the duplex-forming segment of the activator are complementary to each other and hybridize to each other.
In some embodiments, the linker of AbCas pi single guide RNAs is a stretch of nucleotides. In some embodiments, the targeting and activating factors of AbCas pi single guide RNAs are linked to each other by intervening nucleotides, and the linker may have a length of 3 to 20 nucleotides (nt). In some embodiments, the linker of AbCas pi single guide RNAs can have a length of 3 to 100 nucleotides (nt). In some embodiments, the linker of AbCas pi single guide RNAs can have a length of 3 to 10 nucleotides (nt).
In some specific embodiments, for AbCas pi_1 protein, the AbCas pi guide RNA in its dual guide RNA form comprises an activator RNA (e.g., tracrRNA) and a targeting factor RNA (e.g., crRNA), wherein the activator RNA comprises a nucleotide sequence as set forth in SEQ ID NO:5 or a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) to the sequence set forth in SEQ ID NO: 5.
AbCas pi_1 activator RNA (e.g., tracrRNA) (SEQ ID NO: 5):
UCUGCCGAAGACGCCGCACGGAGCCUGGGCCGGAAUCGUAGAUCGAACGCGGCAUCGAAGCCCUGCAGCCCUUCGGGGCCAAGGCGGCGCAGCAAGCCUCUUUCAGGCGGCAGAGUCCUUUAGAGUGU
The repeat sequence of the crRNA comprises the nucleotide sequence set forth in SEQ ID NO: 6 or a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) to the sequence set forth in SEQ ID NO: 6.
AbCas pi_1 crRNA repeat (SEQ ID NO: 6):
GGCGCAGGACAAAAUGCACACUCUAAAGGAAUGAAAG
In some specific embodiments, for AbCas pi_1 protein, the AbCas pi guide RNA in its single guide RNA form comprises the nucleotide sequence set forth in SEQ ID No. 7 or a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) to the sequence set forth in SEQ ID No. 7.
AbCas pi_1 sgRNA (SEQ ID NO: 7):
GUCUGCCGAAGACGCCGCACGGAGCCUGGGCCGGAAUCGUAGAUCGAACGCGGCAUCGAAGCCCUGCAGCCCUUCGGGGCCAAGGCGGCGCAGCAAGCCUCUUUCAGGCGGCAGAGUCCUUUAGAGUGUGAGAGACACUCUAAAGGAAUGAAAG
Wherein the underlined part is the linker sequence (linker) linking the tracrRNA and crRNA repeats.
In some specific embodiments, for AbCas pi_2 protein, the AbCas pi guide RNA in its dual guide RNA form comprises an activator RNA (e.g., tracrRNA) and a targeting factor RNA (e.g., crRNA), wherein the activator RNA comprises a nucleotide sequence as set forth in SEQ ID NO:8 or a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) to the sequence set forth in SEQ ID NO: 8.
AbCas pi_2 activator RNA (e.g., tracrRNA) (SEQ ID NO: 8):
GUCUCGACUAUGCCGUACCACUAGACCGAGCCUACACGGCACGCGGUCAUAGCGUUAACCAAGGCGUGGUGACAAGCCUCUUUCAGGCGUCGGACACUUAAGAGCGUU
The repeat sequence of the crRNA comprises the nucleotide sequence set forth in SEQ ID NO: 9 or a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) to the sequence set forth in SEQ ID NO: 9.
AbCas pi_2 crRNA repeat (SEQ ID NO: 9):
GUCGCAGGGGAUCAAGAACGCUCUUAGGGAAUGAAAG
In some specific embodiments, for AbCas pi_2 protein, the AbCas pi guide RNA in its single guide RNA form comprises the nucleotide sequence set forth in SEQ ID No. 10 or a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) to the sequence set forth in SEQ ID No. 10.
AbCas pi_2 sgRNA (SEQ ID NO: 10):
GUCUCGACUAUGCCGUACCACUAGACCGAGCCUACACGGCACGCGGUCAUAGCGUUAACCAAGGCGUGGUGACAAGCCUCUUUCAGGCGUCGGACACUUAAGAGCGUUGAGAGAACGCUCUUAGGGAAUGAAAG
Wherein the underlined part is the linker sequence (linker) linking the tracrRNA and crRNA repeats.
In some specific embodiments, for AbCas pi_3 protein, the AbCas pi guide RNA in its dual guide RNA form comprises an activator RNA (e.g., tracrRNA) and a targeting factor RNA (e.g., crRNA), wherein the activator RNA comprises a nucleotide sequence as set forth in SEQ ID NO: 11 or a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) to the sequence set forth in SEQ ID NO: 11.
AbCas pi_3 activator RNA (e.g., tracrRNA) (SEQ ID NO: 11):
CUACACUAAGCCUAAACGGCACGAGCGAUAGCCCUGCGGGGAUUCCCCAAAGCCCGUACGACAAGCCUCUUUCAGGCGUCGGACACUUAAGAGCGUUAGGCGGGCGGUCCCUAAGCCGCUCGCCCCCUUAUCCCCACGGUUUCCAAGAACC
The repeat sequence of the crRNA comprises the nucleotide sequence set forth in SEQ ID NO: 12 or a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) to the sequence set forth in SEQ ID NO: 12.
AbCas pi_3 crRNA repeat (SEQ ID NO: 12):
GUCGCAGGGGAUCAAGAACGCUCUUAGGGAAUGAAAG
In some specific embodiments, for AbCas pi_4 protein, the AbCas pi guide RNA in its dual guide RNA form comprises an activator RNA (tracrRNA) and a targeting factor RNA (crRNA), wherein the tracrRNA comprises a nucleotide sequence as set forth in SEQ ID No. 13 or a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) to the sequence set forth in SEQ ID No. 13.
AbCas pi_4 tracrRNA (SEQ ID NO: 13):
UUCUAAGUUCCAUCUCGAUGCGGAACGGAUACUACGCUGUAGUCUAUACGACACGAGUGAUAGCCCUGCGGGGUUCGCCCCUAAGUCCGUAUGACAAGCCUCUUUCAGGCGGUGGACUUCUAAGAGUGCUGGUGGGUG
The repeat sequence of the crRNA comprises the nucleotide sequence set forth in SEQ ID NO: 14 or a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) to the sequence set forth in SEQ ID NO: 14.
AbCas pi_4 crRNA repeat (SEQ ID NO: 14):
GUCGCAUGAGGCGAGAAGCACUCUUAGGGAAUGAAAG
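The identity thresholds recited above (80% or more, 85% or more, and so on) are ordinarily evaluated with a sequence alignment tool. As a minimal sketch only, and assuming an ungapped, position-by-position comparison (which is meaningful only for sequences of equal length, such as the four 37-nt crRNA repeats listed above), the percentage could be computed as follows. The function names and the default threshold argument are illustrative and are not part of the disclosure.

```python
# Illustrative sketch: ungapped percent identity between equal-length RNAs.
# Assumption: no alignment is performed; positions are compared one by one.

CRRNA_REPEATS = {
    "SEQ ID NO: 6 (AbCas pi_1)": "GGCGCAGGACAAAAUGCACACUCUAAAGGAAUGAAAG",
    "SEQ ID NO: 9 (AbCas pi_2)": "GUCGCAGGGGAUCAAGAACGCUCUUAGGGAAUGAAAG",
    "SEQ ID NO: 12 (AbCas pi_3)": "GUCGCAGGGGAUCAAGAACGCUCUUAGGGAAUGAAAG",
    "SEQ ID NO: 14 (AbCas pi_4)": "GUCGCAUGAGGCGAGAAGCACUCUUAGGGAAUGAAAG",
}


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if not a or len(a) != len(b):
        raise ValueError("this sketch needs two non-empty sequences of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_identity(candidate: str, reference: str, threshold: float = 80.0) -> bool:
    """True if the candidate reaches the given identity threshold (hypothetical helper)."""
    return percent_identity(candidate, reference) >= threshold


if __name__ == "__main__":
    names = list(CRRNA_REPEATS)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            value = percent_identity(CRRNA_REPEATS[first], CRRNA_REPEATS[second])
            print(f"{first} vs {second}: {value:.1f}%")
```

Sequences of different lengths, such as the activator RNAs, would need a proper alignment before any identity figure is meaningful.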
(Guide sequence of AbCas pi guide RNA)
The targeting segment of an AbCas pi guide RNA comprises a guide sequence (i.e., a targeting sequence), which is a nucleotide sequence that is complementary to a sequence in a target nucleic acid (the target site/target sequence). In other words, the targeting segment of an AbCas pi guide RNA can interact with a target nucleic acid (e.g., double-stranded DNA (dsDNA), single-stranded DNA (ssDNA), single-stranded RNA (ssRNA), or double-stranded RNA (dsRNA)) in a sequence-specific manner by hybridization (i.e., base pairing). The guide sequence of an AbCas pi guide RNA can be modified (e.g., by genetic engineering) or designed to hybridize to any desired target sequence within a target nucleic acid (e.g., a eukaryotic target nucleic acid, such as genomic DNA), taking the PAM into account (e.g., when targeting a dsDNA target).
In some embodiments, the percentage of complementarity between the guide sequence and the target site of the target nucleic acid is 60% or greater (e.g., 65% or greater, 70% or greater, 75% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100%). In some embodiments, the percentage of complementarity between the guide sequence and the target site of the target nucleic acid is 80% or greater (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100%). In some embodiments, the percentage complementarity between the guide sequence and the target site of the target nucleic acid is 90% or greater (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100%). In some embodiments, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100%.
In some embodiments, the percentage of complementarity between the guide sequence and the target site of the target nucleic acid is 60% or greater (e.g., 70% or greater, 75% or greater, 80% or greater, 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100%) over 19-25 consecutive nucleotides. In some embodiments, the percentage complementarity between the guide sequence and the target site of the target nucleic acid is 80% or greater (e.g., 85% or greater, 90% or greater, 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100%) over 19-25 consecutive nucleotides. In some embodiments, the percentage complementarity between the guide sequence and the target site of the target nucleic acid is 90% or greater (e.g., 95% or greater, 97% or greater, 98% or greater, 99% or greater, or 100%) over 19-25 consecutive nucleotides. In some embodiments, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over 19-25 consecutive nucleotides.
In some embodiments, the guide sequence has a length in the range of 19-30 nucleotides (nt) (e.g., 19-25, 19-22, 19-20, 20-30, 20-25, or 20-22 nt). In some embodiments, the guide sequence has a length in the range of 19-25 nucleotides (nt) (e.g., 19-22, 19-20, 20-25, or 20-22 nt). In some embodiments, the guide sequence has a length of 19 or more nts (e.g., 20 or more, 21 or more, or 22 or more nts; 19 nts, 20 nts, 21 nts, 22 nts, 23 nts, 24 nts, 25 nts, etc.). In some embodiments, the guide sequence has a length of 19 nt, a length of 20 nt, a length of 21 nt, a length of 22 nt, or a length of 23 nt.
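The length and complementarity criteria above can be illustrated with a short calculation. The following is a sketch under stated assumptions: only Watson-Crick pairs are counted (G·U wobble pairs are ignored), the guide and the target strand are the same length, and both are written 5' to 3'. The function names are hypothetical. The example guide is the 20-nt 3' end of SEQ ID NO: 30 in Example 2, which appears to be the guide sequence of that sgRNA.

```python
# Illustrative sketch: guide length check and percent complementarity.
# Assumptions: Watson-Crick pairing only, equal lengths, both strands 5'->3'.

DNA_TO_PAIRING_RNA = {"A": "U", "C": "G", "G": "C", "T": "A"}


def perfect_guide_for(target_strand_dna: str) -> str:
    """RNA guide (5'->3') that pairs perfectly with a DNA target strand (5'->3')."""
    return "".join(DNA_TO_PAIRING_RNA[base] for base in reversed(target_strand_dna.upper()))


def percent_complementarity(guide_rna: str, target_strand_dna: str) -> float:
    """Percentage of guide positions that pair with the antiparallel target strand."""
    perfect = perfect_guide_for(target_strand_dna)
    if not guide_rna or len(perfect) != len(guide_rna):
        raise ValueError("this sketch needs a guide and target of equal, non-zero length")
    matches = sum(g == p for g, p in zip(guide_rna.upper(), perfect))
    return 100.0 * matches / len(guide_rna)


def guide_length_ok(guide_rna: str, min_nt: int = 19, max_nt: int = 30) -> bool:
    """True if the guide length falls in the 19-30 nt range described above."""
    return min_nt <= len(guide_rna) <= max_nt


if __name__ == "__main__":
    guide = "AGGGCGACACCCUGGUGAAC"    # 3' end of SEQ ID NO: 30
    target = "GTTCACCAGGGTGTCGCCCT"   # DNA strand complementary to that guide
    print(guide_length_ok(guide), percent_complementarity(guide, target))
```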
(Donor polynucleotide/donor template)
Under the direction of an AbCas pi double or single guide RNA, the AbCas pi protein in some cases generates a site-specific double-strand break (DSB), or a single-strand break (SSB) (e.g., when the AbCas pi protein is a nickase variant), within a double-stranded DNA (dsDNA) target nucleic acid; the break is repaired by non-homologous end joining (NHEJ) or homology-directed repair (HDR).
In some embodiments, contacting the target DNA (with AbCas pi protein and AbCas pi guide RNA) occurs under conditions that allow non-homologous end joining or homology directed repair. Thus, in some embodiments, the target DNA is contacted with the donor polynucleotide (e.g., by introducing the donor polynucleotide into the cell), wherein the donor polynucleotide, a portion of the donor polynucleotide, a copy of the donor polynucleotide, or a portion of a copy of the donor polynucleotide is integrated into the target DNA.
< Polynucleotide >
The present invention provides a polynucleotide comprising one or more of the following:
i) a nucleotide sequence encoding the AbCas pi protein in the CRISPR-Cas system described above;
ii) a nucleotide sequence encoding the AbCas pi guide RNA in the CRISPR-Cas system described above; and
iii) a donor polynucleotide sequence.
In some embodiments, when AbCas pi guide RNAs are single guide RNAs, the polynucleotide sequence encoding the guide RNAs may comprise a single nucleotide sequence. In other embodiments, when AbCas pi guide RNAs are double guide RNAs, the polynucleotide sequence encoding the guide RNAs may comprise two separate nucleotide sequences.
In some embodiments, the nucleotide sequence encoding AbCas pi protein is codon optimized. This type of optimization may require mutations in the nucleotide sequence encoding AbCas pi to mimic the codon bias of the intended host organism or cell while encoding the same protein.
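The requirement that a codon-optimized sequence encode the same protein can be checked by translating both sequences. The sketch below uses the standard genetic code; the "original" fragment is the first 21 nt of the AbCas pi_1 coding sequence as it appears in the forward primer of Example 2 (SEQ ID NO: 19), while the "recoded" fragment is a hypothetical synonymous variant written only for illustration, not a sequence from the disclosure.

```python
# Illustrative sketch: confirm that two coding sequences are synonymous.
# The standard genetic code is built from the conventional TCAG ordering.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a DNA coding sequence whose length is a multiple of three."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def encodes_same_protein(cds_a: str, cds_b: str) -> bool:
    """True if both coding sequences translate to the same amino acid sequence."""
    return translate(cds_a) == translate(cds_b)


if __name__ == "__main__":
    original = "ATGGCCAAGGCCACTAAAGAG"  # from the SEQ ID NO: 19 primer
    recoded = "ATGGCTAAAGCGACCAAGGAA"   # hypothetical synonymous variant
    print(translate(original), encodes_same_protein(original, recoded))
```

A real optimization would additionally weight codon choice by the usage frequencies of the intended host, which this sketch does not attempt.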
< Vector >
The present invention provides a vector comprising a polynucleotide as described above, i.e. comprising one or more of the following:
(i) a nucleotide sequence encoding the AbCas pi protein in the CRISPR-Cas system described above;
(ii) a nucleotide sequence encoding the AbCas pi guide RNA in the CRISPR-Cas system described above; and
(iii) a donor polynucleotide sequence.
In some embodiments, the above (i)-(iii) may be in the same vector. In other embodiments, the above (i)-(iii) may be in different vectors.
In some embodiments, the vector is an expression vector, more particularly a recombinant expression vector. Suitable expression vectors include viral expression vectors (e.g., viral vectors based on vaccinia virus, poliovirus, adenovirus, adeno-associated virus, SV40, herpes simplex virus, or human immunodeficiency virus; and retroviral vectors, e.g., murine leukemia virus, spleen necrosis virus, and vectors derived from retroviruses such as Rous sarcoma virus, Harvey sarcoma virus, avian leukosis virus, lentivirus, human immunodeficiency virus, myeloproliferative sarcoma virus, and mammary tumor virus), and the like.
Any of a number of suitable transcriptional and translational control elements may be used in the expression vector, including constitutive and inducible promoters, transcriptional enhancer elements, transcriptional terminators, and the like, depending on the host/vector system used.
Methods of introducing nucleic acids into host cells are known in the art, and any convenient method may be used to introduce a nucleic acid (e.g., an expression construct) into a cell. Suitable methods include, for example, viral infection, transfection, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran-mediated transfection, liposome-mediated transfection, particle gun technology, direct microinjection, nanoparticle-mediated nucleic acid delivery, and the like.
< Cell >
The present invention provides a cell comprising one or more of the following:
(a) the CRISPR-Cas system described above;
(b) a polynucleotide as described above; and
(c) the vector described above.
The cell may be any of a variety of cells including, for example, in vitro cells, in vivo cells, ex vivo cells, primary cells, cancer cells, animal cells, plant cells, algal cells, fungal cells, and the like.
In some embodiments, the cell is a recipient of the CRISPR-Cas system provided herein and may also be referred to as a "host cell" or a "target cell". The host cell or target cell may be a recipient of the CRISPR-Cas system provided by the present invention, of an RNP complex provided by the invention, or of a single component of the CRISPR-Cas system provided by the present invention.
In some specific embodiments, non-limiting examples of cells include: prokaryotic cells, eukaryotic cells, bacterial cells, archaeal cells, cells of single-celled eukaryotic organisms, protozoan cells, plant-derived cells, algal cells, fungal cells, animal cells, invertebrate-derived cells, vertebrate-derived cells, mammalian-derived cells (e.g., cells of ungulates, rodents, non-human primates, humans, felines, dogs, etc.), and the like. In some cases, the cell is not derived from a natural organism (e.g., the cell may be a synthetic cell, also referred to as an artificial cell).
< Kit >
The present invention provides a kit comprising one or more of the following:
(A) the CRISPR-Cas system described above;
(B) a polynucleotide as described above;
(C) the vector described above; and
(D) the cell described above.
< Method of modifying target nucleic acid >
The present invention provides a method of modifying a target nucleic acid comprising the step of contacting the target nucleic acid with a CRISPR-Cas system provided herein. In some embodiments, the contacting results in modification of the target nucleic acid by the AbCas pi polypeptide.
In some embodiments, the CRISPR-Cas system comprises an AbCas pi polypeptide and an AbCas pi guide RNA, wherein the AbCas pi guide RNA comprises a guide sequence that hybridizes to a target sequence of the target nucleic acid.
In some specific embodiments, the modification is cleavage of the target nucleic acid. In some specific embodiments, the target nucleic acid is selected from the group consisting of: double-stranded DNA, single-stranded DNA, RNA, genomic DNA, and extrachromosomal DNA.
In some specific embodiments, the contacting occurs in vitro or in vivo. In some specific embodiments, the contacting occurs inside or outside the cell.
In some specific embodiments, the cell is a eukaryotic cell or a prokaryotic cell.
In some more specific embodiments, the cell is selected from the group consisting of: plant cells, fungal cells, mammalian cells, reptile cells, insect cells, avian cells, fish cells, parasite cells, arthropod cells, invertebrate cells, vertebrate cells, rodent cells, mouse cells, rat cells, primate cells, non-human primate cells, and human cells.
In some more specific embodiments, the contacting results in genome editing.
In some embodiments, the contacting comprises introducing the CRISPR-Cas system into a cell. In some embodiments, the contacting further comprises introducing a DNA donor template into the cell.
Examples
In the present invention, a feature model of known Cas12 protein sequences was constructed by bioinformatic means, and 3441 novel, previously unreported Cas12 family proteins were identified in the IMG/M public database. Cluster analysis of the phylogenetic tree identified a novel subclass of Cas12 family proteins that is distinct from the other known Cas12 family proteins and has a size distribution of 850-867 amino acids (aa). NCBI BLAST analysis of the protein sequences showed that the proteins of this subclass are all Cas proteins from the same non-pathogenic bacterium, Armatimonadetes bacterium; they are referred to herein as AbCas pi proteins or AbCas pi polypeptides. Two AbCas pi proteins with relatively large sequence differences, AbCas pi_1 and AbCas pi_2, were selected in the invention, along with AbCas pi_3 and AbCas pi_4. The gRNAs were obtained by bioinformatic prediction and in vitro transcription, and the two proteins were expressed and purified, so that Cas protein-gRNA complexes could be assembled. In vitro cleavage assays showed that the PAM activity windows of the AbCas pi_1 and AbCas pi_2 proteins favor C-rich DNA sequences, unlike previously reported Cas12 family proteins, whose PAMs tend to be T-rich. Finally, in vitro DNA cleavage experiments demonstrated that the AbCas pi family proteins have strong in vitro cleavage activity on targets with C-rich PAM sequences. The invention therefore identifies novel candidate Cas12 homologs that are active, small in size, derived from a non-pathogenic bacterium, and have a novel activity window, and that can be developed into a novel gene editing platform.
Example 1: bioinformatic prediction of CRISPR-Cas systems and tracrRNA
1. Bioinformatics prediction of CRISPR-Cas systems
First, CRISPR arrays were predicted. Because CRISPR arrays have distinctive features, they can serve as marker sequences for screening a database. In this example, CRISPR arrays in the datasets were predicted with the CRISPRCasFinder and CRISPRidentify software, and the predictions of the two programs were merged. Second, a Hidden Markov Model (HMM) was built from known Cas proteins and used to annotate the proteins encoded near the predicted CRISPR arrays; in this way, four CRISPR-AbCas pi systems were successfully predicted.
2. Bioinformatic prediction of tracrRNA
The tracrRNA is a major component of the gRNA in CRISPR systems, and its most prominent feature is an anti-repeat sequence that is complementary to, and pairs with, the repeat sequence. Therefore, to predict the tracrRNA, an anti-repeat sequence complementary to the repeat sequence was first identified by a BLAST search with the repeat sequence of the predicted CRISPR array (the array is transcribed into CRISPR RNA, or crRNA for short); second, a promoter sequence located 5' upstream of the anti-repeat sequence was predicted with promoter prediction software (the BDGP website); finally, the tracrRNA sequence was confirmed by metatranscriptome analysis, as shown in FIG. 4.
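The anti-repeat search can be illustrated on the sequences given earlier. The sketch below is not the BLAST-based procedure of this example: as a simplifying assumption it looks only for the longest stretch of a tracrRNA that is perfectly complementary to the crRNA repeat, by finding the longest common substring between the tracrRNA and the reverse complement of the repeat. Function names are hypothetical, and mismatches or G·U pairs that a real search would tolerate are ignored.

```python
# Illustrative sketch: locate a candidate anti-repeat by exact complementarity.
# Assumption: perfect Watson-Crick pairing only; BLAST is not used here.

RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(RNA_PAIRS[base] for base in reversed(seq.upper()))


def longest_common_substring(a: str, b: str) -> str:
    """Longest contiguous string shared by a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                current[j] = previous[j - 1] + 1
                if current[j] > best_len:
                    best_len, best_end = current[j], i
        previous = current
    return a[best_end - best_len:best_end]


def candidate_anti_repeat(tracr_rna: str, crrna_repeat: str) -> str:
    """Longest tracrRNA stretch perfectly complementary to the crRNA repeat."""
    return longest_common_substring(tracr_rna.upper(), reverse_complement_rna(crrna_repeat))


if __name__ == "__main__":
    tracr = (  # SEQ ID NO: 5
        "UCUGCCGAAGACGCCGCACGGAGCCUGGGCCGGAAUCGUAGAUCGAACGCGGCAUCGAAGCCCUGCAGCCC"
        "UUCGGGGCCAAGGCGGCGCAGCAAGCCUCUUUCAGGCGGCAGAGUCCUUUAGAGUGU"
    )
    repeat = "GGCGCAGGACAAAAUGCACACUCUAAAGGAAUGAAAG"  # SEQ ID NO: 6
    print(candidate_anti_repeat(tracr, repeat))
```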
Example 2: results of in vitro purification of AbCas pi_1 and AbCas pi_2 proteins and in vitro transcription of gRNA
1. Construction of recombinant plasmids pET28B_6His_sumo_AbCaspi_1 and pET28B_6His_sumo_AbCaspi_2
(1) The pET28B_6His_sumo vector (stored in this laboratory) was linearized with the BamHI and XhoI endonucleases and recovered with a small-scale agarose gel DNA recovery kit (Magen);
(2) The AbCas pi_1 and AbCas pi_2 coding sequences, codon-optimized for both human cells and Escherichia coli, were ordered from a gene synthesis company in pUC57 (pUC57_AbCas pi_1 and pUC57_AbCas pi_2) and amplified by PCR with primers that add vector homology arms, to facilitate subsequent construction of the recombinant plasmids by homologous recombination; the products were then recovered with a small-scale agarose gel DNA recovery kit (Magen);
Primer sequences used for PCR amplification of protein gene sequences:
AbCasπ_1:
Forward primer (SEQ ID NO: 19):
5’-CAGAGAACAGATTGGTGGATCCATGGCCAAGGCCACTAAAGAG-3’
Reverse primer (SEQ ID NO: 20):
5’-TGGTGGTGGTGGTGCTCGAGTCAAATCACGGCCTCGGCA-3’
AbCasπ_2:
Forward primer (SEQ ID NO: 21):
5’-CAGAGAACAGATTGGTGGATCCATGGGCAAGAACCGCAGC-3’
Reverse primer (SEQ ID NO: 22):
5’-TGGTGGTGGTGGTGCTCGAGTCACAGCATGATGCGGCGC-3’
(3) The linearized vector and the protein gene sequences carrying the homology arms were recombined using a seamless cloning kit (Kangji) and transformed into E. coli DH5α (Tiangen Biotechnology Co., Ltd.); the target plasmids were then obtained by sequencing single colonies.
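The layout of the four cloning primers above can be inspected programmatically. The sketch below assumes, as an interpretation rather than a statement of the disclosure, that each primer consists of a vector homology arm ending in the restriction site used to linearize the vector (BamHI, GGATCC, for forward primers; XhoI, CTCGAG, for reverse primers), followed by the gene-specific portion. The helper name is hypothetical.

```python
# Illustrative sketch: split each cloning primer into a vector-side arm
# (up to and including the restriction site) and a gene-specific part.

SITES = {"BamHI": "GGATCC", "XhoI": "CTCGAG"}

PRIMERS = {
    "SEQ ID NO: 19": ("BamHI", "CAGAGAACAGATTGGTGGATCCATGGCCAAGGCCACTAAAGAG"),
    "SEQ ID NO: 20": ("XhoI", "TGGTGGTGGTGGTGCTCGAGTCAAATCACGGCCTCGGCA"),
    "SEQ ID NO: 21": ("BamHI", "CAGAGAACAGATTGGTGGATCCATGGGCAAGAACCGCAGC"),
    "SEQ ID NO: 22": ("XhoI", "TGGTGGTGGTGGTGCTCGAGTCACAGCATGATGCGGCGC"),
}


def split_primer(primer: str, site: str) -> tuple[str, str]:
    """Return (arm including the site, gene-specific remainder)."""
    index = primer.find(site)
    if index == -1:
        raise ValueError(f"site {site} not found in primer")
    end = index + len(site)
    return primer[:end], primer[end:]


if __name__ == "__main__":
    for name, (enzyme, sequence) in PRIMERS.items():
        arm, gene_part = split_primer(sequence, SITES[enzyme])
        print(f"{name}: arm={arm} gene-specific={gene_part}")
```

Under this reading, the gene-specific part of each forward primer begins with the ATG start codon, and that of each reverse primer begins with TCA, the reverse complement of a TGA stop codon.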
2. Protein purification method for AbCas pi_1 and AbCas pi_2
(1) The pET28B_6His_sumo_AbCaspi_1 and pET28B_6His_sumo_AbCaspi_2 plasmids obtained in step 1 were transformed into E. coli Rosetta (DE3) (Tiangen Biotechnology Co., Ltd.). After recovery, the bacterial suspension was inoculated directly into 100 mL of LB medium and cultured overnight at 37 °C and 220 rpm to obtain a seed culture;
(2) The next day, 20 mL of the seed culture was inoculated into 1 L of liquid LB medium and cultured at 37 °C and 220 rpm until the OD600 was between 1.0 and 1.2; the temperature was then lowered to 16 °C and the culture was continued overnight for 18 hours;
(3) The culture from step (2) was centrifuged at 3500 rpm for 15 minutes and the pellet was collected;
(4) The pellet collected in step (3) was resuspended in 100 mL of lysis buffer (20 mM HEPES pH 7.5, 800 mM NaCl, 40 mM imidazole, 10% glycerol, and 1 mM TCEP) and lysed by sonication for 45 min (220 W; 2 s on, 4 s off) to obtain a cell lysate. The lysate was centrifuged at 15000 rpm for 80 minutes and the supernatant was collected;
(5) The collected supernatant was loaded onto a gravity column (GE Healthcare), the column was washed with 5-10 mL of lysis buffer, and finally 10 column volumes of elution buffer containing 400 mM imidazole were added to obtain the SUMO-tagged protein eluate. The elution buffer was 20 mM HEPES pH 7.5, 400 mM NaCl, 400 mM imidazole, 1 mM TCEP, and 10% glycerol;
(6) The SUMO-tagged eluate obtained in step (5) was mixed with 100 µL of Ulp1 protease (stored in this laboratory) and reacted on ice for 30 min;
(7) After step (6), the sample was checked by SDS-PAGE, mixed with an equal volume of dilution buffer (20 mM HEPES pH 7.5, 200 mM NaCl, 10% glycerol), and loaded with a peristaltic pump onto a 5 mL prepacked heparin column (GE Healthcare). The target protein was then eluted on an AKTA instrument (GE Healthcare) with a gradually increasing sodium chloride gradient from 0 M to 2 M. The eluted fractions were checked by SDS-PAGE, and fractions showing a protein band of the correct size were transferred to a 30 kD concentrator tube (Merck) and concentrated to 0.5 mL by repeated short centrifugations at 3800 rpm;
(8) The concentrated sample from step (7) was loaded through a 0.5 mL sample loop onto a Superdex 200 10/300 column (GE Healthcare) run in SEC buffer (10 mM HEPES pH 7.5, 400 mM NaCl, 10% glycerol, 1 mM TCEP). This column separates and purifies proteins according to particle size. The target protein began to elute at an elution volume of about 11.8 mL, and the ratio of A280 to A260 was close to 2. After the eluted sample was confirmed by SDS-PAGE, it was concentrated in a 30 kD concentrator tube, and finally 10 µL aliquots were frozen in liquid nitrogen.
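The effect of the equal-volume dilution in step (7) on the salt and imidazole concentrations can be estimated with a volume-weighted average. This is a back-of-the-envelope sketch: it assumes ideal mixing, ignores the small volume of protease added in step (6), and uses a hypothetical helper name.

```python
# Illustrative sketch: concentration after mixing solutions of known
# volume and concentration (volume-weighted average, ideal mixing assumed).

def mix_concentration(parts: list[tuple[float, float]]) -> float:
    """parts holds (volume, concentration) pairs; returns the final concentration."""
    total_volume = sum(volume for volume, _ in parts)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return sum(volume * conc for volume, conc in parts) / total_volume


if __name__ == "__main__":
    # One volume of eluate (400 mM NaCl, 400 mM imidazole) plus one volume of
    # dilution buffer (200 mM NaCl, no imidazole), as described in step (7).
    nacl_mm = mix_concentration([(1.0, 400.0), (1.0, 200.0)])
    imidazole_mm = mix_concentration([(1.0, 400.0), (1.0, 0.0)])
    print(f"NaCl: {nacl_mm:.0f} mM, imidazole: {imidazole_mm:.0f} mM")
```

Under these assumptions the sample is loaded onto the heparin column at roughly 300 mM NaCl and 200 mM imidazole.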
The experimental results are shown in FIGS. 2 and 3. Panel A of FIG. 2 and panel A of FIG. 3 show SDS-PAGE analysis of the purified proteins, indicating that both proteins were of high purity, which facilitates the subsequent experiments. Panel B of FIG. 2 and panel B of FIG. 3 show the AKTA chromatograms recorded during the molecular sieve (size-exclusion) step of purification, indicating that the proteins were expressed at high levels and were readily purified.
3. Synthesis and in vitro transcription of the DNA template of sgRNA
(1) The in vitro transcription DNA template for the sgRNA of the AbCas pi_1 protein was assembled by overlap PCR from five primers, and that for the sgRNA of the AbCas pi_2 protein was assembled by overlap PCR from four primers. After gel excision and recovery, each product was further amplified with a pair of shorter upstream and downstream primers, and the amplified product was used directly as the template for downstream in vitro transcription;
The primer sequences used in overlap PCR to obtain the in vitro transcription DNA templates for the sgRNAs are as follows:
For the sgRNA of the AbCas pi_1 protein:
First, amplification was performed with a primer set consisting of Casπ_1_sgF1 and Casπ_1_sgR1 to Casπ_1_sgR4, wherein:
Casπ_1_sgF1(SEQ ID NO:23):
5’-GACTTTAATACGACTCACTATAGTCTGCCGAAGACGCCGCACGGAGCCTGGGCC-3’
Casπ_1_sgR1(SEQ ID NO:24):
5’-TTCGATGCCGCGTTCGATCTACGATTCCGGCCCAGGCTCC-3’
Casπ_1_sgR2(SEQ ID NO:25):
5’-CTGAAAGAGGCTTGCTGCGCCGCCTTGGCCCCGAAGGGCTGCAGGCTTCGATGCCGCG-3’
Casπ_1_sgR3(SEQ ID NO:26):
5’-TCATTCCTTTAGAGTGTCTCTCACACTCTAAAGGACTCTGCCGCCTGAAAGAGGCTTGC-3’
Casπ_1_sgR4(SEQ ID NO:27):
5’-GTTCACCAGGGTGTCGCCCTCTTTCATTCCTTTAGAGTGTCTCT-3’
Further amplification was then performed using shorter upstream and downstream primer sets:
Casπ_1_sgF1-1(SEQ ID NO:28):
5’-GACTTTAATACGACTCACTATAGTCTGCCGAAGAC-3’
Casπ_1_sgR1-1(SEQ ID NO:29):
5’-GTTCACCAGGGTGTCGCCCTCC-3’
Overlap PCR yields the in vitro transcription DNA template for the sgRNA; the corresponding guide sequence-containing sgRNA for the AbCas pi_1 protein is (SEQ ID NO: 30):
GUCUGCCGAAGACGCCGCACGGAGCCUGGGCCGGAAUCGUAGAUCGAACGCGGCAUCGAAGCCCUGCAGCCCUUCGGGGCCAAGGCGGCGCAGCAAGCCUCUUUCAGGCGGCAGAGUCCUUUAGAGUGUGAGAGACACUCUAAAGGAAUGAAAGAGGGCGACACCCUGGUGAAC
The underlined part is the sequence complementary to the DNA target sequence, i.e., the guide sequence.
For the sgRNA of the AbCas pi_2 protein, amplification was first performed with a primer set consisting of Casπ_2_sgF1 and Casπ_2_sgR1 to Casπ_2_sgR3, wherein:
Casπ_2_sgF1(SEQ ID NO:31):
5’-GACTTTAATACGACTCACTATAGTCTCGACTATGCCGTACCACTAGACCGAGCCT-3’
Casπ_2_sgR1(SEQ ID NO:32):
5’-TGTCACCACGCCTTGGTTAACGCTATGACCGCGTGCCGTGTAGGCTCGGTCTAGT-3’
Casπ_2_sgR2(SEQ ID NO:33):
5’-TTCTCTCAACGCTCTTAAGTGTCCGACGCCTGAAAGAGGCTTGTCACCACGCCT-3’
Casπ_2_sgR3(SEQ ID NO:34):
5’-GTTCACCAGGGTGTCGCCCTCTTTCATTCCCTAAGAGCGTTCTCTCAACGCTCTTAAGT-3’
Further amplification was then performed using shorter upstream and downstream primer sets:
Casπ_2_sgF1-1(SEQ ID NO:35):
5’-GACTTTAATACGACTCACTATAGTCTCGACTATGC-3’
Casπ_2_sgR1-1(SEQ ID NO:36):
5’-GTTCACCAGGGTGTCGCCCTC-3’
Overlap PCR yields the in vitro transcription DNA template for the sgRNA; the corresponding guide sequence-containing sgRNA for the AbCas pi_2 protein is (SEQ ID NO: 37):
GUCUCGACUAUGCCGUACCACUAGACCGAGCCUACACGGCACGCGGUCAUAGCGUUAACCAAGGCGUGGUGACAAGCCUCUUUCAGGCGUCGGACACUUAAGAGCGUUGAGAGAACGCUCUUAGGGAAUGAAAGAGGGCGACACCCUGGUGAAC
The underlined part is the sequence complementary to the DNA target sequence (target sequence), i.e., the guide sequence.
(2) In vitro transcription system
An in vitro transcription system was formulated on ice as follows:
(3) The 10× IVT buffer in the reaction system consisted of 300 mM Tris (pH 8.1), 250 mM MgCl2, 0.1% Triton and 20 mM spermidine. The 5× NTP component was ATP, GTP, UTP mM each and CTP 5 mM. The reaction was placed in a metal bath at a constant 37 °C for 8 hours;
(4) Gel purification of gRNA products
The in vitro transcription product was mixed with an equal volume of 2× formamide loading buffer (95% formamide, 0.02% SDS, 0.02% BPB, 0.01% xylene cyanol FF, 1 mM EDTA). Samples were loaded onto a 12.5% urea-PAGE gel for electrophoresis. The gel was placed on a fluorescent-coated TLC plate, the dark band produced by nucleic acid absorption under UV254 light was marked, and the gel block was excised with a disposable medical blade, placed into a 50 mL centrifuge tube and cut into pieces;
(5) RNA extraction buffer (0.38 M NaAc (pH 5.2), 0.8 mM EDTA, 0.8% SDS) was added at three times the gel volume, and the tube was rotated at 4 °C overnight;
(6) The sample was centrifuged at 2500 rpm for 10 minutes, and the supernatant was filtered through a 0.22 μm filter into a 3 kDa concentrator tube;
(7) The sample was centrifuged at 4000 rpm for 1 hour until the volume reached about 500 μL, diluted to 15 mL with DEPC-treated water, and centrifuged again at 4000 rpm to concentrate it to about 500 μL;
(8) The above operation was repeated 2 times. After the last concentration, the sample concentration was measured with a Nanodrop, and the sample was aliquoted into small tubes and stored frozen at -80 °C.
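The effect of the repeated concentrate-and-dilute cycles in steps (7) and (8) on residual extraction buffer can be estimated with a short calculation. The sketch below is illustrative only: it assumes that each cycle reduces the sample to 500 μL and refills it to 15 mL, and that "repeated 2 times" means three cycles in total. The function name is hypothetical.

```python
def dilution_factor(retained_ul: float, diluted_ul: float, cycles: int) -> float:
    """Fold-dilution of the original buffer after repeated concentrate/dilute cycles."""
    per_cycle = diluted_ul / retained_ul  # e.g. 15 mL / 500 uL = 30-fold per cycle
    return per_cycle ** cycles


# Assumption: one cycle in step (7) plus two repeats in step (8) = 3 cycles.
fold = dilution_factor(retained_ul=500, diluted_ul=15_000, cycles=3)
print(f"Extraction buffer diluted about {fold:,.0f}-fold")
```

Under these assumptions, small molecules such as SDS and EDTA carried over from the extraction buffer are diluted by roughly four orders of magnitude before the RNA is aliquoted.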
The experimental results are shown in Figs. 5 and 6, which respectively show the purification results after in vitro transcription of the DNA templates and separation by gel electrophoresis. The RNA band positions were visualized under UV light, and the target RNA fragments were finally obtained by gel excision and recovery.
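The short forward primers listed above (SEQ ID NO: 28 and SEQ ID NO: 35) contain the consensus T7 promoter, and T7 RNA polymerase starts transcription at the final G of that promoter. The following sketch checks that the primer sequence downstream of that position matches the 5' end of the corresponding sgRNA. The function name and the dictionary layout are illustrative assumptions; the sequences are copied from the text.

```python
T7_PROMOTER = "TAATACGACTCACTATAG"  # consensus T7 promoter; transcription starts at the last G


def transcript_start(forward_primer: str) -> str:
    """Return the RNA encoded by the primer from the T7 +1 position onward."""
    idx = forward_primer.find(T7_PROMOTER)
    if idx < 0:
        raise ValueError("no T7 promoter found in primer")
    plus_one = idx + len(T7_PROMOTER) - 1  # index of the initiating G
    return forward_primer[plus_one:].replace("T", "U")


FORWARD_PRIMERS = {
    "Casπ_1_sgF1-1": "GACTTTAATACGACTCACTATAGTCTGCCGAAGAC",  # SEQ ID NO: 28
    "Casπ_2_sgF1-1": "GACTTTAATACGACTCACTATAGTCTCGACTATGC",  # SEQ ID NO: 35
}
# First 16 nt of SEQ ID NO: 30 and SEQ ID NO: 37, respectively.
SGRNA_5PRIME = {
    "Casπ_1_sgF1-1": "GUCUGCCGAAGACGCC",
    "Casπ_2_sgF1-1": "GUCUCGACUAUGCCGU",
}

for name, primer in FORWARD_PRIMERS.items():
    rna = transcript_start(primer)
    print(name, rna, SGRNA_5PRIME[name].startswith(rna))
```

A check of this kind is a quick way to confirm that a template primer and the intended transcript agree before ordering oligonucleotides.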
Example 3: PAM sequences recognized by AbCas pi_1 and AbCas pi_2 proteins
Establishment of random PAM library N5_PAM_plasma_pool and detection of recognized PAM sequence
(1) Two single-stranded DNAs with complementary sequences were synthesized by Sangon Biotech (Shanghai) Co., Ltd.:
5’-GCCTGCAGGTCGACTCTAGAGGATCNNNNNAGGGCGACACCCTGGTGAACG-3’(SEQ ID NO:38),
5’-GGCCAGTGAATTCGAGCTCGGTACGTTCACCAGGGTGTCGCC-3’(SEQ ID NO:39)
where N represents a random deoxynucleotide (A, T, G or C). The two primers were annealed in 10× annealing buffer (250 mM KCl, 100 mM Tris-HCl pH 8.0). The program was 95 °C for 5 min, followed by slow cooling to room temperature.
(2) The pUC19 vector (stored in this laboratory) was linearized with BamHI and KpnI endonucleases and recovered. The product of step (1) and the linearized vector were joined by homologous recombination (CWBIO, CW3034S) and transformed into DH5α competent cells, which were cultured in 100 mL of liquid LB medium. Plasmids were extracted using an endotoxin-free plasmid maxiprep kit (Tiangen, DP117), yielding a plasmid pool with random PAMs.
(3) Plasmid cleavage experiments
The Cas protein obtained in Example 2, the sgRNA (with guide sequence; complete sequence as shown in SEQ ID NO: 30 or SEQ ID NO: 37) and the plasmid library obtained in step (2) were mixed at a molar ratio of 10:15:1 and reacted at 37 °C for 60 min in cleavage buffers of three different salt concentrations (20 mM Tris-HCl pH 7.5, 50/150/300 mM NaCl, 10 mM MgCl2, 1 mM DTT). EDTA was added to terminate the reaction, and the DNA was recovered directly using DNA magnetic beads (Vazyme).
The end-repair reaction was assembled according to the following system and incubated at 11 °C for 20 min and then at 75 °C for 10 min.
(4) dA addition
To the reaction in step (3), 1 μL of dATP and 1 μL of DreamTaq polymerase (Thermo, EP0702) were added, and the reaction was carried out at 72 °C for 30 min.
(5) Adapter ligation
The product recovered in step (4) was ligated to the 32 bp adapter sequence with T4 DNA ligase (Beyotime, D7003) at room temperature for 30 min. The product was recovered using beads (Vazyme, N411-03).
(6) PCR enrichment
The desired fragment was amplified with a pair of primers, one located on the plasmid and one on the adapter, and recovered using beads.
(7) Next-generation sequencing library construction
Libraries were constructed from the fragments recovered in step (6) using the TIANSeq Rapid DNA library construction kit (Tiangen Biotech Co., Ltd., catalog number NG102). Samples were sequenced by Novogene. The analysis results after sequencing are shown in Figs. 7 and 8: panels A, B and C of Fig. 7 show WebLogo plots of the PAM preference of the AbCas pi_1 protein at the three salt concentrations, and panels A, B and C of Fig. 8 show the corresponding plots for the AbCas pi_2 protein.
Therefore, the AbCas pi_1 and AbCas pi_2 proteins provided by the invention prefer C-rich PAM sequences, which distinguishes them from all known Cas12 family subtypes. They belong to a novel Cas12 family subtype and provide a good basis for subsequent engineering of the protein into a cell editing tool. Specifically, the PAM sequence recognized by the AbCas pi protein is selected from: TGCCC, CACCT, TTCCC, TCCCC, CTCCC, CTCCT.
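The size of the random N5 library and the C-richness of the listed PAMs can be checked with a few lines of code. The sketch below enumerates all five-nucleotide PAMs encoded by the NNNNN stretch and tabulates per-position base counts for the six recognized PAMs as a plain-text stand-in for a WebLogo. It uses only the six PAM sequences listed above, not the sequencing data, and the variable and function names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

# Every 5-nt PAM that the NNNNN stretch of the library oligo can encode.
library = ["".join(p) for p in product(BASES, repeat=5)]
print(len(library))  # 4**5 = 1024

RECOGNIZED = ["TGCCC", "CACCT", "TTCCC", "TCCCC", "CTCCC", "CTCCT"]


def position_counts(pams):
    """Count the bases observed at each PAM position."""
    return [Counter(column) for column in zip(*pams)]


for pos, counts in enumerate(position_counts(RECOGNIZED), start=1):
    print(pos, dict(counts))

# Fraction of all PAM positions occupied by C.
c_fraction = sum(p.count("C") for p in RECOGNIZED) / (5 * len(RECOGNIZED))
print(round(c_fraction, 2))
```

In this small list, positions 3 and 4 of every PAM are C, which is consistent with the 5'-CCN-3' motif recited in the claims.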
Example 4: In vitro cleavage of Cy5 fluorescently labeled dsDNA by the AbCas pi_1 and AbCas pi_2 proteins, and cleavage efficiency
In this example, the two strands of a target nucleic acid were designed and synthesized with TGCCC as the PAM sequence and the following target sequence targeted by the gRNA: AGGGCGACACCCTGGTGAAC (SEQ ID NO: 40; also referred to as the spacer sequence). The target nucleic acid is 55 nt long and carries a 5'-Cy5 label on the non-target strand. The two strands were annealed to form the double-stranded substrate DNA, i.e., the target nucleic acid.
The sequences of the cleavage substrate DNA (target nucleic acid) are as follows. The strand containing the non-target strand is (SEQ ID NO: 41): 5'-Cy5-GCCCGCGGGATGCCCAGGGCGACACCCTGGTGAACGACAATGAATATTTCGGCGC-3'
The strand containing the target strand is (SEQ ID NO: 42): 5'-GCGCCGAAATATTCATTGTCGTTCACCAGGGTGTCGCCCTGGGCATCCCGCGGGC-3'
In this example, the guide sequence was incorporated in single guide RNA format, and an sgRNA containing the guide sequence was constructed.
In this example, the sgRNA sequences used, which contain the guide sequences, are as follows:
For AbCas pi_1 protein, its sgRNA containing the guide sequence is as follows (SEQ ID NO:30):GUCUGCCGAAGACGCCGCACGGAGCCUGGGCCGGAAUCGUAGAUCGAACGCGGCAUCGAAGCCCUGCAGCCCUUCGGGGCCAAGGCGGCGCAGCAAGCCUCUUUCAGGCGGCAGAGUCCUUUAGAGUGUGAGAGACACUCUAAAGGAAUGAAAGAGGGCGACACCCUGGUGAAC
The underlined part is the sequence complementary to the DNA target sequence (target sequence), i.e., the guide sequence.
For AbCas pi_2 protein, its sgRNA containing the guide sequence is as follows (SEQ ID NO:37):GUCUCGACUAUGCCGUACCACUAGACCGAGCCUACACGGCACGCGGUCAUAGCGUUAACCAAGGCGUGGUGACAAGCCUCUUUCAGGCGUCGGACACUUAAGAGCGUUGAGAGAACGCUCUUAGGGAAUGAAAGAGGGCGACACCCUGGUGAAC
The underlined part is the sequence complementary to the DNA target sequence (target sequence), i.e., the guide sequence.
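The relationship between the sgRNAs, the spacer and the labeled substrate can be verified computationally. The sketch below checks three things using the sequences given in this example: that the 3'-terminal 20 nt of each sgRNA correspond to the spacer (SEQ ID NO: 40), that the PAM TGCCC lies immediately 5' of the spacer on the non-target strand, and that the two substrate strands are reverse complements. Treating the last 20 nt as the guide is an inference from the sequences, since the underlining is not preserved here; the helper names are illustrative.

```python
def revcomp(dna: str) -> str:
    """Reverse complement of a DNA sequence."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


SPACER = "AGGGCGACACCCTGGTGAAC"  # SEQ ID NO: 40
NON_TARGET = "GCCCGCGGGATGCCCAGGGCGACACCCTGGTGAACGACAATGAATATTTCGGCGC"  # SEQ ID NO: 41
TARGET = "GCGCCGAAATATTCATTGTCGTTCACCAGGGTGTCGCCCTGGGCATCCCGCGGGC"  # SEQ ID NO: 42

SGRNAS = {
    "AbCas pi_1 (SEQ ID NO: 30)": (
        "GUCUGCCGAAGACGCCGCACGGAGCCUGGGCCGGAAUCGUAGAUCGAACGCGGCAUCGAAGCCCUGCAGCCC"
        "UUCGGGGCCAAGGCGGCGCAGCAAGCCUCUUUCAGGCGGCAGAGUCCUUUAGAGUGUGAGAGACACUCUAAA"
        "GGAAUGAAAGAGGGCGACACCCUGGUGAAC"
    ),
    "AbCas pi_2 (SEQ ID NO: 37)": (
        "GUCUCGACUAUGCCGUACCACUAGACCGAGCCUACACGGCACGCGGUCAUAGCGUUAACCAAGGCGUGGUGA"
        "CAAGCCUCUUUCAGGCGUCGGACACUUAAGAGCGUUGAGAGAACGCUCUUAGGGAAUGAAAGAGGGCGACAC"
        "CCUGGUGAAC"
    ),
}


def guide_as_dna(sgrna: str, length: int = 20) -> str:
    """3'-terminal guide of the sgRNA, written as DNA (assumed guide length: 20 nt)."""
    return sgrna[-length:].replace("U", "T")


start = NON_TARGET.find(SPACER)
pam = NON_TARGET[start - 5:start]  # 5 nt immediately 5' of the protospacer

for name, sgrna in SGRNAS.items():
    guide = guide_as_dna(sgrna)
    print(name, guide == SPACER, revcomp(guide) in TARGET)
print("PAM:", pam, "| strands complementary:", revcomp(NON_TARGET) == TARGET)
```

Because the guide has the same sequence as the spacer on the non-target strand, it base-pairs with the target strand, as expected for a Cas12-type system with a 5' PAM.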
Subsequently, the sgRNAs obtained above were mixed with the AbCas pi proteins from Example 2 and incubated at room temperature to reconstitute the RNP complexes. The reaction was started by mixing the AbCas pi protein-sgRNA complex with the Cy5 fluorescently labeled substrate (target nucleic acid), and samples were taken at different time points. After the reaction was stopped, the samples were resolved on a urea-PAGE gel and the corresponding signal was scanned with a Typhoon FLA9500 fluorescence imager.
The cleavage experiment was performed as follows:
(1) The two synthesized DNA strands were annealed in a PCR instrument (95 °C for 5 min, then slow cooling to room temperature) to obtain the cleavage substrate, i.e., the target nucleic acid;
(2) The Cas protein and gRNA were incubated at a molar ratio of 1:1.2 in 20 mM Tris-HCl pH 7.5, 150 mM NaCl, 10 mM MgCl2 and 1 mM DTT to obtain the Cas protein RNP complex;
(3) The Cas protein RNP complex was incubated with the dsDNA at a molar ratio of RNP:dsDNA = 50:1 for the cleavage experiment, and samples were taken at 0 min, 2 min, 5 min, 15 min, 30 min, 60 min, 120 min and 180 min;
(4) 20 μL was taken for each sample, and 20 μL of loading buffer containing 10 mM EDTA and 8 M urea was added to stop the reaction;
(5) μL of each sample was loaded onto the prepared urea-PAGE gel, and after electrophoresis at 250 V for 45 min the corresponding signal was scanned using a Typhoon FLA9500 fluorescence imager (GE Healthcare).
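The molar ratios in steps (2) and (3) translate into component concentrations as sketched below. This is an illustrative calculation only: the 10 nM substrate concentration is a hypothetical value not given in the text, the RNP concentration is taken to equal the protein concentration (the gRNA being in 1.2-fold excess), and the function name is an assumption.

```python
def cleavage_mix(dsdna_nM: float, rnp_to_dna: float = 50.0, grna_to_protein: float = 1.2) -> dict:
    """Concentrations implied by RNP:dsDNA = 50:1 and protein:gRNA = 1:1.2."""
    protein_nM = dsdna_nM * rnp_to_dna      # RNP assumed limited by the protein
    grna_nM = protein_nM * grna_to_protein  # gRNA in 1.2-fold molar excess
    return {"dsDNA_nM": dsdna_nM, "protein_nM": protein_nM, "gRNA_nM": grna_nM}


# Hypothetical example: 10 nM Cy5-labeled substrate.
mix = cleavage_mix(dsdna_nM=10.0)
for component, conc in mix.items():
    print(f"{component}: {conc:.0f}")
```

With the RNP in 50-fold excess over substrate, the time course reports single-turnover cleavage rather than multiple-turnover kinetics.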
The results are shown in Figs. 9 and 10. On the gel, the uncleaved double-stranded substrate migrates above the shorter cleaved strand, and the signal of the latter grows stronger over time, indicating that more substrate is cleaved. The CRISPR-Cas system of this example was also tested at three different salt concentrations; the AbCas pi proteins showed high DNA cleavage activity under both low-salt (50 mM and 150 mM NaCl) and high-salt (300 mM NaCl) conditions, indicating high in vitro biochemical activity.
Example 5: AbCas pi proteins for gene editing in mammalian cells
In this example, mammalian cells were edited using the AbCas pi protein-based system. Briefly, plasmids carrying the AbCas pi protein and gRNA expression system were constructed and transfected with Lipo3000 reagent into a HEK293T-EGFP frameshift mutant (out-of-frame) cell line, with plasmids carrying only the AbCas pi protein gene as the negative control. On day 4 post-transfection, cellular EGFP expression was observed under a fluorescence microscope (five candidate editing sites were selected initially). In addition, the genome of the edited cells was extracted, the DNA sequence of the editing site was obtained by PCR and re-annealed, and the re-annealed product was cut with T7E1 nuclease, so that the intracellular editing efficiency of AbCas pi could be calculated quantitatively.
The principle is as follows. Before editing, a stretch of (3n+2) bp lies between the translation initiation site and the EGFP CDS, so EGFP cannot be expressed by normal translation. If the AbCas pi protein and a gRNA targeting the sequence upstream of EGFP can edit the genome, base insertions or deletions introduced into the upstream sequence by NHEJ (non-homologous end joining) give EGFP a certain probability of returning to the normal reading frame and being expressed. The genome of these cells is then extracted, and the sequence of the editing site is obtained by PCR and re-annealed, giving DNA duplexes that carry mismatches at the edited site as well as duplexes without mismatches. T7E1 nuclease recognizes the mismatched sites and cuts the double-stranded DNA there. Thus, by cleaving the annealed DNA fragments with T7E1 and separating substrates and products on an agarose gel, the T7E1 cleavage efficiency can be calculated, which directly reflects the editing efficiency and activity of the AbCas pi system in cells.
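The reading-frame logic of the reporter can be made concrete with a small calculation. The sketch below assumes a (3n+2) bp stretch upstream of the EGFP CDS, with n = 4 chosen arbitrarily for illustration, and asks which net indel sizes bring the total back to a multiple of three. It ignores the possibility that an indel introduces a stop codon or disrupts the start codon; the function name is illustrative.

```python
def restores_frame(offset_bp: int, indel_bp: int) -> bool:
    """True if a net indel (positive = insertion, negative = deletion) restores the frame."""
    return (offset_bp + indel_bp) % 3 == 0


offset = 3 * 4 + 2  # a (3n+2) bp stretch; n = 4 is a hypothetical choice

restoring = [d for d in range(-6, 7) if restores_frame(offset, d)]
print(restoring)
```

Only indels whose net length is congruent to 1 modulo 3 (for example a 1 bp insertion or a 2 bp deletion) switch EGFP on, so the fluorescent fraction underestimates the total editing rate. This is one reason the T7E1 assay is used for quantification.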
The present embodiment is specifically described below:
1. Construction of intracellular editing plasmids
(1) The AbCas pi gene was inserted downstream of the CMV promoter of the pBLO62.5 vector (Addgene plasmid #123124) by homologous recombination, and the sgRNA sequence was inserted downstream of the U6 promoter by homologous recombination (at this stage the guide sequence position is occupied by two segments carrying SapI sites, so that different guide sequences can be swapped in by restriction-ligation). The construct was transformed into E. coli DH5α competent cells, which were spread on an antibiotic selection plate; picked colonies were verified by sequencing and then preserved.
(2) Two single-stranded DNAs with complementary sequences were synthesized by Sangon Biotech (Shanghai) Co., Ltd.; the sequences were:
5’-AAGNNNNNNNNNNNNNNNNNNNN-3’,
5’-AAASSSSSSSSSSSSSSSSSSSS-3’
where AAG and AAA represent the sequences complementary to the cohesive ends left after cleavage of the plasmid, and N and S represent deoxynucleotide sequences identical and complementary, respectively, to the selected editing region. The two primers were annealed and terminally phosphorylated in 10× T4 ligase buffer (10 mM ATP) with T4 PNK. The program was 30 °C for 30 min and 95 °C for 5 min, followed by slow cooling to room temperature.
(3) Seamless restriction-ligation was performed by Golden Gate assembly: the product of step (2) was ligated into the cell expression vector carrying SapI sites using SapI endonuclease and T4 ligase, and the reaction was transformed into E. coli DH5α competent cells. After spreading on an antibiotic selection plate and verifying picked colonies by sequencing, the bacteria were cultured with shaking in 10 mL of liquid LB medium. Plasmids were extracted using an endotoxin-free plasmid extraction kit (Tiangen Biotech Co., Ltd., DP118), yielding plasmids carrying the corresponding guide sequences.
(4) The selected spacer (target) sequences and PAMs are as follows:
No.   Spacer sequence (target sequence)            PAM
1     CTGCTCAGGTGGAGATGAAC (SEQ ID NO:43)          CACCT
2     GCTTCTTGTTCATCTCCACC (SEQ ID NO:44)          TTCCC
3     GAGCTCAGCCACACTGTCCG (SEQ ID NO:45)          TCCCC
4     CGAGCTCAGCCACACTGTCC (SEQ ID NO:46)          CTCCC
5     TCTCCAGCTTCTGCTTCACC (SEQ ID NO:47)          CTCCT
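The oligonucleotide pairs of step (2) can be derived from the spacers listed above. The sketch below builds a top oligo as AAG plus the spacer and a bottom oligo as AAA plus the reverse complement of the spacer. Reading the "complementary" S sequence as the reverse complement is an assumption, made because both oligos are written 5' to 3' and must anneal; the function names are illustrative.

```python
def revcomp(dna: str) -> str:
    """Reverse complement of a DNA sequence."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def guide_oligos(spacer: str, top_overhang: str = "AAG", bottom_overhang: str = "AAA"):
    """Return (top, bottom) oligos, both written 5'->3', for annealing and SapI cloning."""
    return top_overhang + spacer, bottom_overhang + revcomp(spacer)


SPACERS = {
    1: "CTGCTCAGGTGGAGATGAAC",  # SEQ ID NO: 43
    2: "GCTTCTTGTTCATCTCCACC",  # SEQ ID NO: 44
    3: "GAGCTCAGCCACACTGTCCG",  # SEQ ID NO: 45
    4: "CGAGCTCAGCCACACTGTCC",  # SEQ ID NO: 46
    5: "TCTCCAGCTTCTGCTTCACC",  # SEQ ID NO: 47
}

for number, spacer in SPACERS.items():
    top, bottom = guide_oligos(spacer)
    print(number, top, bottom)
```

After annealing, each pair forms a 20 bp duplex flanked by two 3 nt 5' overhangs, which is the form expected for insertion between SapI-generated cohesive ends.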
2. Intracellular EGFP_light_on experiment
(1) Liposome-mediated plasmid transfection
Cell resuscitation: Gene editing was performed using the HEK293A-Myh8 (mouse)-out of frame-EGFP cell line (stored in this laboratory). The water bath was heated to 37 °C. The cryovial was removed from liquid nitrogen, the cap was checked for tightness, and the vial was quickly placed in the 37 °C water bath with constant agitation so that the frozen stock thawed within 1 minute. The thawed stock was immediately added to a 15 mL centrifuge tube containing more than 10 volumes of medium and centrifuged at 800 rpm for 3 minutes. The supernatant was discarded, 1-2 mL of complete medium was added, and the cells were resuspended by pipetting and transferred to a new 10 cm cell culture dish. The dish was placed on an inverted microscope to check the cell density and then placed in a 37 °C CO2 cell incubator, with gentle crisscross shaking so that the cells were evenly dispersed. The following day, the cultured cells were observed on an inverted microscope for recovery and viability.
Cell passage: The cells were observed under a microscope and passaged when they covered the bottom of the culture dish at about 80-90% confluence. 1-2 mL of trypsin was added and the dish was gently rocked so that the trypsin covered the cells. The dish was quickly transferred to a 37 °C incubator for 1-2 min of digestion and then taken out, at which point floating white cells were visible in the supernatant. Under the microscope, the cell edges gradually became distinct and the cells rounded up; this change is very rapid and indicates that digestion is complete. Digestion was stopped by adding 1 mL of serum.
Plating: The cell suspension was pipetted into a 24-well culture plate, with 500 μL of diluted cell suspension plated per well at a suitable density of 5-7 × 10^4 cells/well, and the cells were mixed evenly by the crisscross method.
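The plating step implies a simple dilution calculation. The sketch below computes how much cell stock and medium are needed to seed a 24-well plate at a chosen density in 500 μL per well. The stock concentration of 1 × 10^6 cells/mL and the choice of 6 × 10^4 cells/well are hypothetical values within or alongside the stated range, and the function name is illustrative.

```python
def seeding_plan(stock_cells_per_ml: float, cells_per_well: float,
                 well_volume_ul: float = 500.0, wells: int = 24) -> dict:
    """Volumes of cell stock and medium needed to fill the plate (no excess included)."""
    target_cells_per_ml = cells_per_well / (well_volume_ul / 1000.0)
    total_ml = wells * well_volume_ul / 1000.0
    stock_ml = total_ml * target_cells_per_ml / stock_cells_per_ml
    return {
        "target_cells_per_ml": target_cells_per_ml,
        "total_ml": total_ml,
        "stock_ml": stock_ml,
        "medium_ml": total_ml - stock_ml,
    }


# Hypothetical example: counted stock at 1e6 cells/mL, 6e4 cells per well.
plan = seeding_plan(stock_cells_per_ml=1e6, cells_per_well=6e4)
for key, value in plan.items():
    print(f"{key}: {value:g}")
```

In practice a small excess volume is usually prepared to allow for pipetting losses.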
Cell transfection: cell states are observed every other day, and transfection can be performed when the cell fusion degree reaches 70-90%. Plasmid transfection was performed using the Siemens Lipofectamine TM transfection method. Lipofectamine TM was diluted 3000 as in Table 1:
table 1 Lipofectamine TM 3000 dilution system
The pipette is gently blown and evenly mixed.
DNA was diluted as in table 2:
table 2 DNA dilution system
[1] The DNA is diluted by the culture medium Opti-MEM TM, and then added with P3000TM REAGENT for uniform mixing.
Mixing diluted Lipofectamine TM and DNA at a ratio of 1:1, and incubating at room temperature for 10-15min. The contents of the components in the final transfection system are shown in Table 3 below:
TABLE 3 cell transfection System of 24 well plate
/>
The mixture was gently poured into a 24-well plate using a pipette, and mixed by shaking. Cells were cultured in a cell incubator at 37 ℃. The day of transfection was day 0.
(2) Puromycin screening
Day 1: Drug addition. Puromycin dry powder was dissolved in DMSO. 24 h after transfection, the medium in each well was replaced with 500 μL of fresh medium containing puromycin at a final concentration of 1.5 μg/mL, and wells not transfected with plasmid were set as blank controls (if the cell density was high, the cells were passaged 1:2 into well plates containing medium with 1.5 μg/mL puromycin).
Day 3: The cell state was observed; if the control cells had died, puromycin selection was effective. The old medium was discarded, and 500 μL of complete medium without puromycin was added to each well for overnight culture.
Day 4: Cells were collected and centrifuged at 10,000 rpm for 1 min, the medium was discarded, and the cell genome was extracted.
(3) Cell genome extraction
The cell genome was extracted using the Tiangen blood/cell/tissue genomic DNA extraction kit (DP304).
(4) T7E1 detection of mutation Rate
Primer sequence:
Forward CMV-F1: ACGGTGGGAGGTCTATATAA (SEQ ID NO: 48)
Reverse E-seq3: GTCGGGGTAGCGGCTGAAGC (SEQ ID NO: 49)
PCR amplification: Using the genome as the template, a sequence of about 600 bp around the target cleavage site was amplified by PCR. The PCR product was verified by agarose gel electrophoresis, and the target band was recovered by gel extraction with the HiPure Gel Pure DNA Mini Kit (Magen).
T7E1 detection: 100 ng of the recovered PCR product was reacted with Vazyme T7 Endonuclease I according to the manufacturer's instructions and analyzed by 2% agarose gel electrophoresis with Ultra GelRed (Vazyme).
Mutation rate analysis: Band gray values were measured with ImageJ. The proportion of cut products was calculated first: with the band gray values of the two cut products denoted X1 and X2 and that of the uncut band denoted Y, the proportion of cut products is n = (X1 + X2)/(X1 + X2 + Y), and the mutation rate was calculated from n.
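The band quantification can be expressed as a short calculation. The sketch below computes the cut fraction n from the three gray values exactly as defined above. It also shows a conversion that is widely used for T7E1 assays, indel fraction = 1 - sqrt(1 - n); this conversion is not stated in the text and is included only as an assumption about how the mutation rate might be derived from n. The gray values in the example are hypothetical.

```python
import math


def cut_fraction(x1: float, x2: float, y: float) -> float:
    """n = (X1 + X2) / (X1 + X2 + Y) from band gray values."""
    return (x1 + x2) / (x1 + x2 + y)


def indel_estimate(n: float) -> float:
    """Commonly used T7E1 estimate, 1 - sqrt(1 - n); an assumption, not taken from the text."""
    return 1 - math.sqrt(1 - n)


# Hypothetical gray values for the two cut bands and the uncut band.
n = cut_fraction(x1=10, x2=10, y=80)
print(f"cut fraction n = {n:.3f}")
print(f"estimated indel fraction = {indel_estimate(n):.3f}")
```

The square-root correction accounts for the fact that re-annealing pairs edited and unedited strands at random, so the cut fraction is larger than the fraction of edited alleles.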
As shown in Figs. 11 to 13, AbCas pi_1 and AbCas pi_2 provided by the invention show more than 10% intracellular editing activity at editing sites 1, 2 and 5, and have the potential to be developed into a new generation of cell editing tools.

Claims (12)

1. A CRISPR-Cas system, comprising:
(i) an AbCas pi protein; and
(ii) an AbCas pi guide RNA, wherein the AbCas pi guide RNA forms a complex with the AbCas pi protein and comprises a guide sequence that hybridizes to a target sequence in a target nucleic acid;
wherein the AbCas pi protein recognizes a C base-rich protospacer adjacent motif.
2. The CRISPR-Cas system of claim 1, wherein the AbCas pi protein is derived from Armatimonadetes bacterium.
3. The CRISPR-Cas system according to claim 1 or 2, wherein the protospacer adjacent motif comprises 5'-CCN-3', wherein N is A, T, G or C base;
Preferably, the protospacer adjacent motif comprises any one selected from TGCCC, CACCT, TTCCC, TCCCC, CTCCC and CTCCT.
4. The CRISPR-Cas system according to any one of claims 1-3, wherein the AbCas pi protein comprises one or more of the following sequences:
1) An amino acid sequence shown in any one of SEQ ID NO 1 to SEQ ID NO 4;
2) An amino acid sequence having at least 80%, 82%, 85%, 87%, 90%, 92%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence set forth in any one of SEQ ID NOs 1 to 4 and which retains the activity of binding AbCas pi guide RNAs and/or nuclease activity of the amino acid sequence set forth in any one of SEQ ID NOs 1 to 4;
3) An amino acid sequence of 1 or more amino acid residues added, substituted, deleted or inserted in the amino acid sequence shown in any one of SEQ ID NO. 1 to SEQ ID NO. 4, and which retains the activity of binding AbCas pi guide RNA and/or nuclease activity of the amino acid sequence shown in any one of SEQ ID NO. 1 to SEQ ID NO. 4;
4) An amino acid sequence encoded by a nucleotide sequence that hybridizes with a polynucleotide sequence encoding an amino acid sequence as set forth in any one of SEQ ID NOs 1 to 4 under stringent conditions and retains the activity of binding AbCas pi guide RNAs and/or nuclease activity of the amino acid sequence as set forth in any one of SEQ ID NOs 1 to 4, said stringent conditions being medium stringent conditions, medium-high stringent conditions, high stringent conditions or very high stringent conditions.
5. The CRISPR-Cas system according to any one of claims 1 to 4, wherein the AbCas pi guide RNA is a double guide RNA;
Optionally, the AbCas pi guide RNA comprises any one selected from the group consisting of:
a) An activator RNA comprising a nucleotide sequence as shown in SEQ ID NO. 5 or a nucleotide sequence having 80% or more identity with the sequence shown in SEQ ID NO. 5, and
A targeting factor RNA comprising a nucleotide sequence as set forth in SEQ ID NO. 6 or a nucleotide sequence having 80% or more identity to the sequence set forth in SEQ ID NO. 6, and said guide sequence;
b) An activator RNA comprising a nucleotide sequence as shown in SEQ ID NO. 8 or a nucleotide sequence having 80% or more identity with the sequence shown in SEQ ID NO. 8, and
A targeting factor RNA comprising a nucleotide sequence as set forth in SEQ ID No. 9 or a nucleotide sequence having 80% or more identity to the sequence set forth in SEQ ID No. 9, and said guide sequence;
c) An activator RNA comprising the nucleotide sequence shown as SEQ ID NO. 11 or a nucleotide sequence having 80% or more identity with the sequence shown as SEQ ID NO. 11, and
A targeting factor RNA comprising a nucleotide sequence as set forth in SEQ ID NO. 12 or a nucleotide sequence having 80% or more identity to the sequence set forth in SEQ ID NO. 12, and said guide sequence;
d) An activator RNA comprising the nucleotide sequence shown as SEQ ID NO. 13 or a nucleotide sequence having 80% or more identity with the sequence shown as SEQ ID NO. 13, and
A targeting factor RNA comprising a nucleotide sequence as set forth in SEQ ID NO. 14 or a nucleotide sequence having 80% or more identity to the sequence set forth in SEQ ID NO. 14, and said guide sequence.
6. The CRISPR-Cas system according to any one of claims 1 to 4, wherein the AbCas pi guide RNA is a single guide RNA;
Optionally, the AbCas pi guide RNA comprises a nucleotide sequence as set forth in SEQ ID NO: 7 or SEQ ID NO: 10, or a nucleotide sequence having 80% or more identity to the sequence set forth in SEQ ID NO: 7 or SEQ ID NO: 10, and the guide sequence.
7. The CRISPR-Cas system according to any one of claims 1 to 6, wherein the CRISPR-Cas system further comprises: (iii) a donor polynucleotide.
8. A polynucleotide comprising one or more of the following:
i) A nucleotide sequence encoding an AbCas pi protein in a CRISPR-Cas system as defined in any one of claims 1 to 7;
ii) A nucleotide sequence encoding an AbCas pi guide RNA in a CRISPR-Cas system as defined in any one of claims 1 to 7; and
iii) A donor polynucleotide sequence.
9. A vector comprising the polynucleotide of claim 8;
preferably, the vector is an expression vector.
10. A cell comprising one or more of the following:
(a) The CRISPR-Cas system of any one of claims 1-7;
(b) The polynucleotide of claim 8; and
(c) The vector of claim 9.
11. A kit comprising one or more of the following:
(A) The CRISPR-Cas system of any one of claims 1-7;
(B) The polynucleotide of claim 8;
(C) The vector of claim 9; and
(D) The cell of claim 10.
12. A method of modifying a target nucleic acid comprising the step of contacting the target nucleic acid with the CRISPR-Cas system of any one of claims 1 to 7.
CN202211327698.9A 2022-10-27 2022-10-27 CRISPR-Cas system and method Pending CN117947000A (en)


Publications (1)

Publication Number Publication Date
CN117947000A true CN117947000A (en) 2024-04-30



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination