CN109384833B - TALE RVD for specifically recognizing methylated modified DNA base and application thereof - Google Patents


Publication number
CN109384833B
Authority
CN
China
Prior art keywords
tale
gene
fusion protein
target sequence
5hmc
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710660240.8A
Other languages
Chinese (zh)
Other versions
CN109384833A (en)
Inventor
魏文胜
伊成器
张媛
郭生杰
朱晨旭
刘璐璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Edigene Beijing Biotechnology Co ltd
Peking University
Original Assignee
Edigene Beijing Biotechnology Co ltd
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Edigene Beijing Biotechnology Co ltd and Peking University
Priority to CN201710660240.8A
Publication of CN109384833A
Application granted
Publication of CN109384833B

Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/195 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/34 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving hydrolase
    • C12Q1/44 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving hydrolase involving esterase
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2333/00 Assays involving biological materials from specific organisms or of a specific nature
    • G01N2333/90 Enzymes; Proenzymes
    • G01N2333/914 Hydrolases (3)
    • G01N2333/916 Hydrolases (3) acting on ester bonds (3.1), e.g. phosphatases (3.1.3), phospholipases C or phospholipases D (3.1.4)
    • G01N2333/922 Ribonucleases (RNAses); Deoxyribonucleases (DNAses)

Abstract

The present invention identifies RVDs with recognition preferences for 5mC, 5hmC and 6mA, which have different binding properties for these epigenetic modifications. Methylation-dependent gene activation, efficient genome editing, and targeted detection of 5hmC can be achieved using these RVDs. The invention therefore provides isolated DNA-binding polypeptides comprising TALEs, fusion proteins, polynucleotides, vectors comprising the polynucleotides, and host cells, as well as uses of TALE repeat domain-containing proteins in the preparation of reagents for detecting methylated bases in a target sequence of a target gene, and methods of targeted binding to a target sequence of a target gene in a cell.

Description

TALE RVD for specifically recognizing methylated modified DNA base and application thereof
Technical Field
The present invention relates to techniques for the regulation, editing and detection of genes using DNA binding proteins.
Background
Transcription activator-like effectors (TALEs) are virulence factors from the plant pathogen Xanthomonas and are capable of reprogramming the eukaryotic genome (1, 2). TALEs contain a DNA binding domain consisting of a variable number of tandem repeat units (3). Each repeat comprises a consensus sequence of 33-35 amino acid residues, except for two highly variable amino acids (repeat-variable diresidues, or RVDs) at positions 12 and 13 (4, 5). Recognition of DNA by TALE proteins is mediated by the tandem repeats, which bind DNA in a sequence-specific manner: the RVD of each repeat targets one nucleotide and determines nucleotide specificity (4, 6). RVDs form direct, sequence-specific contacts with DNA bases. Because of this modular DNA recognition, TALEs can be fused to functional domains such as transcriptional activators (7, 8), transcriptional repressors (9, 10), or endonucleases (11, 12) to serve as programmable gene editing tools. The RVD-DNA recognition code has been partially deciphered in previous studies using experimental and computational methods (4, 6); the four most commonly used RVDs (NI, NG, HD and NN) were found to preferentially bind A, T, C and G/A, respectively (4, 6).
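The one-repeat-per-nucleotide code described above can be illustrated with a short sketch. It assumes only the canonical pairing stated in this paragraph (NI for A, NG for T, HD for C, NN for G); the names `CANONICAL_RVD_FOR_BASE` and `design_rvd_array` are hypothetical and are not taken from the patent.

```python
from __future__ import annotations

# Canonical RVD code as stated in the Background: NI->A, NG->T, HD->C, NN->G/A.
# Illustrative assumption: NN is chosen for G even though it also binds A.
CANONICAL_RVD_FOR_BASE = {"A": "NI", "T": "NG", "C": "HD", "G": "NN"}


def design_rvd_array(target: str) -> list[str]:
    """Return one RVD per nucleotide of an unmodified DNA target (5'->3')."""
    rvds = []
    for position, base in enumerate(target.upper()):
        if base not in CANONICAL_RVD_FOR_BASE:
            raise ValueError(f"unsupported base {base!r} at position {position}")
        rvds.append(CANONICAL_RVD_FOR_BASE[base])
    return rvds


if __name__ == "__main__":
    print(design_rvd_array("ATCG"))
```

The sketch covers unmodified bases only; modified bases such as 5mC require the RVDs identified in the disclosure.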
In addition to the four conventional deoxyribonucleotides, the mammalian genome contains modified DNA bases. For example, 5-methylcytosine (5mC), often referred to as the fifth DNA base, is an important epigenetic mark that regulates gene expression (Fig. 1a) (15, 16). 5mC can be sequentially oxidized by ten-eleven translocation (TET) family proteins, producing 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxycytosine (5caC); 5fC and 5caC are substrates for thymine DNA glycosylase and are eventually reverted to unmodified cytosine (17, 22). 5hmC constitutes 1%-10% of modified cytosine and is considered a stable epigenetic mark; dysregulation of 5hmC is often observed in cancer.
In addition to methylation of cytosine, another common DNA methylation modification is N6-methyladenine (6mA). As a covalent modification on the adenine of DNA, 6mA plays an important role in prokaryotic cells and is involved in regulating a variety of biological pathways, including defense against invading foreign DNA as part of restriction-modification (RM) systems, regulation of DNA replication, mismatch repair, gene transcription and transposition (41, 47). In contrast, studies of 6mA in eukaryotes are relatively rare, and the role of 6mA in epigenetic inheritance is not well understood (46).
Three articles published in the journal Cell in 2015 reported 6mA in the genomes of eukaryotes such as Chlamydomonas, nematodes and Drosophila (42, 43, 48). Mapping of 6mA positions in Chlamydomonas reinhardtii showed that 6mA is distributed across most Chlamydomonas genes and mostly occurs in ApT dinucleotide motifs; in addition, 6mA is enriched around transcription start sites and is associated with active gene expression (42). In contrast, studies of Drosophila melanogaster and the nematode Caenorhabditis elegans found that 6mA is likely to play an important regulatory role in differentiation and development (48). Enzymes related to 6mA methylation and demethylation were found to be conserved during evolution, so 6mA was likely to be present in other eukaryotes as well (43). In 2016, Koziol et al. demonstrated the presence of 6mA in vertebrate genomes, including different tissues of Xenopus laevis as well as mouse and human tissues or cell lines. The abundance of 6mA is very low in vertebrates, and studies show that, unlike in Chlamydomonas and Drosophila, 6mA in the Xenopus and mouse genomes is widely distributed in regions other than exons and also follows certain sequence motifs, which suggests that 6mA may have different functions in different eukaryotes (44). The distribution of 6mA in higher organisms, and its roles and mechanisms in processes such as cell and organismal development, remain to be studied in depth.
TALE proteins have been reported to recognize modified DNA bases (24-26). For example, NG or N* (the asterisk indicates deletion of amino acid 13) has been reported to recognize 5mC in target DNA (25, 27-31); a combination of NG/N* and HD was used to distinguish 5mC/5hmC from C in an in vitro assay (32). Recent studies also reported that TALE proteins with truncated repeat loops (G*, S* and T*) can bind C, 5mC, 5hmC, 5fC and 5caC with similar affinities (33, 34). In the crystal structures of TALE-DNA complexes, the RVD loops contact the major groove of the DNA duplex, where the first residue stabilizes the proper loop conformation and the second makes direct base-specific contact (35, 36). At present, the potential of RVDs to recognize 5mC, 5hmC and 6mA remains to be studied further.
Disclosure of Invention
The present invention identifies RVDs with recognition preferences for 5mC, 5hmC and 6mA, which have different binding properties for these epigenetic modifications. Methylation-dependent gene activation, efficient genome editing, and targeted detection of 5hmC can be achieved using these RVDs.
According to one aspect of the invention, there is provided an isolated DNA-binding polypeptide comprising a TALE comprising one or more RVDs selected from the group consisting of:
HA or NA, which can specifically recognize 5mC;
FS, which can specifically recognize 5hmC;
N*, NG or KP, which can recognize both C and 5mC;
HV or KV, which can recognize both C and 5hmC;
K* or RG, which can recognize both 5mC and 5hmC;
G*, H*, R* or Y*, which can recognize C, 5mC and 5hmC;
NP, FT, CV or CP, which can specifically recognize 6mA; or
RI, NI, KI or HI, which can recognize both A and 6mA;
wherein * denotes the amino acid deletion at that position.
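The RVD list above can be read as a lookup table from RVD to the set of bases it recognizes. The following is a minimal sketch of that table, transcribing only the entries of this list; `*` marks a deleted residue at position 13, and the names `RVD_RECOGNITION` and `rvds_for` are hypothetical.

```python
from __future__ import annotations

# Recognition sets transcribed from the list in this aspect of the invention.
# "*" marks an amino acid deletion at position 13 of the repeat.
_GROUPS = [
    (("HA", "NA"), ("5mC",)),
    (("FS",), ("5hmC",)),
    (("N*", "NG", "KP"), ("C", "5mC")),
    (("HV", "KV"), ("C", "5hmC")),
    (("K*", "RG"), ("5mC", "5hmC")),
    (("G*", "H*", "R*", "Y*"), ("C", "5mC", "5hmC")),
    (("NP", "FT", "CV", "CP"), ("6mA",)),
    (("RI", "NI", "KI", "HI"), ("A", "6mA")),
]

RVD_RECOGNITION = {
    rvd: frozenset(bases) for rvds, bases in _GROUPS for rvd in rvds
}


def rvds_for(bases: set[str]) -> list[str]:
    """Return the RVDs whose listed recognition set equals `bases` exactly."""
    wanted = frozenset(bases)
    return sorted(rvd for rvd, recognized in RVD_RECOGNITION.items()
                  if recognized == wanted)


if __name__ == "__main__":
    print(rvds_for({"5mC"}))
    print(rvds_for({"C", "5mC", "5hmC"}))
```

Matching on the exact set separates the selective RVDs (for example HA for 5mC only) from the degenerate ones (for example G*, which tolerates C, 5mC and 5hmC).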
According to another aspect of the invention, there is provided a fusion protein comprising a functional domain and a TALE comprising one or more RVDs selected from the group consisting of:
HA or NA, which can specifically recognize 5mC;
FS, which can specifically recognize 5hmC;
N*, NG or KP, which can recognize both C and 5mC;
HV or KV, which can recognize both C and 5hmC;
K* or RG, which can recognize both 5mC and 5hmC;
G*, H*, R* or Y*, which can recognize C, 5mC and 5hmC;
NP, FT, CV or CP, which can specifically recognize 6mA; or
RI, NI, KI or HI, which can recognize both A and 6mA;
wherein * denotes the amino acid deletion at that position.
In some embodiments, the functional domain is a functional domain that regulates gene expression, an epigenetic modifying functional domain, a gene editing functional domain, or a fluorescent protein.
In some embodiments, the functional domain that regulates gene expression is a transcriptional activator, a transcriptional repressor, or a functional fragment thereof, the epigenetic modifying functional domain is a methyltransferase, a demethylase, or a functional fragment thereof, and the gene editing functional domain is a nuclease or a functional fragment thereof.
In some embodiments, the gene-editing functional domain is a DNA cleavage domain of an endonuclease, preferably of a FokI endonuclease.
According to another aspect of the present invention there is provided a polynucleotide encoding a DNA binding polypeptide as described above or a fusion protein of any of the above.
According to another aspect of the present invention, there is provided a vector comprising the polynucleotide as described above.
According to another aspect of the present invention, there is provided a host cell comprising the polynucleotide as described above or the vector as described above.
According to another aspect of the invention, there is provided a use of a protein comprising a TALE repeat domain in the preparation of a reagent for detecting a methylated base in a target sequence of a target gene, comprising:
(1) use of a protein comprising a TALE repeat domain, one or more RVDs of which are HA or NA, in the preparation of a reagent for detecting the methylated base 5mC in a target sequence of a target gene;
(2) use of a protein comprising a TALE repeat domain, one or more RVDs of which are FS, in the preparation of a reagent for detecting the methylated base 5hmC in a target sequence of a target gene; or
(3) use of a protein comprising a TALE repeat domain, one or more RVDs of which are NP, FT, CV or CP, in the preparation of a reagent for detecting the methylated base 6mA in a target sequence of a target gene.
According to another aspect of the present invention, there is provided the use of a DNA binding polypeptide as described above, a fusion protein as described above, a polynucleotide as described above, a vector as described above or a host cell as described above, in the preparation of a reagent for targeted binding to a target sequence of a gene of interest in a cell.
According to another aspect of the present invention, there is provided a use of any one of the fusion proteins described above or a polynucleotide encoding the fusion protein in the preparation of an agent for regulating expression of a target gene in a cell, wherein the functional domain contained in the fusion protein is a functional domain regulating expression of a gene.
In some embodiments, the functional domain that modulates gene expression is a transcriptional activator or a functional fragment thereof, or a transcriptional repressor or a functional fragment thereof.
According to another aspect of the present invention, there is provided a use of any one of the fusion proteins described above or a polynucleotide encoding the fusion protein in the preparation of a reagent for gene editing of a target gene in a cell, wherein the functional domain contained in the fusion protein is a gene editing functional domain.
In some embodiments, the gene editing is nucleic acid cleavage and the gene editing functional domain is a nuclease or a functional fragment thereof, preferably an endonuclease or a functional fragment thereof, more preferably a FokI endonuclease or a DNA cleavage domain thereof.
According to another aspect of the present invention, there is provided the use of any one of the fusion proteins described above, or a polynucleotide encoding the fusion protein, in the preparation of an agent for epigenetic modification of a gene of interest in a cell, wherein the functional domain comprised in the fusion protein is an epigenetic modification functional domain.
In some embodiments, the epigenetic modification functional domain is a methyltransferase, a demethylase, or a functional fragment thereof.
According to another aspect of the invention, there is provided a method of targeted binding to a target sequence of a target gene in a cell, comprising: introducing the DNA-binding polypeptide, the fusion protein, or the polynucleotide into a cell, and allowing the TALE in the DNA-binding polypeptide or the fusion protein to bind to the target sequence of the target gene.
In some embodiments, in the above method:
the TALE in the DNA-binding polypeptide or fusion protein comprises an RVD selected from HA or NA, and the TALE binds to the target sequence of the target gene only if the base at the recognition site of the RVD in the target sequence is 5mC;
the TALE in the DNA-binding polypeptide or fusion protein comprises an RVD selected from FS, and the TALE binds to the target sequence of the target gene only if the base at the recognition site of the RVD in the target sequence is 5hmC;
the TALE in the DNA-binding polypeptide or fusion protein comprises an RVD selected from NP, FT, CV or CP, and the TALE binds to the target sequence of the target gene only if the base at the recognition site of the RVD in the target sequence is 6mA;
the TALE in the DNA-binding polypeptide or fusion protein comprises an RVD selected from N*, NG or KP, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C or 5mC;
the TALE in the DNA-binding polypeptide or fusion protein comprises an RVD selected from HV or KV, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C or 5hmC;
the TALE in the DNA-binding polypeptide or fusion protein comprises an RVD selected from K* or RG, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be 5mC or 5hmC;
the TALE in the DNA-binding polypeptide or fusion protein comprises an RVD selected from G*, H*, R* or Y*, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C, 5mC or 5hmC; or
the TALE in the DNA-binding polypeptide or fusion protein comprises an RVD selected from RI, NI, KI or HI, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be A or 6mA;
wherein * denotes the amino acid deletion at that position.
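The binding conditions listed above can be read as a per-position membership test: a TALE is expected to bind only when every RVD accepts the base it faces. The sketch below is a deliberate simplification, assuming an all-or-nothing model (real TALE binding is graded) and a small table that contains only RVDs named in this description; `ACCEPTED_BASES` and `predicted_to_bind` are hypothetical names.

```python
from __future__ import annotations

# Illustrative subset: HD and NN follow the canonical code in the Background;
# the remaining entries follow the RVD list of the disclosure.
ACCEPTED_BASES = {
    "HD": {"C"},
    "NN": {"G", "A"},
    "NI": {"A", "6mA"},
    "HA": {"5mC"},
    "NA": {"5mC"},
    "FS": {"5hmC"},
    "NP": {"6mA"},
    "FT": {"6mA"},
    "CV": {"6mA"},
    "CP": {"6mA"},
}


def predicted_to_bind(rvds: list[str], target_bases: list[str]) -> bool:
    """All-or-nothing prediction: every RVD must accept its facing base.

    `target_bases` lists one token per position, e.g. ["A", "5mC", "G"].
    An RVD missing from ACCEPTED_BASES raises KeyError.
    """
    if len(rvds) != len(target_bases):
        raise ValueError("TALE and target must have the same length")
    return all(base in ACCEPTED_BASES[rvd]
               for rvd, base in zip(rvds, target_bases))


if __name__ == "__main__":
    tale = ["NI", "HA", "NN"]  # HA reads the cytosine position
    print(predicted_to_bind(tale, ["A", "5mC", "G"]))  # methylated target
    print(predicted_to_bind(tale, ["A", "C", "G"]))    # unmethylated target
```

Under this model, swapping HA for a degenerate RVD at the same position would make the prediction independent of the methylation state, which corresponds to the "uncertain" cases in the list.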
According to another aspect of the present invention, there is provided a method of modulating expression of a target gene in a cell, comprising: introducing any one of the above fusion proteins, or a polynucleotide encoding the fusion protein, into a cell, and allowing the TALE in the fusion protein to bind to a target sequence of a gene of interest, thereby allowing expression of the gene of interest to be regulated by a functional domain in the fusion protein, wherein the functional domain is a functional domain that regulates gene expression.
In some embodiments, in the above method:
the TALE in the fusion protein comprises an RVD selected from HA or NA, and the TALE binds to the target sequence of the target gene only if the base at the recognition site of the RVD in the target sequence is 5mC;
the TALE in the fusion protein comprises an RVD selected from FS, and the TALE binds to the target sequence of the target gene only if the base at the recognition site of the RVD in the target sequence is 5hmC;
the TALE in the fusion protein comprises an RVD selected from NP, FT, CV or CP, and the TALE binds to the target sequence of the target gene only if the base at the recognition site of the RVD in the target sequence is 6mA;
the TALE in the fusion protein comprises an RVD selected from N*, NG or KP, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C or 5mC;
the TALE in the fusion protein comprises an RVD selected from HV or KV, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C or 5hmC;
the TALE in the fusion protein comprises an RVD selected from K* or RG, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be 5mC or 5hmC;
the TALE in the fusion protein comprises an RVD selected from G*, H*, R* or Y*, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C, 5mC or 5hmC; or
the TALE in the fusion protein comprises an RVD selected from RI, NI, KI or HI, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be A or 6mA;
wherein * denotes the amino acid deletion at that position.
In some embodiments, in the above method, the functional domain that regulates gene expression is a transcription activator or a functional fragment thereof, or a transcription repressor or a functional fragment thereof.
According to another aspect of the present invention, there is provided a method of gene editing a target gene in a cell, comprising: introducing any one of the above fusion proteins, or a polynucleotide encoding the fusion protein, into a cell, and allowing the TALE in the fusion protein to bind to a target sequence of a gene of interest, thereby allowing the gene of interest to be edited by a functional domain in the fusion protein, wherein the functional domain is a gene editing functional domain.
In some embodiments, in the above method:
the TALE in the fusion protein comprises an RVD selected from HA or NA, and the TALE binds to the target sequence of the target gene only if the base at the recognition site of the RVD in the target sequence is 5mC;
the TALE in the fusion protein comprises an RVD selected from FS, and the TALE binds to the target sequence of the target gene only if the base at the recognition site of the RVD in the target sequence is 5hmC;
the TALE in the fusion protein comprises an RVD selected from NP, FT, CV or CP, and the TALE binds to the target sequence of the target gene only if the base at the recognition site of the RVD in the target sequence is 6mA;
the TALE in the fusion protein comprises an RVD selected from N*, NG or KP, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C or 5mC;
the TALE in the fusion protein comprises an RVD selected from HV or KV, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C or 5hmC;
the TALE in the fusion protein comprises an RVD selected from K* or RG, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be 5mC or 5hmC;
the TALE in the fusion protein comprises an RVD selected from G*, H*, R* or Y*, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C, 5mC or 5hmC; or
the TALE in the fusion protein comprises an RVD selected from RI, NI, KI or HI, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be A or 6mA;
wherein * denotes the amino acid deletion at that position.
In some embodiments, in the above methods, the gene editing is nucleic acid cleavage and the gene editing functional domain is a nuclease or a functional fragment thereof, preferably an endonuclease or a functional fragment thereof, more preferably a FokI endonuclease or a DNA cleavage domain thereof.
According to another aspect of the invention, there is provided a method of epigenetic modification of a target gene in a cell, comprising: introducing any one of the above fusion proteins, or a polynucleotide encoding the fusion protein, into a cell, and allowing the TALE in the fusion protein to bind to the target sequence of the target gene, thereby allowing the target gene to be epigenetically modified by a functional domain in the fusion protein, wherein the functional domain is an epigenetic modification functional domain.
In some embodiments, in the above method:
the TALE in the fusion protein comprises an RVD selected from HA or NA, and the TALE binds to the target sequence of the target gene only if the base at the recognition site of the RVD in the target sequence is 5mC;
the TALE in the fusion protein comprises an RVD selected from FS, and the TALE binds to the target sequence of the target gene only if the base at the recognition site of the RVD in the target sequence is 5hmC;
the TALE in the fusion protein comprises an RVD selected from NP, FT, CV or CP, and the TALE binds to the target sequence of the target gene only if the base at the recognition site of the RVD in the target sequence is 6mA;
the TALE in the fusion protein comprises an RVD selected from N*, NG or KP, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C or 5mC;
the TALE in the fusion protein comprises an RVD selected from HV or KV, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C or 5hmC;
the TALE in the fusion protein comprises an RVD selected from K* or RG, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be 5mC or 5hmC;
the TALE in the fusion protein comprises an RVD selected from G*, H*, R* or Y*, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C, 5mC or 5hmC; or
the TALE in the fusion protein comprises an RVD selected from RI, NI, KI or HI, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be A or 6mA;
wherein * denotes the amino acid deletion at that position.
In some embodiments, in the above methods, the epigenetic modification functional domain is a methyltransferase, a demethylase, or a functional fragment thereof.
According to another aspect of the present invention, there is provided a method for chromosome labeling in a living cell, comprising: introducing any one of the fusion proteins described above, or a polynucleotide encoding the fusion protein, into a cell, and allowing the TALE in the fusion protein to bind to the target sequence of the target gene, wherein the functional domain is a fluorescent protein, and the target sequence is fluorescently labeled through the binding of the TALE in the fusion protein to the target sequence of the target gene.
In some embodiments, in the above method:
the TALE in the fusion protein comprises an RVD selected from HA or NA, and the TALE binds to the target sequence of the target gene only if the base at the recognition site of the RVD in the target sequence is 5mC;
the TALE in the fusion protein comprises an RVD selected from FS, and the TALE binds to the target sequence of the target gene only if the base at the recognition site of the RVD in the target sequence is 5hmC;
the TALE in the fusion protein comprises an RVD selected from NP, FT, CV or CP, and the TALE binds to the target sequence of the target gene only if the base at the recognition site of the RVD in the target sequence is 6mA;
the TALE in the fusion protein comprises an RVD selected from N*, NG or KP, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C or 5mC;
the TALE in the fusion protein comprises an RVD selected from HV or KV, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C or 5hmC;
the TALE in the fusion protein comprises an RVD selected from K* or RG, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be 5mC or 5hmC;
the TALE in the fusion protein comprises an RVD selected from G*, H*, R* or Y*, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C, 5mC or 5hmC; or
the TALE in the fusion protein comprises an RVD selected from RI, NI, KI or HI, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be A or 6mA;
wherein * denotes the amino acid deletion at that position.
According to another aspect of the present invention there is provided a method of detecting the presence or absence of 5mC at a specific site of a target sequence in the genome of a cell, comprising:
(1) introducing a protein comprising a TALE into the cell, the TALE targeting a target sequence, the RVD in the TALE that recognizes the specific site being HA or NA;
(2) then introducing a nuclease into the cell, the targeted cleavage site of the nuclease being located in the TALE target sequence;
(3) detecting whether the target sequence is cut or not, and judging whether 5mC exists on a specific site of the target sequence or not; if the target sequence is not cleaved, the TALE binds to the target sequence such that the nuclease cannot bind to the target sequence and cleave, and 5mC is present at the specific site; if the target sequence is cleaved, the TALE does not bind to the target sequence, the nuclease binds to the target sequence and cleaves, and 5mC is not present at the specific site.
According to another aspect of the present invention there is provided a method of detecting the presence or absence of 5hmC at a specific site in a target sequence in the genome of a cell, comprising the steps of:
(1) introducing into a cell a protein comprising a TALE, the TALE targeting a target sequence, the RVD in the TALE that recognizes the specific site being FS;
(2) then introducing a nuclease into the cell, the targeted cleavage site of the nuclease being located in the TALE target sequence;
(3) detecting whether the target sequence is cut or not, and judging whether 5hmC exists on a specific site of the target sequence or not; if the target sequence is not cleaved, the TALE binds to the target sequence such that the nuclease cannot bind to the target sequence and cleave, and 5hmC is present at the specific site; if the target sequence is cleaved, the TALE does not bind to the target sequence, the nuclease binds to the target sequence and cleaves, and 5hmC is not present at the specific site.
According to another aspect of the present invention there is provided a method of detecting the presence or absence of 6mA at a specific site of a target sequence in the genome of a cell, comprising:
(1) introducing into a cell a protein comprising a TALE that targets a target sequence, the RVD in the TALE that recognizes the specific site being NP, FT, CV, or CP;
(2) then introducing a nuclease into the cell, the targeted cleavage site of the nuclease being located in the TALE target sequence;
(3) detecting whether the target sequence is cut or not, and judging whether 6mA exists on a specific site of the target sequence or not; if the target sequence is not cleaved, the TALE binds to the target sequence such that the nuclease cannot bind to the target sequence and cleave, 6mA being present at the specific site; if the target sequence is cleaved, the TALE does not bind to the target sequence, the nuclease binds to the target sequence and cleaves, and 6mA is not present at the specific site.
In some embodiments, the nuclease is an endonuclease.
In some embodiments, the nuclease is Cas9 nuclease and the Cas9 nuclease and sgRNA are co-introduced into the cell in step (1).
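The readout logic shared by the three detection methods above can be sketched as follows. This is a minimal illustration, not part of the claimed method. The names `DETECTION_RVDS` and `interpret_assay` are hypothetical, and the cleaved or uncleaved result is assumed to have been determined experimentally beforehand.

```python
# RVDs placed opposite the specific site, as listed in step (1) of each method.
DETECTION_RVDS = {
    "5mC": ("HA", "NA"),
    "5hmC": ("FS",),
    "6mA": ("NP", "FT", "CV", "CP"),
}


def interpret_assay(modification: str, rvd: str, target_cleaved: bool) -> str:
    """Interpret step (3): an uncut target means the TALE bound and blocked
    the nuclease, so the modification is present at the specific site."""
    if rvd not in DETECTION_RVDS.get(modification, ()):
        raise ValueError(f"RVD {rvd!r} is not listed for detecting {modification!r}")
    state = "absent" if target_cleaved else "present"
    return f"{modification} {state} at the specific site"


print(interpret_assay("5hmC", "FS", target_cleaved=False))
```

The function only encodes the decision rule; how cleavage is measured is left to the experimenter.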
Drawings
Figure 1 is a schematic of the screen used to evaluate all potential TALE RVDs for recognition of modified cytosines. (a) Chemical structures of C, 5mC and 5hmC. (b) Schematic of the system for screening new RVDs for modified cytosines, consisting of TALE activators and GFP-expressing reporter DNA fragments. (c) When the customized TALE does not bind the reporter DNA fragment (left panel), e.g., TALE-(E*)3 with the 5mC reporter DNA fragment, GFP expression is at the baseline level (right panel); in contrast, when the TALE binds tightly to the reporter DNA fragment (left panel), e.g., TALE-(G*)3 with the 5mC reporter DNA fragment, GFP expression is up-regulated (right panel). mCherry intensity indicates the transfection efficiency of the TALE-(XX')3 plasmid.
FIG. 2 shows the preparation of reporter DNA fragments containing 5mC and 5hmC. 5mC and 5hmC were incorporated into the primers used to generate the reporter DNA fragments. HPLC chromatograms show the incorporation of (a) 5mC and (b) 5hmC; the 5hmC peak is clearly visible in the enlarged image. (c) Schematic representation of PCR amplification of reporter DNA fragments containing 5mC and 5hmC.
Figure 3 shows a complete assessment of the efficiency and specificity of TALE RVDs for 5mC and 5hmC.
(a) The screening data for 5mC and 5hmC were summarized in a heatmap. The results of the conventional C and T reporter DNA fragments are also shown for ease of comparison. Different colors were used to indicate the identity of the reporter DNA fragment, and the EGFP activity of the different reporter DNA fragments was encoded, with the intensity of the color indicating the fold induction of the reporter DNA fragment by the TALE construct, normalized to the baseline level. Wherein the one-letter abbreviations for amino acids are used.
(b) Based on the preliminary screening results in panel (a), RVDs showing the ability to recognize 5mC and 5hmC, specifically those with higher EGFP fold activation in the 5mC and 5hmC reporter systems, were selected for validation experiments performed in triplicate, which show the preference of these RVDs for modified cytosines. RVDs are grouped according to base preference and, within each group, ordered according to the residue at position 13. Data are shown as mean ± SD, n = 3; P < 0.05, P < 0.005.
Figure 4 shows the binding preference of 420 TALE RVDs for modified cytosines. These data correspond to the heatmap (Figure 3a). The Y-axis is the fold induction of the EGFP reporter and the X-axis is the RVD. The bar charts are grouped according to the first residue of the RVD, and within each chart the data are listed in alphabetical order of the second residue.
Figure 5 shows quantitative measurements of DNA recognition of TALE RVDs by in vitro protection assays.
(a) Principle of in vitro protection assay. Briefly, binding of a TALE protein (i.e., TAL effector protein in the figure) to a DNA fragment of a specific sequence would block the MspI restriction endonuclease site, thereby inhibiting endonuclease cleavage and resulting in a protected full-length band and a cleaved DNA band in a denaturing PAGE analysis. The protection efficiency for DNA reflects the binding efficiency of TALE proteins to DNA.
(b) Normalized protection efficiencies were obtained by measuring fragments of uncleaved or protected DNA, which were fitted to protection curves for different TALE RVDs. The curve was fitted to a specific binding curve using Hill slope (GraphPad). All experiments were repeated three times.
(c) The inhibition constants calculated from (b); the ratio of each constant to the lowest inhibition constant of the same RVD is shown in parentheses. To obtain the inhibition constants, the protection efficiencies of TALE proteins containing different RVDs for C, 5mC and 5hmC were measured by the cleavage protection assay, the protection efficiency curves were fitted using GraphPad Prism 6 software, and the inhibition constants characterizing the binding efficiency of the different RVDs to C, 5mC and 5hmC were calculated. A smaller inhibition constant indicates stronger protection by the RVD and stronger binding to the corresponding DNA fragment. The lowest inhibition constant of an RVD, as used herein, is the inhibition constant for whichever of C, 5mC and 5hmC that RVD binds most efficiently.
Figure 6 shows specific binding of different TALE RVDs to epigenetically modified cytosines in an in vitro protection assay.
(a) Representative size exclusion chromatography of purified TALE proteins.
(b) SDS-PAGE analysis showed that the molecular weight of the purified TALE protein correlated well with the calculated molecular weight.
(c) Representative gel images of in vitro protection assays. As can be seen from fig. 6, MAPK6-HD can protect C with the highest efficiency, whereas HA protects 5mC and 5hmC with higher efficiency relative to unmodified C, and FS protects 5hmC with the highest efficiency.
FIG. 7 shows methylation-dependent gene expression activation and gene editing.
(a) The TET1-targeting TALE (TALE_TET1) targets a 16 bp DNA sequence approximately 80 bp upstream of the transcription start site (TSS) of the TET1 gene. All three CpGs in this region (where C is indicated in black) are highly methylated in HeLa cells but unmethylated in HEK293T cells.
(b) Relative mRNA levels of TET1 in HeLa and HEK293T cells transfected with TALE_TET1 containing different RVDs.
(c) The LRP2-targeting TALE (TALE_LRP2) targets a 16 bp sequence 100 bp upstream of the TSS of the LRP2 gene. Both CpGs in this region are moderately methylated in HeLa cells but unmethylated in HEK293T cells.
(d) Relative mRNA levels of LRP2 in HeLa and HEK293T cells transfected with TALE_LRP2 containing different RVDs.
(e) Position of the sequence targeted by the TALEN (transcription activator-like effector nuclease, i.e., a TALE fused to the FokI endonuclease). Methylated CpGs are indicated in red.
(f) Gene editing efficiency of TALENs using different RVDs. Data are mean ± SD, n = 3; P < 0.05, P < 0.005.
FIG. 8 shows methylation dependent gene expression activation and genome editing.
(A) Relative mRNA levels of TET1 in HeLa and HEK293T cells transfected with TALE_TET1 containing the RVDs NA, G* and Y*.
(B) Relative mRNA levels of LRP2 in HeLa and HEK293T cells transfected with TALE_LRP2 containing the RVDs NA, G* and Y*.
(C) Genome editing efficiency of TALENs containing the RVDs NA, G* and Y*. Data are mean ± SD, n = 3; P < 0.05, P < 0.005.
Figure 9 shows the detection of 5hmC in genomic DNA at single base resolution.
(a) Workflow for detecting 5hmC at single-base resolution using the newly identified RVDs. Briefly, the TALE protects the target genomic region from Cas9-mediated DNA cleavage.
(b) Protective efficiency of TALE-FS (black) and TALE-HD (grey) targeting a single 5hmC site in the mESC genome.
(c) TALE-FS protection efficiency for a single 5hmC site in the genomic DNA of mESC, RAW264.7, L-M (TK-) and L929 cells. At this given site, the mESC genome contained the highest level of 5hmC modification in all cell lines.
Figure 10 shows the selective protection by TALE-FS of DNA containing 5hmC. DNAs containing 5mC, 5hmC and unmodified C (with the same sequence as the MAPK6 gene) were mixed in pairs at different ratios. When the fraction of 5mC increased (grey circles), the protection efficiency increased only slightly. When the fraction of 5hmC increased (mixed with C or 5mC; black circles and black triangles), the protection efficiency increased greatly, which demonstrates the selective protection of 5hmC by RVD FS.
FIG. 11 shows the binding characteristics of TALE-(XX')3 for 6mA and A.
FIG. 12 shows the binding characteristics of a subset of TALE-(XX')3 for 6mA and A. The RVDs are grouped according to their second amino acid, and within each group they are ranked from low to high according to their activation efficiency in the 6mA reporter system. The vertical axis is the fold activation of EGFP in the reporter system and the horizontal axis is the RVD; grey corresponds to the A reporter system and black to the 6mA reporter system. Only data sets whose mean 6mA value was greater than 5 after replication are shown. Data are mean ± SD, n = 3.
Figure 13 shows the efficiency of identification of different RVDs for the A, T, C and G reporting systems.
Detailed Description
The present invention shows that binding of TALE proteins to DNA is affected by DNA base modifications. Through a study of 420 RVDs, the present invention identifies RVDs with unique specificity for 5mC, 5hmC and/or 6mA. 5mC, 5hmC and 6mA are important epigenetic marks in higher eukaryotes. The methyl and hydroxymethyl groups do not interfere with base pairing, but they lie in the major groove of the DNA duplex, where they affect the interaction with TALE proteins.
The structure of TALE-DNA complexes shows that the amino acid at position 13 is the only residue that interacts directly with the DNA base of the sense strand, while the residue at position 12 stabilizes the proper loop conformation during base-pair recognition (35, 36). The present invention demonstrates that small amino acids (Gly and Ala) or a deletion at position 13 can increase the affinity for 5mC. This observation is consistent with previous findings that N* and NG (the natural RVD for T) can bind 5mC. It is possible that the absence of a large side chain at position 13 creates enough space to accommodate the methyl group of 5mC. However, there are exceptions to this general trend. For example, the present invention also observed that HG has a very weak affinity for 5mC, even though HG contains a smaller residue at position 13 than HD, the natural binder of C. Interestingly, when the His at position 12 is replaced by Arg (giving RG), strong binding to 5mC is observed. In fact, RG also recognizes 5hmC. These observations suggest that modification recognition by the two residues may follow more complex patterns.
The present invention demonstrates TALE-mediated methylation-dependent gene activation and genome editing of several highly methylated genomic regions. As an important control, little gene activation was observed when the same region lacked cytosine methylation (in different cells). Accordingly, the RVDs discovered by the present invention may offer the possibility of: the target gene is manipulated according to its modified state in vivo. It is known that there are many Differentially Methylated Regions (DMR) that are involved in many important biological events, including genetic imprinting and disease. Thus, the unique ability of TALE proteins to read epigenetic markers enables future epigenome-dependent applications of TALEs in vivo.
The term "polynucleotide" as used herein refers to a polymer of deoxyribonucleotides or ribonucleotides in either linear or circular conformation and single-or double-stranded form.
In the present invention, the terms "polypeptide", "peptide" and "protein" are used interchangeably and refer to a polymer of amino acids, wherein one or more of the amino acids may be a naturally occurring amino acid, or a chemical analogue or modified derivative thereof.
As used herein, "binding" refers to a sequence-specific, non-covalent interaction between macromolecules (e.g., between a protein and a nucleic acid). The term "binding polypeptide" as used herein is a polypeptide or protein capable of non-covalent binding to another molecule. The further molecule may be a DNA molecule, an RNA molecule and/or a protein molecule.
The term "TALE" as used herein refers to transcription activator-like effectors, which comprise a DNA-binding domain (also referred to as the TALE repeat domain or TALE repeat unit) that specifically recognizes DNA sequences, flanked by N-terminal and C-terminal non-repeat sequences. The DNA-binding domain consists of tandem "repeat units". Each repeat unit comprises 33-35 amino acids, of which residues 12 and 13 are the key sites for targeted recognition and are called the repeat-variable diresidue (RVD); each RVD recognizes only one base. Through its RVDs, a TALE or its DNA-binding domain recognizes the DNA target sequence whose bases correspond, in order, to the RVDs.
Naturally occurring TALEs typically contain 1.5-33.5 repeat units, but studies have shown that at least 6.5 repeat units are typically required for efficient recognition and binding of DNA, while 10.5 or more repeat units exhibit greater activity (Boch, Jens, and Ulla Bonas. "Xanthomonas AvrBs3 family-type III effectors: discovery and function." Annual Review of Phytopathology 48 (2010): 419-436).
The TALE repeat unit can be a truncated repeat unit, also referred to as a hemirepeat unit, i.e., it is part of the N-terminus of the complete repeat unit, which comprises an RVD. Typically, the last repeat unit at the carboxy terminus of a native TALE repeat domain is a truncated repeat unit. The hemirepeating unit typically comprises 17-20 amino acids.
In some embodiments, the TALE can comprise 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 repeat units. In some embodiments, the TALE can comprise 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, or 34 complete repeat units and one half repeat unit.
In a preferred embodiment, the TALE comprises 14 complete repeat units and 1 half repeat unit, wherein the half repeat unit is located carboxy-terminal to the entire TALE repeat unit.
In a preferred embodiment, a single "repeat unit" in the TALE may be LTPEQVVAIASXX'GGKQALETVQRLLPVLCQAHG. In some embodiments, the half-repeat unit sequence in the TALE is LTPEQVVAIASXX'GGKQ. Here XX' is the RVD.
The TALE repeat unit sequence used in the examples of the present invention is the amino acid sequence of the AvrBs3 protein of Xanthomonas. In addition to this sequence, the RVDs of the present invention are equally applicable to TALEs containing other repeat sequences. AvrBs3 has different homologues in different subspecies of Xanthomonas; their specific sequences can be found in Boch, Jens, and Ulla Bonas. "Xanthomonas AvrBs3 family-type III effectors: discovery and function." Annual Review of Phytopathology 48 (2010): 419-436.
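As an illustration of how the repeat-unit template given above maps a list of RVDs onto a repeat-domain amino acid sequence, a minimal sketch follows. The function names are hypothetical, and handling "*" by simply omitting the residue is an assumption based on the statement that "*" denotes a deleted amino acid. The flanking N- and C-terminal non-repeat sequences are not included.

```python
# Template pieces taken from the preferred repeat unit
# LTPEQVVAIAS XX' GGKQALETVQRLLPVLCQAHG (XX' = RVD at positions 12 and 13).
REPEAT_N = "LTPEQVVAIAS"
REPEAT_C = "GGKQALETVQRLLPVLCQAHG"
HALF_REPEAT_C = "GGKQ"  # the half repeat ends after ...XX'GGKQ


def repeat_unit(rvd: str, half: bool = False) -> str:
    """Return one repeat (or half repeat) carrying the given two-character RVD."""
    if len(rvd) != 2:
        raise ValueError("an RVD is written with two characters, e.g. 'HA' or 'N*'")
    residues = rvd.replace("*", "")  # assumption: '*' means the residue is absent
    return REPEAT_N + residues + (HALF_REPEAT_C if half else REPEAT_C)


def repeat_domain(rvds: list[str]) -> str:
    """Concatenate complete repeats, with the last RVD placed in a half repeat."""
    if not rvds:
        raise ValueError("at least one RVD is required")
    *full, last = rvds
    return "".join(repeat_unit(r) for r in full) + repeat_unit(last, half=True)


# 14 complete repeats plus 1 half repeat, as in the preferred embodiment.
domain = repeat_domain(["HD"] * 14 + ["FS"])
print(len(domain))
```

The RVD order in the list corresponds to the base order of the target sequence, one RVD per base.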
In the present invention, amino acids in a polypeptide sequence are shown by single letter abbreviations, and amino acids and their single letter abbreviations involved in the present invention are shown below:
Amino acid    Three-letter abbreviation    One-letter abbreviation
Glycine    Gly    G
Alanine    Ala    A
Valine    Val    V
Leucine    Leu    L
Isoleucine    Ile    I
Proline    Pro    P
Phenylalanine    Phe    F
Tyrosine    Tyr    Y
Tryptophan    Trp    W
Serine    Ser    S
Threonine    Thr    T
Cysteine    Cys    C
Methionine    Met    M
Asparagine    Asn    N
Glutamine    Gln    Q
Aspartic acid    Asp    D
Glutamic acid    Glu    E
Lysine    Lys    K
Arginine    Arg    R
Histidine    His    H
In the present invention, when an RVD is described, "*" indicates that the amino acid at that position is deleted.
In the present invention, "base" and "nucleotide" are used interchangeably, and refer to a compound consisting of purine or pyrimidine base, ribose or deoxyribose, and phosphate, which are the main components of DNA sequences and RNA sequences. Common deoxynucleotides include cytosine (C), thymine (T), adenine (A) and guanine (G).
In addition to the four conventional deoxyribonucleotides described above, the mammalian genome contains modified DNA bases. For example, 5-methylcytosine (5mC), referred to as the fifth DNA base, is an important epigenetic mark that regulates gene expression. 5mC can be sequentially oxidized by ten-eleven translocation (TET) family proteins to yield 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC). In addition to methylation of cytosine, another common DNA methylation modification, N6-methyladenine (6mA), a covalent modification on the adenine of DNA, plays an important role in prokaryotic cells.
The term "methylated modified base" as used herein refers to a base having a methylation modification, and includes 5-methylated cytosine (5mC), 5-hydroxymethylated cytosine (5hmC) and 6-methylated adenine (6mA).
The invention has found RVDs with specific recognition of 5mC, 5hmC, 6mA, as well as degenerate RVDs recognizing these methylated and corresponding unmodified bases, see in particular the following table:
Recognized base(s)    RVD
5mC    HA, NA
5hmC    FS
C, 5mC    N*, NG, KP
C, 5hmC    HV, KV
5mC, 5hmC    K*, RG
C, 5mC, 5hmC    G*, H*, R*, Y*
6mA    NP, FT, CV, CP
6mA, A    RI, NI, KI, HI
According to the above table, RVD HA or NA can specifically recognize 5mC, i.e., can distinguish 5mC from 5hmC and C; RVD FS can specifically recognize 5hmC, i.e., can distinguish 5hmC from 5mC and C; RVD NP, FT, CV or CP can specifically recognize 6mA, i.e., can distinguish 6mA from A; degenerate RVD N*, NG or KP can recognize C and 5mC; and the remaining degenerate RVDs recognize the base combinations listed in the table.
In the present invention, when describing the base recognized by an RVD, unless otherwise specifically indicated, "C" means cytosine without methylation modification; "A" means adenine without methylation modification; "5mC" means 5-methylated cytosine; "5hmC" means 5-hydroxymethylated cytosine; and "6mA" means 6-methylated adenine.
According to the present invention, "specifically recognizes" a particular methylated modified base means that the binding affinity of the RVD for that particular methylated modified base is significantly greater than the same base with other forms of modification, or greater than the same base without modification, or greater than other different bases.
Binding affinities can be determined by a variety of methods well known to those skilled in the art, for example, by constructing a TALE-VP64-mCherry construct as described in the references below, and constructing reporter DNA fragments comprising different modified bases and fluorescent protein genes, and determining the binding affinities of the RVDs contained in the TALE and the different modified bases contained in the reporter DNA fragments using the fold increase in fluorescent protein signal resulting from binding of the TALE-VP64 protein expressed in the cell by the TALE-VP64-mCherry construct and activation of the reporter DNA fragment. When the activation fold of EGFP by RVD for a particular modified base is significantly higher than that of other forms of base, it is considered that RVD can specifically recognize the particular modified base. Binding affinity can also be determined by an in vitro protection assay as described in example 4 of the present invention.
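The reporter-based comparison described above can be sketched as follows. This is a hedged illustration: the text requires the fold activation for one base to be "significantly higher" than for the others, which in practice is a statistical judgment on replicate data. The fixed ratio threshold `min_ratio` used here is a hypothetical stand-in, as are the function names and the example numbers.

```python
def fold_induction(egfp_with_tale: float, egfp_baseline: float) -> float:
    """Fold increase of the EGFP signal over the baseline reporter signal."""
    return egfp_with_tale / egfp_baseline


def specific_base(fold_by_base: dict[str, float], min_ratio: float = 2.0):
    """Return the base an RVD is called specific for, or None.

    Assumption: 'specific' is approximated as the best fold induction being
    at least `min_ratio` times the second best.
    """
    ranked = sorted(fold_by_base.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_fold), (_, runner_up) = ranked[0], ranked[1]
    return best if best_fold >= min_ratio * runner_up else None


# Illustrative numbers only, not measured data.
print(specific_base({"C": 1.2, "5mC": 9.0, "5hmC": 2.0}))
print(specific_base({"C": 8.0, "5mC": 9.0, "5hmC": 2.0}))
```

An RVD for which the function returns None would be a candidate degenerate RVD rather than a specific one.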
From the above table, the present inventors found that RVD HA or NA can specifically recognize 5mC. The binding affinity of RVD HA or NA for 5mC is significantly higher than for 5hmC and C, so these RVDs can distinguish 5mC from 5hmC and C, achieve specific binding of a TALE to 5mC, and enable various specific applications that depend on 5mC.
Specific applications that rely on 5mC include, but are not limited to: detection of 5mC in a gene; 5mC-dependent gene expression regulation, gene editing, epigenetic modification, etc. (i.e., gene expression regulation, gene editing, or epigenetic modification is performed only if 5mC is present in the target sequence, and is not performed if C or 5hmC is present at the corresponding position); 5mC-dependent chromosome labeling in live cells (i.e., only genes with 5mC at the corresponding position in the chromosome are labeled, and no labeling occurs if C or 5hmC is present at the corresponding position, thereby allowing visualization of cytosine methylation of genes in live cells); and preparation of proteins that specifically bind to sequences containing 5mC.
The present invention also found that RVD FS can specifically recognize 5hmC. The binding affinity of RVD FS for 5hmC is significantly higher than for 5mC and C, so this RVD can distinguish 5hmC from 5mC and C, achieve specific binding of a TALE to 5hmC, and enable various specific applications that depend on 5hmC.
Specific applications that rely on 5hmC include, but are not limited to: detection of 5hmC in a gene; 5hmC-dependent gene expression regulation, gene editing, epigenetic modification, etc. (i.e., gene expression regulation, gene editing, or epigenetic modification is performed only if 5hmC is present in the target sequence, and is not performed if C or 5mC is present at the corresponding position); 5hmC-dependent chromosome labeling in live cells (i.e., only genes with 5hmC at the corresponding position in the chromosome are labeled, and no labeling occurs if C or 5mC is present at the corresponding position, thereby allowing visualization of cytosine hydroxymethylation of genes in live cells); and preparation of proteins that specifically bind to sequences containing 5hmC.
The present invention also found that RVD NP, FT, CV or CP can specifically recognize 6mA. These RVDs bind 6mA with significantly higher affinity than A, so they can distinguish 6mA from A, achieve specific binding of a TALE to 6mA, and enable various specific applications that depend on 6mA.
Specific applications that rely on 6mA include, but are not limited to: detection of 6mA in a gene; 6mA-dependent gene expression regulation, gene editing, epigenetic modification, etc. (i.e., gene expression regulation, gene editing, or epigenetic modification is performed only if 6mA is present in the target sequence, and is not performed if A is present at the corresponding position); 6mA-dependent chromosome labeling in live cells (i.e., only genes with 6mA at the corresponding position in the chromosome are labeled, and no labeling occurs if A is present at the corresponding position, thereby allowing visualization of adenine methylation of genes in live cells); and preparation of proteins that specifically bind to sequences containing 6mA.
The present invention also found that degenerate RVD N*, NG or KP can recognize C and 5mC. These degenerate RVDs bind C and 5mC with similar affinities, and bind both with significantly higher affinity than 5hmC.
The present invention also found that degenerate RVD HV or KV can recognize C and 5hmC. These degenerate RVDs bind C and 5hmC with similar affinities, and bind both with significantly higher affinity than 5mC.
The present invention also found that degenerate RVD K* or RG can recognize 5mC and 5hmC. These degenerate RVDs bind 5mC and 5hmC with similar affinities, and bind both with significantly higher affinity than C.
The present invention also found that degenerate RVD G*, H*, R* or Y* can recognize C, 5mC and 5hmC. These degenerate RVDs bind C, 5mC and 5hmC with similar affinities.
The present invention also found that degenerate RVDs RI, NI, KI or HI could recognize 6mA and A. These degenerate RVDs bind a and 6mA with similar binding affinities.
These degenerate RVDs can recognize two or three different methylated or unmethylated bases simultaneously, and can be used without knowledge of the methylation modifications of the bases to increase the efficiency of targeted binding of the TALE and reduce the effect of the methylation modifications on binding of the TALE to the target sequence. For example, 5mC in a cell genome can be oxidized into 5hmC under the catalysis of a TET family protein, and the use of a degenerate RVD capable of recognizing 5mC and 5hmC simultaneously can avoid problems such as reduced binding efficiency caused by different cytosine methylation types. Thus, in a particular experiment, a combination of an RVD that specifically recognizes one of the methylated modified bases, a degenerate RVD that recognizes two of the methylated forms of the bases, and a degenerate RVD that recognizes three methylated forms of the bases can be used for different purposes to meet the needs of the particular experiment.
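The selection strategy described above, choosing a specific RVD when the cytosine state is known and a degenerate RVD when it is not, can be sketched as follows. This is an illustrative helper under stated assumptions: the name `candidate_rvds` is hypothetical, only the cytosine entries from the specificity table are included, and the rule "prefer the smallest recognized set that covers all possible states" is one reasonable reading of the text rather than a procedure stated in it.

```python
# Cytosine-related entries transcribed from the specificity table in the text.
CYTOSINE_RVDS = {
    frozenset({"5mC"}): ["HA", "NA"],
    frozenset({"5hmC"}): ["FS"],
    frozenset({"C", "5mC"}): ["N*", "NG", "KP"],
    frozenset({"C", "5hmC"}): ["HV", "KV"],
    frozenset({"5mC", "5hmC"}): ["K*", "RG"],
    frozenset({"C", "5mC", "5hmC"}): ["G*", "H*", "R*", "Y*"],
}


def candidate_rvds(possible_states) -> list[str]:
    """RVDs that recognize every state the cytosine might be in.

    Among covering entries, only those with the fewest recognized states are
    returned, so a known state yields a specific RVD when one is listed.
    """
    states = frozenset(possible_states)
    covering = [recognized for recognized in CYTOSINE_RVDS if states <= recognized]
    if not covering:
        return []
    tightest = min(len(recognized) for recognized in covering)
    return [
        rvd
        for recognized in covering
        if len(recognized) == tightest
        for rvd in CYTOSINE_RVDS[recognized]
    ]


print(candidate_rvds({"5mC"}))               # state known
print(candidate_rvds({"5mC", "5hmC"}))       # 5mC may have been oxidized to 5hmC
print(candidate_rvds({"C", "5mC", "5hmC"}))  # state unknown
```

The table does not list an RVD specific for unmodified C, so a query for `{"C"}` alone returns the two-state degenerate RVDs that include C.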
The RVDs of the invention can be used in any application where binding to a particular methylated form of a base is desired, either in vitro or in vivo, and these applications can be non-therapeutic.
TALEs containing RVDs of the invention can be expressed as DNA binding polypeptides for binding to bases having a particular methylation pattern. In some cases, such DNA-binding polypeptides may function as "antibodies" for binding to their "antigens" (i.e., target sequences containing bases in a specifically methylated form). In some cases, such DNA-binding polypeptides may bind to target sequences containing bases in a specifically methylated form, protecting them from nuclease cleavage or interaction with other DNA-binding polypeptides (e.g., transcription regulators, etc.).
The TALE containing the RVD of the present invention can also be coupled to a fluorescent protein to form a fusion protein, which can be used to bind to a target sequence containing a specific methylated form of a base on a chromosome in a living cell, thereby allowing observation of dynamic changes of the chromosome in the living cell.
Fluorescent proteins are well known to those skilled in the art and include, but are not limited to, Green Fluorescent Protein (GFP), Enhanced Green Fluorescent Protein (EGFP), Red Fluorescent Protein (RFP), or Blue Fluorescent Protein (BFP), among others.
TALEs containing RVDs of the invention can also be coupled to functional domains to form fusion proteins that can be used to manipulate target genes containing bases in a specific methylated form. The manipulation can be gene editing, modulating gene expression, epigenetic modification, or the like, and the functional domain can be a gene editing functional domain, a domain modulating gene expression, or an epigenetic modification domain.
The term "gene editing" refers to altering the sequence of a gene at a target site, including insertion, deletion, or substitution of the gene. For example, the gene editing may be DNA double strand cleavage of a target site by a nuclease, formation of a DNA single strand gap, etc., followed by insertion or deletion of DNA (indel) during repair of non-homologous end joining (NHEJ) of a DNA sequence, resulting in frame shift mutation, thereby achieving the purpose of gene knockout. The gene editing functional domain refers to an amino acid sequence capable of performing a gene editing function.
When gene editing is performed using a fusion protein comprising a TALE containing an RVD of the present invention and a gene editing functional domain, the gene editing functional domain may be a nuclease. Nucleases include, but are not limited to, endonucleases, zinc finger nucleases (ZFNs), and Cas9 nucleases. The use of Cas9 nuclease is well known in the art; typically, Cas9 nuclease and an sgRNA are co-introduced into cells to effect cleavage of a target sequence.
In the present invention, when performing gene editing, it is preferable that the fusion protein be provided in the form of TALEN, in which case the gene editing functional domain is the DNA cleavage domain of the FokI endonuclease.
The term "modulating gene expression" refers to altering the expression of a gene or the level of an RNA molecule, including non-coding RNAs and RNAs that encode one or more proteins or protein subunits. "modulating gene expression" also includes altering the activity of one or more gene products, proteins, or protein subunits. A functional domain that regulates gene expression refers to an amino acid sequence that is capable of regulating the expression of a target gene.
The functional domain that regulates gene expression may be a transcription activator or a functional fragment thereof, or a transcription repressor or a functional fragment thereof.
The term "epigenetic modification" refers to a modification to DNA without altering the DNA sequence of the target gene, including DNA methylation modifications, DNA demethylation, and the like. An epigenetic modifying functional domain refers to an amino acid sequence that is capable of epigenetic modification of a target gene.
The epigenetic modifying functional domain may be a methyltransferase or a demethylase.
The term "functional fragment" refers to a protein or polypeptide whose sequence is a portion of a full-length protein or polypeptide yet which has the same function as the full-length protein or polypeptide; it may be, for example, a protein domain capable of performing the corresponding function under the particular experimental conditions, such as the cleavage domain of a nuclease.
The cells described herein can be any cell or cell line, including plant, animal (e.g., mammalian, such as mouse, rat, primate, livestock, or rabbit), and fish cells; they can be eukaryotic cells (e.g., yeast, plant, fungal, fish, and mammalian cells such as cat, dog, mouse, cow, sheep, and pig cells).
The cells described herein may be oocytes, K562 cells, CHO (Chinese hamster ovary) cells, HEP-G2 cells, BaF-3 cells, Schneider cells, COS cells (monkey kidney cells expressing SV 40T-antigen), CV-1 cells, HuTu80 cells, NTERA2 cells, NB4 cells, HL-60 cells, and HeLa cells, HEK293T cells, and the like.
The method of any of the embodiments of the present invention may be performed in vitro or in vivo.
The method of any of the embodiments of the present invention may be non-therapeutic.
EXAMPLE 1 materials and methods
1. DNA Synthesis and purification
Oligo DNA primers were synthesized on an Expedite 8909 DNA/RNA synthesizer using standard reagents containing 5mC and 5hmC phosphoramidites (Glen Research). Oligo DNA was deprotected by the standard methods recommended by Glen Research Corp. and purified using Glen-Pak DNA purification cartridges.
The synthesized DNA was verified by high-performance liquid chromatography (HPLC). Briefly, DNA was digested into nucleosides using nuclease P1 (Sigma, N8630) and alkaline phosphatase (Sigma, P4252). The nucleosides were separated on an SB-Aq C18 column (Agilent) with a gradient of 5%-50% acetonitrile over 30 minutes.
2. Cell culture, transfection and flow cytometry
HEK293T cells (from the laboratory of Stanley Cohen, Stanford University) and HeLa cells (maintained in this laboratory) were cultured in DMEM supplemented with 10% FBS and 1% penicillin-streptomycin at 37 °C and 5% CO2. Cells were seeded in 24-well plates at a density of 7 × 10^4 cells per well 24 hours before transfection. Each well was co-transfected with 0.15 μg of TALE-(XX')3 plasmid and 0.15 μg of reporter DNA using polyethyleneimine (PEI). At 48 h post-transfection, cells were collected and analyzed on a BD LSR Fortessa flow cytometer (BD Biosciences). EGFP and mCherry expression was quantified with the 488 nm and 561 nm lasers, respectively. At least 10,000 events were collected from each sample to yield sufficient data for analysis. Cells with an mCherry fluorescence intensity of 5 × 10^3 to 5 × 10^4 were used for analysis.
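The mCherry gating step can be illustrated with a short sketch. The event format (pairs of EGFP and mCherry intensities), the function name `gated_egfp_mean`, and the use of the arithmetic mean as the summary statistic are assumptions made for illustration; the text specifies only the mCherry window.

```python
from statistics import mean

# mCherry window used to select cells with comparable TALE plasmid expression.
MCHERRY_GATE = (5e3, 5e4)


def gated_egfp_mean(events, gate=MCHERRY_GATE) -> float:
    """Mean EGFP intensity of events whose mCherry signal lies inside the gate.

    `events` is an iterable of (egfp, mcherry) intensity pairs, one per cell.
    """
    low, high = gate
    gated = [egfp for egfp, mcherry in events if low <= mcherry <= high]
    if not gated:
        raise ValueError("no events fall inside the mCherry gate")
    return mean(gated)


# Illustrative events only: the first and last fall outside the gate.
events = [(100.0, 1e3), (200.0, 6e3), (400.0, 4e4), (900.0, 9e4)]
print(gated_egfp_mean(events))
```

Dividing the gated EGFP value for a TALE-transfected sample by that of the baseline sample gives the fold induction used in the screen.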
3. Construction of TALEN
The backbone of the TALEN plasmid contains the CMV promoter, nuclear localization signals, TALE amino-and carboxy-terminal non-repeat sequences, and endonuclease FokI monomers, the specific sequences of which are described in reference 37 below.
In use, TALE repeat units containing different RVDs are inserted into the TALEN backbone vector to test the effects of the different RVDs; for the construction method, see Yang, Junjiao, et al., "Assembly of Customized TAL Effectors Through Advanced ULtiMATE System," TALENs: Methods and Protocols (2016): 49-60.
4. Expression and purification of TALE proteins
The expressed and purified TALE protein was used to perform in vitro protection assays.
TALE repeat units with canonical RVDs (i.e., NI, NG, HD, and NN) were constructed using the ULtiMATE system as described previously (37). For TALE repeat units using the new RVDs, TALE repeat unit monomers containing the new RVDs are synthesized separately. Final assembly of these TALE constructs was performed using the same ULtiMATE protocol as previously described (37).
The TALE expression plasmids were derived from TALEN plasmids in which the TALE repeat units had been assembled into the TALEN backbone. Fragments containing the TALE N- and C-terminal sequences with the intervening repeats were amplified from the corresponding TALEN plasmids and cloned into the NheI and HindIII sites of pET-28a(+).
The sequences of TALEs with different RVDs (each containing a His-tag for purification, the TALE N- and C-terminal sequences, and TALE repeat units capable of specifically recognizing DNA) were cloned into the pET28a vector (Novagen). When the cell density reached an OD600 of 0.8, overexpression of the TALE protein was induced in E. coli BL21(DE3) with 1.0 mM isopropyl β-D-thiogalactoside (IPTG). After 16 hours of growth at 20 ℃, the cells were harvested, resuspended in a buffer containing 25 mM Tris-HCl pH 8.0 and 150 mM NaCl, and disrupted by sonication. The recombinant proteins were purified sequentially on Ni2+-nitrilotriacetate affinity resin (Ni-NTA, GE Healthcare) (Buffer A: 10 mM Tris-HCl pH 8.0, 300 mM NaCl; Buffer B: 10 mM Tris-HCl pH 8.0, 300 mM NaCl and 500 mM imidazole) and HiLoad Superdex 200 PG (GE Healthcare) (Buffer GF: 10 mM Tris-HCl pH 8.0, 100 mM NaCl).
5. TALE repeat units
The TALE repeat units used in the following examples comprise 14 repeat units and one half-repeat unit in series. Each single repeat unit comprises 34 amino acid residues with the sequence LTPEQVVAIASXX'GGKQALETVQRLLPVLCQAHG; the half-repeat unit comprises the first 17 amino acid residues of a single repeat unit, with the sequence LTPEQVVAIASXX'GGKQ, where XX' is the RVD.
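The repeat layout described above can be checked with a short sketch. It takes the 34-residue scaffold given in the text, places a two-letter RVD at positions 12-13, and derives the half repeat as the first 17 residues. The function names are hypothetical, and the sketch assumes a two-residue RVD (it does not model RVDs with a deleted residue 13).

```python
# Repeat scaffold from the text; {rvd} marks the RVD at positions 12-13.
FULL_REPEAT = "LTPEQVVAIAS{rvd}GGKQALETVQRLLPVLCQAHG"
HALF_REPEAT_LEN = 17  # the half repeat is the first 17 residues of a repeat


def build_repeat(rvd):
    """Return one 34-residue repeat carrying the given two-letter RVD."""
    if len(rvd) != 2:
        raise ValueError("this sketch only handles two-residue RVDs")
    return FULL_REPEAT.format(rvd=rvd)


def build_half_repeat(rvd):
    """Return the half repeat: the first 17 residues of a full repeat."""
    return build_repeat(rvd)[:HALF_REPEAT_LEN]


def build_array(rvds):
    """Join full repeats for all but the last RVD, then add one half repeat."""
    *full, last = rvds
    return "".join(build_repeat(r) for r in full) + build_half_repeat(last)
```

For an array of 14 repeats plus one half repeat, `build_array` takes 15 RVDs and returns 14 x 34 + 17 residues of repeat sequence (the N- and C-terminal non-repeat regions are not included).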
The materials and methods described in this example were used in examples 2-7 below.
Example 2 construction of the artificial screening system
The artificial screening system consists of a reporter DNA element and a TALE-VP64 expression library.
The TALE-VP64 expression library comprises 400 TALE-VP64-mCherry constructs. Each TALE-VP64-mCherry construct is a circular plasmid that expresses a TALE-VP64 fusion protein when transferred into cells (for the specific construction method, see reference 37 below). As shown in fig. 1b, each construct comprises an artificial TALE array of 14.5 repeats fused to VP64. Repeats 1-6, repeats 10-14 and the final half repeat (shown as repeat 14.5 in fig. 1b) are identical across constructs, whereas repeats 7-9 differ between constructs. In each construct, the RVDs of the three consecutive monomers at repeats 7-9 of the artificial TALE array are encoded by the same 6 randomly synthesized nucleotides, i.e., repeat units 7-9 express three identical RVDs in tandem; such an array is called TALE-(XX')3. This gives 400 TALEs with different test RVDs XX', used to test the recognition of 5mC and 5hmC by different RVDs. Here X and X' represent the 12th and 13th residues of the repeat, respectively (i.e., the RVD). Furthermore, because N* had previously been found to recognize 5mC, an additional 20 TALE-(X*)3 were assembled, in which residue 13 is deleted. Hereinafter, TALE-(XX')3 and TALE-(X*)3 are collectively referred to as TALE-(XX')3. Accordingly, the TALE-VP64 expression library used comprises a total of 420 TALE-VP64-mCherry constructs expressing 420 different TALE-(XX')3; these are hereinafter collectively referred to as TALE constructs, and also as TALE-(XX')3 plasmids or TALE-(XX')3 expression plasmids.
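The library size follows directly from the combinatorics described above: 20 x 20 two-residue RVDs plus 20 RVDs lacking residue 13. The following minimal sketch enumerates them; writing the deletion as an asterisk (e.g. N*) is the conventional notation and is used here as an assumption, and the function name is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural residues, one-letter codes


def rvd_library():
    """Enumerate the test RVDs: all XX' pairs, then X* with residue 13 deleted."""
    two_residue = [x + x2 for x, x2 in product(AMINO_ACIDS, repeat=2)]
    deletion = [x + "*" for x in AMINO_ACIDS]
    return two_residue + deletion


library = rvd_library()
print(len(library))  # 400 two-residue RVDs + 20 deletion RVDs
```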
The TALE-VP64 expression library was generated as follows. The 420 TALE-(XX')3 plasmids fall into two categories. In 400 of them, the amino acid residues at positions 12 and 13 of the RVDs of repeat units 7-9 are the pairwise combinations of the 20 natural amino acid residues; these TALE-(XX')3 plasmids were constructed as described in reference 13 below.
In the other 20 TALE-(XX')3, the RVDs expressed by repeat units 7-9 lack the amino acid residue at position 13, i.e., A*, C*, D*, E*, F*, G*, H*, I*, K*, L*, M*, N*, P*, Q*, R*, S*, T*, V*, W* and Y*. These 20 TALE-(XX')3 expression plasmids were constructed with reference to the TALE-(XX')3 construction method reported in reference 13 below. Briefly, a forward primer encoding a specific RVD, 5'-tcgttccgagaacaggttgtagccataggcttctnnnnnnnnggaggtaagcagcactggAA-3' (NNNNNN stands for the base sequence coding for the specific RVD), and a common reverse primer, 5'-aaCGTCTCaGTTCGGGGGTCAACCCATGAGCCTGACACAGTACTGGGAGCAGGCGCTGCACGGTTTCCAGTGCCTGCTT-3', were annealed and extended by PCR to generate a 102 bp TALE monomer fragment with BsmBI restriction endonuclease sites at both ends. The TALE monomer fragments were then ligated together by 6 cycles of Golden Gate ligation, and the TALE multimers were amplified using primers G-lib-F and G-lib-R. Finally, fragments containing exactly three TALE monomers were selectively recovered by gel extraction, ligated into a pre-constructed library expression vector, and transformed into Trans1-T1 competent cells. TALE-(XX')3 plasmids correctly expressing the corresponding RVDs were identified by Sanger sequencing. The primers are:
G-lib-F:5’-TAGCTATACGTCTCATTGACCCCCGAACAGGTTGTAGCC-3’
G-lib-R:5’-TAGCTATACGTCTCACCCATGAGCCTGACACAGTACTGGGAGCA-3’
The reporter DNA element is a linear DNA fragment comprising a TALE-(XX')3 recognition sequence, a miniCMV promoter, the EGFP coding sequence and a polyA signal (fig. 1b). The TALE-(XX')3 recognition sequence in the reporter DNA element is 15 bases long; bases 1-6 and 10-15 are recognized by the RVDs in repeats 1-6 and 10-14.5 of the library TALE constructs, respectively. Bases 7-9 of the recognition sequence may be three consecutive 5mC, 5hmC or 6mA, used to detect the binding ability of different RVDs to the corresponding methylated base; the elements are referred to as the 5mC, 5hmC or 6mA reporter DNA element, respectively. One or more reporter DNA elements are used, depending on which methylated base is being screened. The reporter DNA element is obtained by PCR amplification using a chemically synthesized forward primer, Report-F, which contains the specific methylated base, and a common reverse primer, Report-R; the product is about 1450 bp.
The primer sequences are as follows:
Report-F:
5’-G*C*C*AGATATACGCGTTACTGGAGCCATCTGGCCNNNTACGTAGGCGTGTAC-3', wherein N represents 5mC, 5hmC or 6mA;
Report-R:5’-A*G*C*GTCTCCCGTAAAGCACTAAATCGGAACCCTAAAGGGAGC-3’
(* indicates a thio-modified base, whose main function is to protect the reporter DNA element from nuclease degradation within the cell; the TALE-(XX')3 recognition sequence, i.e., the TALE binding sequence, is CTGGCCNNNTACGTA, underlined in the original.)
The reporter DNA element was constructed as follows. First, the reporter plasmid pcDNA6_3A (FIG. 2c), which contains the template sequence CTGGCCAAATACGTA for the TALE-(XX')3 binding site, was amplified in E. coli. Then the above primers containing 5mC, 5hmC or 6mA were chemically synthesized, and a linear reporter DNA element containing 5mC, 5hmC or 6mA in the TALE binding sequence was generated by PCR (FIG. 2c). The forward primer contains the TALE-(XX')3 binding sequence CTGGCCNNNTACGTA, where N represents 5mC, 5hmC or 6mA, which lies immediately upstream of the minimal CMV promoter (pminiCMV) and the downstream EGFP gene. The bases of the TALE binding sequence corresponding to repeats 7-9 are three consecutive 5mC, 5hmC or 6mA.
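As a consistency check on the sequences quoted above, the sketch below confirms that the 15-base binding sequence occurs in the Report-F primer, that its variable bases sit at positions 7-9, and that it differs from the plasmid template sequence only at those positions. The thio-modification asterisks are dropped from the primer string, and the helper names are hypothetical.

```python
# Sequences as quoted in the text (thio-modification marks removed).
REPORT_F = "GCCAGATATACGCGTTACTGGAGCCATCTGGCCNNNTACGTAGGCGTGTAC"
TEMPLATE_SITE = "CTGGCCAAATACGTA"   # binding site in plasmid pcDNA6_3A
MODIFIED_SITE = "CTGGCCNNNTACGTA"   # N = 5mC, 5hmC or 6mA


def variable_positions(site):
    """Return the 1-based positions of the variable (N) bases in a site."""
    return [i for i, base in enumerate(site, start=1) if base == "N"]


def differs_only_at(seq_a, seq_b, positions):
    """True if two equal-length sequences match everywhere except `positions`."""
    if len(seq_a) != len(seq_b):
        return False
    return all(
        a == b or i in positions
        for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
    )


print(variable_positions(MODIFIED_SITE))
print(MODIFIED_SITE in REPORT_F)
```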
In addition, reporter DNA elements for C and T may be included in the artificial screening system; these are circular DNA and are constructed according to the method of reference 13 below. Their TALE-(XX')3 recognition sequence is the same as described above, except that NNN is CCC or TTT.
Using the above artificial screening system, the specificity of TALE-(XX')3 binding to the TALE binding sequence in the reporter DNA element is detected by measuring the EGFP fluorescence level. A screening platform for the systematic evaluation of TALE RVD recognition is thus established.
Example 3 screening for TALE RVDs that recognize modified cytosines
To measure the binding affinity of the 420 RVDs for 5mC and 5hmC, each of the 420 TALE constructs was introduced into HEK293T cells together with one of three EGFP reporter DNA elements (containing three consecutive C, 5mC or 5hmC, respectively). EGFP and mCherry fluorescence levels were measured by FACS analysis (fig. 1c). The binding specificity of the 420 RVDs for C, 5mC and 5hmC was determined by comparing, for each RVD, the fold change in EGFP expression relative to the baseline level for the corresponding base. The 1260 data points for C, 5mC and 5hmC, together with the 420 data points for T from previous work (see reference 13 below), are summarized in a heat map (fig. 3a and 4).
Based on the preliminary screening results in fig. 3a, RVDs with higher binding capacity for 5mC and 5hmC were selected for confirmation experiments performed in triplicate. RVDs with an EGFP activation fold of 4 or more on the 5mC or 5hmC reporter DNA fragment were considered to have higher binding capacity for the respective nucleotide. The results are shown in fig. 3b.
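The selection rule described above (fold activation relative to baseline, with a cut-off of 4) can be sketched as follows. The EGFP readings are hypothetical placeholders in arbitrary units, not data from this document, and the function names are assumptions made for illustration.

```python
def fold_induction(egfp_signal, baseline):
    """Fold change of EGFP signal relative to the baseline for the same base."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return egfp_signal / baseline


def select_binders(signals, baseline, threshold=4.0):
    """Return the RVDs whose fold induction is at or above the threshold."""
    return sorted(
        rvd for rvd, signal in signals.items()
        if fold_induction(signal, baseline) >= threshold
    )


# Hypothetical readings (arbitrary units) for three placeholder RVDs.
example = {"rvd_1": 900.0, "rvd_2": 120.0, "rvd_3": 400.0}
print(select_binders(example, baseline=100.0))
```

Because the criterion is "greater than or equal to 4", an RVD at exactly 4-fold is retained.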
The results show that the screening yielded specific and degenerate RVDs that efficiently recognize 5mC and 5hmC. A number of RVDs binding strongly to 5mC were identified and fall into three classes according to the amino acid residue at position 13: RVDs containing Gly (NG, KG and RG), Ala (HA and NA), or a deletion (N*, K*, H*, R*, Y* and G*). Among the Gly-containing and deletion-containing RVDs there are universal RVDs (recognizing 5mC, 5hmC and conventional C) and degenerate RVDs (recognizing 5mC and 5hmC); interestingly, the two Ala-containing RVDs (HA and NA) were selective for 5mC. Previous studies used NG (the natural binder of T) and N* to recognize 5mC; while we identified these two RVDs during the screening, many of the novel RVDs reported in this study have binding affinities for 5mC that exceed theirs. For example, HA, NA and X* (X being K, H, Y or G) all showed higher binding affinity for 5mC. We also found that these three classes of RVDs bind conventional T; this is not surprising, as they either have an amino acid residue with a small side chain at position 13 or lack this residue altogether.
RVDs that selectively bind 5hmC have not been previously reported. As noted above, we identified degenerate and universal RVDs that bind 5hmC well; an induction of ~15-fold was observed for these 5hmC binders, demonstrating their strong affinity for 5hmC. Furthermore, we observed a novel group of 5hmC-binding RVDs with serine at residue 13 (FS, YS and WS). Although their affinity for 5hmC is lower than that of the universal binders, they preferentially bind 5hmC over 5mC, which offers the possibility of direct and selective recognition of 5hmC. Taken together, universal and degenerate binders of 5mC and 5hmC tend to contain a glycine or a deletion at position 13, while specific binders of 5mC and 5hmC have an alanine or a serine at position 13, respectively.
Example 4 quantitative measurement of binding affinity and specificity of RVD for 5mC, 5hmC and conventional C
DNA recognition by the novel RVDs obtained in example 3 was verified by in vitro protection assays (see FIG. 5a for the reaction principle). In this experiment, a sequence from the MAPK6 gene, 5'-TTCAGCTGGAT[CCCGGAGGA]GCGGATATAACCAGG-3', was chemically synthesized. The TALE recognition sequence designed from this sequence is shown in square brackets and contains one MspI restriction endonuclease recognition site (CCGG). DNA oligonucleotides containing C, 5mC or 5hmC at a given position (the second C of the MspI recognition site) were chemically synthesized, and the endonuclease MspI was added to the DNA probe in the presence of varying concentrations of TALE protein. Binding of the TALE protein to the cytosine base it recognizes inhibits cleavage of the DNA by the endonuclease, so that denaturing PAGE analysis shows a protected full-length band and a cleaved DNA band. The protection efficiency was then calculated for each RVD and expressed as an inhibition constant (Ki, which is inversely related to binding affinity). Specifically, protection efficiencies for C, 5mC and 5hmC were obtained from the cleavage protection assays for TALE proteins containing different RVDs, protection efficiency curves were fitted using GraphPad Prism 6 software, and inhibition constants were calculated to characterize the binding of the different RVDs to C, 5mC and 5hmC. A smaller inhibition constant indicates stronger protection by the RVD and stronger binding to the corresponding DNA fragment.
The in vitro protection assay was performed using the endonuclease MspI (see FIG. 5a for its principle). Each 10 μl reaction contained 1 nM labeled DNA, 1 μl 10X CutSmart Buffer (NEB), and 100 nM NaCl. TALE protein was added at final concentrations between 10 nM and 8 μM. The binding system was incubated at 25 ℃ for 30 minutes. Then 0.4 U MspI was added and incubation continued for 15 min. The reaction was quenched by the addition of 10 μl formamide and then heated at 95 ℃ for 5 minutes. Protected and cleaved DNA was separated by urea-PAGE and imaged with the Chemiluminescent Nucleic Acid Detection Module Kit (Thermo).
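The protection-efficiency analysis can be illustrated with a small sketch. The text states only that curves were fitted in GraphPad Prism 6 and does not name the model, so everything below is an assumption: protection efficiency is taken as the full-length band's share of total band intensity, the curve is a simple one-site hyperbola, and the fit is a plain least-squares grid search rather than Prism's routine. All function names are hypothetical.

```python
import math


def protection_efficiency(full_length, cleaved):
    """Fraction of DNA protected, from full-length and cleaved band intensities."""
    total = full_length + cleaved
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return full_length / total


def one_site_protection(tale_nm, ki_nm):
    """Assumed one-site model: protection rises hyperbolically with [TALE]."""
    return tale_nm / (ki_nm + tale_nm)


def fit_ki(concs_nm, efficiencies, lo_nm=1.0, hi_nm=1e5, steps=2000):
    """Least-squares grid search for Ki over log-spaced candidate values (nM)."""
    best_ki, best_err = lo_nm, math.inf
    for i in range(steps + 1):
        ki = lo_nm * (hi_nm / lo_nm) ** (i / steps)
        err = sum(
            (one_site_protection(c, ki) - e) ** 2
            for c, e in zip(concs_nm, efficiencies)
        )
        if err < best_err:
            best_ki, best_err = ki, err
    return best_ki
```

Under this model a smaller fitted Ki means that less TALE protein is needed to reach half-maximal protection, which matches the interpretation given in the text.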
The assay was first optimized with the RVD HD, a natural binder with high affinity for conventional cytosine. HD showed a low Ki for C, whereas its Ki for 5mC and 5hmC was at least 30-fold higher (fig. 5b and c, fig. 6c), demonstrating that the protection assay can quantitatively assess binding affinity. In this in vitro assay, NG and N* bound only 5mC, not 5hmC (fig. 5b and c). Representative RVDs were selected from the screening results (fig. 3b) for further evaluation. The 5mC-specific RVD HA showed the lowest Ki for 5mC and was ~5-7-fold more selective for 5mC than for C and 5hmC in this in vitro assay. The 5hmC-specific RVD FS showed a 5-6-fold selectivity for 5hmC over C and 5mC, although its binding affinity for 5hmC appears weaker than that of HA for 5mC. Furthermore, the degenerate RVD RG showed comparable protection for 5mC and 5hmC, while the universal RVD R*, which binds C, 5mC and 5hmC, had similar affinity for all three (see fig. 5b and 5c).
Example 5 activation of Gene expression by the novel RVD in a methylation dependent manner
To investigate the potential of newly identified RVDs to recognize cytosine methylation in vivo, their performance in human cells in target gene activation was studied. TALE-activators were designed and constructed using TALE-VP64 previously developed to achieve activation of specific genes (37). The backbone of the TALE-activator plasmid contains the CMV promoter, a nuclear localization signal, TALE amino-and carboxy-terminal non-repeat sequences, and the activator VP64, the specific sequences of which are described in reference 37 below.
In use, TALE repeat units containing different RVDs are inserted into the TALE-activator backbone to test the effects of the different RVDs; for the construction method, see Yang, Junjiao, et al., "Assembly of Customized TAL Effectors Through Advanced ULtiMATE System," TALENs: Methods and Protocols (2016): 49-60.
First, using existing methylation data from the UCSC database, the TET1 gene was selected; its promoter is highly methylated in HeLa cells but demethylated in HEK293T cells (fig. 7a). TALE-activators containing TALE repeat units targeting the TET1 gene were constructed. In HeLa cells, the 5mC-specific HA, the degenerate RG and the universal R* all significantly activated the expression of TET1 (significant activation being defined as a significant increase in TET1 expression compared to the control; *P < 0.05, **P < 0.005), with RG giving about 10-fold activation (fig. 7b), and all three newly identified RVDs (HA, RG and R*) performed better than NG and N*; furthermore, HD did not significantly upregulate the expression of TET1. In HEK293T cells, by contrast, HD bound well to the demethylated TET1 promoter and further enhanced its expression (although the gene is already highly expressed), whereas HA and RG did not affect gene expression; the universal R*, which has a lower affinity for conventional C than HD, slightly upregulated gene expression. Because NG and N* discriminate poorly against unmodified C, they also slightly activated TET1 gene expression.
TALE-activators containing TALE repeat units targeting the LRP2 gene were then constructed; they target the promoter region of the LRP2 gene, which is moderately methylated in HeLa cells and demethylated in HEK293T cells (fig. 7c). Furthermore, this region contains only two CpGs and is therefore more challenging for RVD-mediated discrimination.
HEK293T and HeLa cells were seeded in 6-well plates and grown to 60% confluence. For each well, 2 μg of TALE-activator plasmid was transfected using Lipofectamine 2000 (Invitrogen). Transfected cells were cultured for 3 days before mCherry-positive cells were sorted by flow cytometry. Total RNA was isolated from the mCherry-positive cells and reverse transcribed, and real-time PCR analysis was performed on a ViiA 7 Real-Time PCR System (Applied Biosystems) using SYBR Green 2X premix II (Takara) under standard reaction conditions.
The 5mC-binding RVDs (HA, RG) were observed to significantly activate the gene in HeLa cells. In HEK293T cells, only HD and the universal RVD R* activated the expression of the LRP2 gene, while the 5mC-binding RVDs did not (fig. 7d). Thus, the newly identified RVDs (HA, RG) are able to distinguish moderately methylated sites from unmethylated sites in vivo.
Example 6 methylation dependent genome editing Using the novel RVDs
To examine the possibility of methylation-dependent genome editing, TALEN constructs containing different RVDs were used to target the human PLXNB2 gene for DNA cleavage (fig. 7e). The constructs were obtained by inserting TALE repeat units into the TALEN expression vector, i.e., the TALEN plasmid backbone, which contains the CMV promoter, a nuclear localization signal, the TALE amino- and carboxy-terminal non-repeat sequences and the endonuclease FokI monomer; the specific sequences are described in reference 37 below, and the construction method is described in Yang, Junjiao, et al., "Assembly of Customized TAL Effectors Through Advanced ULtiMATE System," TALENs: Methods and Protocols (2016): 49-60. The second exon of PLXNB2, which is highly methylated in HeLa cells (data from UCSC), was selected, and TALEN-mediated DNA cleavage was evaluated using the indel ratio (i.e., the insertion/deletion ratio).
HeLa cells were seeded in 6-well plates and grown to 60% confluence. For each well, a pair of TALEN plasmids and pmaxGFP (Lonza Group Ltd.) were co-transfected at a ratio of 9:9:2 (0.9 μg : 0.9 μg : 0.2 μg) using X-tremeGENE HP (Roche). Transfected cells were cultured for 3 days before GFP-positive cells were sorted by flow cytometry. The TALEN target regions were PCR amplified from genomic DNA of the isolated GFP-positive cells. TALEN-mediated indels were analyzed using the mismatch-sensitive T7 endonuclease I (T7E1; New England Biolabs) as described previously (41).
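The text defers to reference 41 for the T7E1 analysis and does not give a formula for the indel ratio. As an illustration only, the sketch below uses the estimate commonly applied to mismatch-cleavage assays, indel % = 100 x (1 - sqrt(1 - f)), where f is the cleaved fraction of total band intensity; whether this matches the cited procedure is an assumption, and the function name is hypothetical.

```python
import math


def indel_percent(parental, cleaved_a, cleaved_b):
    """Estimate indel frequency (%) from T7E1 gel band intensities.

    parental: intensity of the uncut PCR product band
    cleaved_a, cleaved_b: intensities of the two cleavage product bands
    """
    total = parental + cleaved_a + cleaved_b
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = (cleaved_a + cleaved_b) / total
    # The square root reflects that heteroduplexes form by random
    # re-annealing of mutant and wild-type strands.
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))
```

With no cleavage products the estimate is 0%, and it rises toward 100% as the parental band disappears.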
The results show that TALEN-HD had negligible editing efficiency (fig. 7f), indicating that the three 5mC modifications in this region effectively prevent its binding. When the three HD RVDs were replaced with RVDs that bind 5mC (HA, R*, NG and N* were tested), high indel ratios were observed (fig. 7f and 8c). These results indicate that these RVDs can achieve methylation-dependent genome editing in human cells.
Example 7 RVD-mediated detection of 5hmC in mammalian genomes at Single base resolution
The methylation ratio of cytosines can be determined by bisulfite sequencing; however, conventional bisulfite sequencing cannot distinguish between 5hmC and 5mC (38). The use of TALE proteins that bind C and 5mC for the indirect detection of 5hmC has been reported previously (32). To investigate whether 5hmC can be detected directly using TALE proteins containing 5hmC-recognizing RVDs, model DNA sequences incorporating 5hmC, 5mC or C at a specific site were first synthesized, and the selectivity of RVD FS for 5hmC detection was examined. In this experiment, DNA fragments of identical sequence containing C, 5mC or 5hmC were mixed at the ratios shown in fig. 10. In the in vitro protection assay, the amount of protected full-length DNA increased linearly with the proportion of 5hmC (fig. 10), whereas the protection rate changed only slightly when the ratio of 5mC to C was varied. In fig. 10, the black circles show the change in the degree of protection as the proportion of 5hmC in the mixture of 5mC and 5hmC increases from 0% to 100%; the black triangles show the change as the proportion of 5hmC in the mixture of C and 5hmC increases from 0% to 100%; and the grey circles show the change as the proportion of 5mC in the mixture of C and 5mC increases from 0% to 100%. As fig. 10 shows, the degree of protection of DNA by TALE-FS increases only slightly as the proportion of 5mC in the C and 5mC mixture increases. By contrast, as the proportion of 5hmC in the C and 5hmC mixture or in the 5mC and 5hmC mixture increases, the degree of protection by TALE-FS increases greatly, indicating that TALE-FS selectively protects DNA fragments containing 5hmC.
These observations indicate that the 5hmC-specific RVD FS can be used to detect 5hmC modifications in genomic DNA samples under complex modification scenarios (in which the nucleotide of interest is present simultaneously in at least the C, 5mC and 5hmC forms).
The FS-containing TALE protein (i.e., the TAL effector protein in fig. 9a) was then used for site-specific 5hmC detection in genomic DNA. Given the complexity of genomic DNA, the CRISPR/Cas9 system was used instead of a restriction enzyme to generate DNA cleavage in this protection assay (fig. 9a). A 10 bp sequence within an intron of the mouse slc9a9 gene was selected, in which the first cytosine has been reported to be highly hydroxymethylated in mES cells (39).
The reaction conditions were as follows. Each 10 μl reaction contained 50 ng of genomic DNA, 1 μl of 10X Cas9 Nuclease Reaction Buffer (NEB) and 1 nM DTT. TALE protein was added at final concentrations between 20 nM and 500 nM. The binding reaction was incubated at 25 ℃ for 30 minutes. Then 5 μl of pre-incubated Cas9 and sgRNA was added and incubation continued at 37 ℃ for 1 hour. The reaction was quenched by heating at 95 ℃ for 5 minutes. DNA was purified using AMPure beads and analyzed by qPCR with SYBR Green 2X premix II (Takara) on a LightCycler 96 (Roche).
The results indicate that the protection efficiency of TALE-FS is much higher than that of TALE-HD (fig. 9b), which shows that TALE-FS can detect a single 5hmC site in the complex environment of genomic DNA. To further investigate the ability of this method to detect 5hmC, it was applied to genomic DNA from other cell lines in which the level of hydroxymethylation at the same site was unknown. At relatively low TALE protein concentrations, the genomic DNA of RAW264.7, L-M(TK-) and L929 cells was protected much less than the mESC sample (fig. 9c), indicating that the level of 5hmC at this specific site is lower in these cells. These results indicate that TALE proteins containing the newly identified RVDs can be used to detect hydroxymethylation status in genomic DNA at single-base resolution.
Example 8 identification of TALE RVDs recognizing 6mA
Using the same screening system as described in example 2, i.e., the TALE-(XX')3 RVD library and the linear DNA reporter system containing the 6mA modification, each TALE-(XX')3 plasmid was co-transfected with the reporter into HEK293T cells, and the fold activation of EGFP expression by each TALE-(XX')3 on the 6mA reporter system was then measured by flow cytometry. Figure 11 is a heat map of the screening results for the 420 RVDs against 6mA.
As the heat map of the 6mA screening results shows, among the TALE-(XX')3 that activate the 6mA reporter system efficiently, the first amino acid of the RVD is mainly His (H), Lys (K), Asn (N) or Arg (R), and the second amino acid of most of these high-efficiency RVDs is Ile (I), Pro (P), Ser (S), Thr (T) or Val (V). As the superimposed heat map (fig. 11) shows, many of the RVDs that recognize 6mA well also recognize unmodified adenine well, e.g., the XI, XS, XT and XV series of RVDs; others are more specific for 6mA, such as the XP series. Fig. 12 shows the results of confirmatory experiments, performed in triplicate, on RVDs selected from the preliminary screen as recognizing 6mA, specifically RVDs with an EGFP activation fold greater than 5 on the 6mA reporter system.
Overall, the ability to recognize 6mA and the preference for it remain closely related to the second amino acid of the RVD. The study found that the XP series of RVDs and RVDs such as NA, CV and FT show a clear preference for 6mA, whereas the XI, XC and part of the XT series show no apparent preference between unmodified adenine and N6-methyladenine. The contact between Ile (I) and the A base is a van der Waals interaction formed between the side chain and C8 and N7 of adenine (45), so it is not affected by the addition of a methyl group to the 6-amino group. Among the RVDs with high specificity for 6mA (6mA/A > 5), FT, CV, CP and NP had lower background values for recognition of other, unmethylated bases (FIG. 13); of these, NP had the highest recognition capability for 6mA, followed by FT, with CV and CP lower. These can be considered the best RVD choices for preferential 6mA recognition.
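The specificity criterion quoted above (6mA/A > 5) is a ratio of fold activations on the two reporters. A minimal sketch of that filter follows; the input values are hypothetical placeholders rather than data from this document, and the function name is an assumption.

```python
def prefers_6ma(folds, min_ratio=5.0):
    """Select RVDs whose 6mA/A activation ratio exceeds min_ratio.

    folds maps an RVD name to a pair:
    (fold activation on the 6mA reporter, fold activation on the A reporter).
    """
    selected = {}
    for rvd, (fold_6ma, fold_a) in folds.items():
        if fold_a <= 0:
            raise ValueError("fold activation on A must be positive")
        ratio = fold_6ma / fold_a
        if ratio > min_ratio:  # strict inequality, as in "6mA/A > 5"
            selected[rvd] = ratio
    return selected


# Hypothetical fold activations for three placeholder RVDs.
example = {"rvd_1": (12.0, 2.0), "rvd_2": (10.0, 8.0), "rvd_3": (5.0, 1.0)}
print(prefers_6ma(example))
```

An RVD that activates both reporters strongly, like the second placeholder, fails the filter even though its absolute 6mA signal is high.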
In summary, the present study found that a small amino acid (Gly or Ala) or a deletion at position 13 generally increases the affinity for 5mC. This observation is consistent with previous findings that N* and NG (the natural recognizer of T) can bind 5mC. The absence of a large side chain at position 13 may create enough space to accommodate the methyl group of 5mC. However, there are exceptions to this general trend. For example, HG was observed to have very weak affinity for 5mC, even though HG contains a smaller residue at position 13 than HD, the natural binder of C. Interestingly, when the His at position 12 was replaced by Arg (giving RG), strong binding to 5mC was observed; in fact, RG also recognizes 5hmC. These observations suggest that modification recognition by the two RVD residues may follow more complex patterns. Crystal structures of these new RVDs in complex with modified cytosines are required to fully understand how TALEs recognize modifications.
TALE-mediated methylation-dependent gene activation and genome editing of several highly methylated genomic regions are also demonstrated herein. As an important control, little gene activation was observed when the same region lacked cytosine methylation (in different cells). Thus, the new RVDs reported in this study may make it possible to manipulate a target gene according to its modification state in vivo. Many differentially methylated regions (DMRs) are known to be involved in important biological events, including genetic imprinting and disease. The unique ability of TALE proteins to read epigenetic markers therefore enables future epigenome-dependent applications of TALEs in vivo.
In addition, this study used high-throughput screening to find RVDs with a clear preference for N6-methyladenine, such as CV, FT and NP. These RVDs can be used to construct sequence-specific TALE proteins that bind N6-methyladenine and act much like an antibody; they can also be used in combination with RVDs that recognize only unmodified A, in order to detect 6mA quantitatively or qualitatively. RVDs that show no preference between 6mA and A, such as NI, can be used to target sequences containing potential adenine methylation without bias, thereby overcoming the loss of gene-editing efficiency caused by methylation.
References
1. Kay S & Bonas U (2009) How Xanthomonas type III effectors manipulate the host plant. Curr Opin Microbiol 12(1):37-43.
2. Kay S, Hahn S, Marois E, Hause G, & Bonas U (2007) A bacterial effector acts as a plant transcription factor and induces a cell size regulator. Science 318(5850):648-651.
3. Boch J & Bonas U (2010) Xanthomonas AvrBs3 family-type III effectors: discovery and function. Annu Rev Phytopathol 48:419-436.
4. Boch J, et al. (2009) Breaking the code of DNA binding specificity of TAL-type III effectors. Science 326(5959):1509-1512.
5. Gurlebeck D, Thieme F, & Bonas U (2006) Type III effector proteins from the plant pathogen Xanthomonas and their role in the interaction with the host plant. J Plant Physiol 163(3):233-255.
6. Moscou MJ & Bogdanove AJ (2009) A simple cipher governs DNA recognition by TAL effectors. Science 326(5959):1501.
7. Bogdanove AJ & Voytas DF (2011) TAL effectors: customizable proteins for DNA targeting. Science 333(6051):1843-1846.
8. Morbitzer R, Romer P, Boch J, & Lahaye T (2010) Regulation of selected genome loci using de novo-engineered transcription activator-like effector (TALE)-type transcription factors. Proc Natl Acad Sci U S A 107(50):21617-21622.
9. Cong L, Zhou R, Kuo YC, Cunniff M, & Zhang F (2012) Comprehensive interrogation of natural TALE DNA-binding modules and transcriptional repressor domains. Nat Commun 3:968.
10. Garg A, Lohmueller JJ, Silver PA, & Armel TZ (2012) Engineering synthetic TAL effectors with orthogonal target sites. Nucleic Acids Res 40(15):7584-7595.
11. Christian M, et al. (2010) Targeting DNA double-strand breaks with TAL effector nucleases. Genetics 186(2):757-761.
12. Miller JC, et al. (2011) A TALE nuclease architecture for efficient genome editing. Nat Biotechnol 29(2):143-148.
13. Yang J, et al. (2014) Complete decoding of TAL effectors for DNA recognition. Cell Research 24(5):628-631.
14. Miller JC, et al. (2015) Improved specificity of TALE-based genome editing using an expanded RVD repertoire. Nat Methods 12(5):465-471.
15. Kohli RM & Zhang Y (2013) TET enzymes, TDG and the dynamics of DNA demethylation. Nature 502(7472):472-479.
16. Pastor WA, Aravind L, & Rao A (2013) TETonic shift: biological roles of TET proteins in DNA demethylation and transcription. Nat Rev Mol Cell Biol 14(6):341-356.
17. Kriaucionis S & Heintz N (2009) The nuclear DNA base 5-hydroxymethylcytosine is present in Purkinje neurons and the brain. Science 324(5929):929-930.
18. Tahiliani M, et al. (2009) Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science 324(5929):930-935.
19. Ito S, et al. (2010) Role of Tet proteins in 5mC to 5hmC conversion, ES-cell self-renewal and inner cell mass specification. Nature 466(7310):1129-1133.
20. He YF, et al. (2011) Tet-mediated formation of 5-carboxylcytosine and its excision by TDG in mammalian DNA. Science 333(6047):1303-1307.
21. Maiti A & Drohat AC (2011) Thymine DNA glycosylase can rapidly excise 5-formylcytosine and 5-carboxylcytosine: potential implications for active demethylation of CpG sites. J Biol Chem 286(41):35334-35338.
22. Pfaffeneder T, et al. (2011) The discovery of 5-formylcytosine in embryonic stem cell DNA. Angew Chem Int Ed Engl 50(31):7008-7012.
23. Huang Y & Rao A (2014) Connections between TET proteins and aberrant DNA modification in cancer. Trends Genet 30(10):464-474.
24. Bultmann S, et al. (2012) Targeted transcriptional activation of silent oct4 pluripotency gene by combining designer TALEs and inhibition of epigenetic modifiers. Nucleic Acids Res 40(12):5368-5377.
25. Valton J, et al. (2012) Overcoming transcription activator-like effector (TALE) DNA binding domain sensitivity to cytosine methylation. J Biol Chem 287(46):38427-38432.
26. Kim Y, et al. (2013) A library of TAL effector nucleases spanning the human genome. Nat Biotechnol 31(3):251-258.
27. Deng D, et al. (2012) Recognition of methylated DNA by TAL effectors. Cell Research 22(10):1502-1504.
28. Dupuy A, et al. (2013) Targeted gene therapy of xeroderma pigmentosum cells using meganuclease and TALEN. PLoS One 8(11):e78678.
29. Hu J, et al. (2014) Direct activation of human and mouse Oct4 genes using engineered TALE and Cas9 transcription factors. Nucleic Acids Res 42(7):4375-4390.
30. Kubik G, Schmidt MJ, Penner JE, & Summerer D (2014) Programmable and highly resolved in vitro detection of 5-methylcytosine by TALEs. Angew Chem Int Ed Engl 53(23):6002-6006.
31.Kubik G&Summerer D(2015)Achieving single-nucleotide resolution of 5-methylcytosinedetection with TALEs.Chembiochem 16(2):228-231.
32.Kubik G,Batke S,&Summerer D(2015)Programmable sensors of 5-hydroxymethylcytosine.J Am Chem Soc 137(1):2-5.
33.Maurer S,Giess M,Koch O,&Summerer D(2016)Interrogating Key Positions of Size-Reduced TALE Repeats Reveals a Programmable Sensor of 5-Carboxylcytosine.ACS ChemBiol 11(12):3294-3299.
34.Rathi P,Maurer S,Kubik G,&Summerer D(2016)Isolation of Human Genomic DNASequences with Expanded Nucleobase Selectivity.J Am Chem Soc 138(31):9910-9918.
35.Deng D,et al.(2012)Structural basis for sequence-specific recognition of DNA by TALeffectors.Science 335(6069):720-723.
36.Mak AN,Bradley P,Cernadas RA,Bogdanove AJ,&Stoddard BL(2012)The crystalstructure of TAL effector PthXo1 bound to its DNA target.Science 335(6069):716-719.
37.Yang J,et al.(2013)ULtiMATE system for rapid assembly of customized TAL effectors.PLoS One 8(9):e75649.
38.Wu H&Zhang Y(2015)Charting oxidized methylcytosines at base resolution.Nat StructMol Biol 22(9):656-661.
39.Yu M,et al.(2012)Base-resolution analysis of 5-hydroxymethylcytosine in the mammaliangenome.Cell 149(6):1368-1380.
40.Hsu PD,Lander ES,&Zhang F(2014)Development and applications of CRISPR-Cas9 forgenome engineering.Cell 157(6):1262-1278.
41.Mussolino C,et al.(2011)A novel TALE nuclease scaffold enables high genome editingactivity in combination with low toxicity.Nucleic Acids Res 39(21):9283-9293.
42. Fang, G., Munera, D., Friedman, D.I., Mandlik, A., Chao, M.C., Banerjee, O., Feng, Z., Losic, B., Mahajan, M.C., Jabado, O.J., et al. (2012). Genome-wide mapping of methylated adenine residues in pathogenic Escherichia coli using single-molecule real-time sequencing. Nature biotechnology 30, 1232-1239.
43. Fu, Y., Luo, G.Z., Chen, K., Deng, X., Yu, M., Han, D., Hao, Z., Liu, J., Lu, X., Dore, L.C., et al. (2015). N6-methyldeoxyadenosine marks active transcription start sites in Chlamydomonas. Cell 161, 879-892.
44. Greer, E.L., Blanco, M.A., Gu, L., Sendinc, E., Liu, J., Aristizabal-Corrales, D., Hsu, C.H., Aravind, L., He, C., and Shi, Y. (2015). DNA Methylation on N6-Adenine in C. elegans. Cell 161, 868-878.
45. Koziol, M.J., Bradshaw, C.R., Allen, G.E., Costa, A.S., Frezza, C., and Gurdon, J.B. (2016). Identification of methylated deoxyadenosines in vertebrates reveals diversity in DNA modifications. Nature structural & molecular biology 23, 24-30.
46. Mak, A.N., Bradley, P., Cernadas, R.A., Bogdanove, A.J., and Stoddard, B.L. (2012). The crystal structure of TAL effector PthXo1 bound to its DNA target. Science 335, 716-719.
Ratel, D., Ravanat, J.L., Berger, F., and Wion, D. (2006). N6-methyladenine: the other methylated base of DNA. BioEssays: news and reviews in molecular, cellular and developmental biology 28, 309-315.
47. Wion, D., and Casadesus, J. (2006). N6-methyl-adenine: an epigenetic signal for DNA-protein interactions. Nature reviews Microbiology 4, 183-192.
48. Zhang, G., Huang, H., Liu, D., Cheng, Y., Liu, X., Zhang, W., Yin, R., Zhang, D., Zhang, P., Liu, J., et al. (2015). N6-methyladenine DNA modification in Drosophila. Cell 161, 893-906.

Claims (27)

1. Use of a protein comprising a TALE repeat domain, one or more RVDs of which are FS, in the preparation of a reagent to detect methylated base 5hmC in a target gene sequence of interest.
2. A method for detecting the presence of 5hmC at a specific site in a target sequence in the genome of a cell, comprising:
(1) introducing into a cell a protein comprising a TALE, the TALE targeting a target sequence, the RVD in the TALE that recognizes the specific site being FS;
(2) then introducing a nuclease into the cell, the targeted cleavage site of the nuclease being located in the TALE target sequence;
(3) detecting whether the target sequence is cleaved, and thereby determining whether 5hmC is present at the specific site of the target sequence; if the target sequence is not cleaved, the TALE has bound to the target sequence so that the nuclease cannot bind to and cleave it, and 5hmC is present at the specific site; if the target sequence is cleaved, the TALE has not bound to the target sequence, the nuclease has bound to and cleaved it, and 5hmC is not present at the specific site.
3. The method of claim 2, wherein the nuclease is an endonuclease.
4. The method of claim 2, wherein the nuclease is Cas9 nuclease and the Cas9 nuclease and sgRNA are co-introduced into the cell in step (1).
5. Use of an isolated DNA-binding polypeptide comprising a TALE, a fusion protein comprising a functional domain and a TALE, a polynucleotide encoding said DNA-binding polypeptide or said fusion protein, a vector comprising said polynucleotide, or a host cell comprising said polynucleotide or said vector, for the preparation of an agent for targeted binding to a target sequence of a gene of interest in a cell, wherein one or more RVDs of said TALE are FS and the site in said gene of interest recognized by said RVD is 5hmC.
6. The use of claim 5, wherein the functional domain is a functional domain that regulates gene expression, an epigenetic modifying functional domain, a gene editing functional domain, or a fluorescent protein.
7. The use of claim 6, wherein the functional domain that regulates gene expression is a transcriptional activator, a transcriptional repressor, or a functional fragment thereof, the epigenetic modifying functional domain is a methyltransferase, a demethylase, or a functional fragment thereof, and the gene editing functional domain is a nuclease or a functional fragment thereof.
8. The use of claim 7, wherein the gene editing functional domain is an endonuclease.
9. The use of claim 8, wherein the endonuclease is a FokI endonuclease or a DNA cleavage domain of a FokI endonuclease.
10. Use of a fusion protein comprising a functional domain and a TALE, or a polynucleotide encoding the fusion protein, in the preparation of a reagent that modulates expression of a target gene in a cell, wherein the functional domain comprised in the fusion protein is a functional domain that modulates expression of a gene;
wherein one or more RVDs of the TALE are FS, and the site in the target sequence of the target gene recognized by said RVD is 5hmC.
11. The use of claim 10, wherein the functional domain that regulates gene expression is a transcriptional activator or a functional fragment thereof, or a transcriptional repressor or a functional fragment thereof.
12. Use of a fusion protein comprising a functional domain and a TALE, or a polynucleotide encoding the fusion protein, in the preparation of a reagent for gene editing of a target gene in a cell, wherein the functional domain comprised in the fusion protein is a gene editing functional domain;
wherein one or more RVDs of the TALE are FS, and the site in the target sequence of the target gene recognized by said RVD is 5hmC.
13. The use of claim 12, wherein the gene editing is nucleic acid cleavage and the gene editing functional domain is a nuclease or a functional fragment thereof.
14. The use of claim 13, wherein the gene editing functional domain is an endonuclease or a functional fragment thereof.
15. The use of claim 13, wherein the gene editing functional domain is a FokI endonuclease or a DNA cleavage domain thereof.
16. Use of a fusion protein comprising a functional domain and a TALE, or a polynucleotide encoding the fusion protein, in the preparation of an agent for epigenetic modification of a target gene in a cell, wherein the functional domain comprised in the fusion protein is an epigenetic modification functional domain;
wherein one or more RVDs of the TALE are FS, and the site in the target sequence of the target gene recognized by said RVD is 5hmC.
17. The use of claim 16, wherein the epigenetic modification functional domain is a methyltransferase, a demethylase, or a functional fragment thereof.
18. A method of targeted binding to a target sequence of a gene of interest in a cell, comprising: introducing an isolated DNA-binding polypeptide comprising a TALE, a fusion protein comprising a functional domain and a TALE, or a polynucleotide encoding the DNA-binding polypeptide or the fusion protein into a cell, such that the TALE in the DNA-binding polypeptide or fusion protein binds to a target sequence of a gene of interest;
wherein one or more RVDs of the TALE are FS, and the site in the target sequence of the gene of interest recognized by said RVD is 5hmC.
19. A method of modulating expression of a target gene in a cell, comprising: introducing into a cell a fusion protein comprising a functional domain and a TALE, or a polynucleotide encoding the fusion protein, and allowing the TALE in the fusion protein to bind to a target sequence of a gene of interest, thereby allowing expression of the gene of interest to be modulated by the functional domain in the fusion protein, wherein the functional domain is a functional domain that modulates gene expression;
wherein one or more RVDs of the TALE are FS, and the site in the target sequence of the target gene recognized by said RVD is 5hmC.
20. The method of claim 19, wherein the functional domain that regulates gene expression is a transcriptional activator or a functional fragment thereof, or a transcriptional repressor or a functional fragment thereof.
21. A method of gene editing a target gene in a cell, comprising: introducing a fusion protein comprising a functional domain and a TALE, or a polynucleotide encoding the fusion protein, into a cell, and allowing the TALE in the fusion protein to bind to a target sequence of a gene of interest, thereby allowing the gene of interest to be edited by the functional domain in the fusion protein, wherein the functional domain is a gene editing functional domain;
wherein one or more RVDs of the TALE are FS, and the site in the target sequence of the target gene recognized by said RVD is 5hmC.
22. The method of claim 21, wherein the gene editing is nucleic acid cleavage and the gene editing functional domain is a nuclease or functional fragment thereof.
23. The method of claim 21, wherein the gene-editing functional domain is an endonuclease or a functional fragment thereof.
24. The method of claim 21, wherein the gene editing functional domain is a FokI endonuclease or a DNA cleavage domain thereof.
25. A method of epigenetic modification of a target gene in a cell, comprising: introducing into a cell a fusion protein comprising a functional domain and a TALE, or a polynucleotide encoding the fusion protein, and allowing the TALE in the fusion protein to bind to a target sequence of a gene of interest, thereby allowing the gene of interest to be epigenetically modified by the functional domain in the fusion protein, wherein the functional domain is an epigenetic modification functional domain;
wherein one or more RVDs of the TALE are FS, and the site in the target sequence of the target gene recognized by said RVD is 5hmC.
26. The method of claim 25, wherein the epigenetic modification functional domain is a methyltransferase, a demethylase, or a functional fragment thereof.
27. A method for chromosome labeling in living cells, comprising: introducing into a cell a fusion protein comprising a functional domain and a TALE, or a polynucleotide encoding the fusion protein, and allowing the TALE in the fusion protein to bind to a target sequence of a gene of interest, wherein the functional domain is a fluorescent protein, and the target sequence is fluorescently labeled through the binding of the TALE in the fusion protein to the target sequence of the gene of interest;
wherein one or more RVDs of the TALE are FS, and the site in the target sequence of the target gene recognized by said RVD is 5hmC.
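
For illustration only, the RVD assignment recited throughout the claims and the readout logic of claim 2 can be sketched in a few lines of Python. This is a minimal sketch and not part of the claims: the function names, the use of the lowercase letter "h" to mark a 5hmC position, and the treatment of cleavage as an idealized yes/no observation are all assumptions made here. The RVDs used for unmodified bases (NI, HD, NN, NG) are the commonly used canonical TALE code, while FS for 5hmC is taken from the claims.

```python
# Canonical RVDs for unmodified bases (commonly used TALE code).
CANONICAL_RVD = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}

# Hypothetical one-letter marker for 5hmC in the input string
# (an assumption of this sketch, not notation from the patent).
HMC_SYMBOL = "h"


def design_rvds(target):
    """Return one RVD per target base, placing FS at every 5hmC position."""
    rvds = []
    for base in target:
        if base == HMC_SYMBOL:
            rvds.append("FS")  # RVD that the claims pair with 5hmC
        elif base in CANONICAL_RVD:
            rvds.append(CANONICAL_RVD[base])
        else:
            raise ValueError(f"unsupported base symbol: {base!r}")
    return rvds


def hmc_present(target_cleaved):
    """Idealized claim 2 readout.

    An uncleaved target means the TALE bound and blocked the nuclease,
    so 5hmC is inferred to be present; a cleaved target means it is absent.
    """
    return not target_cleaved
```

For example, `design_rvds("AhGT")` yields one RVD per position with FS at the second position, and `hmc_present(False)` reports that 5hmC is inferred to be present. A real assay would replace the boolean with a quantitative cleavage measurement and a threshold.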
CN201710660240.8A 2017-08-04 2017-08-04 TALE RVD for specifically recognizing methylated modified DNA base and application thereof Active CN109384833B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710660240.8A CN109384833B (en) 2017-08-04 2017-08-04 TALE RVD for specifically recognizing methylated modified DNA base and application thereof

Publications (2)

Publication Number Publication Date
CN109384833A CN109384833A (en) 2019-02-26
CN109384833B CN109384833B (en) 2021-04-27

Family

ID=65412408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710660240.8A Active CN109384833B (en) 2017-08-04 2017-08-04 TALE RVD for specifically recognizing methylated modified DNA base and application thereof

Country Status (1)

Country Link
CN (1) CN109384833B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220204569A1 (en) * 2019-04-09 2022-06-30 Japan Science And Technology Agency Nucleic acid-binding protein
CN110106231B (en) * 2019-04-22 2021-08-17 武汉大学 Method for detecting methylation modification of adenine N6 or N1 bit in nucleic acid by using dUTP or dTTP
CN111876414A (en) * 2020-06-24 2020-11-03 湖南文理学院 Improved yeast upstream activation element and application thereof in fish
CN114591949A (en) * 2020-12-04 2022-06-07 中国科学院脑科学与智能技术卓越创新中心 Method for detecting endogenous low-abundance gene and lncRNA level of cell

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2571512B1 (en) * 2010-05-17 2017-08-23 Sangamo BioSciences, Inc. Novel dna-binding proteins and uses thereof
CN103987860B (en) * 2012-01-04 2017-04-12 清华大学 Method for specifically recognizing DNA containing 5-methylated cytosine

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Interrogating Key Positions of Size-Reduced TALE Repeats Reveals a Programmable Sensor of 5-Carboxylcytosine; Sara Maurer; ACS Chem. Biol.; 2016-11-04; Vol. 11; pp. 3294-3297, p. 3297 Figure 3, right column paragraphs 3-5, p. 3298 Table 4 *
Isolation of Human Genomic DNA Sequences with Expanded Nucleobase Selectivity; Preeti Rathi; J. Am. Chem. Soc.; 2016-07-18; Vol. 138; pp. 9910-9918 *
Preeti Rathi. Isolation of Human Genomic DNA Sequences with Expanded Nucleobase Selectivity. J. Am. Chem. Soc. 2016, Vol. 138, pp. 9910-9918. *
Sara Maurer. Interrogating Key Positions of Size-Reduced TALE Repeats Reveals a Programmable Sensor of 5-Carboxylcytosine. ACS Chem. Biol. 2016, Vol. 11, pp. 3294-3299, in particular the abstract, p. 3297 Figure 3, right column paragraphs 3-5, p. 3298 Table 4, p. 3297 Figure 3. *

Also Published As

Publication number Publication date
CN109384833A (en) 2019-02-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant