CN112979821B

CN112979821B - Fusion protein for improving gene editing efficiency and application thereof

Info

Publication number: CN112979821B
Application number: CN201911310969.8A
Authority: CN
Inventors: 李大力; 张晓辉; 刘明耀
Original assignee: East China Normal University; Bioray Laboratories Inc
Current assignee: East China Normal University; Bioray Laboratories Inc
Priority date: 2019-12-18
Filing date: 2019-12-18
Publication date: 2022-02-08
Anticipated expiration: 2039-12-18
Also published as: CN112979821A

Abstract

The application discloses a fusion protein for improving gene editing efficiency and application thereof. The fusion protein comprises a single-stranded DNA binding protein functional domain, a nucleoside deaminase and a nuclease. When CBEs convert C-G base pairs to T-A, nucleoside deaminases such as cytosine deaminase use single-stranded DNA as the substrate for deamination. Fusing a single-stranded DNA binding protein functional domain to the fusion protein of the nucleoside deaminase and the nuclease greatly increases the chance that the single-stranded DNA is exposed to the nucleoside deaminase, and thus markedly improves base editing efficiency. The invention is a breakthrough improvement of single-base gene editing technology and can greatly promote its application in gene editing, gene therapy, cell therapy, animal model construction, crop genetic breeding and the like.

Description

Fusion protein for improving gene editing efficiency and application thereof

Technical Field

The invention relates to the field of biotechnology, in particular to a fusion protein for improving gene editing efficiency and application thereof.

Background

Since 2013, a new generation of gene editing technology represented by CRISPR/Cas9 has entered laboratories across the field of biology and is changing traditional means of gene manipulation. Single-base gene editing was first reported by David Liu's laboratory in 2016, after which other single-base gene editing tools based on cytosine deaminases (e.g., cytosine deaminases from lamprey and human fused in different ways to dCas9 or Cas9n) were reported in succession. These tools use spCas9 from Streptococcus pyogenes, which in CRISPR/Cas9 takes NGG as its PAM, to recognize and specifically bind DNA, and thereby achieve single-base C-to-T or G-to-A mutations upstream of the NGG.

Single-base gene editing has been reported to be useful for efficient mutation or repair of genomic genes, creation of animal disease models, and gene therapy. Among the single-base gene editing tools developed so far, BE3 (base editor 3) is the most widely used. BE3 achieves base substitution efficiencies of up to 37%, much higher than those obtained by homologous recombination, while maintaining low off-target effects, and therefore shows great potential for single-base modification of the genome and for therapy of single-base mutations. Later work found that introducing two or more additional copies of UGI (uracil glycosylase inhibitor) into BE3 further enhances its editing efficiency and product purity. Introducing a bipartite NLS (nuclear localization signal) and codon optimization, giving BE4max, improved editing efficiency further. These methods each improve efficiency to some degree, but the improvement is limited.

Disclosure of Invention

Aiming at the defects in the prior art, the invention aims to provide a fusion protein for improving the gene editing efficiency and application thereof.

In one aspect, the invention provides a fusion protein for increasing gene editing efficiency, comprising a single-stranded DNA binding protein domain, a nucleoside deaminase, and a nuclease.

Specifically, the domains of the fusion protein are arranged as follows: the nucleoside deaminase is located at the N-terminus or C-terminus of the nuclease, and the single-stranded DNA binding protein functional domain is located at the N-terminus of the nucleoside deaminase and the nuclease, at their C-terminus, and/or between the nucleoside deaminase and the nuclease;

preferably, the nucleoside deaminase is located at the N-terminus of the nuclease;

more preferably, the single-stranded DNA-binding protein functional domain is located between the nucleoside deaminase and the nuclease.

In the above fusion protein, the single-stranded DNA binding protein includes a sequence-specific single-stranded DNA binding protein, and/or a non-sequence-specific single-stranded DNA binding protein, preferably, a non-sequence-specific single-stranded DNA binding protein,

preferably, the non-sequence-specific single-stranded DNA binding protein is selected from any one or more of RPA70 (70 kDa subunit of human replication protein A), RPA32 (32 kDa subunit of human replication protein A), BRCA2 (breast cancer susceptibility gene 2), hnRNPK (heterogeneous nuclear ribonucleoprotein K), PUF60 (poly(U)-binding splicing factor 60 kDa) and Rad51 (a homologous recombination repair protein);

preferably, the sequence-specific single-stranded DNA binding protein is selected from any one or more of TEBP (telomere binding protein), Teb1 (a constituent protein of telomerase) and POT1 (human telomere protection protein 1);

preferably, the single-stranded DNA binding protein functional domain comprises at least one (any one, any two, any three or all four) of the following four domains of the single-stranded DNA binding protein, or a partial polypeptide fragment thereof that retains the function of binding single-stranded DNA, in any combination: the OB fold (oligonucleotide/oligosaccharide-binding fold), the KH domain (K homology domain), the RRM (RNA recognition motif), and the Whirly domain;

more preferably, the single-stranded DNA binding protein functional domain comprises the DNA Binding Domain (DBD) of Rad51, more preferably, the amino acid sequence of the DNA binding domain of Rad51 comprises the sequence shown in SEQ ID No.1, more preferably, the coding sequence of the DNA binding domain of Rad51 comprises the sequence shown in SEQ ID No. 2;

more preferably, the amino acid sequence of the DNA binding domain of RPA70 comprises the sequence shown in SEQ ID No.19, and even more preferably, the coding sequence of the DNA binding domain of RPA70 comprises the sequence shown in SEQ ID No. 20.

In the above fusion protein, the deaminase comprises cytosine deaminase (APOBEC) and/or adenosine deaminase, preferably cytosine deaminase, which can be derived from different organisms,

more preferably, the cytosine deaminase is rat-derived cytosine deaminase, more preferably, the amino acid sequence of the rat-derived cytosine deaminase comprises the sequence shown in SEQ ID No.3, and more preferably, the coding sequence of the rat-derived cytosine deaminase comprises the sequence shown in SEQ ID No. 4;

the nuclease is selected from one or more of Cas9, Cas3, Cas8a, Cas8b, Cas10d, Cse1, Csy1, Csn2, Cas4, Cas10, Csm2, Cmr5, Fok1 and Cpf1; preferably, the nuclease is Cas9; more preferably, the Cas9 is selected from Cas9 derived from Streptococcus pneumoniae, Staphylococcus aureus, Streptococcus pyogenes or Streptococcus thermophilus; more preferably, the Cas9 is selected from the Cas9 mutants VQR-spCas9, VRER-spCas9 and spCas9n, more preferably spCas9n; more preferably, the amino acid sequence of the spCas9n comprises the sequence shown in SEQ ID No.5; more preferably, the coding sequence of the spCas9n comprises the sequence shown in SEQ ID No.6.

In the above fusion protein, a NLS (nuclear localization signal) is further included, and preferably, the NLS is located at least one end (C-terminal and/or N-terminal) of the fusion protein; more preferably, the amino acid sequence of the NLS comprises a sequence shown as SEQ ID No.7, and more preferably, the coding sequence of the NLS comprises a sequence shown as SEQ ID No. 8;

the fusion protein further comprises two copies of UGI (uracil glycosylase inhibitor); preferably, the UGI is located at at least one end (C-terminus and/or N-terminus) of the fusion protein; more preferably, the amino acid sequence of the UGI comprises the sequence shown in SEQ ID No.9, and more preferably, the coding sequence of the UGI comprises the sequence shown in SEQ ID No.10.

In another aspect, the present invention also provides any one of the following biomaterials A) to C):

A) a gene encoding a fusion protein as described in any one of the above; the gene is DNA or RNA (such as mRNA);

B) a recombinant vector comprising the gene of A); the recombinant vector comprises a viral vector and/or a non-viral vector; the viral vector comprises an adeno-associated virus vector, an adenovirus vector, a lentivirus vector, a retrovirus vector and/or an oncolytic virus vector, and the non-viral vector comprises a cationic polymer, a plasmid vector and/or a liposome;

C) a recombinant cell or recombinant bacterium containing the fusion protein or the gene of A), wherein the recombinant bacterium can be an engineering bacterium, and the recombinant cell can be a target cell to be edited, such as an immune cell (such as a T cell), a hematopoietic stem cell, a red blood cell and the like.

In another aspect, the present invention provides a single-base gene editing system comprising any one of the above fusion proteins or biological materials, and an sgRNA, wherein the sgRNA guides the fusion protein to perform single-base gene editing on a target gene in a target cell;

preferably, the target sequence of the sgRNA includes at least one of SEQ ID nos. 11 to 18.

In specific implementation, the target sequence of the sgRNA includes any one, any two, any three, any four, any five, any six, any seven, or all eight of SEQ ID nos. 11 to 18.

In another aspect, the invention provides the use of any of the fusion proteins, the biological materials, and the single base gene editing systems described above in the preparation of gene editing products, disease treatment and/or prevention products, animal models, or new plant varieties.

In another aspect, the present invention provides a method for improving single-base gene editing efficiency, including the steps of introducing a fusion protein and sgRNA of any one of the above into a cell, and performing gene editing on a target gene, wherein the sgRNA guides the fusion protein to perform single-base gene editing on the target gene.

In the above method, preferably, the target sequence of the sgRNA includes at least one of SEQ ID nos. 11 to 18.

The invention has the following beneficial effects:

According to the invention, when CBEs (cytosine base editors) convert C-G base pairs to T-A, nucleoside deaminases such as cytosine deaminase use single-stranded DNA as the substrate for deamination. Fusing a single-stranded DNA binding protein functional domain to the fusion protein of the nucleoside deaminase and the nuclease greatly increases the chance that the single-stranded DNA is exposed to the nucleoside deaminase, and thus markedly improves base editing efficiency.

By screening 10 non-sequence-specific single-stranded DNA binding protein domains fused to BE4max, the invention found that fusing the single-stranded DNA binding domain of human Rad51 (amino acids 1-114) between Apobec1 and Cas9n gave the greatest efficiency improvement; this construct is named hyBE4max. Compared with BE4max, the C-G to T-A editing efficiency of hyBE4max is improved by up to 16-fold, with the improvement especially pronounced at sites close to the PAM region, while indels (insertions or deletions) remain low.

The invention makes breakthrough improvement on the single-base gene editing technology and can greatly promote the application of the single-base gene editing technology in the aspects of gene editing, gene therapy, cell therapy, animal model making, crop genetic breeding and the like.

Drawings

FIG. 1 is a schematic diagram of the structures in which different single-stranded DNA binding protein functional domains are fused with BE4max. NLS is the nuclear localization signal (amino acid sequence shown in SEQ ID No.7, coding sequence shown in SEQ ID No.8); rA1 is the cytidine deaminase APOBEC1 (amino acid sequence shown in SEQ ID No.3, coding sequence shown in SEQ ID No.4); spCas9n is Cas9n derived from Streptococcus pyogenes (amino acid sequence shown in SEQ ID No.5, coding sequence shown in SEQ ID No.6); UGI is the uracil glycosylase inhibitor (amino acid sequence shown in SEQ ID No.9, coding sequence shown in SEQ ID No.10); and SSDBD is the single-stranded DNA binding protein functional domain.

FIG. 2 is a comparison of the C-to-T base editing efficiency (ordinate, in %) achieved by hyBE4max versus BE4max at 8 targets in 293T cells.

FIG. 3 is a comparison of the average C-to-T base editing efficiency (ordinate, in %) of hyBE4max versus BE4max across 8 targets in 293T cells.

FIG. 4 is a comparison of the frequency of indels (ordinate, in %) generated by hyBE4max versus BE4max at 8 targets in 293T cells.

FIG. 5 is a schematic structural diagram of the fusion proteins A3A-BE4max and hyA3A-BE4max. hA3A is the human cytidine deaminase APOBEC3A (amino acid sequence shown in SEQ ID No.21, coding sequence shown in SEQ ID No.22); NLS, spCas9n and UGI are as in FIG. 1.

FIG. 6 is a comparison of the C-to-T base editing efficiency (ordinate, in %) achieved by hyA3A-BE4max versus A3A-BE4max at 8 endogenous targets in 293T cells.

FIG. 7 is a comparison of the C-to-T base editing efficiency (ordinate, in %) achieved by hyA3A-BE4max versus A3A-BE4max at 8 endogenous targets in 293T cells.

FIG. 8 is a comparison of the frequency of indels (ordinate, in %) produced by hyA3A-BE4max versus A3A-BE4max at 8 endogenous targets in 293T cells.

FIG. 9 is a schematic structural diagram of the fusion proteins eA3A-BE4max and hyeA3A-BE4max. A3A N57G is the N57G mutant of the hA3A used in FIG. 5; NLS, spCas9n and UGI are as in FIG. 1.

FIG. 10 is a comparison of the C-to-T base editing efficiency (ordinate, in %) achieved by hyeA3A-BE4max versus eA3A-BE4max at 11 endogenous targets in 293T cells.

FIG. 11 is a comparison of the C-to-T base editing efficiency (ordinate, in %) achieved by hyeA3A-BE4max versus eA3A-BE4max at 11 endogenous targets in 293T cells.

FIG. 12 is a comparison of the frequency of indels (ordinate, in %) produced by hyeA3A-BE4max versus eA3A-BE4max at 11 endogenous targets in 293T cells.

In FIGS. 2, 3, 5, 6 and 11, the abscissa label C followed by a number denotes the position, in the corresponding target sequence, of the C that is edited to T; for example, C5 denotes the efficiency of editing to T the C at the 5th position counted from the 5' end of the corresponding target sequence.

Detailed Description

The present invention is described in further detail below with reference to specific examples and the drawings, but is not limited to these examples. Variations and advantages that may occur to those skilled in the art without departing from the spirit and scope of the inventive concept are included in the invention, and the scope of protection is defined by the appended claims. Except for the contents specifically mentioned below, the procedures, conditions, reagents, experimental methods and the like used to carry out the invention are general and common knowledge in the art, and the invention places no particular limitation on them; for example, they follow Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or the manufacturer's recommendations.

First, fusing the Rad51DBD (1-114 aa) single-stranded DNA binding protein functional domain gives the most pronounced improvement in BE4max editing efficiency

1.1 plasmid design and construction

1.1.1 CBEs in single-base editing rely on Apobec1, which uses single-stranded DNA as its substrate. On this basis, we designed 10 different functional domains of human-derived, non-sequence-specific single-stranded DNA binding proteins (mainly RPA70(630aa)-A, RPA70-B, RPA70-AB, RPA70-C, RPA32-D, BRCA2-OB2, BRCA2-OB3, HNRNPK KH domain, PUF60 RRM and Rad51 DBD) (Table 1). Because fusion proteins placed at the C-terminus of BE4max (first diagram from top to bottom in FIG. 1) have been reported to tend to be inactive, these functional domains were fused at the N-terminus of BE4max (second diagram from top to bottom in FIG. 1). Two human endogenous targets, EMX1 site1 and Tim3-sg1 (sequences shown in Table 2), were designed at the same time.

1.1.2 DNAs encoding the 10 different functional domains of human-derived, non-sequence-specific single-stranded DNA binding proteins shown in Table 1 were synthesized and then seamlessly assembled at the N-terminus of BE4max in plasmid pCMV-BE4max (Addgene, #112093) to construct 10 recombinant plasmids (FIG. 1): pRPA70-A-BE4max, pRPA70-B-BE4max, pRPA70-AB-BE4max, pRPA70-C-BE4max, pRPA32-D-BE4max, pBRCA2-OB2-BE4max, pBRCA2-OB3-BE4max, pKH-BE4max, pRRM-BE4max and pRad51DBD-BE4max.

DNAs of the targets EMX1 site1 and Tim3-sg1 shown in Table 2 were synthesized and ligated into the BbsI site of the sgRNA expression plasmid U6-sgRNA-EF1α-GFP (used to express the sgRNA of the corresponding target) to obtain recombinant plasmids pE and pT, respectively.

1.1.3 The plasmids constructed in 1.1.1 and 1.1.2 were verified by Sanger sequencing to ensure that they were completely correct.

TABLE 1 sequences of different functional domains of the Single-stranded DNA binding proteins used

TABLE 2 targets and sequences used

Name of target	Sequence (5'-3')	SEQ ID No.
EMX1 site1	GAGTCCGAGCAGAAGAAGAAGGG	11
Tim3-sg1	TTCTACACCCCAGCCGCCCCAGG	12
VEGFA site2	GACCCCCTCCACCCCGCCTCCGG	13
Lag3-sg2	CGCTACACGGTGCTGAGCGTGGG	14
HEK3	GGCCCAGACTGAGCACGTGATGG	15
HEK4	GGCACTGCGGCTGGAGGTGGGGG	16
EMX1-sg2p	GACATCGATGTCCTCCCCATTGG	17
Nme1-sg1	AGGGATCGTCTTTCAAGGCGAGG	18
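The layout of the Table 2 targets can be checked with a short script. The following is a minimal sketch, assuming that each 23-nt entry consists of a 20-nt protospacer followed by a 3-nt NGG PAM and that C positions are counted from the 5' end of the protospacer, as in the C5 example given for the figures. The function names are illustrative and are not part of the patent.

```python
# Target sequences copied from Table 2 (5'-3').
TARGETS = {
    "EMX1 site1": "GAGTCCGAGCAGAAGAAGAAGGG",
    "Tim3-sg1": "TTCTACACCCCAGCCGCCCCAGG",
    "VEGFA site2": "GACCCCCTCCACCCCGCCTCCGG",
    "Lag3-sg2": "CGCTACACGGTGCTGAGCGTGGG",
    "HEK3": "GGCCCAGACTGAGCACGTGATGG",
    "HEK4": "GGCACTGCGGCTGGAGGTGGGGG",
    "EMX1-sg2p": "GACATCGATGTCCTCCCCATTGG",
    "Nme1-sg1": "AGGGATCGTCTTTCAAGGCGAGG",
}

PAM_LENGTH = 3  # assumption: the last 3 nt of each entry are the PAM


def split_target(target):
    """Split a target into (protospacer, PAM)."""
    return target[:-PAM_LENGTH], target[-PAM_LENGTH:]


def has_ngg_pam(target):
    """Return True if the PAM matches NGG (any base followed by GG)."""
    _, pam = split_target(target)
    return len(pam) == PAM_LENGTH and pam[1:] == "GG"


def cytosine_positions(target):
    """Return 1-based positions of every C in the protospacer."""
    protospacer, _ = split_target(target)
    return [i for i, base in enumerate(protospacer, start=1) if base == "C"]


if __name__ == "__main__":
    for name, seq in TARGETS.items():
        labels = ["C%d" % p for p in cytosine_positions(seq)]
        print(name, "NGG PAM:", has_ngg_pam(seq), "candidate positions:", labels)
```

For EMX1 site1 this lists C5, C6 and C10 as the cytosines that a C-to-T editor could in principle act on.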

1.2 transfection of cells

5 × 10⁵ HEK293T cells were plated in 24-well plates, and when the cells grew to 70%-80% confluence, plasmid combinations were transfected at pssDBD-BE4max : pE (or pT) = 750 ng : 250 ng, with 3 replicate wells per plasmid combination and 2 × 10⁵ cells per well. A blank control without any plasmid transfection was set up at the same time.

pssDBD-BE4max represents: any one of plasmids pRPA70-A-BE4max, pRPA70-B-BE4max, pRPA70-AB-BE4max, pRPA70-C-BE4max, pRPA32-D-BE4max, pBRCA2-OB2-BE4max, pBRCA2-OB3-BE4max, pKH-BE4max, pRRM-BE4max and pRad51DBD-BE4max, with plasmid pCMV-BE4max as a negative control.

1.3 genome extraction and preparation of amplicon libraries

At 72 h after transfection, cell genomic DNA was extracted using the Tiangen cell genome extraction kit (DP304). Identification primers (Table 3) were then designed following the workflow of the Hi-TOM kit: the bridging sequence 5'-ggagtgagtacggtgtgc-3' was added to the 5' end of the forward identification primer, and the bridging sequence 5'-gagttggatgctggatgg-3' was added to the 5' end of the reverse identification primer. These primers were used to obtain the first-round PCR products, which then served as templates for the second round of PCR. The second-round PCR products were pooled, gel-purified, and sent to a company for deep sequencing.
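The primer design step amounts to prepending a fixed bridging sequence to each target-specific primer. The sketch below illustrates this. The two bridging sequences are those given in the text, while the target-specific primers are hypothetical placeholders, because the Table 3 primers are not reproduced here.

```python
# Bridging sequences as stated in step 1.3.
FORWARD_BRIDGE = "ggagtgagtacggtgtgc"
REVERSE_BRIDGE = "gagttggatgctggatgg"


def add_bridges(forward_primer, reverse_primer):
    """Prepend the bridging sequences to the 5' ends of a primer pair."""
    return FORWARD_BRIDGE + forward_primer, REVERSE_BRIDGE + reverse_primer


if __name__ == "__main__":
    # Hypothetical target-specific primers, for illustration only.
    fwd, rev = add_bridges("ACGTACGTACGTACGTACGT", "TGCATGCATGCATGCATGCA")
    print("first-round forward primer:", fwd)
    print("first-round reverse primer:", rev)
```

Lower case is kept for the bridging portion so that it stays visually distinct from the target-specific portion of each primer.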

TABLE 3 identifying primers for target used

1.4 deep sequencing result analysis and statistics

The deep sequencing results of step 1.3 were analyzed using the BE-Analyzer website, and the frequencies of C-to-T conversion and of indels were counted; the results are shown in Tables 4 and 5.
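The per-position C-to-T frequency reported in such an analysis can be illustrated with a simplified calculation. This is only a sketch under strong assumptions: the reads are already aligned to the reference, have the same length, and contain no indels. It is not the algorithm used by BE-Analyzer, and the reads below are toy data, not sequencing results from the patent.

```python
def c_to_t_frequencies(reference, reads):
    """Return {"C<pos>": percent of reads carrying T} for each C in the reference.

    Positions are 1-based and counted from the 5' end of the reference.
    """
    if not reads:
        raise ValueError("at least one read is required")
    if any(len(read) != len(reference) for read in reads):
        raise ValueError("this sketch only handles reads of reference length")
    frequencies = {}
    for pos, base in enumerate(reference, start=1):
        if base != "C":
            continue
        t_count = sum(1 for read in reads if read[pos - 1] == "T")
        frequencies["C%d" % pos] = 100.0 * t_count / len(reads)
    return frequencies


if __name__ == "__main__":
    reference = "GAGTCCGAGCAGAAGAAGAA"  # EMX1 site1 protospacer from Table 2
    toy_reads = [
        "GAGTCCGAGCAGAAGAAGAA",  # unedited
        "GAGTTCGAGCAGAAGAAGAA",  # C5 edited to T
        "GAGTTTGAGCAGAAGAAGAA",  # C5 and C6 edited to T
        "GAGTCCGAGCAGAAGAAGAA",  # unedited
    ]
    print(c_to_t_frequencies(reference, toy_reads))
```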

The results show that, compared with BE4max, BE4max fused with the Rad51 single-stranded DNA binding protein functional domain (Rad51DBD-N-BE4max, i.e., Rad51DBD-BE4max) gave the most pronounced improvement in C-to-T editing efficiency at the targets, followed by BE4max fused with the RPA70-C single-stranded DNA binding protein functional domain.

Second, the editing efficiency of hyBE4max is optimal

To further determine the best fusion position for the Rad51 single-stranded DNA binding protein domain, which gave the highest C-to-T editing efficiency at the targets in step one, Rad51DBD was fused at two further positions of BE4max. The recombinant plasmids expressing the three Rad51DBD-fused BE4max variants (third to fifth diagrams from top to bottom in FIG. 1) were each co-transfected with recombinant plasmid pE or pT according to the method of 1.2 in step one, and the editing efficiency results were obtained according to the methods of 1.3 and 1.4 in step one (Tables 4 and 5).

The three types of BE4max fused with Rad51DBD shown in the third to fifth graphs from top to bottom in FIG. 1 are as follows:

Rad51DBD-N-BE4max: Rad51DBD is fused between NLS and rA1 in BE4max, i.e., Rad51DBD is located at the N-terminus of rA1 and spCas9n;

Rad51DBD-C-BE4max: Rad51DBD is fused between spCas9n and UGI in BE4max, i.e., Rad51DBD is located at the C-terminus of rA1 and spCas9n;

hyBE4max: Rad51DBD is fused between rA1 and spCas9n in BE4max.
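The three fusion positions can be written down as ordered domain lists. This is a minimal sketch, assuming the BE4max domain order NLS, rA1, spCas9n, UGI, UGI, NLS: the order of rA1, spCas9n and UGI follows the descriptions above, while the C-terminal NLS and the omission of linkers are assumptions made for illustration. The helper `insert_between` is hypothetical.

```python
# Assumed domain order of BE4max (linkers omitted; C-terminal NLS is an assumption).
BE4MAX = ["NLS", "rA1", "spCas9n", "UGI", "UGI", "NLS"]


def insert_between(domains, left, right, new_domain):
    """Return a copy of domains with new_domain inserted between left and right."""
    for i in range(len(domains) - 1):
        if domains[i] == left and domains[i + 1] == right:
            return domains[: i + 1] + [new_domain] + domains[i + 1 :]
    raise ValueError("no adjacent pair %s-%s found" % (left, right))


CONSTRUCTS = {
    "Rad51DBD-N-BE4max": insert_between(BE4MAX, "NLS", "rA1", "Rad51DBD"),
    "Rad51DBD-C-BE4max": insert_between(BE4MAX, "spCas9n", "UGI", "Rad51DBD"),
    "hyBE4max": insert_between(BE4MAX, "rA1", "spCas9n", "Rad51DBD"),
}

if __name__ == "__main__":
    for name, domains in CONSTRUCTS.items():
        print(name + ":", "-".join(domains))
```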

TABLE 4 editing efficiency results (unit,%) for target EMX1 site1

TABLE 5 editing efficiency results (unit,%) for target Tim3-sg1

The results in Tables 4 and 5 show that, compared with fusing Rad51DBD between NLS and rA1 in BE4max (i.e., Rad51DBD-N-BE4max), fusing Rad51DBD between rA1 and spCas9n in BE4max (i.e., hyBE4max) gave the most pronounced improvement in C-to-T editing efficiency at the targets.

Third, operating characteristics of hyBE4max

To characterize the performance of hyBE4max more thoroughly, 6 additional targets were designed: VEGFA site2, Lag3-sg2, HEK3, HEK4, EMX1-sg2p and Nme1-sg1 (sequences in Table 2). They were ligated into the BbsI site of plasmid U6-sgRNA-EF1α-GFP to give recombinant plasmids pV, pL, pH3, pH4, pEP and pN. The plasmids were verified by Sanger sequencing to ensure that they were completely correct.

The recombinant plasmid expressing hyBE4max from step two was co-transfected with recombinant plasmid pE, pT, pV, pL, pH3, pH4, pEP or pN according to the method of 1.2 in step one, the editing efficiency results were obtained according to the methods of 1.3 and 1.4 in step one, and statistics and plots were prepared with GraphPad Prism 8.0.

As shown in FIGS. 2 and 3, within editing window C3-C8 the C-to-T editing efficiency of hyBE4max was 19-71%, versus 13-47% for BE4max; within editing window C9-C12 the C-to-T editing efficiency of hyBE4max was 19-55%, versus 1.4-17% for BE4max. Relative to BE4max, the average C-to-T editing efficiency of hyBE4max was 1.6-2.2 times that of BE4max within C3-C8 and 3.3-17 times that of BE4max within C9-C12. Meanwhile, hyBE4max kept indel formation low (FIG. 4).
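The fold improvements quoted for each editing window are ratios of average efficiencies. The sketch below shows one way such a ratio could be computed from per-position efficiencies; the exact averaging used for FIG. 3 is not specified in the text, so this is an assumption. The percentages in the example are placeholders chosen for illustration and are not data from this patent.

```python
def window_mean(efficiencies, first, last):
    """Average the efficiencies (in %) of positions first..last, inclusive."""
    values = [eff for pos, eff in efficiencies.items() if first <= pos <= last]
    if not values:
        raise ValueError("no positions fall inside the window")
    return sum(values) / len(values)


def fold_change(edited, baseline, first, last):
    """Ratio of the window-averaged efficiencies of two editors."""
    return window_mean(edited, first, last) / window_mean(baseline, first, last)


if __name__ == "__main__":
    # Placeholder per-position efficiencies (position -> %), for illustration only.
    hy_example = {5: 60.0, 6: 50.0, 10: 40.0}
    base_example = {5: 40.0, 6: 25.0, 10: 10.0}
    print("C3-C8 fold change:", round(fold_change(hy_example, base_example, 3, 8), 2))
    print("C9-C12 fold change:", round(fold_change(hy_example, base_example, 9, 12), 2))
```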

Fourth, effect of fusion proteins containing different cytosine deaminases

(I) fusion protein hyA3A-BE4max working characteristics

4.1.1 Rad51-DBD was synthesized according to the coding sequence in Table 1 and seamlessly assembled between hA3A and spCas9n in plasmid pCMV-A3A-BE4max, which expresses protein A3A-BE4max (FIG. 5), to construct recombinant plasmid pA expressing the fusion protein hyA3A-BE4max (FIG. 5).

4.1.2 Eight human endogenous targets were synthesized: the target sequences of EMX1 site1, Tim3-sg1, VEGFA site2, EMX1-sg2p and Nme1-sg1 are shown in Table 2, and the target sequences of FANCF site1, EGFR-sg5 and EGFR-sg21 are shown in Table 6. They were ligated into the BbsI site of the sgRNA expression plasmid pU6-sgRNA-EF1α-GFP to obtain recombinant plasmids pB1, pB2, … pB8 expressing the sgRNAs of the corresponding targets.

4.1.3 The plasmids constructed in 4.1.1 and 4.1.2 were verified by Sanger sequencing to ensure that they were completely correct.

TABLE 6 targets and sequences used

Name of target	Sequence (5'-3')	SEQ ID No.
FANCF site1	GGAATCCCTTCTGCAGCACCTGG	23
EGFR-sg5	GTGCTGGGCTCCGGTGCGTTCGG	24
EGFR-sg21	CAAAGCAGAAACTCACATCGAGG	25

4.1.4 transfection of cells

5 × 10⁵ HEK293T cells were plated in 24-well plates, and when the cells grew to 70%-80% confluence, plasmid combinations were transfected at pA (or plasmid pCMV-A3A-BE4max) : pB1 (or pB2, pB3, … pB8) = 750 ng : 250 ng, with 3 replicate wells per plasmid combination and 2 × 10⁵ cells per well. A blank control without any plasmid transfection was set up at the same time.

4.1.5 genome extraction and preparation of amplicon libraries

This was carried out according to the method of step 1.3; the identification primers for the targets FANCF site1, EGFR-sg5 and EGFR-sg21 are shown in Table 7, and those for the remaining targets are shown in Table 3.

TABLE 7 identifying primers for target used

4.1.6 deep sequencing result analysis and statistics

The procedure was as in step 1.4.

The results show that, compared with protein A3A-BE4max, the fusion protein hyA3A-BE4max markedly improved single-base C-to-T editing efficiency at different positions (C3-C15) of each target (FIG. 6). Compared with A3A-BE4max, the high-activity window of hyA3A-BE4max expanded from C3-C11 to C3-C15. At C3-C11, far from the PAM region, the single-base C-to-T editing efficiency of hyA3A-BE4max was 1.1-2.3 times that of A3A-BE4max; at C12-C15, near the PAM region, it was 3.1-4.1 times that of A3A-BE4max, i.e., the improvement was more pronounced near the PAM region (FIG. 7). Meanwhile, hyA3A-BE4max maintained a low level of indels (FIG. 8).

(II) fusion protein hyeA3A-BE4max working property

4.2.1 working System plasmid construction

Rad51-DBD was synthesized according to the coding sequence in Table 1 and seamlessly assembled between eA3A and spCas9n in plasmid pCMV-eA3A-BE4max, which expresses protein eA3A-BE4max (FIG. 9), to construct recombinant plasmid pAe expressing the fusion protein hyeA3A-BE4max (FIG. 9).

4.2.2 construction of target plasmids

At the same time, 11 human endogenous targets were designed and synthesized: the target sequences of EMX1-sg2p, EMX1 site1 and Nme1-sg1 are shown in Table 2, the target sequence of EGFR-sg21 is shown in Table 6, and the remaining target sequences are shown in Table 8. They were ligated into the BbsI site of the sgRNA expression plasmid U6-sgRNA-EF1α-GFP, which expresses the sgRNAs of the corresponding targets, to obtain recombinant plasmids pC1, pC2, … pC11.

4.2.3 The plasmids constructed in 4.2.1 and 4.2.2 were verified by Sanger sequencing to ensure that they were completely correct.

TABLE 8 targets and sequences used

Name of target	Sequence (5'-3')	SEQ ID No.
CTLA-sg1	CTCCCTCAAGCAGGCCCCGCTGG	26
EGFR-sg5	GTGCTGGGCTCCGGTGCGTTCGG	27
CDK10-sg1	TTCTCGGAGGCTCAGGTGCGTGG	28
EMX1-sg1	GCTCCCATCACATCAACCGGTGG	29
HPRT1-sg6	GCCCTCTGTGTGCTCAAGGGGGG	30
EGFR-sg26	CATGCCCTTCGGCTGCCTCCTGG	31
CCR5-sg1	TAATAATTGATGTCATAGATTGG	32

4.2.4 Cell transfection to validate the hyeA3A-BE4max working system

5 × 10⁵ HEK293T cells were plated in 24-well plates, and when the cells grew to 70%-80% confluence, plasmid combinations were transfected at pAe (or plasmid pCMV-eA3A-BE4max) : pC1 (or pC2, pC3, … pC11) = 750 ng : 250 ng, with 3 replicate wells per plasmid combination and 2 × 10⁵ cells per well. A blank control without any plasmid transfection was set up at the same time.

4.2.5 genome extraction and preparation of amplicon libraries

This was carried out according to step 1.3; the identification primers for EMX1-sg2p, EMX1 site1 and Nme1-sg1 are shown in Table 3, the identification primers for EGFR-sg21 are shown in Table 7, and the identification primers for the remaining targets are shown in Table 9.

TABLE 9 identifying primers for targets used

4.2.6 analysis and statistics of deep sequencing results

The procedure was as in step 1.4.

The results show that, compared with protein eA3A-BE4max, the fusion protein hyeA3A-BE4max markedly improved single-base C-to-T editing efficiency at most of the different positions (C3-C15) of each target, the high-activity window expanded from C3-C11 to C3-C15, and a single base C within a TC motif could be specifically targeted for C-to-T conversion (FIG. 10). At C3-C11, far from the PAM region, the single-base C-to-T editing efficiency of hyeA3A-BE4max was 1.6-2.8 times that of eA3A-BE4max; at C12-C15, near the PAM region, it was 4.5-31.9 times that of eA3A-BE4max, i.e., the improvement was more pronounced near the PAM region (FIG. 11). Meanwhile, hyeA3A-BE4max maintained a low level of indels (FIG. 12).

Matters not described in detail in this specification are within the common knowledge of those skilled in the art. The above description is only an example of the present application and is not intended to limit it. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.

Sequence listing

<110> Bioray Laboratories Inc; East China Normal University

<120> fusion protein for improving gene editing efficiency and application thereof

<130> JH-CNP191374

<160> 32

<170> PatentIn version 3.5

<210> 1

<211> 114

<212> PRT

<213> human (Homo sapiens)

<400> 1

Met Ala Met Gln Met Gln Leu Glu Ala Asn Ala Asp Thr Ser Val Glu

1 5 10 15

Glu Glu Ser Phe Gly Pro Gln Pro Ile Ser Arg Leu Glu Gln Cys Gly

20 25 30

Ile Asn Ala Asn Asp Val Lys Lys Leu Glu Glu Ala Gly Phe His Thr

35 40 45

Val Glu Ala Val Ala Tyr Ala Pro Lys Lys Glu Leu Ile Asn Ile Lys

50 55 60

Gly Ile Ser Glu Ala Lys Ala Asp Lys Ile Leu Ala Glu Ala Ala Lys

65 70 75 80

Leu Val Pro Met Gly Phe Thr Thr Ala Thr Glu Phe His Gln Arg Arg

85 90 95

Ser Glu Ile Ile Gln Ile Thr Thr Gly Ser Lys Glu Leu Asp Lys Leu

100 105 110

Leu Gln

<210> 2

<211> 342

<212> DNA

<213> human (Homo sapiens)

<400> 2

atggcaatgc agatgcagct tgaagcaaat gcagatactt cagtggaaga agaaagcttt 60

ggcccacaac ccatttcacg gttagagcag tgtggcataa atgccaacga tgtgaagaaa 120

ttggaagaag ctggattcca tactgtggag gctgttgcct atgcgccaaa gaaggagcta 180

ataaatatta agggaattag tgaagccaaa gctgataaaa ttctggctga ggcagctaaa 240

ttagttccaa tgggtttcac cactgcaact gaattccacc aaaggcggtc agagatcata 300

cagattacta ctggctccaa agagcttgac aaactacttc aa 342

<210> 3

<211> 228

<212> PRT

<213> rat (Rattus norvegicus)

<400> 3

Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg Arg

1 5 10 15

Ile Glu Pro His Glu Phe Glu Val Phe Phe Asp Pro Arg Glu Leu Arg

20 25 30

Lys Glu Thr Cys Leu Leu Tyr Glu Ile Asn Trp Gly Gly Arg His Ser

35 40 45

Ile Trp Arg His Thr Ser Gln Asn Thr Asn Lys His Val Glu Val Asn

50 55 60

Phe Ile Glu Lys Phe Thr Thr Glu Arg Tyr Phe Cys Pro Asn Thr Arg

65 70 75 80

Cys Ser Ile Thr Trp Phe Leu Ser Trp Ser Pro Cys Gly Glu Cys Ser

85 90 95

Arg Ala Ile Thr Glu Phe Leu Ser Arg Tyr Pro His Val Thr Leu Phe

100 105 110

Ile Tyr Ile Ala Arg Leu Tyr His His Ala Asp Pro Arg Asn Arg Gln

115 120 125

Gly Leu Arg Asp Leu Ile Ser Ser Gly Val Thr Ile Gln Ile Met Thr

130 135 140

Glu Gln Glu Ser Gly Tyr Cys Trp Arg Asn Phe Val Asn Tyr Ser Pro

145 150 155 160

Ser Asn Glu Ala His Trp Pro Arg Tyr Pro His Leu Trp Val Arg Leu

165 170 175

Tyr Val Leu Glu Leu Tyr Cys Ile Ile Leu Gly Leu Pro Pro Cys Leu

180 185 190

Asn Ile Leu Arg Arg Lys Gln Pro Gln Leu Thr Phe Phe Thr Ile Ala

195 200 205

Leu Gln Ser Cys His Tyr Gln Arg Leu Pro Pro His Ile Leu Trp Ala

210 215 220

Thr Gly Leu Lys

225

<210> 4

<211> 684

<212> DNA

<213> rat (Rattus norvegicus)

<400> 4

tcctcagaga ctgggcctgt cgccgtcgat ccaaccctgc gccgccggat tgaacctcac 60

gagtttgaag tgttctttga cccccgggag ctgagaaagg agacatgcct gctgtacgag 120

atcaactggg gaggcaggca ctccatctgg aggcacacct ctcagaacac aaataagcac 180

gtggaggtga acttcatcga gaagtttacc acagagcggt acttctgccc caataccaga 240

tgtagcatca catggtttct gagctggtcc ccttgcggag agtgtagcag ggccatcacc 300

gagttcctgt ccagatatcc acacgtgaca ctgtttatct acatcgccag gctgtatcac 360

cacgcagacc caaggaatag gcagggcctg cgcgatctga tcagctccgg cgtgaccatc 420

cagatcatga cagagcagga gtccggctac tgctggcgga acttcgtgaa ttattctcct 480

agcaacgagg cccactggcc taggtaccca cacctgtggg tgcgcctgta cgtgctggag 540

ctgtattgca tcatcctggg cctgccccct tgtctgaata tcctgcggag aaagcagccc 600

cagctgacct tctttacaat cgccctgcag tcttgtcact atcagaggct gccaccccac 660

atcctgtggg ccacaggcct gaag 684

<210> 5

<211> 1367

<212> PRT

<213> Streptococcus pyogenes (Streptococcus pyogenes)

<400> 5

Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser Val Gly

1 5 10 15

Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys

20 25 30

Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile Gly

35 40 45

Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys

50 55 60

Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr

65 70 75 80

Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser Phe

85 90 95

Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys His

100 105 110

Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr His

115 120 125

Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp Ser

130 135 140

Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His Met

145 150 155 160

Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp

165 170 175

Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn

180 185 190

Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala Lys

195 200 205

Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu

210 215 220

Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu

225 230 235 240

Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp

245 250 255

Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp

260 265 270

Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu

275 280 285

Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile

290 295 300

Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser Met

305 310 315 320

Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys Ala

325 330 335

Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp

340 345 350

Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln

355 360 365

Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp Gly

370 375 380

Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys

385 390 395 400

Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu Gly

405 410 415

Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu

420 425 430

Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro

435 440 445

Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp Met

450 455 460

Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu Val

465 470 475 480

Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr Asn

485 490 495

Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser Leu

500 505 510

Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr

515 520 525

Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys

530 535 540

Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr Val

545 550 555 560

Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser

565 570 575

Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr

580 585 590

Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn

595 600 605

Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr Leu

610 615 620

Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala His

625 630 635 640

Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr

645 650 655

Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys

660 665 670

Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala

675 680 685

Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe Lys

690 695 700

Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu His

705 710 715 720

Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile

725 730 735

Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly Arg

740 745 750

His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln Thr

755 760 765

Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile Glu

770 775 780

Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro Val

785 790 795 800

Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln

805 810 815

Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu

820 825 830

Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp

835 840 845

Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly

850 855 860

Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn

865 870 875 880

Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe

885 890 895

Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys

900 905 910

Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr Lys

915 920 925

His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu

930 935 940

Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser Lys

945 950 955 960

Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu

965 970 975

Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val Val

980 985 990

Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val

995 1000 1005

Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys

1010 1015 1020

Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr

1025 1030 1035

Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn

1040 1045 1050

Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr

1055 1060 1065

Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg

1070 1075 1080

Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu

1085 1090 1095

Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg

1100 1105 1110

Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys

1115 1120 1125

Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu

1130 1135 1140

Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser

1145 1150 1155

Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe

1160 1165 1170

Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu

1175 1180 1185

Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe

1190 1195 1200

Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu

1205 1210 1215

Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn

1220 1225 1230

Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro

1235 1240 1245

Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His

1250 1255 1260

Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg

1265 1270 1275

Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr

1280 1285 1290

Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile

1295 1300 1305

Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe

1310 1315 1320

Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr

1325 1330 1335

Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly

1340 1345 1350

Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp

1355 1360 1365

<210> 6

<211> 4101

<212> DNA

<213> Streptococcus pyogenes (Streptococcus pyogenes)

<400> 6

gacaagaagt acagcatcgg cctggccatc ggcaccaact ctgtgggctg ggccgtgatc 60

accgacgagt acaaggtgcc cagcaagaaa ttcaaggtgc tgggcaacac cgaccggcac 120

agcatcaaga agaacctgat cggagccctg ctgttcgaca gcggcgaaac agccgaggcc 180

acccggctga agagaaccgc cagaagaaga tacaccagac ggaagaaccg gatctgctat 240

ctgcaagaga tcttcagcaa cgagatggcc aaggtggacg acagcttctt ccacagactg 300

gaagagtcct tcctggtgga agaggataag aagcacgagc ggcaccccat cttcggcaac 360

atcgtggacg aggtggccta ccacgagaag taccccacca tctaccacct gagaaagaaa 420

ctggtggaca gcaccgacaa ggccgacctg cggctgatct atctggccct ggcccacatg 480

atcaagttcc ggggccactt cctgatcgag ggcgacctga accccgacaa cagcgacgtg 540

gacaagctgt tcatccagct ggtgcagacc tacaaccagc tgttcgagga aaaccccatc 600

aacgccagcg gcgtggacgc caaggccatc ctgtctgcca gactgagcaa gagcagacgg 660

ctggaaaatc tgatcgccca gctgcccggc gagaagaaga atggcctgtt cggaaacctg 720

attgccctga gcctgggcct gacccccaac ttcaagagca acttcgacct ggccgaggat 780

gccaaactgc agctgagcaa ggacacctac gacgacgacc tggacaacct gctggcccag 840

atcggcgacc agtacgccga cctgtttctg gccgccaaga acctgtccga cgccatcctg 900

ctgagcgaca tcctgagagt gaacaccgag atcaccaagg cccccctgag cgcctctatg 960

atcaagagat acgacgagca ccaccaggac ctgaccctgc tgaaagctct cgtgcggcag 1020

cagctgcctg agaagtacaa agagattttc ttcgaccaga gcaagaacgg ctacgccggc 1080

tacattgacg gcggagccag ccaggaagag ttctacaagt tcatcaagcc catcctggaa 1140

aagatggacg gcaccgagga actgctcgtg aagctgaaca gagaggacct gctgcggaag 1200

cagcggacct tcgacaacgg cagcatcccc caccagatcc acctgggaga gctgcacgcc 1260

attctgcggc ggcaggaaga tttttaccca ttcctgaagg acaaccggga aaagatcgag 1320

aagatcctga ccttccgcat cccctactac gtgggccctc tggccagggg aaacagcaga 1380

ttcgcctgga tgaccagaaa gagcgaggaa accatcaccc cctggaactt cgaggaagtg 1440

gtggacaagg gcgcttccgc ccagagcttc atcgagcgga tgaccaactt cgataagaac 1500

ctgcccaacg agaaggtgct gcccaagcac agcctgctgt acgagtactt caccgtgtat 1560

aacgagctga ccaaagtgaa atacgtgacc gagggaatga gaaagcccgc cttcctgagc 1620

ggcgagcaga aaaaggccat cgtggacctg ctgttcaaga ccaaccggaa agtgaccgtg 1680

aagcagctga aagaggacta cttcaagaaa atcgagtgct tcgactccgt ggaaatctcc 1740

ggcgtggaag atcggttcaa cgcctccctg ggcacatacc acgatctgct gaaaattatc 1800

aaggacaagg acttcctgga caatgaggaa aacgaggaca ttctggaaga tatcgtgctg 1860

accctgacac tgtttgagga cagagagatg atcgaggaac ggctgaaaac ctatgcccac 1920

ctgttcgacg acaaagtgat gaagcagctg aagcggcgga gatacaccgg ctggggcagg 1980

ctgagccgga agctgatcaa cggcatccgg gacaagcagt ccggcaagac aatcctggat 2040

ttcctgaagt ccgacggctt cgccaacaga aacttcatgc agctgatcca cgacgacagc 2100

ctgaccttta aagaggacat ccagaaagcc caggtgtccg gccagggcga tagcctgcac 2160

gagcacattg ccaatctggc cggcagcccc gccattaaga agggcatcct gcagacagtg 2220

aaggtggtgg acgagctcgt gaaagtgatg ggccggcaca agcccgagaa catcgtgatc 2280

gaaatggcca gagagaacca gaccacccag aagggacaga agaacagccg cgagagaatg 2340

aagcggatcg aagagggcat caaagagctg ggcagccaga tcctgaaaga acaccccgtg 2400

gaaaacaccc agctgcagaa cgagaagctg tacctgtact acctgcagaa tgggcgggat 2460

atgtacgtgg accaggaact ggacatcaac cggctgtccg actacgatgt ggaccatatc 2520

gtgcctcaga gctttctgaa ggacgactcc atcgacaaca aggtgctgac cagaagcgac 2580

aagaaccggg gcaagagcga caacgtgccc tccgaagagg tcgtgaagaa gatgaagaac 2640

tactggcggc agctgctgaa cgccaagctg attacccaga gaaagttcga caatctgacc 2700

aaggccgaga gaggcggcct gagcgaactg gataaggccg gcttcatcaa gagacagctg 2760

gtggaaaccc ggcagatcac aaagcacgtg gcacagatcc tggactcccg gatgaacact 2820

aagtacgacg agaatgacaa gctgatccgg gaagtgaaag tgatcaccct gaagtccaag 2880

ctggtgtccg atttccggaa ggatttccag ttttacaaag tgcgcgagat caacaactac 2940

caccacgccc acgacgccta cctgaacgcc gtcgtgggaa ccgccctgat caaaaagtac 3000

cctaagctgg aaagcgagtt cgtgtacggc gactacaagg tgtacgacgt gcggaagatg 3060

atcgccaaga gcgagcagga aatcggcaag gctaccgcca agtacttctt ctacagcaac 3120

atcatgaact ttttcaagac cgagattacc ctggccaacg gcgagatccg gaagcggcct 3180

ctgatcgaga caaacggcga aaccggggag atcgtgtggg ataagggccg ggattttgcc 3240

accgtgcgga aagtgctgag catgccccaa gtgaatatcg tgaaaaagac cgaggtgcag 3300

acaggcggct tcagcaaaga gtctatcctg cccaagagga acagcgataa gctgatcgcc 3360

agaaagaagg actgggaccc taagaagtac ggcggcttcg acagccccac cgtggcctat 3420

tctgtgctgg tggtggccaa agtggaaaag ggcaagtcca agaaactgaa gagtgtgaaa 3480

gagctgctgg ggatcaccat catggaaaga agcagcttcg agaagaatcc catcgacttt 3540

ctggaagcca agggctacaa agaagtgaaa aaggacctga tcatcaagct gcctaagtac 3600

tccctgttcg agctggaaaa cggccggaag agaatgctgg cctctgccgg cgaactgcag 3660

aagggaaacg aactggccct gccctccaaa tatgtgaact tcctgtacct ggccagccac 3720

tatgagaagc tgaagggctc ccccgaggat aatgagcaga aacagctgtt tgtggaacag 3780

cacaagcact acctggacga gatcatcgag cagatcagcg agttctccaa gagagtgatc 3840

ctggccgacg ctaatctgga caaagtgctg tccgcctaca acaagcaccg ggataagccc 3900

atcagagagc aggccgagaa tatcatccac ctgtttaccc tgaccaatct gggagcccct 3960

gccgccttca agtactttga caccaccatc gaccggaaga ggtacaccag caccaaagag 4020

gtgctggacg ccaccctgat ccaccagagc atcaccggcc tgtacgagac acggatcgac 4080

ctgtctcagc tgggaggtga c 4101

<210> 7

<211> 18

<212> PRT

<213> Artificial sequence

<400> 7

Lys Arg Thr Ala Asp Gly Ser Glu Phe Glu Ser Pro Lys Lys Lys Arg

1 5 10 15

Lys Val

<210> 8

<211> 54

<212> DNA

<213> Artificial sequence

<400> 8

aaacggacag ccgacggaag cgagttcgag tcaccaaaga agaagcggaa agtc 54

<210> 9

<211> 176

<212> PRT

<213> Bacillus subtilis bacteriophage

<400> 9

Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu Val

1 5 10 15

Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu Val Ile

20 25 30

Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr Asp Glu

35 40 45

Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp Ala Pro Glu Tyr

50 55 60

Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn Gly Glu Asn Lys Ile

65 70 75 80

Lys Met Leu Ser Gly Gly Ser Gly Gly Ser Gly Gly Ser Thr Asn Leu

85 90 95

Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu Val Ile Gln Glu

100 105 110

Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu Val Ile Gly Asn Lys

115 120 125

Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr Asp Glu Ser Thr Asp

130 135 140

Glu Asn Val Met Leu Leu Thr Ser Asp Ala Pro Glu Tyr Lys Pro Trp

145 150 155 160

Ala Leu Val Ile Gln Asp Ser Asn Gly Glu Asn Lys Ile Lys Met Leu

165 170 175

<210> 10

<211> 528

<212> DNA

<213> Bacillus subtilis bacteriophage

<400> 10

actaatctga gcgacatcat tgagaaggag actgggaaac agctggtcat tcaggagtcc 60

atcctgatgc tgcctgagga ggtggaggaa gtgatcggca acaagccaga gtctgacatc 120

ctggtgcaca ccgcctacga cgagtccaca gatgagaatg tgatgctgct gacctctgac 180

gcccccgagt ataagccttg ggccctggtc atccaggatt ctaacggcga gaataagatc 240

aagatgctga gcggaggatc cggaggatct ggaggcagca ccaacctgtc tgacatcatc 300

gagaaggaga caggcaagca gctggtcatc caggagagca tcctgatgct gcccgaagaa 360

gtcgaagaag tgatcggaaa caagcctgag agcgatatcc tggtccatac cgcctacgac 420

gagagtaccg acgaaaatgt gatgctgctg acatccgacg ccccagagta taagccctgg 480

gctctggtca tccaggattc caacggagag aacaaaatca aaatgctg 528

<210> 11

<211> 23

<212> DNA

<213> human (Homo sapiens)

<400> 11

gagtccgagc agaagaagaa ggg 23

<210> 12

<211> 23

<212> DNA

<213> human (Homo sapiens)

<400> 12

ttctacaccc cagccgcccc agg 23

<210> 13

<211> 23

<212> DNA

<213> human (Homo sapiens)

<400> 13

gaccccctcc accccgcctc cgg 23

<210> 14

<211> 23

<212> DNA

<213> human (Homo sapiens)

<400> 14

cgctacacgg tgctgagcgt ggg 23

<210> 15

<211> 23

<212> DNA

<213> human (Homo sapiens)

<400> 15

ggcccagact gagcacgtga tgg 23

<210> 16

<211> 23

<212> DNA

<213> human (Homo sapiens)

<400> 16

ggcactgcgg ctggaggtgg ggg 23

<210> 17

<211> 23

<212> DNA

<213> human (Homo sapiens)

<400> 17

gacatcgatg tcctccccat tgg 23

<210> 18

<211> 23

<212> DNA

<213> human (Homo sapiens)

<400> 18

agggatcgtc tttcaaggcg agg 23

<210> 19

<211> 181

<212> PRT

<213> human (Homo sapiens)

<400> 19

Gly Gly Ser Asn Thr Asn Trp Lys Thr Leu Tyr Glu Val Lys Ser Glu

1 5 10 15

Asn Leu Gly Gln Gly Asp Lys Pro Asp Tyr Phe Ser Ser Val Ala Thr

20 25 30

Val Val Tyr Leu Arg Lys Glu Asn Cys Met Tyr Gln Ala Cys Pro Thr

35 40 45

Gln Asp Cys Asn Lys Lys Val Ile Asp Gln Gln Asn Gly Leu Tyr Arg

50 55 60

Cys Glu Lys Cys Asp Thr Glu Phe Pro Asn Phe Lys Tyr Arg Met Ile

65 70 75 80

Leu Ser Val Asn Ile Ala Asp Phe Gln Glu Asn Gln Trp Val Thr Cys

85 90 95

Phe Gln Glu Ser Ala Glu Ala Ile Leu Gly Gln Asn Ala Ala Tyr Leu

100 105 110

Gly Glu Leu Lys Asp Lys Asn Glu Gln Ala Phe Glu Glu Val Phe Gln

115 120 125

Asn Ala Asn Phe Arg Ser Phe Ile Phe Arg Val Arg Val Lys Val Glu

130 135 140

Thr Tyr Asn Asp Glu Ser Arg Ile Lys Ala Thr Val Met Asp Val Lys

145 150 155 160

Pro Val Asp Tyr Arg Glu Tyr Gly Arg Arg Leu Val Met Ser Ile Arg

165 170 175

Arg Ser Ala Leu Met

180

<210> 20

<211> 543

<212> DNA

<213> human (Homo sapiens)

<400> 20

ggagggagta acaccaactg gaaaaccttg tatgaggtca aatccgagaa cctgggccaa 60

ggcgacaagc cggactactt tagttctgtg gccacagtgg tgtatcttcg caaagagaac 120

tgcatgtacc aagcctgccc gactcaggac tgcaataaga aagtgattga tcaacagaat 180

ggattgtacc gctgtgagaa gtgcgacacc gaatttccca atttcaagta ccgcatgatc 240

ctgtcagtaa atattgcaga ttttcaagag aatcagtggg tgacttgttt ccaggagtct 300

gctgaagcta tccttggaca aaatgctgct tatcttgggg aattaaaaga caagaatgaa 360

caggcatttg aagaagtttt ccagaatgcc aacttccgat ctttcatatt cagagtcagg 420

gtcaaagtgg agacctacaa cgacgagtct cgaattaagg ccactgtgat ggacgtgaag 480

cccgtggact acagagagta tggccgaagg ctggtcatga gcatcaggag aagtgcattg 540

atg 543

<210> 21

<211> 198

<212> PRT

<213> human (Homo sapiens)

<400> 21

Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His Ile

1 5 10 15

Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr Leu

20 25 30

Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met Asp

35 40 45

Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys Gly

50 55 60

Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro Ser

65 70 75 80

Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile Ser

85 90 95

Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala Phe

100 105 110

Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg Ile

115 120 125

Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg Asp

130 135 140

Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His Cys

145 150 155 160

Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp Asp

165 170 175

Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala Ile

180 185 190

Leu Gln Asn Gln Gly Asn

195

<210> 22

<211> 594

<212> DNA

<213> human (Homo sapiens)

<400> 22

gaggcatctc cagcaagcgg accaaggcac ctgatggacc cccacatctt cacctctaac 60

tttaacaatg gcatcggcag gcacaagaca tacctgtgct atgaggtgga gcgcctggac 120

aacggcacca gcgtgaagat ggatcagcac agaggcttcc tgcacaacca ggccaagaat 180

ctgctgtgcg gcttctacgg ccggcacgca gagctgagat ttctggacct ggtgcctagc 240

ctgcagctgg atccagccca gatctatagg gtgacctggt tcatcagctg gtccccatgc 300

ttttcctggg gatgtgcagg agaggtgcgc gccttcctgc aggagaatac acacgtgcgg 360

ctgagaatct ttgccgcccg gatctacgac tatgatcctc tgtacaagga ggccctgcag 420

atgctgagag acgcaggagc ccaggtgtcc atcatgacct atgatgagtt caagcactgc 480

tgggacacat ttgtggatca ccagggctgt ccctttcagc cttgggacgg actggatgag 540

cactcccagg ccctgtctgg caggctgagg gccatcctgc agaaccaggg caat 594

<210> 23

<211> 23

<212> DNA

<213> human (Homo sapiens)

<400> 23

ggaatccctt ctgcagcacc tgg 23

<210> 24

<211> 23

<212> DNA

<213> human (Homo sapiens)

<400> 24

gtgctgggct ccggtgcgtt cgg 23

<210> 25

<211> 23

<212> DNA

<213> human (Homo sapiens)

<400> 25

caaagcagaa actcacatcg agg 23

<210> 26

<211> 23

<212> DNA

<213> human (Homo sapiens)

<400> 26

ctccctcaag caggccccgc tgg 23

<210> 27

<211> 23

<212> DNA

<213> human (Homo sapiens)

<400> 27

gtgctgggct ccggtgcgtt cgg 23

<210> 28

<211> 23

<212> DNA

<213> human (Homo sapiens)

<400> 28

ttctcggagg ctcaggtgcg tgg 23

<210> 29

<211> 23

<212> DNA

<213> human (Homo sapiens)

<400> 29

gctcccatca catcaaccgg tgg 23

<210> 30

<211> 23

<212> DNA

<213> human (Homo sapiens)

<400> 30

gccctctgtg tgctcaaggg ggg 23

<210> 31

<211> 23

<212> DNA

<213> human (Homo sapiens)

<400> 31

catgcccttc ggctgcctcc tgg 23

<210> 32

<211> 23

<212> DNA

<213> human (Homo sapiens)

<400> 32

taataattga tgtcatagat tgg 23
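The paired protein and coding-sequence entries in the listing above can be cross-checked mechanically. The following is a minimal Python sketch, not part of the original filing. It assumes the standard genetic code and uses illustrative names (`translate`, `PAIRED_LENGTHS`). It checks that each <211> nucleotide length is three times the length of its paired protein, and that SEQ ID No. 8 translates to the NLS peptide of SEQ ID No. 7.

```python
# Standard genetic code with codons enumerated in t, c, a, g order.
# Assumption: the listing uses the standard code; all names are illustrative.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Three-letter to one-letter codes for the residues that occur in SEQ ID No. 7.
THREE_TO_ONE = {
    "Lys": "K", "Arg": "R", "Thr": "T", "Ala": "A", "Asp": "D", "Gly": "G",
    "Ser": "S", "Glu": "E", "Phe": "F", "Pro": "P", "Val": "V",
}


def translate(dna: str) -> str:
    """Translate a DNA string (spaces allowed) codon by codon."""
    dna = dna.replace(" ", "").lower()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# (protein SEQ ID, residues, DNA SEQ ID, nucleotides), copied from the <211> fields.
PAIRED_LENGTHS = [
    (3, 228, 4, 684),
    (5, 1367, 6, 4101),
    (7, 18, 8, 54),
    (9, 176, 10, 528),
    (19, 181, 20, 543),
    (21, 198, 22, 594),
]

SEQ_7 = "Lys Arg Thr Ala Asp Gly Ser Glu Phe Glu Ser Pro Lys Lys Lys Arg Lys Val"
SEQ_8 = "aaacggacag ccgacggaag cgagttcgag tcaccaaaga agaagcggaa agtc"

seq_7_one_letter = "".join(THREE_TO_ONE[res] for res in SEQ_7.split())

for prot_id, n_aa, dna_id, n_nt in PAIRED_LENGTHS:
    status = "consistent" if n_nt == 3 * n_aa else "MISMATCH"
    print(f"SEQ ID No. {prot_id} ({n_aa} aa) vs No. {dna_id} ({n_nt} nt): {status}")

print("SEQ ID No. 8 encodes SEQ ID No. 7:", translate(SEQ_8) == seq_7_one_letter)
```

The same `translate` helper could be applied to the longer coding sequences, provided they are copied from the listing without the trailing position counters.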

Claims

1. A fusion protein for increasing gene editing efficiency, comprising a single-stranded DNA binding protein domain, a nucleoside deaminase, and a nuclease;

the components of the fusion protein are connected in the following order: the nucleoside deaminase is positioned at the N-terminus of the nuclease, and the single-stranded DNA binding protein domain is positioned between the nucleoside deaminase and the nuclease;

the single-stranded DNA-binding protein domain comprises the DNA-binding domain of Rad51 and/or the DNA-binding domain of RPA70;

the amino acid sequence of the DNA binding domain of Rad51 is shown in SEQ ID No. 1;

the amino acid sequence of the DNA binding domain of the RPA70 is shown as SEQ ID No. 19;

the fusion protein further comprises a nuclear localization signal (NLS);

the NLS is located at at least one end of the fusion protein;

the fusion protein further comprises two copies of a uracil DNA glycosylase inhibitor (UGI);

the UGI is located at at least one end of the fusion protein.

2. The fusion protein of claim 1, wherein the single-stranded DNA binding protein domain is the DNA binding domain of Rad51.

3. The fusion protein of claim 1, wherein the coding sequence for the DNA binding domain of Rad51 is set forth in SEQ ID No. 2.

4. The fusion protein of claim 1, wherein the coding sequence for the DNA binding domain of RPA70 is set forth in SEQ ID No. 20.

5. The fusion protein of claim 1, wherein the nucleoside deaminase comprises a cytosine deaminase and/or an adenosine deaminase.

6. The fusion protein of claim 5, wherein the nucleoside deaminase is a cytosine deaminase.

7. The fusion protein of claim 6, wherein the cytosine deaminase is rat-derived.

8. The fusion protein of claim 7, wherein the amino acid sequence of the rat cytosine deaminase is as set forth in SEQ ID No. 3.

9. The fusion protein of claim 8, wherein the coding sequence of the rat cytosine deaminase is as set forth in SEQ ID No. 4.

10. The fusion protein of claim 1, wherein the nuclease is selected from one or more of Cas9, Cas3, Cas8a, Cas8b, Cas10d, Cse1, Csy1, Csn2, Cas4, Cas10, Csm2, Cmr5, Fok1 and Cpf1.

11. The fusion protein of claim 10, wherein the nuclease is Cas9.

12. The fusion protein of claim 11, wherein the Cas9 is derived from Streptococcus pneumoniae, Staphylococcus aureus, Streptococcus pyogenes, or Streptococcus thermophilus.

13. The fusion protein of claim 12, wherein the Cas9 is selected from the group consisting of the Cas9 mutants VQR-spCas9, VRER-spCas9 and spCas9n.

14. The fusion protein of claim 13, wherein the Cas9 mutant is spCas9n.

15. The fusion protein of claim 14, wherein the spCas9n has the amino acid sequence shown as SEQ ID No. 5.

16. The fusion protein of claim 15, wherein the coding sequence of spCas9n is set forth in SEQ ID No. 6.

17. The fusion protein of claim 1, wherein the amino acid sequence of NLS is shown in SEQ ID No. 7.

18. The fusion protein of claim 1, wherein the coding sequence of NLS is shown in SEQ ID No. 8.

19. The fusion protein of claim 1, wherein the UGI has an amino acid sequence as set forth in SEQ ID No. 9.

20. The fusion protein of claim 1, wherein the coding sequence of the UGI is set forth in SEQ ID No. 10.

21. A biological material selected from any one of the following A) to C):

A) a gene encoding the fusion protein of any one of claims 1-20;

B) a recombinant vector comprising the gene of A);

C) a recombinant cell or recombinant bacterium comprising the fusion protein of any one of claims 1 to 20, or comprising the gene of A).

22. A single base gene editing system comprising the fusion protein of any one of claims 1-20 and/or the biological material of claim 21, and an sgRNA, the sgRNA directing the fusion protein to perform single base gene editing of a gene of interest in a cell of interest;

the target sequence of the sgRNA includes at least one of SEQ ID Nos. 11 to 18.

23. Use of the fusion protein according to any one of claims 1 to 20, the biological material according to claim 21 or the single base gene editing system according to claim 22 for the preparation of a gene editing product, a disease treatment and/or prevention product, an animal model or a new plant variety.
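Claim 22 refers to the sgRNA target sequences of SEQ ID Nos. 11 to 18, each listed as 23 nt. As an illustrative check only (a sketch, not part of the claims), the Python snippet below assumes that each entry is a 20-nt protospacer followed by a 3-nt PAM. It tests whether that PAM matches the NGG motif that the Background associates with spCas9. The helper names `split_target` and `has_ngg_pam` are hypothetical.

```python
# Target sequences copied from SEQ ID Nos. 11 to 18 of the sequence listing.
TARGETS = {
    11: "gagtccgagc agaagaagaa ggg",
    12: "ttctacaccc cagccgcccc agg",
    13: "gaccccctcc accccgcctc cgg",
    14: "cgctacacgg tgctgagcgt ggg",
    15: "ggcccagact gagcacgtga tgg",
    16: "ggcactgcgg ctggaggtgg ggg",
    17: "gacatcgatg tcctccccat tgg",
    18: "agggatcgtc tttcaaggcg agg",
}


def split_target(seq: str) -> tuple[str, str]:
    """Split a 23-nt entry into (protospacer, PAM).

    Assumption: the layout is 20 nt of protospacer followed by 3 nt of PAM.
    """
    seq = seq.replace(" ", "").lower()
    if len(seq) != 23:
        raise ValueError(f"expected 23 nt, got {len(seq)}")
    return seq[:20], seq[20:]


def has_ngg_pam(seq: str) -> bool:
    """Return True if the last three bases match N-G-G."""
    _, pam = split_target(seq)
    return pam[1:] == "gg"


for seq_id, seq in sorted(TARGETS.items()):
    protospacer, pam = split_target(seq)
    print(f"SEQ ID No. {seq_id}: protospacer={protospacer} "
          f"PAM={pam} NGG={has_ngg_pam(seq)}")
```

The check is purely textual: it says nothing about editing efficiency at these sites, only that the listed sequences fit the assumed protospacer-plus-PAM layout.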