CN113755512B

CN113755512B - Method for preparing tandem repeat protein and application thereof

Info

Publication number: CN113755512B
Application number: CN202011405477.XA
Authority: CN
Inventors: 毕昌昊; 张学礼; 刘丽; 赵东东; 李斯微
Original assignee: Tianjin Institute of Industrial Biotechnology of CAS
Current assignee: Tianjin Institute of Industrial Biotechnology of CAS
Priority date: 2020-12-03
Filing date: 2020-12-03
Publication date: 2023-11-10
Anticipated expiration: 2040-12-03
Also published as: CN113755512A

Abstract

The application provides a method for preparing tandem repeat protein, and related products and applications thereof. The method comprises introducing an expression vector containing a double-stranded DNA molecule, named a single copy gene expression cassette, into recipient cells to obtain recombinant cells, extracting total RNA of the recombinant cells, and translating the total RNA to obtain the tandem repeat protein, which greatly shortens the time needed to prepare long tandem repeat proteins. Experiments show that tandem repeat MaSp1 with 40 repeats can be obtained in only 7 days, far faster than with the traditional method. The method features a short experimental period, savings in time and cost, and high efficiency.

Description

Method for preparing tandem repeat protein and application thereof

Technical Field

The application relates to a method for preparing tandem repeat protein and application thereof in the field of biotechnology.

Background

Tandem repeat proteins are proteins whose amino acid sequences are highly repetitive and which result from the expression of tandem repeat genes. In the past, tandem repeat proteins were prepared by constructing an expression vector containing tandem repeat DNA and then expressing the protein. Tandem repeat DNA expression vectors are currently constructed mainly by two methods: the asymmetric cohesive end complementation method and the isocaudomer method. The asymmetric cohesive end complementation method generates a random copy number and requires multiple enzymes for digestion and ligation. The isocaudomer method is also cumbersome, requiring repeated rounds of digestion and ligation. Both methods are time-consuming and laborious.

The dragline silk protein in spider silk has high strength: at equal weight, dragline silk is 5 times as strong as steel wire and 3 times as strong as synthetic Kevlar fiber. Spider silk also has good plasticity, and these two properties give it wide application in many fields. In industry, examples include the preparation of parachutes, protective clothing and composite materials for aircraft. In biomedicine, examples include wound sutures, carriers for delivering biological drugs, and scaffolds for cell culture and organ transplantation. Dragline silk is mainly composed of the spider silk proteins MaSp1 (major ampullate spidroin 1) and MaSp2 (major ampullate spidroin 2). Both are highly modular proteins with long repeats within the sequence and flanking sequences approximately 100 amino acid residues in length. However, spider silk is difficult to obtain in large quantities by farming spiders, because spiders are strongly territorial and aggressive. Many studies have therefore attempted to express recombinant spider silk proteins in other hosts. Increasing the length of recombinant dragline silk proteins is one of the key factors in improving the mechanical properties of spun spider silk. The size of natural dragline silk protein is 250-320 kDa. In 2010, researchers expressed a 284.9 kDa recombinant spider silk protein in E. coli, and the mechanical properties of the spun fiber were similar to those of natural spider silk. Expressing the 284.9 kDa recombinant spider silk protein required first synthesizing a MaSp1 repeat unit, then building a 2-copy concatemer by isocaudomer-based seamless splicing, then repeating the same method to obtain concatemers of 4, 8, 16, 32 and 48 copies in turn, and finally splicing the 96-copy concatemer. The steps are cumbersome and cost time and labor. Moreover, if the spider silk sequence is to be optimized, the gene must be resynthesized and the whole series of concatemers rebuilt, which takes a great deal of time.

Disclosure of Invention

The problem to be solved by the present application is how to prepare tandem repeat proteins.

To solve the above technical problem, the application provides a method for preparing tandem repeat protein, which comprises introducing an expression vector containing a double-stranded DNA molecule, named a single copy gene expression cassette, into recipient cells to obtain recombinant cells, culturing the recombinant cells, and expressing to obtain the tandem repeat protein; the single copy gene expression cassette contains a promoter, an intron named 3' intron linked to the promoter, a target protein coding gene named single copy gene linked to the 3' intron, a coding sequence of a ribosome binding site (RBS) linked to the target protein coding gene, a spacer sequence linked to the coding sequence of the ribosome binding site, a start codon linked to the spacer sequence, and an intron named 5' intron linked to the start codon; the 3' intron and the 5' intron satisfy condition a: precursor RNAs transcribed from the single copy gene expression cassette form splice vesicles by complementary base pairing in the recombinant cell and produce mature circular single-stranded RNA molecules by a splicing reaction; the target protein coding gene does not contain a stop codon.

In the above method, the spacer sequence is a sequence between RBS and ATG that acts to bind the ribosome to mRNA with high strength. The spacer sequence may be a double-stranded DNA of 4-10bp, e.g., a double-stranded DNA in which the nucleotide sequence of one strand is nucleotides 5535-5543 of sequence 1.

In the above method, the tandem repeat protein may contain more than 2 copies of the single copy protein, for example more than 7 copies or more than 10 copies.

In the above method, the single copy gene expression cassette is formed by ligating, in sequence, the promoter, the 3' intron, the target protein coding gene, the coding sequence of the ribosome binding site, the spacer sequence, the start codon and the 5' intron.

The single copy gene expression cassette, as an expression cassette for circular mRNA of the target protein, may include a promoter for initiating transcription of the gene encoding the target protein and a terminator for terminating that transcription. The single copy gene expression cassette may further include an enhancer sequence. Promoters useful in the present application include, but are not limited to, constitutive promoters; tissue-, organ- and development-specific promoters; and inducible promoters. Examples of promoters include, but are not limited to, the T7 promoter of T7 phage and the constitutive 35S promoter of cauliflower mosaic virus. They may be used alone or in combination with other promoters. Suitable transcription terminators include, but are not limited to, the Agrobacterium nopaline synthase terminator (NOS terminator), the cauliflower mosaic virus CaMV 35S terminator and the tml terminator.

In the above method, the single copy gene is a target protein-encoding gene which does not contain a stop codon (TAA, TGA or TAG).
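The no-stop-codon requirement can be checked mechanically before a candidate single copy gene is cloned. The following is a minimal Python sketch and is not part of the application; the function name `inframe_stops` and both toy sequences are hypothetical and are not taken from SEQ ID No. 1.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}

def inframe_stops(cds, frame=0):
    """Return 0-based positions of stop codons read in the given frame."""
    cds = cds.upper()
    return [i for i in range(frame, len(cds) - 2, 3) if cds[i:i + 3] in STOP_CODONS]

# Hypothetical toy coding units (illustration only)
open_unit = "GGTGCTGGTCAAGGT"    # GGT GCT GGT CAA GGT, no stop codon
closed_unit = "GGTGCTTAAGGT"     # GGT GCT TAA GGT, stop at position 6

print(inframe_stops(open_unit))    # []
print(inframe_stops(closed_unit))  # [6]
```

A sequence that returns an empty list in the translated frame meets the requirement stated above for that frame.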

In the above method, the initiation codon is ATG.

In the above method, the target gene may further comprise a replication initiation site (pMB1) gene.

In the above method, the target gene may further comprise a selectable marker gene. The selectable marker gene is a gene of known function and sequence that is capable of functioning as a specific marker. For example, genes encoding enzymes or luminescent compounds which produce a color change (GUS gene, luciferase gene, etc.), antibiotic marker genes (such as nptII gene which confers resistance to kanamycin and related antibiotics, bar gene which confers resistance to the herbicide phosphinothricin, hph gene which confers resistance to the antibiotic hygromycin, and dhfr gene which confers resistance to methotrexate, EPSPS gene which confers resistance to glyphosate) or chemical agent marker genes, etc. (such as herbicide resistance genes), mannose-6-phosphate isomerase gene which provides the ability to metabolize mannose.

In the above method, the target protein coding gene encodes a target protein, and the target protein may be MaSp1; MaSp1 is a protein whose amino acid sequence is SEQ ID No. 2.

In the above method, the recipient cell is any one of C1)-C4):

C1) prokaryotic microbial cells;

C2) Gram-negative bacterial cells;

C3) bacterial cells of the genus Escherichia;

C4) Escherichia coli BL21 (DE3) cells.

In the above method, the 3' intron and the 5' intron satisfying condition a are the following pair of introns:

the 3' intron contains 6 splice vesicles and a 3' splice site; the DNAs encoding the 6 splice vesicles are named the 3' sp1 gene, 3' sp2 gene, 3' sp3 gene, 3' sp4 gene, 3' sp5 gene and 3' sp6 gene, and the DNA encoding the 3' splice site is named the 3' ss gene; each of the following is a double-stranded DNA molecule, one strand of which has the nucleotide sequence of the stated positions of sequence 1 in the sequence listing: the 3' sp1 gene, positions 5193-5214; the 3' sp2 gene, positions 5278-5289; the 3' sp3 gene, positions 5293-5306; the 3' sp4 gene, positions 5318-5337; the 3' sp5 gene, positions 5352-5370; the 3' sp6 gene, positions 5371-5386; the 3' splice site, positions 5419-5423;

the 5' intron contains a 5' splice site and a 5' ss sequence; the nucleotide sequence of the 5' splice site is positions 5547-5556 of sequence 1; the nucleotide sequence of the 5' ss sequence is positions 5557-5721 of sequence 1 and comprises 4 splice vesicles, whose encoding DNAs are named the 5' sp1 gene, 5' sp2 gene, 5' sp3 gene and 5' sp4 gene; each of the following is a double-stranded DNA molecule, one strand of which has the nucleotide sequence of the stated positions of sequence 1 in the sequence listing: the 5' sp1 gene, positions 5569-5590; the 5' sp2 gene, positions 5634-5643; the 5' sp3 gene, positions 5648-5698; the 5' sp4 gene, positions 5671-5687.

In the above method, the 3' intron is a double-stranded DNA in which one strand (the coding strand) has the nucleotide sequence of nucleotides 5190 to 5423 of sequence 1, and the 5' intron is a double-stranded DNA in which one strand (the coding strand) has the nucleotide sequence of nucleotides 5547 to 5721 of sequence 1.

In the above method, the single copy gene expression cassette is a double stranded DNA molecule having a nucleotide sequence of one strand (coding strand) of SEQ ID No.1 at positions 5117-5835;

or the expression vector is a double-stranded DNA molecule (expressing tandem repeat MaSp protein) with the nucleotide sequence of one strand of SEQ ID No. 1.

The application also provides any one of the following products related to the method:

A1) the double-stranded DNA molecule named single copy gene expression cassette in the above method;

A2) a vector containing the double-stranded DNA molecule of A1);

A3) a recombinant microorganism containing the double-stranded DNA molecule of A1).

The vector of A2) can be constructed using existing expression vectors, which include the pMD18-T vector, pET21b and the like. The existing expression vectors may also contain the 3' untranslated region of a foreign gene, i.e., the polyadenylation signal and any other DNA fragments involved in mRNA processing or gene expression. The polyadenylation signal can direct the addition of polyadenylic acid to the 3' end of the mRNA precursor. In constructing the vector of A2), enhancers, such as transcription enhancers, may also be used; these enhancer regions may be an ATG start codon or a start codon of an adjacent region, but must be in the same reading frame as the coding sequence to ensure correct translation of the entire sequence. To facilitate identification and screening of the transformation results, the existing expression vector used may be modified, for example by adding genes encoding enzymes or luminescent compounds that produce a color change (GUS gene, luciferase gene, etc.), antibiotic marker genes (such as the nptII gene conferring resistance to kanamycin and related antibiotics, the bar gene conferring resistance to the herbicide phosphinothricin, the hph gene conferring resistance to the antibiotic hygromycin, the dhfr gene conferring resistance to methotrexate, and the EPSPS gene conferring resistance to glyphosate), chemical agent marker genes (such as herbicide resistance genes), or the mannose-6-phosphate isomerase gene providing the ability to metabolize mannose.

The application provides the application of the method or the product in preparing tandem repeat protein.

The application provides a method for preparing tandem repeat proteins, which comprises introducing an expression vector containing a double-stranded DNA molecule, named a single copy gene expression cassette, into recipient cells to obtain recombinant cells, extracting total RNA of the recombinant cells, and translating the total RNA into the tandem repeat protein, which greatly shortens the time needed to prepare long tandem repeat proteins. Experiments show that tandem repeat MaSp1 protein with 40 repeats can be obtained in only 7 days, a large reduction in time.

Drawings

FIG. 1 is a schematic diagram of the MaSp1 RNA expression cassette according to example 1 of the present application. In the figure RBS is the coding sequence for the ribosome binding site and ATG is the start codon.

FIG. 2 is a schematic diagram of the mechanism by which the introns splice to form circular MaSp1 RNA in example 1 of the present application. In the figure, BSJ is the back-splice junction site; RBS is the ribosome binding site; ATG is the start codon.

FIG. 3 is a schematic representation of the mechanism of translation of MaSp1 tandem repeat proteins according to example 1 of the present application. RBS is a ribosome binding site; ATG is the initiation codon.

FIG. 4 is a verification electrophoretogram of the MaSp1 RNA loop formation in example 1 of the present application.

FIG. 5 shows the Sanger sequencing results for the MaSp1 RNA splice junction in example 1 of the present application.

FIG. 6 is a PAGE gel image of the spider silk protein after translation of the circular MaSp1 RNA in example 1 of the present application, wherein M is the marker, 1 is the MaSp1 inclusion body and 2 is the MaSp1 supernatant.

FIG. 7 is a Western blot of the protein after translation of the circular MaSp1 RNA in example 1 of the present application, wherein 1 is the MaSp1 inclusion body and 2 is the MaSp1 supernatant.

FIG. 8 is a mass spectrum of a suspected MaSp1 protein according to example 1 of the present application.

Detailed Description

The following detailed description of the application is provided in connection with the accompanying drawings that are presented to illustrate the application and not to limit the scope thereof. The experimental methods in the following examples are conventional methods unless otherwise specified. Materials, reagents and the like used in the examples described below are commercially available unless otherwise specified.

In the following examples, E. coli DH5α (BC102-02) is a product of Biomed; E. coli BL21 (CW0809S) is a product of Beijing Kangwei Century.

In the following examples, the RNAprep Pure cultured cell/bacteria total RNA extraction kit (DP430) is a product of TIANGEN; the ReverTra Ace qPCR RT Kit cDNA synthesis kit (FSQ-101) is a product of TOYOBO.

In the following examples, the 10× BSA protein solution (B9000S) is a product of NEB; 2×Es Taq MasterMix (with dye) (CW0690H) is a product of Beijing Kangwei Century.

In the following examples, the media used are in particular as follows:

The solid LB medium is a sterile medium prepared from tryptone, yeast extract, NaCl, agar and deionized water, with the following contents: 10 g/L tryptone, 5 g/L yeast extract, 10 g/L NaCl, 15 g/L agar.

The liquid LB medium is a sterile medium prepared from tryptone, yeast extract, NaCl and deionized water, with the following contents: 10 g/L tryptone, 5 g/L yeast extract, 10 g/L NaCl.

The solid LB medium with an ampicillin concentration of 100 μg/mL is a sterile medium prepared from ampicillin, tryptone, yeast extract, NaCl, agar and deionized water, with the following contents: 100 μg/mL ampicillin, 10 g/L tryptone, 5 g/L yeast extract, 10 g/L NaCl, 15 g/L agar.

The liquid LB medium with an ampicillin concentration of 100 μg/mL is a sterile medium prepared from ampicillin, tryptone, yeast extract, NaCl and deionized water, with the following contents: 100 μg/mL ampicillin, 10 g/L tryptone, 5 g/L yeast extract, 10 g/L NaCl.
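The four recipes above differ only in whether agar and ampicillin are included, so the amounts for any batch volume follow from simple scaling. The following Python sketch illustrates this under the stated concentrations; the function name `lb_batch` and the 250 mL batch size are assumptions made for illustration only.

```python
# Concentrations stated in the text
LB_G_PER_L = {"tryptone": 10.0, "yeast extract": 5.0, "NaCl": 10.0}
AGAR_G_PER_L = 15.0
AMPICILLIN_UG_PER_ML = 100.0

def lb_batch(volume_ml, solid=False, ampicillin=False):
    """Return component amounts for one batch; keys carry their units."""
    litres = volume_ml / 1000.0
    batch = {f"{name} (g)": g * litres for name, g in LB_G_PER_L.items()}
    if solid:
        batch["agar (g)"] = AGAR_G_PER_L * litres
    if ampicillin:
        # ug/mL times mL gives ug; divide by 1000 for mg
        batch["ampicillin (mg)"] = AMPICILLIN_UG_PER_ML * volume_ml / 1000.0
    return batch

# Hypothetical 250 mL batch of solid LB with ampicillin
plate_batch = lb_batch(250, solid=True, ampicillin=True)
for name, amount in plate_batch.items():
    print(f"{name}: {amount:g}")
```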

Example 1. Preparation of tandem repeat MaSp1

This example prepared an expression vector containing a single copy gene expression cassette, designated pMaSp1. pMaSp1 is a double-stranded DNA, one strand of which has the nucleotide sequence of sequence 1 (SEQ ID No. 1) in the sequence listing. In sequence 1, positions 494-1459 are the ampicillin resistance gene, and positions 5117-5835 are the DNA molecule named single copy gene expression cassette, hereinafter called the MaSp1 RNA expression cassette. The MaSp1 RNA expression cassette has the structure shown in FIG. 1 and consists of: a T7 promoter (nucleotides 5117-5135 of sequence 1); an intron named 3' intron linked to the T7 promoter (nucleotides 5190-5423 of sequence 1, of which nucleotides 5190-5418 are the 3' ss gene and nucleotides 5419-5423 are the 3' splice site); a target protein coding gene linked to the 3' intron (hereinafter the MaSp1 gene, nucleotides 5424-5528 of sequence 1); a coding sequence of a ribosome binding site (RBS) linked to the MaSp1 gene (nucleotides 5529-5534 of sequence 1); a spacer sequence linked to the RBS coding sequence (nucleotides 5535-5543 of sequence 1); a start codon ATG linked to the spacer sequence (nucleotides 5544-5546 of sequence 1); and an intron named 5' intron linked to the start codon (nucleotides 5547-5721 of sequence 1, of which nucleotides 5547-5556 are the 5' splice site and nucleotides 5557-5721 are the 5' ss gene). Positions 5788-5835 of sequence 1 are a terminator for stopping transcription of the introns and the MaSp1 gene, and positions 12-467 are the replication initiation site.
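The coordinates given for the MaSp1 RNA expression cassette can be tabulated and checked for consistency. The Python sketch below is an illustration only: it uses the positions stated in the description and assumes, as a simplification, that the stretch between the two introns (the MaSp1 gene, RBS, spacer and start codon) is exactly what ends up in the circular RNA. The variable names are hypothetical.

```python
# 1-based inclusive coordinates in SEQ ID No. 1, as stated in the description
ELEMENTS = [
    ("T7 promoter", 5117, 5135),
    ("3' intron", 5190, 5423),
    ("MaSp1 gene", 5424, 5528),
    ("RBS", 5529, 5534),
    ("spacer", 5535, 5543),
    ("start codon", 5544, 5546),
    ("5' intron", 5547, 5721),
    ("terminator", 5788, 5835),
]

lengths = {name: end - start + 1 for name, start, end in ELEMENTS}

# Assumption: these four elements make up the spliced circle
EXON_NAMES = ["MaSp1 gene", "RBS", "spacer", "start codon"]
exon = [e for e in ELEMENTS if e[0] in EXON_NAMES]

# Each exon element should start right after the previous one ends
contiguous = all(prev[2] + 1 == nxt[1] for prev, nxt in zip(exon, exon[1:]))
exon_length = sum(lengths[name] for name in EXON_NAMES)
codons_per_lap = exon_length // 3 if exon_length % 3 == 0 else None

print(lengths)
print(contiguous, exon_length, codons_per_lap)  # True 123 41
```

Under that assumption the circle is 123 nt long, a multiple of 3, so a ribosome completing one lap would re-enter the MaSp1 gene in the same reading frame.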

The MaSp1 gene does not contain a stop codon (TAA, TGA or TAG). The MaSp1 gene encodes MaSp1, a protein whose amino acid sequence is sequence 2. The 3' intron and the 5' intron satisfy condition a: the precursor RNA transcribed from the single copy gene expression cassette forms splice vesicles by complementary base pairing in the recombinant cell and produces a mature circular single-stranded RNA molecule by a splicing reaction (a G-OH-catalyzed splicing reaction; for the mechanism see FIG. 2).

The 3' intron contains 6 splice vesicles and a 3' splice site; the DNAs encoding the 6 splice vesicles are named the 3' sp1 gene, 3' sp2 gene, 3' sp3 gene, 3' sp4 gene, 3' sp5 gene and 3' sp6 gene, and the DNA encoding the 3' splice site is called the 3' ss gene. Each of the following is a double-stranded DNA molecule, one strand of which has the nucleotide sequence of the stated positions of sequence 1 in the sequence listing: the 3' sp1 gene, positions 5193-5214; the 3' sp2 gene, positions 5278-5289; the 3' sp3 gene, positions 5293-5306; the 3' sp4 gene, positions 5318-5337; the 3' sp5 gene, positions 5352-5370; the 3' sp6 gene, positions 5371-5386; the 3' splice site, positions 5419-5423.

The mechanism for preparing tandem repeat proteins with the expression vector pMaSp1 containing the single copy gene expression cassette is as follows. pMaSp1 is introduced into recipient cells to obtain recombinant cells, in which pMaSp1 is transcribed into a precursor RNA, also called pre-mRNA, shown in the left panel of FIG. 2. In the precursor RNA, the 3' intron and the 5' intron form splice vesicles by complementary base pairing, and a G-OH-catalyzed splicing reaction closes the loop, generating the mature circular single-stranded RNA molecule shown in the right panel of FIG. 2, called circular MaSp1 RNA. A ribosome binds to the ribosome binding site (RBS) sequence on the circular MaSp1 RNA and initiates translation from AUG. Because the circular MaSp1 RNA contains no UAA, UGA or UAG, the ribosome keeps translating around the circular MaSp1 RNA, producing the MaSp1 tandem repeat protein (FIG. 3). The specific process is as follows:

1. Preparation of expression vector pMaSp1 containing the single copy gene expression cassette

pMaSp1 is constructed in a modular fashion, and the modules are joined by the Golden Gate method: protective bases, a recognition site for the restriction endonuclease BsaI, and a complementary sticky end are added to both ends of each module by embedding them in the primers, as follows:

1.1 Module

Construction of pMaSp1 requires module a and module B:

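The deoxyribonucleotide sequence of module A is positions 5547-5423 of sequence 1 in the sequence listing, running from position 5547 through the end of the circular plasmid sequence and on to position 5423. In it, positions 5547-5721 are the 5' intron (positions 5547-5556 are the 5' splice site and positions 5557-5721 are the 5' ss gene), positions 5788-5835 are the transcription terminator, positions 12-467 are the replication initiation site (pMB1) gene, positions 494-1459 are the ampicillin resistance gene, positions 5117-5135 are the T7 promoter, and positions 5190-5423 are the 3' intron (positions 5190-5418 are the 3' ss gene and positions 5419-5423 are the 3' splice site).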

The module B contains a MaSp1 gene, and the deoxyribonucleotide sequence of the module B is shown as 5424-5528 th positions of the sequence 1 in the sequence table.

1.2 Processing of modules

Adding a protective base and an enzyme recognition site of restriction endonuclease BsaI and a complementary sticky end at two ends of the module A through PCR reaction to obtain a module A with restriction endonuclease BsaI sites at two ends, which is named as a module A-BsaI; the primer pairs used for this PCR reaction were PartA-F and PartA-R.

PartA-F: 5'-CCAGGTCTCAAAGGAGTACTCGATGGATCTCAGGTCAATTGAGGCCTGAGTA-3' (nucleotides GGTCTC are the BsaI recognition site)

PartA-R: 5'-CCAGGTCTCAGGTAGCATTATGTTCAGATAAGGTC-3' (nucleotides GGTCTC are the BsaI recognition site)

The two ends of the module B are added with a protective base and an enzyme recognition site of restriction endonuclease BsaI and complementary sticky ends through PCR reaction, so that a module B with restriction endonuclease BsaI sites at the two ends is obtained, and the module B is named as a module B-BsaI; the primer pairs used for this PCR reaction were PartB-F and PartB-R.

PartB-F: 5'-CCAGGTCTCATACCAGCGGACGTGG-3' (nucleotides GGTCTC are the BsaI recognition site)

PartB-R: 5'-CCAGGTCTCACCTTTGTTCCCTGGCTTCC-3' (nucleotides GGTCTC are the BsaI recognition site)
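BsaI is a type IIS enzyme that cuts outside its GGTCTC recognition site, one nucleotide downstream on the top strand and five on the bottom strand, leaving a 4-nt 5' overhang. The Python sketch below, offered as an illustration rather than as part of the application, reads the overhang encoded in each of the four primers listed above and checks that the ends of module A and module B are complementary. The function names `revcomp` and `bsai_overhang` are hypothetical.

```python
BSAI_SITE = "GGTCTC"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def bsai_overhang(primer):
    """4-nt overhang left by BsaI: skip one base after the site, take four."""
    site = primer.index(BSAI_SITE)
    cut = site + len(BSAI_SITE) + 1
    return primer[cut:cut + 4]

# Primer sequences as given in the text
PRIMERS = {
    "PartA-F": "CCAGGTCTCAAAGGAGTACTCGATGGATCTCAGGTCAATTGAGGCCTGAGTA",
    "PartA-R": "CCAGGTCTCAGGTAGCATTATGTTCAGATAAGGTC",
    "PartB-F": "CCAGGTCTCATACCAGCGGACGTGG",
    "PartB-R": "CCAGGTCTCACCTTTGTTCCCTGGCTTCC",
}

overhangs = {name: bsai_overhang(seq) for name, seq in PRIMERS.items()}
print(overhangs)

# Two ends can anneal when one overhang is the reverse complement of the other
a_to_b = overhangs["PartA-R"] == revcomp(overhangs["PartB-F"])
b_to_a = overhangs["PartB-R"] == revcomp(overhangs["PartA-F"])
print(a_to_b, b_to_a)  # True True
```

Both junctions pair with each other and with nothing else, which is what lets the two modules assemble in a single fixed orientation.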

1.3 construction of pMaSp1

Modules A-BsaI and B-BsaI were ligated at a molar ratio of 1:1 to prepare the plasmid, as follows. The 20 μL reaction system contained: module A-BsaI 5.05E-8 mol (about 100 ng); module B-BsaI 5.05E-8 mol; BsaI enzyme 1 μL; T4 DNA ligase 1 μL; 10× T4 buffer 2 μL; 10× BSA protein solution 2 μL; made up to 20 μL with deionized water. The ligation used the following reaction conditions: 37 °C for 3 min and 25 °C for 4 min, for 25 cycles; 50 °C for 5 min to digest any unligated fragments; then 80 °C for 5 min to inactivate the enzymes. After the reaction, the Golden Gate reaction solution of pMaSp1 was obtained.

5 μL of the Golden Gate reaction solution of pMaSp1 was transformed into competent E. coli DH5α cells, which were screened on solid LB medium containing 100 μg/mL ampicillin. Colonies were picked and sequenced to identify a correctly constructed plasmid, which was then amplified and extracted to obtain pMaSp1.
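The Golden Gate reaction setup and cycling program in section 1.3 involve some simple bookkeeping, sketched below in Python. This is an illustration only and assumes that the 25 cycles cover both the 37 °C and 25 °C steps; the variable and function names are hypothetical.

```python
# Cycling program from the text: (temperature in C, minutes)
CYCLES = 25
CYCLE_STEPS = [(37, 3), (25, 4)]    # assumed to repeat together for 25 cycles
FINAL_STEPS = [(50, 5), (80, 5)]    # final digestion, then heat inactivation

def total_minutes(cycles, cycle_steps, final_steps):
    """Total hold time of the program, ignoring ramp times."""
    per_cycle = sum(minutes for _, minutes in cycle_steps)
    return cycles * per_cycle + sum(minutes for _, minutes in final_steps)

run_minutes = total_minutes(CYCLES, CYCLE_STEPS, FINAL_STEPS)

# Fixed-volume components of the 20 uL reaction, in microlitres
FIXED_UL = {"BsaI": 1, "T4 DNA ligase": 1, "10x T4 buffer": 2, "10x BSA": 2}
TOTAL_UL = 20
dna_plus_water_ul = TOTAL_UL - sum(FIXED_UL.values())

print(run_minutes)        # 185
print(dna_plus_water_ul)  # 14
```

Under these assumptions the program holds for 185 min, and 14 μL remains for the two modules plus deionized water.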

2. Construction and validation of circular MaSp1 RNA

0.5 μL of pMaSp1 was transformed into competent E. coli BL21 (DE3) cells, which were screened on solid LB medium containing 100 μg/mL ampicillin to obtain an E. coli BL21 (DE3) positive transformant (a recombinant cell carrying pMaSp1). The positive transformant was transferred into liquid LB medium containing 100 μg/mL ampicillin and cultured at 37 °C until the OD600nm reached the induction density, then induced with 1 mM isopropyl-β-D-thiogalactoside (IPTG) for 12 h. 1 mL of culture was centrifuged at 12000 rpm for 2 min, the supernatant was discarded, and total RNA of the E. coli BL21 (DE3) positive transformant was extracted with the RNAprep Pure cultured cell/bacteria total RNA extraction kit according to the kit instructions.

Total RNA was reverse transcribed into cDNA with the ReverTra Ace qPCR RT Kit cDNA synthesis kit according to the kit instructions.

PCR reactions were performed on cDNA using primer pairs Testify cirMaSp1-F and Testify cirMaSp1-R to verify whether the MaSp1 RNA was circular.

Testify cirMaSp1-F: 5'-CAGGACAGGGAGGATATGGA-3';

Testify cirMaSp1-R: 5'-CTCCTCCCATGGCTGC-3'.

The DNA polymerase used for verification was 2×Es Taq MasterMix (with dye). The PCR products were separated by electrophoresis, and the electrophoretogram is shown in FIG. 4. The band of about 100 bp was sent for sequencing, and the Sanger sequencing result for the MaSp1 RNA splice junction is shown in FIG. 5. The sequencing result shows that the 5' splice site was joined to the 3' splice site, indicating that the MaSp1 RNA had been circularized.
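A common way to detect a circular template by PCR is to use primers that face away from each other on the linear sequence, so that a product can form only across the junction of a closed circle. The Python sketch below illustrates that general logic on a toy sequence. It is a simplified model (exact matches, one binding site per primer), and the template and both primers are hypothetical; they are not the Testify cirMaSp1 primers or any part of SEQ ID No. 1.

```python
def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def amplicon_length(template, fwd, rev, circular):
    """Product length for exact-match primers, or None if no product forms.

    template is the top strand written 5'->3'; fwd matches the top strand,
    rev matches the bottom strand.
    """
    f = template.find(fwd)
    r = template.find(revcomp(rev))
    if f == -1 or r == -1:
        return None
    r_end = r + len(rev)
    if f <= r:                      # primers face each other
        return r_end - f
    if circular:                    # primers face outward: go around the circle
        return len(template) - f + r_end
    return None                     # outward-facing on a linear template

# Hypothetical 40-nt toy template and outward-facing primer pair
TEMPLATE = "GCTAGTCCAGATTACGGCATCGAAGTCTTGGACCATGTAC"
FWD = "GACCATGT"   # matches template positions 30-37
REV = "ATCTGGAC"   # reverse complement of template positions 4-11

print(amplicon_length(TEMPLATE, FWD, REV, circular=False))  # None
print(amplicon_length(TEMPLATE, FWD, REV, circular=True))   # 22
```

In this toy model the product appears only when the template is treated as a circle, and it spans the junction between the last and first nucleotides.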

3. Preparation of MaSp1 tandem repeat proteins

The E. coli BL21 (DE3) positive transformant in which circular MaSp1 RNA was verified in step 2 was inoculated into liquid LB medium containing 100 μg/mL ampicillin and cultured at 37 °C until the OD600 reached the induction density, then induced with 1 mM IPTG at 37 °C for 6 h (labeled MaSp1 cirmRNA 6h in FIG. 6). A BL21 strain carrying the empty vector in place of pMaSp1 served as the control (labeled empty vector 6h in FIG. 6). After induction, protein gel samples were prepared; the gel results are shown in FIG. 6. Compared with the control, the E. coli BL21 (DE3) positive transformant with circularized MaSp1 RNA showed three additional protein bands. The three bands were analyzed separately by mass spectrometry, and the results are shown in FIG. 8; the band with a molecular weight greater than 118 kD is spider silk protein.

Wherein, BL21 strain of the empty vector is a transformant obtained by transferring the empty vector into E.coli BL21 (DE 3). The empty vector is a plasmid obtained by removing the MaSp1 RNA expression cassette in pMaSp1 and keeping other nucleotides of pMaSp1 unchanged. The empty vector differs from pMaSp1 only in that it does not contain a MaSp1 RNA expression cassette.

Experiments show that tandem repeat MaSp1 with 40 repeats can be obtained in only 7 days, far faster than with the traditional method.
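The rolling translation mechanism used in Example 1, in which a ribosome starts at AUG and keeps circling an RNA that has no stop codon, can be illustrated with a short simulation. The Python sketch below is not part of the application: the function `rolling_translate` and the 24-nt toy circles are hypothetical, and the toy sequences are not the MaSp1 sequence.

```python
from itertools import product

# Standard genetic code, built in UCAG order
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def rolling_translate(circle, start, laps):
    """Translate a circular RNA from index `start` for up to `laps` laps.

    Stops early at the first in-frame stop codon.
    """
    n = len(circle)
    protein = []
    for k in range(laps * n // 3):
        codon = "".join(circle[(start + 3 * k + j) % n] for j in range(3))
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical toy circle in cassette order: coding unit, RBS, spacer, AUG
CIRCLE = "GGUGCUGGU" + "AAGGAG" + "UACUCG" + "AUG"
START = len(CIRCLE) - 3             # index of the AUG

# Same circle with an in-frame UAA in the spacer
CIRCLE_WITH_STOP = "GGUGCUGGU" + "AAGGAG" + "UAAUCG" + "AUG"

print(rolling_translate(CIRCLE, START, 2))            # MGAGKEYSMGAGKEYS
print(rolling_translate(CIRCLE_WITH_STOP, START, 2))  # MGAGKE
```

The stop-free circle yields one repeat of the unit per lap, while a single in-frame stop codon ends translation within the first lap. The toy circle is 24 nt long, a multiple of 3, which keeps the reading frame unchanged from lap to lap.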

The present application is described in detail above. It will be apparent to those skilled in the art that the present application can be practiced in a wide range of equivalent parameters, concentrations, and conditions without departing from the spirit and scope of the application and without undue experimentation. While the application has been described with respect to specific embodiments, it will be appreciated that the application may be further modified. In general, this application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. The application of some of the basic features may be done in accordance with the scope of the claims that follow.

Sequence listing

<110> Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences

<120> a method for preparing tandem repeat protein and use thereof

<130> GNCSY200930

<160> 2

<170> SIPOSequenceListing 1.0

<210> 1

<211> 5860

<212> DNA

<213> Artificial sequence (Artificial Sequence)

<400> 1

tggcgaatgg gacgcgccct gtagcggcgc attaagcgcg gcgggtgtgg tggttacgcg 60

cagcgtgacc gctacacttg ccagcgccct agcgcccgct cctttcgctt tcttcccttc 120

ctttctcgcc acgttcgccg gctttccccg tcaagctcta aatcgggggc tccctttagg 180

gttccgattt agtgctttac ggcacctcga ccccaaaaaa cttgattagg gtgatggttc 240

acgtagtggg ccatcgccct gatagacggt ttttcgccct ttgacgttgg agtccacgtt 300

ctttaatagt ggactcttgt tccaaactgg aacaacactc aaccctatct cggtctattc 360

ttttgattta taagggattt tgccgatttc ggcctattgg ttaaaaaatg agctgattta 420

acaaaaattt aacgcgaatt ttaacaaaat attaacgttt acaatttcag gtggcacttt 480

tcggggaaat gtgcgcggaa cccctatttg tttatttttc taaatacatt caaatatgta 540

tccgctcatg agacaataac cctgataaat gcttcaataa tattgaaaaa ggaagagtat 600

gagtattcaa catttccgtg tcgcccttat tccctttttt gcggcatttt gccttcctgt 660

ttttgctcac ccagaaacgc tggtgaaagt aaaagatgct gaagatcagt tgggtgcacg 720

agtgggttac atcgaactgg atctcaacag cggtaagatc cttgagagtt ttcgccccga 780

agaacgtttt ccaatgatga gcacttttaa agttctgcta tgtggcgcgg tattatcccg 840

tattgacgcc gggcaagagc aactcggtcg ccgcatacac tattctcaga atgacttggt 900

tgagtactca ccagtcacag aaaagcatct tacggatggc atgacagtaa gagaattatg 960

cagtgctgcc ataaccatga gtgataacac tgcggccaac ttacttctga caacgatcgg 1020

aggaccgaag gagctaaccg cttttttgca caacatgggg gatcatgtaa ctcgccttga 1080

tcgttgggaa ccggagctga atgaagccat accaaacgac gagcgtgaca ccacgatgcc 1140

tgcagcaatg gcaacaacgt tgcgcaaact attaactggc gaactactta ctctagcttc 1200

ccggcaacaa ttaatagact ggatggaggc ggataaagtt gcaggaccac ttctgcgctc 1260

ggcccttccg gctggctggt ttattgctga taaatctgga gccggtgagc gtgggtctcg 1320

cggtatcatt gcagcactgg ggccagatgg taagccctcc cgtatcgtag ttatctacac 1380

gacggggagt caggcaacta tggatgaacg aaatagacag atcgctgaga taggtgcctc 1440

actgattaag cattggtaac tgtcagacca agtttactca tatatacttt agattgattt 1500

aaaacttcat ttttaattta aaaggatcta ggtgaagatc ctttttgata atctcatgac 1560

caaaatccct taacgtgagt tttcgttcca ctgagcgtca gaccccgtag aaaagatcaa 1620

aggatcttct tgagatcctt tttttctgcg cgtaatctgc tgcttgcaaa caaaaaaacc 1680

accgctacca gcggtggttt gtttgccgga tcaagagcta ccaactcttt ttccgaaggt 1740

aactggcttc agcagagcgc agataccaaa tactgtcctt ctagtgtagc cgtagttagg 1800

ccaccacttc aagaactctg tagcaccgcc tacatacctc gctctgctaa tcctgttacc 1860

agtggctgct gccagtggcg ataagtcgtg tcttaccggg ttggactcaa gacgatagtt 1920

accggataag gcgcagcggt cgggctgaac ggggggttcg tgcacacagc ccagcttgga 1980

gcgaacgacc tacaccgaac tgagatacct acagcgtgag ctatgagaaa gcgccacgct 2040

tcccgaaggg agaaaggcgg acaggtatcc ggtaagcggc agggtcggaa caggagagcg 2100

cacgagggag cttccagggg gaaacgcctg gtatctttat agtcctgtcg ggtttcgcca 2160

cctctgactt gagcgtcgat ttttgtgatg ctcgtcaggg gggcggagcc tatggaaaaa 2220

cgccagcaac gcggcctttt tacggttcct ggccttttgc tggccttttg ctcacatgtt 2280

ctttcctgcg ttatcccctg attctgtgga taaccgtatt accgcctttg agtgagctga 2340

taccgctcgc cgcagccgaa cgaccgagcg cagcgagtca gtgagcgagg aagcggaaga 2400

gcgcctgatg cggtattttc tccttacgca tctgtgcggt atttcacacc gcatatatgg 2460

tgcactctca gtacaatctg ctctgatgcc gcatagttaa gccagtatac actccgctat 2520

cgctacgtga ctgggtcatg gctgcgcccc gacacccgcc aacacccgct gacgcgccct 2580

gacgggcttg tctgctcccg gcatccgctt acagacaagc tgtgaccgtc tccgggagct 2640

gcatgtgtca gaggttttca ccgtcatcac cgaaacgcgc gaggcagctg cggtaaagct 2700

catcagcgtg gtcgtgaagc gattcacaga tgtctgcctg ttcatccgcg tccagctcgt 2760

tgagtttctc cagaagcgtt aatgtctggc ttctgataaa gcgggccatg ttaagggcgg 2820

ttttttcctg tttggtcact gatgcctccg tgtaaggggg atttctgttc atgggggtaa 2880

tgataccgat gaaacgagag aggatgctca cgatacgggt tactgatgat gaacatgccc 2940

ggttactgga acgttgtgag ggtaaacaac tggcggtatg gatgcggcgg gaccagagaa 3000

aaatcactca gggtcaatgc cagcgcttcg ttaatacaga tgtaggtgtt ccacagggta 3060

gccagcagca tcctgcgatg cagatccgga acataatggt gcagggcgct gacttccgcg 3120

tttccagact ttacgaaaca cggaaaccga agaccattca tgttgttgct caggtcgcag 3180

acgttttgca gcagcagtcg cttcacgttc gctcgcgtat cggtgattca ttctgctaac 3240

cagtaaggca accccgccag cctagccggg tcctcaacga caggagcacg atcatgcgca 3300

cccgtggggc cgccatgccg gcgataatgg cctgcttctc gccgaaacgt ttggtggcgg 3360

gaccagtgac gaaggcttga gcgagggcgt gcaagattcc gaataccgca agcgacaggc 3420

cgatcatcgt cgcgctccag cgaaagcggt cctcgccgaa aatgacccag agcgctgccg 3480

gcacctgtcc tacgagttgc atgataaaga agacagtcat aagtgcggcg acgatagtca 3540

tgccccgcgc ccaccggaag gagctgactg ggttgaaggc tctcaagggc atcggtcgag 3600

atcccggtgc ctaatgagtg agctaactta cattaattgc gttgcgctca ctgcccgctt 3660

tccagtcggg aaacctgtcg tgccagctgc attaatgaat cggccaacgc gcggggagag 3720

gcggtttgcg tattgggcgc cagggtggtt tttcttttca ccagtgagac gggcaacagc 3780

tgattgccct tcaccgcctg gccctgagag agttgcagca agcggtccac gctggtttgc 3840

cccagcaggc gaaaatcctg tttgatggtg gttaacggcg ggatataaca tgagctgtct 3900

tcggtatcgt cgtatcccac taccgagata tccgcaccaa cgcgcagccc ggactcggta 3960

atggcgcgca ttgcgcccag cgccatctga tcgttggcaa ccagcatcgc agtgggaacg 4020

atgccctcat tcagcatttg catggtttgt tgaaaaccgg acatggcact ccagtcgcct 4080

tcccgttccg ctatcggctg aatttgattg cgagtgagat atttatgcca gccagccaga 4140

cgcagacgcg ccgagacaga acttaatggg cccgctaaca gcgcgatttg ctggtgaccc 4200

aatgcgacca gatgctccac gcccagtcgc gtaccgtctt catgggagaa aataatactg 4260

ttgatgggtg tctggtcaga gacatcaaga aataacgccg gaacattagt gcaggcagct 4320

tccacagcaa tggcatcctg gtcatccagc ggatagttaa tgatcagccc actgacgcgt 4380

tgcgcgagaa gattgtgcac cgccgcttta caggcttcga cgccgcttcg ttctaccatc 4440

gacaccacca cgctggcacc cagttgatcg gcgcgagatt taatcgccgc gacaatttgc 4500

gacggcgcgt gcagggccag actggaggtg gcaacgccaa tcagcaacga ctgtttgccc 4560

gccagttgtt gtgccacgcg gttgggaatg taattcagct ccgccatcgc cgcttccact 4620

ttttcccgcg ttttcgcaga aacgtggctg gcctggttca ccacgcggga aacggtctga 4680

taagagacac cggcatactc tgcgacatcg tataacgtta ctggtttcac attcaccacc 4740

ctgaattgac tctcttccgg gcgctatcat gccataccgc gaaaggtttt gcgccattcg 4800

atggtgtccg ggatctcgac gctctccctt atgcgactcc tgcattagga agcagcccag 4860

tagtaggttg aggccgttga gcaccgccgc cgcaaggaat ggtgcatgca aggagatggc 4920

gcccaacagt cccccggcca cggggcctgc caccataccc acgccgaaac aagcgctcat 4980

gagcccgaag tggcgagccc gatcttcccc atcggtgatg tcggcgatat aggcgccagc 5040

aaccgcacct gtggcgccgg tgatgccggc cacgatgcgt ccggcgtaga ggatcgagat 5100

ctcgatcccg cgaaattaat acgactcact ataggggaat tgtgagcgga taacaattcc 5160

cctctagaaa taattttgtt taactttaaa attctagaga aaatttcgtc tggattagtt 5220

acttatcgtg taaaatctga taaatggaat tggttctaca taaatgccta acgactatcc 5280

ctttggggag tagggtcaag tgactcgaaa cgatagacaa cttgctttaa caagttggag 5340

atatagtctg ctctgcatgg tgacatgcag ctggatataa ttccggggta agattaacga 5400

ccttatctga acataatgct accagcggtc gcggcggtct gggtggccag ggtgcaggta 5460

tggcggctgc ggctgcaatg ggcggtgctg gccaaggtgg ctacggcggc ctgggttctc 5520

agggtactaa ggagatatac catatggatc tgcgttcaat tgaggcctga gtataaggtg 5580

acttatactt gtaatctatc taaacgggga acctctctag tagacaatcc cgtgctaaat 5640

tgtaggactg ccctttaata aatacttcta tatttaaaga ggtatttatg aaaagcggaa 5700

tttatcagat taaaaatact ttgagatccg gctgctaaca aagcccgaaa ggaagctgag 5760

ttggctgctg ccaccgctga gcaataacta gcataacccc ttggggcctc taaacgggtc 5820

ttgaggggtt ttttgctgaa aggaggaact atatccggat 5860

<210> 2

<211> 35

<212> PRT

<213> Artificial Sequence

<400> 2

Ser Gly Arg Gly Gly Leu Gly Gly Gln Gly Ala Gly Met Ala Ala Ala

1 5 10 15

Ala Ala Met Gly Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly Ser

20 25 30

Gln Gly Thr

35
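The three-letter listing of sequence 2 can be awkward to compare against other records. The following is a minimal sketch, not part of the original disclosure, that converts the listing to one-letter code and counts residues. The names `THREE_TO_ONE`, `to_one_letter`, `seq2` and `composition` are illustrative assumptions.

```python
from collections import Counter

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Sequence 2 as printed in the listing (position counters omitted).
SEQ2_THREE_LETTER = """
Ser Gly Arg Gly Gly Leu Gly Gly Gln Gly Ala Gly Met Ala Ala Ala
Ala Ala Met Gly Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly Ser
Gln Gly Thr
"""


def to_one_letter(text):
    """Convert whitespace-separated three-letter residues to one-letter code."""
    return "".join(THREE_TO_ONE[residue] for residue in text.split())


seq2 = to_one_letter(SEQ2_THREE_LETTER)
composition = Counter(seq2)  # residue -> number of occurrences

print(seq2)
print(len(seq2), composition.most_common(2))
```

The length printed by the sketch can be checked against the `<211>` field of sequence 2.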

Claims

1. A method for producing tandem repeat proteins, characterized in that: the method comprises introducing an expression vector containing a double-stranded DNA molecule named single copy gene expression cassette into a recipient cell to obtain a recombinant cell, culturing the recombinant cell, and expressing to obtain the tandem repeat proteins; the single copy gene expression cassette comprises a promoter, an intron connected to the promoter and named the 3' intron, a target protein coding gene connected to the 3' intron and named the single copy gene, a coding sequence of a ribosome binding site connected to the target protein coding gene, a spacer sequence connected to the coding sequence of the ribosome binding site, a start codon connected to the spacer sequence, and an intron connected to the start codon and named the 5' intron; the 3' intron and the 5' intron satisfy condition a: in the recombinant cell, the precursor RNA transcribed from the single copy gene expression cassette forms splice vesicles by complementary base pairing and produces a mature circular single-stranded RNA molecule by a splicing reaction; the target protein coding gene does not contain a stop codon;

the 3' intron and the 5' intron satisfying condition a are a pair of introns:

2. The method according to claim 1, characterized in that: the single copy gene expression cassette is formed by connecting the promoter, the 3' intron, the target protein coding gene, the coding sequence of the ribosome binding site, the spacer sequence, the start codon and the 5' intron.

3. The method according to claim 1 or 2, characterized in that: the target protein is MaSp1.

4. The method according to claim 3, characterized in that: the MaSp1 is a protein whose amino acid sequence is sequence 2.

5. The method as claimed in claim 4, wherein: the recipient cell is a prokaryotic microbial cell.

6. The method as claimed in claim 5, wherein: the recipient cell is a gram-negative bacterial cell.

7. The method as claimed in claim 6, wherein: the recipient cell is a bacterial cell of the genus Escherichia.

8. The method as claimed in claim 7, wherein: the recipient cell is an E. coli BL21(DE3) cell.

9. The method as claimed in claim 5, wherein: the 3' intron is a double-stranded DNA in which the nucleotide sequence of one strand is nucleotides 5190-5423 of sequence 1, and the 5' intron is a double-stranded DNA in which the nucleotide sequence of one strand is nucleotides 5547-5721 of sequence 1.

10. The method as claimed in claim 9, wherein: the single copy gene expression cassette is a double-stranded DNA molecule in which the nucleotide sequence of one strand is nucleotides 5117-5835 of sequence 1;

or the expression vector is a double-stranded DNA molecule in which the nucleotide sequence of one strand is sequence 1.

11. Any one of the following products:

A1) the double-stranded DNA molecule named single copy gene expression cassette in the method of any one of claims 1-10;

A2) a vector containing the double-stranded DNA molecule of A1);

12. Use of the method of any one of claims 1-10 or the product of claim 11 for the preparation of tandem repeat proteins.
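The coordinates recited in the claims and the absence of a stop codon can be illustrated with a short sketch. It is offered only as an illustration, not as part of the claimed method. It parses an excerpt of sequence 1 (positions 5401-5580, copied from the sequence listing), extracts the segment lying between the two introns of claim 9, and reads codons around that segment as if it were a circle. The assumption that the mature circular RNA consists of exactly this segment is made here for illustration and is not stated in the claims. The function names `parse_listing`, `region` and `translate_circular` are likewise hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Positions 5401-5580 of sequence 1, copied from the sequence listing.
# The trailing number on each line is the position of its last base.
LISTING_EXCERPT = """
ccttatctga acataatgct accagcggtc gcggcggtct gggtggccag ggtgcaggta 5460
tggcggctgc ggctgcaatg ggcggtgctg gccaaggtgg ctacggcggc ctgggttctc 5520
agggtactaa ggagatatac catatggatc tgcgttcaat tgaggcctga gtataaggtg 5580
"""
EXCERPT_START = 5401  # position of the first base of the excerpt

# Intron boundaries as recited in claim 9.
INTRON_3_END = 5423
INTRON_5_START = 5547


def parse_listing(text):
    """Join the base groups of listing lines, dropping position counters."""
    groups = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens and tokens[-1].isdigit():
            tokens.pop()
        groups.extend(tokens)
    return "".join(groups)


def region(seq, start, end, first_pos):
    """Return bases start..end (1-based, inclusive) of a sequence whose
    first base sits at position first_pos."""
    last_pos = first_pos + len(seq) - 1
    if not first_pos <= start <= end <= last_pos:
        raise ValueError("coordinates fall outside the excerpt")
    return seq[start - first_pos:end - first_pos + 1]


def translate_circular(circle, start, n_codons):
    """Read up to n_codons codons around a circle from 0-based index
    start, stopping early if a stop codon is met."""
    residues = []
    for i in range(n_codons):
        codon = "".join(circle[(start + 3 * i + k) % len(circle)] for k in range(3))
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


excerpt = parse_listing(LISTING_EXCERPT)
# ASSUMPTION (illustrative): the circle is exactly the inter-intron segment.
circle = region(excerpt, INTRON_3_END + 1, INTRON_5_START - 1, EXCERPT_START)
start = len(circle) - 3             # the segment ends with the start codon
codons_per_lap = len(circle) // 3
one_lap = translate_circular(circle, start, codons_per_lap)
two_laps = translate_circular(circle, start, 2 * codons_per_lap)

print(len(circle), circle[start:])
print(one_lap)
print(two_laps == one_lap * 2)
```

Under the stated assumption, the segment length is a multiple of three and no stop codon lies in frame, so the reading frame is preserved from one lap to the next and the same residues recur in tandem. If the actual splice junctions differ from the boundaries assumed here, the repeat unit would differ accordingly.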