CN116751763B - Cpf1 protein, type V gene editing system, and uses thereof - Google Patents

Publication number
CN116751763B
Authority
CN
China
Prior art keywords
gene editing
sequence
protein
cpf1
crispr
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310510289.0A
Other languages
English (en)
Other versions
CN116751763A (zh)
Inventor
田瑞
赵停停
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Shutong Medical Technology Co ltd
Original Assignee
Zhuhai Shutong Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuhai Shutong Medical Technology Co ltd
Priority to CN202310510289.0A
Publication of CN116751763A
Application granted
Publication of CN116751763B
Legal status: Active
Anticipated expiration

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

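The invention belongs to the technical field of genetic engineering and discloses a Cpf1 protein, a type V CRISPR/Cas12a gene editing system, and uses thereof. The invention provides a Cpf1 protein and a nucleotide sequence encoding the Cpf1 protein. The invention provides a type V CRISPR/Cas12a gene editing system comprising the Cpf1 protein, accessory proteins, and a CRISPR array. The invention applies the type V CRISPR/Cas12a gene editing system to gene editing in prokaryotes or eukaryotes and to the preparation of gene editing agents for organisms. The gene editing system of the invention expands the range of gene editing tools, enriches the PAM diversity of existing Cpf1 gene editing tools, provides more tool options for the clinical use of Cpf1, and plays an important role in advancing the clinical application of gene editing.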

Description

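Cpf1 protein, type V gene editing system, and uses thereof
Technical Field
The invention relates to the technical field of genetic engineering, and in particular to a Cpf1 protein, a type V CRISPR/Cas12a gene editing system, and uses thereof.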
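Background
CRISPR/Cas (Clustered Regularly Interspaced Short Palindromic Repeats/CRISPR-associated proteins), the adaptive immune system of microorganisms, helps bacteria and archaea defend against invading foreign nucleic acids. A CRISPR/Cas system contains direct repeats (DR) separated by unique spacers derived from foreign DNA. The CRISPR array is transcribed into a long transcript (pre-crRNA, the precursor of CRISPR RNA), which is then processed into small mature CRISPR RNAs (crRNA), each consisting of a spacer and part of the adjacent direct repeat. The crRNA forms a complex with a Cas endonuclease, and in some cases also with accessory Cas proteins, and serves as a guide for targeting and cleaving foreign nucleic acids, thereby achieving interference. DNA recognition by the Cas-crRNA complex requires a protospacer adjacent motif (PAM) near the target site, which helps distinguish self from non-self. CRISPR/Cas systems are broadly divided into two classes according to the number of effector proteins: Class 1 systems use complexes of multiple Cas proteins, such as Cascade, whereas Class 2 systems use a single effector enzyme, such as Cas9 or Cas12.
However, Cas9 proteins in CRISPR/Cas9 systems are generally large, consisting of about 1300 amino acids, while the AAV vectors commonly used for in vivo delivery have a maximum packaging capacity of about 4.5 kb (roughly 4500 bp). In addition to the Cas protein, the vector must also carry other functional elements indispensable for gene editing, such as the tracrRNA. This constraint prevents most Cas9 proteins from being packaged and delivered, which makes them difficult to apply. Moreover, Cas9 recognizes G-rich PAMs, so certain target sites cannot be edited and the targeting range in mammalian genomes is limited. Developing new genome editing systems that open up more possibilities for gene editing applications is therefore of great significance.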
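The packaging constraint described above is simple arithmetic, and the following minimal sketch makes it explicit. It assumes a packaging limit of roughly 4500 bp and counts three bases per amino acid plus one stop codon; the function names are hypothetical, and the protein lengths are the approximate Cas9 length quoted above and the 28c2 and 30c9 lengths stated in Example 1.

```python
# Back-of-the-envelope AAV packaging check (illustrative numbers only).

AAV_CAPACITY_BP = 4500  # assumed approximate packaging limit, in base pairs


def coding_length_bp(n_amino_acids: int) -> int:
    """Length of the open reading frame in bp, including one stop codon."""
    return 3 * (n_amino_acids + 1)


def remaining_capacity_bp(n_amino_acids: int,
                          capacity_bp: int = AAV_CAPACITY_BP) -> int:
    """Space left for promoter, guide RNA cassette, polyA signal, and so on."""
    return capacity_bp - coding_length_bp(n_amino_acids)


if __name__ == "__main__":
    proteins = [("Cas9 (approx.)", 1300), ("28c2", 1262), ("30c9", 1251)]
    for name, n_aa in proteins:
        print(name, coding_length_bp(n_aa), remaining_capacity_bp(n_aa))
```

The remaining capacity has to accommodate every regulatory element and the guide RNA cassette, which is why a difference of a few hundred base pairs can matter.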
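Summary of the Invention
The object of the invention is to overcome the shortcomings of the prior art by providing a Cpf1 protein, a type V CRISPR/Cas12a gene editing system, and uses thereof.
To achieve this object, the invention adopts the following technical solutions:
In a first aspect, the invention provides a Cpf1 protein whose amino acid sequence is shown in any one of SEQ ID NO.1~6.
The Cpf1 proteins of the invention recognize a variety of PAM sequences and need to recognize a PAM of only 3 bases, 2 bases, or even 1 base, so the targetable range is broader. This enriches the PAM diversity of existing Cpf1 proteins, greatly increases the likelihood that a specific gene locus can be edited precisely, provides a reference for discovering more novel Cpf1 nucleases with C-rich PAMs, and expands the existing range of gene editing tools.
In a second aspect, the invention provides a nucleic acid encoding the Cpf1 protein, the base sequence of which is shown in any one of SEQ ID NO.16~21.
In a third aspect, the invention provides a type V CRISPR/Cas12a gene editing system comprising the Cpf1 protein, accessory proteins, and a CRISPR array.
The gene editing systems of the invention each recognize their own distinct PAM sequence and can perform gene editing under crRNA guidance both in vitro and in eukaryotic cells. This further expands the range of gene editing tools, enriches the PAM diversity of existing Cpf1 gene editing tools, provides more tool options for the clinical use of Cpf1, and plays an important role in advancing the clinical application of gene editing.
As a preferred embodiment of the type V CRISPR/Cas12a gene editing system of the invention, the CRISPR array comprises direct repeats and spacers, and the direct repeats and spacers are arranged alternately.
Further, the nucleotide sequence of the direct repeat is shown in any one of SEQ ID NO.10~15.
As a preferred embodiment of the type V CRISPR/Cas12a gene editing system of the invention, the amino acid sequence of the accessory protein is shown in any one of SEQ ID NO.7~8.
In a fourth aspect, the invention applies the type V CRISPR/Cas12a gene editing system to gene editing in prokaryotes or eukaryotes.
In a fifth aspect, the invention applies the type V CRISPR/Cas12a gene editing system to the preparation of gene editing agents for organisms.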
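Compared with the prior art, the invention has the following beneficial effects:
(1) Through bioinformatic analysis of metagenomes, the invention identifies for the first time six entirely new type V CRISPR/Cas12a gene editing systems and predicts the direct repeat corresponding to each. The Cpf1 proteins of the six novel editing systems recognize a variety of PAM sequences and need to recognize a PAM of only 3 bases, 2 bases, or even 1 base, so the targetable range is broader. This also greatly increases the likelihood that a specific gene locus can be edited precisely and provides a reference for discovering more novel Cpf1 nucleases with C-rich PAMs.
(2) The invention demonstrates experimentally that the six CRISPR/Cas12a gene editing systems each recognize their own distinct PAM sequence and can perform gene editing under crRNA guidance both in vitro and in eukaryotic cells. The discovery of these six new gene editing systems further expands the range of gene editing tools, enriches the PAM diversity of existing Cpf1 gene editing tools, provides more tool options for the clinical use of Cpf1, and plays an important role in advancing the clinical application of gene editing.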
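Brief Description of the Drawings
Figure 1 is a schematic diagram of the CRISPR/Cas12a gene editing system of the invention, composed of Cas proteins and a CRISPR array.
Figure 2 shows the predicted secondary structures of the crRNAs of the six CRISPR/Cas12a gene editing systems of the invention.
Figure 3 shows the PAMs of the six CRISPR/Cas12a gene editing systems of the invention.
Figure 4 shows the in vitro cleavage assay for the six CRISPR/Cas12a gene editing systems of the invention.
Figure 5 shows PCR detection of dsODN insertion following gene editing in eukaryotic cells by the six CRISPR/Cas12a gene editing systems of the invention.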
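Detailed Description
To better illustrate the objects, technical solutions, and advantages of the invention, the invention is further described below with reference to specific examples. Those skilled in the art will understand that the specific examples described here serve only to explain the invention and do not limit it.
In the invention, Cas12a is also known as Cpf1 and is assigned, on the basis of homology, to Class 2 type V of CRISPR systems. The effector protein Cpf1 can be applied to genome editing through complementarity with a guide RNA. Cas9 and Cpf1 differ in how they recognize and cleave target DNA, and Cpf1 has the following features: (1) Cpf1 is guided by a single crRNA, whereas Cas9 uses a crRNA and a second small RNA, the trans-activating crRNA (tracrRNA); (2) Cpf1 recognizes T-rich PAMs, in contrast to the G-rich PAMs favored by Cas9; (3) Cpf1 produces staggered ends at a PAM-distal target site, whereas Cas9 produces blunt ends at a PAM-proximal target site, and the sticky ends left by Cpf1 cleavage undergo homologous recombination repair more readily; (4) Cpf1 contains a RuvC domain but lacks a detectable second endonuclease domain, whereas Cas9 uses its HNH and RuvC endonuclease domains to cleave the target and non-target DNA strands, respectively.
Unless otherwise specified, the experimental methods used in the examples are conventional methods, and the materials, reagents, and the like are commercially available.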
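Example 1: Six type V CRISPR/Cas12a gene editing systems
This example used CRISPRCasFinder for metagenome annotation, together with bioinformatic approaches such as crRNA structure prediction with NUPACK, to analyze, predict, and screen type V CRISPR/Cas12a system proteins and their elements, yielding the novel editing systems of the invention, as shown in Figure 1. The gene editing system of the invention consists of the following elements: the gene-encoded endonuclease Cpf1; the accessory proteins Cas1, Cas2, and Cas4; and the CRISPR array. The invention identified six new Cpf1 proteins, named 28c2, 28c6, 28c12, 28c13, 28c15, and 30c9. The 28c2 protein consists of 1262 amino acids, with the sequence shown in SEQ ID NO.1; the 28c6 protein consists of 1253 amino acids, with the sequence shown in SEQ ID NO.2; the 28c12 protein consists of 1265 amino acids, with the sequence shown in SEQ ID NO.3; the 28c13 protein consists of 1274 amino acids, with the sequence shown in SEQ ID NO.4; the 28c15 protein consists of 1260 amino acids, with the sequence shown in SEQ ID NO.5; and the 30c9 protein consists of 1251 amino acids, with the sequence shown in SEQ ID NO.6.
The accessory protein Cas1 has the sequence shown in SEQ ID NO.7, the accessory protein Cas2 has the sequence shown in SEQ ID NO.8, and the accessory protein Cas4 has the sequence shown in SEQ ID NO.9. The three accessory proteins take part in capturing foreign genetic material and in crRNA maturation.
The CRISPR array comprises direct repeats and spacers arranged alternately, with one spacer between every two repeats. Within a single bacterium the repeats are relatively conserved in base composition and length, and they differ slightly between bacteria. The direct repeats (DR sequences) of the six novel CRISPR/Cas12a systems of the invention are shown in SEQ ID NO.10~15. The CRISPR array is transcribed into pre-crRNA, whose spacer is the sequence that hybridizes with the target by complementary base pairing. The pre-crRNA is then cleaved and processed into a mature crRNA with the repeat at the 5' end and the spacer at the 3' end. The crRNA is complementary to the gene being targeted and guides the Cpf1 protein to edit the target sequence complementary to the spacer.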
The sequences referred to above are as follows:
SEQ ID NO.1 (28c2 protein sequence):
MVNGTKNYFDCFTGFYPINKTLRFELKPIGKTNALIDEFKKGYVDSIVSLDEKRAESRKKVIEVLDNYYEYFINCVLSKEVLLVNDINEAYKLYKDFKADKKDKNFKSYKVKMRTKISEKFQSEKIKFALKDYKDLFGKKRLQESLLYEWYKQKLNNEEINNEAFEDIVKTLSYFIGFTTSLKDYQENRNNFFVPDEKSTSIAYRIIDENMIRYFDNCIRFETFIENKIDLFESLKQWEEYFKPENYIKYFTQDGIDNYNQIIGRKGKDIYSKGINQLINEYRQINKIKNKNLPTMNQLYKQLLSKHNSEELIVGFKDEKDMLQKIETTYYEYSEIVSKLVSFLSESLADDINLYIRSDSLTNLSNSMFGRWDFINDAIYSYTSGFSEKDKLKYEKDVKEVISLVKLQKVIDTYVSSLDIDEKGKYISNSSIYKYLLSINDLNLKNAYSEAKPILVLNEIDNERTNDSERIQQINKIKSLLDAMLEIMHFYKPLYLYKNGKSLVEVEKDEVFYSEFDYLYSQLMPITKLYDKVRNHITKKPYSKDKFKIYFNKPTLLDGWDLNKENSNLGVLLTKNNNYYLGIMNGKYNTSFDTTVAEVKNQINESSTAIGYLKMEYKQVSGANKMFPKVFFAESNKHIYKPSKEILNIRENKLYTKGADDVESRIKWIDFCKHCIKLHPEWNKYFNFKFKPTTEYEDVNTFYEDADAQMYNVSFISFNESYINELVNEGKLYLFQIYNKDFSPNSKGKPNLHTMYWKMIFEDSNITNINNTGLPVFKLNGEAEIFYRKASLNKKVTHEKNLPIKNKNRNNPKEESIFSYDLYKDKRFMADKFFLHCPITINYRTKPLSSSEFNKKINCIVENNKDISILGVDRGERHLLYYSLINQKGEILKQGSLNSLSTSYERDGQEISVLTDYNSILQGREDERDDARKNWGTIQNIKEIKDGYMSHIVHQLSKILIDNNAVLVLENLNSGFKRGRFKIEKQVYQKFEKAMIEKLNYLVFKDRNSTSPGYYLNGYQLTAPFEGFKNLYSQSGIIYYVWPSYTSKICPRTGFVNLLKLNYENIEKSKEIFNNFDIISYNKAKDYFEFGLDYRRFGKDAGKSKWLICTYGNERYFYNSKLKKFECIDITNKIKELFKSNNIDYLNEKDLRNKITNVNSKDFFNSLLFYLRITLQLRYTNGGNLDENDYILSPINDGSDKFFDSRCASESEPKNCDANGAYHIALKGLRLIHSIEDGTTSKIGNETTDWFTFAQNKNKLVE;
SEQ ID NO.2 (28c6 protein sequence):
MSKGKIWENFINQYSVSKTLRFELKPVGKTLENINAKGLIEEDEQRAEDYKKAKKIIDEYHKYFIEGALGSCSLDLNILNEFLQLYNKAQKTDADKKEYEKIQTTLRKNIAESFGKNADKKTKEQYENLFKKELLRNDLPDWVEDEEDAKIIERFKTFTTYFTGFHENRKNIYDNEEKSTAIGYRIVHENLPKFIDNMNAFEKISKALDLSEIDRDFQSELGEIKAEEFFTIEFFNQCLNQFGIDRYNTLLGGISEGENIKKKQGLNERINLYNQQLKGERKKERLPKLKVLYKQILSDSSSHSFSIDEFENDNELLESLEIFYKNELIGFNHSGVDSNIFDLVKDLLLKIDESEQSSIYLKNDKGLTEISQRIFGDWNIIKSALEEYYDEHYPPKKDTFNKKELDERSRWLKENHSIGVIEKALANYENEIVREHLKQNSAPIVSYFKSLEVDGENLIDKIYSAYGNISDLLNSSYPDEKKLVSDRTSKDKIKVFLDSLMSLLHFLKPLDVKDLGNKDSAFYGDYDFIVEQLSKLVRLYNKTRNYLTRKPYSIEKIKLNFENSTLLAGWDVNKERDNNCVIFKRQDGDRELFYLGIMDKSHNKIFTKIEEAKSDDVYQKMNYKLLPGPNKMLPKVFFSKKSIDFYAPGEELLKNYKNGTHKKGENFNLQHCHELIDFFKRSINKHEDWSQFNFKFSDTSEYEDTSFFFKEVSQQGYSITFKNIDRETIEKFVDEGKLYLFQIYNKDFSPKSKGRPNLHTLYWKMLFDERNLANTVYQLNGEAEVFYRKKSISEKDRVVHRADEPIGLKNSENSAQKSLFPYDIVKDRRFTVDKFQFHVPITLNFKSEGNERLNISVNKFLKDNPDVNIIGLDRGERHLIYLTLINQKGEILHQESLNEVMGVNYQQKLHRVEKDRTEERRNWDRIENIKELKSGYLSQVVHKISQLMVEYNAIVVMEDLNFGFKRGRIKVEKQVYQKFEKTLIDKLNYLVFKDREPEEPAGVLNALQLTNKFESFKKLGKQCGFLFYVTSDYTSKIDPATGFVNLLYPKYESVEKSQNFFRKFDNICFNSGAGYFEFDFDYSNFTDRADGTRTRWKVCTVGNERFGYNPKTKASETVNVTESLKELLLQHEIAFENGESLVESISKNTTKYFHKSLLNFLRLTLTLRHSKTGTDIDYILSPVANEEGVFFDSRNASDKMPKDADANGAYNVALKGLMVLERINAAEDLSQFKFKDMSIKNKDWLKFVQDRQG;
SEQ ID NO.3 (28c12 protein sequence):
MIEYTNFIGLYPLSKTLRFKLLPIGKTLENITRNGILTDDKHRAQSYQEVKKLIDEYHKEFIEHTLETFNLELLSTNKQNSLEEYHQLYLKEKNESELKNFTKTQENLRKQIAKTLQNEAKKASLFDKDMIKKNLPDFIQQHPDLKDKENLVKEFDEFTTYFTGFHENRRNMYSDEEKSTAIGYRIIHQNLPKFIDNMIVFSRIQSELQGELNLIAADFKDLLVVNNLDEMFTLPYFNQVLTQSQIDLYNMVIGGKSEEGKIKKQGLNEYINLYNQNHKEQKLPLFKPLFKQILSDRQSLSWLPQQFEEDQELLNAVRECFYSLNDSQCNLKHLQALLVSLADYNLNGIYLTNGPAITTISQQMFNDWNLINRAIIERMSRDIKASSKQKSEAKLEEEIRKRMDSTESFSIQYLNECIETSEIEDIKNAADKRIESAHFARLMICNKKTNEQENLFERIYTAYNEAQTLLNTPYPENQNLIQDQENVARIKYLLDTVKDLQLFVKPLLGKGYEIGKDDTFYGILTRLWTVIDQLTPLYDKVRNYLTRKPYSDKKIKLNFKNSTLLNGWDKNKEADNTAIIMRKEGLFYLGIMNKDIKGYKRMFEKCPQCSEEEAYYEKMEYKLLPGPNKMLPKVFFAKNNIELFKPSERIMAIRENETFKKGDKFNLADCHAFIDFYKESIAKHPEWKDFDFHFSETQLYNDISGFYREVEHQGYKMSFRKIPATYIDQLVENNELYLFQIYNKDFSEYSKGTPNMHTLYWKMLFDERNLADVVYKLNGQAELFYRPASLNYNRPTHPKNEPITNKNKNNPKKESIFKYDLTKDKRYTQDTFLLHVPITLNFKGTNNGNINQQVNSYLQTADNTHIIGIDRGERHLLYLVVIDMKGNIKEQFSLNEIANQNKGIEYRTNYHQLLENREKERVEARVNWQNIENIKDLKEGYLSQVIHLITQLMLKYHAIVVLEDLNFGFMKGRQKVEKSVYQKFEKQLIDKLNYLVNKQIDAEKPGGLLKAYQLAKPFESFQKMGKQSGFLFYIPAWMTSKIDPVTGFVNLLNTNYVNVKESQKFFSNFDRIAYNPEKDWLEWDIDYNKFTTKAKNSRHNWTICTQGERIENHRNEKNGQWNSQNVNLTEEFKKLFALYDIDLAQDLKKYIIQQNDAKFFKELHRILKLTLQMRNSQINSDIDYLVSPVANAEGCFYNSQTANATLPANADANGAYNIARKGLYLLQQIKKAPDLAKLKLTISNEEWLKFAQEKTYQND*;
SEQ ID NO.4 (28c13 protein sequence):
MFNQFTNLYPVIKTLRFELKSIGNTMDTIESNQVIHNDEKRADAYAKLKVTLDAYHKDIIEKVLSRARLTGLEDYAIAVNNLKTSKGNAAYGKELTKNKEQLRKQIAGFFKQPEFAPIFKDLFKEGVIKKDVKAWIDTQPNPSDYFYSDDFANFTGYFGNYNLIRQNLYSPEAKHGTIAYRLIDENLPKFIDNLSILQNIQNKNPDLFDQLSDQYQQYFSELLPSKPTLADFVSLDTFNDLLTQKGLDAYQQIIGGIKTENQLIQGINVLINLHNQQHPEQSKTPKLKPLYKQLLSDRGTFKLPRKFEDDAEMIQANRQYFEEVLGNNTLFETGETPTEAMNQLFLSIENYDLSKIFIESPLLVTSISQKIYGSYAVIPQALEYYHDNHVNPSYAAKFNKAKSDKSRETMEKAKAAWVKGVHAVSVIHQAVIAYNDVLPDDAKLTDTQPVISYYKDIQYSEKTGESQQIFDALMRRYHQAKGMLNTDYPKGSKQILNNKSSFAIVKNLLDVSKAYVNAARDLTIKKPEGLDLDLLFYERLAKTYTYLQDLHALYDTTRNYVTQKPFSTDKIKLNFDCAQLLAGWDFNVIDAKRGVFLVKNGRYYLVIIDNKHKKAMNNLPAPITNNCYDKYNMRLSKDAHMALPKKLFTKDNLKIPAIAEMERRCRDKNGGHHLRKSPDFDKDFMHQMIDTFKDIIKKDKDFDVFGFQFKPTHQYEDINEFYADFNEQALVTWYDKVDSDVIDSLVAEGKIYLFEVYSKDFSDKSTGTPNQQSLILQYLFSQDNLAKRHFKLNGEAEVFYRKASIDKDKAVVHKKGSLLENKNPARPNSKIAKFDIVKDRHYTEDKLFLHIPITLNNNAADMKSYAMNSKVLNTLKTNGGVNVIGIDRGERNLLKITVINSAGEILHQESLNKITSGQDMVTDYHELLDKKEQSRAESRLNWQEVESIKEIKQGYLSQVVYRLSQLMLQYKAIVVLEDLNIGFKRGRFKIEKQVYQNFEKALINKLNYLVLKQLEATEVGGTAHGYQLTAPFESFQKLGKQSGWLFYVPAWNTSHIDPTTGFVNLHHFKYESVAQATDIIDKLSNIRYNPEKDYFEFAIDYNEFTFKGGDSQKYWVVCSTPYKRYVFDKKANMGRGGTKAVDVNAELKALFAAHGVDYASGEDLRPQIKAKANKELLSQLLFLLKTLTAMRYTNASSYEDYILSPVVNKAGEFFDSRKGDATLPLDADSNGSYHIALKGLCLLQRVYDWRGEEFKGLDLFISNNDWLKFAQDRH*;
SEQ ID NO.5 (28c15 protein sequence):
MSNTKDNIFNNFTGIYPINKTLRFELRPVGKTYDLIKDFKNGYVESIVAIDEKRSEARKRIIEIIDEYYEEFINTVLSKKVFYSDDIWQTYTSYKAYKSDKRNKEFVTQKAIMRKKISDAFQNEKTKFNLKDFKDLFGKKSNLKESPLYKWYKNKLDIGEITGEDFEDIIKIITYFIGFTTSLKDYQENRNNLFVAEEQSTAISHRIIDVNMIRYFENCIRFENMKDSELLEDMGKWEKYFVPANYDNFFTQEGIDNYNEIIGRKSKDLYYKGVNQLINEYRQKNKIKNKDMPTMNQLYKQHISKNGDNEINNDFSNEKEMLEQIEQAYITSLDKINRIVSFINENITEGNKIFIRKDFVTNISNRLFGEWNFINNALYSYLSGLSAKNKELFVKQTEEVIKISELQNIIDLYINNLDEDEKEKYLKTDAIYTHFCSFDVCGVQNAYYEAKTVLAVDEINKDREKEEEGAKQISKVKKLLDEILEAVHFYKPLYLYKNGKEIDEIEKDEIFYSEFDYLYSQLMLVTELYDRVRNYLTKKPYSKDKFKIYFNKPTLLDGWDLNKEKNNLSVLLIKDGFYYLGIMDSKYNSVFDVSADDVKINTTELSEEATFLKMEYKQVSGASKMFPKVFFAASNKDMFKPSEEILNIRENKQYLKGANNREAVIKWIDFCKDCLKIHPEWNRYFNFNFRHSDEYENVNSFYEDADTQMYYINFVKFKETYINDLVEEGKLFLFQIYNKDFSEYSKGKPNLHTVYWKMLFDENNVRNINDNTGKPVFKLNGEAEIFYRKASLDKKVTHKKNYPIKNKNKHNNKTESIFEYDLYKDKRFMDDKFFFHCPITINYRAKNILSSEFNKKFNLHIKNSDNMNILGVDRGERHLLYYSLINIKGGIIKQGSLNTIYDSYEKDGINIPVITDYKSILKDREDERMDSRKNWGTIKNIKEMKEGYLSHVVHQVSKLLIDNNAILVLENLNSGFKRRRLKIEKQVYQNFEKSLINKLNYLVLKDADNKDVGHFLKGYQLTAPFEGFQRLNNQSGIIYYVWPSYTSKICPRTGFVSLLHINYENIEKSKEFFNKFDKISYNKDKDYFEFHLDYTRFGKNAGKNKWVICTYGKDRYFFNQKLKKYEYIDITEKIKELLSNNGIDFINENDMRKSIVENNSKNFFGSLLFYLKVVMQLRYTNSNDGCRNENDYILSPVADINGMFFDSRHACDNEPENADANGAYHIALKGLRMIQFIENGVITKQGNETTDWFKFAQNKL*;
SEQ ID NO.6 (30c9 protein sequence):
MSAQSALSTLINKYSLSKTLRFELIPIGKTKESIDRKGLLSQDVKRAQSYKEVKKIIDEYHKEFIEKSLINAKLKGLEEFSKLYYKLQKEDKDKKNIKKMQDNLREQISDLFKNNKKDKWNILFKEDLIKKELPLFAKDDKQKNLINEFNKFTTYFTGFHKNRKNMYAEEEKSTSIPYRIIHQNLPKFLDNIRIFEKIKKNKINTDVIEKELSLFLNGIKINDIFSINFFNDVLNQKGITFYNTILGGVSEKDRTKIKGINEYVNTEYNQKQLDKKSKIPKLKQLYKQILSDTETASFVLEQFENDNQLLEKIEQFYNTELINYETEGKTQSVFLQFEQLFKNMQNYDASKIYISNLSIANISKIIFGDWSIICNALAEWYDKHNTKGKKINEYKKENFLKQDFSIQQIEDAVLEYKNDTLNKEINFLLNYFASFLNEKSKKNIIQRIETEYSKVKDLLNTDYPEKKKLASDKDNVSKIKAFLDSLMDFLHFVKPFNIKKDTGLEKEENFYSIYVPLFEQIDKIIPLYNKVRNYLTKKPYSTEKIKLNFENSTLLDGWDLNKESDNTSVVLRKDDLYYLGIMDKKHNRIFKELPSQNGNESSYEKMIYKLLPGPNKMLPKVFFSKKGKKQFKPSKKLLKKYEDGTHLKGDNFNINDCHNLIDFFKESIAEHEDWKQFDFKFSSTSSYKDLSNFYKEVEKQGYKITFQNISENYINQLIDEGKLYLFQIYNKDFSKYSKGTPNLHTLYWKMLFDNDNLKNIVYKLNGKAEVFYRKSSLILGDNIVHKAGEAIINKNPDNEKKHSTFDYDLIKDKRFTLDKFQFHVPITLNFKSEGRQNLNEDVRKFLKNNPDINIIGIDRGERHLLYLTLINQKGKILFQKSLNEITNEYNNKNGKSQIKSTNYHSLLDKKEKKRDEARKNWGIIENIKELKEGYMSQIVHYISKLMIEKNAILSLEDLNFGFKRGRQKVEKQVYQKFEKMMIDKLNYLVFKDKKANETGGLLNALQLTNKFESFAKLYNQSGFIFYVPAWNTSKIDPITGFVNLLKPYYENLNKSQEFFKKFNNIKYNPKQEYFEFNFDYKNFTNKAEGSKNVWEICTTNNERFMWDKTLNSGKGAQKAVDVTQELKKLFDSSKINYLNGNDIKEDIINQNSADFFRKLMKLLSVVLSLRHNNGLKGKDEKDFILSPVEPFFNSLNAKMEEPKDADANGAYNIALKGLLILKQINESEDLRKIKFNLSNKEWLKFAQSKSF;
SEQ ID NO.7 (Cas1 protein sequence):
MNQLVTGGISVLNKGEFIKKQILVYEPFLGDKMSYKNDNMVIRDGNGKIKYQVSCYRIFMVLIVGDVTITTGILRRQQKFGFRLCFLTLGLKVYSVIGPQLQGNTLLHCKQYAYDELTVGKSIIINKILNQRAALTRLRSKTEDVWECISLLEQYSKRLQNDSLNLQEIIGIEGMASKIYFPRIFSNTQWIGRKPRIKFDYINTLLDIGYNALFNFIDAILQVFGFDVYYGVLHTCFYMRKSLVCDIMEPMRPIVDWQIRKSINLKQFKQDDFVQVGKQYQLKYKKSTQYLQVFLEAILNYKEEIFVYVRDYYRSFMKNNPIEAYPVFKLEEL;
SEQ ID NO.8 (Cas2 protein sequence):
MIIVSYDISDDKLRTKFSKYLSRFGHRIQYSMFEIDNSERILNNIICDIHNQFEKKFSQEDSIYIFNLSKWCKIERFGYAKNETNDLLVLTGCKPRP;
SEQ ID NO.9 (Cas4 protein sequence):
MEDIILITELNDFIFCPASIYFHHLYGSRDPVLFQSEAQIKGTKAHEAVDSGCYSKKSSILQSLDVYCEKYRLLGKIDIYDGKKKILRERKRQIKQVYDGYIFQLYGQYFSLIEMGYEVDKMELYSMIDNKKYPIELPHNNINMLMKFEMLIHEMREFRLDDRFIQENANKCKNCIYEPACDRGNIGAK;
SEQ ID NO.10 (28c2 direct repeat):
GAATTTCTACTGTTGTAGAT;
SEQ ID NO.11 (28c6 direct repeat):
AAATTTCTACTTCTGTAGAT;
SEQ ID NO.12 (28c12 direct repeat):
TAATTTCTACTATTGTAGAT;
SEQ ID NO.13 (28c13 direct repeat):
AATTTCTACTATGTGTAGAT;
SEQ ID NO.14 (28c15 direct repeat):
AAATTTCTACTGTTGTAGAT;
SEQ ID NO.15 (30c9 direct repeat):
TAATTTCTACTATTGTAGAT.
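As a rough illustration of how a mature crRNA is laid out (repeat at the 5' end, spacer at the 3' end), the sketch below assembles one from the direct repeats listed in SEQ ID NO.10~15. The function names are hypothetical and the 23-nt spacer is a made-up placeholder, not a sequence from this document. The sketch also reports which systems share an identical repeat.

```python
# Direct repeats copied from SEQ ID NO.10~15 (written as DNA).
DIRECT_REPEATS = {
    "28c2": "GAATTTCTACTGTTGTAGAT",
    "28c6": "AAATTTCTACTTCTGTAGAT",
    "28c12": "TAATTTCTACTATTGTAGAT",
    "28c13": "AATTTCTACTATGTGTAGAT",
    "28c15": "AAATTTCTACTGTTGTAGAT",
    "30c9": "TAATTTCTACTATTGTAGAT",
}

# Hypothetical 23-nt spacer used only for illustration.
EXAMPLE_SPACER = "ACGTACGTACGTACGTACGTACG"


def to_rna(dna: str) -> str:
    """Write a DNA sequence in RNA letters (T becomes U)."""
    return dna.upper().replace("T", "U")


def mature_crrna(repeat_dna: str, spacer_dna: str) -> str:
    """Mature crRNA layout described in the text: 5' repeat, then 3' spacer."""
    return to_rna(repeat_dna) + to_rna(spacer_dna)


def shared_repeats(repeats: dict) -> list:
    """Group the names of systems whose direct repeats are identical."""
    by_sequence = {}
    for name, seq in repeats.items():
        by_sequence.setdefault(seq, []).append(name)
    return [names for names in by_sequence.values() if len(names) > 1]


if __name__ == "__main__":
    print(mature_crrna(DIRECT_REPEATS["28c2"], EXAMPLE_SPACER))
    print(shared_repeats(DIRECT_REPEATS))
```

With a 20-nt repeat and a 23-nt spacer, the assembled crRNA is 43 nt long.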
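Example 2: Predicting the secondary structure of the crRNA with which the gene editing systems recognize target sequences
This example predicts the secondary structure of the crRNA used by the six type V CRISPR/Cas12a gene editing systems of Example 1 to recognize target sequences.
The procedure was as follows:
NUPACK was used to model the behaviour of the repeat sequence in vitro at 37 °C and to predict its secondary structure, giving the secondary structure of the mature crRNA. Under the action of the Cpf1 nuclease, the sequence upstream of the repeat is removed from the pre-crRNA, leaving a 20-nt repeat that, together with a 23-nt spacer, forms the mature crRNA. The crRNA associates with the Cpf1 protein to form the crRNA-Cpf1 complex. The complex first scans for a suitable PAM and then continues scanning for a DNA sequence complementary to the spacer; once such a sequence is found, the nuclease activity of Cpf1 is activated. The results are shown in Figure 2.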
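To give a feel for what a secondary-structure prediction looks for in the repeat, the sketch below searches a sequence for its longest hairpin stem. It is a simple base-pairing heuristic and not a substitute for the thermodynamic prediction used in this example. It assumes Watson-Crick pairs only (no G-U wobble) and a minimum loop of 3 nt, and all function names are hypothetical.

```python
# Watson-Crick pairs only; G-U wobble pairs are deliberately ignored here.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def longest_hairpin(rna: str, min_loop: int = 3) -> tuple:
    """Return (i, j, k): the outermost pair (i, j) and length k of the
    longest uninterrupted stem that leaves at least min_loop unpaired
    bases in the loop."""
    best = (0, 0, 0)
    n = len(rna)
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            k = 0
            # Extend the stem inward while bases pair and the loop stays open.
            while (i + k + min_loop < j - k
                   and (rna[i + k], rna[j - k]) in PAIRS):
                k += 1
            if k > best[2]:
                best = (i, j, k)
    return best


def hairpin_parts(dna_or_rna: str) -> tuple:
    """Return (5' arm, loop, 3' arm) of the longest hairpin."""
    rna = dna_or_rna.upper().replace("T", "U")
    i, j, k = longest_hairpin(rna)
    return rna[i:i + k], rna[i + k:j - k + 1], rna[j - k + 1:j + 1]


if __name__ == "__main__":
    # Direct repeat of 28c2 (SEQ ID NO.10).
    print(hairpin_parts("GAATTTCTACTGTTGTAGAT"))
```

For the 28c2 repeat, this heuristic finds a 5-bp stem (UCUAC paired with GUAGA) closed by a 4-nt loop.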
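Example 3: In vitro PAM depletion assay
This example uses an in vitro PAM depletion assay to identify the PAM sequences that the Cas nucleases of the six type V CRISPR/Cas12a gene editing systems of Example 1 require in order to recognize the spacer.
The procedure was as follows:
(1) For each of the six type V CRISPR/Cas12a gene editing systems of Example 1, the nucleotide sequence encoding the Cas protein was inserted into the pSUMO protein expression vector by homologous recombination. Sequence-verified recombinant plasmids were transformed into E. coli Rosetta 2 competent cells (TransGen, CD811-02), which were recovered and plated on kanamycin plates (50 μg/mL). Single colonies were picked the next day and grown in large-scale culture, and the recombinant protein was purified by Ni-column affinity chromatography followed by size-exclusion chromatography and stored at -80 °C until use.
The nucleotide sequences of 28c2, 28c6, 28c12, 28c13, 28c15, and 30c9 are shown in SEQ ID NO.16~21, respectively.
(2) Six random bases, NNNNNN (N = A, G, C, or T; 4096 inserts in total), were added at the 3' end of the library spacer (SEQ ID NO.22), and the library was cloned into the backbone vector by overlap PCR. This gave a Spacer-PAM plasmid pool with 4096 different PAM combinations and an identical spacer at the 5' end. Next-generation sequencing gave a Gini value below 0.1 for the abundance of the random bases at the six positions, indicating that the random bases were fairly evenly distributed.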
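The library size and the uniformity check in step (2) can be sketched as follows. The code enumerates all 4^6 = 4096 six-base combinations and computes a Gini coefficient from per-sequence read counts using the standard sorted-rank formula. The function names and read counts are hypothetical, and the source does not state which Gini implementation was used.

```python
from itertools import product


def pam_library(n_random: int = 6, alphabet: str = "ACGT") -> list:
    """Enumerate every possible randomized PAM of the given length."""
    return ["".join(bases) for bases in product(alphabet, repeat=n_random)]


def gini(counts) -> float:
    """Gini coefficient of non-negative counts: 0 means perfectly even,
    values near 1 mean a few sequences dominate the pool."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("counts must be non-empty with a positive sum")
    weighted = sum((rank + 1) * value for rank, value in enumerate(values))
    return (2 * weighted) / (n * total) - (n + 1) / n


if __name__ == "__main__":
    library = pam_library()
    print(len(library))
    # Hypothetical perfectly even read counts, one entry per PAM.
    print(gini([50] * len(library)))
```

A Gini value below 0.1, as reported in step (2), corresponds to a pool in which read counts are close to even across the 4096 members.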
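(3) Using the repeat sequences of the type V CRISPR/Cas12a gene editing systems described in Example 2 and the corresponding library spacer, templates of the form 5'-T7 promoter + repeat + spacer-3' were constructed, and crRNA was obtained by in vitro transcription (NEB, E2040S) and RNA purification (NEB, T2040S).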
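The template layout in step (3) can be written out as a short sketch. The T7 promoter sequence below is the commonly used minimal promoter and is an assumption, since the source does not give the promoter sequence. The spacer is a hypothetical placeholder because SEQ ID NO.22 is not reproduced here, and the function name is illustrative.

```python
# Assumption: commonly used minimal T7 promoter; not specified in the source.
T7_PROMOTER = "TAATACGACTCACTATAG"


def crrna_template(repeat: str, spacer: str,
                   promoter: str = T7_PROMOTER) -> str:
    """Build a 5'-promoter + repeat + spacer-3' DNA template (sense strand)."""
    parts = {"promoter": promoter, "repeat": repeat, "spacer": spacer}
    for name, seq in parts.items():
        if not seq or set(seq.upper()) - set("ACGT"):
            raise ValueError(f"{name} must be a non-empty A/C/G/T sequence")
    return promoter.upper() + repeat.upper() + spacer.upper()


if __name__ == "__main__":
    repeat_30c9 = "TAATTTCTACTATTGTAGAT"          # SEQ ID NO.15
    placeholder_spacer = "ACGTACGTACGTACGTACGTACG"  # hypothetical 23-nt spacer
    print(crrna_template(repeat_30c9, placeholder_spacer))
```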
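(4) 10 pmol of the purified Cpf1 protein from step (1) was mixed with 10 pmol of crRNA and incubated at room temperature to form the protein-crRNA complex. The complex was then mixed thoroughly with 200 ng of the Spacer-PAM plasmid pool and incubated at 37 °C for 30 min, during which the Cpf1 protein recognized suitable PAMs among the 4096 PAM combinations and cleaved the spacer complementary to the crRNA. An appropriate amount of proteinase K was added and the mixture was incubated at room temperature for 15 min to digest the remaining Cpf1 protein, followed by heating at 98 °C for 10 min to inactivate the proteinase K.
(5) Primers flanking the random bases on the Spacer-PAM plasmid pool were designed to PCR-amplify the region containing the spacer and PAM combinations, and the products were purified. Adapters were added to both ends of the products for next-generation sequencing (commercial Illumina sequencing adapter primers: Hieff NGS384 Dual Index Primer Kit for Set1, cat. no. 12613ES02; I5 primer: TAAGATTA; I7 primer: GAGATTCC). Taking the PAM depletion threshold of the negative control group as the baseline, depletion of the six random bases was analyzed with WebLogo 3, and the PAM sequence recognized by each Cpf1 protein was obtained by negative selection.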
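The negative-selection logic of step (5) can be sketched as follows: a PAM that the nuclease recognizes is cleaved and therefore under-represented among the amplified reads relative to the negative control. The sketch scores each PAM by the log2 ratio of its read fraction in the treated sample to that in the control. The read counts, the pseudocount, and the cutoff of -2 are hypothetical, because the source derives its threshold from the negative control group and visualizes the result with WebLogo 3.

```python
import math


def read_fractions(counts: dict, pams: list, pseudocount: float) -> dict:
    """Fraction of reads per PAM, with a pseudocount to avoid zeros."""
    total = sum(counts.get(pam, 0) + pseudocount for pam in pams)
    return {pam: (counts.get(pam, 0) + pseudocount) / total for pam in pams}


def depletion_log2(control: dict, treated: dict,
                   pseudocount: float = 1.0) -> dict:
    """log2(treated fraction / control fraction); negative means depleted."""
    pams = sorted(set(control) | set(treated))
    f_control = read_fractions(control, pams, pseudocount)
    f_treated = read_fractions(treated, pams, pseudocount)
    return {pam: math.log2(f_treated[pam] / f_control[pam]) for pam in pams}


def depleted_pams(control: dict, treated: dict,
                  threshold: float = -2.0, pseudocount: float = 1.0) -> list:
    """PAMs whose depletion score is at or below the (hypothetical) cutoff."""
    scores = depletion_log2(control, treated, pseudocount)
    return sorted(pam for pam, score in scores.items() if score <= threshold)


if __name__ == "__main__":
    # Hypothetical toy read counts for four of the 4096 PAMs.
    control = {"TTTAAA": 100, "CCCAAA": 100, "GGGAAA": 100, "AAAAAA": 100}
    treated = {"TTTAAA": 5, "CCCAAA": 100, "GGGAAA": 100, "AAAAAA": 100}
    print(depleted_pams(control, treated))
```

Comparing read fractions rather than raw counts keeps the score independent of sequencing depth, which usually differs between the treated and control samples.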
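The PAM analysis results are shown in Figure 3. The six gene editing systems recognize a variety of PAM sequences: 28c2, 28c6, 28c12, and 28c15 recognize 3-base PAMs, whereas 30c9 and 28c13 need to recognize a PAM of only 2 bases or even 1 base. Simpler PAMs occur more frequently in a genome, so the novel Cpf1 proteins of the invention can target a broader range of sites, which also greatly increases the likelihood that a specific gene locus can be edited precisely.
In particular, most Cpf1 proteins in the prior art recognize T-rich PAMs, which limits their use on gene sequences with low T content. The 30c9 protein of the invention recognizes a C-rich PAM and shows editing activity both in vitro and in eukaryotic cells. On the one hand, this enriches the PAM diversity of existing Cpf1 proteins, and 30c9 can serve as an ideal tool for editing gene sequences with high GC content. On the other hand, the protein sequence and editing characteristics of the 30c9 nuclease can serve as a reference for discovering more novel Cpf1 nucleases with C-rich PAMs, expanding the existing range of gene editing tools.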
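The statement that simpler PAMs occur more often in a genome can be made concrete with a small calculation. The sketch assumes a random sequence with equal base frequencies, in which a fully specified k-base PAM is expected at one position in 4^k. Real genomes deviate from this, and the PAM strings in the example are hypothetical, since the actual PAMs are given only in Figure 3.

```python
def expected_sites(sequence_length: int, pam_length: int) -> float:
    """Expected number of PAM occurrences on one strand of a random
    sequence with equal base frequencies (an idealized assumption)."""
    return sequence_length / 4 ** pam_length


def count_sites(sequence: str, pam: str) -> int:
    """Count occurrences of a PAM in a sequence, overlaps included."""
    sequence, pam = sequence.upper(), pam.upper()
    width = len(pam)
    return sum(1 for i in range(len(sequence) - width + 1)
               if sequence[i:i + width] == pam)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, expected_sites(4096, k))
    # Hypothetical PAMs counted on a toy sequence.
    print(count_sites("ACCCATTTTTGCC", "CC"), count_sites("ACCCATTTTTGCC", "TTT"))
```

Under this idealization, each base removed from the PAM requirement multiplies the expected number of candidate sites by four.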
The sequences referred to above are as follows:
SEQ ID NO.16 (28c2 gene sequence):
ATGGTGAACGGCACCAAGAACTACTTCGACTGTTTCACCGGGTTCTACCCCATCAACAAGACCCTGCGGTTCGAGCTGAAGCCGATCGGGAAAACCAACGCCCTCATCGACGAGTTCAAGAAGGGCTACGTGGACTCCATCGTGAGCCTGGACGAGAAGCGGGCCGAGTCCAGGAAGAAAGTGATCGAGGTGCTGGACAACTACTATGAGTACTTCATCAACTGCGTGCTGAGCAAGGAGGTCCTGCTGGTGAACGACATCAACGAGGCCTACAAGCTATACAAGGACTTCAAGGCCGACAAGAAGGACAAGAACTTCAAGTCCTATAAGGTGAAGATGAGGACCAAGATCTCCGAGAAGTTCCAGTCCGAGAAGATCAAGTTCGCCCTGAAAGACTACAAGGACCTCTTCGGCAAGAAGCGCCTGCAGGAGTCCCTGCTGTACGAGTGGTACAAGCAGAAGCTGAACAACGAGGAGATCAACAACGAGGCCTTTGAGGACATCGTGAAAACCCTGAGCTACTTCATCGGCTTCACCACCAGCCTGAAGGACTACCAGGAGAACAGGAACAACTTCTTCGTGCCCGACGAGAAGAGCACCTCCATCGCTTACCGCATCATCGACGAGAACATGATCCGGTACTTCGATAACTGCATCCGGTTCGAGACCTTCATCGAGAATAAGATTGACCTGTTTGAGAGCCTGAAGCAGTGGGAGGAGTACTTTAAGCCCGAGAATTACATCAAGTACTTTACACAGGACGGGATCGACAACTACAACCAGATCATCGGGCGGAAGGGGAAGGACATCTACTCCAAGGGAATCAACCAACTGATCAACGAGTACCGGCAGATTAACAAGATCAAAAATAAGAACCTGCCGACCATGAATCAGCTCTACAAGCAGCTCCTGAGCAAGCACAACAGCGAAGAGCTGATCGTCGGCTTCAAGGACGAGAAGGACATGCTGCAGAAGATCGAGACCACTTACTACGAGTACTCCGAAATCGTGTCCAAGCTGGTGAGCTTCCTGAGCGAGTCCCTGGCCGACGACATCAACCTTTACATCCGCTCCGACAGCCTGACTAATCTGAGCAACAGTATGTTTGGCC
GCTGGGACTTTATCAACGACGCCATCTACTCTTACACCAGCGGATTCTCTGAGAAAGACAAGCTGAAGTA
CGAGAAGGACGTTAAGGAAGTGATCAGCCTCGTGAAGCTGCAGAAGGTTATCGACACCTATGTGAGCAG
CCTCGATATAGACGAGAAGGGGAAGTACATCTCCAATTCAAGTATCTACAAGTACCTGCTGTCCATCAAT
GACCTGAACCTGAAGAACGCCTACTCCGAGGCAAAGCCTATCCTCGTTCTCAACGAGATCGATAACGAGA
GGACAAATGACAGCGAGCGCATCCAGCAGATCAATAAGATCAAGTCCCTGCTGGACGCCATGCTGGAGA
TTATGCACTTCTATAAGCCCCTGTACCTGTATAAGAACGGCAAGAGCCTCGTCGAGGTGGAGAAGGACGA
GGTGTTCTATTCCGAGTTTGACTACCTCTACAGCCAGCTCATGCCAATCACAAAACTGTACGATAAGGTGC
GGAACCACATCACAAAGAAGCCCTACAGCAAGGACAAGTTCAAGATCTACTTCAATAAGCCCACTCTCCT
CGATGGCTGGGACCTTAATAAGGAAAACTCAAACTTGGGGGTGCTGCTTACCAAGAACAACAACTACTAC
CTGGGCATCATGAACGGGAAGTATAACACTTCCTTCGATACAACAGTGGCCGAGGTGAAAAACCAGATTA
ACGAGAGCTCTACAGCTATCGGGTATCTGAAGATGGAGTACAAGCAGGTCTCCGGGGCCAACAAGATGTT
CCCTAAGGTGTTCTTCGCCGAGTCCAATAAGCACATCTACAAGCCCTCCAAGGAGATACTGAACATCAGA
GAGAACAAGCTCTACACTAAGGGCGCTGACGATGTGGAGTCTCGCATCAAGTGGATTGACTTCTGCAAGC
ACTGTATCAAGCTGCACCCTGAGTGGAACAAATACTTCAACTTCAAGTTCAAGCCCACCACCGAGTACGA
GGACGTTAACACATTTTATGAAGATGCTGACGCCCAGATGTATAACGTGTCTTTTATCTCTTTCAACGAGA
GTTACATCAACGAGCTCGTCAATGAGGGGAAACTGTACCTGTTTCAGATCTATAATAAGGATTTTTCCCCA
AACAGCAAGGGCAAGCCAAATCTGCACACCATGTATTGGAAGATGATCTTCGAGGATAGCAATATTACTA
ACATCAACAATACCGGCCTCCCAGTGTTTAAGCTGAACGGCGAGGCCGAGATCTTCTACCGCAAGGCCAG
CCTGAATAAGAAGGTGACACACGAGAAGAACTTGCCCATCAAGAACAAGAACCGCAACAACCCCAAGGA
GGAGAGCATCTTCTCCTACGACCTCTACAAGGACAAGCGCTTCATGGCCGACAAGTTTTTCCTGCACTGTC
CTATCACCATCAACTATCGGACAAAGCCCCTCAGCAGTAGCGAGTTTAACAAGAAAATCAATTGCATCGT
GGAGAATAATAAGGACATCAGCATCCTGGGCGTGGATAGAGGCGAGCGCCATCTGCTGTACTATTCCCTG
ATCAATCAGAAGGGGGAGATCCTGAAGCAGGGCAGCCTGAACTCCCTTAGCACAAGTTACGAGCGTGAC
GGCCAGGAAATCAGCGTGCTCACCGACTACAACTCCATCCTGCAGGGCAGGGAGGACGAGCGCGACGAT
GCTAGGAAAAACTGGGGGACCATCCAGAATATCAAAGAGATCAAAGACGGCTACATGTCCCACATTGTG
CACCAACTGAGTAAGATCCTCATTGACAACAACGCCGTGCTCGTGCTCGAAAACCTGAACAGCGGCTTTA
AGCGGGGCCGGTTCAAGATCGAGAAGCAGGTCTACCAGAAGTTTGAGAAGGCCATGATCGAGAAGCTGA
ACTACCTAGTCTTTAAGGACCGGAACAGCACCAGCCCAGGCTACTATCTGAACGGCTACCAGCTCACCGC
CCCGTTCGAGGGCTTCAAGAACCTGTATAGCCAGAGTGGCATCATCTACTACGTGTGGCCATCCTACACCT
CTAAGATCTGTCCACGCACCGGCTTTGTCAACCTCCTGAAGCTGAATTACGAGAACATCGAGAAGTCCAA
GGAGATCTTTAACAACTTTGACATCATCTCCTACAATAAGGCAAAGGACTATTTCGAGTTTGGCCTCGACT
ATCGCAGATTTGGGAAGGACGCAGGCAAGTCAAAGTGGCTGATCTGCACCTATGGAAATGAGAGGTACTT
CTACAACAGCAAGCTGAAGAAGTTCGAGTGCATCGACATCACCAACAAGATCAAGGAGTTGTTTAAGTCC
AACAACATCGACTACCTGAACGAGAAGGACCTGCGGAACAAGATCACCAACGTGAACAGCAAAGATTTC
TTCAACTCCCTGCTGTTCTACCTGCGCATCACCCTGCAGCTCCGCTACACCAATGGGGGAAACCTGGATGA
GAACGACTATATCCTGAGCCCCATCAACGACGGATCTGATAAGTTCTTCGACTCCCGGTGCGCCTCCGAG
AGCGAGCCTAAGAACTGCGACGCCAACGGGGCCTACCACATCGCTCTGAAGGGCCTGCGTCTGATCCACA
GCATCGAGGACGGCACTACCAGCAAAATCGGCAATGAAACCACCGATTGGTTCACCTTCGCCCAGAACAAGAACAAGCTGGTGGAG;
SEQ ID NO.17 (28c6 gene sequence):
ATGAGCAAGGGCAAGATCTGGGAGAACTTCATCAACCAGTATAGCGTGAGCAAGACCCTGAGGTTCGAGCTGAAGCCCGTGGGCAAGACCCTGGAGAACATTAACGCTAAGGGGCTGATTGAGGAGGACGAGCAGCGGGCCGAGGATTACAAGAAGGCTAAGAAGATCATCGATGAGTACCATAAGTACTTTATCGAGGGGGCTCTGGGAAGCTGCAGCCTGGACCTGAACATCCTGAACGAGTTTCTGCAGCTCTACAACAAGGCCCAGAAAACCGACGCCGACAAGAAGGAGTACGAGAAGATCCAGACCACCCTGCGGAAGAATATCGCCGAGAGCTTTGGCAAGAACGCCGATAAAAAGACCAAGGAGCAGTATGAGAACCTGTTCAAAAAGGAGCTCCTGCGGAACGATCTGCCTGACTGGGTGGAGGACGAGGAGGACGCCAAAATCATCGAGCGCTTCAAGACTTTCACCACCTATTTTACCGGGTTCCACGAGAACAGGAAGAACATCTACGACAACGAGGAGAAGTCCACCGCCATTGGGTATCGGATCGTCCACGAGAACCTCCCCAAGTTCATTGACAATATGAACGCTTTCGAGAAGATCAGCAAGGCCCTGGATCTGTCCGAGATCGACCGGGACTTCCAGAGCGAGCTGGGGGAGATCAAGGCCGAGGAGTTCTTTACCATTGAGTTCTTCAACCAGTGTCTGAACCAGTTCGGCATCGATCGCTACAATACTCTGCTCGGCGGCATCTCCGAGGGCGAGAATATCAAGAAGAAGCAGGGGCTGAATGAGAGGATCAACCTGTATAACCAGCAGTTGAAGGGAGAGAGGAAGAAGGAGAGGCTGCCCAAGCTGAAGGTGCTCTACAAGCAGATTCTCAGCGACAGCTCCAGCCACTCCTTTAGCATCGACGAGTTCGAGAACGACAACGAGCTGCTGGAGTCCCTGGAAATCTTTTACAAGAATGAGCTGATCGGCTTTAATCACAGCGGCGTGGACTCTAACATCTTTGACCTCGTGAAGGACCTGCTGCTGAAGATCGACGAGTCCGAGCAGTCCTCAATCTACCTGAAGAACGATAAGGGACTGACAGAGATCTCTCAGCGGATCTTTGGCGACTGGAACATTATCAAGAGCGCCCTGGAGGAGTACTATGACGAGCACTACCCTCCAAAGAAGGACACATTCAACAAGAAGGAGCTGGATGAGCGCTCACGGTGGCTGAAGGAGAACCACAG
CATCGGCGTCATCGAGAAGGCCTTGGCCAACTACGAGAACGAAATTGTGAGGGAGCATCTGAAACAGAA
CTCCGCCCCCATCGTGAGCTATTTCAAGTCCCTGGAGGTGGACGGCGAGAACCTGATCGATAAGATCTAC
AGCGCCTACGGCAACATCAGCGATCTCCTGAATAGCAGCTACCCTGACGAGAAGAAGCTGGTGAGCGATC
GGACCAGCAAGGACAAGATTAAGGTGTTCCTGGACAGCCTCATGTCCCTGCTGCACTTTCTCAAGCCTCTG
GACGTTAAAGACCTGGGGAATAAGGACAGCGCATTTTACGGCGACTACGATTTTATCGTGGAGCAACTGT
CCAAGCTGGTGCGGCTCTACAATAAGACAAGGAATTATCTGACCAGAAAACCCTACAGCATCGAGAAAA
TCAAACTGAACTTCGAGAACAGCACCTTGCTGGCCGGATGGGATGTGAACAAGGAACGGGACAACAACT
GCGTGATCTTTAAGAGGCAGGACGGCGACCGCGAGCTGTTCTACCTGGGAATCATGGACAAATCCCACAA
TAAGATCTTCACTAAGATTGAAGAGGCTAAGTCCGACGATGTGTACCAGAAGATGAATTATAAGCTGCTG
CCAGGGCCTAACAAGATGCTGCCCAAGGTCTTTTTCTCTAAGAAATCCATCGACTTTTACGCACCTGGGGA
GGAACTGCTGAAGAACTACAAGAATGGGACCCATAAGAAGGGCGAAAACTTCAACCTCCAGCACTGCCA
CGAGCTGATTGACTTCTTTAAGCGGTCCATCAATAAGCACGAGGACTGGTCTCAGTTCAACTTCAAGTTTT
CTGACACCAGCGAGTACGAGGACACCTCCTTCTTCTTCAAGGAAGTGTCCCAGCAGGGCTACAGTATCAC
ATTCAAGAATATTGATAGGGAAACAATCGAGAAGTTCGTGGACGAGGGGAAGCTGTATCTGTTCCAGATC
TATAACAAAGATTTCAGCCCCAAGAGCAAGGGCAGACCCAACCTGCACACCCTGTACTGGAAGATGCTGT
TCGATGAGCGGAATCTGGCCAACACCGTGTACCAGCTCAATGGGGAGGCCGAGGTGTTTTACCGCAAGAA
GAGCATCAGCGAGAAAGATAGGGTGGTGCACAGGGCCGACGAGCCTATTGGCCTGAAGAACTCCGAGAA
CAGTGCCCAGAAGAGCCTTTTTCCTTATGACATCGTGAAGGATCGCCGGTTCACCGTGGACAAGTTTCAGT
TCCATGTGCCCATCACTCTGAACTTCAAGAGCGAGGGGAACGAGCGGCTGAATATTAGCGTGAACAAGTT
CCTGAAGGACAACCCCGACGTTAACATCATCGGCCTGGACAGAGGCGAGCGGCACCTGATCTACCTGACC
CTGATCAATCAGAAGGGTGAAATCCTTCACCAGGAGTCCCTGAACGAGGTCATGGGAGTGAACTACCAGC
AGAAGCTGCACAGAGTTGAGAAGGACAGGACAGAAGAGAGGCGGAACTGGGACCGGATCGAGAACATA
AAGGAGCTGAAGTCTGGATACCTGAGCCAGGTGGTCCATAAGATTAGCCAGCTCATGGTGGAGTACAATG
CCATCGTGGTCATGGAGGATCTGAATTTTGGCTTCAAGCGGGGCCGAATCAAGGTGGAGAAGCAGGTGTA
TCAGAAGTTCGAAAAGACCCTGATCGACAAGCTGAATTATCTGGTGTTCAAGGACCGGGAACCTGAAGAA
CCTGCCGGAGTGCTCAACGCCCTGCAGCTCACCAACAAATTTGAGTCCTTCAAGAAGCTGGGCAAGCAGT
GCGGCTTCCTGTTCTACGTGACAAGTGACTACACTAGCAAGATCGACCCCGCCACCGGCTTCGTCAACCTG
CTGTACCCTAAGTATGAGTCAGTGGAGAAGTCCCAGAACTTCTTCAGAAAATTCGACAACATCTGCTTCA
ACTCCGGCGCAGGCTACTTCGAGTTCGACTTCGACTACTCCAACTTCACCGATAGAGCCGATGGGACCCG
CACCCGCTGGAAGGTGTGCACCGTGGGCAACGAGAGGTTCGGCTACAATCCAAAGACCAAGGCCAGCGA
GACCGTGAATGTGACCGAGTCCCTGAAGGAGCTGCTGCTGCAGCACGAGATCGCCTTCGAGAATGGCGAA
TCTCTGGTGGAGTCCATCAGCAAGAACACTACCAAATACTTCCACAAGTCCCTGCTGAATTTTCTGAGGCT
GACCCTGACCCTGAGACATAGCAAGACCGGCACCGACATCGATTACATCCTGAGCCCTGTGGCCAACGAG
GAGGGCGTGTTCTTCGACTCCCGGAATGCCAGCGATAAGATGCCAAAGGACGCCGACGCCAACGGAGCC
TACAACGTGGCCCTGAAGGGCCTGATGGTGCTGGAGAGGATTAACGCCGCCGAGGACCTGAGCCAGTTCAAGTTTAAGGACATGAGCATCAAGAACAAGGACTGGCTGAAGTTCGTGCAGGACAGGCAGGGC;
SEQ ID NO.18 (28c12 gene sequence):
ATGATCGAGTACACCAACTTCATCGGCCTGTACCCCCTGTCCAAGACCCTGAGATTCAAGCTGCTGCCCATCGGCAAGACTCTGGAGAATATCACCCGCAACGGCATCCTGACAGATGACAAGCACCGCGCCCAGAGCTATCAGGAGGTGAAGAAGCTGATCGATGAGTACCACAAGGAGTTCATCGAGCACACCCTGGAGACCTTTAACCTGGAACTGCTTAGCACCAACAAGCAGAACTCCCTGGAGGAGTACCACCAGCTTTACCTGAAGGAGAAGAACGAGTCCGAGCTGAAGAACTTCACCAAGACACAGGAGAACCTGCGCAAGCAGATCGCCAAAACCCTGCAGAACGAGGCCAAGAAGGCTAGTCTGTTCGACAAGGATATGATTAAGAAGAACCTGCCCGACTTTATTCAGCAGCACCCCGACCTGAAGGACAAGGAAAACCTCGTGAAGGAGTTCGATGAGTTCACCACATACTTTACAGGCTTCCATGAGAACCGGAGGAACATGTATAGCGACGAGGAGAAGAGCACCGCCATCGGCTATCGGATTATCCACCAGAACCTGCCCAAGTTCATTGACAATATGATCGTCTTTAGCCGCATCCAGTCCGAGCTGCAGGGCGAGCTGAACCTGATCGCCGCTGACTTCAAGGACCTGCTGGTGGTCAACAACCTGGATGAGATGTTTACCCTGCCCTACTTCAACCAAGTGCTGACCCAGAGCCAGATCGACCTCTATAACATGGTAATTGGCGGGAAGAGCGAGGAGGGAAAGATTAAGAAGCAGGGACTGAACGAGTACATAAACCTGTATAACCAGAACCATAAGGAGCAGAAGCTGCCCCTGTTCAAGCCACTCTTCAAGCAGATCCTGAGCGATCGGCAGAGCCTGTCCTGGCTGCCCCAGCAGTTTGAGGAGGACCAGGAGCTGCTGAACGCCGTGAGGGAGTGCTTCTACTCCCTGAACGACTCCCAGTGCAACCTGAAGCACCTGCAGGCTCTGCTGGTTAGCCTGGCCGATTATAACCTGAATGGGATCTACCTGACCAATGGCCCCGCCATCACCACCATTAGCCAGCAGATGTTTAACGACTGGAACCTGATTAACCGCGCCATCATCGAGCGGATGAGCCGGGACATCAAGGCCAGCTCCAAGCAGAAGAGCGAGGCCAAACTGGAGGAGGAGATCAGGAAGCGGATGGACAGCACTGAGTCTTTCTCCATCCAGTACCTGAACGAATGCATCGAGACCAGCGAGATCGAGGACATCAAAAATGCCGCCGACAAGCGCATCGAAAGCGCCCACTTTGCCAGGCTGATGATCTGCAACAAGAAAACCAACGAGCAGGAGAATCTCTTCGAAAGGATCTACACCGCCTACAACGAGGCCCAGACCCTGCTGAATACCCCCTACCCAGAAAATCAGAATCTGATCCAGGACCAGGAGAACGTG
GCCCGGATCAAGTACCTGCTAGACACCGTAAAGGACCTCCAGCTTTTCGTTAAGCCACTGCTGGGGAAGG
GCTACGAAATCGGAAAGGATGACACCTTTTATGGTATACTGACCCGGCTGTGGACTGTGATCGACCAGCT
CACCCCCCTGTACGATAAGGTGCGAAATTACCTGACCCGCAAGCCTTACAGCGATAAGAAAATCAAGCTG
AATTTTAAGAACTCTACTCTGCTGAACGGCTGGGATAAAAATAAGGAGGCAGATAACACTGCCATCATCA
TGCGCAAGGAGGGACTGTTTTACCTGGGCATCATGAACAAGGACATTAAGGGGTATAAGAGGATGTTCGA
GAAGTGCCCTCAGTGCAGCGAGGAGGAGGCCTACTACGAGAAGATGGAGTACAAGCTCCTGCCTGGGCC
AAACAAGATGCTCCCTAAGGTGTTTTTCGCCAAGAACAACATTGAGCTGTTCAAACCCTCCGAGAGGATC
ATGGCAATCCGGGAGAACGAGACCTTTAAGAAAGGCGACAAGTTCAACCTCGCTGACTGCCACGCCTTCA
TCGACTTCTACAAGGAAAGCATCGCCAAACACCCCGAGTGGAAGGACTTTGACTTTCACTTTTCCGAAAC
CCAGCTCTACAATGACATTTCCGGGTTCTATCGCGAGGTGGAACACCAGGGATATAAGATGAGCTTTAGA
AAGATCCCAGCCACCTACATTGATCAGCTCGTGGAGAACAATGAACTGTACCTGTTCCAGATCTATAACA
AGGACTTTAGTGAATATAGCAAGGGCACCCCTAACATGCATACCCTGTACTGGAAGATGCTGTTTGACGA
GAGAAACCTGGCTGATGTTGTGTATAAGCTGAACGGCCAGGCTGAGCTGTTTTACCGACCCGCCAGCCTG
AACTACAACCGGCCCACTCACCCTAAGAACGAGCCCATCACCAACAAGAACAAGAACAACCCCAAAAAG
GAGTCTATCTTCAAGTACGACCTGACTAAGGATAAGCGGTACACCCAGGATACCTTCCTGCTGCACGTTCC
CATTACCCTGAACTTCAAAGGCACTAATAATGGCAATATCAACCAGCAAGTCAACAGCTACCTGCAGACT
GCTGATAATACACACATCATCGGCATCGACAGGGGCGAACGCCACCTGCTGTACCTCGTCGTCATCGACA
TGAAGGGGAACATCAAGGAGCAGTTCTCCCTGAATGAGATCGCCAACCAGAACAAGGGGATTGAGTACC
GGACAAACTACCACCAGCTCCTGGAGAACAGGGAGAAGGAGCGGGTGGAGGCACGGGTGAATTGGCAG
AACATCGAGAACATTAAGGACCTGAAGGAGGGCTACCTGAGCCAAGTGATCCACCTGATTACCCAGCTCA
TGCTGAAGTATCACGCCATCGTGGTGCTCGAAGATCTCAACTTTGGCTTCATGAAGGGGAGACAGAAGGT
GGAGAAGTCCGTGTACCAGAAGTTCGAGAAGCAGCTCATCGATAAACTGAACTATCTCGTGAATAAGCAG
ATCGACGCCGAGAAGCCTGGAGGCCTGCTCAAGGCCTACCAGCTCGCCAAGCCTTTTGAGAGCTTTCAGA
AGATGGGCAAGCAGTCCGGCTTCCTGTTCTACATCCCCGCTTGGATGACATCCAAGATCGATCCTGTGACC
GGCTTCGTCAATCTGCTGAACACCAACTACGTCAACGTTAAGGAGTCCCAGAAGTTTTTCAGCAACTTCGA
CCGGATCGCCTACAATCCAGAGAAGGACTGGCTGGAGTGGGATATTGACTACAATAAGTTCACCACTAAG
GCCAAGAATAGCAGGCACAACTGGACCATCTGTACCCAGGGCGAGCGGATCGAGAATCACAGGAATGAG
AAGAACGGCCAGTGGAACAGCCAGAACGTCAACCTGACCGAGGAGTTTAAGAAGCTGTTCGCACTCTAT
GACATCGACCTGGCCCAGGATCTGAAGAAGTACATCATCCAGCAGAATGACGCTAAGTTCTTTAAAGAGC
TGCACAGAATCCTGAAGCTGACCCTGCAGATGAGGAACTCCCAGATCAACAGCGACATTGACTACCTCGT
GAGCCCCGTGGCCAACGCCGAGGGCTGCTTCTACAATTCCCAGACCGCTAACGCCACCCTGCCAGCCAAC
GCCGACGCCAACGGGGCCTACAATATCGCCCGCAAGGGCCTGTACCTGCTGCAGCAGATCAAGAAGGCC
CCTGACCTGGCCAAGCTGAAGCTCACCATCTCTAACGAGGAGTGGCTGAAGTTCGCCCAGGAGAAAACCTACCAGAATGAC;
SEQ ID NO.19 (28c13 gene sequence):
ATGTTTAACCAGTTCACCAACCTGTACCCAGTGATTAAGACCCTGAGATTCGAGCTGAAGAGCATCGGCAACACTATGGACACTATCGAGAGCAATCAGGTCATCCACAATGACGAGAAGAGGGCCGACGCCTACGCCAAGCTGAAGGTGACCCTCGATGCCTACCACAAGGATATTATTGAGAAGGTGCTGAGCCGCGCCAGACTGACCGGCCTGGAGGACTACGCCATCGCTGTGAACAACCTGAAAACCTCTAAGGGCAACGCCGCTTACGGCAAAGAGCTGACCAAGAACAAGGAGCAGTTGAGAAAGCAGATCGCAGGATTCTTCAAGCAGCCCGAGTTCGCCCCAATTTTCAAAGATCTGTTCAAGGAGGGCGTGATCAAGAAAGACGTTAAGGCCTGGATCGACACCCAGCCTAACCCTAGCGATTACTTCTACTCCGATGACTTCGCCAATTTCACCGGCTACTTCGGCAACTATAACCTGATCCGGCAGAACCTGTATAGCCCTGAGGCTAAGCACGGCACCATCGCCTATCGGCTGATTGACGAGAACCTGCCCAAGTTCATCGACAATCTGAGCATTCTGCAGAACATTCAGAATAAGAATCCCGACCTGTTCGACCAGTTGAGCGACCAGTACCAGCAGTACTTCAGCGAGCTGCTGCCTTCTAAGCCTACACTGGCCGACTTCGTGAGCCTGGACACCTTCAATGATCTGCTGACCCAGAAAGGCCTGGACGCCTACCAGCAGATCATCGGCGGCATCAAGACTGAGAACCAACTGATCCAGGGCATTAATGTGCTGATCAATCTGCACAACCAGCAGCACCCCGAGCAGAGCAAGACCCCCAAACTGAAGCCCCTCTATAAGCAGCTCCTGTCCGACCGCGGCACTTTCAAGCTCCCACGGAAGTTTGAGGATGACGCTGAAATGATCCAGGCCAACCGCCAGTACTTCGAGGAGGTGCTGGGCAACAACACTCTGTTCGAGACCGGCGAAACACCCACCGAAGCCATGAACCAGCTTTTCCTGAGCATCGAGAATTACGATCTGAGCAAGATCTTCATCGAGTCCCCCCTGCTGGTGACCTCCATCTCCCAGAAGATCTATGGCTCCTATGCCGTGATTCCCCAGGCCCTGGAGTACTACCACGATAATCACGTTAACCCCTCTTACGCCGCCAAGTTCAATAAGGCCAAGTCCGACAAGAGCAGGGAGACTATGGAAAAGGCCAAAGCCGCCTGGGTGAAAGGCGTGCACGCCGTGAGTGTGATCCACCAGGCTGTGATCGCATACAATGATGTGCTGCCTGATGACGCAAAGCTGACAGATACCCAGCCCGTGATTAGCTACTACAAGGACATCCAGTACTCCGAAAAGACTGGCGAGTCCCAGCAGATCTTCGATGCCCTGATGCGCCGCTACCACCAGGCCAAAGGCATGCTGAATACTGATTACCCAAAGGGCTCCAAGCAGATCCTGAACAACAAGTCTAGCTTCGCCATCGTGAAAAACCTGCTGGATGTGTCCAAGGCCTACGTGAACGCCGCCCGCGATCTGACAATCAAAAAGCCCGAAGGCCTTGACCTGGACCTGCTGTT
CTACGAGAGGCTCGCCAAAACTTACACATACCTGCAGGACCTGCACGCACTGTACGACACCACGAGAAAC
TACGTGACCCAGAAACCTTTCTCCACCGATAAGATCAAGCTGAATTTTGACTGCGCTCAGCTCCTGGCCGG
GTGGGACTTTAATGTGATCGATGCCAAGAGGGGCGTGTTTCTGGTCAAGAATGGGCGGTATTACCTCGTC
ATCATCGATAATAAGCATAAGAAGGCCATGAATAACCTGCCCGCTCCTATCACTAATAACTGCTACGACA
AATATAACATGAGACTGAGTAAGGACGCCCACATGGCCCTGCCTAAAAAGCTCTTTACCAAGGATAACCT
CAAGATCCCTGCCATTGCCGAGATGGAGCGCAGGTGTCGGGACAAAAATGGCGGCCACCACCTGAGGAA
GAGTCCCGACTTTGATAAGGACTTTATGCACCAGATGATTGACACCTTTAAGGACATTATCAAGAAGGAC
AAGGACTTCGACGTTTTCGGCTTCCAGTTTAAGCCCACTCACCAGTACGAGGACATCAATGAGTTTTACGC
CGACTTCAATGAGCAGGCCTTAGTGACTTGGTACGATAAGGTTGATAGCGATGTGATTGATAGCCTGGTG
GCCGAGGGGAAGATCTACCTGTTCGAAGTGTACTCCAAAGATTTTAGCGACAAGAGTACCGGGACTCCCA
ACCAGCAGAGCCTGATCCTGCAGTACCTGTTCTCTCAGGATAATCTGGCCAAAAGGCACTTTAAGCTGAA
CGGCGAAGCCGAAGTGTTCTACCGGAAGGCCTCTATTGATAAGGACAAGGCCGTGGTGCATAAGAAGGG
CTCCCTGCTGGAGAACAAAAACCCTGCACGGCCCAATTCTAAGATCGCTAAGTTCGACATTGTGAAGGAT
AGACACTACACCGAAGATAAGCTGTTCCTGCATATCCCAATCACACTGAACAACAATGCCGCCGACATGA
AATCCTACGCTATGAATAGCAAGGTGCTGAACACCCTGAAAACAAACGGAGGCGTGAACGTGATCGGCA
TTGACAGAGGGGAAAGAAATCTGCTGAAGATCACCGTGATTAATAGTGCCGGGGAGATCTTGCATCAGG
AGTCCCTGAATAAGATCACTAGCGGGCAGGACATGGTGACTGATTACCATGAGCTTCTGGACAAGAAGGA
GCAGAGCCGCGCTGAGTCTAGGCTGAATTGGCAGGAGGTCGAATCCATTAAGGAGATCAAGCAGGGCTA
CCTGTCCCAGGTGGTGTATAGACTGTCCCAACTGATGCTGCAGTATAAAGCCATCGTGGTGCTGGAAGAT
CTGAATATCGGCTTTAAGCGCGGGAGGTTTAAGATCGAGAAACAGGTGTACCAGAATTTCGAGAAGGCCC
TCATCAACAAGTTAAATTACCTCGTGCTGAAGCAGTTGGAGGCTACCGAGGTGGGGGGCACTGCTCATGG
ATACCAGCTCACAGCCCCCTTTGAGAGCTTTCAGAAGCTGGGGAAGCAGTCTGGCTGGCTCTTTTACGTCC
CCGCCTGGAATACATCCCATATTGACCCCACCACAGGCTTCGTGAACCTGCACCACTTCAAATACGAGAG
CGTCGCCCAGGCAACAGACATCATCGACAAACTGAGCAATATCCGCTACAATCCAGAGAAGGACTACTTC
GAGTTCGCCATTGACTACAACGAGTTCACTTTTAAGGGGGGCGACAGCCAGAAGTACTGGGTGGTGTGCT
CAACCCCTTACAAGAGGTACGTGTTTGATAAAAAAGCCAACATGGGCAGAGGCGGCACCAAGGCCGTGG
ATGTGAACGCCGAGCTGAAGGCCCTCTTTGCAGCCCACGGCGTGGATTATGCAAGCGGAGAGGATCTGAG
GCCCCAGATTAAGGCCAAGGCCAACAAGGAGCTGCTGAGTCAACTGCTGTTTCTGCTGAAAACCCTGACC
GCCATGCGGTACACCAACGCCAGCTCCTACGAGGACTACATCCTGTCTCCAGTGGTGAATAAGGCCGGAG
AGTTCTTTGACAGCAGGAAGGGCGACGCCACCCTGCCACTGGACGCCGACTCTAACGGGTCCTACCACAT
CGCCCTGAAGGGACTGTGCCTGCTGCAGAGGGTGTACGACTGGCGCGGCGAGGAGTTTAAGGGCCTGGACCTGTTCATCTCCAATAATGACTGGCTGAAGTTCGCCCAGGACCGGCAC;
SEQ ID NO.20 (28c15 gene sequence):
ATGAGCAACACTAAGGACAACATCTTTAACAACTTCACCGGCATCTACCCCATCAACAAGACCCTGCGGTTCGAGCTGCGGCCCGTGGGCAAGACCTACGACCTGATCAAGGACTTCAAGAACGGGTACGTGGAGTCCATTGTGGCCATCGACGAGAAGCGGTCCGAGGCCCGGAAGCGGATCATCGAGATCATCGACGAGTACTACGAGGAGTTCATCAACACCGTGCTGAGCAAGAAGGTGTTCTACTCCGACGACATCTGGCAGACCTACACCAGCTACAAGGCCTACAAGAGTGACAAGCGGAACAAGGAGTTTGTCACACAAAAGGCCATCATGCGGAAGAAGATCAGCGATGCCTTCCAGAACGAGAAAACCAAGTTTAACCTGAAGGACTTCAAAGACCTGTTCGGCAAGAAGAGCAATCTGAAGGAGTCCCCCCTGTATAAGTGGTACAAGAACAAGCTGGACATCGGGGAGATCACGGGCGAGGATTTCGAGGACATCATCAAGATAATCACCTACTTCATCGGCTTCACCACCTCCCTGAAGGATTACCAGGAGAACCGGAACAACCTGTTCGTGGCCGAGGAGCAGAGCACCGCCATCAGCCACAGGATTATCGATGTGAACATGATTCGCTACTTCGAGAATTGTATCAGATTCGAGAATATGAAGGACTCCGAACTGCTGGAGGACATGGGGAAGTGGGAGAAGTACTTCGTGCCAGCTAACTACGACAATTTCTTCACTCAGGAGGGTATCGATAACTACAATGAGATTATTGGCCGGAAGTCCAAAGATCTCTACTATAAAGGCGTGAACCAGTTGATCAATGAGTATAGGCAGAAGAACAAGATCAAAAATAAGGATATGCCAACGATGAACCAGCTCTACAAACAGCACATCAGCAAGAACGGCGACAACGAAATCAACAACGACTTCTCCAACGAGAAAGAGATGCTGGAGCAGATCGAGCAAGCCTACATCACCAGCCTCGATAAGATCAATAGGATCGTGTCCTTCATCAATGAGAACATTACCGAAGGAAATAAGATCTTCATTAGGAAGGACTTCGTGACTAATATCAGTAACCGCCTGTTCGGGGAGTGGAACTTCATTAACAACGCCCTCTACAGCTACCTGAGCGGCCTGAGCGCAAAGAACAAGGAGCTGTTCGTGAAGCAGACAGAGGAGGTCATCAAGATCAGCGAGCTCCAGAACATCATCGACCTCTACATCAACAATCTGGATGAGGATGAGAAAGAGAAGTACCTCAAGACCGACGCCATCTACACCCACTTCTGCTCCTTCGATGTGTGCGGGGTGCAGAACGCATACTATGAGGCCAAGACCGTGCTCGCCGTGGACGAGATCAATAAGGACCGGGAGAAAGAGGAAGAGGGAGCCAAGCAGATTTCTAAGGTGAAGAAGCTGCTCGACGAGATCCTCGAAGCCGTCCACTTCTACAAGCCCCTTTACCTCTACAAGAACGGGAAGGAGATCGACGAGATTGAGAAGGATGAGATTTTCTACAGCGAGTTCGACTACCTGTATTCCCAGCTCATGCTGGTGACCGAGCTGTACGACAGGGTGCGCAACTACCTGACCAAGAAACCCTATAGCAAGGATAAATTCAAGATCTACTTTAACAAGCCTACACTGCTCGACGGCTGGGATCTGAACAAGGAGAAAAACAATCTGTCCGTGCTCCTCATCAAGGACGGCTTCTATTATCT
CGGCATCATGGACTCCAAGTACAATAGCGTGTTCGATGTGTCCGCAGACGATGTGAAGATCAACACCACC
GAGCTGTCCGAGGAGGCTACCTTCCTGAAGATGGAGTATAAGCAGGTGAGCGGAGCTTCCAAGATGTTCC
CCAAGGTGTTCTTCGCCGCCTCCAACAAGGACATGTTCAAGCCAAGCGAGGAGATTTTGAACATCCGGGA
GAATAAGCAGTACCTCAAGGGGGCCAATAACAGGGAGGCTGTAATCAAGTGGATCGATTTCTGCAAGGA
CTGTCTCAAGATCCATCCAGAATGGAACCGCTACTTTAACTTCAACTTCCGCCACAGCGACGAGTATGAG
AACGTGAATAGCTTCTATGAGGACGCCGATACTCAGATGTACTACATCAACTTCGTGAAGTTCAAGGAGA
CTTACATCAATGATCTGGTGGAGGAGGGGAAGCTGTTCCTGTTTCAGATCTACAACAAGGACTTCTCCGA
GTACTCCAAGGGCAAGCCCAACCTCCACACCGTGTATTGGAAGATGCTGTTCGACGAGAATAACGTGCGG
AACATCAATGACAATACCGGCAAGCCCGTGTTCAAGCTGAACGGCGAGGCTGAGATCTTTTATCGGAAGG
CCAGCCTGGATAAGAAGGTGACTCACAAGAAAAACTACCCTATCAAAAACAAGAATAAGCACAATAACA
AGACTGAGAGTATCTTTGAGTACGACCTCTACAAGGACAAGCGGTTCATGGATGACAAGTTCTTCTTCCAT
TGCCCCATCACCATCAACTACCGGGCCAAGAATATCCTGTCCAGCGAGTTCAATAAGAAGTTCAACTTGC
ACATCAAAAACAGCGATAACATGAACATTCTGGGCGTGGACAGAGGCGAAAGGCATCTGCTGTACTACTC
CCTGATCAACATTAAGGGAGGAATCATCAAGCAGGGGAGTCTGAACACCATCTACGATTCCTACGAAAAG
GACGGCATCAATATCCCCGTGATTACCGACTACAAGTCCATTCTGAAGGACCGCGAGGACGAGCGGATGG
ACTCCAGGAAGAACTGGGGCACCATCAAGAACATCAAGGAGATGAAGGAGGGCTATCTGAGCCATGTGG
TGCATCAGGTCAGCAAGCTCCTCATCGACAACAATGCCATCCTGGTCCTGGAGAACCTGAACAGCGGCTT
CAAGCGGCGCAGACTGAAGATCGAGAAGCAGGTGTACCAGAACTTCGAGAAAAGCCTGATCAACAAGCT
GAACTACCTCGTCCTGAAGGATGCCGATAACAAGGATGTGGGGCACTTCCTGAAGGGCTACCAGCTCACC
GCTCCTTTCGAGGGGTTCCAGCGCCTGAACAACCAGTCCGGCATCATCTACTACGTGTGGCCCAGCTATAC
CAGCAAGATCTGCCCCCGCACCGGTTTCGTGAGCCTCCTGCACATCAACTACGAGAACATCGAGAAGTCC
AAGGAGTTCTTTAACAAGTTTGACAAGATCTCATATAACAAGGACAAGGACTACTTCGAGTTCCACCTGG
ATTACACCCGGTTCGGGAAGAACGCTGGCAAGAACAAGTGGGTCATCTGCACTTACGGCAAGGATCGCTA
CTTCTTCAACCAGAAGCTGAAGAAGTACGAGTACATCGACATCACAGAGAAGATCAAGGAGCTGCTGAG
CAACAACGGGATCGACTTCATCAACGAGAACGACATGCGCAAGTCCATCGTGGAGAACAACTCCAAGAA
CTTCTTCGGCTCCCTGCTGTTTTACCTCAAGGTCGTGATGCAGTTGCGCTACACCAACAGCAACGACGGGT
GCCGGAATGAGAACGACTACATCCTGAGCCCCGTGGCCGACATTAACGGCATGTTCTTCGACTCCCGGCA
CGCCTGCGACAACGAGCCCGAGAACGCCGACGCCAACGGGGCCTACCACATCGCTCTGAAGGGCCTGCG
CATGATCCAGTTCATCGAGAACGGCGTGATCACCAAGCAGGGCAACGAGACCACCGACTGGTTCAAGTTCGCCCAGAATAAGCTG;
SEQ ID NO.21 (30c9 gene sequence):
ATGAGCGCCCAGAGCGCCCTGAGCACCCTGATCAACAAGTACAGCCTGAGCAAGACCCTGCGCTTCGAGCTGATCCCCATCGGCAAGACCAAGGAGAGCATCGACCGGAAAGGCCTGCTGAGCCAGGATGTGAAGCGAGCCCAGTCCTACAAGGAGGTGAAGAAGATCATCGACGAGTACCACAAGGAGTTCATCGAGAAGTCCCTGATCAACGCCAAGCTGAAGGGCCTCGAAGAGTTCAGCAAGCTGTACTACAAGCTGCAGAAGGAGGACAAGGATAAGAAGAATATCAAGAAGATGCAGGATAACCTGCGCGAGCAGATCTCCGACCTCTTCAAGAACAACAAAAAGGACAAGTGGAACATCCTGTTTAAGGAGGACCTGATCAAGAAGGAGCTGCCACTGTTTGCGAAGGATGATAAGCAGAAGAACCTGATCAATGAGTTCAACAAGTTCACCACATACTTCACCGGCTTCCACAAGAACCGGAAGAACATGTACGCCGAGGAAGAGAAGTCCACCTCTATTCCCTACCGGATCATTCACCAGAATCTGCCTAAGTTTCTGGATAACATCAGGATTTTCGAGAAGATTAAGAAGAACAAGATCAACACTGACGTAATCGAGAAGGAGCTGAGTCTGTTCCTGAACGGAATCAAGATCAACGATATTTTCAGCATTAACTTTTTCAACGATGTGCTGAACCAGAAGGGCATCACCTTCTATAACACCATCCTGGGCGGAGTGAGCGAGAAGGACCGCACCAAGATCAAGGGCATTAATGAGTATGTGAACACCGAGTACAACCAGAAGCAACTGGACAAGAAGAGCAAGATCCCCAAGCTGAAGCAGCTCTACAAGCAGATCCTGAGCGACACCGAGACCGCCAGCTTCGTGCTGGAGCAGTTCGAGAACGACAACCAGCTCCTGGAGAAGATCGAGCAGTTCTACAACACAGAGCTCATCAATTACGAGACCGAGGGCAAGACCCAGTCCGTGTTCCTGCAGTTTGAGCAACTGTTTAAAAACATGCAGAATTACGACGCCTCCAAGATCTACATTAGCAATCTCTCCATCGCTAACATCAGCAAGATCATCTTCGGCGACTGGTCCATCATCTGCAACGCCCTGGCCGAGTGGTACGACAAGCACAACACAAAGGGGAAGAAGATTAACGAGTATAAGAAGGAAAACTTCCTGAAGCAGGATTTCAGCATCCAGCAGATTGAGGACGCCGTGCTGGAGTACAAGAACGACACCTTGAACAAGGAGATCAACTTCCTCCTGAACTACTTCGCCAGCTTCCTCAACGAGAAGTCCAAGAAAAACATCATCCAGCGCATCGAGACCGAGTACTCCAAGGTGAAGGACCTCCTGAACACCGATTACCCCGAGAAGAAGAAGCTGGCCAGCGACAAGGACAACGTGAGCAAGATCAAGGCCTTCCTGGACTCGCTGATGGACTTTCTGCACTTCGTGAAACCCTTCAATATTAAGAAGGACACAGGGCTGGAGAAGGAGGAGAACTTCTACTCCATCTACGTGCCCCTGTTCGAGCAGATCGACAAGATCATCCCCCTTTACAACAAGGTGCGCAACTACCTGACCAAGAAGCCCTATAGCACCGAAAAGATCAAGCTGAACTTCGAGAACAGCACCCTGCTTGACGGCTGGGACCTGAACAAGGAGTCCGACAACACTAGCGTGGTGCTGCGCAAGGACGACCTCTACTACCTGGGCATTATGGATAAGAAGCACAATCGGATCTTCAAAGAACTGCCCAGCCAGAACGGCAATGAGAGTAGCTATGAGAAGATGATCTACAAGCTGCTGCCGGGGCCAAATAAGATGCTGCCCAAGGTGTTCTTCTCCAAAAAGGGCAAGAAGCAGTTCAAGCCCTCCAAGAAACTTCTGAAGAAGTACGAGGACGGGACCCACCTGAAGGGCGATAACTTTAATATCAATGACTGCCACAACCTGATCGACTTCTTTAAGGAGTC
CATCGCCGAGCACGAGGACTGGAAGCAGTTCGACTTCAAGTTTAGCAGCACAAGTAGCTACAAGGACCTGTCAAATTTCTATAAGGAGGTGGAGAAACAGGGCTACAAGATCACATTCCAGAACATCTCTGAGAACTATATCAACCAGCTCATCGACGAGGGCAAGCTCTACCTGTTCCAGATCTACAATAAGGACTTCAGCAAGTACAGCAAGGGGACCCCCAACCTGCACACCCTGTACTGGAAGATGCTGTTTGATAACGACAACCTGAAGAACATTGTGTATAAGCTGAATGGCAAGGCCGAGGTGTTCTACCGCAAGTCCTCCCTGATCCTGGGGGACAACATCGTGCACAAGGCTGGCGAGGCAATCATCAACAAGAACCCCGACAACGAGAAAAAGCACAGTACCTTCGATTACGACCTGATTAAGGACAAACGCTTCACCCTCGACAAGTTTCAGTTCCATGTGCCCATTACCCTGAACTTCAAGAGCGAGGGGAGGCAGAACCTGAACGAGGATGTGAGGAAGTTCCTGAAGAACAACCCTGACATAAACATCATCGGTATCGACCGGGGGGAGCGGCACCTCCTGTACCTGACCCTCATCAACCAGAAGGGAAAGATCCTCTTCCAGAAAAGCCTGAACGAGATCACCAACGAGTACAATAACAAGAACGGTAAATCCCAGATCAAGAGCACCAACTACCACTCCCTGCTCGACAAGAAGGAGAAGAAGCGCGATGAGGCCCGCAAGAACTGGGGCATAATCGAGAACATCAAGGAGCTGAAGGAGGGCTACATGAGCCAGATCGTCCACTATATCAGCAAGCTGATGATCGAGAAAAACGCCATTCTGAGCCTTGAGGACCTGAACTTCGGGTTCAAGCGCGGACGCCAGAAGGTCGAGAAGCAGGTGTACCAGAAGTTCGAAAAGATGATGATTGACAAGCTCAACTACCTTGTGTTCAAGGACAAGAAGGCCAACGAGACCGGCGGCCTGCTCAATGCCCTGCAATTGACTAACAAGTTCGAGTCCTTCGCCAAGCTGTATAACCAGTCCGGGTTCATCTTCTACGTCCCAGCTTGGAACACCAGCAAGATCGACCCAATCACCGGCTTTGTGAACCTCCTGAAGCCTTACTACGAGAACCTGAATAAGAGCCAGGAGTTTTTCAAGAAGTTCAACAACATCAAGTACAACCCTAAGCAGGAGTACTTCGAGTTCAACTTCGACTACAAGAACTTCACCAACAAAGCCGAGGGCAGCAAGAACGTCTGGGAGATCTGCACCACTAACAATGAGCGGTTCATGTGGGACAAGACCCTGAACAGCGGCAAGGGCGCTCAGAAGGCCGTGGATGTGACACAGGAGCTGAAGAAGCTGTTTGACAGCAGCAAGATCAACTACCTGAACGGAAACGACATCAAGGAGGACATTATCAATCAGAACTCCGCCGACTTCTTTCGGAAGCTGATGAAGCTGCTGTCCGTGGTGCTGAGCCTGCGGCACAACAACGGCCTGAAGGGGAAGGACGAGAAGGACTTCATCCTGAGCCCCGTGGAGCCCTTCTTTAACAGCCTGAACGCTAAGATGGAGGAGCCTAAGGACGCCGACGCTAACGGCGCATACAACATCGCCCTGAAGGGCCTGCTGATCCTGAAGCAGATTAACGAGAGTGAGGACCTGCGCAAGATCAAGTTCAACCTGAGCAATAAGGAGTGGCTGAAGTTCGCCCAGTCTAAGAGCTTC;
SEQ ID NO.22 (library spacer sequence):
ATGGCGAATACTTTTAAAGTCAT;
The primer sequences are:
library-NGS-F:
ACACTCTTTCCCTACACGACGCTCTTCCGATCTgtctacaatcggctcgatcga;
library-NGS-R:
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTgcgcagaccaaaacgatctc.
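The two NGS primers above are written with an uppercase 5' portion followed by a lowercase 3' portion. The following is a minimal Python sketch that splits a primer at that case boundary. It assumes the uppercase part is a sequencing adapter tail and the lowercase part is the library-specific priming sequence; this is a reading of the case convention, not something the text states. The function name `split_primer` is hypothetical.

```python
def split_primer(primer: str) -> tuple[str, str]:
    """Split a primer into (uppercase 5' tail, lowercase 3' part).

    Assumption: the case change marks the boundary between an adapter
    tail and the library-specific sequence.
    """
    for i, base in enumerate(primer):
        if base.islower():
            # Everything before the first lowercase base is the tail.
            return primer[:i], primer[i:]
    # No lowercase bases: treat the whole primer as tail.
    return primer, ""


LIBRARY_NGS_F = "ACACTCTTTCCCTACACGACGCTCTTCCGATCTgtctacaatcggctcgatcga"
LIBRARY_NGS_R = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTgcgcagaccaaaacgatctc"

if __name__ == "__main__":
    for name, seq in (("library-NGS-F", LIBRARY_NGS_F),
                      ("library-NGS-R", LIBRARY_NGS_R)):
        tail, specific = split_primer(seq)
        print(name, len(tail), len(specific), specific)
```

Separating the two parts this way makes it easier to compute melting temperatures or search the library plasmid using only the library-specific portion.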
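Example 4: In vitro cleavage assay
This example uses an in vitro cleavage assay to verify that the six type V CRISPR/Cas12a gene editing systems of Example 1 have cleavage activity in vitro, and to confirm that the PAMs identified in Example 3 are correct.
The procedure was as follows:
(1) Based on the PAM results identified in Example 3 above, suitable target sites were sought in the CDKN2A gene. DNA fragments with the same target sequence but different PAMs were obtained by an annealing reaction, and primers were designed on both sides of the site to obtain a 2000 bp DNA fragment. The target sequences selected for the six CRISPR/Cas12a systems of the present invention and the PAM sequences to be tested are shown in Table 1.
Table 1. Target sequences and PAM sequences for in vitro cleavage
(2) Using the repeat sequences of the type V CRISPR/Cas12a gene editing systems described in Example 1 above, crRNAs containing the repeat sequence and the sequence corresponding to the CDKN2A target site were obtained by in vitro transcription. The purified Cpf1 protein described in Example 3 was incubated with the crRNA to form a complex, which was then incubated with the CDKN2A DNA fragment at 37 °C for 30 min. An appropriate amount of proteinase K was then added, followed by incubation at room temperature for 15 min and inactivation at 98 °C for 10 min. The cleavage results were examined on a 1.5% agarose gel.
The results are shown in Figure 4. All six CRISPR/Cas12a gene editing systems of the present invention showed good cleavage activity in vitro, cutting the 2000 bp CDKN2A DNA fragment into two fragments of 500 bp and 1500 bp. This indicates that the expressed and purified Cpf1 proteins are biologically active, and that the six Cpf1 proteins, guided by crRNA, recognize the target complementary to the spacer sequence and cleave it correctly.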
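The fragment sizes reported above follow from simple arithmetic on where the cut falls within the substrate. The following is a minimal Python sketch, assuming a single cut at a known position. The cut position of 500 bp is inferred from the reported fragment sizes, and the staggered ends that Cas12a-type nucleases leave are ignored. The function names and the 10% gel-sizing tolerance are illustrative assumptions, not part of the described method.

```python
def expected_fragments(substrate_len: int, cut_pos: int) -> list[int]:
    """Return the two fragment lengths (ascending) for one cut."""
    if not 0 < cut_pos < substrate_len:
        raise ValueError("cut position must lie inside the substrate")
    return sorted((cut_pos, substrate_len - cut_pos))


def bands_match(observed: list[int], expected: list[int],
                tolerance: float = 0.10) -> bool:
    """Check that each observed band is within a relative tolerance of
    the expected size (gel sizing is approximate)."""
    if len(observed) != len(expected):
        return False
    return all(abs(o - e) <= tolerance * e
               for o, e in zip(sorted(observed), sorted(expected)))


if __name__ == "__main__":
    frags = expected_fragments(2000, 500)
    print(frags)
    print(bands_match([480, 1550], frags))
```

A check of this kind is useful when the target site sits off-centre in the amplicon, so that the two cleavage products resolve as distinct bands on the gel.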
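Example 5: dsODN insertion assay
This example uses a dsODN insertion assay to verify that the six type V CRISPR/Cas12a gene editing systems of Example 1 can edit a target gene in mammalian (eukaryotic) cells.
The procedure was as follows:
(1) The six Cpf1 proteins described in Example 1 were codon-optimized for human expression, and the corresponding nucleotide sequences were cloned into the PX330 eukaryotic expression vector (Addgene, 59909) to obtain PX330-protein eukaryotic expression plasmids.
(2) In mammalian cells, taking HEK293T cells as an example, the endogenous CDKN2A gene was selected. Suitable target sites were identified using the PAM sequences identified in Example 3 that support recognition and cleavage in vitro. The sequence format was 5'-direct repeat bound by the respective Cpf1 protein-crRNA spacer sequence-3'. These sequences were cloned into the PXZ vector (Addgene, 160229) by the Gibson method to construct PXZ-CDKN2A target plasmids that target different sites with different PAMs. The PX330-protein eukaryotic expression plasmid and the PXZ-CDKN2A target plasmid were co-transfected; LtCpf1 served as the positive control, and transfection of the PX330-protein eukaryotic expression plasmid alone served as the negative control. The CDKN2A target sites selected in this example and the corresponding PAMs are shown in Table 2.
Table 2. Target sites, PAM sequences and detection primers used in the eukaryotic experiments
(3) HEK293T cells in good growth condition in a 24-well plate were co-transfected with the PX330-protein eukaryotic plasmid, the PXZ-Cpf1 protein-CDKN2A target plasmid and 1.2 μL of dsODN. After 72 h, the cells were harvested and DNA was extracted.
(4) Primers were designed upstream of the CDKN2A target site and on the dsODN sequence (see Table 2) for dsODN-PCR amplification. The products were run on an agarose gel to check for the expected band, which indicates that the dsODN has been inserted. Detecting dsODN insertion in this way verifies whether the type V CRISPR/Cas12a gene editing systems of the present invention have editing activity in eukaryotic cells.
The dsODN-PCR electrophoresis results are shown in Figure 5. PCR bands of the expected length are marked with red triangles; all six CRISPR/Cas12a gene editing systems of Example 1 showed cleavage activity in eukaryotic cells.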
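Step (2) of this example selects target sites next to a PAM and assembles each insert as 5'-direct repeat-spacer-3'. The following is a minimal Python sketch of that selection, assuming the PAM lies immediately 5' of the protospacer on the scanned strand (the usual arrangement for Cas12a/Cpf1) and a 23-nt spacer, matching the length of SEQ ID NO.22. The PAM `TTTV`, the flanking bases and the direct repeat string used below are placeholders for illustration only; the actual PAMs and repeats are those of Tables 1 and 2 and SEQ ID NO.10~15. Only the given strand is scanned.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def pam_to_regex(pam: str) -> str:
    """Translate an IUPAC PAM such as 'TTTV' into a regex fragment."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_targets(seq: str, pam: str, spacer_len: int = 23):
    """Return (pam_start, pam, protospacer) for every PAM that is
    followed by spacer_len bases. A lookahead is used so that
    overlapping candidate sites are all reported."""
    pattern = re.compile(
        f"(?=({pam_to_regex(pam)})([ACGT]{{{spacer_len}}}))")
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(seq.upper())]


def crrna_insert(direct_repeat: str, spacer: str) -> str:
    """Assemble the 5'-direct repeat-spacer-3' insert (DNA form)."""
    return direct_repeat.upper() + spacer.upper()


if __name__ == "__main__":
    spacer = "ATGGCGAATACTTTTAAAGTCAT"        # SEQ ID NO.22
    demo = "GG" + "TTTA" + spacer + "GG"      # hypothetical context
    hypothetical_repeat = "AATTTCTACTATTGTAGAT"  # placeholder only
    for start, pam, proto in find_targets(demo, "TTTV"):
        print(start, pam, crrna_insert(hypothetical_repeat, proto))
```

To cover both strands, the same scan can be repeated on the reverse complement of the input sequence.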
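In summary, through metagenomic bioinformatic analysis, the present invention identifies for the first time six novel type V CRISPR/Cas12a gene editing systems and predicts the direct repeat sequence of each. The Cpf1 proteins of the six new editing systems are named 28c2, 28c6, 28c12, 28c13, 28c15 and 30c9. As a single RNA-guided endonuclease, Cpf1 requires only a crRNA for targeting and is smaller overall than Cas9, which makes in vivo delivery easier; guide RNA design for Cpf1 is also simpler and more convenient than for Cas9. Experiments show that the six CRISPR/Cas12a gene editing systems each recognize their own distinct PAM sequences and can perform gene editing under crRNA guidance both in vitro and in eukaryotic cells. The discovery of these six new gene editing systems further expands the range of gene editing tools, enriches the PAM diversity available for Cpf1 as a gene editing tool, provides more tool options for the clinical use of Cpf1, and plays an important role in advancing the application of gene editing to clinical therapy.
Finally, it should be noted that the above examples are intended only to illustrate the technical solutions of the present invention and not to limit its scope of protection. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present invention may be modified or replaced with equivalents without departing from their spirit and scope.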

Claims (5)

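1. A Cpf1 protein, characterized in that the amino acid sequence of the Cpf1 protein is as shown in any one of SEQ ID NO.1~6.
2. A nucleic acid encoding the Cpf1 protein of claim 1, characterized in that the base sequence of the nucleic acid is as shown in any one of SEQ ID NO.16~21.
3. A type V CRISPR/Cas12a gene editing system, characterized in that it comprises the Cpf1 protein of claim 1, an accessory protein and a CRISPR array; the CRISPR array comprises direct repeat sequences and spacer sequences; each spacer sequence is flanked by two direct repeat sequences; the direct repeat sequences and the spacer sequences are arranged alternately; the nucleotide sequence of the direct repeat sequence is as shown in any one of SEQ ID NO.10~15; and the amino acid sequence of the accessory protein is as shown in any one of SEQ ID NO.7~8.
4. Use of the type V CRISPR/Cas12a gene editing system of claim 3 in gene editing of prokaryotes, or in gene editing of eukaryotes for purposes other than the diagnosis or treatment of disease.
5. Use of the type V CRISPR/Cas12a gene editing system of claim 3 in the preparation of a biological gene editing preparation.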
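Application number: CN202310510289.0A; priority date: 2023-05-08; filing date: 2023-05-08; title: Cpf1 protein, type V gene editing system and use thereof; status: Active; publication: CN116751763B (zh)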

Publications (2)

Publication Number Publication Date
CN116751763A (zh) 2023-09-15
CN116751763B (zh) 2024-02-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant