WO2022007863A1 - A rapid enrichment method for target gene regions - Google Patents

A rapid enrichment method for target gene regions

Info

Publication number
WO2022007863A1
Authority
WO
WIPO (PCT)
Prior art keywords
probe
nucleic acid
exonuclease
sequence
purification
Prior art date
Application number
PCT/CN2021/105073
Other languages
English (en)
French (fr)
Inventor
姜正文
丁慧
Original Assignee
天昊基因科技(苏州)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 天昊基因科技(苏州)有限公司
Publication of WO2022007863A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6869 Methods for sequencing

Definitions

  • The invention relates to the field of biotechnology, and more particularly to a method for rapid enrichment of target gene regions.
  • The total cost of genomic research includes the cost of DNA sequencing, data management, and data analysis (producing directly interpretable data). For research at the large-population level as well as for clinical applications, the actual cost of genomic research is therefore difficult to reduce in the short term.
  • A research method is therefore needed that can enrich specific regions and biological pathways of diseases, genes, or even the entire exome (1% of the genome) and then study them without bias. This method is target region enrichment for high-throughput sequencing.
  • Target region enrichment high-throughput sequencing designs probes for one or several sequences of interest, captures and enriches them by different methods, and then sequences and analyzes the captured sequences. Because of its flexible probe design and high coverage depth, it is well suited to large-scale disease sample analysis and to verifying the results of whole-genome analysis, GWAS analysis, or linkage analysis; it can not only verify discovered loci but also find further disease susceptibility loci in candidate regions.
  • Enriching the target gene and then performing high-throughput sequencing (NGS) has the following advantages: 1) it significantly reduces cost; 2) the high sequencing depth of the target region ensures more accurate sequencing results; 3) it shortens project turnaround time; and 4) the well-defined function of the target region makes the results easier to analyze.
  • Target region enrichment combined with high-throughput sequencing can analyze larger sample populations. The method also has important application value in biomedical research and in the clinical diagnosis of Mendelian diseases, and ultimately in individualized medicine based on individual genetic characteristics.
  • Enrichment of the target region is the first task. Choosing the most suitable enrichment method for a specific research project requires considering the size of the entire enrichment region, the number of samples, and whether multiple samples need to be sequenced simultaneously (to make the most efficient use of sequencer throughput).
  • Many enrichment techniques are used in scientific research and on commercial platforms, but they can be divided into three categories according to their core reaction principles: target region enrichment based on PCR amplification, circularization, and hybridization capture.
  • PCR amplification: the target region is amplified directly by multiple long-range PCR reactions, or a large number of short fragments are amplified either by standard multiplex PCR of limited plexity or by multiple high-plex multiplex PCR reactions.
  • Other options include innovative multiplex PCR (Ion AmpliSeq™ from Life Technologies, GeneRead DNAseq System from Qiagen, TargetRich™ from Kailos), microdroplet PCR (RainDance), and chip-based PCR (Access Array™ from Fluidigm).
  • PCR-based methods are best suited to small target regions in the 10-100 kb range, and such enrichment methods typically require target-region-specific primer design and PCR reactions.
  • The main problems of the PCR amplification method are that sequence variation in the primer binding region can easily lead to loss of the amplicon, and that structural variation can only be detected through a reduction in sequencing reads.
  • Circularization: also called molecular inversion probes (MIPs), gap-fill padlock probes, or selector probes.
  • For target regions in the range of 100-500 kb, a single-stranded DNA circle containing the sequence of the target region is formed in a highly specific manner (gap filling and ligation reaction), generating a structure that contains a common DNA element through which the target region of interest is selectively amplified. Representative methods are HaloPlex (Agilent) and MIPs.
  • The main problems of this method are that sequence variation in the primer binding region easily leads to loss of the amplicon, sensitivity and uniformity are relatively low, and probe cost is relatively high.
  • Hybridization capture: the nucleic acid in the sample is hybridized with DNA/RNA probes complementary to the target region, either anchored on a solid support or free in solution, and the sequence of interest is then isolated by physical capture.
  • The capture range extends from 500 kb to the whole exome. Several classic commercial hybridization methods have been developed, such as SureSelect (Agilent), Nextera (Illumina), TruSeq (Illumina), SeqCap (NimbleGen), and Ion TargetSeq (Life Technologies). These methods offer better capture efficiency and cost-effectiveness for large, pre-designed regions.
  • The main problems of the hybridization capture method are that it places relatively high demands on sample quality and quantity and generally cannot be used for FFPE samples, although in practice the optimized TruSeq and SureSelect methods can also be used for FFPE samples.
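The size ranges quoted above for the three enrichment categories can be summarized as a simple lookup. This is only an illustrative sketch: the function name is hypothetical, and the handling of the boundary values (100 kb and 500 kb) and of regions under 10 kb is an assumption, since the text gives overlapping endpoints and no lower category.

```python
def suggest_enrichment_category(target_size_kb: float) -> str:
    """Map a target-region size (kb) to the category the text associates with it.

    Ranges from the text: PCR 10-100 kb, circularization 100-500 kb,
    hybridization capture from 500 kb up to the whole exome.
    Boundary handling is an assumption made for this sketch.
    """
    if target_size_kb <= 0:
        raise ValueError("target size must be positive")
    if target_size_kb < 10:
        return "below the ranges quoted in the text"
    if target_size_kb <= 100:
        return "PCR amplification"
    if target_size_kb <= 500:
        return "circularization"
    return "hybridization capture"
```

In practice the choice also depends on sample number and multiplexing needs, as noted earlier, so a lookup like this is at most a first filter.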
  • Probe hybridization-based methods are the most widely applicable and have been widely used for exome capture in humans and mice. Depending on how the capture reaction takes place, they are further divided into solid-phase capture (e.g., chip capture) and liquid-phase capture. Liquid-phase capture is more popular because its automated, mechanized capture procedure offers more advantages.
  • In addition to the inherent shortcomings of these hybridization capture methods, such as their low capture efficiency and the need for more tedious and time-consuming DNA library construction steps, their pre-made probe libraries also greatly limit the flexibility with which most studies can select target regions and species.
  • PCR-based enrichment can bypass the preparation of shotgun libraries and instead directly use appropriate 5' primers for fragment amplification in the final amplification stage before sequencing, which makes it relatively flexible in the selection of candidate regions and in laboratory operations.
  • The main drawback of this method is that it does not scale easily (because of problems such as cross-matching among multiple primers, dimer formation, and non-specific matching), whether for enrichment of very large genomic regions or for simultaneous processing of large numbers of samples.
  • The circularization method based on molecular inversion probes differs greatly from other methods. Its most notable feature is extremely high specificity, but it is difficult to process multiple samples simultaneously in a single reaction.
  • Each probe used for circularization enrichment consists of a single-stranded DNA oligonucleotide whose two end sequences are complementary to two non-contiguous segments of the region to be enriched, arranged in inverted linear order.
  • When a padlock probe hybridizes to the target sequence, the targeting arms at its 5' and 3' ends are brought close together, leaving a gap over the target region; if the probe is phosphorylated at its 5' end, DNA ligase joins the two ends to form a circular padlock probe that is linked to the target region. Following circularization, exonuclease digestion removes large amounts of uncircularized probes and DNA fragments.
  • Rolling circle amplification, or direct PCR targeting the common sequences on all circles, then amplifies the target region to generate an NGS library. To detect the presence or absence of a target sequence, the reaction requires only a single hybridization event, and its specificity is excellent.
  • TruSeq Custom Amplicon, developed by Illumina, is a fully customizable, amplicon-based targeted resequencing system that lets researchers focus on any key region of the genome of interest and sequence up to 1,536 amplicons, covering a genomic region of up to 600 kb, in a single reaction.
  • The system is based on the extension-ligation reaction.
  • A pair of oligonucleotide probes is designed for each target-region amplicon; each probe sequence consists of a universal sequence and a specific sequence complementary to the sequence flanking the amplicon.
  • Multiple pairs of probes (up to 1,536 pairs) are mixed in one reaction tube (custom amplicon tube, CAT) and unfragmented sample DNA is added. The CAT probes hybridize to the flanking sequences of the target region, unhybridized oligonucleotides are removed by fragment size selection, and extension and ligation are then carried out successively by polymerase and ligase to obtain amplicon fragments containing the target region.
  • Primers with complementary sequences are then used for PCR amplification to obtain an amplicon library of multiple target regions. Multiple samples can be pooled into one library (a single MiSeq run supports up to 96 pooled samples) and sequenced and analyzed on the MiSeq System.
  • This method also has shortcomings: (1) the enrichment process performs only a single round of extension-ligation after probe hybridization, which is prone to off-target and non-specific hybridization and gives low capture efficiency for complex sequences; (2) unhybridized probes are removed by fragment size selection, which cannot effectively prevent non-specific hybridization or residual non-target fragments.
  • The applicant's earlier disclosed and granted invention "A high-throughput nucleic acid analysis method and its application" (ZL201210581830.9) can also achieve rapid enrichment of the target region based on the extension-ligation reaction. Unlike the TruSeq Custom Amplicon technology, it uses 5'-end extension primers with a 5' exonuclease-resistant modification and 3'-end ligation probes with a 3' exonuclease-resistant modification, carries out denaturation, hybridization, and multiple extension-ligation cycles in a single reaction, and treats the reaction product with exonucleases (exonuclease I, exonuclease III, lambda exonuclease).
  • This method reduces the number of operation steps by carrying out sample genomic DNA/probe hybridization and polymerase/ligase extension and ligation simultaneously in the same tube, and improves the utilization of the genomic DNA template through multiple extension-ligation cycles, which gives it certain advantages; however, it also suffers from difficult system optimization and insufficient digestion of non-specific amplification products.
  • The purpose of the present invention is to provide a rapid enrichment method for target gene regions.
  • A first aspect of the present invention provides a method for enriching nucleic acid fragments, the method comprising the steps of:
  • (1) providing a reaction system that includes a sample to be tested and n probe sets;
  • each probe set includes a first probe and a second probe;
  • the first probe and the second probe specifically hybridize to the 3' end and the 5' end, respectively, of the same target nucleic acid fragment (specific hybridization refers to at least partial or complete complementarity);
  • the first probe cannot be degraded by an exonuclease in the 5'->3' direction and/or the second probe cannot be degraded by an exonuclease in the 3'->5' direction;
  • the first probe includes a first part that specifically hybridizes to the 3' end of the target nucleic acid fragment and a second part corresponding to the sequence of the subsequent PCR amplification primers (the correspondence means that the reverse complement of the second part and the PCR amplification primers can specifically hybridize);
  • the second probe includes a first part that specifically hybridizes to the 5' end of the target nucleic acid fragment and a second part that specifically hybridizes to the sequence of the subsequent PCR amplification primers;
  • the 3' end of the first probe and the 5' end of the second probe are separated by a distance of at least one nucleotide;
  • (2) subjecting the reaction system to high-temperature denaturation and annealing, during which the first probe and the second probe specifically hybridize with the target nucleic acid fragment of the sample to be tested to form a hybridization product, thereby obtaining reaction mixture I containing the hybridization product;
  • reaction mixture II contains the undigested hybrid product
  • reaction mixture IV containing the ligation product
  • purification treatment such as nucleic acid-specific exonuclease digestion
  • PCR amplification is performed to obtain a PCR amplification product, that is, an enriched nucleic acid fragment.
  • the purification treatment in steps (4) and (5) also removes salt ions and proteins in the reaction mixture I at the same time.
  • the hybridization product is a ternary complex formed by the single-stranded binding of the first probe and the second probe to the target nucleic acid fragment.
  • in step (4), a physical method is used for purification.
  • the single-stranded nucleic acid-specific exonuclease cleaves (or digests) the single-stranded DNA (especially the complementary single strand that has not specifically hybridized with the probes to form the hybridization product), the unbound (or free) first probe, and the unbound (or free) second probe.
  • in step (3), the single-stranded nucleic acid-specific exonuclease does not cleave (or digest), or substantially does not cleave, the hybridization product.
  • in step (5), a physical method is used for purification.
  • the nucleic acid-specific exonuclease cleaves (or digests) the single-stranded DNA (especially the complementary strand), the unbound (or free) first probe, and the unbound (or free) second probe.
  • the nucleic acid-specific exonuclease does not cleave (or digest), or does not substantially cleave, the extension-ligation product.
  • the n probe sets each target a different target nucleic acid fragment.
  • the lower limit of n is 20, 30, 40, 50, 100, 200, or 500, and/or the upper limit of n is 2000, 5000, 10000, 100000, 500000, or 1000000.
  • the method further includes the step of preparing the PCR amplification product into a nucleic acid fragment library.
  • in step (5), under the action of the nucleic acid polymerase, the first probe is extended along the target nucleic acid fragment until the extension reaches the 5' end of the second probe and is blocked by it, giving a first-probe extension strand; then, under the action of the nucleic acid ligase, the 3' end of the first-probe extension strand is ligated to the 5' end of the second probe, forming a reaction mixture containing the ligation product.
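The probe layout and the extension-ligation step described above can be sketched as string operations. This is a toy model under stated assumptions, not the applicant's design software: sequences are written 5'->3', the first probe is taken to be 5'-[second part]-[first part]-3' and the second probe 5'-[first part]-[second part]-3', hybridization is modeled as exact complementarity, and the function names, universal sequences, and arm lengths are hypothetical (real arms are 16-50 nt according to the text).

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def design_probe_pair(copied_strand: str, arm1_len: int, arm2_len: int,
                      univ1: str, univ2: str) -> tuple[str, str]:
    """Build both probes from the strand that extension will synthesize.

    copied_strand is the complement of the template the probes bind to.
    The arms must leave a gap of at least one nucleotide between them.
    """
    copied_strand = copied_strand.upper()
    if arm1_len < 1 or arm2_len < 1:
        raise ValueError("arm lengths must be at least 1")
    if arm1_len + arm2_len >= len(copied_strand):
        raise ValueError("arms must leave a gap of at least one nucleotide")
    first_probe = univ1.upper() + copied_strand[:arm1_len]
    second_probe = copied_strand[len(copied_strand) - arm2_len:] + univ2.upper()
    return first_probe, second_probe


def extend_and_ligate(template: str, first_probe: str, second_probe: str,
                      arm1_len: int, arm2_len: int) -> str:
    """Extend the first probe across the gap and join it to the second probe."""
    copied = reverse_complement(template)  # what the polymerase would write
    arm1 = first_probe[len(first_probe) - arm1_len:]
    arm2 = second_probe[:arm2_len]
    if not (copied.startswith(arm1) and copied.endswith(arm2)):
        raise ValueError("probe arms do not match the ends of this template")
    gap_fill = copied[arm1_len:len(copied) - arm2_len]
    if not gap_fill:
        raise ValueError("no gap between the probes")
    return first_probe + gap_fill + second_probe
```

In this model the ligation product carries the two second parts at its ends, which is what allows a single primer pair to amplify every target in the later PCR step.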
  • the first probe cannot be degraded by an exonuclease in the 5'->3' direction, but can be degraded by an exonuclease in the 3'->5' direction.
  • the 5' end of the first probe is provided with a protective group to prevent degradation by exonuclease.
  • the second probe cannot be degraded by an exonuclease in the 3'->5' direction, but can be degraded by an exonuclease in the 5'->3' direction.
  • the 3' end of the second probe is provided with a protective group to prevent degradation by exonuclease.
  • the first probe cannot be degraded by an exonuclease in the 5'->3' direction, and the exonuclease used in step (3) is a 5'->3' single-stranded nucleic acid-specific exonuclease.
  • the second probe cannot be degraded by an exonuclease in the 3'->5' direction, and the exonuclease used in step (3) is a 3'->5' single-stranded nucleic acid-specific exonuclease.
  • the first probe cannot be degraded by a 5'->3' exonuclease and the second probe cannot be degraded by a 3'->5' exonuclease; a 3'->5' single-stranded nucleic acid-specific exonuclease is used in step (3), and a 5'->3' single-stranded nucleic acid-specific exonuclease and a 3'->5' single-stranded nucleic acid-specific exonuclease are used simultaneously in step (5).
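The direction-specific protection described above can be illustrated with a small model of which molecules survive a given digestion. This is a deliberately simplified sketch: the class and function names are hypothetical, protection is treated as all-or-nothing, and duplex regions are assumed to be fully spared by single-strand-specific exonucleases.

```python
from dataclasses import dataclass

DIRECTIONS = ("5'->3'", "3'->5'")


@dataclass(frozen=True)
class Strand:
    name: str
    five_prime_protected: bool = False
    three_prime_protected: bool = False
    single_stranded: bool = True


def survives(strand: Strand, direction: str) -> bool:
    """Return True if the strand resists an exonuclease of this direction."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction}")
    if not strand.single_stranded:
        return True  # assumption: single-strand-specific enzymes spare duplexes
    if direction == "5'->3'":
        return strand.five_prime_protected
    return strand.three_prime_protected


def survivors(strands: list[Strand], directions: list[str]) -> list[str]:
    """Names of strands that resist every exonuclease direction applied."""
    return [s.name for s in strands
            if all(survives(s, d) for d in directions)]


POOL = [
    Strand("free first probe", five_prime_protected=True),
    Strand("free second probe", three_prime_protected=True),
    Strand("genomic single strand"),
    Strand("ligation product", five_prime_protected=True,
           three_prime_protected=True),
]
```

Under these assumptions only the ligation product, which inherits a protected 5' end from the first probe and a protected 3' end from the second probe, resists digestion in both directions.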
  • the 5' end of the first probe and/or the 3' end of the second probe carry exonuclease-resistant modifications, so that the first probe cannot be degraded by a 5' exonuclease and/or the second probe cannot be degraded by a 3' exonuclease.
  • the modifications include but are not limited to: phosphorothioate modification, 5-propyne pdC modification, pdU modification, 2'-fluoro base modification, 2'-O-methyl base modification, 2'-5' linked base modification, LNA base modification, chimeric linkage modification, 3' inverted dT modification, or a combination thereof.
  • 1-10, preferably 2-6, bases at the 5' end of the first probe carry exonuclease-resistant modifications.
  • 1-10, preferably 2-6, bases at the 3' end of the second probe carry exonuclease-resistant modifications.
  • the exonuclease is selected from the group consisting of: T5 Exonuclease, T7 Exonuclease, Lambda Exonuclease, RecJf, Exonuclease T, Exonuclease I, Exonuclease V, Exonuclease III, or combinations thereof.
  • the nucleic acid polymerase is a thermostable nucleic acid polymerase; preferably, the nucleic acid polymerase is selected from the following group: Hemo (NEB), AmpliTaq DNA Polymerase, Stoffel Fragment (Life Technologies); Hot Start Flex DNA Polymerase (NEB).
  • the nucleic acid polymerase is a polymerase having substantially no 5' to 3' exonuclease activity.
  • the nucleic acid ligase is a thermostable nucleic acid ligase; preferably, the nucleic acid ligase is selected from the following group: Taq DNA Ligase (NEB); Ampligase (Epicentre); 9°N™ DNA Ligase (NEB).
  • within a probe set targeting the same target nucleic acid fragment, the Tm value of the second probe is higher than the Tm value of the first probe.
  • the Tm value of the second probe is 3°C-10°C higher than the Tm value of the first probe; preferably it is 4°C-6°C higher, such as 5°C higher.
  • the Tm value of each first probe in each of the probe sets is 59°C to 68°C.
  • the Tm value of each second probe in each probe set is 68°C-75°C.
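The Tm constraints listed above lend themselves to a simple design check. The sketch below takes already-computed Tm values as input (how Tm is calculated is not specified in the text and is left out on purpose); the function name and the message wording are hypothetical.

```python
def check_probe_pair_tm(first_tm: float, second_tm: float) -> list[str]:
    """Return a list of violated Tm constraints (empty if the pair passes).

    Ranges from the text: first probe 59-68 C, second probe 68-75 C,
    second probe 3-10 C higher than the first probe.
    """
    problems = []
    if not 59.0 <= first_tm <= 68.0:
        problems.append(f"first probe Tm {first_tm} outside 59-68 C")
    if not 68.0 <= second_tm <= 75.0:
        problems.append(f"second probe Tm {second_tm} outside 68-75 C")
    delta = second_tm - first_tm
    if not 3.0 <= delta <= 10.0:
        problems.append(f"Tm difference {delta} outside 3-10 C")
    return problems
```

A pair with Tm values of 64°C and 69°C would pass, matching the preferred 5°C difference mentioned in the text.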
  • the 5' end of the second probe is modified by phosphorylation.
  • the n (the number of probe sets) is 20-1,000,000, preferably 30-500,000, more preferably 40-100,000, most preferably 50-10,000, such as 100-10,000, 500-10,000, or 1,000-10,000.
  • the probes for the same target nucleic acid fragment are referred to as one probe set.
  • n 2
  • the two probe sets are respectively directed to two different target nucleic acid fragments.
  • the length of the first part of the first probe is 16-50 bp (preferably 21-36 bp, more preferably 33 bp), and/or the length of the second part is 18-30 bp.
  • the length of the first part of the second probe is 16-50 bp (preferably 21-36 bp, more preferably 32 bp), and/or the length of the second part is 21-36 bp.
  • the second parts of the first probes of each probe set are the same or substantially the same.
  • the second part of the second probe of each probe set is the same or substantially the same.
  • the total amount of target nucleic acid fragments in the sample is 1-2000ng, preferably 200-500ng.
  • the sample is a nucleic acid sample derived from animals, plants or microorganisms, preferably a DNA sample or an RNA reverse transcription product cDNA sample.
  • the sample is a nucleic acid sample derived from an animal (preferably a mammal, more preferably a human), preferably a DNA sample or an RNA reverse transcription product cDNA sample.
  • the sample to be tested includes only one sample, or it includes multiple samples from different subjects (for example, samples taken from multiple patients, or multiple different tissue samples).
  • the reaction system further includes a buffer.
  • in step (2), the high-temperature denaturation and annealing conditions are 95-100°C for 2-20 min, followed by 50°C for 0.5-20 h, preferably 1-5 h.
  • in step (3), the hybridization product is not degraded by the exonuclease, whereas first and/or second probes that have not hybridized are degraded by the exonuclease.
  • the purification treatment in the step (4) includes: magnetic bead purification, silica gel column purification, membrane filtration purification, ethanol or isopropanol precipitation purification, or a combination thereof.
  • the length of the specific sequence of the extension-ligation product (i.e., the target nucleic acid sequence) in step (5) is 30-5000 bp, preferably 100-1000 bp, more preferably 150-310 bp.
  • no amplification cycle is performed in the step (5).
  • the PCR amplification primers have a tag sequence, and the tag sequence length is 1-100 bp, preferably 5-10 bp.
  • the ligation products of different samples can be amplified with PCR amplification primers carrying different tag sequences, so that the amplification products of different samples can be pooled and the reads can be assigned to samples according to their tag sequences in the subsequent sequencing data.
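Assigning pooled reads back to samples by tag sequence can be sketched as follows. This is a minimal illustration, assuming the tag sits at the very start of each read and must match exactly; on real sequencing platforms the tag may be read separately, and the function name, tag length default, and "unassigned" label are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads: list[str], tag_to_sample: dict[str, str],
                tag_len: int = 8) -> dict[str, list[str]]:
    """Group reads by sample using an exact-match tag at the read start.

    Returns a mapping from sample name to the reads with the tag removed.
    Reads whose tag is not recognized go to the "unassigned" bin.
    """
    bins: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        sample = tag_to_sample.get(tag, "unassigned")
        bins[sample].append(insert)
    return dict(bins)
```

The default of 8 mirrors the example tag length given in the text; tolerance for sequencing errors in the tag is omitted here for brevity.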
  • the length of the PCR amplification primer is 42-58 bp.
  • step (6) only one PCR amplification primer pair is used.
  • the primers (PCR amplification primers) used in the PCR amplification include a forward primer and a reverse primer; the forward primer includes a sequence that specifically hybridizes to the reverse complement of the second part of the first probe, and the reverse primer includes a sequence that specifically hybridizes to the second part of the second probe.
  • the forward primer and/or the reverse primer contains a universal sequence compatible with a high-throughput chip sequencing platform.
  • the forward primer and/or the reverse primer contains a tag sequence, and different tag sequences are used for different samples.
  • the ligation products are amplified using universal primers containing different tag sequences to establish a library suitable for the next-generation sequencing platform, and the libraries constructed using universal primers containing different tag sequences can be mixed together for next-generation sequencing.
  • the second partial sequence of the first probe is:
  • the second partial sequence of the second probe is:
  • the forward primer sequence is:
  • [X] is absent or is a tag sequence; the length of [X] is 0-100 bp, preferably 0-10 bp, such as 8 bp.
  • the reverse primer sequence is:
  • the sequence of the first part of the first probe is shown in SEQ ID NO.: 2a, and the sequence of the first part of the second probe is shown in SEQ ID NO.: 2a+1, where a is an integer from 2 to 51.
  • the method is suitable for enrichment and amplification of multiple gene fragments, and the number of gene fragments amplified at the same time can be tens, hundreds, thousands, or even tens of thousands. The fragments may include reference gene fragments, whose number can be 0-999.
  • the sequencing data of the nucleic acid fragments enriched by the method can be analyzed to obtain the copy number of the target gene fragments. The analysis counts the sequencing depth of each target fragment and each reference fragment. For each target fragment of a patient sample, its sequencing depth is divided by the sequencing depth of each reference fragment to obtain m ratios (m is the number of reference gene fragments; a reference gene fragment can be any gene fragment other than the target fragment). Each ratio is divided by the corresponding ratio of the normal sample, or by the median of that ratio across all samples, and multiplied by the copy number of the normal sample on the target fragment, or by the copy number that most samples have on that fragment. This yields m values, and their median is taken as the detected copy number of the sample on the target fragment.
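The copy number calculation described above can be written out as a short function. This sketch covers only the variant that normalizes against a single normal sample (the median-across-samples variant is omitted); the function name, the dictionary-based input format, and the default normal copy number of 2 are assumptions made for illustration.

```python
from statistics import median


def copy_number(sample_depth: dict[str, float],
                normal_depth: dict[str, float],
                target: str,
                references: list[str],
                normal_copies: float = 2.0) -> float:
    """Estimate the copy number of one target fragment in one sample.

    For each reference fragment: (sample target/reference depth ratio)
    divided by (normal target/reference depth ratio), times the normal
    copy number. The median over all references is returned.
    """
    if not references:
        raise ValueError("at least one reference fragment is required")
    values = []
    for ref in references:
        if sample_depth[ref] == 0 or normal_depth[ref] == 0 \
                or normal_depth[target] == 0:
            raise ValueError(f"zero depth prevents a ratio for {ref}")
        sample_ratio = sample_depth[target] / sample_depth[ref]
        normal_ratio = normal_depth[target] / normal_depth[ref]
        values.append(sample_ratio / normal_ratio * normal_copies)
    return median(values)
```

Taking the median over the m reference-based values makes the estimate less sensitive to any single reference fragment that is itself unevenly captured.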
  • the second aspect of the present invention provides a nucleic acid sequencing method, which comprises the steps of: enriching the target nucleic acid fragments by using the method described in the first aspect of the present invention.
  • a high-throughput chip sequencing platform is used to perform single-molecule amplification sequencing or direct single-molecule sequencing on the target nucleic acid fragments enriched by the method described in the first aspect of the present invention.
  • the method further comprises the steps of: analyzing the sequencing data, assigning the sequencing reads to samples, reading gene mutation sites, and/or calculating the copy number of each gene fragment.
  • a third aspect of the present invention provides a kit for enriching nucleic acid fragments, the kit including: one or more probe sets corresponding to the nucleotide sequences in the sample to be tested, a nucleic acid polymerase, and a nucleic acid ligase;
  • the probe set includes a first probe and a second probe
  • the first probe and the second probe are respectively specifically hybridized to the 3' end and the 5' end of the same target nucleic acid fragment (the specific hybridization refers to at least partial complementarity or complete complementarity);
  • the first probe cannot be degraded by exonuclease in the 5'->3' direction;
  • the second probe cannot be degraded by exonuclease in the 3'->5' direction;
  • the first probe includes a first part that specifically hybridizes to the 3' end of the target nucleic acid fragment and a second part corresponding to the sequence of the subsequent PCR amplification primers (the correspondence means that the reverse complement of the second part and the PCR amplification primers can specifically hybridize);
  • the second probe includes a first part that specifically hybridizes to the 5' end of the target nucleic acid fragment and a second part that corresponds to the sequence of the subsequent PCR amplification primers (the correspondence means that the second part and the PCR amplification primers can specifically hybridize);
  • the 3' end of the first probe and the 5' end of the second probe are separated by a distance of at least one nucleotide.
  • the kit further includes PCR amplification primers, which include a forward primer and a reverse primer; the forward primer includes a sequence that specifically hybridizes to the reverse complement of the second part of the first probe, and the reverse primer includes a sequence that specifically hybridizes to the second part of the second probe.
  • the forward primer and/or the reverse primer contains a universal sequence compatible with a high-throughput chip sequencing platform.
  • the forward primer and/or the reverse primer contains a tag sequence, and different tag sequences are used for different samples.
  • the kit further includes conventional PCR reagents.
  • Figure 1 shows the operational flow of the invention.
  • Figure 2 shows the detected copy number of the target gene fragments in the three patient samples in the Example.
  • Figure 3 shows the capillary electrophoresis results of Example 1 and Example 2.
  • The present inventors unexpectedly discovered, for the first time, a new technology for target gene region enrichment based on the extension-ligation reaction.
  • the experimental results show that the method of the present invention can achieve rapid enrichment of multiple target gene fragments, significantly improve the enrichment efficiency of target sequences, and improve the effective reading and sequencing depth of target gene fragments.
  • the enriched products of multiple target gene fragments can be used for sequencing analysis on various high-throughput chip sequencing platforms such as next-generation sequencing platforms after modification and purification. The present invention has been completed on this basis.
  • the present invention provides a new method for rapid enrichment of multiple target gene regions based on the extension-ligation reaction. It uses extension primers and/or blocking probes carrying exonuclease-resistant modifications; after the primer-probe pairs are denatured and hybridized with sample genomic DNA, the mixture is digested with one or more single-stranded-nucleic-acid-specific exonucleases.
  • the digested product is then purified a second time by a physical method such as magnetic bead purification, silica column purification or membrane filtration, and is then amplified with universal primers matched to the next-generation sequencing platform and purified to obtain a sequencing library.
  • the method is specific and efficient for capturing the target sequence, and the sequencing data of the amplified product can also be used for the analysis of the copy number of the target gene fragment, so as to realize the simultaneous detection of the point mutation and the copy number of the target gene fragment.
  • the second half of the 5'-end probe is the specific sequence that hybridizes to the target nucleic acid fragment
  • the 5' end of the 3'-end probe is phosphorylated
  • the first half of the 3'-end probe is the specific sequence that hybridizes to the target nucleic acid fragment
  • the second half of the 3'-end probe is a universal sequence consistent with the subsequent PCR amplification primer; a few bases at the 5' terminus of the 5'-end probe carry a protective modification (①) against exonuclease degradation, or a few bases at the 3' terminus of the 3'-end probe carry a protective modification (①) against exonuclease degradation, or both are modified; the two probes are separated by a gap of several bases
  • after the probes are hybridized with the template DNA, the mixture is digested with one or more single-stranded-nucleic-acid-specific exonucleases (②) to remove residual primers and probes that have not hybridized to the template DNA.
  • Enzymatic digestion products are then purified by physical methods such as magnetic bead purification, silica gel column purification or membrane filtration purification
  • the purified product is subjected to an extension ligation reaction in a reaction system containing both polymerase and ligase: under the action of a polymerase without 5'->3' exonuclease activity, the gap between the two probes is filled, and then Ligation under the action of ligase;
  • the ligation reaction product is purified by physical methods such as magnetic bead purification, silica gel column purification or membrane filtration purification;
  • PCR primers also have a tag sequence with a length of several to dozens of bases.
  • the ligation products of different samples can be amplified with PCR primers carrying different tag sequences, so that the amplification products of different samples can be mixed together; in the subsequent sequencing data, reads can be assigned to different samples according to the tag sequence;
  • Exonuclease-resistant modifications of the present invention include but are not limited to the following types: Phosphorothioates, 5-Propyne pdC, pdU, 2'-Fluoro bases, 2'-O-methyl bases, 2'-5'linked bases, LNA bases ,Chimeric linkage,3'Inverted dT.
  • exonuclease of the present invention includes but is not limited to the following types: T5 Exonuclease, T7 Exonuclease, Lambda Exonuclease, RecJ f , Exonuclease T, Exonuclease I, Exonuclease V, Exonuclease III.
  • an exonuclease-resistant modification is first introduced at the 5' end of the extension primer and/or the 3' end of the blocking probe; the hybridization product is digested with exonuclease and then purified a second time by a physical method to remove, as far as possible, residual primers and probes that have not hybridized to genomic DNA. The purified product is then extended and ligated in a single reaction system using thermostable ligase and polymerase. The extension-ligation product is likewise digested with exonuclease and purified a second time by a physical method to remove residual primers and probes as far as possible.
  • the method of the present invention can significantly reduce non-specific amplification and improve enrichment efficiency.
  • the method of the present invention can realize the enrichment of multiple target gene fragments, and the number of gene fragments can be from tens to thousands, or even tens of thousands.
  • the method of the present invention is simple and quick to operate, and can achieve the enrichment of target fragments of hundreds of samples within a few hours.
  • the method of the present invention can unexpectedly and significantly improve the signal-to-noise ratio of the detection results, especially when multiple probe sets (n probe sets) are used in the same system, for example n ≥ 20, ≥ 30, ≥ 40, ≥ 50, ≥ 100, ≥ 200, or ≥ 500.
  • the sequencer performs sequencing, and the sequencing data is first sorted according to different tag sequences.
  • the sequencing data of each sample are aligned to the human reference genome using the Burrows-Wheeler Aligner (BWA) software, sequencing statistics are then computed, and the copy number of the target gene fragments is estimated from these statistics.
  • following the basic principles of the primer3 primer design software (http://bioinfo.ut.ee/primer3-0.4.0/primer3/) and using a self-developed program, 42 pairs of probes were designed for all exons of the 4 genes MVK, MVD, PMVK and FDPS, with specific-sequence amplification lengths of 183 bp-280 bp.
  • 8 pairs of probes were designed for 8 reference gene fragments, with specific-sequence amplification lengths of 185 bp-283 bp.
  • the 5' extension primer (the first probe) is composed of a universal sequence at the 5' end (the second part) plus a specific sequence at the 3' end (the first part); the 5'-end universal sequence is 5' ACACTCTTTCCCTACACGACGCTCTTCCGATCT 3' (SEQ ID NO: 1). The 3' blocking probe (the second probe) consists of a specific sequence at the 5' end (the first part) plus a universal sequence at the 3' end (the second part); its 5' end is phosphorylated, the phosphodiester bond between the last 2 bases at its 3' end is replaced with a phosphorothioate bond, and the 3'-end universal sequence is 5' AGATCGGAAGAGCACACGTCTGAACTCCAGTC 3' (SEQ ID NO: 2).
  • the Tm value of the specific sequence of the 5' extension primer is 59°C-68°C
  • the Tm value of the specific sequence of the 3' blocking probe is 68°C-75°C
  • the Tm value of the 3' blocking probe for a given amplified fragment is usually more than 5°C higher than that of the 5' extension primer.
  • the enriched fragments and probe-specific sequence information are shown in Table 1.
  • probe hybridization mix: 1.5 μl 10× hybridization buffer, 1.5 μl primer-probe mix (0.01 μM of each 5' extension primer + 0.02 μM of each 3' blocking probe), 2 μl ddH2O.
  • Hybridization reaction After shaking and mixing, put it on a PCR instrument, the PCR program is "95°C for 5 minutes, 50°C for 3 hours", and leave it at room temperature for 10 minutes for use.
  • the PCR amplification primer pair consists of a forward universal primer (5' AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGC 3', SEQ ID NO: 3) and a sample-specific reverse primer carrying an 8-base tag sequence (SEQ ID NO: 104).
  • the PCR reaction system is 20 μl and contains 1× HF buffer (NEB), 2.5 mM MgCl2, 0.3 mM dNTP mix, 0.3 μM of each primer, 1 U Phusion DNA polymerase (NEB) and 10 μl of the above purified extension-ligation product.
  • the reaction mixture was run with the following PCR program: 98°C for 30 s; (98°C for 10 s, 65°C for 30 s, 72°C for 1 min) × 30 cycles; 72°C for 5 min; hold at 4°C.
  • the quantified library was sequenced on the MiSeq second-generation sequencer of Illumina Company in the United States.
  • Data analysis: the sequencing data are sorted by tag sequence to obtain the data for each sample; the reads are aligned to the human genome reference sequence with the Burrows-Wheeler Aligner (BWA) program, and the total sequencing output of each sample, the sequencing depth of each target and reference fragment, and the enrichment efficiency of each sample are counted. The sequencing depth of each target fragment in a patient sample is divided by the sequencing depth of each of the 8 reference fragments to give 8 ratios; each ratio is divided by the corresponding ratio of the normal sample and multiplied by 2, giving 8 values whose median is taken as the detected copy number of the sample at the target fragment.
  • the sequencing depth of each fragment of 3 patient samples (P1, P2, P3) and 1 normal sample (C1) is shown in Table 2, and the statistical results of the sequencing data are shown in Table 3. The statistics show that all four samples achieved effective enrichment of the 50 gene fragments: the enrichment efficiency was over 85%, the mean effective reads were over 500×, and the sequencing depth of all fragments was over 10×.
  • the copy number calculation of each fragment was performed using the sequencing depth data.
  • the detected copy numbers of the 42 gene fragments of the three patient samples (P1, P2 and P3) are shown in Figure 2. The figure shows that P1 has lost at least the exon 1 to exon 5 segment of the MVK gene, while P2 and P3 have lost the exon 1 to exon 3 segment and the exon 5 to exon 8 segment of the FDPS gene, respectively. These deletion mutations were confirmed by RT-PCR experiments.
  • MVK NM_000431.2
  • PMVK NM_006556.3
  • MVD NM_002461.1
  • FDPS NM_002004.2
  • the phosphodiester bond between the 2 bases at the 5' terminus of the 5' extension primer (the first probe), i.e. between the 2 bases at the 5' terminus of its 5'-end universal sequence (the second part), is replaced with a phosphorothioate bond; the rest is the same as in Example 1.
  • the 3' blocking probe (the second probe) is exactly the same as in Example 1: the phosphodiester bond between the last 2 bases at the 3' terminus, i.e. between the last 2 bases at the 3' terminus of its 3'-end universal sequence (the second part), is replaced with a phosphorothioate bond.
  • compared with Example 1, a digestion step for the extension-ligation product is added: add 5 μl of digestion mix: 0.5 μl Exonuclease I (20 U/μl, NEB), 1 μl Lambda exonuclease (5 U/μl, NEB), 3.5 μl ddH2O.
  • the digested product was purified using 37.5 μl of magnetic beads (1.5×, Vazyme), and finally eluted with 15 μl of 10 mM Tris.Cl, pH 8.0.
  • compared with Example 1, the capillary electrophoresis results are shown in Figure 3: the upper two panels are the sample of Example 1 and its blank control, and the lower two panels are the sample of Example 2 and its blank control. Both methods enrich the target regions, but Example 2 gives fewer non-specific bands and better enrichment.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

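The present invention provides a method for rapid enrichment of target gene regions. The method uses extension primers and/or blocking probes carrying exonuclease-resistant modifications. After the primer-probe pairs are denatured and hybridized with sample DNA, the mixture is digested with single-stranded-nucleic-acid-specific exonuclease; the digested product is then purified a second time by a physical method such as magnetic bead purification, silica column purification or membrane filtration, and is finally amplified with universal primers matched to a next-generation sequencing platform and purified to obtain a sequencing library. The method enables multiplex enrichment of target gene fragments, improves the enrichment efficiency of target sequences, increases the effective reads and sequencing depth of target gene fragments, and can be used for sequencing analysis on various high-throughput chip-based sequencing platforms, such as next-generation sequencing platforms.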

Description

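A method for rapid enrichment of target gene regions
Technical Field
The present invention relates to the field of biotechnology, and more particularly to a method for rapid enrichment of target gene regions.
Background Art
Although the goal of sequencing a human genome for 1,000 US dollars is close to being achieved, the total cost of a genomic study should include DNA sequencing, data management and data analysis (producing directly interpretable data). As a result, the actual cost of genomic research in large population-level studies and in clinical applications is unlikely to fall in the short term. Recently, a new approach has made it possible to enrich specific disease-related regions, biological pathways and genes, or even the whole exome (about 1% of the genome), and then study them without bias. This approach is target-region enrichment combined with high-throughput sequencing.
In target-region enrichment high-throughput sequencing, probes are designed for one or several sequences of interest, the sequences are captured and enriched by one of several methods, and the captured sequences are then sequenced and analyzed. Because probe design is flexible and coverage depth is high, the approach is well suited to analyzing disease samples in large cohorts, or to validating results from whole-genome, GWAS or linkage analyses; it can both verify known loci and identify further disease-susceptibility loci within candidate regions. Enriching target genes before high-throughput sequencing (NGS) has the following advantages: 1) significantly lower cost; 2) high sequencing depth over the target regions, which gives more accurate results; 3) shorter project turnaround; 4) the well-defined function of the target regions makes the results easier to interpret. Given these advantages, target-region enrichment combined with high-throughput sequencing can be applied to larger sample populations than whole-genome sequencing. It also has important applications in biomedical research and in the clinical diagnosis of Mendelian diseases, and ultimately supports individualized medicine based on a person's genetic characteristics.
For high-throughput sequencing of enriched target regions, enrichment of the target regions is the first task. Choosing the most suitable enrichment method for a given project requires considering the total size of the region to be enriched, the number of samples, and whether multiple samples need to be sequenced together (to make the most efficient use of the sequencer's throughput). Many enrichment technologies are used in research and on commercial platforms, but by core reaction principle they fall into three classes: target-region enrichment based on PCR amplification, on circularization, and on hybridization capture.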
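“PCR amplification”: the target region is amplified directly by multiple long-range PCRs; alternatively, a large number of short fragments can be amplified by standard multiplex PCR of limited plexity or by highly multiplexed PCR, by newer multiplex PCR systems (Ion AmpliSeq™ from Life Technologies, GeneRead DNAseq System from Qiagen, TargetRich™ from Kailos), by microdroplet PCR (RainDance), or by chip-based PCR (Access Array™ from Fluidigm). PCR-based methods are best suited to small target regions of 10-100 kb and usually require target-specific primer design and PCR reactions. Their main problems are that sequence variation in primer-binding regions readily causes amplicon dropout, and that structural variation can only be detected through a reduction in sequencing reads.
“Circularization”: also known as molecular inversion probes (MIPs), gap-fill padlock probes or selector probes. For regions of 100-500 kb, a highly specific mechanism (gap filling followed by ligation) forms single-stranded DNA circles containing the target-region sequence, producing structures that share common DNA elements and can be used to selectively amplify the target regions of interest. Representative methods are HaloPlex (Agilent) and MIPs. The main problems of this approach are that sequence variation in primer-binding regions readily causes amplicon dropout, sensitivity and uniformity are relatively low, and probe cost is relatively high.
“Hybridization capture”: nucleic acids in the sample hybridize to DNA/RNA probes complementary to the target regions, which are either anchored on a solid support or free in solution, and the sequences of interest are then isolated by physical capture. The capture range extends from 500 kb to the whole exome, and several classic commercial hybridization methods have been developed, such as SureSelect (Agilent), Nextera (Illumina), TruSeq (Illumina), SeqCap (NimbleGen) and Ion TargetSeq (Life Technologies). These methods offer better capture efficiency and cost efficiency for large and pre-designed regions. The main problem of hybridization capture is its high requirement for sample quality and quantity; it generally cannot be used on FFPE samples, although in practice optimized TruSeq and SureSelect protocols can also be used for FFPE samples.
Among the many current methods for targeted capture of genomic regions, probe-hybridization methods are the most generally applicable and have been widely used for exome capture in human and mouse. They are divided into solid-phase capture (for example, array capture) and solution-phase capture, depending on where the capture reaction takes place. Solution-phase capture is the more popular because it lends itself to automation. However, apart from inherent drawbacks such as relatively low capture efficiency and the need for laborious, time-consuming DNA library construction, the pre-made probe libraries of these methods greatly restrict the flexibility of most studies in choosing target regions and species. PCR-based enrichment bypasses shotgun library preparation: fragments are amplified directly in the final amplification stage with suitable 5' primers and then sequenced, so the choice of candidate regions is relatively flexible and the procedure is easy to run in the laboratory. Its main drawback is that it is hard to scale (cross-matching among multiplexed primers, dimer formation, non-specific priming and similar problems), whether for enriching very large genomic regions or for processing large numbers of samples in parallel.
Circularization based on molecular inversion probes (MIPs) differs considerably from the other methods. Its most notable feature is very high specificity, but it is difficult to process multiple samples simultaneously in a single reaction. Each probe used for circularization enrichment is a single-stranded DNA oligonucleotide whose two end sequences are complementary to two non-contiguous parts of the target region, in inverted linear order. On hybridization to the target, the target-complementary arms, i.e. the 5' and 3' ends of the padlock probe, are brought close together and hybridize, leaving a gap over the target region; if the 5' end is phosphorylated, DNA ligase joins the two ends to form a circular padlock probe linked to the target region. After circularization, exonuclease digestion removes the large excess of uncircularized probes and DNA fragments. Then, in a multi-template PCR, either after rolling-circle amplification or directly, the common sequence on all circles is targeted to amplify the target regions and produce an NGS library. To detect the presence or absence of a target sequence, the reaction needs only a single hybridization event, and its specificity is very good. The fast kinetics of the intramolecular padlock reaction favor target-probe hybridization over probe-probe interactions, so padlock probes can be highly multiplexed. An important drawback, however, is that MIP probes are long (~70-300 nt), which limits them in both synthesis difficulty and cost; in addition, long probes are prone to conformational constraints after circularization, such as intra-circle hybridization, which limits the size of the target region captured per probe (100-200 bp).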
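To exploit the strengths of these methods while avoiding their weaknesses and to meet increasingly diverse research needs, several new targeted-enrichment methods have been developed. Among them, TruSeq Custom Amplicon from Illumina is a fully customizable, amplicon-based targeted resequencing system. With it, researchers can focus on any key region of interest in the genome; up to 1,536 amplicons covering up to 600 kb of genomic sequence can be sequenced simultaneously in a single reaction. The system is based on an extension-ligation reaction. First, a pair of oligonucleotide probes is designed for each target amplicon (each probe consisting of a universal sequence and a specific sequence complementary to the sequence flanking the amplicon). Multiple probe pairs (up to 1,536) are pooled in one reaction tube (custom amplicon tube, CAT), and unfragmented sample DNA is added. The CAT probes hybridize to the sequences flanking the target regions, and unhybridized oligonucleotides are removed by fragment size selection. Extension by a polymerase followed by ligation by a ligase then yields amplicon fragments containing the target regions. PCR with primers that are complementary to the universal sequences and contain sequencing adapters and sample tag sequences produces a multi-target amplicon library. Multiple samples (up to 96 per MiSeq run) can be pooled into one library and sequenced on the MiSeq System. The method nevertheless has shortcomings: (1) after probe hybridization, only a single round of extension-ligation is performed, so off-target and non-specific hybridization occur easily and capture efficiency for complex sequences is low; (2) removing unhybridized probes by fragment size selection cannot effectively prevent non-specific hybridization or carry-over of non-target fragments.
The applicant's previously published and granted invention “A high-throughput nucleic acid analysis method and its application” (ZL201210581830.9) can also achieve rapid enrichment of target regions based on extension-ligation. In contrast to TruSeq Custom Amplicon, it uses a 5'-end extension primer with a 5' exonuclease-resistant modification and a 3'-end ligation probe with a 3' exonuclease-resistant modification. Denaturation/hybridization and multiplex extension-ligation are carried out simultaneously; the reaction products are digested with a combination of exonucleases, such as exonuclease I, exonuclease III and lambda exonuclease, to remove single- or double-stranded DNA other than the ligation products; and PCR with primers that are complementary to the universal sequences and contain sequencing adapters and sample tag sequences builds a multi-target amplicon library. By performing genomic DNA/probe hybridization and polymerase/ligase extension-ligation simultaneously in one tube, the method reduces the number of steps, and repeated extension-ligation cycles improve the utilization of the genomic DNA template. It therefore has certain advantages, but it also suffers from difficult system optimization and incomplete digestion of non-specific amplification products.
High-throughput, low-cost, rapid and efficient target-region enrichment sequencing makes it possible to find pathogenic mutations in a region, new mutations in alleles of Mendelian diseases, or changes in exon-encoded information. Fully exploiting sequence information is of far-reaching significance for disease diagnosis and prevention, individualized treatment, drug development, bioengineering and related research. The methods described above all have high application value, but they also have many shortcomings. There is a need in the art for new target gene enrichment technologies that are low in cost, high in efficiency and low in non-specific amplification.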
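Summary of the Invention
The object of the present invention is to provide a method for rapid enrichment of target gene regions.
In a first aspect, the present invention provides a method for enriching nucleic acid fragments, the method comprising the steps of:
(1) providing a reaction system comprising a sample to be tested and n probe sets;
wherein n ≥ 2, and each probe set comprises a first probe and a second probe;
the first probe and the second probe specifically hybridize to the 3' end and the 5' end, respectively, of the same target nucleic acid fragment (specific hybridization means at least partial or complete complementarity);
the first probe cannot be degraded by a 5'->3' exonuclease and/or the second probe cannot be degraded by a 3'->5' exonuclease;
the first probe comprises a first part that specifically hybridizes to the 3' end of the target nucleic acid fragment and a second part corresponding to the sequence of a subsequent PCR amplification primer (corresponding means that the reverse complement of the second part can specifically hybridize with the PCR amplification primer);
the second probe comprises a first part that specifically hybridizes to the 5' end of the target nucleic acid fragment and a second part that specifically hybridizes to the sequence of a subsequent PCR amplification primer;
when the first probe and the second probe are specifically hybridized to the same target nucleic acid fragment, the 3' terminus of the first probe and the 5' terminus of the second probe are separated by at least 1 nucleotide;
(2) subjecting the reaction system to high-temperature denaturation and annealing, during which the first probe and the second probe specifically hybridize to the target nucleic acid fragment of the sample to form a hybridization product, thereby obtaining reaction mixture I containing the hybridization product;
(3) digesting reaction mixture I with a single-stranded-nucleic-acid-specific exonuclease to remove second probe that has not hybridized to a target nucleic acid fragment, thereby obtaining digested reaction mixture II containing the undigested hybridization product;
(4) purifying reaction mixture II to further remove residual first and second probes that have not hybridized to a target nucleic acid fragment, thereby obtaining purified reaction mixture III containing the hybridization product;
(5) performing an extension-ligation reaction on the hybridization product in reaction mixture III using a nucleic acid polymerase and a nucleic acid ligase to form a ligation product, thereby obtaining reaction mixture IV containing the ligation product; optionally, purifying reaction mixture IV (e.g. by digestion with a nucleic-acid-specific exonuclease) to further remove residual first and second probes that have not hybridized to a target nucleic acid fragment, thereby obtaining purified reaction mixture V containing the extension-ligation product; and
(6) performing PCR amplification using the ligation product in reaction mixture IV or V as template, thereby obtaining a PCR amplification product, which constitutes the enriched nucleic acid fragments.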
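In another preferred embodiment, in steps (4) and (5), the purification also removes salt ions and proteins from reaction mixture I.
In another preferred embodiment, the hybridization product is a ternary complex formed by the first probe and the second probe bound to a single strand of the target nucleic acid fragment.
In another preferred embodiment, in step (4), the purification is performed by a physical method.
In another preferred embodiment, in step (3), the single-stranded-nucleic-acid-specific exonuclease cleaves (or digests): single-stranded DNA that has not specifically hybridized with the probes to form the hybridization product (in particular the complementary strand), unbound (or free) first probe, and unbound (or free) second probe.
In another preferred embodiment, in step (3), the single-stranded-nucleic-acid-specific exonuclease does not cleave (or digest), or substantially does not cleave, the hybridization product.
In another preferred embodiment, in step (5), the purification is performed by a physical method.
In another preferred embodiment, in step (5), the nucleic-acid-specific exonuclease cleaves (or digests) single-stranded DNA that has not formed an extension-ligation product (in particular the complementary strand), unbound (or free) first probe, and unbound (or free) second probe.
In another preferred embodiment, in step (5), the nucleic-acid-specific exonuclease does not cleave (or digest), or substantially does not cleave, the extension-ligation product.
In another preferred embodiment, the n probe sets target different target nucleic acid fragments.
In another preferred embodiment, the lower limit of n is 20, 30, 40, 50, 100, 200 or 500, and/or the upper limit of n is 2000, 5000, 10000, 100000, 500000 or 1000000.
In another preferred embodiment, after step (6), the method further comprises the step of preparing a nucleic acid fragment library from the PCR amplification product.
In another preferred embodiment, in step (5), under the action of the nucleic acid polymerase, the first probe is extended along the target nucleic acid fragment until it reaches the 5' terminus of the second probe, where extension is blocked, giving an extended first-probe DNA strand; and under the action of the nucleic acid ligase, the 3' end of the extended first-probe DNA strand is ligated to the 5' end of the second probe, thereby forming a reaction mixture containing the ligation product.
In another preferred embodiment, the first probe cannot be degraded by a 5'->3' exonuclease but can be degraded by a 3'->5' exonuclease.
In another preferred embodiment, the 5' end of the first probe carries a protecting group that prevents exonuclease degradation.
In another preferred embodiment, the second probe cannot be degraded by a 3'->5' exonuclease but can be degraded by a 5'->3' exonuclease.
In another preferred embodiment, the 3' end of the second probe carries a protecting group that prevents exonuclease degradation.
In another preferred embodiment, the first probe cannot be degraded by a 5'->3' exonuclease, and the exonuclease used in step (3) is a 5'->3' single-stranded-nucleic-acid-specific exonuclease.
In another preferred embodiment, the second probe cannot be degraded by a 3'->5' exonuclease, and the exonuclease used in step (3) is a 3'->5' single-stranded-nucleic-acid-specific exonuclease.
In another preferred embodiment, the first probe cannot be degraded by a 5'->3' exonuclease and the second probe cannot be degraded by a 3'->5' exonuclease; a 3'->5' single-stranded-nucleic-acid-specific exonuclease is used in step (3), and both a 5'->3' and a 3'->5' single-stranded-nucleic-acid-specific exonuclease are used in step (5).
In another preferred embodiment, an exonuclease-resistant modification is made at the 5' end of the first probe and/or at the 3' end of the second probe, so that the first probe cannot be degraded by a 5' exonuclease and/or the second probe cannot be degraded by a 3' exonuclease.
In another preferred embodiment, the modification includes, but is not limited to: phosphorothioate modification, 5-propyne pdC modification, pdU modification, 2'-fluoro base modification, 2'-O-methyl base modification, 2'-5' linked base modification, LNA base modification, chimeric linkage modification, 3' inverted dT modification, or a combination thereof.
In another preferred embodiment, 1-10, preferably 2-6, bases at the 5' end of the first probe carry an exonuclease-resistant modification.
In another preferred embodiment, 1-10, preferably 2-6, bases at the 3' end of the second probe carry an exonuclease-resistant modification.
In another preferred embodiment, the exonuclease is selected from the group consisting of: T5 Exonuclease, T7 Exonuclease, Lambda Exonuclease, RecJf, Exonuclease T, Exonuclease I, Exonuclease V, Exonuclease III, or a combination thereof.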
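In another preferred embodiment, the nucleic acid polymerase is a thermostable nucleic acid polymerase; preferably, it is selected from the group consisting of: Hemo KlenTaq (NEB); AmpliTaq DNA Polymerase, Stoffel Fragment (Life Technologies); and [Figure PCTCN2021105073-appb-000002] Hot Start Flex DNA Polymerase (NEB).
In another preferred embodiment, the nucleic acid polymerase is a polymerase with substantially no 5'->3' exonuclease activity.
In another preferred embodiment, the nucleic acid ligase is a thermostable nucleic acid ligase; preferably, it is selected from the group consisting of: Taq DNA Ligase (NEB); Ampligase (Epicentre); 9°N™ DNA Ligase (NEB).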
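In another preferred embodiment, for probes amplifying the same target nucleic acid fragment, the Tm of the second probe is higher than the Tm of the first probe.
In another preferred embodiment, the Tm of the second probe is 3°C-10°C higher than that of the first probe, preferably 4°C-6°C higher, e.g. 5°C.
In another preferred embodiment, the Tm of each first probe in each probe set is 59°C-68°C.
In another preferred embodiment, the Tm of each second probe in each probe set is 68°C-75°C.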
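The Tm preferences above can be expressed as a simple design check. The following is only a minimal sketch: the function name `check_probe_pair_tm` and the example Tm values are hypothetical, and the default ranges (59-68°C, 68-75°C, and a 3-10°C difference) are simply the preferred ranges stated above, not a validated design rule.

```python
def check_probe_pair_tm(tm_first, tm_second,
                        first_range=(59.0, 68.0),
                        second_range=(68.0, 75.0),
                        delta_range=(3.0, 10.0)):
    """Return a list of violated Tm preferences (an empty list means none)."""
    problems = []
    if not first_range[0] <= tm_first <= first_range[1]:
        problems.append("first-probe Tm outside preferred range")
    if not second_range[0] <= tm_second <= second_range[1]:
        problems.append("second-probe Tm outside preferred range")
    delta = tm_second - tm_first  # the second probe should melt higher
    if not delta_range[0] <= delta <= delta_range[1]:
        problems.append("Tm difference outside preferred range")
    return problems


# Hypothetical Tm values in degrees Celsius, for illustration only.
print(check_probe_pair_tm(64.0, 69.5))
print(check_probe_pair_tm(66.0, 67.0))
```

The first call satisfies all three preferences; the second fails both the second-probe range and the Tm difference.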
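In another preferred embodiment, the 5' end of the second probe is phosphorylated.
In another preferred embodiment, n (the number of kinds of probe sets) is 20-1000000, preferably 30-500000, more preferably 40-100000, most preferably 50-10000, e.g. 100-10000, 500-10000 or 1000-10000.
In another preferred embodiment, the probe set directed at one target nucleic acid fragment is referred to in the present invention as one kind of probe set; for example, when n is 2, the two kinds of probe sets are directed at two different target nucleic acid fragments.
In another preferred embodiment, the first part of the first probe is 16-50 bp long (preferably 21-36 bp, more preferably 33 bp), and/or the second part is 18-30 bp long.
In another preferred embodiment, the first part of the second probe is 16-50 bp long (preferably 21-36 bp, more preferably 32 bp), and/or the second part is 21-36 bp long.
In another preferred embodiment, the second parts of the first probes of all probe sets are identical or substantially identical.
In another preferred embodiment, the second parts of the second probes of all probe sets are identical or substantially identical.
In another preferred embodiment, the total amount of target nucleic acid fragments in the sample is 1-2000 ng, preferably 200-500 ng.
In another preferred embodiment, the sample is a nucleic acid sample derived from an animal, plant or microorganism, preferably a DNA sample or a cDNA sample obtained by reverse transcription of RNA.
In another preferred embodiment, the sample is a nucleic acid sample derived from an animal (preferably a mammal, more preferably a human), preferably a DNA sample or a cDNA sample obtained by reverse transcription of RNA.
In another preferred embodiment, the sample to be tested contains only one sample, or contains multiple test samples from different subjects (e.g. samples taken from several patients, or samples of several different tissues).
In another preferred embodiment, the reaction system further comprises a buffer.
In another preferred embodiment, the conditions for high-temperature denaturation and annealing in step (2) are 95-100°C for 2-20 min, followed by 50°C for 0.5-20 h, preferably 1-5 h.
In another preferred embodiment, in step (3), the hybridization product is not degraded by the exonuclease, whereas unhybridized first probe and/or second probe is degraded by the exonuclease.
In another preferred embodiment, the purification in step (4) comprises: magnetic bead purification, silica column purification, membrane filtration purification, ethanol or isopropanol precipitation, or a combination thereof.
In another preferred embodiment, the specific sequence (i.e. the target nucleic acid sequence) of the extension-ligation product in step (5) is 30-5000 bp long, preferably 100-1000 bp, more preferably 150-310 bp.
In another preferred embodiment, no amplification cycling is performed in step (5).
In another preferred embodiment, the PCR amplification primers carry a tag sequence 1-100 bp long, preferably 5-10 bp. Ligation products from different samples can be amplified with PCR amplification primers carrying different tag sequences, so that the amplification products of different samples can be pooled and the reads can later be sorted by tag sequence in the sequencing data.
In another preferred embodiment, the PCR amplification primers are 42-58 bp long.
In another preferred embodiment, only one PCR amplification primer pair is used in step (6).
In another preferred embodiment, in step (6), the primers used in the PCR amplification (PCR amplification primers) comprise a forward primer and a reverse primer; the forward primer comprises a sequence that can specifically hybridize to the reverse complement of the second part of the first probe, and the reverse primer comprises a sequence that specifically hybridizes to the second part of the second probe.
In another preferred embodiment, in step (6), the forward primer and/or the reverse primer contains a universal sequence compatible with a high-throughput chip-based sequencing platform.
In another preferred embodiment, in step (6), the forward primer and/or the reverse primer contains a tag sequence, and different tag sequences are used for different samples.
In another preferred embodiment, the ligation products are amplified with universal primers containing different tag sequences to build libraries suitable for a next-generation sequencing platform, and libraries built with universal primers containing different tag sequences can be pooled for next-generation sequencing.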
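In another preferred embodiment, the sequence of the second part of the first probe is:
5' A*C*ACTCTTTCCCTACACGACGCTCTTCCGATCT 3' (SEQ ID NO: 1), where * denotes a phosphorothioate modification.
In another preferred embodiment, the sequence of the second part of the second probe is:
5' pAGATCGGAAGAGCACACGTCTGAACTCCAG*T*C 3' (SEQ ID NO: 2), where * denotes a phosphorothioate modification and p denotes phosphorylation.
In another preferred embodiment, in step (6), the forward primer sequence is:
5' AATGATACGGCGACCACCGAGATCT[x]ACACTCTTTCCCTACACGACGC 3' (SEQ ID NO: 3), where [x] is absent or a tag sequence; preferably, [x] is 0 bp-100 bp long, preferably 0 bp-10 bp, e.g. 8 bp.
In another preferred embodiment, in step (6), the reverse primer sequence is:
5' CAAGCAGAAGACGGCATACGAGAT[x]GTGACTGGAGTTCAGACGTGTGCT 3' (SEQ ID NO: 104), where [x] is absent or a tag sequence; preferably, [x] is 1 bp-100 bp long, preferably 5 bp-10 bp, e.g. 8 bp.
In another preferred embodiment, the sequence of the first part of the first probe is shown in SEQ ID NO.: 2a, and the sequence of the first part of the second probe is shown in SEQ ID NO.: 2a+1, where a is an integer from 2 to 51.
In another preferred embodiment, the method is suitable for multiplex enrichment and amplification of gene fragments; the number of gene fragments amplified simultaneously can be tens, hundreds, thousands or even tens of thousands. In addition to the target gene fragments, some reference gene fragments may be included, in a number from 0 to 999.
In another preferred embodiment, the copy number of a target gene fragment can be obtained by analyzing the sequencing data of the nucleic acid fragments enriched by the method. The analysis counts the sequencing depth of each target and reference fragment; the depth of each target fragment in a patient sample is divided by the depth of each reference fragment to give m ratios (m is the number of reference gene fragments; a reference gene can be any gene fragment other than the target fragment); each ratio is then divided by the corresponding ratio of a normal sample, or by the median ratio over all samples, and multiplied by the copy number of the normal sample at that target fragment, or by the copy number of the great majority of samples at that fragment. This gives m values, whose median is taken as the detected copy number of the sample at the target fragment.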
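For readers who prefer a formula, the copy-number estimate described above can be written compactly as follows. The symbols are notation introduced here for illustration and do not appear in the source: D_{s,f} is the sequencing depth of fragment f in sample s, P is the patient sample, N is the normal sample, t is the target fragment, r_1, ..., r_m are the m reference fragments, and c_{N,t} is the copy number of the normal sample at t (2 in Example 1). The variant that normalizes by the median ratio over all samples is not shown.

```latex
\[
\mathrm{CN}_{P,t}
  \;=\;
  \operatorname*{median}_{i = 1,\dots,m}
  \left(
    \frac{D_{P,t} / D_{P,r_i}}{D_{N,t} / D_{N,r_i}} \cdot c_{N,t}
  \right)
\]
```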
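In a second aspect, the present invention provides a nucleic acid sequencing method comprising the step of enriching target nucleic acid fragments using the method of the first aspect of the invention.
In another preferred embodiment, the nucleic acid sequencing method uses a high-throughput chip-based sequencing platform to perform single-molecule amplification sequencing, or direct single-molecule sequencing, of the target nucleic acid fragments enriched by the method of the first aspect of the invention.
In another preferred embodiment, the method further comprises the step of analyzing the sequencing data: assigning reads to samples, calling gene mutation sites and/or calculating the copy number of each gene fragment.
In a third aspect, the present invention provides a kit for enriching nucleic acid fragments, the kit comprising: one or more probe sets corresponding to nucleotide sequences in a sample to be tested, a nucleic acid polymerase and a nucleic acid ligase;
each probe set comprises a first probe and a second probe;
the first probe and the second probe specifically hybridize to the 3' end and the 5' end, respectively, of the same target nucleic acid fragment (specific hybridization means at least partial or complete complementarity);
the first probe cannot be degraded by a 5'->3' exonuclease;
the second probe cannot be degraded by a 3'->5' exonuclease;
the first probe comprises a first part that specifically hybridizes to the 3' end of the target nucleic acid fragment and a second part corresponding to the sequence of a subsequent PCR amplification primer (corresponding means that the reverse complement of the second part can specifically hybridize with the PCR amplification primer);
the second probe comprises a first part that specifically hybridizes to the 5' end of the target nucleic acid fragment and a second part corresponding to the sequence of a subsequent PCR amplification primer (corresponding means that the second part can specifically hybridize with the PCR amplification primer);
when the first probe and the second probe are specifically hybridized to the same target nucleic acid fragment, the 3' terminus of the first probe and the 5' terminus of the second probe are separated by at least 1 nucleotide.
In another preferred embodiment, the kit further comprises PCR amplification primers comprising a forward primer and a reverse primer; the forward primer comprises a sequence that can specifically hybridize to the reverse complement of the second part of the first probe, and the reverse primer comprises a sequence that specifically hybridizes to the second part of the second probe.
In another preferred embodiment, the forward primer and/or the reverse primer contains a universal sequence compatible with a high-throughput chip-based sequencing platform.
In another preferred embodiment, the forward primer and/or the reverse primer contains a tag sequence, and different tag sequences are used for different samples.
In another preferred embodiment, the kit further comprises conventional PCR reagents.
It should be understood that, within the scope of the present invention, the technical features described above and those specifically described below (e.g. in the examples) can be combined with one another to form new or preferred technical solutions. For reasons of space, they are not enumerated here.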
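Brief Description of the Drawings
Figure 1 shows the workflow of the invention.
Figure 2 shows the detected copy numbers of the target gene fragments in the 3 patient samples of the examples.
Figure 3 shows the capillary electrophoresis results of Example 1 and Example 2.
Detailed Description
After extensive and in-depth research, the inventors unexpectedly discovered, for the first time, a new target gene region enrichment technique based on the extension-ligation reaction. Experimental results show that the method of the invention achieves rapid enrichment of multiple target gene fragments, significantly improves the enrichment efficiency of target sequences, and increases the effective reads and sequencing depth of target gene fragments. The enriched products of multiple target gene fragments can be modified, purified and quantified, and then used for sequencing analysis on various high-throughput chip-based sequencing platforms such as next-generation sequencing platforms. The present invention was completed on this basis.
Specifically, having identified the shortcomings of TruSeq Custom Amplicon and of the inventors' earlier technique “A high-throughput nucleic acid analysis method and its application”, the present invention provides a new method for rapid enrichment of multiple target gene regions based on the extension-ligation reaction. It uses extension primers and/or blocking probes carrying exonuclease-resistant modifications. After the primer-probe pairs are denatured and hybridized with sample genomic DNA, the mixture is digested with one or more single-stranded-nucleic-acid-specific exonucleases; the digested product is purified a second time by a physical method such as magnetic bead purification, silica column purification or membrane filtration, and is then amplified with universal primers matched to the next-generation sequencing platform and purified to obtain a sequencing library. The method captures target sequences specifically and efficiently, and the sequencing data of its amplification products can also be used for copy-number analysis of the target gene fragments, so that point mutations and copy numbers of target gene fragments are detected simultaneously.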
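In a preferred embodiment of the present invention, the steps of the method are as follows (as shown in Figure 1):
a) Two specific DNA probes are designed for each target nucleic acid fragment: a 5'-end extension primer probe and a 3'-end extension-blocking probe. The first half of the 5'-end probe is a universal sequence consistent with the subsequent PCR amplification primer, and its second half is a specific sequence that hybridizes to the target nucleic acid fragment. The 5' end of the 3'-end probe is phosphorylated; its first half is a specific sequence that hybridizes to the target nucleic acid fragment, and its second half is a universal sequence consistent with the subsequent PCR amplification primer. A few bases at the 5' terminus of the 5'-end probe carry a protective modification (①) against exonuclease degradation, or a few bases at the 3' terminus of the 3'-end probe carry a protective modification (①) against exonuclease degradation, or both are modified. The two probes are separated by a gap of several bases.
b) After the probes are hybridized with the template DNA, the mixture is digested with one or more single-stranded-nucleic-acid-specific exonucleases (②) to remove residual primers and probes that have not hybridized to the template DNA.
c) The digested product is purified a second time by a physical method such as magnetic bead purification, silica column purification or membrane filtration.
d) The purified product undergoes an extension-ligation reaction in a reaction system containing both polymerase and ligase: a polymerase lacking 5'->3' exonuclease activity extends to fill the gap between the two probes, and the ligase then joins them;
e) The ligation product is purified by a physical method such as magnetic bead purification, silica column purification or membrane filtration;
f) The purified ligation product is amplified with a pair of PCR primers matched to the amplification primers or sequencing primers of the downstream high-throughput chip-based sequencing platform, giving a sequencing library that is enriched for multiple target gene fragments and suited to that platform. Typically the PCR primers also carry a tag sequence several to several tens of bases long. Ligation products from different samples can be amplified with PCR primers carrying different tag sequences, so that the amplification products of different samples can be pooled and the reads later assigned to their samples according to the tag sequence;
g) The amplified ligated-probe products are subjected to single-molecule amplification sequencing or direct single-molecule sequencing on a next-generation high-throughput chip-based sequencing platform;
h) The sequencing data are analyzed to assign reads to samples, call gene mutation sites and calculate the copy number of each gene fragment: reads are first assigned to samples according to the tag sequence; each read is then aligned to the reference genome sequence with suitable software and sequence differences are read out to obtain mutation sites; the number of reads for each ligation product is counted, corrected against the reference gene fragments, and compared with the corrected value of a normal sample to calculate the copy number of the gene fragment.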
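The first part of step h), assigning reads to samples by tag sequence, can be sketched as below. This is an illustrative sketch only: the four tags are the ones listed in Example 1, but the source does not say which tag belongs to which sample, so the mapping, the function name `sort_reads_by_tag`, the exact-match rule and the toy reads are all assumptions. Real index reads may also appear in a different orientation depending on the sequencing platform.

```python
from collections import defaultdict

# Tags from Example 1. The tag-to-sample assignment is NOT given in the
# source; this mapping is a hypothetical assumption for illustration.
TAG_TO_SAMPLE = {
    "TGGAAGGA": "P1",
    "CGCCTTCA": "P2",
    "TAGAAATC": "P3",
    "CATTCTGC": "C1",
}


def sort_reads_by_tag(reads, tag_to_sample=TAG_TO_SAMPLE):
    """Group (tag, sequence) pairs by sample using exact tag matching.

    Reads whose tag is not in the mapping are collected under 'unassigned'.
    """
    bins = defaultdict(list)
    for tag, sequence in reads:
        bins[tag_to_sample.get(tag, "unassigned")].append(sequence)
    return dict(bins)


# Toy reads (observed tag, insert sequence); not real data.
reads = [
    ("TGGAAGGA", "ACGT"),
    ("CATTCTGC", "GGCA"),
    ("NNNNNNNN", "TTTT"),
    ("TGGAAGGA", "CCGA"),
]
bins = sort_reads_by_tag(reads)
print({sample: len(seqs) for sample, seqs in bins.items()})
```

Exact matching is the simplest choice; a production pipeline would usually tolerate a sequencing error in the tag, which is outside the scope of this sketch.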
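The exonuclease-resistant modifications of the present invention include, but are not limited to, the following types: phosphorothioates, 5-propyne pdC, pdU, 2'-fluoro bases, 2'-O-methyl bases, 2'-5' linked bases, LNA bases, chimeric linkage, 3' inverted dT.
The exonucleases of the present invention include, but are not limited to, the following types: T5 Exonuclease, T7 Exonuclease, Lambda Exonuclease, RecJf, Exonuclease T, Exonuclease I, Exonuclease V, Exonuclease III.
The main advantages of the present invention include:
(a) The method first introduces exonuclease-resistant modifications at the 5' end of the extension primer and/or the 3' end of the blocking probe. The hybridization product is digested with exonuclease and then purified a second time by a physical method to remove, as far as possible, residual primers and probes that have not hybridized to genomic DNA. The purified product is then extended and ligated in a single reaction system using thermostable ligase and polymerase; the extension-ligation product is digested with exonuclease and again purified by a physical method to remove residual primers and probes as far as possible. The method significantly reduces non-specific amplification and improves enrichment efficiency.
(b) The method can enrich multiple target gene fragments; the number of fragments can range from tens to thousands, or even tens of thousands.
(c) The method is simple and fast, and can enrich the target fragments of hundreds of samples within a few hours.
(d) The relative proportions of the different fragments in the product enriched by the method correspond to their relative proportions in the original template. Therefore, besides providing point-mutation information, the sequencing data of these products can also yield copy-number information for the target fragments after double correction against reference fragments and a reference sample.
(e) The method unexpectedly and significantly improves the signal-to-noise ratio of the detection results, especially when multiple probe sets (n probe sets) are used in the same system, for example when n ≥ 20, ≥ 30, ≥ 40, ≥ 50, ≥ 100, ≥ 200 or ≥ 500.
The present invention is further described below with reference to specific examples. It should be understood that these examples are intended only to illustrate the invention and not to limit its scope. Experimental methods for which no specific conditions are given in the following examples generally follow conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or the conditions recommended by the manufacturer. Unless otherwise stated, percentages and parts are by weight.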
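Example 1
Forty-two pairs of probes were designed for the exons of the 4 genes MVK, MVD, PMVK and FDPS, with specific-sequence amplification lengths of 183 bp-280 bp; 8 pairs of probes were also designed for 8 reference gene fragments, with specific-sequence amplification lengths of 185 bp-283 bp. Using these probe pairs and the technique of the invention, the target gene fragments and reference gene fragments were amplified simultaneously in one reaction system. For 3 patient samples and 1 normal sample, the PCR amplification after extension-ligation used universal primer pairs with different tag sequences. The amplification products of the different samples were pooled, purified and quantified, and then sequenced on a MiSeq next-generation sequencer (Illumina, USA). The sequencing data were first sorted by tag sequence; the data of each sample were aligned to the human reference genome with the Burrows-Wheeler Aligner (BWA) software, sequencing statistics were computed, and the copy numbers of the target gene fragments were estimated from these statistics.
(I) Experimental procedure
1. Probe design
Following the basic principles of the primer3 primer design software (http://bioinfo.ut.ee/primer3-0.4.0/primer3/) and using a program developed in-house, 42 pairs of probes were designed for all exons of the 4 genes MVK, MVD, PMVK and FDPS, with specific-sequence amplification lengths of 183 bp-280 bp; 8 pairs of probes were also designed for 8 reference gene fragments, with specific-sequence amplification lengths of 185 bp-283 bp. The 5' extension primer (first probe) consists of a 5' universal sequence (second part) plus a 3' specific sequence (first part); the 5' universal sequence is 5' ACACTCTTTCCCTACACGACGCTCTTCCGATCT 3' (SEQ ID NO: 1). The 3' blocking probe (second probe) consists of a 5' specific sequence (first part) plus a 3' universal sequence (second part); its 5' end is phosphorylated and the phosphodiester bond between the last 2 bases at its 3' end is replaced with a phosphorothioate bond; the 3' universal sequence is 5' AGATCGGAAGAGCACACGTCTGAACTCCAGTC 3' (SEQ ID NO: 2). The Tm of the specific sequence of the 5' extension primer is 59°C-68°C and that of the 3' blocking probe is 68°C-75°C; for a given amplified fragment, the Tm of the 3' blocking probe is usually more than 5°C higher than that of the 5' extension primer. The enriched fragments and probe-specific sequences are listed in Table 1.
2. Hybridization and purification
1) Prepare 10× hybridization buffer: 100 mM Tris.Cl, 500 mM NaCl, 1 mM EDTA, pH 8.0.
2) Dilute genomic DNA to 25 ng/μl and prepare a 10 μl denaturation mix: 1.375 μl 1× TE, pH 8.0, 0.625 μl 4× GC solution (Genesky), 8 μl genomic DNA.
3) Fragment and denature the genomic DNA: 98°C for 10 min; hold at 4°C.
4) Add 5 μl probe hybridization mix: 1.5 μl 10× hybridization buffer, 1.5 μl primer-probe mix (0.01 μM of each 5' extension primer + 0.02 μM of each 3' blocking probe), 2 μl ddH2O.
5) Hybridization: vortex to mix, place on a PCR instrument and run the program "95°C for 5 min, 50°C for 3 h"; leave at room temperature for 10 min before use.
6) Add 5 μl digestion mix: 0.5 μl Exonuclease I (20 U/μl, NEB), 2 μl 10× Exonuclease I buffer, 1 μl MgCl2 (100 mM), 1.5 μl ddH2O.
7) Mix by gentle vortexing, centrifuge at 3000 rpm for 2 min, then incubate at 37°C for 30 min.
8) Purify with 30 μl magnetic beads (1.5×, Vazyme) and elute with 15 μl elution buffer (30 mM KCl, 10 mM Tris.Cl, pH 8.0).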
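The volumes in the steps above can be cross-checked with a little arithmetic. The sketch below only adds up the component volumes given in the protocol and confirms that 30 μl of beads corresponds to the stated 1.5× ratio; the dictionary keys and variable names are labels chosen here for illustration.

```python
# Component volumes in microliters, copied from steps 2), 4) and 6) above.
denaturation_mix = {"1x TE": 1.375, "4x GC solution": 0.625, "genomic DNA": 8.0}
hybridization_mix = {"10x hybridization buffer": 1.5, "primer-probe mix": 1.5, "ddH2O": 2.0}
digestion_mix = {"Exonuclease I": 0.5, "10x Exonuclease I buffer": 2.0,
                 "MgCl2 (100 mM)": 1.0, "ddH2O": 1.5}


def total_volume(mix):
    """Sum the component volumes of one mix (microliters)."""
    return sum(mix.values())


# Volume of the digested reaction that goes into bead purification.
reaction_volume = (total_volume(denaturation_mix)
                   + total_volume(hybridization_mix)
                   + total_volume(digestion_mix))

bead_volume = 1.5 * reaction_volume                      # 1.5x bead-to-sample ratio
dna_input_ng = 25.0 * denaturation_mix["genomic DNA"]    # 25 ng/ul stock

print(reaction_volume, bead_volume, dna_input_ng)
```

The totals come to 20 μl of reaction, 30 μl of beads and 200 ng of genomic DNA, which matches step 8) and falls within the preferred 200-500 ng input range given earlier.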
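3. Extension-ligation reaction
1) Prepare the extension-ligation mix: 1.25 μl 4× GC solution, 0.4 μl HemoKlenTaq (NEB), 4 μl 5× HemoKlenTaq buffer, 0.1 μl Taq DNA ligase (500 U/μl, Genesky), 0.4 μl NAD (50 mM), 0.4 μl 10 mM dNTP, 0.5 μl MgCl2 (100 mM).
2) Add 13 μl of the hybridization-purification eluate from above.
3) Extension-ligation: 56°C for 30 min; hold at 4°C.
4) Purify with 30 μl magnetic beads (1.5×, Vazyme) and elute with 15 μl 10 mM Tris.Cl, pH 8.0.
4. PCR amplification of the ligation product
1) The PCR primer pair consists of a universal forward primer (5' AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGC 3', SEQ ID NO: 3) and a sample-specific reverse primer (5' CAAGCAGAAGACGGCATACGAGAT[n1n2n3n4n5n6n7n8]GTGACTGGAGTTCAGACGTGTGCT 3', SEQ ID NO: 104), where n1n2n3n4n5n6n7n8 is the tag sequence; the tag sequences for the 4 samples are TGGAAGGA, CGCCTTCA, TAGAAATC and CATTCTGC.
2) The PCR reaction volume is 20 μl, containing 1× Phusion HF buffer (NEB), 2.5 mM MgCl2, 0.3 mM dNTP mix, 0.3 μM of each primer, 1 U Phusion DNA polymerase (NEB) and 10 μl of the purified extension-ligation product from above.
3) Run the following PCR program: 98°C for 30 s; (98°C for 10 s, 65°C for 30 s, 72°C for 1 min) × 30 cycles; 72°C for 5 min; hold at 4°C.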
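The relationship between the universal probe sequences and the PCR primers above can be checked directly from the sequences. The sketch below is an illustration under stated assumptions: the constant and function names are chosen here, and the split of each primer into a 5' segment and a 3' segment at the tag position is inferred from the [n1...n8] placeholder. It checks that the 3' segment of the forward primer is identical to the start of SEQ ID NO: 1 (so it anneals to the reverse complement of the first probe's second part) and that the 3' segment of the reverse primer, after its first two bases, matches the reverse complement of SEQ ID NO: 2.

```python
SEQ1 = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"  # SEQ ID NO: 1, universal part of the first probe
SEQ2 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTC"   # SEQ ID NO: 2, universal part of the second probe

# Primer segments on either side of the tag position (names are illustrative).
FWD_5P, FWD_3P = "AATGATACGGCGACCACCGAGATCT", "ACACTCTTTCCCTACACGACGC"
REV_5P, REV_3P = "CAAGCAGAAGACGGCATACGAGAT", "GTGACTGGAGTTCAGACGTGTGCT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_primer(tag):
    """Build a sample-specific reverse primer by inserting the tag."""
    return REV_5P + tag + REV_3P


# Forward primer 3' segment equals the 5' portion of SEQ1.
forward_matches = SEQ1.startswith(FWD_3P)
# Reverse primer 3' segment (minus its first two bases) pairs with SEQ2.
reverse_matches = reverse_complement(SEQ2).startswith(REV_3P[2:])

primer_p1 = reverse_primer("TGGAAGGA")  # first tag listed in Example 1
print(forward_matches, reverse_matches, len(primer_p1))
```

With an 8-base tag the reverse primer is 56 bases long and the untagged forward primer is 47 bases long, both within the 42-58 bp primer length given in the preferred embodiments.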
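5. The PCR products of the 4 samples were pooled and run on a 2% agarose gel; fragments between 200 bp and 500 bp were excised and recovered, and the number of molecules in the recovered product was quantified by RT-qPCR.
6. The quantified library was sequenced on a MiSeq next-generation sequencer (Illumina, USA).
7. Data analysis: the sequencing data were sorted by tag sequence to obtain the data for each sample; the reads were aligned to the human genome reference sequence with the Burrows-Wheeler Aligner (BWA) program, and the total sequencing output of each sample, the sequencing depth of each target and reference fragment, and the enrichment efficiency of each sample were counted. The sequencing depth of each target fragment in a patient sample was divided by the sequencing depth of each of the 8 reference fragments to give 8 ratios; each ratio was divided by the corresponding ratio of the normal sample and multiplied by 2, giving 8 values whose median is the detected copy number of the sample at the target fragment.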
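The copy-number calculation in step 7 can be reproduced with a few lines of code. The sketch below is an illustration, not the authors' program: the function name `copy_number` and the data layout are assumptions, while the depths are copied from Table 2 (reference fragments eControl01 to eControl08, plus two target fragments for patient P2 and normal sample C1).

```python
from statistics import median

# Depths of the 8 reference fragments (eControl01..eControl08), from Table 2.
REF_DEPTH = {
    "P2": [682, 387, 340, 564, 563, 1446, 532, 45],
    "C1": [631, 459, 439, 683, 623, 1459, 538, 75],
}
# Depths of two target fragments, from Table 2.
TARGET_DEPTH = {
    "eFDPSE02": {"P2": 24, "C1": 53},
    "eMVKE01": {"P2": 444, "C1": 488},
}


def copy_number(fragment, patient, normal="C1", normal_copies=2):
    """Median over reference fragments of (patient ratio / normal ratio) * normal_copies."""
    values = []
    for ref_patient, ref_normal in zip(REF_DEPTH[patient], REF_DEPTH[normal]):
        patient_ratio = TARGET_DEPTH[fragment][patient] / ref_patient
        normal_ratio = TARGET_DEPTH[fragment][normal] / ref_normal
        values.append(patient_ratio / normal_ratio * normal_copies)
    return median(values)


cn_deleted = copy_number("eFDPSE02", "P2")   # inside the reported FDPS deletion
cn_normal = copy_number("eMVKE01", "P2")     # outside any reported deletion
print(round(cn_deleted, 2), round(cn_normal, 2))
```

For P2 the estimate is close to 1 for FDPS exon 2 and close to 2 for MVK exon 1, in line with the reported deletion of FDPS exons 1 to 3 in that sample. Using the median over the 8 references makes the estimate less sensitive to a single poorly behaved reference fragment such as eControl08.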
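(II) Experimental results
1) Sequencing statistics for the 50 fragments in the 4 samples
The sequencing depth of each fragment for the 3 patient samples (P1, P2, P3) and the 1 normal sample (C1) is given in Table 2, and the sequencing statistics are given in Table 3. The statistics show that all 4 samples achieved effective enrichment of the 50 gene fragments: enrichment efficiency was above 85%, mean effective reads were above 500×, and the sequencing depth of every fragment was above 10×.
In addition, agarose gel electrophoresis of the PCR products showed that non-specific amplification was markedly reduced, with almost no non-specific bands and a markedly lower background.
2) Detected copy numbers of the samples
The copy number of each fragment was calculated from the sequencing depth data. The detected copy numbers of the 42 gene fragments in the three patient samples (P1, P2 and P3) are shown in Figure 2. The figure shows that P1 has lost at least the exon 1 to exon 5 segment of the MVK gene, while P2 and P3 have lost the exon 1 to exon 3 segment and the exon 5 to exon 8 segment of the FDPS gene, respectively. RT-PCR experiments confirmed that these deletion results are accurate.
Table 1. Enriched gene fragments and probe-specific sequence information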
Figure PCTCN2021105073-appb-000004
Figure PCTCN2021105073-appb-000005
Figure PCTCN2021105073-appb-000006
Figure PCTCN2021105073-appb-000007
a: the mRNAs used for the gene positions are MVK (NM_000431.2), PMVK (NM_006556.3), MVD (NM_002461.1) and FDPS (NM_002004.2).
Table 2. Sequencing depth of the 50 gene fragments in the 4 samples
Fragment P1 P2 P3 C1
eControl01 908 682 622 631
eControl02 631 387 474 459
eControl03 711 340 526 439
eControl04 1087 564 705 683
eControl05 870 563 707 623
eControl06 2283 1446 1698 1459
eControl07 904 532 871 538
eControl08 94 45 86 75
eMVKE01 393 444 626 488
eMVKE02 369 435 540 417
eMVKE03 603 851 1036 883
eMVKE04 751 862 1214 969
eMVKE05 137 145 169 151
eMVKE06 360 248 324 249
eMVKE07 1245 790 918 889
eMVKE08 814 465 561 559
eMVKE09 989 517 678 626
eMVKE10 2415 1356 1887 1622
eMVKE11 736 435 646 484
ePMVKE01a 971 546 715 569
ePMVKE01b 882 602 722 676
ePMVKE01c 3047 1895 2358 2149
ePMVKE02 745 499 546 483
ePMVKE03 703 523 697 517
ePMVKE04 3817 2083 2809 2366
ePMVKE05 1196 602 937 678
eMVDE01 164 95 113 118
eMVDE02 444 244 351 287
eMVDE03 684 388 545 419
eMVDE04 755 357 467 438
eMVDE05a 718 400 537 447
eMVDE05b 458 302 384 281
eMVDE06 2064 1627 1497 1633
eMVDE07a 230 123 181 161
eMVDE07b 2075 1309 1503 1587
eMVDE08 185 105 149 120
eMVDE09 127 81 100 84
eMVDE10 173 122 117 111
eFDPSE01a 69 27 51 50
eFDPSE01b 642 160 503 369
eFDPSE02 92 24 62 53
eFDPSE03 108 30 85 75
eFDPSE04 1015 513 747 611
eFDPSE05 631 340 236 405
eFDPSE06 1934 1095 540 1077
eFDPSE07 237 150 95 168
eFDPSE08 84 47 28 45
eFDPSE09 158 95 123 101
eFDPSE10 618 357 494 426
eFDPSE11 471 303 378 348
Table 3. Sequencing statistics for the 50 gene fragments in the 4 samples
Statistic P1 P2 P3 C1
(>2×)% 100% 100% 100% 100%
(>10×)% 100% 100% 100% 100%
Median reads 693 417 540 453
Mean reads 835 523 647 581
Effective reads 41797 26151 32358 29096
Total reads 47795 29566 36533 32415
Enrichment efficiency 87.45% 88.45% 88.57% 89.76%
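The source does not define enrichment efficiency explicitly, but the values in Table 3 are consistent with effective reads divided by total reads. The sketch below checks that reading of the table; the function name `enrichment_efficiency` and the interpretation of "effective reads" as reads assigned to the 50 designed fragments are assumptions made here.

```python
# (effective reads, total reads) per sample, copied from Table 3.
READS = {
    "P1": (41797, 47795),
    "P2": (26151, 29566),
    "P3": (32358, 36533),
    "C1": (29096, 32415),
}


def enrichment_efficiency(effective, total):
    """Effective reads as a percentage of total reads."""
    return 100.0 * effective / total


efficiencies = {sample: enrichment_efficiency(effective, total)
                for sample, (effective, total) in READS.items()}

for sample, value in efficiencies.items():
    print(sample, f"{value:.2f}%")
```

Rounded to two decimals, the computed percentages agree with the enrichment efficiency row of Table 3 for all four samples.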
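Example 2
The experimental procedure of this example is largely the same as in Example 1, with the following differences:
(I) Experimental procedure
1. Probe design
The phosphodiester bond between the 2 bases at the 5' terminus of the 5' extension primer (first probe), i.e. between the 2 bases at the 5' terminus of its 5' universal sequence (second part), is replaced with a phosphorothioate bond; the rest is the same as in Example 1.
The 3' blocking probe (second probe) is exactly the same as in Example 1: the phosphodiester bond between the last 2 bases at the 3' terminus, i.e. between the last 2 bases at the 3' terminus of its 3' universal sequence (second part), is replaced with a phosphorothioate bond.
2. Extension-ligation reaction
Compared with Example 1, a digestion step for the extension-ligation product is added: add 5 μl digestion mix: 0.5 μl Exonuclease I (20 U/μl, NEB), 1 μl Lambda exonuclease (5 U/μl, NEB), 3.5 μl ddH2O. The digested product is purified with 37.5 μl magnetic beads (1.5×, Vazyme) and eluted with 15 μl 10 mM Tris.Cl, pH 8.0.
(II) Comparison of experimental results
The capillary electrophoresis results, compared with Example 1, are shown in Figure 3: the upper two panels are the sample of Example 1 and its blank control, and the lower two panels are the sample of Example 2 and its blank control. Both methods enrich the target regions, but Example 2 gives fewer non-specific bands and better enrichment.
All documents mentioned in the present invention are incorporated by reference in this application as if each were individually incorporated by reference. It should also be understood that, after reading the above teachings, those skilled in the art can make various changes or modifications to the invention, and such equivalents likewise fall within the scope defined by the claims appended to this application.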

Claims (10)

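  1. A method for enriching nucleic acid fragments, characterized in that the method comprises the steps of:
    (1) providing a reaction system, the reaction system comprising: a sample to be tested and n probe sets;
    wherein n ≥ 2, and each probe set comprises a first probe and a second probe;
    the first probe and the second probe hybridize specifically to the 3' end and the 5' end, respectively, of the same target nucleic acid fragment (specific hybridization meaning at least partially complementary or fully complementary);
    the first probe cannot be degraded by a 5'->3' exonuclease and/or the second probe cannot be degraded by a 3'->5' exonuclease;
    the first probe comprises a first part that hybridizes specifically to the 3' end of the target nucleic acid fragment and a second part that corresponds to a subsequent PCR amplification primer sequence ("corresponds" meaning that the reverse complement of the second part can hybridize specifically to the PCR amplification primer);
    the second probe comprises a first part that hybridizes specifically to the 5' end of the target nucleic acid fragment and a second part that hybridizes specifically to a subsequent PCR amplification primer sequence;
    when the first probe and the second probe are specifically hybridized to the same target nucleic acid fragment, the 3' end of the first probe and the 5' end of the second probe are separated by a distance of at least 1 nucleotide;
    (2) subjecting the reaction system to high-temperature denaturation and annealing, during which the first probe and the second probe hybridize specifically to the target nucleic acid fragment of the sample to be tested to form a hybridization product, thereby obtaining reaction mixture I, which contains the hybridization product;
    (3) digesting reaction mixture I with one or more single-strand-specific exonucleases to remove first probes and/or second probes that are not hybridized to a target nucleic acid fragment, thereby obtaining digested reaction mixture II, which contains the undigested hybridization product;
    (4) purifying reaction mixture II to further remove residual first probes and second probes that are not hybridized to a target nucleic acid fragment, thereby obtaining purified reaction mixture III, which contains the hybridization product;
    (5) performing an extension-ligation reaction on the hybridization product in reaction mixture III using a nucleic acid polymerase and a nucleic acid ligase to form a ligation product, thereby obtaining reaction mixture IV, which contains the ligation product; optionally, purifying reaction mixture IV to further remove residual first probes and second probes that are not hybridized to a target nucleic acid fragment, thereby obtaining purified reaction mixture V, which contains the extension-ligation product; and
    (6) performing PCR amplification using the ligation product in reaction mixture IV or V as a template, thereby obtaining a PCR amplification product, which constitutes the enriched nucleic acid fragments.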
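The probe geometry in claim 1 can be illustrated with a minimal sketch. It treats sequences as plain strings, assumes perfect complementarity, and assumes the universal (second) parts sit at the 5' end of the first probe and the 3' end of the second probe, as in the Example 2 probe design. The function names, the toy target, and the universal sequences are hypothetical and are not taken from the application.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extension_ligation_product(target: str, p1_len: int, p2_len: int,
                               p1_universal: str, p2_universal: str):
    """Return (gap length, ligation product 5'->3') for one probe set.

    target: target fragment, 5'->3'.
    p1_len: length of the first probe's target-specific part (pairs with
            the target's 3' end).
    p2_len: length of the second probe's target-specific part (pairs with
            the target's 5' end).
    """
    gap = len(target) - p1_len - p2_len
    if gap < 1:
        # Claim 1 requires the two probes to be at least 1 nt apart.
        raise ValueError("probes must be separated by at least 1 nucleotide")
    new_strand = revcomp(target)               # strand complementary to target
    p1_specific = new_strand[:p1_len]          # 3' part of the first probe
    fill_in = new_strand[p1_len:p1_len + gap]  # synthesized by the polymerase
    p2_specific = new_strand[p1_len + gap:]    # 5' part of the second probe
    product = (p1_universal + p1_specific + fill_in
               + p2_specific + p2_universal)
    return gap, product


# Toy example with made-up sequences.
gap, product = extension_ligation_product(
    "AAACCCGGGTTTACGT", 5, 5, "GATTACA", "TGTAATC")
print(gap, product)
```

In this model the ligation product is the reverse complement of the target flanked by the two universal parts, which is why a single primer pair can amplify every probe set in step (6).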
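  2. The method of claim 1, characterized in that, in step (5), under the action of the nucleic acid polymerase, the first probe is extended as a DNA strand along the target nucleic acid fragment until it reaches the 5' end of the second probe, where extension is blocked, yielding an extended first-probe DNA strand; and, under the action of the nucleic acid ligase, the 3' end of the extended first-probe DNA strand is ligated to the 5' end of the second probe, thereby forming a reaction mixture containing the ligation product.
  3. The method of claim 1, characterized in that the first probe cannot be degraded by a 5'->3' exonuclease, and the exonuclease used in step (3) is a 5'->3' single-strand-specific exonuclease; and/or
    the second probe cannot be degraded by a 3'->5' exonuclease, and the exonuclease used in step (3) is a 3'->5' single-strand-specific exonuclease.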
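One way to read the pairing of probe protection and exonuclease direction in claim 3 is the simplified model below. It is an interpretive sketch, not a description from the application: it assumes a single-strand-specific exonuclease degrades an oligonucleotide only if the end it attacks is neither chemically protected nor base-paired to the target, and that the first probe pairs to the target through its 3' part while the second probe pairs through its 5' part. Class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    name: str
    protected_5p: bool  # 5' end chemically resistant to exonuclease
    protected_3p: bool  # 3' end chemically resistant to exonuclease
    paired_end: str     # end that base-pairs to the target: "5p" or "3p"


def survives(probe: Probe, exo_direction: str, hybridized: bool) -> bool:
    """Simplified model of digestion by a single-strand-specific exonuclease.

    exo_direction is "5p->3p" or "3p->5p".
    """
    if exo_direction not in ("5p->3p", "3p->5p"):
        raise ValueError("exo_direction must be '5p->3p' or '3p->5p'")
    attacked_end = "5p" if exo_direction == "5p->3p" else "3p"
    chemically_safe = (probe.protected_5p if attacked_end == "5p"
                       else probe.protected_3p)
    # A hybridized probe is double-stranded at its target-specific end.
    paired_safe = hybridized and probe.paired_end == attacked_end
    return chemically_safe or paired_safe


first = Probe("first", protected_5p=True, protected_3p=False, paired_end="3p")
second = Probe("second", protected_5p=False, protected_3p=True, paired_end="5p")

for probe in (first, second):
    for direction in ("5p->3p", "3p->5p"):
        for hyb in (False, True):
            state = "hybridized" if hyb else "free"
            print(probe.name, direction, state, survives(probe, direction, hyb))
```

Under these assumptions, each free probe is removed by the exonuclease that attacks its unprotected end, while hybridized probes survive both directions.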
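  4. The method of claim 1, characterized in that n (the number of distinct probe sets) is 20-1000000, preferably 30-500000, more preferably 40-100000, and most preferably 50-10000.
  5. The method of claim 1, characterized in that the purification in step (4) comprises: magnetic bead purification, silica column purification, membrane filtration purification, ethanol or isopropanol precipitation purification, or a combination thereof.
  6. The method of claim 1, characterized in that the 5' end of the second probe is phosphorylated.
  7. The method of claim 1, characterized in that the nucleic acid polymerase is a thermostable nucleic acid polymerase; and/or the nucleic acid ligase is a thermostable nucleic acid ligase.
  8. The method of claim 1, characterized in that, in step (6), the primers used in the PCR amplification comprise a forward primer and a reverse primer, the forward primer comprising a sequence capable of hybridizing specifically to the reverse complement of the second part of the first probe, and the reverse primer comprising a sequence that hybridizes specifically to the second part of the second probe.
  9. A method for enriching nucleic acid fragments, characterized in that the method comprises the step of enriching target nucleic acid fragments using the method of any one of claims 1-8.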
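  10. A kit, characterized in that the kit is used for enriching nucleic acid fragments and comprises: one or more probe sets corresponding to nucleotide sequences in a sample to be tested, an exonuclease, a nucleic acid polymerase, a nucleic acid ligase, a hybridization buffer, and an extension-ligation reaction buffer;
    each probe set comprises a first probe and a second probe;
    the first probe and the second probe hybridize specifically to the 3' end and the 5' end, respectively, of the same target nucleic acid fragment (specific hybridization meaning at least partially complementary or fully complementary);
    the first probe cannot be degraded by an exonuclease acting from the 5' end;
    the second probe cannot be degraded by an exonuclease acting from the 3' end;
    the first probe comprises a first part that hybridizes specifically to the 3' end of the target nucleic acid fragment and a second part that corresponds to a subsequent PCR amplification primer sequence ("corresponds" meaning that the reverse complement of the second part can hybridize specifically to the PCR amplification primer);
    the second probe comprises a first part that hybridizes specifically to the 5' end of the target nucleic acid fragment and a second part that corresponds to a subsequent PCR amplification primer sequence ("corresponds" meaning that the second part can hybridize specifically to the PCR amplification primer);
    when the first probe and the second probe are specifically hybridized to the same target nucleic acid fragment, the 3' end of the first probe and the 5' end of the second probe are separated by a distance of at least 1 nucleotide.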
PCT/CN2021/105073 2020-07-07 2021-07-07 Rapid enrichment method for target gene regions WO2022007863A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010647922.7 2020-07-07
CN202010647922.7A CN113913493B (zh) 2020-07-07 2020-07-07 Rapid enrichment method for target gene regions

Publications (1)

Publication Number Publication Date
WO2022007863A1 (zh) 2022-01-13

Family

ID=79231364

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/105073 WO2022007863A1 (zh) 2020-07-07 2021-07-07 Rapid enrichment method for target gene regions

Country Status (2)

Country Link
CN (1) CN113913493B (zh)
WO (1) WO2022007863A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115011594A (zh) * 2022-05-16 2022-09-06 纳昂达(南京)生物科技有限公司 Liquid-phase hybridization capture probe for detecting HPV, use thereof, and kit thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060073511A1 (en) * 2004-10-05 2006-04-06 Affymetrix, Inc. Methods for amplifying and analyzing nucleic acids
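WO2014101655A1 (zh) * 2012-12-27 2014-07-03 上海天昊生物科技有限公司 High-throughput nucleic acid analysis method and use thereof
CN105803055A (zh) * 2014-12-31 2016-07-27 天昊生物医药科技(苏州)有限公司 Novel method for target gene region enrichment based on multiplex cyclic extension-ligation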
US20180363039A1 (en) * 2015-12-03 2018-12-20 Accuragen Holdings Limited Methods and compositions for forming ligation products

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10017761B2 (en) * 2013-01-28 2018-07-10 Yale University Methods for preparing cDNA from low quantities of cells

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
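CN115011594A (zh) * 2022-05-16 2022-09-06 纳昂达(南京)生物科技有限公司 Liquid-phase hybridization capture probe for detecting HPV, use thereof, and kit thereof
CN115011594B (zh) * 2022-05-16 2023-10-20 纳昂达(南京)生物科技有限公司 Liquid-phase hybridization capture probe for detecting HPV, use thereof, and kit thereof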

Also Published As

Publication number Publication date
CN113913493A (zh) 2022-01-11
CN113913493B (zh) 2024-04-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21837447

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21837447

Country of ref document: EP

Kind code of ref document: A1