WO2015081890A1 - Sequencing library and its preparation and use - Google Patents

Sequencing library and its preparation and use

Publication number
WO2015081890A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequencing
sequence
dna
library
tag
Application number
PCT/CN2014/093161
Other languages
English (en)
French (fr)
Inventor
阮珏
王开乐
吴仲义
吕雪梅
Original Assignee
中国科学院北京基因组研究所
Application filed by 中国科学院北京基因组研究所
Priority to US15/101,605 (patent US10718015B2)
Publication of WO2015081890A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844: Nucleic acid amplification reactions
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00: Structure or type of the nucleic acid
    • C12N2310/50: Physical structure
    • C12N2310/51: Physical structure in polymeric form, e.g. multimers, concatemers
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2537/00: Reactions characterised by the reaction format or use of a specific feature
    • C12Q2537/10: Reactions characterised by the reaction format or use of a specific feature the purpose or use of
    • C12Q2537/155: Cyclic reactions

Definitions

  • Accurate DNA sequencing is needed for: detecting potential oncogenic mutation sites in the tissues or organs of a normal individual; detecting heterogeneity of the DNA composition in a cancer cell population, including hidden small clonal populations; and using the DNA mutations in each cell as markers to trace the origin and division pattern of cells.
  • It is also needed for accurately obtaining genotypes in a highly heterozygous cancer population, calculating the mutation rate per division of cancer cells or somatic cells, and searching for pathogenic mutations in small populations (e.g., cancer stem cells) in biomedical therapy. How to use existing second-generation sequencing technology to determine DNA sequences accurately has therefore become a critical issue.
  • a very large limitation of the method of exogenous labeling of DNA is that this method can only target small genomes or a small number of target genes, and comprehensive detection of the entire genome cannot be achieved. Because the labeling method needs to detect the same and complementary labels in order to correct the positive and negative strands of DNA, it requires a very high sequencing depth, so it is difficult to achieve for large genomes.
  • Peripheral blood is easy to obtain, its collection is not invasive, and the mutation information it contains reflects, to some extent, the true mutation information of the individual. Detection of the mutation information carried by free DNA in peripheral blood has therefore been widely used in prenatal diagnosis and cancer monitoring. However, free DNA in peripheral blood is degraded to 140-170 base pairs, and there are only a few thousand copies in 1 milliliter of blood. How to construct an effective DNA library from such a small amount of DNA, and how to detect very low-frequency mutations in peripheral blood free DNA with limited sequencing coverage, have become urgent problems.
  • the present invention provides a sequencing library and its preparation and use.
  • a first aspect of the invention relates to a sequencing library, characterized in that the insert in the sequencing library comprises a co-directional alternating concatemer of the sequence to be tested and the tag sequence.
  • the tag sequence may be ligated to the 5' end or the 3' end of the sequence to be tested.
  • the tag sequence is ligated to the 5' end of the sequence to be tested.
  • a sequencing library according to any one of the first aspects of the invention, wherein the sum of the lengths of each sequence to be tested and the tag sequence is less than half the sequencing length of the sequencer.
  • the co-directional alternating concatemer comprises at least two repeating units, each repeating unit comprising one sequence to be tested and one tag sequence.
  • the sequencing library according to any one of the first aspects of the present invention, characterized in that the determined bases and random bases are arranged sequentially (determined bases and random bases in separate blocks, one before the other) or in a mosaic arrangement.
  • a sequencing library according to any one of the first aspects of the invention, the sequencing library being used for second generation sequencing or third generation sequencing.
  • a second aspect of the invention relates to a method of preparing a sequencing library, the method comprising:
  • when the ligation sequence obtained in step (1) is a double-stranded sequence, it is made single-stranded and then circularized; when the ligation sequence obtained in step (1) is a single-stranded sequence, it is circularized directly;
  • a method according to any one of the second aspects of the invention, wherein the sum of the lengths of each sequence to be tested and the tag sequence is less than half the sequencing length of the sequencer.
  • the tag sequence may be ligated to the 5' end or the 3' end of the sequence to be tested.
  • the tag sequence is ligated to the 5' end of the sequence to be tested.
  • the method according to any one of the second aspects of the present invention, wherein the length of the co-directional alternating concatemer described in step (4) is greater than the sequencing length of the sequencer.
  • the co-directional alternating concatemer comprises at least two repeating units, each repeating unit comprising one sequence to be tested and one tag sequence.
  • said tag sequence comprises 4-20 (e.g. 6-13) consecutive determined bases and 0-18 (e.g. 0-13) consecutive random bases.
  • the determined bases and the random bases are arranged sequentially (determined bases and random bases in separate blocks, one before the other) or in a mosaic arrangement.
  • a third aspect of the invention relates to a sequencing method comprising the step of using a sequencing library of any of the first aspects of the invention.
  • a fourth aspect of the invention relates to a sequencing method, the method comprising the steps of preparing a sequencing library, the method of preparing a sequencing library comprising:
  • when the ligation sequence obtained in step (1) is a double-stranded sequence, it is made single-stranded and then circularized; when the ligation sequence obtained in step (1) is a single-stranded sequence, it is circularized directly;
  • a sequencing method according to any one of the fourth aspects of the invention, wherein the sum of the lengths of each sequence to be tested and the tag sequence is less than half the sequencing length of the sequencer.
  • the tag sequence may be ligated to the 5' end or the 3' end of the sequence to be tested.
  • the tag sequence is ligated to the 5' end of the sequence to be tested.
  • the co-directional alternating concatemer comprises at least two repeating units, each repeating unit comprising one sequence to be tested and one tag sequence.
  • a sequencing method according to any one of the fourth aspects of the invention, wherein said tag sequence comprises 4-20 (e.g. 6-13) consecutive determined bases and 0-18 (e.g. 0-13) consecutive random bases.
  • the determined bases and the random bases are arranged sequentially (determined bases and random bases in separate blocks, one before the other) or in a mosaic arrangement.
  • a sequencing method according to any one of the fourth aspects of the invention, which is a second generation sequencing or a third generation sequencing method.
  • the invention also relates to the use of a sequencing library of any of the first aspects of the invention in sequencing.
  • sequencing is a second generation sequencing or a third generation sequencing.
  • the sequencing includes, but is not limited to, genomic DNA sequencing, target fragment capture sequencing (e.g., exon capture sequencing), sequencing of single-stranded DNA fragments, sequencing of fossil DNA, and sequencing of free DNA in body fluids (e.g., blood, urine, saliva).
  • the sequencing length of the sequencer referred to in the present invention is defined as follows: for paired-end sequencing, it is the sum of the lengths of the two reads; for single-end sequencing, it is the length of the single read.
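
The length constraint stated in the aspects above (one sequence to be tested plus one tag sequence must be shorter than half the sequencing length) can be sketched as a small check. This is only an illustration: the function names and the example numbers are hypothetical, and the paired-end case assumes both reads have the same length.

```python
def sequencing_length(read_length: int, paired_end: bool) -> int:
    """Sequencing length as defined in the text: the sum of both reads for
    paired-end runs (assumed equal here), the single read otherwise."""
    return 2 * read_length if paired_end else read_length


def unit_fits(fragment_len: int, tag_len: int, read_length: int, paired_end: bool) -> bool:
    """True when one repeating unit (fragment + tag) is shorter than half
    the sequencing length, so at least two units fit in the sequenced span."""
    return fragment_len + tag_len < sequencing_length(read_length, paired_end) / 2


# Hypothetical numbers: a 2 x 150 nt paired-end run and a 20 nt tag.
print(unit_fits(120, 20, 150, paired_end=True))  # True  (140 < 150)
print(unit_fits(140, 20, 150, paired_end=True))  # False (160 is not < 150)
```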
  • the tag sequence contains random bases.
  • the number of the random bases may be, for example, 1 to 13, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
  • the number of bases determined in the tag sequence is 6-13, such as 6, 7, 8, 9, 10, 11, 12, 13.
  • when designing the tag, it can be designed as two tag sequences or as a single tag sequence; when designed as two tag sequences, the two sequences can be annealed to form a double strand. In an embodiment of the invention, two tag sequences are designed.
  • the tag sequence is set forth in SEQ ID NO: 1 and/or SEQ ID NO: 2.
  • the double-stranded test sequence may be ligated to the double-stranded tag sequence, or the single-stranded test sequence may be ligated to the single-stranded tag sequence to obtain a double-stranded ligation sequence or a single-stranded ligation sequence, respectively. If a double-stranded ligation sequence is obtained, it needs to be single-stranded and then cyclized; if a single-stranded ligation sequence is obtained, the cyclization can be directly performed.
  • when designed as a double-stranded tag, the two tag sequences are annealed to obtain the double-stranded tag sequence; the 5' end of one strand must be phosphorylated so that it can be ligated to the sequence to be tested, while the 5' end of the other strand is not phosphorylated, so the final sequencing library contains only the phosphorylated tag strand. When designed as a single-stranded tag, the 5' end of the tag sequence must be phosphorylated in order to be ligated to the sequence to be tested.
  • the fragmented sequence to be tested is end-repaired (blunted) and A-tailed to obtain a sequence with a protruding "A".
  • the 5' end of one tag strand protrudes by a "T" so that it ligates more readily to the A-tailed sequence to be tested.
  • the 3' end of the other tag strand protrudes by one or more arbitrary bases to ensure the directionality of the ligation.
  • the determined bases and the random bases in the tag sequence are arranged sequentially (determined bases and random bases in separate blocks) or in a mosaic arrangement, that is, random bases are interspersed among the determined bases.
  • when designing the tag sequence, the tag itself should not form a palindromic sequence, which would prevent it from ligating correctly to the sequence to be tested; methods for designing tag sequences that avoid palindromic structures are known in the art, for example avoiding reverse-complementary segments.
  • the design should also avoid excessively high identity between the tag sequence and the reference sequence of the sequence to be tested.
  • the reference sequence is the known genomic DNA reference sequence of the same species; if no reference sequence is known for that species, the known genomic DNA reference sequence of a closely related species can be used. Methods for avoiding excessive identity are well known in the art; for example, the identity between the tag sequence and the reference sequence can be less than 90%, such as less than 85%, less than 80%, less than 75%, less than 70%, less than 65%, less than 60%, less than 55%, or less than 50%.
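
The two design rules above (no palindromic tag, limited identity to the reference) can be sketched as follows. This is a minimal illustration, not the patent's procedure: identity is approximated here as the best ungapped match fraction over both strands of a short reference string, the helper names are hypothetical, and only the determined (non-random) bases of a tag are assumed to be passed in.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(tag: str) -> bool:
    """A tag is palindromic when it equals its own reverse complement."""
    return tag == reverse_complement(tag)


def max_identity(tag: str, reference: str) -> float:
    """Highest fraction of matching bases over all ungapped placements of the
    tag (or its reverse complement) along the reference."""
    best = 0.0
    for query in (tag, reverse_complement(tag)):
        for start in range(len(reference) - len(query) + 1):
            window = reference[start:start + len(query)]
            matches = sum(a == b for a, b in zip(query, window))
            best = max(best, matches / len(query))
    return best


def tag_is_acceptable(tag: str, reference: str, max_allowed: float = 0.90) -> bool:
    """Accept a tag that is non-palindromic and below the identity threshold."""
    return not is_palindromic(tag) and max_identity(tag, reference) < max_allowed
```

For a real genome, an aligner would replace the sliding-window scan; the 0.90 default mirrors the "less than 90%" example in the text.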
  • the sequencing library refers to a collection of DNA fragments for sequencing that contain the sequence to be tested and other sequences (e.g., sequencing linkers).
  • the insert of the sequencing library refers to the fragment comprising the sequence to be tested and the tag sequence, after removal of other sequences such as the sequencing linker.
  • the sequence to be tested refers to a DNA fragment to be tested after treatment; the treatment includes, for example, fragmentation, end repair, A-tailing, and the like.
  • for example, the sequence to be tested is the sequence used for sequencing after the genomic DNA to be tested has been fragmented, end-repaired, and A-tailed.
  • the co-directional alternating concatemer formed by the sequence to be tested and the tag sequence in the sequencing library insert comprises two or more repeating units (one sequence to be tested plus one tag sequence is one repeating unit).
  • for example, if the sequence to be tested is A and the tag sequence is B, one repeating unit is AB or BA, and the co-directional alternating concatemer includes at least ABAB or BABA.
  • because of the random fragmentation step during sequencing library construction, the ends of the concatemer may not be complete repeating units, but after splicing the insert includes at least two repeating units, for example 1/2A-BABAB, ABABA-1/2B, or 1/2A-BAB-2/3A.
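
The A/B notation above can be illustrated with a short sketch that builds a concatemer, cuts an insert at an arbitrary position, and measures how many repeating units the insert spans. The sequences and function names are hypothetical, and the periodicity check assumes error-free bases, which real reads will not satisfy exactly.

```python
def build_concatemer(test_seq: str, tag_seq: str, copies: int) -> str:
    """Co-directional alternating concatemer: (A + B) repeated `copies` times."""
    return (test_seq + tag_seq) * copies


def repeat_units_covered(fragment: str, unit_len: int) -> float:
    """Number of repeating-unit lengths spanned by the fragment, or 0.0 if the
    fragment is not periodic with period `unit_len`."""
    periodic = all(
        fragment[i] == fragment[i + unit_len]
        for i in range(len(fragment) - unit_len)
    )
    return len(fragment) / unit_len if periodic else 0.0


A = "ACGTTGCA"   # hypothetical sequence to be tested
B = "GGATC"      # hypothetical tag sequence
concatemer = build_concatemer(A, B, copies=4)

# A randomly cut insert starting mid-unit, analogous to 1/2A-BAB-2/3A.
insert = concatemer[4:34]
print(repeat_units_covered(insert, len(A) + len(B)) >= 2)  # True
```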
  • DNA amplification based on the strand displacement reaction (Roger S. Lasken, Genomic DNA amplification by the multiple displacement amplification (MDA) method. Biochemical Society Transactions, 2009, 37, 450-453) refers to the following: when certain DNA polymerases (for example, Phi29 DNA polymerase and Bst DNA polymerase (large fragment)) encounter a downstream duplex while extending a new strand, they continue the extension reaction while displacing the downstream strand to produce free single-stranded DNA, so that the DNA is amplified isothermally.
  • DNA amplification based on strand displacement reactions does not require thermal denaturation.
  • the DNA amplification based on the strand displacement reaction includes, for example, strand displacement amplification, rolling circle amplification, multiple strand displacement amplification, and loop-mediated amplification.
  • MDA: multiple displacement amplification.
  • rolling circle amplification uses a circular DNA as the template and specific or random primers, and achieves extensive amplification of the circular DNA template through the action of a strand-displacing polymerase.
  • for example, after a random primer binds to the single-stranded circular DNA, phi29 DNA polymerase synthesizes the second strand around the circle.
  • when the polymerase reaches the primer again, it opens the duplex through its strand displacement activity and synthesis of the new strand continues.
  • the newly synthesized single-stranded DNA in turn binds new random hexamer primers for a new round of synthesis. Repeating this cycle amplifies the circular DNA molecule efficiently.
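
The first round of rolling circle synthesis described above can be sketched as copying around a circular template. This is a simplified model under stated assumptions: one primer, no errors, and only the first product strand; the function name and the toy sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def rolling_circle_product(circle: str, start: int, n_bases: int) -> str:
    """New strand (written 5'->3') made by copying `n_bases` around a circular
    template given 5'->3'. The polymerase reads the template 3'->5', so the
    template index decreases and wraps around the circle."""
    size = len(circle)
    return "".join(COMPLEMENT[circle[(start - i) % size]] for i in range(n_bases))


circle = "AACG"  # toy circular single strand (tag + sequence to be tested)
product = rolling_circle_product(circle, start=len(circle) - 1, n_bases=2 * len(circle))
print(product)  # CGTTCGTT: two tandem copies of the reverse complement of the circle
```

Going around the circle more than once is what produces the tandem repeats; copying this product in turn (the later rounds primed by random primers) would restore the orientation of the original strand.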
  • the second-generation sequencing method refers to Sequencing by Synthesis, a method for determining the sequence of DNA by capturing a newly synthesized end label, including but not limited to the Roche/454 FLX, Illumina/Solexa Genome Analyzer and Applied Biosystems SOLiD systems.
  • the third-generation sequencing method refers to a single-molecule sequencing technology, that is, when DNA sequencing is performed, individual sequencing of each DNA molecule can be realized without PCR amplification.
  • These include, but are not limited to, single-molecule fluorescent sequencing, represented by the Helicos SMS technology and the Pacific Biosciences SMRT technology, and nanopore sequencing.
  • the adapter used to prepare the co-directional alternating concatemer is referred to as the "tag sequence".
  • the adapter used for sequencing is referred to as the "sequencing linker".
  • DNA amplification errors and sequencing errors can be effectively removed to accurately detect mutations present on DNA molecules.
  • The sequencing library is constructed from the co-directional alternating concatemer of the sequence to be tested and the tag sequence (the library insert contains at least two repeating units).
  • After amplification, a single-stranded DNA and its complementary strand cannot be distinguished, so it is impossible to tell which strand a newly replicated DNA came from, and this affects identification of the type of base error. For example, a C-to-T mutation and a G-to-A mutation are complementary on double-stranded DNA; if the sequencing reads are not tagged, it cannot be judged whether the change was C to T or G to A. Because the tag sequence is non-palindromic and is ligated to the 5' end of the single-stranded DNA, the original single strand can still be identified from the orientation of the tag sequence after replication and amplification, so the type of error can be identified. This in turn helps identify low-frequency mutations.
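
The use of tag orientation to recover the original strand can be sketched as below. This is an illustration only: it assumes exact, error-free matching of a hypothetical tag, and the function names are not from the source.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def read_orientation(read: str, tag: str):
    """Return "original" if the read carries the tag as ligated, "complement"
    if it carries the tag's reverse complement, or None if ambiguous."""
    has_tag = tag in read
    has_rc = reverse_complement(tag) in read
    if has_tag and not has_rc:
        return "original"
    if has_rc and not has_tag:
        return "complement"
    return None


def to_original_strand(read: str, tag: str):
    """Express the read in the sense of the original single strand."""
    orientation = read_orientation(read, tag)
    if orientation == "original":
        return read
    if orientation == "complement":
        return reverse_complement(read)
    return None


tag = "GATTACAG"                    # hypothetical non-palindromic tag
original = "TTCCGG" + tag + "AACC"  # toy read in the original sense
copy = reverse_complement(original) # the same molecule read from the other strand
print(read_orientation(copy, tag))  # complement
```

Once every read is expressed in the sense of the original strand, a G-to-A change seen on a complementary copy is reported as the C-to-T change it represents on the original molecule.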
  • When a small amount of DNA is amplified to a quantity sufficient for sequencing, the copy number of some DNA molecules ends up significantly higher than the mean.
  • The multiple sequencing reads obtained from rolling circle replication of one single-stranded DNA all reflect the information of the same original DNA, so there is sequencing redundancy.
  • Without information indicating whether these reads come from the same original single-stranded DNA circle, they may be counted multiple times. This wrongly amplifies errors: after a single-stranded circle carrying DNA damage is replicated, the damage appears in multiple reads and is counted as several credible, independent DNA variants.
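
The double-counting problem described above can be sketched by counting support per distinct original molecule instead of per read. This is a minimal illustration: it assumes each read has already been reduced to a (molecule identifier, variant) pair, where the identifier would come from the random bases of the tag, and all names and values are hypothetical.

```python
from collections import defaultdict


def independent_support(observations):
    """observations: iterable of (molecule_id, variant) pairs.
    Returns, per variant, the number of distinct original molecules
    supporting it rather than the number of reads."""
    molecules_per_variant = defaultdict(set)
    for molecule_id, variant in observations:
        molecules_per_variant[variant].add(molecule_id)
    return {variant: len(ids) for variant, ids in molecules_per_variant.items()}


# Three reads from one damaged circle plus one read from another molecule.
reads = [
    ("ACGTTA", "pos100:C>T"),
    ("ACGTTA", "pos100:C>T"),
    ("ACGTTA", "pos100:C>T"),
    ("TTGACC", "pos100:C>T"),
]
print(independent_support(reads))  # {'pos100:C>T': 2}
```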
  • the tag sequence may comprise two portions: a linker region of known bases and a free region of random bases.
  • the linker region is 6 to 13 contiguous bases and the free region is 0 to 13 contiguous bases.
  • the base composition of the free region is random; it is specified as a run of 'N' (random base) of the corresponding length when the nucleic acid sequence is synthesized. The longer the free region, the higher the resolution with which different original molecules can be distinguished. If the free region length is zero, reads from different original molecules can be distinguished only by 1) differences in the size of the target DNA fragment inferred from the read, and 2) differences in the sequence composition of the inferred target DNA fragment.
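
The relationship between free-region length and resolution can be made concrete with a small sketch. It assumes the sequential arrangement (linker region followed by the random free region); the linker shown and the function names are hypothetical, and the length limits are the 6-13 and 0-13 ranges quoted above.

```python
import random


def make_tag(linker: str, n_random: int, rng: random.Random) -> str:
    """One concrete tag: known linker bases followed by `n_random` random bases
    (the 'N' positions of the synthesized oligo)."""
    if not 6 <= len(linker) <= 13:
        raise ValueError("linker region should be 6 to 13 bases")
    if not 0 <= n_random <= 13:
        raise ValueError("free region should be 0 to 13 bases")
    free_region = "".join(rng.choice("ACGT") for _ in range(n_random))
    return linker + free_region


def tag_diversity(n_random: int) -> int:
    """Number of distinct tags that a free region of this length can encode."""
    return 4 ** n_random


rng = random.Random(0)
print(make_tag("GATCTGAC", 6, rng))            # 8 known bases + 6 random bases
print(tag_diversity(6), tag_diversity(13))     # 4096 67108864
```

Each extra random base multiplies the number of distinguishable original molecules by four, which is the sense in which a longer free region gives higher resolution.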
  • the principle of the present invention is illustrated below using a sequencing error rate of 1/100 (the error rate of second-generation sequencing is 1/100 to 1/1000).
  • the probability that the same type of error occurs at the same site in two repeating units of one consensus sequence is 1/3 × (1/100)^2, an error rate of about 3×10^-5 (when more repeating units agree on a base, the probability that the base is wrong is even lower).
  • the probability that two different consensus sequences show the same error is (1/3 × (1/100)^2)^2, about 10^-9 (9×10^-10 when computed from the rounded value 3×10^-5); this method is therefore extremely effective at removing the errors produced during library construction and sequencing, achieving the goal of accurate sequencing.
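
The arithmetic above can be reproduced exactly with rational numbers. The sketch assumes, as the text does, a per-base error rate of 1/100 and that a wrong base is equally likely to be any of the three alternatives (the source of the factor 1/3); the variable names are illustrative.

```python
from fractions import Fraction

per_base_error = Fraction(1, 100)

# Both repeating units are wrong at the same site AND show the same wrong base.
same_error_in_two_units = Fraction(1, 3) * per_base_error ** 2

# Two independent consensus sequences show that same error.
same_error_in_two_consensus = same_error_in_two_units ** 2

print(same_error_in_two_units, float(same_error_in_two_units))
print(same_error_in_two_consensus, float(same_error_in_two_consensus))
```

The exact values are 1/30000 (about 3.3×10^-5) and 1/900000000 (about 1.1×10^-9), consistent with the orders of magnitude quoted in the text.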
  • the present invention provides a co-directional alternating concatemer of the sequence to be tested and the tag sequence, in which different copies replicated from the same original DNA are connected in series.
  • sequencing such a molecule, which contains at least two copies of the same repeating unit, enables accurate determination of the DNA sequence.
  • the co-directional alternating concatemers of test sequences and tag sequences constructed by this method can be used to build a variety of second-generation short-read sequencing libraries and are suitable for various sequencing platforms.
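
How several copies of the same repeating unit yield one accurate sequence can be sketched with a strict consensus call. This is an illustrative simplification, not the patent's algorithm: it assumes the read has already been split into equal-length, aligned copies of the repeating unit, and it masks any position where the copies disagree.

```python
def consensus(units):
    """Call a base only where every copy of the repeating unit agrees;
    positions with disagreement are reported as 'N'."""
    if len(units) < 2:
        raise ValueError("at least two repeating units are required")
    if len({len(u) for u in units}) != 1:
        raise ValueError("units must be aligned to the same length")
    return "".join(
        column[0] if len(set(column)) == 1 else "N"
        for column in zip(*units)
    )


# Two copies of one unit; the second carries a sequencing error at position 3.
print(consensus(["ACGTTGCA", "ACTTTGCA"]))  # ACNTTGCA
```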
  • Fig. 1: Size distribution of the circles after single-strand circularization in Example 5 of the present invention.
  • One of the innovations of the present invention is that a short DNA molecule is ligated to a tag sequence (the total length of the two being less than half the sequencing length of the sequencer), and single-strand circularization followed by rolling circle replication then produces a co-directional alternating concatemer of the sequence to be tested and the tag sequence.
  • A sequencing library is then constructed from the concatemer and sequenced. Specifically, the following two schemes can be implemented.
  • In scheme 1, the DNA is randomly fragmented into pieces shorter than half the read length of the second-generation sequencer (the length after fragmentation plus the length of the tag sequence should be less than half of the read length), and the tag sequence is then ligated. The 5' end of the first strand (plus strand) of the tag sequence is phosphorylated and its 3' end has a one-base T overhang; the 5' end of the second strand (minus strand) is not phosphorylated and its 3' end has a one-base G overhang.
  • the tag strand at the nick is removed, forming DNA that carries a single-stranded tag sequence, which is then denatured at high temperature and cooled immediately to make the DNA single-stranded.
  • the single-stranded DNA containing the tag sequence is circularized with a single-strand circularization enzyme.
  • the circularized DNA is amplified by rolling-circle strand displacement amplification with random primers, producing a large amount of product from the circularized DNA molecule.
  • the amplification product is a co-directional alternating concatemer consisting of the DNA molecule of interest and the tag sequence.
  • the co-directional alternating concatemer can be used to construct a standard second-generation sequencing library (the insert size during library construction should be greater than the sequencing length of the sequencer to ensure that the multiple repeat units obtained are independent of each other).
  • In scheme 2, the DNA is randomly fragmented into pieces shorter than half the read length of the second-generation sequencer (the length after fragmentation plus the length of the tag sequence to be ligated should also be less than half of the read length), and the specific tag sequence is then ligated.
  • the single-stranded DNA containing the tag sequence is circularized with a single-strand circularization enzyme.
  • the circularized DNA is subjected to rolling circle amplification by a DNA polymerase with strand displacement activity (such as Phi29 DNA polymerase), using the second strand (i.e., the minus strand) of the tag sequence as the primer.
  • the first strand of the tag sequence (i.e., the plus strand) is then used as a primer to convert the single-stranded linear DNA produced by rolling circle amplification into double-stranded DNA.
  • the double-stranded DNA consists of repeating units, each composed of a tag sequence and the DNA of interest.
  • After purification, this double-stranded DNA can be used to construct a standard second-generation sequencing library.
  • the insert size during library construction should be greater than the sequencing length of the sequencer to ensure that the multiple repeat units obtained are independent of each other.
  • Example 1: Construction of a co-directional alternating concatemer library of whole-genome DNA test sequences and tag sequences according to scheme 1 above (Illumina platform)
  • Fragmentation tube: Covaris microTUBE 6x16mm, catalog #520045
  • Electrophoresis power supply: Beijing Liuyi Instrument Factory, model DYY-7C
  • Electrophoresis tank: Beijing Liuyi Instrument Factory, model DYCP-31DN
  • the product was purified using a MinElute Reaction Cleanup Kit and eluted with 15 µl of double-distilled water.
  • UO-A was formed by mixing equal volumes of 100 pmol of UO-adaptor1 (in annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl) and 100 pmol of UO-adaptor2 (in annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl) and annealing them (94 °C for 5 min, then cooling to 25 °C at 0.1 °C per second).
  • the tag sequence includes, but is not limited to, the sequence forms of UO-adaptor1 and UO-adaptor2 in the examples. The same below.
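
As a quick arithmetic check on the annealing program quoted above (94 °C for 5 min, then cooling to 25 °C at 0.1 °C per second), the total run time can be worked out as follows. The function name is illustrative, and exact fractions are used only to avoid floating-point rounding.

```python
from fractions import Fraction


def ramp_seconds(start_c, end_c, rate_c_per_s: Fraction) -> Fraction:
    """Time needed to move between two temperatures at a constant ramp rate."""
    return abs(Fraction(start_c) - Fraction(end_c)) / rate_c_per_s


hold_s = 5 * 60                                   # 94 degC hold
cool_s = ramp_seconds(94, 25, Fraction(1, 10))    # 0.1 degC per second
total_min = (hold_s + cool_s) / 60

print(cool_s, float(total_min))  # 690 16.5
```

The cooling step therefore takes 690 s, and the whole annealing program about 16.5 min.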
  • the fragmented DNA was concentrated by evaporation at 37 °C to 4.2 µl.
  • Circularization was carried out at 65 °C for 2 h, followed by 80 °C for 10 min.
  • Exonuclease I (E. coli): 0.25 µl
  • the circularized product was amplified by rolling circle amplification using a whole-genome amplification (WGA) kit based on the MDA principle.
  • Beckman Coulter, Inc Agencourt AMPure XP, Item No. A63880
  • the product was purified using Agencourt AMPure XP (Beckman Coulter, Inc) magnetic beads. In brief: add 1.8 volumes of magnetic beads to the amplified product, incubate at room temperature for 5 min, place on the magnetic stand for 5 min, remove the supernatant, wash twice with 70% ethanol, air-dry, and elute with 50 µl of buffer AE (10 mM Tris-Cl, 0.5 mM EDTA; pH 9.0). See the kit instructions for details.
  • the purified product is the co-directional alternating concatemer of the sequence to be tested and the tag sequence.
  • the product was purified using a MinElute Reaction Cleanup Kit, eluting with 43 µl of ddH2O.
  • the product was purified using a MinElute Reaction Cleanup Kit, eluting with 35.5 µl of ddH2O.
  • Annealing of the linker sequence: equal volumes of 100 pmol of Multiplexing Adapter 1.0 (in annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl) and Multiplexing Adapter 2.0 (in annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl) were mixed, held at 94 °C for 5 min, then cooled to 25 °C at 0.1 °C per second. Upon completion of the annealing, a linker sequence having a concentration of 50 pmol was formed.
  • the eluted DNA is the constructed library, which can be used for sequencing on a second-generation sequencing platform.
  • the primer sequences are as follows:
  • Example 2: Construction of a co-directional alternating concatemer library of human exon sequences and tag sequences according to scheme 1 above (Illumina sequencing platform)
  • End-repaired DNA from step 2): 65 µl
  • the product was purified using a MinElute Reaction Cleanup Kit and eluted with 15 µl of ddH2O.
  • UO-A was formed by mixing equal volumes of 100 pmol of UO-adaptor1 (dissolved in annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl) and 100 pmol of UO-adaptor2 (in annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl) and annealing them (94 °C for 5 min, then cooling to 25 °C at 0.1 °C per second).
  • the DNA fragmented in step 3) was concentrated by evaporation at 42 °C to 4.2 µl.
  • Exonuclease I (E. coli): 0.25 µl
  • Exonuclease III (E. coli): 0.25 µl
  • the product was purified using Agencourt AMPure XP (Beckman Coulter, Inc) magnetic beads. In brief: add 1.8 volumes of magnetic beads to the amplified product, incubate at room temperature for 5 min, place on the magnetic stand for 5 min, remove the supernatant, wash twice with 70% ethanol, air-dry, and elute with 50 µl of buffer AE (10 mM Tris-Cl, 0.5 mM EDTA; pH 9.0). See the kit instructions for details.
  • the purified product is the co-directional alternating concatemer of the sequence to be tested and the tag sequence.
  • Commercial kits for constructing exon capture libraries can be used, such as the Agilent SureSelect Human All Exon Kits.
  • Fragmented DNA of step (1): 85 µl
  • the product was purified using a MinElute Reaction Cleanup Kit, eluting with 43 µl of ddH2O.
  • End-repaired DNA from step (2): 42 µl
  • the product was purified using a MinElute Reaction Cleanup Kit, eluting with 35.5 µl of ddH2O.
  • Annealing of the linker sequence: equal volumes of 100 pmol of Multiplexing Adapter 1.0 (in annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl) and Multiplexing Adapter 2.0 (in annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl) were mixed, held at 94 °C for 5 min, then cooled to 25 °C at 0.1 °C per second. Upon completion of the annealing, a linker sequence 1 having a concentration of 50 pmol was formed.
  • dNTPs (100 mM total; 25 mM each dNTP): 0.5 µl
  • the PCR products in the 4 reaction tubes were pooled and concentrated (MinElute Reaction Cleanup Kit) and eluted with 46 µl of ddH2O.
  • the primer sequences are as follows:
  • Magnetic beads (Invitrogen Dynabeads M-280 Streptavidin, Catalog#: 11205D) were used to capture the hybridized fragments (50 μl of beads were washed three times with 200 μl SureSelect Binding Buffer and resuspended in 200 μl SureSelect Binding Buffer; the hybridization product was added and left at room temperature for 30 min; the beads were collected on a magnet, washed once with SureSelect Wash 1 and three times with SureSelect Wash 2, and resuspended in 36.5 μl ddH2O); see the Agilent SureSelect Human All Exon Kits operating manual.
  • Beckman Coulter, Inc Agencourt AMPure XP, Item No. A63880
  • dNTPs (100 mM; 25 mM each dNTP): 0.5 μl
  • the primer sequences are as follows:
  • the eluted DNA is the constructed human exon library of same-direction alternating concatemers of the sequence to be tested and the tag sequence; the library can be used for sequencing on a second-generation sequencing platform.
  • Example 3: Constructing a library of same-direction alternating concatemers of peripheral blood cell-free DNA test sequences and tag sequences according to Scheme 1 (Illumina sequencing platform)
  • the product was purified using a MinElute Reaction Cleanup Kit and eluted with 15 μl of ddH2O.
  • UO-A is formed by mixing equal volumes of 100 pmol UO-adaptor1 and 100 pmol UO-adaptor2, each dissolved in annealing buffer (10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl), and annealing them (94 °C for 5 min, then cooling to 25 °C at 0.1 °C per second).
  • the extracted peripheral blood cell-free DNA was concentrated by evaporation at 37 °C to 4.2 μl.
  • Exonuclease I (E. coli): 0.25 μl
  • Exonuclease III (E. coli): 0.25 μl
  • the product was purified using Agencourt AMPure XP (Beckman Coulter, Inc) magnetic beads. In brief: add 1.8 volumes of magnetic beads to the amplified product, leave at room temperature for 5 min, place on a magnetic rack for 5 min, remove the supernatant, wash twice with 70% ethanol, air-dry, and elute with 50 μl of buffer AE (10 mM Tris-Cl, 0.5 mM EDTA; pH 9.0). See the kit instructions for details.
  • the purified product is the same-direction alternating concatemer of the sequence to be tested and the tag sequence.
  • commercial kits for constructing standard Illumina libraries can be used, such as TruSeq DNA Sample Preparation Kits and Nextera DNA Sample Preparation Kits.
  • the product was purified using a MinElute Reaction Cleanup Kit and eluted with 43 μl of ddH2O.
  • the product was purified using a MinElute Reaction Cleanup Kit and eluted with 35.5 μl of ddH2O.
  • Annealing of the adapter sequence: take equal volumes of 100 pmol Multiplexing Adapter 1.0 and 100 pmol Multiplexing Adapter 2.0, each dissolved in annealing buffer (10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl), incubate at 94 °C for 5 min, then cool to 25 °C at 0.1 °C per second. After annealing, adapter sequence 1 is obtained at a concentration of 50 pmol.
  • the eluted DNA is the constructed library, which can be used for sequencing on the second generation sequencing platform.
  • the primer sequences are as follows:
  • the product was purified using a MinElute Reaction Cleanup Kit and eluted with 15 μl of ddH2O.
  • UO-A is formed by mixing equal volumes of 100 pmol UO-adaptor1 and 100 pmol UO-adaptor2, each dissolved in annealing buffer (10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl), and annealing them (94 °C for 5 min, then cooling to 25 °C at 0.1 °C per second).
  • the fragmented DNA was concentrated by evaporation at 37 °C to 4.2 μl.
  • Exonuclease I (E. coli): 0.25 μl
  • Exonuclease III (E. coli): 0.25 μl
  • Beckman Coulter, Inc Agencourt AMPure XP, Item No. A63880
  • the product was purified by Agencourt AMPure XP magnetic beads.
  • in brief: 1.8 volumes of magnetic beads were added to the amplified product, which was left at room temperature for 5 min and placed on a magnetic rack for 5 min; the supernatant was removed, the beads were washed twice with 70% ethanol and air-dried, and the DNA was eluted with 20 μl of ddH2O. See the kit instructions for details.
  • the purified product is the same-direction tandem repeat of the DNA fragment.
  • the amount of DNA obtained after 8 h of rolling circle amplification can range from tens of nanograms to several hundred nanograms, and the yield can be increased by extending the rolling circle amplification time.
  • select an appropriate commercial kit to construct a standard Illumina library: if a few tens of nanograms of DNA are obtained, Nextera DNA Sample Preparation Kits or other low-input kits can be used; if several hundred nanograms are obtained, TruSeq DNA Sample Preparation Kits or other kits designed for larger starting amounts can be used.
  • the product was purified using a MinElute Reaction Cleanup Kit and eluted with 24 μl of ddH2O.
  • the eluted DNA is the constructed library, which can be used for sequencing on the second generation sequencing platform.
  • Epi_ME annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl
  • Epi_Adaptor 2 annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl
  • 5× LMW buffer: 50 mM Tris-OAc, pH 8.0, 25 mM Mg(OAc)2
  • the size of the circles formed is 30-162 bp; the mean is 72.5333 bp and the standard deviation is 14.06478 bp.
  • the median is: 71 bp.
  • the specific distribution is shown in Figure 1.
  • the single-base error rate of the method (10^-5) is much lower than the error rate of second-generation sequencing (1%), and far lower than that of some existing improved methods.
  • the method eliminates the error rate problem of second-generation sequencing and achieves ultra-accurate sequencing of DNA molecules on second-generation sequencing platforms.
  • Another advantage of this method is that sequencing accuracy is independent of sequencing depth. This removes the limitation of tagging methods, which can determine a DNA sequence accurately only at extremely high sequencing coverage, and so makes accurate sequencing of large genomes (such as the human genome) possible.
  • DNA of E. coli W3110 was sheared by ultrasonication into fragments of about 300 bp.
  • the 80-150 bp fragments were recovered, ligated to the tag sequence, converted to single strands, and subjected to rolling circle amplification.
  • a conventional second-generation sequencing library was constructed from the DNA after rolling circle amplification (see Example 1 for details).
  • the data processing analysis is as follows:
  • the size of the circles formed is 30-260 bp; the mean is 122.909 bp and the standard deviation is 17.74147 bp.
  • the median is 122 bp.
  • the sequencing error rate of each base is shown in Table 2.
  • the size of the circles formed (after removing the tag sequence) is 1-133 bp; the mean is 88.56275 bp and the standard deviation is 29.17562 bp.
  • the median is: 98 bp.
  • the sequencing error frequency of each base is shown in Table 3.
  • the method of the present invention can determine the molecular composition of DNA in cells with very high accuracy, and can reveal the DNA composition of a normal cell population or of a pathological one such as a cancer tissue.
  • In cancer detection, it can be used to test whether a tissue or organ of a normal individual has developed potentially carcinogenic mutations, for early detection and prevention of cancer.
  • this method can detect the distribution of DNA mutations in cancer cell populations; it can be used to discover potential small clonal populations in cancer tissues and so reveal the heterogeneous structure of tumors; it can help to explain the role of mutations in the occurrence and development of cancer; and it can be used to find cancer stem cells.
  • In cancer treatment, it can be used to find tumor stem cell populations, so that specific drug targets can be designed against cancer stem cells for effective treatment.
  • this method can be used to detect DNA mutations in the normal cells of an individual and thereby trace the growth pattern of normal tissues; to determine the number of DNA mutations in a tissue of individuals of different ages and thereby estimate the DNA mutation rate; and to test whether normal individuals carry mutations related to various diseases, for disease prevention.
  • the method can build libraries from cell-free DNA in peripheral blood and can detect the low-frequency mutation sites present in it.
  • This non-invasive approach allows the occurrence and development of cancer to be monitored and harmful fetal mutations to be evaluated in prenatal diagnosis.
  • the sequencing of ancient human DNA is the main means of studying human evolution, but determining ancient human DNA sequences raises many problems.
  • the biggest problems are that the extracted ancient human DNA is low in quantity, severely degraded, and heavily contaminated by microbial DNA.
  • the method can construct a library from a very small amount of DNA (single- or double-stranded), and the constructed library supports exon capture (which removes microbial genome contamination), so it addresses these problems in ancient DNA library construction.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Genetics & Genomics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Immunology (AREA)
  • Analytical Chemistry (AREA)
  • Biomedical Technology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Plant Pathology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

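The present invention provides a sequencing library and a method for preparing it, in which the insert fragments of the library are same-direction alternating concatemers of the sequence to be tested and a tag sequence; a sequencing method is also provided. At any sequencing depth, the sequencing library and sequencing method effectively remove DNA amplification errors and sequencing errors, so that mutations present on DNA molecules can be detected accurately. They are suitable for constructing sequencing libraries from trace amounts of short DNA fragments and even single-stranded DNA.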

Description

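Sequencing library and preparation and application thereof
Technical Field
The present invention relates to a sequencing library and to its preparation and application.
Background Art
The development of second-generation sequencing technology has driven revolutionary progress in biology and biomedical research. However, owing to the nature of high-throughput sequencing, about 1% of the bases in the sequences obtained are erroneous. Although a 1% error rate is tolerable in some applications, in many cases this 1% of errors masks a great deal of real information and becomes an obstacle to research. Examples include: detecting whether a tissue or organ of a normal individual carries potential carcinogenic mutation sites; detecting the heterogeneity of DNA composition in a cancer cell population and hidden small clonal populations; using the DNA mutations in each cell as markers to trace the cell's origin and division pattern; accurately obtaining genotypes in a highly heterozygous cancer population; calculating the rate at which mutations arise when cancer cells or somatic cells divide; and finding pathogenic mutations present in small populations (such as cancer stem cells) that matter for biomedical treatment. How to determine DNA sequences accurately with existing second-generation sequencing technology has therefore become a key problem.
To date, several methods have attempted to reduce second-generation sequencing errors from the biological and chemical side. Amplification-free library construction, for example, effectively avoids the errors introduced by polymerase chain reaction (PCR) amplification during library preparation. Adding separate tags to the sample DNA and to a reference DNA allows strand-specific errors to be filtered out effectively. Other methods reduce the error rate of second-generation sequencing from the data-analysis side. Still others correct PCR amplification errors by using the breakpoint information produced when DNA is randomly fragmented, or by adding tags to the DNA templates before PCR amplification. With tags added, it is possible to determine which DNA molecules derive from the same original molecule, and thus to correct errors.
These methods improve the accuracy of second-generation sequencing to some extent, but each has its shortcomings. For example, in the article by Kinde and colleagues (Kinde I, Wu J, Papadopoulos N, Kinzler KW, Vogelstein B (2011) Detection and quantification of rare mutations with massively parallel sequencing. Proc Natl Acad Sci USA 108:9530-9535), the tag is placed at the end of a specific primer and introduced into the DNA molecule by PCR. If an error occurs in the PCR that adds the tag, it is very difficult to remove in later steps, which limits the detection of very low-frequency sites. A major limitation of exogenous DNA tagging is that it can only be applied to small genomes or a small number of target genes and cannot provide comprehensive detection across an entire genome: the tagging approach requires that identical and complementary tags both be sequenced before the plus and minus DNA strands can correct each other, which demands extremely high sequencing depth and is therefore very difficult to achieve for large genomes.
Meanwhile, peripheral blood is easy to obtain, collecting it is not invasive, and the mutation information it contains reflects, to some extent, the true mutation information within the individual. Detecting the mutation information carried by cell-free DNA in peripheral blood has therefore been widely applied in prenatal diagnosis and cancer monitoring. However, cell-free DNA in peripheral blood is degraded to 140-170 base pairs, and 1 ml of blood contains only a few thousand copies. How to build an effective DNA library from so little DNA, and how to detect very low-frequency mutations in peripheral blood cell-free DNA with limited sequencing coverage, are problems in urgent need of a solution.
The vast majority of DNA from ancient fossils is contaminated by microorganisms, and the DNA is scarce and severely degraded. How to construct second-generation high-throughput sequencing libraries effectively from very small amounts of severely degraded ancient DNA, and how to enrich ancient human DNA effectively, is likewise a difficult problem in the study of ancient human DNA.
In summary, there is a clear need for a DNA sequencing library that enables fast, effective, and accurate sequencing.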
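Summary of the Invention
To solve the problem that the accuracy of DNA sequencing in the prior art cannot meet practical needs, the present invention provides a sequencing library and its preparation and application.
A first aspect of the present invention relates to a sequencing library, characterized in that the insert fragments of the sequencing library comprise same-direction alternating concatemers of the sequence to be tested and a tag sequence.
In the present invention, the tag sequence may be ligated to the 5' end or the 3' end of the sequence to be tested.
In an embodiment of the present invention, the tag sequence is ligated to the 5' end of the sequence to be tested.
The sequencing library according to any item of the first aspect of the present invention, characterized in that the combined length of each sequence to be tested and its tag sequence is less than half the sequencing length of the sequencer.
The sequencing library according to any item of the first aspect of the present invention, characterized in that the length of the same-direction alternating concatemer is greater than the sequencing length of the sequencer.
In an embodiment of the present invention, the same-direction alternating concatemer comprises at least two repeat units, each repeat unit comprising one sequence to be tested and one tag sequence.
The sequencing library according to any item of the first aspect of the present invention, characterized in that the tag sequence comprises 4-20 (for example 6-13) contiguous defined bases and 0-18 (for example 0-13) contiguous random bases.
The sequencing library according to any item of the first aspect of the present invention, characterized in that the defined bases and random bases are arranged sequentially (in either order) or in a mosaic arrangement.
The sequencing library according to any item of the first aspect of the present invention, wherein the sequencing library is used for second-generation or third-generation sequencing.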
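A second aspect of the present invention relates to a method for preparing a sequencing library, the method comprising:
(1) ligating the sequence to be tested to the tag sequence to obtain a double-stranded or single-stranded ligated sequence;
(2) when the ligated sequence obtained in step (1) is double-stranded, converting it to single strands and then circularizing it; when the ligated sequence obtained in step (1) is single-stranded, circularizing it directly;
(3) subjecting the circularized ligated sequence obtained in step (2) to DNA amplification based on a strand displacement reaction to obtain same-direction alternating concatemers of the sequence to be tested and the tag sequence;
(4) fragmenting the same-direction alternating concatemers of the sequence to be tested and the tag sequence, and ligating sequencing adapters to both ends of the fragments to obtain the sequencing library.
The method according to any item of the second aspect of the present invention, wherein the combined length of each sequence to be tested and its tag sequence is less than half the sequencing length of the sequencer.
In the present invention, the tag sequence may be ligated to the 5' end or the 3' end of the sequence to be tested.
In an embodiment of the present invention, the tag sequence is ligated to the 5' end of the sequence to be tested.
The method according to any item of the second aspect of the present invention, wherein the length of the same-direction alternating concatemer after the fragmentation of step (4) is greater than the sequencing length of the sequencer.
In an embodiment of the present invention, the same-direction alternating concatemer comprises at least two repeat units, each repeat unit comprising one sequence to be tested and one tag sequence.
The method according to any item of the second aspect of the present invention, wherein the tag sequence comprises 4-20 (for example 6-13) contiguous defined bases and 0-18 (for example 0-13) contiguous random bases.
The method according to any item of the second aspect of the present invention, wherein the defined bases and random bases are arranged sequentially (in either order) or in a mosaic arrangement.
The method according to any item of the second aspect of the present invention, wherein the sequencing library is used for second-generation or third-generation sequencing.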
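A third aspect of the present invention relates to a sequencing method comprising the step of using the sequencing library of any item of the first aspect of the present invention.
A fourth aspect of the present invention relates to a sequencing method comprising the step of preparing a sequencing library, the method of preparing the sequencing library comprising:
(1) ligating the sequence to be tested to the tag sequence to obtain a double-stranded or single-stranded ligated sequence;
(2) when the ligated sequence obtained in step (1) is double-stranded, converting it to single strands and then circularizing it; when the ligated sequence obtained in step (1) is single-stranded, circularizing it directly;
(3) subjecting the circularized ligated sequence obtained in step (2) to DNA amplification based on a strand displacement reaction to obtain same-direction alternating concatemers of the sequence to be tested and the tag sequence, thereby preparing the sequencing library;
(4) fragmenting the same-direction alternating concatemers of the sequence to be tested and the tag sequence, and ligating sequencing adapters to both ends of the fragments to obtain the sequencing library.
The sequencing method according to any item of the fourth aspect of the present invention, wherein the combined length of each sequence to be tested and its tag sequence is less than half the sequencing length of the sequencer.
In the present invention, the tag sequence may be ligated to the 5' end or the 3' end of the sequence to be tested.
In an embodiment of the present invention, the tag sequence is ligated to the 5' end of the sequence to be tested.
The sequencing method according to any item of the fourth aspect of the present invention, wherein the length of the same-direction alternating concatemer after the fragmentation of step (4) is greater than the sequencing length of the sequencer.
In an embodiment of the present invention, the same-direction alternating concatemer comprises at least two repeat units, each repeat unit comprising one sequence to be tested and one tag sequence.
The sequencing method according to any item of the fourth aspect of the present invention, wherein the tag sequence comprises 4-20 (for example 6-13) contiguous defined bases and 0-18 (for example 0-13) contiguous random bases.
The sequencing method according to any item of the fourth aspect of the present invention, wherein the defined bases and random bases are arranged sequentially (in either order) or in a mosaic arrangement.
The sequencing method according to any item of the fourth aspect of the present invention, the sequencing method being a second-generation or third-generation sequencing method.
The present invention also relates to the use in sequencing of the sequencing library of any item of the first aspect of the present invention.
The use according to any item of the present invention, wherein the sequencing is second-generation or third-generation sequencing.
The use according to any item of the present invention, wherein the sequencing includes but is not limited to genomic DNA sequencing, target-fragment capture sequencing (for example exon capture sequencing), sequencing of single-stranded DNA fragments, sequencing of fossil DNA, or sequencing of cell-free DNA in body fluids (for example blood, urine, saliva).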
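The sequencing length of the sequencer, as used in the present invention, means: for paired-end sequencing, the sum of the two read lengths; for single-end sequencing, the length of the single read.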
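The length constraint that follows from this definition can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names and the example read lengths, target length, and tag length are assumptions, not values from the patent.

```python
def sequencing_length(read_length: int, paired_end: bool) -> int:
    """Sequencing length as defined above: both reads for paired-end, one read for single-end."""
    return 2 * read_length if paired_end else read_length


def unit_fits(target_len: int, tag_len: int, read_length: int, paired_end: bool) -> bool:
    """True if one repeat unit (target + tag) is shorter than half the sequencing length."""
    return 2 * (target_len + tag_len) < sequencing_length(read_length, paired_end)


# Hypothetical numbers: a 2 x 100 bp paired-end run with a 12 bp tag.
print(unit_fits(80, 12, 100, paired_end=True))   # 92 bp unit against a 100 bp limit
print(unit_fits(120, 12, 100, paired_end=True))  # 132 bp unit against a 100 bp limit
```

A unit that is exactly half the sequencing length is rejected here, because the text requires the unit to be strictly shorter than half.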
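In one embodiment of the present invention, the tag sequence contains random bases. In embodiments of the present invention, the number of random bases may be, for example, 1-13, for example 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or 13.
In one embodiment of the present invention, the number of defined bases in the tag sequence is 6-13, for example 6, 7, 8, 9, 10, 11, 12, or 13.
In the present invention, the tag may be designed as two tag sequences or as a single tag sequence; when two tag sequences are designed, the two can anneal to form a duplex. In embodiments of the present invention, two tag sequences are designed.
In one embodiment of the present invention, the tag sequence is as shown in SEQ ID NO: 1 and/or SEQ ID NO: 2.
In one embodiment of the present invention, the tag sequence is as shown in SEQ ID NO: 14 and/or SEQ ID NO: 15.
In the present invention, a double-stranded sequence to be tested may be ligated to a double-stranded tag sequence, or a single-stranded sequence to be tested to a single-stranded tag sequence, giving a double-stranded or a single-stranded ligated sequence respectively. A double-stranded ligated sequence must be converted to single strands before circularization; a single-stranded ligated sequence can be circularized directly.
In embodiments of the present invention, when a double-stranded tag is designed, the two tag sequences can be annealed to give a double-stranded tag sequence; the 5' end of one strand must be phosphorylated so that it can be ligated to the sequence to be tested, while the 5' end of the other strand is not phosphorylated, so the final sequencing library contains only the phosphorylated tag strand. When a single-stranded tag is designed, its 5' end must be phosphorylated so that it can be ligated to the sequence to be tested.
In embodiments of the present invention, to facilitate ligation, the fragmented sequence to be tested is end-repaired and A-tailed to give a sequence with an overhanging "A".
In embodiments of the present invention, one of the tag sequences has an overhanging "T" at its 5' end, to facilitate ligation to the sequence to be tested carrying the overhanging "A".
In embodiments of the present invention, the other tag sequence has one or several arbitrary bases overhanging at its 3' end to ensure the directionality of ligation. In the present invention, the defined bases and random bases of the tag sequence are arranged sequentially (in either order) or in a mosaic arrangement, that is, with random bases interspersed among the defined bases.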
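The tag layouts described above (a defined block plus a run of random bases in either order, or a mosaic) can be written as templates in which `N` marks a random position. The following is a minimal sketch under stated assumptions: the defined sequence `GACCAATC` and all function names are hypothetical and do not come from the patent's SEQ ID entries.

```python
import random
import re


def tag_template(defined: str, n_random: int, random_first: bool = False) -> str:
    """Sequential layout: a defined block and a run of N placeholders, in either order."""
    if not 4 <= len(defined) <= 20:
        raise ValueError("defined region should be 4-20 bases")
    if not 0 <= n_random <= 18:
        raise ValueError("random region should be 0-18 bases")
    n_block = "N" * n_random
    return n_block + defined if random_first else defined + n_block


def instantiate(template: str, rng: random.Random) -> str:
    """Replace every N with a randomly drawn base, mimicking degenerate oligo synthesis."""
    return "".join(rng.choice("ACGT") if base == "N" else base for base in template)


def template_regex(template: str) -> "re.Pattern[str]":
    """Regular expression matching any concrete instance of the template."""
    return re.compile(template.replace("N", "[ACGT]"))


rng = random.Random(0)
sequential = tag_template("GACCAATC", 4)   # defined block followed by four random bases
mosaic = "GACNNCAATC"                      # hand-written mosaic layout
for template in (sequential, mosaic):
    tag = instantiate(template, rng)
    print(template, tag, bool(template_regex(template).fullmatch(tag)))
```

Because `instantiate` and `template_regex` work on any string containing `N`, the same two helpers cover sequential and mosaic layouts.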
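In the present invention, the tag sequence should be designed so as to avoid, as far as possible, forming a palindromic sequence with itself, which would prevent correct ligation to the sequence to be tested. Methods for designing tag sequences that avoid palindromic structures are well known in the art, for example avoiding reverse-complementary sequences in the design as far as possible.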
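One simple way to screen a candidate tag for palindromic structure is to compare it with its own reverse complement. The sketch below is an illustration rather than the patent's design procedure; the example sequences and the function names are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """A DNA palindrome is identical to its own reverse complement."""
    return seq == reverse_complement(seq)


def longest_self_complementary_stretch(seq: str) -> int:
    """Length of the longest substring whose exact copy also occurs in the reverse complement."""
    rc = reverse_complement(seq)
    best = 0
    for i in range(len(seq)):
        for j in range(i + best + 1, len(seq) + 1):
            if seq[i:j] in rc:
                best = j - i
            else:
                break  # longer substrings starting at i cannot match either
    return best


for candidate in ("GAATTC", "ACGTCA", "AACCAA"):
    print(candidate, is_palindromic(candidate), longest_self_complementary_stretch(candidate))
```

A full palindrome such as `GAATTC` is flagged directly, while the stretch length also exposes partial self-complementarity (for example the embedded `ACGT` in `ACGTCA`).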
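In the present invention, so as not to affect the accuracy of the sequencing results and to avoid direct complementary binding between the tag sequence and the sequence to be tested, excessively high identity between the tag sequence and the reference sequence of the sequence to be tested should be avoided in the design as far as possible. The preferred reference sequence is a known genomic DNA reference sequence of the same species as the sequence to be tested; if no reference sequence is known for that species, a known genomic DNA reference sequence of a closely related species may be chosen. Methods for avoiding excessive identity are well known in the art; for example, the identity between the tag sequence and the reference sequence may be kept below 90%, for example below 85%, 80%, 75%, 70%, 65%, 60%, 55%, or 50%.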
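As a rough illustration of the identity criterion, the sketch below slides a tag along a reference sequence and its reverse complement and reports the best ungapped identity. This is a simplification chosen for brevity: the patent does not specify how identity is computed, real screening against a genome would normally use an alignment tool, and the sequences and function names here are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def best_ungapped_identity(tag: str, reference: str) -> float:
    """Highest fraction of matching bases over all ungapped placements on both strands."""
    k = len(tag)
    best = 0
    for strand in (reference, reverse_complement(reference)):
        for start in range(len(strand) - k + 1):
            matches = sum(a == b for a, b in zip(tag, strand[start:start + k]))
            best = max(best, matches)
    return best / k if k else 0.0


def tag_is_acceptable(tag: str, reference: str, max_identity: float = 0.90) -> bool:
    """Accept the tag only if its best identity stays below the chosen threshold."""
    return best_ungapped_identity(tag, reference) < max_identity


reference = "TTTTACGTACTTTT"   # toy reference, not a real genome
print(best_ungapped_identity("ACGTAC", reference), tag_is_acceptable("ACGTAC", reference))
print(best_ungapped_identity("ACGTAC", "GGGGGGGGGG"), tag_is_acceptable("ACGTAC", "GGGGGGGGGG"))
```

The 0.90 default mirrors the upper example threshold in the text; a stricter value can be passed through `max_identity`.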
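In the present invention, the sequencing library refers to a collection of DNA fragments for sequencing that contain the sequence to be tested and other sequences (for example sequencing adapters).
In the present invention, the insert fragment of the sequencing library refers to the fragment comprising the sequence to be tested and the tag sequence that remains after other sequences such as sequencing adapters are excluded.
In the present invention, the sequence to be tested refers to a DNA fragment to be tested after processing; the processing includes, for example, fragmentation, end repair, and A-tailing.
In embodiments of the present invention, the sequence to be tested refers to the sequence used for sequencing that is obtained after the genomic DNA to be tested has been fragmented, end-repaired, and A-tailed.
In the present invention, the same-direction alternating concatemer formed by the sequence to be tested and the tag sequence in the insert fragment of the sequencing library comprises two or more repeat units (one sequence to be tested plus one tag sequence being one repeat unit). For example, if the sequence to be tested is A and the tag sequence is B, one repeat unit is A-B or B-A, and the concatemer comprises at least A-B-A-B or B-A-B-A. Because construction of the sequencing library involves a random fragmentation step, the repeat units of a given concatemer may not all be complete, but pieced together they amount to at least two repeat units, for example 1/2A-B-A-B-A-B, or A-B-A-B-A-1/2B, or 1/2A-B-A-B-2/3A.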
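The A-B-A-B picture can be made concrete with a toy concatemer cut at arbitrary positions. The following sketch uses hypothetical sequences and names; it locates the tag copies in a fragment and reports how many repeat units the fragment spans in total.

```python
from typing import List


def find_all(text: str, pattern: str) -> List[int]:
    """Start positions of every (possibly overlapping) exact occurrence of pattern."""
    hits = []
    pos = text.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1)
    return hits


def units_covered(fragment_len: int, unit_len: int) -> float:
    """Number of repeat units spanned by a fragment, counting partial units fractionally."""
    return fragment_len / unit_len


TAG = "GACCAATC"                    # hypothetical tag (B)
TARGET = "TTGACGGATTACAGGCTCAA"     # hypothetical sequence to be tested (A)
concatemer = (TAG + TARGET) * 4     # B-A-B-A-B-A-B-A, tag at the 5' end of each target
fragment = concatemer[10:80]        # random shearing leaves partial units at both ends

print(find_all(fragment, TAG))
print(units_covered(len(fragment), len(TAG) + len(TARGET)))
```

Here the fragment starts and ends inside a target copy, the situation the text writes as a fractional A at each end.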
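In the present invention, DNA amplification based on a strand displacement reaction (Roger S. Lasken, Genomic DNA amplification by the multiple displacement amplification (MDA) method. Biochemical Society Transactions, 2009, 37, 450-453) refers to isothermal DNA amplification in which certain DNA polymerases (for example Phi29 DNA polymerase and Bst DNA polymerase (large fragment)), on encountering a downstream DNA strand while extending a new strand, continue the extension reaction and at the same time peel off the downstream strand of the duplex, producing free single strands. DNA amplification based on a strand displacement reaction generally requires no thermal denaturation. It includes, for example, strand displacement amplification, rolling circle amplification, multiple displacement amplification, and loop-mediated amplification.
In one embodiment of the present invention, multiple displacement amplification (MDA) is used; it is an isothermal DNA amplification method that uses the strand displacement activity of phi29 DNA polymerase to amplify DNA in large quantities.
In another embodiment of the present invention, rolling circle amplification is used: circular DNA serves as the template and, with specific primers or random primers and a strand-displacing enzyme, the circular DNA template is amplified in large quantities. Once a random primer has bound to the single-stranded circular DNA, phi29 DNA polymerase synthesizes the second strand around the circle; when synthesis reaches the primer's starting position, the polymerase uses its strand displacement activity to open the strand carrying the primer, and new synthesis continues. The newly synthesized single-stranded DNA in turn binds new random hexamer primers, starting a further round of synthesis. This cycle repeats, achieving effective amplification of the circular DNA molecule.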
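The first strand made by rolling circle amplification can be modeled as a polymerase reading the circular template over and over. The sketch below is a toy model that ignores priming details and second-strand synthesis; the sequences and the function name are assumptions.

```python
from itertools import cycle, islice

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def rolling_circle_product(circle: str, primer_offset: int, product_len: int) -> str:
    """First-strand product (5'->3'): tandem copies of the circle's reverse complement.

    primer_offset rotates the start point, standing in for where the primer annealed.
    """
    rc = reverse_complement(circle)
    rotated = rc[primer_offset:] + rc[:primer_offset]
    return "".join(islice(cycle(rotated), product_len))


TAG = "GACCAATC"                  # hypothetical tag
TARGET = "TTGACGGATTACAGGCTCAA"   # hypothetical sequence to be tested
circle = TAG + TARGET             # circularized single strand, written from the tag's 5' end
product = rolling_circle_product(circle, primer_offset=5, product_len=3 * len(circle))
print(product)
```

The product repeats with a period equal to the circle length, which is what makes each repeat unit an independent copy of the same original molecule.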
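In the present invention, the second-generation sequencing method refers to sequencing by synthesis, that is, determining the DNA sequence by capturing the labels of newly synthesized ends; it includes but is not limited to Roche/454 FLX, Illumina/Solexa Genome Analyzer, and the Applied Biosystems SOLiD system.
In the present invention, the third-generation sequencing method refers to single-molecule sequencing technology, in which each DNA molecule is sequenced individually without PCR amplification. It includes but is not limited to single-molecule fluorescence sequencing, represented by the SMS technology of Helicos (USA) and the SMRT technology of Pacific Biosciences (USA), and nanopore sequencing.
In the present invention, for ease of distinction, the tag used to prepare the same-direction alternating concatemer is called the "tag sequence", and the tag used for sequencing is called the "sequencing adapter".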
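The sequencing library provided by the present invention and its application achieve at least the following beneficial effects:
1. At any sequencing depth, DNA amplification errors and sequencing errors can be removed effectively, allowing ultra-accurate detection of the mutations present on DNA molecules.
A tag sequence is ligated to the 5' end of the small DNA fragment to be sequenced (the total length being less than half the sequencing length); the chimera is then denatured to give single-stranded DNA in which the sequence to be tested is joined to the tag sequence; this is circularized as a single strand; and the circularized single-stranded DNA is copied by rolling circle replication to build a same-direction alternating concatemer of the sequence to be tested and the tag sequence. The repeat units produced by rolling circle replication are independent of one another during amplification, so the errors introduced when each unit is copied are independent as well. A sequencing library is constructed from the concatemers (with a library insert size of at least two repeat units). A single sequencing pass over the library then reads the same-direction repeat unit at least twice. The sequences read from the two repeat units are checked against each other: bases that disagree between the two units are PCR errors arising during library preparation or sequencing errors arising during sequencing, and the concordant sequence is the original sequence. Because the sequenced repeat units derive from circular DNA, the tag sequence is needed to determine where the sequence to be tested begins.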
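The mutual confirmation of repeat units can be sketched as follows: find the tag copies in a read, take the sequence between consecutive tags as one copy of the target, and keep only bases on which the copies agree. This is a simplified illustration, not the patent's analysis pipeline: it assumes exact tag matches and equal-length copies, and the sequences are hypothetical.

```python
from typing import List, Optional


def split_units(read: str, tag: str) -> List[str]:
    """Return the target sequences lying between consecutive exact tag matches."""
    starts = []
    pos = read.find(tag)
    while pos != -1:
        starts.append(pos)
        pos = read.find(tag, pos + len(tag))
    return [read[a + len(tag):b] for a, b in zip(starts, starts[1:])]


def consensus(units: List[str]) -> Optional[str]:
    """Keep a base where every copy agrees; mark disagreements as 'N'."""
    if len(units) < 2 or len({len(u) for u in units}) != 1:
        return None  # at least two copies of equal length are required
    return "".join(col[0] if len(set(col)) == 1 else "N" for col in zip(*units))


TAG = "GACCAATC"          # hypothetical defined tag
copy_1 = "TTGACGGATTAC"   # first copy of a hypothetical target
copy_2 = "TTGACTGATTAC"   # second copy carrying one discordant base
read = "CA" + TAG + copy_1 + TAG + copy_2 + TAG + "TT"

print(split_units(read, TAG))
print(consensus(split_units(read, TAG)))
```

The discordant position is masked rather than guessed, matching the idea that only bases confirmed by both units are trusted; real reads would additionally need error-tolerant tag matching and handling of partial units at the read ends.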
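Once a single DNA strand and its complementary strand have been amplified, it is no longer possible to tell which strand a newly copied DNA came from, which hampers identification of the type of base error. For example, a C-to-T change and a G-to-A change are complementary on double-stranded DNA; without a marker on the sequenced read, one cannot tell whether a C-to-T or a G-to-A change occurred. Because the tag sequence is non-palindromic and ligated to the 5' end of the single-stranded DNA, the original single strand can still be identified from the orientation of the tag sequence after replication and amplification. The type of error can thus be identified, which in turn helps to identify low-frequency mutations.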
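Using tag orientation to recover the original strand can be illustrated in a few lines. The sketch assumes exact tag matches and a hypothetical tag sequence; it returns the read re-oriented to the original tagged strand, or `None` when the orientation is ambiguous.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def orient_to_original(read: str, tag: str) -> Optional[str]:
    """Return the read in the orientation of the original tagged strand, if it can be decided."""
    has_forward = tag in read
    has_reverse = reverse_complement(tag) in read
    if has_forward and not has_reverse:
        return read
    if has_reverse and not has_forward:
        return reverse_complement(read)
    return None  # tag absent, or present in both orientations (e.g. a palindromic tag)


TAG = "GACCAATC"                           # hypothetical non-palindromic tag
original = "AAAA" + TAG + "CCCC"
print(orient_to_original(original, TAG))
print(orient_to_original(reverse_complement(original), TAG))
print(orient_to_original("AAGAATTCCC", "GAATTC"))  # palindromic tag: orientation is ambiguous
```

The last call shows why the text requires a non-palindromic tag: a palindromic tag reads the same on both strands and carries no orientation information.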
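Because DNA amplification is uneven, when a small amount of DNA is amplified to a quantity sufficient for sequencing, some DNA will reach copy numbers well above the mean. In the present invention this appears as follows: several sequencing reads obtained from rolling circle replication of one original single-stranded DNA all reflect the information of the same original DNA, so the sequencing is redundant. In subsequent data processing, however, nothing indicates whether such reads derive from the same original single-stranded circle, so they may be counted several times. This produces an error-amplifying effect: a single strand carrying DNA damage, once copied by rolling circle replication, is present in several reads and is counted as a credible DNA variant observed independently several times. Recognizing this redundancy helps to exclude such errors.
In embodiments of the present invention, the tag sequence may comprise two parts: an adapter region of known base composition and a free region of random bases. The adapter region is 6 to 13 contiguous bases and the free region is 0 to 13 contiguous bases. Note in particular that the base composition of the free region is random; it is specified as "N" (random bases) of the corresponding length when the oligonucleotide is synthesized. The longer the free region, the higher the resolution. If the free region has length zero, reads of different original sources can be distinguished only by 1) differences in the size of the target DNA fragment inferred from the reads and 2) differences in the sequence composition of the inferred target DNA fragment.
The principle of the invention is illustrated below using a sequencing error rate of 1/100 (the error rate of second-generation sequencing is 1/100 to 1/1000). The probability that the same type of error occurs at the same site in both repeat units of one consensus sequence is 1/3 × (1/100)^2, an error rate of about 3 × 10^-5 (with more repeat units the error probability of a concordant base is lower still). The probability that two different consensus sequences show the same error is then (1/3 × (1/100)^2)^2, about 1 × 10^-9. The method therefore removes errors arising during library construction and sequencing very effectively and achieves accurate sequencing.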
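The arithmetic in this passage can be reproduced exactly with rational numbers. In the sketch below, the generalization to more than two repeat units (each additional unit contributing a factor of the error rate times 1/3) is an assumption extrapolated from the two-unit formula in the text, and the function names are illustrative.

```python
from fractions import Fraction


def concordant_error(per_base_error: Fraction, n_units: int) -> Fraction:
    """Probability that n independent copies all show the same wrong base at one site.

    Every copy must err (e each), and each copy after the first must pick the same
    one of the three possible wrong bases (1/3 each).
    """
    return per_base_error ** n_units * Fraction(1, 3) ** (n_units - 1)


def distinct_random_tags(n_random: int) -> int:
    """Number of different free-region sequences available with n random bases."""
    return 4 ** n_random


e = Fraction(1, 100)
two_units = concordant_error(e, 2)   # one consensus built from two repeat units
two_consensus = two_units ** 2       # the same error in two independent consensus sequences

print(two_units, float(two_units))
print(two_consensus, float(two_consensus))
print(distinct_random_tags(13))
```

The count of distinct random tags illustrates why a longer free region gives higher resolution when separating reads from different original molecules.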
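2. Suitable for constructing sequencing libraries from trace amounts of short DNA fragments and even single-stranded DNA.
Single-strand circularization requires little starting DNA (nanogram level or even less) and short fragments (30-200 base pairs), and amplification after circularization is efficient. It is therefore particularly suitable for constructing sequencing libraries from severely degraded DNA such as peripheral blood cell-free DNA and ancient fossil DNA.
3. Compatible with methods such as target region capture (for example exon capture and target gene capture).
In the same-direction alternating concatemers of the sequence to be tested and the tag sequence provided by the present invention, the different copies replicated from an original DNA are joined in tandem. During target region capture, the molecules captured by the probes contain nucleic acid sequence of at least two same-direction repeat units, so the DNA sequence can be determined accurately.
4. The same-direction alternating concatemers constructed by this method can be used to build a variety of second-generation short-fragment sequencing libraries, making them suitable for various sequencing platforms.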
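Brief Description of the Drawings
Figure 1: Circle sizes and their distribution after single-strand circularization in Example 5 of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below with reference to examples, but those skilled in the art will understand that the following examples serve only to illustrate the invention and should not be regarded as limiting its scope. Where specific conditions are not indicated in an example, conventional conditions or those recommended by the manufacturer were used. Reagents or instruments whose manufacturer is not indicated are conventional, commercially available products.
One of the innovations of the present invention is that a tag sequence is ligated to short DNA molecules (the combined length being less than half the sequencing length of the sequencer), followed by single-strand circularization and rolling circle replication, to obtain same-direction alternating concatemers of the sequence to be tested and the tag sequence, from which a sequencing library is constructed and sequenced. Specifically, the following two schemes may be used.
Scheme 1:
First, DNA is randomly fragmented into pieces shorter than half the read length of the second-generation sequencer (the fragment length plus the tag length should be less than half the read length), and the tag sequence is then ligated. The first strand (plus strand) of the tag is phosphorylated at its 5' end and has a T overhang at its 3' end; the second strand (minus strand) is not phosphorylated at its 5' end and has a G overhang at its 3' end. High-temperature denaturation removes the tag strand at the nick, giving DNA that carries a single-stranded tag sequence; a further high-temperature denaturation followed by immediate cooling renders the DNA single-stranded. The single-stranded, tag-bearing DNA is circularized with a single-strand circularizing ligase. The circularized DNA is then amplified in large quantities by random-primer-based rolling circle strand displacement amplification. The amplification product is a same-direction alternating concatemer composed of the target DNA molecule and the tag sequence. This concatemer can be used to construct a standard second-generation sequencing library (during library construction the insert size should be greater than the sequencing length of the sequencer, to ensure that the multiple repeat units obtained are independent of one another).
Scheme 2:
First, DNA is randomly fragmented into pieces shorter than half the read length of the second-generation sequencer (the fragment length plus the length of the tag sequence to be ligated should also be less than half the read length), and the specific tag sequence is then ligated (as in Scheme 1). The single-stranded, tag-bearing DNA is circularized with a single-strand circularizing ligase. The circularized DNA is amplified by rolling circle amplification with a strand-displacing DNA polymerase (such as Phi29 DNA polymerase), using the second strand (minus strand) of the tag sequence as the primer. After amplification, the first strand (plus strand) of the tag sequence is used as a primer to convert the single-stranded linear rolling circle product into double-stranded DNA. This double-stranded DNA consists of repeat units made up of the tag sequence and the target DNA. After purification it can be used to construct a standard second-generation sequencing library. During library construction the insert size should be greater than the sequencing length of the sequencer, to ensure that the multiple repeat units obtained are independent of one another.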
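Example 1: Construction of a whole-genome library of same-direction alternating concatemers of the sequence to be tested and the tag sequence according to Scheme 1 (Illumina platform)
1) DNA fragmentation
Instruments and reagents:
Ultrasonicator: Covaris S2 Focused-ultrasonicator
Fragmentation tube: Covaris Microtube 6x16mm, Catalog#: 520045
Agarose: Promega, Agarose, LE, Analytical Grade, Catalog#: V3121
Electrophoresis power supply: Beijing Liuyi Instrument Factory, model DYY-7C
Electrophoresis tank: Beijing Liuyi Instrument Factory, model DYCP-31DN
QIAGEN MinElute Gel Extraction Kit (250), Catalog#: 28606
Takara 20bp DNA Ladder (Dye Plus), Takara Code 3420A
Using the ultrasonicator (Covaris S2 Focused-ultrasonicator), 1 μg of purified PhiX174 genomic DNA was sheared to 150-200 bp (Intensity: 5, Duty Cycle: 10%, Cycles per Burst: 200, Temperature: 4 °C, time: 60 s, number of cycles: 5) in a volume of 50 μl.
The DNA was run on a 4% agarose gel (80 V, 70 min; 1× TAE) and the 60-90 bp fragments (Takara 20bp DNA Ladder) were excised and recovered (QIAGEN MinElute Gel Extraction Kit). In brief: dissolve the gel in 6 volumes of buffer QG, add an equal volume of isopropanol, mix and load onto the column, wash with buffer QG, wash with buffer PE, air-dry, and elute with 56 μl of ddH2O. See the QIAGEN MinElute Gel Extraction Kit manual for details.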
2) End repair
Reagents: New England Biolabs: NEBNext Ultra DNA Library Prep Kit for Illumina, Catalog#: E7370S
Fragmented DNA: 55.5 μl
End Prep Enzyme Mix: 3 μl
End Repair Reaction Buffer (10×): 6.5 μl
Total: 65 μl
20 °C for 30 min, 65 °C for 30 min.
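Reaction tables like the one above are easy to check and scale programmatically. The sketch below sums the listed end-repair volumes and scales the reagent volumes for several reactions; the 10% pipetting overage and the function names are assumptions, not part of the protocol.

```python
from typing import Dict


def total_volume(mix: Dict[str, float]) -> float:
    """Sum of component volumes in microlitres, rounded to avoid floating-point noise."""
    return round(sum(mix.values()), 2)


def scale_reagents(reagents: Dict[str, float], n_reactions: int, overage: float = 0.10) -> Dict[str, float]:
    """Volumes for a master mix covering n reactions plus a fractional overage (assumed 10%)."""
    factor = n_reactions * (1 + overage)
    return {name: round(volume * factor, 2) for name, volume in reagents.items()}


end_repair = {
    "fragmented DNA": 55.5,
    "End Prep Enzyme Mix": 3.0,
    "End Repair Reaction Buffer (10x)": 6.5,
}
print(total_volume(end_repair))  # compare with the stated total for this step

# Only the shared reagents go into a master mix; each sample DNA is added separately.
reagents_only = {k: v for k, v in end_repair.items() if k != "fragmented DNA"}
print(scale_reagents(reagents_only, n_reactions=4))
```

The same `total_volume` check can be applied to any of the reaction tables in the examples to confirm that the listed components add up to the stated total.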
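3) A-tailing and tag sequence ligation
Reagents: New England Biolabs: NEBNext Ultra DNA Library Prep Kit for Illumina, Catalog#: E7370S
End-repaired DNA: 65 μl
Blunt/TA Ligase Master Mix: 15 μl
Ligation Enhancer: 1 μl
Tag sequence UO-A (50 pmol): 1 μl
ddH2O: 1.5 μl
Total: 83.5 μl
20 °C for 30 min, 65 °C for 10 min, then immediately on ice for 3 min.
The product was purified with the MinElute Reaction Cleanup Kit and eluted with 15 μl of double-distilled water.
Tag sequence: UO-A is formed by mixing equal volumes of 100 pmol UO-adaptor1 (dissolved in annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl) and 100 pmol UO-adaptor2 (dissolved in annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl) and annealing (94 °C for 5 min, then cooling to 25 °C at 0.1 °C per second).
Figure PCTCN2014093161-appb-000005
Note: the tag sequence includes but is not limited to the sequence forms of UO-adaptor1 and UO-adaptor2 in the examples. The same applies below.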
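For planning the annealing step on a thermal cycler, the duration of the slow cool-down follows directly from the stated temperatures and ramp rate. The short sketch below does that arithmetic; the function name is illustrative and the calculation ignores instrument overhead.

```python
def ramp_seconds(start_c: float, end_c: float, rate_c_per_s: float) -> float:
    """Time needed to move between two temperatures at a constant ramp rate."""
    if rate_c_per_s <= 0:
        raise ValueError("ramp rate must be positive")
    return abs(start_c - end_c) / rate_c_per_s


hold_s = 5 * 60                      # 94 °C for 5 min
ramp_s = ramp_seconds(94, 25, 0.1)   # cool from 94 °C to 25 °C at 0.1 °C per second
print(round(ramp_s), "s ramp,", round((hold_s + ramp_s) / 60, 1), "min in total")
```

Under these assumptions the whole annealing program takes well under 20 minutes, most of it spent in the ramp.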
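4) Single-strand circularization
Instruments and reagents:
PCR instrument: Eppendorf Mastercycler pro S
New England Biolabs: Exonuclease I (E. coli), Catalog#: M0293
New England Biolabs: Exonuclease III (E. coli), Catalog#: M0206
Epicentre: CircLigase II ssDNA Ligase, Catalog#: CL9025K
The fragmented DNA from above was concentrated by evaporation at 37 °C to 4.2 μl.
95 °C for 3 min (note: use a PCR instrument that can run 100 μl reactions; otherwise the 4.2 μl easily evaporates to dryness at 95 °C), then immediately on ice for 3 min.
Then add:
10× CircLigase buffer: 0.5 μl
10 mmol MnCl2: 0.25 μl
CircLigase (100 U/μl): 0.25 μl
Circularize at 65 °C for 2 h, then 80 °C for 10 min.
After circularization, digest linear and dimeric DNA:
Exonuclease I (E. coli): 0.25 μl
Exonuclease III (E. coli): 0.25 μl
37 °C for 1 h, 80 °C for 20 min.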
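5) Multiple displacement amplification (MDA) reaction
The circularized product was amplified by rolling circle amplification using a whole-genome amplification (WGA) kit based on the MDA principle.
Instruments and reagents:
PCR instrument: Eppendorf Mastercycler pro S
GE Healthcare: illustra GenomiPhi HY DNA Amplification Kits, Product code: 25-6600-20
Beckman Coulter, Inc: Agencourt AMPure XP, Item No. A63880
Circularized DNA from above: 2.5 μl
Sample buffer: 22.5 μl
95 °C for 3 min, then immediately on ice for 3 min.
Then add:
Reaction buffer: 22.5 μl
Enzyme mix: 2.5 μl
Total: 50 μl
30 °C for 1 h, 65 °C for 10 min.
The product was purified with Agencourt AMPure XP (Beckman Coulter, Inc) magnetic beads. In brief: add 1.8 volumes of magnetic beads to the amplified product, leave at room temperature for 5 min, place on a magnetic rack for 5 min, remove the supernatant, wash twice with 70% ethanol, air-dry, and elute with 50 μl of buffer AE (10 mM Tris-Cl, 0.5 mM EDTA; pH 9.0). See the kit instructions for details.
The purified product is the same-direction alternating concatemer of the sequence to be tested and the tag sequence.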
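6) Construction of an Illumina library from the same-direction alternating concatemers of the sequence to be tested and the tag sequence
Commercial kits for constructing standard Illumina libraries may be used, such as TruSeq DNA Sample Preparation Kits and Nextera DNA Sample Preparation Kits. The specific steps are as follows:
(1) Fragmentation of the concatemer DNA
Instruments and reagents:
1) Ultrasonicator: Covaris S2 Focused-ultrasonicator
2) Fragmentation tube: Covaris Microtube 6x16mm, Catalog#: 520045
3) Agarose: Promega, Agarose, LE, Analytical Grade, Catalog#: V3121
Using the ultrasonicator (Covaris S2 Focused-ultrasonicator), 2 μg of the purified same-direction tandem repeat DNA was sheared to 500-700 bp (Intensity: 3, Duty Cycle: 5%, Cycles per Burst: 200, Temperature: 4 °C, time: 15 s, number of cycles: 5) in a volume of 85 μl.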
(2) End repair
Reagents: New England Biolabs: NEBNext End Repair Module, Catalog#: E6050
QIAGEN: MinElute Reaction Cleanup Kit, Catalog#: 28206
Fragmented DNA: 85 μl
NEBNext End Repair Reaction Buffer: 10 μl
NEBNext End Repair Enzyme Mix: 5 μl
Total: 100 μl
20 °C for 30 min.
The product was purified with the MinElute Reaction Cleanup Kit and eluted with 43 μl of ddH2O.
(3) A-tailing
Reagents: New England Biolabs: NEBNext dA-Tailing Module, Catalog#: E6053
QIAGEN: MinElute Reaction Cleanup Kit, Catalog#: 28206
End-repaired DNA: 42 μl
NEBNext dA-Tailing Reaction Buffer: 5 μl
Klenow Fragment (3'→5' exo-): 3 μl
Total: 50 μl
37 °C for 30 min.
The product was purified with the MinElute Reaction Cleanup Kit and eluted with 35.5 μl of ddH2O.
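(4) Sequencing adapter ligation
Reagents: Invitrogen: T4 DNA Ligase, Catalog#: 15224-041
A-tailed DNA: 34.5 μl
Adapter sequence 1 (50 pmol): 3 μl
5× DNA ligase buffer: 10 μl
T4 DNA Ligase: 2.5 μl
Total: 50 μl
16 °C overnight (16 h).
The product was run on a 2% agarose gel (80 V, 80 min; 1× TAE); the 500-700 bp fragments were excised and recovered (QIAGEN MinElute Gel Extraction Kit) and eluted with 22 μl of ddH2O.
Adapter sequence 1:
Figure PCTCN2014093161-appb-000008
Adapter annealing: take equal volumes of 100 pmol Multiplexing Adapter 1.0 (dissolved in annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl) and Multiplexing Adapter 2.0 (dissolved in annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl), incubate at 94 °C for 5 min, then cool to 25 °C at 0.1 °C per second. After annealing, adapter sequence 1 is obtained at a concentration of 50 pmol.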
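(5) PCR amplification
Instruments and reagents:
PCR instrument: Eppendorf Mastercycler pro S
Thermo Scientific: Phusion High-Fidelity PCR Master Mix with HF Buffer, Catalog#: F531L
Recovered DNA from above (about 30 ng) + ddH2O: 23 μl
MP PCR primer 1.0 (10 pmol): 1 μl
MP index primer 1 (10 pmol): 1 μl
2× Phusion High-Fidelity PCR Master Mix: 25 μl
Total: 50 μl
PCR cycling conditions:
Initial denaturation at 98 °C for 45 s; 10 cycles of amplification (98 °C for 15 s, 65 °C for 30 s, 72 °C for 60 s); 72 °C for 5 min; cool to 4 °C.
The product was run on a 2% agarose gel (80 V, 80 min; 1× TAE); the 500-700 bp fragments were excised and recovered (QIAGEN MinElute Gel Extraction Kit) and eluted with 22 μl of ddH2O.
The eluted DNA is the finished library, which can be sequenced on a second-generation sequencing platform.
The primer sequences are as follows:
Figure PCTCN2014093161-appb-000009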
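Example 2: Construction of a human exon library of same-direction alternating concatemers of the sequence to be tested and the tag sequence according to Scheme 1 (Illumina sequencing platform)
1) DNA fragmentation
For instruments and reagents, see Example 1. Using the ultrasonicator, 1 μg of purified human peripheral blood genomic DNA was sheared to 300 bp (Intensity: 4, Duty Cycle: 10%, Cycles per Burst: 200, Temperature: 4 °C, time: 60 s, number of cycles: 2) in a volume of 50 μl.
The DNA was run on a 4% agarose gel (80 V, 70 min; 1× TAE) and the 80-130 bp fragments were excised and recovered. In brief: dissolve the gel in 6 volumes of buffer QG, add an equal volume of isopropanol, mix and load onto the column, wash with buffer QG, wash with buffer PE, air-dry, and elute with 56 μl of ddH2O. See the QIAGEN MinElute Gel Extraction Kit manual for details.
2) End repair
For reagents, see Example 1.
Fragmented DNA from step 1): 55.5 μl
End Prep Enzyme Mix: 3 μl
End Repair Reaction Buffer (10×): 6.5 μl
Total: 65 μl
20 °C for 30 min, 65 °C for 30 min.
3) A-tailing and tag sequence ligation
For reagents, see Example 1.
End-repaired DNA from step 2): 65 μl
Blunt/TA Ligase Master Mix: 15 μl
Ligation Enhancer: 1 μl
Tag sequence UO-A (50 pmol): 1 μl
ddH2O: 1.5 μl
Total: 83.5 μl
20 °C for 30 min, 65 °C for 10 min, then immediately on ice for 3 min.
The product was purified with the MinElute Reaction Cleanup Kit and eluted with 15 μl of ddH2O.
Tag sequence: UO-A is formed by mixing equal volumes of 100 pmol UO-adaptor1 (dissolved in annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl) and 100 pmol UO-adaptor2 (dissolved in annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl) and annealing (94 °C for 5 min, then cooling to 25 °C at 0.1 °C per second).
Figure PCTCN2014093161-appb-000010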
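4) Single-strand DNA circularization
For instruments and reagents, see Example 1.
The DNA from step 3) was concentrated by evaporation at 37 °C to 4.2 μl.
95 °C for 3 min (note: use a PCR instrument that can run 100 μl reactions; otherwise the 4.2 μl easily evaporates to dryness at 95 °C), then immediately on ice for 3 min.
Then add:
10× CircLigase buffer: 0.5 μl
10 mmol MnCl2: 0.25 μl
CircLigase (100 U/μl): 0.25 μl
65 °C for 2 h, 80 °C for 10 min.
After circularization, digest linear and dimeric DNA:
Exonuclease I (E. coli): 0.25 μl
Exonuclease III (E. coli): 0.25 μl
37 °C for 1 h, 80 °C for 20 min.
5) Multiple displacement amplification (MDA) reaction
The circularized product was amplified by rolling circle amplification using a whole-genome amplification (WGA) kit based on the MDA principle:
For instruments and reagents, see Example 1.
Circularized DNA from above: 2.5 μl
Sample buffer: 22.5 μl
95 °C for 3 min, then immediately on ice for 3 min.
Then add:
Reaction buffer: 22.5 μl
Enzyme mix: 2.5 μl
Total: 50 μl
30 °C for 1 h, 65 °C for 10 min.
The product was purified with Agencourt AMPure XP (Beckman Coulter, Inc) magnetic beads. In brief: add 1.8 volumes of magnetic beads to the amplified product, leave at room temperature for 5 min, place on a magnetic rack for 5 min, remove the supernatant, wash twice with 70% ethanol, air-dry, and elute with 50 μl of buffer AE (10 mM Tris-Cl, 0.5 mM EDTA; pH 9.0). See the kit instructions for details.
The purified product is the same-direction alternating concatemer of the sequence to be tested and the tag sequence.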
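6) Construction of an exon capture library from the concatemers produced above (Illumina sequencing platform)
Commercial kits for constructing exon capture libraries may be used, such as Agilent SureSelect Human All Exon Kits.
(1) Fragmentation of the concatemer DNA
For instruments and reagents, see Example 1.
Using the ultrasonicator, 2 μg of the purified same-direction alternating concatemers of the sequence to be tested and the tag sequence was sheared to 500-700 bp (Intensity: 3, Duty Cycle: 5%, Cycles per Burst: 200, Temperature: 4 °C, time: 15 s, number of cycles: 5) in a volume of 85 μl.
(2) End repair
For reagents, see Example 1.
Fragmented DNA from step (1): 85 μl
NEBNext End Repair Reaction Buffer: 10 μl
NEBNext End Repair Enzyme Mix: 5 μl
Total: 100 μl
20 °C for 30 min.
The product was purified with the MinElute Reaction Cleanup Kit and eluted with 43 μl of ddH2O.
(3) A-tailing
For reagents, see Example 1.
End-repaired DNA from step (2): 42 μl
NEBNext dA-Tailing Reaction Buffer: 5 μl
Klenow Fragment (3'→5' exo-): 3 μl
Total: 50 μl
37 °C for 30 min.
The product was purified with the MinElute Reaction Cleanup Kit and eluted with 35.5 μl of ddH2O.
(4) Sequencing adapter ligation
For reagents, see Example 1.
A-tailed DNA: 34.5 μl
Adapter sequence 1 (50 pmol): 3 μl
5× DNA ligase buffer: 10 μl
T4 DNA Ligase: 2.5 μl
Total: 50 μl
16 °C overnight (16 h).
The product was run on a 2% agarose gel (80 V, 80 min; 1× TAE); the 500-700 bp fragments were excised and recovered (QIAGEN MinElute Gel Extraction Kit) and eluted with 22 μl of ddH2O.
Adapter sequence 1:
Figure PCTCN2014093161-appb-000011
Adapter annealing: take equal volumes of 100 pmol Multiplexing Adapter 1.0 (dissolved in annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl) and Multiplexing Adapter 2.0 (dissolved in annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl), incubate at 94 °C for 5 min, then cool to 25 °C at 0.1 °C per second. After annealing, adapter sequence 1 is obtained at a concentration of 50 pmol.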
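(5) PCR amplification
Instruments and reagents:
PCR instrument: Eppendorf Mastercycler pro S
Agilent: Herculase II Fusion DNA Polymerases, Catalog#: 600677
QIAGEN: MinElute Reaction Cleanup Kit, Catalog#: 28206
Four reactions were run in parallel, each made up as follows:
Recovered DNA from above (about 90 ng) + ddH2O: 36.5 μl
MP PCR primer 1.0 (10 pmol): 1 μl
MP index primer 1 (10 pmol): 1 μl
5× Herculase II Reaction Buffer: 10 μl
dNTPs (100 mM; 25 mM each dNTP): 0.5 μl
Herculase II Fusion DNA Polymerase: 1 μl
Total: 50 μl
PCR cycling conditions:
Initial denaturation at 98 °C for 2 min; 8 cycles of amplification (98 °C for 30 s, 65 °C for 30 s, 72 °C for 30 s); 72 °C for 10 min; cool to 4 °C.
After PCR, the products from the 4 reaction tubes were concentrated (MinElute Reaction Cleanup Kit) and eluted with 46 μl of ddH2O.
The product was run on a 2% agarose gel (80 V, 90 min; 1× TAE); the 500-700 bp fragments were excised and recovered (QIAGEN MinElute Gel Extraction Kit) and eluted with 26 μl of ddH2O.
The primer sequences are as follows: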
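(6) Exon probe hybridization
In this experiment, Agilent SureSelect Human All Exon Kits were used for exon probe hybridization of the PCR product above. In brief:
Hybridization buffer preparation:
SureSelect Hyb#1 (orange cap, or bottle): 25 μl
SureSelect Hyb#2 (red cap): 1 μl
SureSelect Hyb#3 (yellow cap): 10 μl
SureSelect Hyb#4 (black cap, or bottle): 13 μl
Total: 49 μl
65 °C for 5 min.
Capture library mix preparation:
SureSelect Library: 5 μl
SureSelect RNase Block (purple cap): 0.5 μl
ddH2O: 1.5 μl
Total: 7 μl
65 °C for 2 min.
Sample mix preparation:
Purified DNA (about 700 ng): 3.4 μl
SureSelect Indexing Block#1 (green cap): 2.5 μl
SureSelect Block#2 (blue cap): 2.5 μl
SureSelect Indexing Block#3 (brown cap): 0.6 μl
Total: 9 μl
95 °C for 5 min, then hold at 65 °C.
Add 13 μl of the prepared hybridization buffer to the capture library mix (7 μl), then add the sample mix (9 μl), 29 μl in total, and hybridize at 65 °C for 24 h.
Magnetic beads (Invitrogen Dynabeads M-280 Streptavidin, Catalog#: 11205D) were used to capture the hybridized fragments (50 μl of beads were washed three times with 200 μl SureSelect Binding Buffer and resuspended in 200 μl SureSelect Binding Buffer; the hybridization product was added and left at room temperature for 30 min; the beads were collected on a magnet, washed once with SureSelect Wash 1 and three times with SureSelect Wash 2, and resuspended in 36.5 μl ddH2O). See the Agilent SureSelect Human All Exon Kits operating manual for details.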
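(7) Post-hybridization PCR
Instruments and reagents:
PCR instrument: Eppendorf Mastercycler pro S
Agilent: Herculase II Fusion DNA Polymerases, Catalog#: 600677
Beckman Coulter, Inc: Agencourt AMPure XP, Item No. A63880
Four reactions were run in parallel, each made up as follows:
Resuspended beads from the exon probe hybridization: 36.5 μl
MP PCR primer 1.0 (10 pmol): 1 μl
MP PCR primer 2.0 (10 pmol): 1 μl
5× Herculase II Reaction Buffer: 10 μl
dNTPs (100 mM; 25 mM each dNTP): 0.5 μl
Herculase II Fusion DNA Polymerase: 1 μl
Total: 50 μl.
PCR cycling conditions:
Initial denaturation at 98 °C for 2 min; 12 cycles of amplification (98 °C for 30 s, 65 °C for 30 s, 72 °C for 30 s); 72 °C for 10 min; cool to 4 °C.
The primer sequences are as follows:
Figure PCTCN2014093161-appb-000014
After PCR, the product was purified with Agencourt AMPure XP magnetic beads. In brief: add 1.8 volumes of magnetic beads to the amplified product, leave at room temperature for 5 min, place on a magnetic rack for 5 min, remove the supernatant, wash twice with 70% ethanol, air-dry, and elute with 16 μl of ddH2O. See the kit instructions for details.
The eluted DNA is the finished human exon library of same-direction alternating concatemers of the sequence to be tested and the tag sequence, which can be sequenced on a second-generation sequencing platform.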
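Example 3: Construction of a peripheral blood cell-free DNA library of same-direction alternating concatemers of the sequence to be tested and the tag sequence according to Scheme 1 (Illumina sequencing platform)
1) Extraction of peripheral blood cell-free DNA and determination of its fragment size
Instruments and reagents:
QIAGEN: QIAamp Circulating Nucleic Acid Kit, Catalog#: 55114
Agilent: 2100 Bioanalyzer
DNA (cell-free circulating DNA) was extracted from 2 ml of plasma with the QIAGEN QIAamp Circulating Nucleic Acid Kit and eluted with 20 μl of ddH2O (see the kit manual for the extraction method). The size distribution of the extracted fragments was measured with an Agilent 2100 Bioanalyzer. The results show that cell-free DNA fragments from a healthy individual are centered around 172 bp, with a range of about 130-230 bp, at a concentration of 0.354 ng/μl; cell-free DNA fragments from a liver cancer patient are centered around 164 bp, with a range of about 110-210 bp, at a concentration of 4.78 ng/μl.
2) End repair
For reagents, see Example 1.
Extracted peripheral blood cell-free DNA (50 ng) + ddH2O: 55.5 μl
End Prep Enzyme Mix: 3 μl
End Repair Reaction Buffer (10×): 6.5 μl
Total: 65 μl
20 °C for 30 min, 65 °C for 30 min.
3) A-tailing and tag sequence ligation
For reagents, see Example 1.
End-repaired DNA: 65 μl
Blunt/TA Ligase Master Mix: 15 μl
Ligation Enhancer: 1 μl
Tag sequence UO-A (50 pmol): 1 μl
ddH2O: 1.5 μl
Total: 83.5 μl
20 °C for 30 min, 65 °C for 10 min, then immediately on ice for 3 min.
The product was purified with the MinElute Reaction Cleanup Kit and eluted with 15 μl of ddH2O.
Tag sequence: UO-A is formed by mixing equal volumes of 100 pmol UO-adaptor1 (dissolved in annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl) and 100 pmol UO-adaptor2 (dissolved in annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl) and annealing (94 °C for 5 min, then cooling to 25 °C at 0.1 °C per second).
Figure PCTCN2014093161-appb-000015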
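4) Single-strand DNA circularization
For instruments and reagents, see Example 1.
The extracted peripheral blood cell-free DNA was concentrated by evaporation at 37 °C to 4.2 μl.
95 °C for 3 min (note: use a PCR instrument that can run 100 μl reactions; otherwise the 4.2 μl easily evaporates to dryness at 95 °C), then immediately on ice for 3 min.
Then add:
10× CircLigase buffer: 0.5 μl
10 mmol MnCl2: 0.25 μl
CircLigase (100 U/μl): 0.25 μl
65 °C for 2 h, 80 °C for 10 min.
After circularization, digest linear and dimeric DNA:
Exonuclease I (E. coli): 0.25 μl
Exonuclease III (E. coli): 0.25 μl
37 °C for 1 h, 80 °C for 20 min.
5) Multiple displacement amplification (MDA) reaction
The circularized product was amplified by rolling circle amplification using a whole-genome amplification (WGA) kit based on the MDA principle:
For instruments and reagents, see Example 1.
Circularized DNA from above: 2.5 μl
Sample buffer: 22.5 μl
95 °C for 3 min, then immediately on ice for 3 min.
Then add:
Reaction buffer: 22.5 μl
Enzyme mix: 2.5 μl
Total: 50 μl
30 °C for 1 h, 65 °C for 10 min.
The product was purified with Agencourt AMPure XP (Beckman Coulter, Inc) magnetic beads. In brief: add 1.8 volumes of magnetic beads to the amplified product, leave at room temperature for 5 min, place on a magnetic rack for 5 min, remove the supernatant, wash twice with 70% ethanol, air-dry, and elute with 50 μl of buffer AE (10 mM Tris-Cl, 0.5 mM EDTA; pH 9.0). See the kit instructions for details.
The purified product is the same-direction alternating concatemer of the sequence to be tested and the tag sequence.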
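6) Construction of an Illumina sequencing library from the concatemers produced above
Commercial kits for constructing standard Illumina libraries may be used, such as TruSeq DNA Sample Preparation Kits and Nextera DNA Sample Preparation Kits.
(1) Fragmentation of the same-direction tandem repeat DNA
For instruments and reagents, see Example 1.
Using the ultrasonicator, 2 μg of the purified same-direction alternating concatemers of the sequence to be tested and the tag sequence was sheared to 500-700 bp (Intensity: 3, Duty Cycle: 5%, Cycles per Burst: 200, Temperature: 4 °C, time: 15 s, number of cycles: 5) in a volume of 85 μl.
(2) End repair
For reagents, see Example 1.
Fragmented DNA: 85 μl
NEBNext End Repair Reaction Buffer: 10 μl
NEBNext End Repair Enzyme Mix: 5 μl
Total: 100 μl
20 °C for 30 min.
The product was purified with the MinElute Reaction Cleanup Kit and eluted with 43 μl of ddH2O.
(3) A-tailing
For reagents, see Example 1.
End-repaired DNA: 42 μl
NEBNext dA-Tailing Reaction Buffer: 5 μl
Klenow Fragment (3'→5' exo-): 3 μl
Total: 50 μl
37 °C for 30 min.
The product was purified with the MinElute Reaction Cleanup Kit and eluted with 35.5 μl of ddH2O.
(4) Adapter ligation
For reagents, see Example 1.
A-tailed DNA: 34.5 μl
Adapter sequence 1 (50 pmol): 3 μl
5× DNA ligase buffer: 10 μl
T4 DNA Ligase: 2.5 μl
Total: 50 μl
16 °C overnight (16 h).
The product was run on a 2% agarose gel (80 V, 80 min; 1× TAE); the 500-700 bp fragments were excised and recovered (QIAGEN MinElute Gel Extraction Kit) and eluted with 22 μl of ddH2O.
Adapter sequence 1:
Figure PCTCN2014093161-appb-000016
Adapter annealing: take equal volumes of 100 pmol Multiplexing Adapter 1.0 (dissolved in annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl) and Multiplexing Adapter 2.0 (dissolved in annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl), incubate at 94 °C for 5 min, then cool to 25 °C at 0.1 °C per second. After annealing, adapter sequence 1 is obtained at a concentration of 50 pmol.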
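(5) Amplification
For instruments and reagents, see Example 1.
Recovered DNA from above (about 30 ng) + ddH2O: 23 μl
MP PCR primer 1.0 (10 pmol): 1 μl
MP index primer 1 (10 pmol): 1 μl
2× Phusion High-Fidelity PCR Master Mix: 25 μl
Total: 50 μl.
PCR cycling conditions:
98 °C for 45 s initial denaturation; 10 cycles of (98 °C for 15 s, 65 °C for 30 s, 72 °C for 60 s); 72 °C for 5 min; cool to 4 °C.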
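As a rough planning aid, the programmed hold time of the cycling conditions above can be added up as follows. This is only a sketch: the function name is hypothetical, and ramping between temperatures and the final 4 °C hold are deliberately left out.

```python
def program_seconds(initial_s, cycle_steps_s, n_cycles, final_s):
    """Sum of programmed hold times; ramping and the 4 °C hold are excluded."""
    return initial_s + n_cycles * sum(cycle_steps_s) + final_s

# 98 °C 45 s; 10 x (98 °C 15 s, 65 °C 30 s, 72 °C 60 s); 72 °C 5 min
total_s = program_seconds(initial_s=45, cycle_steps_s=[15, 30, 60],
                          n_cycles=10, final_s=5 * 60)
print(f"{total_s} s = {total_s / 60:.2f} min of programmed hold time")
```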
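2% agarose gel electrophoresis (80 V, 80 min; 1× TAE); excise and recover (QIAGEN MinElute Gel Extraction Kit) the 500-700 bp fragments; elute in 22 μl ddH2O.
The eluted DNA is the finished library, which can be sequenced on a second-generation sequencing platform.
The primer sequences are as follows: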
Figure PCTCN2014093161-appb-000017
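Example 4:
Construction of a library of same-orientation alternating concatemers of target and tag sequences according to Scheme 2 (Illumina sequencing platform)
Steps:
1) DNA fragmentation
For instruments and reagents, see Example 1.
Using a sonicator, 1 μg of purified Drosophila melanogaster genomic DNA is sheared to 150-200 bp (Intensity: 5, Duty Cycle: 10%, Cycles per Burst: 200, Temperature: 4 °C, time: 60 s, number of cycles: 5) in a shearing volume of 50 μl.
4% agarose gel electrophoresis (80 V, 70 min; 1× TAE); excise and recover the 60-90 bp fragments. Recovery in brief: dissolve the gel slice in 6 volumes of buffer QG, add an equal volume of isopropanol, mix and load onto the column, wash with buffer QG, wash with buffer PE, air-dry, and elute with 56 μl ddH2O. See the QIAGEN MinElute Gel Extraction Kit manual for details.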
2) End repair
For reagents, see Example 1.
Fragmented DNA: 55.5 μl
End Prep Enzyme Mix: 3 μl
End Repair Reaction Buffer (10×): 6.5 μl
Total: 65 μl
20 °C for 30 min, 65 °C for 30 min.
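3) dA-tailing and tag sequence ligation
For reagents, see Example 1.
End-repaired DNA: 65 μl
Blunt/TA Ligase Master Mix: 15 μl
Ligation Enhancer: 1 μl
Tag sequence UO-A (50 pmol): 1 μl
ddH2O: 1.5 μl
Total: 83.5 μl
20 °C for 30 min, 65 °C for 10 min, then immediately place on ice for 3 min.
The product is purified with the MinElute Reaction Cleanup Kit and eluted in 15 μl ddH2O.
Tag sequence: UO-A is formed by mixing equal volumes of 100 pmol UO-adaptor1 and 100 pmol UO-adaptor2, each dissolved in annealing buffer (10 mM Tris-HCl, pH 7.5; 1 mM EDTA; 0.1 mM NaCl), and annealing them (94 °C for 5 min, then cooling gradually to 25 °C at 0.1 °C per second).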
Figure PCTCN2014093161-appb-000018
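4) Single-stranded DNA circularization
For instruments and reagents, see Example 1.
The fragmented DNA is concentrated by evaporation at 37 °C to 4.2 μl.
95 °C for 3 min (note: use a thermal cycler that can run 100 μl reactions; otherwise the 4.2 μl sample readily evaporates to dryness at 95 °C), then immediately place on ice for 3 min.
Then add:
10× CircLigase buffer: 0.5 μl
10 mmol MnCl2: 0.25 μl
CircLigase (100 U/μl): 0.25 μl
65 °C for 2 h, 80 °C for 10 min;
After circularization, digest linear and dimeric DNA:
Exonuclease I (E. coli): 0.25 μl
Exonuclease III (E. coli): 0.25 μl
37 °C for 1 h, 80 °C for 20 min.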
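5) Rolling circle amplification
Instruments and reagents:
Thermal cycler: Eppendorf Mastercycler pro S
New England Biolabs: phi29 DNA Polymerase, Catalog#: M0269L
Single-strand circularized DNA: 5.7 μl
phi29 DNA Polymerase Reaction Buffer: 2 μl
Primer UO-a3 (10 pmol): 1 μl
ddH2O: 8.9 μl
Total: 17.6 μl; 95 °C for 3 min, then immediately place on ice for 3 min. Then add:
10 mM dNTP: 1 μl
100× BSA: 0.4 μl
phi29 DNA Polymerase (10 U/μl): 1 μl
Total: 20 μl
30 °C for 8 h, 65 °C for 10 min.
Primer sequences: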
Figure PCTCN2014093161-appb-000019
Figure PCTCN2014093161-appb-000020
6) Conversion of the linear DNA to double-stranded form
Instruments and reagents:
Thermal cycler: Eppendorf Mastercycler pro S
New England Biolabs: phi29 DNA Polymerase, Catalog#: M0269L
New England Biolabs: Exonuclease I (E. coli), Catalog#: M0293
New England Biolabs: T4 DNA polymerase, Catalog#: M0203
Epicentre:
Figure PCTCN2014093161-appb-000021
Enzyme and Buffer,Catalog#:A3202K
Beckman Coulter,Inc:Agencourt AMPure XP,Item No.A63880
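Rolling-circle product DNA: 20 μl
Primer UO-a1 (10 pmol): 1 μl
Ampligase 10× Reaction Buffer: 5 μl
2.5 mM dNTP: 1 μl
ddH2O: 22.5 μl
95 °C for 3 min, then immediately place on ice for 3 min. Then add:
T4 DNA polymerase: 0.5 μl
12 °C for 2.5 h, 75 °C for 20 min. Then add:
Ampligase DNA Ligase: 3 μl
60 °C for 1 h. Then add:
Exonuclease I: 1 μl
37 °C for 1 h, 80 °C for 20 min.
The product is purified with Agencourt AMPure XP magnetic beads. In brief: add 1.8 volumes of beads to the amplified product, incubate at room temperature for 5 min, place on a magnetic rack for 5 min, discard the supernatant, wash twice with 70% ethanol, air-dry, and elute with 20 μl ddH2O. See the kit manual for details.
The purified product is the direct tandem repeat concatemer of the DNA fragments.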
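Because reagents are added in several stages in step 6), the volume entering the bead cleanup is the running total of all additions. A small sketch of that bookkeeping and of the 1.8× bead volume, assuming no loss to evaporation or pipetting (the variable names are illustrative only):

```python
from decimal import Decimal

# Additions in step 6), in the order they are made (µl).
additions_ul = [
    ("rolling-circle product DNA", "20"),
    ("primer UO-a1", "1"),
    ("Ampligase 10x Reaction Buffer", "5"),
    ("2.5 mM dNTP", "1"),
    ("ddH2O", "22.5"),
    ("T4 DNA polymerase", "0.5"),
    ("Ampligase DNA Ligase", "3"),
    ("Exonuclease I", "1"),
]

reaction_ul = sum(Decimal(volume) for _, volume in additions_ul)
bead_ul = Decimal("1.8") * reaction_ul  # 1.8 volumes of AMPure XP beads

print(f"reaction volume: {reaction_ul} µl, beads to add: {bead_ul} µl")
```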
Primer sequence:
Figure PCTCN2014093161-appb-000022
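7) Construction of an Illumina sequencing library from the same-orientation alternating concatemer of target and tag sequences obtained above
After 8 h of rolling circle amplification, the DNA yield ranges from tens to hundreds of nanograms; the yield can be increased by extending the amplification time. A suitable commercial kit for constructing a standard Illumina library is chosen according to the amount of DNA obtained: with tens of nanograms, Nextera DNA Sample Preparation Kits or other kits designed for low-input library construction can be used; with hundreds of nanograms, TruSeq DNA Sample Preparation Kits or other kits designed for higher input amounts can be used.
Here, a library construction method based on the EZ-Tn5 transposase is used: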
(1) Transposome assembly
Epi_MA1 (10 pmol): 1 μl
Epi_MA2 (10 pmol): 1 μl
Glycerol: 0.5 μl
1 U/μl EZ-Tn5 transposase (Epicentre): 2.5 μl
Total: 5 μl
25 °C for 20 min.
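(2) DNA fragmentation
Transposome from above: 5 μl
5× LMW buffer: 2 μl
Direct tandem repeat concatemer DNA obtained above (about 30 ng) + ddH2O: 3 μl
Total: 10 μl
55 °C for 10 min.
The product is purified with the MinElute Reaction Cleanup Kit and eluted in 24 μl ddH2O.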
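(3) PCR amplification of the recovered product
For instruments and reagents, see Example 1.
Recovered DNA from above (about 30 ng) + ddH2O: 23 μl
Epi_PCR primer 1.0 (10 pmol): 1 μl
Epi_index primer (10 pmol): 1 μl
2× Phusion High-Fidelity PCR Master Mix: 25 μl
Total: 50 μl
PCR cycling conditions:
72 °C for 3 min (essential); 98 °C for 30 s; 10 cycles of (98 °C for 10 s, 65 °C for 30 s, 72 °C for 3 min); cool to 4 °C.
2% agarose gel electrophoresis (80 V, 80 min; 1× TAE); excise and recover (QIAGEN MinElute Gel Extraction Kit) the 500~800 bp fragments; elute in 17 μl ddH2O.
The eluted DNA is the finished library, which can be sequenced on a second-generation sequencing platform.
The primer sequences above are as follows: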
Figure PCTCN2014093161-appb-000023
Epi_MA1:
Formed by annealing equal volumes of 100 pmol Epi_ME and Epi_Adaptor1, each dissolved in annealing buffer (10 mM Tris-HCl, pH 7.5; 1 mM EDTA; 0.1 mM NaCl). Conditions: 94 °C for 5 min, then cooling gradually to 25 °C at 0.1 °C per second.
Epi_MA2:
Formed by annealing equal volumes of 100 pmol Epi_ME and Epi_Adaptor2, each dissolved in annealing buffer (10 mM Tris-HCl, pH 7.5; 1 mM EDTA; 0.1 mM NaCl). Conditions: 94 °C for 5 min, then cooling gradually to 25 °C at 0.1 °C per second.
5× LMW buffer: 50 mM Tris-OAc, pH 8.0, 25 mM Mg(OAc)2
Oseq library construction is carried out as in Example 1.
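Example 5: Library construction and data analysis for bacteriophage PhiX174
1 μg of bacteriophage PhiX174 DNA was sheared by sonication into DNA fragments with a main band at 300 bp. Fragments of 60~80 bp were recovered, ligated to the tag sequence, rendered single-stranded, and amplified by rolling circle amplification (see Example 1 for details). A second-generation sequencing library based on the EZ-Tn5 transposon was then constructed from the rolling-circle product (see Example 4 for details). About 10 G of paired-end data (read length 2×100 = 200 bp) was obtained on a HiSeq 2000. The data were processed and analyzed as follows:
1. Total reads obtained: 54391601, of which 33987941 reads could form a circle (i.e., at least two repeat units could be detected; the same applies below).
2. Circularization rate: OS2_in2: (135951764/4)/(217566404/4) = 62.49%
3. The circle sizes ranged from 30 to 162 bp, with a mean of 72.5333 bp, a standard deviation of 14.06478 bp,
and a median of 71 bp. The distribution is shown in Figure 1.
4. Paired-end high-throughput sequencing was performed on the constructed library of same-orientation alternating concatemers of target and tag sequences. Because the circle size is less than half the sequencer's read length, a single-end read necessarily covers at least one concatemer unit, and a paired-end read pair necessarily reads the concatemer unit at least twice; the sequences of these two copies are compared with each other and inconsistent sequences are removed. This principle is used to calculate the DNA error rate in the sequenced data. Assuming that the sample contains no low-frequency mutations, the error rate of the method is 1e-5. Sequencing errors are distributed unevenly across bases (bases of the reference genome): C-to-T and G-to-A errors are more frequent, at about 1e-4; the specific error rates are given in Table 1. This mutation pattern has also been found in other studies measuring low-frequency mutations, and these two substitutions are most likely caused by spontaneous deamination of cytosine or 5-methylcytosine. Once deamination has occurred, the base on the original single DNA strand has already changed, and repeated independent reads of that strand can only observe the mutated base.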
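The comparison of repeat copies described in item 4 can be sketched as follows. This is an illustration under stated assumptions, not the analysis pipeline used in the example: the function names, the 5% mismatch tolerance and the test sequence are hypothetical, and the minimum unit of 30 bp is taken from the smallest circle size reported above. The read is matched against a shifted copy of itself to find the repeat period, and positions where the first two copies disagree are masked rather than called.

```python
def find_period(read, min_unit=30, max_mismatch_frac=0.05):
    """Return the smallest shift at which the read matches itself, else None."""
    n = len(read)
    for p in range(min_unit, n // 2 + 1):  # need two full copies in the read
        overlap = n - p
        mismatches = sum(1 for i in range(overlap) if read[i] != read[i + p])
        if mismatches <= max_mismatch_frac * overlap:
            return p
    return None

def consensus_unit(read, period):
    """Keep bases on which the first two copies agree; mask the rest as 'N'."""
    first, second = read[:period], read[period:2 * period]
    return "".join(a if a == b else "N" for a, b in zip(first, second))

# Hypothetical 40-base unit (target plus tag), read 2.5 times in a 100-base read.
unit = "ACGTTGCAAGCTTAGGCTAACCGTATCGGATCCATGCAAT"
read = list(unit * 2 + unit[:20])
read[45] = "A" if read[45] != "A" else "C"  # simulate one sequencing error in copy 2
read = "".join(read)

period = find_period(read)
print(period, consensus_unit(read, period))
```

A read in which no period is found would be counted as not forming a circle; masked positions are simply excluded from error-rate estimates.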
Table 1. Sequencing error frequency by base substitution
Error type    Error rate
A=>C 1.78E-06
T=>G 1.13E-06
A=>G 4.41E-06
T=>C 6.96E-06
A=>T 5.70E-06
T=>A 2.97E-06
C=>A 1.34E-05
G=>T 2.91E-05
C=>G 1.19E-05
G=>C 1.92E-05
C=>T 0.000153171
G=>A 0.000443162
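The rows of Table 1 are listed as strand-complementary pairs (for example C=>T and G=>A). A short sketch that records the table, confirms this pairing, and picks out the substitutions at or above 1e-4; the names `table1`, `strand_partner` and `elevated` are illustrative only, and the threshold is the approximate figure quoted in the text.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Error rates copied from Table 1.
table1 = {
    "A=>C": 1.78e-06, "T=>G": 1.13e-06,
    "A=>G": 4.41e-06, "T=>C": 6.96e-06,
    "A=>T": 5.70e-06, "T=>A": 2.97e-06,
    "C=>A": 1.34e-05, "G=>T": 2.91e-05,
    "C=>G": 1.19e-05, "G=>C": 1.92e-05,
    "C=>T": 0.000153171, "G=>A": 0.000443162,
}

def strand_partner(change):
    """The same substitution as seen on the opposite strand."""
    ref, alt = change.split("=>")
    return f"{COMPLEMENT[ref]}=>{COMPLEMENT[alt]}"

elevated = sorted(c for c, rate in table1.items() if rate >= 1e-4)
print("substitutions at or above 1e-4:", elevated)
print("strand partner of C=>T:", strand_partner("C=>T"))
```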
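As the above results show, the single-base error rate of this method (10^-5) is far below that of second-generation sequencing (1%) and also far below that of existing improved methods. The method therefore largely eliminates the error-rate problem of second-generation sequencing and achieves ultra-accurate sequencing of DNA molecules on second-generation sequencing platforms. Another advantage of the method is that sequencing accuracy is independent of sequencing depth, which removes the requirement of tag-based methods for extremely high sequencing coverage in order to determine DNA sequences accurately, and thus makes accurate sequencing of large genomes (such as the human genome) possible.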
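Example 6: Library construction and data analysis for Escherichia coli
DNA of E. coli W3110 was sheared by sonication into DNA fragments with a main band at 300 bp. Fragments of 80~150 bp were recovered, ligated to the tag sequence, rendered single-stranded, and amplified by rolling circle amplification. A conventional second-generation sequencing library was then constructed from the rolling-circle product (see Example 1 for details). About 4 G of paired-end data (read length 2×150 = 300 bp) was obtained on a HiSeq 2500. The data were processed and analyzed as follows:
1. Total reads obtained: 13787730, of which 7578585 reads could form a circle.
2. Circularization rate: 54.96615468971325%.
3. The circle sizes ranged from 30 to 260 bp, with a mean of 122.909 bp, a standard deviation of 17.74147 bp,
and a median of 122 bp.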
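The circularization rate is the number of reads in which a circle is detected divided by the total number of reads. The sketch below reproduces the figures reported in Examples 5 and 6. Reading the division by 4 in Example 5 as a conversion from FASTQ line counts to read counts (four lines per record) is an assumption, and the function names are hypothetical.

```python
def reads_from_fastq_lines(n_lines):
    """A FASTQ record spans four lines, so reads = lines / 4 (assumed layout)."""
    if n_lines % 4:
        raise ValueError("line count is not a multiple of 4")
    return n_lines // 4

def circularization_rate(circular_reads, total_reads):
    """Fraction of reads in which at least two repeat units were detected."""
    return circular_reads / total_reads

# Example 5: line counts as they appear in the formula.
ex5 = circularization_rate(reads_from_fastq_lines(135951764),
                           reads_from_fastq_lines(217566404))
# Example 6: read counts as reported.
ex6 = circularization_rate(7578585, 13787730)

print(f"Example 5: {ex5:.2%}  Example 6: {ex6:.2%}")
```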
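The sequencing error rate for each base substitution is given in Table 2.
Table 2. Sequencing error rate by base substitution
Error type    Error rate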
A=>C 2.66E-07
T=>G 4.10E-07
A=>G 2.79E-06
T=>C 2.47E-06
A=>T 1.58E-06
T=>A 1.29E-06
C=>A 5.68E-06
G=>T 3.85E-06
C=>G 3.20E-06
G=>C 1.14E-06
C=>T 0.000119
G=>A 7.73E-05
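Example 7: Preparation and data analysis of a sequencing library with random tag sequences
phiX174 DNA was sheared by sonication into DNA fragments with a main band at 100~200 bp. Fragments of 60~100 bp were recovered, ligated to the tag sequence, rendered single-stranded, and amplified by rolling circle amplification. A conventional second-generation sequencing library was then constructed from the rolling-circle product (see Example 1 for details). The tag sequence ligated to the DNA fragments to be sequenced is as follows: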
Figure PCTCN2014093161-appb-000024
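About 4 G of paired-end data (read length 2×150 = 300 bp) was obtained on a HiSeq 2000. The data were processed and analyzed as follows:
1. Total reads obtained: 19147560, of which 4580270 reads could form a circle (i.e., at least two repeat units could be detected; the same applies below).
2. Circularization rate: 23.92090689361987%.
3. The circle sizes (after removing the tag sequence) ranged from 1 to 133 bp, with a mean of 88.56275 bp, a standard deviation of 29.17562 bp,
and a median of 98 bp.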
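Since the circle sizes above are reported after removing the tag sequence, each repeat unit has to be split into its tag and its target sequence. A minimal sketch of one way to do this is given below. The actual tag sequence is shown only in the figure, so the fixed bases, the number of random bases and their order (fixed bases first) are hypothetical here, as is the function name.

```python
def split_unit(unit, fixed, n_random):
    """Rotate a circular repeat unit so it starts at the fixed tag bases,
    then return (random_bases, target_sequence), or None if no tag is found."""
    doubled = unit + unit  # the fixed part may wrap around the end of the unit
    start = doubled.find(fixed)
    tag_len = len(fixed) + n_random
    if start == -1 or start >= len(unit) or tag_len > len(unit):
        return None
    rotated = doubled[start:start + len(unit)]
    return rotated[len(fixed):tag_len], rotated[tag_len:]

# Hypothetical tag: 9 fixed bases followed by 4 random bases.
fixed, random_bases = "GATTACAGT", "ACGT"
target = "TTGCCGTAAGCTTGGCATCC"
unit = fixed + random_bases + target
rotated_unit = unit[7:] + unit[:7]  # a unit as it might appear mid-read

print(split_unit(rotated_unit, fixed, 4))
```

The random bases recovered this way can additionally serve to tell apart molecules that share the same target sequence.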
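The sequencing error frequency for each base substitution is given in Table 3.
Table 3. Sequencing error frequency by base substitution
Error type    Error rate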
A=>C 4.36E-07
T=>G 9.22E-07
A=>G 3.79E-06
T=>C 4.12E-06
A=>T 8.75E-06
T=>A 1.24E-05
C=>A 2.97E-05
G=>T 1.93E-05
C=>G 1.50E-05
G=>C 9.99E-06
C=>T 0.000103
G=>A 0.000131
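The method of the present invention can determine the DNA composition of cells with ultra-high accuracy and can faithfully represent the DNA composition of a normal or diseased (e.g., cancerous tissue) cell population. In cancer detection, it can be used to test whether a tissue or organ of a normal individual already carries potentially carcinogenic mutations, enabling early detection and prevention of cancer. In cancer research, the method can detect the distribution of DNA mutations in a cancer cell population; can be used to discover potential small clonal populations in cancer tissue so as to reveal the true heterogeneous structure of a tumor; can help elucidate the role of mutations in the onset and progression of cancer; and can be used to search for tumor stem cells. In cancer therapy, it can be used to find tumor stem cell populations, after which specific drug targets can be designed against those cells to achieve effective treatment. For normal individuals, the method can be used to detect mutations arising in the DNA of normal cells and thereby trace the growth pattern of normal tissue; to determine the number of DNA mutations in a given tissue in individuals of different ages and thereby estimate the DNA mutation rate; and to test whether a normal individual carries mutations associated with various diseases, for the purpose of disease prevention.
The method can also construct libraries efficiently from cell-free DNA in peripheral blood and can effectively detect low-frequency mutation sites present there. Such non-invasive testing allows effective detection and assessment of the onset and progression of cancer, of harmful mutations in the fetus in prenatal diagnosis, and the like.
Sequencing ancient human DNA is a principal means of studying human evolution, but it faces many difficulties, the greatest of which are the low content of extracted ancient human DNA, severe degradation, and heavy microbial contamination. The method can construct libraries from very small amounts of DNA (single- or double-stranded), and the resulting libraries can be subjected to exon capture (removing microbial genome contamination), effectively addressing these difficulties in ancient DNA library construction.
Although specific embodiments of the present invention have been described in detail, those skilled in the art will understand that, in light of all the teachings disclosed, various modifications and substitutions may be made to those details, and all such changes fall within the scope of protection of the invention. The full scope of the invention is given by the appended claims and any equivalents thereof.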

Claims (22)

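  1. A sequencing library, characterized in that the insert fragments in the sequencing library comprise a same-orientation alternating concatemer of a target sequence and a tag sequence.
  2. The sequencing library of claim 1, characterized in that the combined length of each target sequence and tag sequence is less than half of the sequencer's read length.
  3. The sequencing library of claim 1, characterized in that the length of the same-orientation alternating concatemer is greater than the sequencer's read length.
  4. The sequencing library of claim 1, characterized in that the tag sequence comprises 4-20 (e.g., 6-13) consecutive defined bases and 0-18 (e.g., 0-13) consecutive random bases.
  5. The sequencing library of claim 4, characterized in that the defined bases and random bases are arranged sequentially (in either order) or interspersed.
  6. The sequencing library of any one of claims 1-5, wherein the sequencing library is used for second-generation sequencing or third-generation sequencing.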
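  7. A method for preparing a sequencing library, the method comprising:
    (1) ligating a target sequence to a tag sequence to obtain a double-stranded or single-stranded ligated sequence;
    (2) when the ligated sequence obtained in step (1) is double-stranded, rendering the ligated sequence single-stranded and then circularizing it; when the ligated sequence obtained in step (1) is single-stranded, circularizing it directly;
    (3) subjecting the circularized ligated sequence obtained in step (2) to DNA amplification based on a strand-displacement reaction to obtain a same-orientation alternating concatemer of the target sequence and the tag sequence;
    (4) fragmenting the same-orientation alternating concatemer of the target sequence and the tag sequence, and ligating sequencing adapters to both ends of the fragments to obtain the sequencing library.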
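  8. The method of claim 7, wherein the combined length of each target sequence and tag sequence is less than half of the sequencer's read length.
  9. The method of claim 7, wherein the length of the fragmented same-orientation alternating concatemer in step (4) is greater than the sequencer's read length.
  10. The method of claim 7, wherein the tag sequence comprises 4-20 (e.g., 6-13) consecutive defined bases and 0-18 (e.g., 0-13) consecutive random bases.
  11. The method of claim 10, wherein the defined bases and random bases are arranged sequentially (in either order) or interspersed.
  12. The method of any one of claims 7-11, wherein the sequencing library is used for second-generation sequencing or third-generation sequencing.
  13. A sequencing method, comprising the step of using the sequencing library of any one of claims 1-6.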
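  14. A sequencing method, comprising a step of preparing a sequencing library, wherein preparing the sequencing library comprises:
    (1) ligating a target sequence to a tag sequence to obtain a double-stranded or single-stranded ligated sequence;
    (2) when the ligated sequence obtained in step (1) is double-stranded, rendering the ligated sequence single-stranded and then circularizing it; when the ligated sequence obtained in step (1) is single-stranded, circularizing it directly;
    (3) subjecting the circularized ligated sequence obtained in step (2) to DNA amplification based on a strand-displacement reaction to obtain a same-orientation alternating concatemer of the target sequence and the tag sequence, i.e., the sequencing library is prepared;
    (4) fragmenting the same-orientation alternating concatemer of the target sequence and the tag sequence, and ligating sequencing adapters to both ends of the fragments to obtain the sequencing library.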
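  15. The sequencing method of claim 14, wherein the combined length of each target sequence and tag sequence is less than half of the sequencer's read length.
  16. The sequencing method of claim 14, wherein the length of the fragmented same-orientation alternating concatemer in step (4) is greater than the sequencer's read length.
  17. The sequencing method of claim 14, wherein the tag sequence comprises 4-20 (e.g., 6-13) consecutive defined bases and 0-18 (e.g., 0-13) consecutive random bases.
  18. The sequencing method of claim 17, wherein the defined bases and random bases are arranged sequentially (in either order) or interspersed.
  19. The sequencing method of any one of claims 14-18, wherein the sequencing method is a second-generation sequencing or third-generation sequencing method.
  20. Use of the sequencing library of any one of claims 1-6 in sequencing.
  21. The use of claim 20, wherein the sequencing is second-generation sequencing or third-generation sequencing.
  22. The use of claim 20, wherein the sequencing includes genomic DNA sequencing, target fragment capture sequencing (e.g., exon capture sequencing), sequencing of single-stranded DNA fragments, sequencing of fossil DNA, or sequencing of cell-free DNA in body fluids (e.g., blood, urine, saliva).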
PCT/CN2014/093161 2013-12-06 2014-12-05 Sequencing library, preparation method and use thereof WO2015081890A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/101,605 US10718015B2 (en) 2013-12-06 2014-12-05 Sequencing library, preparation method and use thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310651462.5 2013-12-06
CN201310651462.5A CN104695027B (zh) 2013-12-06 2013-12-06 Sequencing library, preparation method and use thereof

Publications (1)

Publication Number Publication Date
WO2015081890A1 true WO2015081890A1 (zh) 2015-06-11

Family

ID=53272913

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/093161 WO2015081890A1 (zh) 2013-12-06 2014-12-05 Sequencing library, preparation method and use thereof

Country Status (3)

Country Link
US (1) US10718015B2 (zh)
CN (1) CN104695027B (zh)
WO (1) WO2015081890A1 (zh)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101213311A (zh) * 2005-04-29 2008-07-02 J. Craig Venter Institute Amplification and cloning of single DNA molecules using rolling circle amplification
WO2009106308A2 (en) * 2008-02-27 2009-09-03 Roche Diagnostics Gmbh System and method for improved processing of nucleic acids for production of sequencable libraries

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090233291A1 (en) * 2005-06-06 2009-09-17 454 Life Sciences Corporation Paired end sequencing
US8383345B2 (en) * 2008-09-12 2013-02-26 University Of Washington Sequence tag directed subassembly of short sequencing reads into long sequencing reads
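CN102212612A (zh) * 2011-03-23 2011-10-12 上海美吉生物医药科技有限公司 Method for constructing a paired-end library for high-throughput 454 sequencing
CN102690809B (zh) * 2011-03-24 2013-12-04 深圳华大基因科技服务有限公司 DNA tags and their use in constructing and sequencing paired-end tag libraries
CN102373288B (zh) * 2011-11-30 2013-12-11 盛司潼 Method and kit for sequencing a target region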
EP2861769A4 (en) * 2012-06-14 2016-02-24 Hutchinson Fred Cancer Res COMPOSITIONS AND METHODS FOR SENSITIVE MUTATION RECOGNITION IN NUCLEIC ACIDS

Also Published As

Publication number Publication date
CN104695027A (zh) 2015-06-10
CN104695027B (zh) 2017-10-20
US10718015B2 (en) 2020-07-21
US20160362735A1 (en) 2016-12-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14868422

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 15101605

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14868422

Country of ref document: EP

Kind code of ref document: A1