CN106350590B - DNA library construction method for high-throughput sequencing - Google Patents


Info

Publication number
CN106350590B
Authority
CN
China
Prior art keywords
sequencing
primer
locus
temperature
sequence
Legal status
Active
Application number
CN201610807018.1A
Other languages
Chinese (zh)
Other versions
CN106350590A (en)
Inventor
王旭
钟嘉泳
张弓
董鸣
余卓
Current Assignee
Chaintech Medical (shenzhen) Technology Co Ltd
Original Assignee
Chaintech Medical (shenzhen) Technology Co Ltd
Application filed by Chaintech Medical (shenzhen) Technology Co Ltd
Priority to CN201610807018.1A
Publication of CN106350590A
Application granted
Publication of CN106350590B
Legal status: Active


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C12Q1/6869 Methods for sequencing
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1093 General methods of preparing gene libraries, not provided for in other subgroups
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00 Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06 Biochemical methods, e.g. using enzymes or whole viable microorganisms


Abstract

The invention discloses a method for constructing an amplicon sequencing library. The method comprises performing multiplex PCR on a DNA template obtained from a sample to be tested using multiple pairs of fusion primers, and recovering the PCR products to obtain a sequencing library. Each pair of fusion primers is directed to a different target fragment on the DNA template, and each fusion primer comprises a target-specific primer sequence and a sequencing adapter sequence. The multiplex PCR is run under the following conditions: 95 ℃ for 2 min; 38 cycles of 95 ℃ for 30 s, slow cooling from 76 ℃ at 0.1 ℃ per second to a temperature between 55 ℃ and 58 ℃, a 20 s hold after the temperature reaches 58 ℃, and 72 ℃ for 30 s; then 72 ℃ for 2 min and a hold at 4 ℃. The invention also discloses the use of the library construction method in high-throughput sequencing detection of STR loci and in paternity testing.

Description

DNA library construction method for high-throughput sequencing
Technical Field
The invention relates to the field of biological detection, in particular to a DNA library construction method for high-throughput sequencing.
Background
Forensic DNA testing mainly studies and analyzes DNA polymorphisms in organisms. In general, the chromosomal position of a DNA marker, whether in a gene or in a non-coding region, is called a locus. Variants with different sequences at the same locus are called alleles. DNA polymorphism refers to the existence of multiple alleles at a genetic marker locus, and the analysis of allelic differences at a locus is the biological basis of individual identification [1].
Short tandem repeats (STRs) [2] are microsatellite DNA polymorphisms found widely across the 23 pairs of human chromosomes. They generally consist of repeat units (core sequences) of 2-6 bp. Different individuals (or alleles) usually differ only in the number of repeat units. STR loci are important genetic markers and are widely used because they are highly polymorphic and easy to detect by PCR. STR loci have the following characteristics:
- There are many polymorphic STR loci.
- Fragment lengths are generally under 400 bp, so the loci amplify easily and are suitable for trace samples.
- Allele sizes are relatively similar, so preferential amplification of the shorter allele is minor.
- Fragment lengths differ little between loci, which makes multiplex amplification easier and improves detection efficiency.
STR typing plays an important role in forensic DNA testing and paternity testing. Most paternity testing laboratories in China rely mainly on multi-locus STR typing. They analyze and compare polymorphisms at different STR loci and calculate a paternity index (PI), which is the ratio of the likelihood that the alleged father is the true father to the likelihood that he is not. Most laboratories worldwide [3] use PI ≥ 10,000 as the threshold for concluding that a paternity relationship exists.
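The sketch below shows how per-locus indices could be combined and compared with the 10,000 threshold. It uses the textbook single-locus formula for the simple trio case, in which the child's paternal (obligate) allele is unambiguous. With p as the population frequency of that allele, PI = 1/p when the alleged father is homozygous for it and PI = 1/(2p) when he is heterozygous. The combined index is the product over loci.

The following are assumptions, not part of the patent:
- the function names;
- the allele frequencies;
- returning 0 for a locus where the alleged father lacks the obligate allele, since mutation is not modeled.

Real casework uses validated population databases and mutation-aware formulas.

```python
from math import prod

PATERNITY_THRESHOLD = 10_000  # PI >= 10,000 threshold cited in the text


def locus_pi(father_genotype, obligate_allele, freq):
    """Single-locus paternity index for the simple trio case (illustrative).

    father_genotype: pair of repeat counts, e.g. (10, 11)
    obligate_allele: the child's allele that must have come from the father
    freq: allele -> population frequency (hypothetical values in this sketch)
    Returns copies / (2p): 1/p if homozygous, 1/(2p) if heterozygous.
    """
    copies = list(father_genotype).count(obligate_allele)
    if copies == 0:
        return 0.0  # exclusion at this locus; mutation is not modeled here
    return copies / (2 * freq[obligate_allele])


def combined_pi(per_locus_pis):
    """Combined paternity index: the product of the per-locus indices."""
    return prod(per_locus_pis)


if __name__ == "__main__":
    # Hypothetical three-locus example; frequencies are illustrative only.
    pis = [
        locus_pi((10, 11), 11, {11: 0.25}),  # heterozygous: 1/(2*0.25)
        locus_pi((7, 7), 7, {7: 0.2}),       # homozygous: 1/0.2
        locus_pi((13, 14), 14, {14: 0.05}),  # heterozygous: 1/(2*0.05)
    ]
    cpi = combined_pi(pis)
    print(f"CPI = {cpi:.1f}, meets threshold: {cpi >= PATERNITY_THRESHOLD}")
```

In practice, a panel of many loci is needed before the product approaches 10,000. This is one reason the method targets many STR loci at once.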
Conventional STR typing relies mainly on PCR and capillary electrophoresis. The sample is amplified by multiplex PCR with fluorescently labeled primers, and the resulting fragments of different sizes are separated by capillary electrophoresis to determine the genotype. Sanger sequencing is difficult to apply to STR typing because of its limited throughput and high cost.
Second-generation high-throughput sequencing (next-generation sequencing, NGS) can sequence hundreds of thousands to millions of DNA molecules in a single run, and it has driven rapid progress in sequencing technology. NGS can be used to sequence an entire genome. When a researcher is interested only in specific genomic regions, those regions alone can be sequenced by amplicon sequencing. The workflow is as follows:
1. Design primers for the regions of interest.
2. Enrich the target regions by PCR amplification.
3. Construct a library from the PCR products of defined length, or from captured fragments.
4. Sequence the library at high throughput and analyze the sequences.
Conventional library construction involves many steps and is time-consuming and labor-intensive. This makes it poorly suited to building sequencing libraries for large numbers of samples.
Disclosure of Invention
The invention aims to provide a method for constructing an amplicon sequencing library, and the use of this method in high-throughput sequencing detection of STR loci.
According to one aspect of the invention, a method for constructing an amplicon sequencing library is provided. The method comprises performing multiplex PCR on a DNA template obtained from a sample to be tested using multiple pairs of fusion primers, and recovering the PCR products to obtain a sequencing library. Each pair of fusion primers is directed to a different target fragment on the DNA template. From the 5' end to the 3' end, each fusion primer comprises a sequencing adapter sequence followed by a primer sequence specific to the target fragment.
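The 5'-to-3' layout of a fusion primer amounts to a simple concatenation: the adapter comes first, so the target-specific sequence stays at the 3' end, where the polymerase extends. In the sketch below, the function name is hypothetical. The example uses the upstream adapter and the CSF1PO forward primer given later in Example 1.

```python
def fusion_primer(adapter: str, specific: str) -> str:
    """Join a sequencing adapter (5' part) and a target-specific primer (3' part).

    Both inputs are written 5'->3'. Putting the adapter first keeps the
    target-specific sequence at the 3' end, where extension starts.
    """
    seq = (adapter + specific).upper().replace(" ", "")
    if set(seq) - set("ACGT"):
        raise ValueError("unexpected characters in primer sequence")
    return seq


if __name__ == "__main__":
    adapter_up = "CCTCTCTATGGGCAGTCGGTGAT"  # upstream adapter (Example 1)
    csf1po_fwd = "TAGCAGGTTGCTAACCACCC"     # CSF1PO forward primer (Table 1)
    primer = fusion_primer(adapter_up, csf1po_fwd)
    print(primer, len(primer))
```

Only the 3' portion must anneal to the template in the first cycles. Once the adapter has been copied into the products, later cycles can anneal along the full primer length.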
In a preferred embodiment, the multiplex PCR is run under the following conditions: 95 ℃ for 2 min; 38 cycles of 95 ℃ for 30 s, slow cooling from 76 ℃ at 0.1 ℃ per second to any temperature between 55 ℃ and 58 ℃, a 20 s hold at that target temperature, and 72 ℃ for 30 s; then 72 ℃ for 2 min.
In the present invention, the "multiple pairs of fusion primers" refers to two, three, or more pairs of fusion primers.
In a more preferred embodiment, the multiplex PCR is run under the following conditions: 95 ℃ for 2 min; 38 cycles of 95 ℃ for 30 s, slow cooling from 76 ℃ to 58 ℃ at 0.1 ℃ per second, a 20 s hold at 58 ℃, and 72 ℃ for 30 s; then 72 ℃ for 2 min.
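The arithmetic behind the slow-cooling program is easier to see when written out. At 0.1 ℃ per second, the 76 ℃ to 58 ℃ ramp alone takes 180 s per cycle. The sketch below computes the cycle and run times. The class name is hypothetical. The sketch also assumes that the 95 ℃ to 76 ℃ and target-to-72 ℃ transitions take no time; a real thermocycler adds its own ramp times, so actual runs will be somewhat longer.

```python
from dataclasses import dataclass


@dataclass
class SlowRampProgram:
    """Slow-cooling multiplex PCR program from the text (seconds, degrees C)."""
    initial_denat_s: int = 120    # 95 C for 2 min
    cycles: int = 38
    denat_s: int = 30             # 95 C for 30 s in each cycle
    ramp_start_c: float = 76.0    # slow cooling starts here
    ramp_end_c: float = 58.0      # preferred target; the text allows 55-58 C
    ramp_rate_c_per_s: float = 0.1
    hold_s: int = 20              # hold at the target temperature
    extend_s: int = 30            # 72 C for 30 s in each cycle
    final_ext_s: int = 120        # 72 C for 2 min

    def ramp_seconds(self) -> int:
        # Time spent cooling from ramp_start_c to ramp_end_c at the stated rate.
        return round((self.ramp_start_c - self.ramp_end_c) / self.ramp_rate_c_per_s)

    def cycle_seconds(self) -> int:
        return self.denat_s + self.ramp_seconds() + self.hold_s + self.extend_s

    def total_seconds(self) -> int:
        # Assumes instantaneous 95->76 C and target->72 C transitions.
        return self.initial_denat_s + self.cycles * self.cycle_seconds() + self.final_ext_s


if __name__ == "__main__":
    for end in (58.0, 55.0):
        prog = SlowRampProgram(ramp_end_c=end)
        print(end, prog.ramp_seconds(), prog.cycle_seconds(), prog.total_seconds() / 3600)
```

Under these assumptions, the 58 ℃ program takes 260 s per cycle, or about 2.8 h in total. The 180 s ramp accounts for most of that time.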
In some embodiments, the sample to be tested includes, but is not limited to, blood, body fluids, saliva, semen, hair, muscle, or tissues and organs. In a preferred embodiment, the DNA template is genomic DNA extracted from the sample to be tested.
In the present invention, each pair of fusion primers comprises a forward primer sequence and a reverse primer sequence specific to the target fragment.
In some embodiments, an index sequence is included in the sequencing adapter sequence to identify different DNA templates, facilitating mixed sequencing of different samples.
The sequencing adapter sequences included in the fusion primers of the invention may be:
- 5'-CCTCTCTATGGGCAGTCGGTGAT-3' in the upstream primer;
- 5'-CCATCTCATCCCTGCGTGTCTCCGACTCAG-index sequence-GAT-3' in the downstream primer.
Alternatively, the sequencing adapter sequences may be:
- 5'-AATGATACGGCGACCACCGAGATCTACAC-first index sequence-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3' in the upstream primer;
- 5'-CAAGCAGAAGACGGCATACGAGAT-second index sequence-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3' in the downstream primer.

The first index sequence and the second index sequence are different.
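The dual-index layout above can be expressed as a small assembly function. The sketch below follows the concatenation exactly as written in the text and enforces the rule that the two index sequences must differ. The constant names are hypothetical labels for the fixed adapter segments. The sketch does not address whether a given instrument reads either index in reverse-complement orientation.

```python
# Fixed adapter segments as given in the text (5'->3'); the names are labels only.
UP_OUTER = "AATGATACGGCGACCACCGAGATCTACAC"
UP_INNER = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
DOWN_OUTER = "CAAGCAGAAGACGGCATACGAGAT"
DOWN_INNER = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"


def dual_index_pair(index1, index2, fwd_specific, rev_specific):
    """Assemble upstream/downstream fusion primers in the layout described above."""
    if index1 == index2:
        raise ValueError("the first and second index sequences must differ")
    upstream = UP_OUTER + index1 + UP_INNER + fwd_specific
    downstream = DOWN_OUTER + index2 + DOWN_INNER + rev_specific
    return upstream, downstream


if __name__ == "__main__":
    # Index pair of sample one in Example 2 with the FGA primers.
    up, down = dual_index_pair("CTCTCTAT", "TCGCCTTA",
                               "CCCATAGGTTTTGAACTCACAG", "GTGATTTGTCTGTAATTGCCAGC")
    print(up, len(up))
    print(down, len(down))
```

Each sample gets its own index pair, and the target-specific parts are shared. This is what allows the products of several samples to be pooled into one sequencing run.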
In some embodiments, the multiplex PCR is performed in one or more reaction systems.
In preferred embodiments, two, three, four, or five fusion primer pairs for different fragments of interest are included in a reaction system.
In a preferred embodiment, the fragment of interest is an STR locus.
In a preferred embodiment, the fragments of interest comprise two or more STR loci selected from the following STR loci:
CSF1PO (GenBank X14720), FGA (GenBank M64982), TH01 (GenBank D00269), TPOX (GenBank M68651), D3S1358 (NT_005997 positions 754983-.
In a more preferred embodiment, the fragment of interest includes all of the following STR loci:
CSF1PO (GenBank X14720), FGA (GenBank M64982), TH01 (GenBank D00269), TPOX (GenBank M68651), D3S1358 (NT_005997 positions 754983-.
In a more preferred embodiment, the fragment of interest includes all of the following STR loci:
CSF1PO (GenBank X14720), FGA (GenBank M64982), TH01 (GenBank D00269), TPOX (GenBank M68651), D3S1358 (NT_005997 positions 754983-.
The specific forward and reverse primer sequences for each STR locus are as follows:
CSF1PO: forward 5'-3': TAGCAGGTTGCTAACCACCC; reverse 5'-3': TCAGACCCTGTTCTAAGTACTTC;
FGA: forward 5'-3': CCCATAGGTTTTGAACTCACAG; reverse 5'-3': GTGATTTGTCTGTAATTGCCAGC;
TH01: forward 5'-3': GGGCAAAATTCAAAGGGTATCTG; reverse 5'-3': TGCAGGTCACAGGGAACAC;
TPOX: forward 5'-3': AGGCACTTAGGGAACCCTC; reverse 5'-3': TCCTTGTCAGCGTTTATTTGCC;
D3S1358: forward 5'-3': CAAGACCCTGTCTCATAGATAG; reverse 5'-3': TCAACAGAGGCTTGCATGTATC;
D5S818: forward 5'-3': GTGACAAGGGTGATTTTCCTCTT; reverse 5'-3': GTGATTCCAATCATAGCCACAG;
D7S820: forward 5'-3': GGTCAGGCTGACTATGGAG; reverse 5'-3': TCCTCATTGACAGAATTGCACC;
D8S1179: forward 5'-3': TCTTTTTGCCCACACGGCC; reverse 5'-3': CTGTAGATTATTTTCACTGTGGGG;
D13S317: forward 5'-3': ATTTCTTTAGTGGGCATCCGTG; reverse 5'-3': CCTTCAACTTGGGTTGAGCC;
D16S539: forward 5'-3': CAGATCCCAAGCTCTTCCTC; reverse 5'-3': GCATGTATCTATCATCCATCTCTG;
D18S51: forward 5'-3': CACTTCACTCTGAGTGACAAATTG; reverse 5'-3': GTGTGGAGATGTCTTACAATAACAG;
D21S11: forward 5'-3': TCAATTCCCCAAGTGAATTGCC; reverse 5'-3': TGTTCTCCAGAGACAGACTAATAG;
D2S1338: forward 5'-3': GTGGATTTGGAAACAGAAATGGC; reverse 5'-3': GTGGCCCATAATCATGAGTTATTC;
CD4: forward 5'-3': AGGGGTACTTGTGTTAATTGTTGG; reverse 5'-3': GCGTTTTCCAGTCTGAAAAAAGTG;
D12S391: forward 5'-3': GAATCAACAGGATCAATGGATGC; reverse 5'-3': CCTCCATATCACTTGAGCTAATTC;
FABP: forward 5'-3': TTGTAAGCTCCATGAGGTTAGAG; reverse 5'-3': AGCCTCCCTAGGTCAGATAG;
PLA2A1: forward 5'-3': TAGTATCAGTTTCATAGGGTCACC; reverse 5'-3': AGTTCGTTTCCATTGTCTGTCC;
D18S865: forward 5'-3': CAAATGTAGATCTTGGGACTTGTC; reverse 5'-3': ATTCTCAAACATCCCCATTACCTTC;
VWA: forward 5'-3': TCAGTATGTGACTTGGATTGATC; reverse 5'-3': CAGGTTAGATAGATTAGACAGACAG.
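A quick way to see whether these primers suit a 55-58 ℃ annealing window is to compute their GC content and a rough melting temperature. The sketch below uses the basic formula Tm = 64.9 + 41 × (nGC − 16.4) / N, which is often quoted for oligonucleotides longer than 13 nt. It ignores salt, primer concentration, and the attached adapter tail, so treat it as a coarse screen rather than a design tool. The dictionary holds only a subset of the primers listed above.

```python
# Subset of the locus-specific primers listed above (5'->3').
PRIMERS = {
    "CSF1PO": ("TAGCAGGTTGCTAACCACCC", "TCAGACCCTGTTCTAAGTACTTC"),
    "TH01": ("GGGCAAAATTCAAAGGGTATCTG", "TGCAGGTCACAGGGAACAC"),
    "VWA": ("TCAGTATGTGACTTGGATTGATC", "CAGGTTAGATAGATTAGACAGACAG"),
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm(seq: str) -> float:
    """Rough Tm (deg C) from the basic formula 64.9 + 41*(nGC - 16.4)/N."""
    seq = seq.upper()
    n_gc = seq.count("G") + seq.count("C")
    return 64.9 + 41 * (n_gc - 16.4) / len(seq)


for locus, (fwd, rev) in PRIMERS.items():
    for name, s in (("fwd", fwd), ("rev", rev)):
        print(f"{locus:7s} {name} len={len(s):2d} GC={gc_fraction(s):.2f} Tm~{basic_tm(s):.1f}")
```

Primers whose estimates fall far outside the annealing window are candidates for redesign. A closer check would use a nearest-neighbor model with the actual buffer conditions.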
In a preferred embodiment, the multiplex PCR is carried out in one or more reaction systems, selected from the group consisting of:
- a reaction system comprising a fusion primer for the TPOX locus and a fusion primer for the D18S51 locus;
- a reaction system comprising a fusion primer for the D8S1179 locus and a fusion primer for the D21S11 locus;
- a reaction system comprising fusion primers for the TPOX, D18S51, D8S1179, and D21S11 loci;
- a reaction system comprising fusion primers for the CSF1PO, D3S1358, D13S317, and D12S391 loci;
- a reaction system comprising fusion primers for the FGA, TH01, D5S818, and PLA2A1 loci;
- a reaction system comprising fusion primers for the FGA, TH01, D5S818, and VWA loci;
- a reaction system comprising fusion primers for the D7S820, D16S539, D2S1338, CD4, and FABP loci; and
- a reaction system comprising fusion primers for the D7S820, D16S539, D2S1338, CD4, and D18S865 loci.
In a more preferred embodiment, the plurality of reaction systems comprises one or more reaction systems selected from the group consisting of:
- a reaction system comprising 1.5 μM fusion primer for the TPOX locus and 3.5 μM fusion primer for the D18S51 locus;
- a reaction system comprising 1.5 μM fusion primer for the D8S1179 locus and 3.5 μM fusion primer for the D21S11 locus;
- a reaction system comprising fusion primers for the TPOX (1 μM), D18S51 (1.5 μM), D8S1179 (1 μM), and D21S11 (1.5 μM) loci;
- a reaction system comprising fusion primers for the CSF1PO (1 μM), D3S1358 (2 μM), D13S317 (1 μM), and D12S391 (1 μM) loci;
- a reaction system comprising fusion primers for the CSF1PO (1 μM), D3S1358 (1 μM), D13S317 (1 μM), and D12S391 (2 μM) loci;
- a reaction system comprising fusion primers for the FGA (1 μM), TH01 (1.5 μM), D5S818 (1.5 μM), and PLA2A1 (1 μM) loci;
- a reaction system comprising fusion primers for the FGA (1.5 μM), TH01 (1 μM), D5S818 (1.5 μM), and VWA (1 μM) loci;
- a reaction system comprising 1 μM fusion primer for each of the D7S820, D16S539, D2S1338, CD4, and FABP loci; and
- a reaction system comprising 1 μM fusion primer for each of the D7S820, D16S539, D2S1338, CD4, and D18S865 loci.
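Turning these concentrations into pipetting volumes is a C1·V1 = C2·V2 calculation. The text does not state the stock concentration or the volume being prepared, and it does not say whether the μM value applies to each forward and reverse oligo. The sketch below assumes the following, all as labeled assumptions:
- the stated values are per-oligo final concentrations in a primer pool;
- the pool is prepared from 100 μM stocks;
- the pool volume is 100 μL;
- the helper names are hypothetical.

```python
def stock_volume_ul(final_um: float, total_ul: float, stock_um: float = 100.0) -> float:
    """C1*V1 = C2*V2: microlitres of stock that give final_um in total_ul."""
    if final_um > stock_um:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_um * total_ul / stock_um


def primer_mix(pool: dict, total_ul: float = 100.0, stock_um: float = 100.0):
    """Per-oligo stock volumes for a pool; each locus has a forward and a reverse fusion primer."""
    volumes = {}
    for locus, conc in pool.items():
        v = stock_volume_ul(conc, total_ul, stock_um)
        volumes[f"{locus}_F"] = v
        volumes[f"{locus}_R"] = v
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("pool is too concentrated for the chosen stocks")
    return volumes, water


if __name__ == "__main__":
    # First pool listed above: TPOX at 1.5 uM and D18S51 at 3.5 uM (assumed per oligo).
    vols, water = primer_mix({"TPOX": 1.5, "D18S51": 3.5})
    print(vols, water)
```

If the concentrations instead refer to the final PCR reaction, the same function applies with total_ul set to the reaction volume.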
Other STR loci can also be selected as appropriate. Candidate STR loci are listed at http://www.cstl.nist.gov/biotech/strbase/seq_info.htm, an authoritative forensic science website that catalogs the STR loci used in forensic genetic identification. Reference sequence information for these STR loci is available at http://www.cstl.nist.gov/biotech/strbase/seq_ref.
The amplicon sequencing library construction method is suitable for high-throughput sequencing of any amplicon. For example, it can be used for high-throughput sequencing detection of STR loci, SNPs, SNVs, and insertion/deletion mutations within a certain size range, but it is not limited to these.
Another aspect of the invention provides a method for high-throughput sequencing of amplicons. The method comprises preparing a sequencing library with the above amplicon sequencing library construction method, performing high-throughput sequencing, and analyzing the sequencing data.
Target fragments for high throughput sequencing of amplicons include, but are not limited to, fragments comprising STR loci or comprising SNP sites. In a preferred embodiment, the target fragments for amplicon high-throughput sequencing are multiple STR loci.
Another aspect of the invention provides a method for detecting STR loci. The method comprises preparing a sequencing library for multiple STR loci with the above amplicon sequencing library construction method, performing high-throughput sequencing, and analyzing the sequencing data.
In a preferred embodiment, the data analysis comprises obtaining the number of repeats of the core repeat sequence of the plurality of STR loci.
In some embodiments, the data analysis further comprises analyzing stutter products (shadow bands) caused by slipped-strand mispairing.
In some embodiments, the data analysis further comprises obtaining mutation information within the multiple STR loci.
In some embodiments, the amplicon high-throughput sequencing method and the STR locus detection method of the invention are used for non-diagnostic purposes.
The method for detecting the STR locus can be used for forensic medicine and other related fields related to the STR locus detection, such as the fields of gene diagnosis, genetic map construction, population genetics research and the like. In a preferred embodiment, the methods of the present invention for detecting STR loci are used for paternity testing.
The invention also provides a paternity testing method. The method comprises detecting the repeat numbers at STR loci in the genome with the STR locus detection method described above, and then calculating the paternity index (PI).
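Before a paternity index can be calculated, the repeat-number genotypes of the child, mother, and alleged father have to be compared locus by locus. The sketch below parses genotypes written as in Table 6, finds the child allele(s) that must be paternal, and flags a locus where the alleged father carries none of them.

The following are assumptions, not the patent's procedure:
- the function names;
- reading a single listed value (e.g. "8") as a homozygote;
- ignoring mutation.

Alleles are parsed as floats so that partial repeats such as 9.3 also work.

```python
def parse_genotype(text: str):
    """'10,11' -> (10.0, 11.0); a single value such as '8' is read as homozygous."""
    alleles = [float(a) for a in text.split(",")]
    return (alleles[0], alleles[-1])


def paternal_alleles(child, mother):
    """Child alleles that could have come from the father, given the mother's genotype.

    One allele if unambiguous; two if mother and child share the same
    heterozygous genotype; an empty set if the mother shares no allele
    with the child (inconsistent data rather than a paternity result).
    """
    c1, c2 = child
    options = set()
    if c1 in mother:
        options.add(c2)
    if c2 in mother:
        options.add(c1)
    return options


def excluded(child, mother, father) -> bool:
    """True if the alleged father carries none of the possible paternal alleles."""
    options = paternal_alleles(child, mother)
    if not options:
        raise ValueError("mother and child share no allele at this locus")
    return not (options & set(father))


if __name__ == "__main__":
    child = parse_genotype("10,11")
    mother = parse_genotype("10,12")  # hypothetical
    print(paternal_alleles(child, mother), excluded(child, mother, parse_genotype("11,13")))
```

At loci that are not excluded, the obligate allele and the alleged father's genotype are exactly what a per-locus PI formula takes as input.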
The invention has the beneficial effects that:
(1) The entire library construction process requires only a single round of multiplex PCR, which greatly reduces the time and cost of amplicon sequencing library construction. When applied to STR locus analysis and paternity testing, the method simplifies the workflow and reduces cost.
(2) The amplicon library construction method of the invention uses optimized multiplex PCR reaction conditions and is suitable for high-throughput sequencing of various amplicons. It is not restricted to particular primers, so libraries can be constructed in a single amplification with different primers.
Drawings
FIG. 1 is an agarose gel electrophoresis of the products of the multiplex PCR reaction of example 1.
FIG. 2 is a library quality control chart in example 1.
FIG. 3 is a plot of the distribution of different numbers of repeats in the sequencing data for each STR in example 1.
FIG. 4 is a nucleic acid electrophoresis image of a polyacrylamide PAGE gel of the products of the multiplex PCR reaction in example 2.
FIG. 5 is a library quality control chart in example 2.
Detailed Description
In order to clearly understand the technical contents of the present invention, the following embodiments are described in detail with reference to the accompanying drawings. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. The experimental procedures, in which specific conditions are not noted in the following examples, are generally carried out under conventional conditions or conditions recommended by the manufacturers. The various chemicals and biologicals used in the examples were all commercially available products.
Example 1: Forensic genetic relationship identification
This example determines the repeat numbers of STR repeat sequences using an NGS sequencing protocol.
First, oral epithelial cell collection and treatment
(1) Human oral epithelial cells were collected using an oral swab collection tube.
(2) Oral epithelial cell DNA was extracted using an oral swab genome extraction kit (DP 322).
(3) OD260nm and OD280nm were measured with a Nanodrop 2000 spectrophotometer to confirm that the DNA was of high purity and to determine its concentration.
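The concentration and purity read from these absorbances follow two standard rules. One A260 unit corresponds to about 50 ng/μL of double-stranded DNA, and an A260/A280 ratio of roughly 1.8-2.0 is commonly taken to indicate clean DNA. The spectrophotometer software normally reports both values directly. The sketch below only makes the arithmetic explicit. The function names, the example readings, and the exact 1.8-2.0 window are assumptions, not values from the patent.

```python
DSDNA_NG_PER_UL_PER_A260 = 50.0  # 1 A260 unit ~ 50 ug/mL (= 50 ng/uL) of dsDNA


def dsdna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """Approximate dsDNA concentration (ng/uL) from a blank-corrected A260 reading."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity(a260: float, a280: float, low: float = 1.8, high: float = 2.0):
    """Return the A260/A280 ratio and whether it lies in a commonly used window."""
    ratio = a260 / a280
    return ratio, low <= ratio <= high


if __name__ == "__main__":
    a260, a280 = 1.5, 0.8  # hypothetical readings
    ratio, ok = purity(a260, a280)
    print(f"{dsdna_concentration(a260):.1f} ng/uL, A260/A280 = {ratio:.2f}, pass = {ok}")
```

A ratio well below the window usually suggests protein or phenol carryover. Re-purifying before multiplex PCR is then the usual remedy.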
Second, design of multiplex PCR primers
(1) Each primer was designed as a fusion primer comprising the target-fragment-specific primer sequence and a sequencing adapter sequence that contains an index sequence.
(2) The fusion primer structure is as follows:
Upstream primer: 5'-CCTCTCTATGGGCAGTCGGTGAT-3' + target fragment-specific forward primer;
Downstream primer: 5'-CCATCTCATCCCTGCGTGTCTCCGACTCAG CTAAGGTAAC GAT-3' + target fragment-specific reverse primer;
where CTAAGGTAAC is the index sequence.
(3) The STR loci to be amplified and sequenced and the corresponding specific primers are shown in table 1.
TABLE 1 Specific primers for each STR locus
STR locus Forward primer (5'-3') Reverse primer (5'-3')
CSF1PO TAGCAGGTTGCTAACCACCC TCAGACCCTGTTCTAAGTACTTC
FGA CCCATAGGTTTTGAACTCACAG GTGATTTGTCTGTAATTGCCAGC
TH01 GGGCAAAATTCAAAGGGTATCTG TGCAGGTCACAGGGAACAC
TPOX AGGCACTTAGGGAACCCTC TCCTTGTCAGCGTTTATTTGCC
D3S1358 CAAGACCCTGTCTCATAGATAG TCAACAGAGGCTTGCATGTATC
D5S818 GTGACAAGGGTGATTTTCCTCTT GTGATTCCAATCATAGCCACAG
D7S820 GGTCAGGCTGACTATGGAG TCCTCATTGACAGAATTGCACC
D8S1179 TCTTTTTGCCCACACGGCC CTGTAGATTATTTTCACTGTGGGG
D13S317 ATTTCTTTAGTGGGCATCCGTG CCTTCAACTTGGGTTGAGCC
D16S539 CAGATCCCAAGCTCTTCCTC GCATGTATCTATCATCCATCTCTG
D18S51 CACTTCACTCTGAGTGACAAATTG GTGTGGAGATGTCTTACAATAACAG
D21S11 TCAATTCCCCAAGTGAATTGCC TGTTCTCCAGAGACAGACTAATAG
D2S1338 GTGGATTTGGAAACAGAAATGGC GTGGCCCATAATCATGAGTTATTC
CD4 AGGGGTACTTGTGTTAATTGTTGG GCGTTTTCCAGTCTGAAAAAAGTG
D12S391 GAATCAACAGGATCAATGGATGC CCTCCATATCACTTGAGCTAATTC
FABP TTGTAAGCTCCATGAGGTTAGAG AGCCTCCCTAGGTCAGATAG
PLA2A1 TAGTATCAGTTTCATAGGGTCACC AGTTCGTTTCCATTGTCTGTCC
Third, multiple PCR reaction library construction and nucleic acid agarose gel electrophoresis
The multiplex PCR reaction was carried out in five groups, and the grouping and the concentration of each primer are shown in Table 2.
TABLE 2 multiplex PCR primer grouping
The multiplex PCR used DreamTaq Green PCR Master Mix (2X) (Thermo Fisher, cat # K1081). The reaction system is shown in Table 3:
TABLE 3 multiplex PCR reaction System
The multiplex PCR was run with the slow-cooling program; the reaction conditions are shown in Table 4.
TABLE 4 multiplex PCR reaction conditions
The PCR products were analyzed by agarose gel electrophoresis (1.5% gel, 120 V, 400 mA, 30 min). From bottom to top, the DNA marker bands are 100 bp, 200 bp, 300 bp, 400 bp, 500 bp, 700 bp, 1 kb, 1600 bp, 2 kb, 5 kb, 8 kb and 10 kb. As shown in FIG. 1, all five multiplex PCR reactions amplified the target bands, and the bands were at the expected positions.
The products of the five multiplex PCR reactions were pooled and purified with the SanPrep column PCR product purification kit (Shanghai Biotech, cat. No. B518141) according to the manufacturer's instructions to obtain a DNA library for sequencing. Nanodrop measurement showed that the purified DNA library had a concentration of 128.73 ng/μL and a 260/280 ratio of 1.944.
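Sequencing platforms are usually loaded by molarity rather than mass, so a ng/μL reading is typically converted to nM using the average mass of a base pair (about 660 g/mol). The conversion needs the mean fragment length, which the text does not give for this library. The 300 bp used below is a hypothetical placeholder; in practice the Agilent 2100 trace would supply it. The function name is also illustrative.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate molar mass of one dsDNA base pair


def library_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration from ng/uL to nM.

    ng/uL equals mg/L; dividing by (660 g/mol * length) gives mol/L,
    and the 1e6 factor collects the mg->g and M->nM conversions.
    """
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


if __name__ == "__main__":
    # Measured concentration from the text; the fragment length is an assumption.
    print(f"{library_nm(128.73, 300):.1f} nM")
```

The result scales inversely with the assumed fragment length. An error in the size estimate therefore carries straight through to the loading concentration.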
Fourth, library quality control
Library quality was assessed on an Agilent 2100. As shown in FIG. 2, amplification across the multiplex PCR was uniform, with no strong amplification bias between targets, so the library could be used directly for high-throughput sequencing.
Fifth, high-throughput sequencing
(1) The prepared DNA sequencing library was sequenced on the Ion Torrent platform (SE400). After sequencing, the results were output as FASTQ files. The sequencing data volume was 67M, and 490,536 reads matched the STR loci.
Sixth, sequencing data processing
(1) The first and last 20-30 bases of each read were used to identify the STR locus from which the read originated. Reads that could not be assigned to a target STR locus were discarded. The number of reads assigned to each STR is shown in Table 5.
TABLE 5 number of reads mapped to each STR locus
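The read-to-locus assignment described above can be sketched by matching the start of each read against one primer of a pair and the end of the read against the reverse complement of the other. Both strand orientations are checked.

The following are hypothetical, not the patent's pipeline:
- k = 20 bases (the low end of the 20-30 range in the text);
- a tolerance of 2 mismatches;
- the function names;
- the assumption that adapters and index sequences have already been trimmed, so each read starts at the target-specific primer.

Reads matching no locus, or more than one, are discarded.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Two of the locus-specific primer pairs (5'->3') used in Example 1.
PRIMERS = {
    "CSF1PO": ("TAGCAGGTTGCTAACCACCC", "TCAGACCCTGTTCTAAGTACTTC"),
    "TH01": ("GGGCAAAATTCAAAGGGTATCTG", "TGCAGGTCACAGGGAACAC"),
}


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def _close(a: str, b: str, max_mm: int) -> bool:
    # Equal length and at most max_mm mismatching positions.
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= max_mm


def assign_locus(read: str, primers: dict, k: int = 20, max_mm: int = 2):
    """Return the single locus whose primers flank the read, else None."""
    read = read.upper()
    hits = set()
    for locus, (fwd, rev) in primers.items():
        # Plus-strand read: fwd at the 5' end, revcomp(rev) at the 3' end;
        # minus-strand read: the two primers swap roles.
        for first, second in ((fwd, rev), (rev, fwd)):
            head = first[:k]
            tail = revcomp(second[:k])
            if _close(read[:len(head)], head, max_mm) and _close(read[-len(tail):], tail, max_mm):
                hits.add(locus)
    return hits.pop() if len(hits) == 1 else None


if __name__ == "__main__":
    fwd, rev = PRIMERS["CSF1PO"]
    simulated = fwd + "AGAT" * 11 + revcomp(rev)  # synthetic read for illustration
    print(assign_locus(simulated, PRIMERS), assign_locus(revcomp(simulated), PRIMERS))
```

Requiring both ends to match helps reject primer dimers and off-target products. Those artifacts typically carry only one of the two expected flanks.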
(2) During primer extension in the multiplex PCR, the primer strand or the template strand can slip, leaving an unpaired loop of one repeat unit. This slipped-strand mispairing ultimately produces stutter products (shadow bands). To assess the effect of stutter, the repeat count of the core repeat unit was determined for the reads assigned to each STR. A histogram was then plotted for each STR, with the repeat count on the x-axis and the percentage of reads with that count on the y-axis, as shown in FIG. 3 (a-q).
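The histograms described above can be approximated by counting the longest uninterrupted run of the core motif in each read and tallying the percentage of reads per count. The sketch below makes several simplifications that are assumptions rather than the patent's method:
- the function names are illustrative;
- compound or interrupted repeats (such as those at D21S11 or vWA) are not handled;
- counts may not follow official allele nomenclature for every locus.

```python
import re
from collections import Counter


def repeat_count(seq: str, motif: str) -> int:
    """Number of units in the longest uninterrupted run of `motif` in `seq`."""
    pattern = f"(?:{re.escape(motif.upper())})+"
    runs = re.findall(pattern, seq.upper())
    return max((len(r) // len(motif) for r in runs), default=0)


def repeat_histogram(reads, motif: str) -> dict:
    """Percentage of reads for each observed repeat count (the y-axis of the histogram)."""
    counts = Counter(repeat_count(r, motif) for r in reads)
    total = sum(counts.values())
    return {n: 100.0 * c / total for n, c in sorted(counts.items())}


if __name__ == "__main__":
    # Synthetic reads: three with 10 AGAT units and one stutter read with 9.
    reads = ["TT" + "AGAT" * 10 + "GG"] * 3 + ["TT" + "AGAT" * 9 + "GG"]
    print(repeat_histogram(reads, "AGAT"))
```

In such a histogram, stutter typically appears as a small peak one repeat below each true allele, which is what FIG. 3 is meant to visualize.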
(3) The repeat number at each STR locus in the test sample is shown in Table 6:
TABLE 6 number of repeat units per STR locus
STR loci Number of repetitions
CSF1PO 10,11
FGA 21,23
TH01 7,9
TPOX 8
D3S1358 16
D5S818 11
D7S820 10
D8S1179 13,14
D13S317 8,10
D16S539 9,10
D18S51 18
D21S11 31,32
D2S1338 19,23
CD4 5
D12S391 18
FABP 10
PLA2A1 11,14
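Genotypes like those in Table 6 can be called from each locus's repeat-count histogram by discarding likely stutter and keeping up to two dominant peaks. The rule below is an illustrative sketch, not the analysis used in the example:
- a peak one repeat below a larger peak counts as stutter if it is at most 15% of that peak's height;
- remaining peaks below 20% of reads are dropped;
- a single remaining peak is called homozygous.

The thresholds and function name are hypothetical and would need validation.

```python
def call_genotype(hist: dict, min_pct: float = 20.0, max_stutter_ratio: float = 0.15):
    """Call up to two alleles from a {repeat_count: percent_of_reads} histogram."""
    peaks = {}
    for n, pct in hist.items():
        parent = hist.get(n + 1, 0.0)
        if parent > pct and pct <= max_stutter_ratio * parent:
            continue  # treated as n-1 stutter of the allele at n+1
        if pct >= min_pct:
            peaks[n] = pct
    top = sorted(peaks, key=peaks.get, reverse=True)[:2]
    if not top:
        return None
    if len(top) == 1:
        return (top[0], top[0])  # single allele: homozygous call
    return tuple(sorted(top))


if __name__ == "__main__":
    # Hypothetical histograms, not data from Table 5 or FIG. 3.
    print(call_genotype({9: 4.0, 10: 45.0, 11: 47.0, 12: 4.0}))
    print(call_genotype({7: 8.0, 8: 90.0, 9: 2.0}))
```

A fixed stutter ratio is the simplest choice. Locus-specific ratios would better reflect that longer repeats tend to stutter more.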
Example 2: detecting to which cell line the unknown cell line belongs
This example uses two unknown cell lines and identifies which cell line each one is by determining the repeat numbers of STR repeat sequences with an NGS sequencing protocol.
First, extraction of human cell line DNA
(1) Human cell line DNA was extracted with a peripheral blood DNA kit (Qiagen, cat # 937236).
(2) OD260nm and OD280nm were measured with a Nanodrop 2000 spectrophotometer, confirming high purity; the final concentration was 51 ng/μL.
Second, design of multiplex PCR primers
(1) Each primer was designed as a fusion primer comprising the target-fragment-specific primer sequence and a sequencing adapter sequence that contains an index sequence.
(2) The fusion primer structure is as follows:
Sample one:
Upstream primer: 5'-AATGATACGGCGACCACCGAGATCTACAC-3' + index sequence 1 (CTCTCTAT) + ACACTCTTTCCCTACACGACGCTCTTCCGATCT + target fragment-specific forward primer;
Downstream primer: 5'-CAAGCAGAAGACGGCATACGAGAT-3' + index sequence 2 (TCGCCTTA) + GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT + target fragment-specific reverse primer;
Sample two:
An upstream primer: 5'-AATGATACGGCGACCACCGAGATCTACAC-3' + index sequence 1(TATCCTCT) + ACACTCTTTCCCTACACGACGCTCTTCCGATCT + target fragment specific forward primer;
A downstream primer: 5'-CAAGCAGAAGACGGCATACGAGAT-3' + index sequence 2(CTAGTACG) + GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT + target fragment-specific reverse primer;
(3) The STR loci to be amplified and sequenced, together with their specific primers, are listed in Table 7.
Table 7. Specific primers for each STR locus
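The fusion primer layout in step (2) is a plain 5'-to-3' concatenation of an outer adapter segment, an index, an inner adapter segment and a target-specific primer. The sketch below assembles a primer pair from those parts. It also checks that pooled samples do not share an index combination, because their reads could not be separated after mixed sequencing. The adapter segments and the four index sequences are copied from the text above. The target-specific primers `FWD` and `REV`, and all function names, are hypothetical placeholders.

```python
from typing import Dict, Tuple

# Fixed adapter segments, copied from the fusion primer structure in step (2).
UP_OUTER = "AATGATACGGCGACCACCGAGATCTACAC"
UP_INNER = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
DOWN_OUTER = "CAAGCAGAAGACGGCATACGAGAT"
DOWN_INNER = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"


def _require_dna(seq: str, name: str) -> None:
    if not seq or set(seq) - set("ACGT"):
        raise ValueError(f"{name} must be a non-empty A/C/G/T string, got {seq!r}")


def fusion_primer_pair(index1: str, index2: str, fwd_specific: str,
                       rev_specific: str) -> Tuple[str, str]:
    """Assemble (upstream, downstream) fusion primers, each written 5'->3'."""
    for seq, name in [(index1, "index1"), (index2, "index2"),
                      (fwd_specific, "fwd_specific"), (rev_specific, "rev_specific")]:
        _require_dna(seq, name)
    upstream = UP_OUTER + index1 + UP_INNER + fwd_specific
    downstream = DOWN_OUTER + index2 + DOWN_INNER + rev_specific
    return upstream, downstream


def check_unique_index_pairs(samples: Dict[str, Tuple[str, str]]) -> None:
    """Raise if two pooled samples share an index combination."""
    seen: Dict[Tuple[str, str], str] = {}
    for sample, pair in samples.items():
        if pair in seen:
            raise ValueError(f"{sample} and {seen[pair]} share index pair {pair}")
        seen[pair] = sample


# Index sequences for the two samples, as listed in step (2).
SAMPLES = {"sample one": ("CTCTCTAT", "TCGCCTTA"),
           "sample two": ("TATCCTCT", "CTAGTACG")}
check_unique_index_pairs(SAMPLES)

# Hypothetical target-specific primers, for illustration only.
FWD, REV = "GCTAGCTAGGCTAGCTAG", "CGATCGATCCGATCGATC"
for sample, (i1, i2) in SAMPLES.items():
    up, down = fusion_primer_pair(i1, i2, FWD, REV)
    print(sample, len(up), "nt /", len(down), "nt")
```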
Third, multiplex PCR library construction and polyacrylamide gel electrophoresis (PAGE)
(1) Multiplex PCR was performed with DreamTaq Green PCR Master Mix (2X) (Thermo Fisher Scientific, cat. # K1081). Libraries for the target STR loci were constructed in four multiplex PCR panels. The panels and the corresponding primer concentrations are shown in Table 8.
Table 8. Multiplex PCR primer panels
The multiplex PCR reaction system and reaction conditions were the same as in Example 1.
(2) The multiplex PCR products of the two samples were pooled and analyzed by 8% polyacrylamide gel electrophoresis for 4 hours in total: 16 mA for 20 minutes, then 30 mA for 3 hours 40 minutes. As shown in Fig. 4, target bands were amplified in all four multiplex PCR panels, and their positions fell within the expected range.
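The reaction conditions above are deferred to Example 1, which is not reproduced in this section. Claim 1 does recite a thermal program: 95 °C for 2 min; 38 cycles of 95 °C for 30 s, a 0.1 °C/s ramp from 76 °C down to an end point between 55 °C and 58 °C, a 20 s hold, and 72 °C for 30 s; then 72 °C for 2 min. Assuming Example 1 used that program, the sketch below encodes it and totals the programmed time. The slow ramp alone takes 180 to 210 s per cycle, so it dominates the run. `Step`, `claim1_program` and `total_seconds` are hypothetical names. Instrument ramp times between the other steps are deliberately not counted.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: float


def claim1_program(anneal_c: float, cycles: int = 38, ramp_from_c: float = 76.0,
                   ramp_rate_c_per_s: float = 0.1) -> List[Step]:
    """Thermal steps as recited in claim 1 (instrument ramps between steps not counted)."""
    if not 55.0 <= anneal_c <= 58.0:
        raise ValueError("claim 1 recites a ramp end point between 55 and 58 C")
    # Time spent on the slow ramp from 76 C down to the annealing end point.
    ramp_s = round((ramp_from_c - anneal_c) / ramp_rate_c_per_s, 3)
    cycle = [
        Step("denature", 95.0, 30),
        Step("slow ramp", anneal_c, ramp_s),
        Step("anneal hold", anneal_c, 20),
        Step("extend", 72.0, 30),
    ]
    return ([Step("initial denaturation", 95.0, 120)] + cycle * cycles
            + [Step("final extension", 72.0, 120)])


def total_seconds(program: List[Step]) -> float:
    return sum(step.seconds for step in program)


for t in (55.0, 58.0):
    hours = total_seconds(claim1_program(t)) / 3600
    print(f"end point {t} C: {hours:.2f} h of programmed steps")
```

With a 55 °C end point this gives 11,260 s of programmed time, and with 58 °C it gives 10,120 s.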
Fourth, purification of PCR products (STR DNA libraries)
The pooled PCR products of both samples were purified according to the instructions of the SanPrep column PCR product purification kit (Sangon Biotech, Shanghai, cat. # B518141). A NanoDrop spectrophotometer measured the purified DNA library at 96.44 ng/μl, with an A260/A280 ratio of 1.826.
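The concentration and purity figures above follow from two standard absorbance relations. For double-stranded DNA, one A260 unit (1 cm path) corresponds to about 50 ng/μl. An A260/A280 ratio near 1.8 is generally taken to indicate DNA free of significant protein contamination. The sketch applies both relations. The 1.7 to 2.0 acceptance window and the absorbance readings are assumptions: the readings are hypothetical values chosen to reproduce the reported 96.44 ng/μl and 1.826, not the instrument's raw data.

```python
DSDNA_NG_PER_UL_PER_A260 = 50.0  # common conversion factor for dsDNA, 1 cm path length


def dsdna_ng_per_ul(a260: float, dilution: float = 1.0) -> float:
    """dsDNA concentration (ng/ul) from a blank-corrected A260 reading."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution


def passes_purity(a260: float, a280: float, low: float = 1.7, high: float = 2.0) -> bool:
    """True if A260/A280 lies in an assumed acceptance window around ~1.8."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return low <= a260 / a280 <= high


# Hypothetical absorbance readings chosen to reproduce the reported library values.
a260, a280 = 1.9288, 1.0563
print(f"{dsdna_ng_per_ul(a260):.2f} ng/ul, A260/A280 = {a260 / a280:.3f}, "
      f"pass = {passes_purity(a260, a280)}")
```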
Fifth, library quality control
(1) Library quality was assessed on an Agilent 2100. As shown in Fig. 5, amplification across the multiplex PCR was uniform, with no strong amplification bias, so the library could be sequenced directly.
Sixth, high-throughput sequencing
(1) The purified target DNA library was sequenced on an Illumina MiSeq platform (PE250). After sequencing, the output was written as FASTQ files, and the sequencing data volume was 63.6 M.
(2) The number of reads matching the STR loci was 686,347.
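The per-locus read counts later reported in Table 9 sum to exactly this 686,347 total. The short script below reproduces the total from Table 9, transcribed by hand into a dictionary (`TABLE_9` is a hypothetical name). It also prints each locus's share of reads and its coverage relative to the per-locus mean. This gives one simple way to inspect how evenly the multiplex panels amplified.

```python
# Read counts per STR locus, transcribed from Table 9.
TABLE_9 = {
    "CSF1PO": 9646, "FGA": 58728, "TH01": 55099, "TPOX": 39217, "D3S1358": 15413,
    "D5S818": 62128, "D7S820": 108465, "D8S1179": 57787, "D13S317": 76639,
    "D16S539": 45218, "D18S51": 13047, "D21S11": 28294, "D2S1338": 15417,
    "CD4": 22561, "D12S391": 36707, "D18S865": 12472, "VWA": 29509,
}

total = sum(TABLE_9.values())
print("total reads:", total)  # total reads: 686347
mean = total / len(TABLE_9)

# Share of all assigned reads and coverage relative to the mean locus.
for locus, n in sorted(TABLE_9.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{locus:<8} {n:>7}  {n / total:6.1%}  {n / mean:5.2f}x mean")

lowest = min(TABLE_9, key=TABLE_9.get)
highest = max(TABLE_9, key=TABLE_9.get)
print(f"highest/lowest locus: {highest}/{lowest} = {TABLE_9[highest] / TABLE_9[lowest]:.1f}x")
```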
Seventh, sequencing data analysis
(1) For each read pair from paired-end sequencing, the read covering the STR core repeat sequence was retained for analysis and the read from the other end was discarded. The first and last 20-30 bases of each retained read were extracted to locate the STR locus to which the read belongs. Reads that could not be located to a target STR locus were discarded. The number of reads assigned to each STR locus is shown in Table 9:
Table 9. Number of reads assigned to each STR locus
STR locus Number of reads assigned
CSF1PO 9646
FGA 58728
TH01 55099
TPOX 39217
D3S1358 15413
D5S818 62128
D7S820 108465
D8S1179 57787
D13S317 76639
D16S539 45218
D18S51 13047
D21S11 28294
D2S1338 15417
CD4 22561
D12S391 36707
D18S865 12472
VWA 29509
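The locus-assignment rule in step (1) can be sketched as follows. A read's first and last k bases (k = 20 here, the lower end of the 20-30 range) are checked against the known upstream and downstream flanking sequences of each target locus. Both orientations are tried. Reads that match no locus, or more than one locus, are discarded. This is a minimal sketch assuming exact substring matching, whereas a production pipeline would tolerate sequencing errors. The function names, the toy flanks and the toy reads are hypothetical; real flanks would be taken from the reference genome around each STR.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def assign_locus(read: str, flanks: Dict[str, Tuple[str, str]], k: int = 20) -> Optional[str]:
    """Return the locus whose left/right flanks contain the read's first/last k bases.

    Both orientations are tried; unmatched or ambiguous reads return None (discarded).
    """
    if len(read) < 2 * k:
        return None
    for seq in (read, revcomp(read)):
        head, tail = seq[:k], seq[-k:]
        hits = [locus for locus, (left, right) in flanks.items()
                if head in left and tail in right]
        if len(hits) == 1:
            return hits[0]
        if len(hits) > 1:
            return None  # ambiguous placement
    return None


def reads_per_locus(reads: Iterable[str], flanks: Dict[str, Tuple[str, str]],
                    k: int = 20) -> Dict[str, int]:
    counts = Counter(assign_locus(r, flanks, k) for r in reads)
    counts.pop(None, None)  # drop reads that could not be placed
    return dict(counts)


# Hypothetical toy flanks and reads; k=8 only because the toy flanks are short.
FLANKS = {"TOY_A": ("CCGGTTAAGGCC", "TTGGCCAATTGG"),
          "TOY_B": ("ACACGTGTACAC", "GTGTCACAGTGT")}
read_a = "CCGGTTAA" + "AATG" * 5 + "CCAATTGG"
read_b = "ACGTGTAC" + "TCTA" * 4 + "CACAGTGT"
junk = "G" * 8 + "A" * 8 + "C" * 8
print(reads_per_locus([read_a, read_b, revcomp(read_a), junk], FLANKS, k=8))
```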
(2) To account for stutter bands generated during PCR, the repeat numbers of the core repeat unit of the STR alleles at each locus of the cell lines were analyzed as in Example 1. The results were compared with the STR allele typing of the corresponding cell lines in the ATCC cell bank (loci compared: CSF1PO, D13S317, D16S539, D5S818, D7S820, TH01, TPOX, VWA), as shown in Tables 10 and 11.
Table 10. Number of repeats at each STR locus for the first unknown cell line
STR locus Number of repeats
CSF1PO 10
D13S317 11,12
D16S539 9,13
D5S818 13
D7S820 10
TH01 8
TPOX 8,11
VWA 16,17
Table 11. Number of repeats at each STR locus for the second unknown cell line
Based on these STR allele typing results, the two unknown cell lines were identified as belonging to the NCI-H292 and A549 cell lines, respectively.
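The text does not say how the typing results were scored against the ATCC profiles. One widely used criterion for human cell line authentication is the Tanabe percent-match score, 2 × shared alleles / (alleles in query + alleles in reference). It is computed over the loci typed in both profiles, and a score of at least 80% is commonly taken to indicate related cell lines. The sketch below applies this score, as a swapped-in illustration rather than the patent's stated method. The query is Table 10. `HYPOTHETICAL_REF` is an invented profile that differs at TPOX and VWA; it is not an ATCC record.

```python
from typing import Dict, Set

Profile = Dict[str, Set[int]]


def tanabe_score(query: Profile, reference: Profile) -> float:
    """2 x shared alleles / (query alleles + reference alleles), over shared loci."""
    loci = query.keys() & reference.keys()
    shared = sum(len(query[locus] & reference[locus]) for locus in loci)
    total = sum(len(query[locus]) + len(reference[locus]) for locus in loci)
    return 2 * shared / total if total else 0.0


# Repeat numbers transcribed from Table 10 (homozygous loci hold a single allele).
TABLE_10: Profile = {
    "CSF1PO": {10}, "D13S317": {11, 12}, "D16S539": {9, 13}, "D5S818": {13},
    "D7S820": {10}, "TH01": {8}, "TPOX": {8, 11}, "VWA": {16, 17},
}
# Hypothetical reference profile (NOT an ATCC record): differs at TPOX and VWA.
HYPOTHETICAL_REF: Profile = dict(TABLE_10, TPOX={8}, VWA={16, 18})

score = tanabe_score(TABLE_10, HYPOTHETICAL_REF)
print(f"match = {score:.1%}, related = {score >= 0.80}")
```

Integer repeat numbers are enough for Table 10. Profiles containing microvariants such as 9.3 would need float or string allele labels.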

Claims (9)

1. A method for constructing an amplicon sequencing library, comprising: performing multiplex PCR on a DNA template obtained from a sample to be tested using a plurality of pairs of fusion primers, and recovering the PCR products to obtain a sequencing library;
wherein the plurality of pairs of fusion primers target different target fragments on the DNA template, and each fusion primer of each pair comprises, from the 5' end to the 3' end, a sequencing adapter sequence and a primer sequence specific to the target fragment;
wherein the multiplex PCR is carried out under the following conditions: 95 °C for 2 min; 38 cycles, each consisting of 95 °C for 30 s, a slow ramp from 76 °C down to any temperature between 55 °C and 58 °C at 0.1 °C per second, a 20 s hold once the target temperature is reached, and 72 °C for 30 s; then 72 °C for 2 min;
wherein the sequencing adapter sequences contained in the fusion primers are: 5'-CCTCTCTATGGGCAGTCGGTGAT-3' in the upstream primer and 5'-CCATCTCATCCCTGCGTGTCTCCGACTCAG-index sequence-GAT-3' in the downstream primer; or
the sequencing adapter sequences contained in the fusion primers are: 5'-AATGATACGGCGACCACCGAGATCTACAC-first index sequence-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3' in the upstream primer, and 5'-CAAGCAGAAGACGGCATACGAGAT-second index sequence-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3' in the downstream primer;
wherein the first index sequence and the second index sequence are different index sequences;
wherein the index sequences identify different DNA templates, so that different samples can be pooled and sequenced together.
2. The method of claim 1, wherein the DNA template is genomic DNA extracted from a sample to be tested.
3. The method of claim 1, wherein the multiplex PCR is performed in one or more reaction systems.
4. A method for high-throughput amplicon sequencing for non-diagnostic purposes, comprising: preparing a sequencing library by the method of claim 1 or 2, followed by high-throughput sequencing.
5. The sequencing method of claim 4, wherein the target fragments for amplicon high throughput sequencing comprise the following STR loci:
CSF1PO, FGA, TH01, TPOX, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, D21S11, D2S1338, CD4, D12S391, PLA2A1 and FABP.
6. A method for detecting STR loci for non-diagnostic purposes, comprising: preparing a sequencing library for a plurality of STR loci using the amplicon sequencing library construction method of any one of claims 1 to 3, followed by high-throughput sequencing and analysis of the sequencing data.
7. The detection method of claim 6, wherein the data analysis comprises obtaining the number of repeats of the core repeat sequence at each of the plurality of STR loci.
8. The detection method of claim 7, wherein the data analysis further comprises analyzing stutter bands resulting from strand-slippage mispairing.
9. A paternity testing method, comprising: determining the number of repeats at STR loci in a genome using the method of claim 7 or 8, and then calculating a paternity index (PI value).
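Claim 9 ends with calculating a paternity index but gives no formula. As an illustration only, the sketch below assumes a motherless (child plus alleged father) case, Hardy-Weinberg genotype frequencies, and no mutation or silent alleles. Under those assumptions, the single-locus PI is X / Y. X is the probability of the child's genotype if the alleged father is the father and the mother is a random woman. Y is the probability of that genotype for a random child. The combined index is the product over independent loci. Trio cases use different formulas. The function names, the locus labels and the allele frequencies are hypothetical.

```python
from math import prod
from typing import Dict, Iterable, Tuple

Genotype = Tuple[int, int]


def duo_paternity_index(child: Genotype, alleged_father: Genotype,
                        freq: Dict[int, float]) -> float:
    """Single-locus PI for a motherless case; no mutation model."""
    def transmit(allele: int) -> float:
        # Probability that the alleged father passes on `allele` (0, 0.5 or 1).
        return alleged_father.count(allele) / 2

    a, b = child
    if a == b:
        x = transmit(a) * freq[a]   # father gives a, random mother gives a
        y = freq[a] ** 2            # random-child genotype frequency
    else:
        x = transmit(a) * freq[b] + transmit(b) * freq[a]
        y = 2 * freq[a] * freq[b]
    return x / y


def combined_pi(locus_pis: Iterable[float]) -> float:
    """Combined paternity index: product of per-locus PIs over independent loci."""
    return prod(locus_pis)


# Hypothetical genotypes and allele frequencies, for illustration only.
CASES = [
    ("LOCUS_1", (10, 11), (10, 12), {10: 0.25, 11: 0.1, 12: 0.2}),
    ("LOCUS_2", (8, 8), (8, 8), {8: 0.5}),
]
pis = {name: duo_paternity_index(c, af, f) for name, c, af, f in CASES}
print(pis, "CPI =", combined_pi(pis.values()))
```

A PI of 0 at any locus, with no shared allele, drives the combined index to 0. Practical casework replaces this with a mutation-aware calculation.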

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610807018.1A CN106350590B (en) 2016-09-06 2016-09-06 DNA library construction method for high-throughput sequencing

Publications (2)

Publication Number Publication Date
CN106350590A CN106350590A (en) 2017-01-25
CN106350590B true CN106350590B (en) 2019-12-10

Family

ID=57859817

Country Status (1)

Country Link
CN (1) CN106350590B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant