CN112063690A - Construction method and application of single-molecule probe multi-target capture library - Google Patents


Publication number
CN112063690A
Authority
CN
China
Prior art keywords
sequence
target
probe
region
molecule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010983954.4A
Other languages
Chinese (zh)
Inventor
段小红
张腾龙
杨春燕
杨文娟
李旋
王东亮
周启明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qiuzhen Medical Laboratory Co Ltd
Original Assignee
Beijing Qiuzhen Medical Laboratory Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qiuzhen Medical Laboratory Co Ltd
Priority to CN202010983954.4A
Publication of CN112063690A
Legal status: Pending

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1093 General methods of preparing gene libraries, not provided for in other subgroups
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00 Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06 Biochemical methods, e.g. using enzymes or whole viable microorganisms
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/154 Methylation markers

Abstract

The invention relates to the technical field of molecular detection, and in particular to a construction method and application of a single-molecule probe multi-target capture library. The method comprises the following steps: designing and synthesizing molecular inversion probes (MIPs) according to the sequence of the target region; for each captured target region, target sequence complementary region probe A and target sequence complementary region probe B bind specifically to the target region template by complementary base pairing, forming a ring structure with a gap of one to several hundred nucleotides; then adding the four free nucleotides, filling the gap under the action of DNA polymerase using the target sequence as a template, and using DNA ligase to catalyze formation of a 3',5'-phosphodiester bond between the two ends of the probe, yielding a complete circularized probe; and directly generating a sequencing library of the target region by PCR amplification, the PCR amplification product being the sequencing library. The invention optimizes and improves the traditional MIP technology and is used for detecting tumor-related low-frequency variation and methylation.

Description

Construction method and application of single-molecule probe multi-target capture library
Technical Field
The invention relates to the technical field of molecular detection, and in particular to a construction method and application of a single-molecule probe multi-target capture library.
Background
Cancer, also known as malignant tumor, is a complex group of diseases in which the mechanisms controlling cell growth and proliferation become abnormal, so that cell metabolism is disturbed and cells proliferate without limit and invade surrounding normal tissue. According to the latest national cancer data from the National Cancer Center, there were about 3.929 million new cases of malignant tumors and about 2.338 million deaths in 2015. Research has found that tumors are mainly caused by damage to genetic material, mutation, and chromosomal structural variation arising from environmental or intrinsic factors. Meanwhile, epigenetic modifications such as DNA methylation and histone modification change during tumorigenesis, and because these modifications are reversible they offer a new route for tumor diagnosis and treatment. Gene variation and DNA methylation therefore have potential application as biomolecular markers for tumor diagnosis and prognosis. Consequently, searching for tumor-related driver genes and mutation sites and detecting DNA methylation and other epigenetic modifications are of great significance for research on the occurrence and development of tumors and for their diagnosis and treatment.
With the advent of high-throughput sequencing (next-generation sequencing, NGS), cancer research has developed rapidly. Using methods such as whole-genome sequencing, transcriptome sequencing, and single-cell sequencing, researchers have found hundreds of driver genes and mutation hot spots related to the occurrence and development of cancer and annotated them in detail, providing a solid theoretical basis for accurate detection, rational medication, and personalized treatment. Nevertheless, the detection of low-frequency variation in genomic DNA and of the methylation level of regions such as promoter CpG regions remains very limited. Low-frequency variation is mainly detected by whole-genome sequencing, whole-exome sequencing, targeted-capture deep sequencing, and the like, but these methods are too expensive for general use. Similarly, existing methods for detecting DNA methylation in tumor samples, such as whole-genome bisulfite sequencing (WGBS) and PCR amplicon sequencing after bisulfite treatment, require large sample input, involve complicated preparation, and are costly, and their sensitivity and specificity need to be improved. There is therefore a need for a convenient, economical, highly sensitive, and highly specific method for detecting tumor-related low-frequency variation and methylation.
Molecular inversion probe (MIP) technology is a recently developed molecular biology technique for capturing target sequences. It captures known target genomic sequences with specifically designed probes, enriches the target regions, and then subjects them to high-throughput sequencing. It offers strong specificity, good repeatability, simple operation, low cost, and low requirements on DNA integrity; it avoids the high cost and difficult analysis of whole-genome studies; and it makes up for the limited capture efficiency (about 50%-60%) of targeted capture technology.
Conventional hybrid capture uses only probes of 60-120 nt, and the capture region is generally 60-240 bp; beyond 240 bp the capture efficiency is very low. The required starting amounts of DNA and DNA library are therefore high, typically an input of 500 ng of DNA library. Different probes have different capture efficiencies, so the captured products have poor uniformity. In addition, oligonucleotide probes have poor specificity and can hybridize with non-target regions of a target fragment, which wastes data and increases sequencing cost. Multiplex PCR amplicon-based target enrichment can amplify many target fragments simultaneously in one reaction system, but the more amplicons there are, the greater the non-uniformity; the technique also has a strong GC bias, so enrichment is uneven for target regions with non-uniform GC content. Traditional MIP technology suffers from uneven allele capture and poor detection of low-frequency variation. A construction method and application of a single-molecule probe multi-target capture library are therefore provided.
Disclosure of Invention
The invention aims to provide a construction method and application of a single-molecule probe multi-target capture library. A single-molecule marker sequence is added between each specific target sequence probe and the adjacent universal primer, so that a larger number of combined molecular markers can be generated from shorter random sequences, which reduces the false-positive rate, improves the detection rate of low-frequency variation, and reduces noise. In addition, the double-ended probe design can capture a larger target region; when the target region has a high GC content, probes can be designed at its two ends so that the central region is avoided, which effectively overcomes the difficulty that probes cannot be designed within high-GC regions. Meanwhile, the method joins the inversion probe and the target region sequence into a ring and forms a sequenceable library directly by PCR amplification. This "capture first, then build the library" approach improves detection accuracy, avoids a DNA fragmentation step, simplifies the workflow, and reduces DNA loss.
In order to achieve the purpose, the technical scheme of the invention is as follows:
The construction method of the single-molecule probe multi-target capture library comprises the following steps:
(1) designing and synthesizing molecular inversion probes (MIPs) according to the target region sequence, wherein each MIP comprises, in order: target sequence complementary region probe A, single-molecule marker sequence a, universal primer recognition sequence 1, a connecting sequence, universal primer recognition sequence 2, single-molecule marker sequence b, and target sequence complementary region probe B;
(2) for each captured target region, target sequence complementary region probe A and target sequence complementary region probe B bind specifically to the target region template by complementary base pairing, forming a ring structure with a gap of one to several hundred nucleotides;
(3) then adding the four free nucleotides dATP, dTTP, dGTP, and dCTP, filling the gap under the action of DNA polymerase using the target sequence as a template, and using DNA ligase to catalyze formation of a 3',5'-phosphodiester bond between the two ends of the probe, yielding a complete circularized probe; degrading unreacted probes and linear genomic DNA in the reaction system with exonuclease so that only circularized probes remain; and directly generating a sequencing library of the target region by PCR amplification, the PCR amplification product being the sequencing library; wherein the forward primer for PCR amplification comprises an Illumina sequencing adapter and a forward universal primer, and the reverse primer for PCR amplification comprises an Illumina sequencing adapter and a reverse universal primer.
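The hybridization and gap-filling steps above can be sketched in silico. The following is a minimal illustration rather than the patent's implementation: it assumes exact matches, a single-stranded template written 5' to 3', and hypothetical helper names (`revcomp`, `gap_fill`). Because probe B sits at the 3' end of the MIP and is extended toward the 5' end of probe A, the filled-in stretch is the reverse complement of the template gap.

```python
# Minimal in-silico sketch of MIP gap filling (exact matching only; an assumption).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gap_fill(template: str, arm_a: str, arm_b: str) -> str:
    """Return the bases the polymerase adds between arm B (3' end) and arm A (5' end).

    On the template strand (5'->3') the binding site of arm A lies upstream of
    the binding site of arm B; each site is the reverse complement of its arm.
    """
    site_a, site_b = revcomp(arm_a), revcomp(arm_b)
    start_a = template.find(site_a)
    if start_a < 0:
        raise ValueError("arm A has no binding site on the template")
    gap_start = start_a + len(site_a)
    start_b = template.find(site_b, gap_start)
    if start_b < 0:
        raise ValueError("arm B has no binding site downstream of arm A")
    gap = template[gap_start:start_b]
    if not 1 <= len(gap) <= 500:
        raise ValueError("gap must be 1-500 nt")
    # The newly synthesized strand is complementary to the template gap.
    return revcomp(gap)
```

A real design tool would also have to tolerate mismatches and check both genomic strands; those aspects are left out here.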
Specifically, target sequence complementary region probes A and B in each MIP are sequences specific to a target region and are complementary to the template target sequence region; the distance between probe A and probe B on the template is 1-500 bp; probe A and probe B are each 14-35 bases long; single-molecule marker sequence a and single-molecule marker sequence b in each MIP differ in sequence; and single-molecule marker sequences a and b are each 4-12 bases long.
Preferably, the distance between target sequence complementary region probe A and target sequence complementary region probe B is 90-200 bp, the melting temperature is 55-65 °C, and the GC content is 35%-75%.
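The numeric limits above lend themselves to a simple design check. The sketch below is illustrative only: the function names are hypothetical, it assumes the GC window applies to each probe arm, and it omits the melting-temperature check because the text does not state which Tm model is used.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def check_arms(arm_a: str, arm_b: str, gap_bp: int) -> list:
    """Return a list of human-readable problems; an empty list means the design passes."""
    problems = []
    for name, arm in (("A", arm_a), ("B", arm_b)):
        if not 14 <= len(arm) <= 35:
            problems.append(f"arm {name}: length {len(arm)} outside 14-35")
        gc = gc_fraction(arm)
        if not 0.35 <= gc <= 0.75:
            problems.append(f"arm {name}: GC fraction {gc:.2f} outside 0.35-0.75")
    if not 1 <= gap_bp <= 500:
        problems.append(f"gap of {gap_bp} bp outside 1-500")
    return problems
```

Returning a list of problems instead of a single boolean makes it easier to see why a candidate probe was rejected.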
Specifically, universal primer recognition sequence 1 and universal primer recognition sequence 2 are not complementary to any sequence of the target region, are each 14-30 bases long, and are partially or completely complementary to the forward primer and the reverse primer for PCR amplification, respectively.
Preferably, the universal primer recognition sequence 1 comprises the nucleotide sequence of 5'-CTTCAGCTTCCCGATTACGG-3' (SEQ ID NO.1), and the universal primer recognition sequence 2 comprises the nucleotide sequence of 5'-GCACGATCCGACGGTAGTGT-3' (SEQ ID NO. 2).
Specifically, the connecting sequence is an artificially synthesized random sequence or a sequence taken from a non-target sequence; it joins the half of the probe consisting of target sequence complementary region probe A, single-molecule marker sequence a, and universal primer recognition sequence 1 to the half consisting of universal primer recognition sequence 2, single-molecule marker sequence b, and target sequence complementary region probe B, forming a complete MIP. The connecting sequence is 0-20 bases long.
Specifically, each MIP comprises the nucleotide sequence (N1-8)CTTCAGCTTCCCGATTACGG(I0-20)GCACGATCCGACGGTAGTGT(N8-16) (SEQ ID NO.3), wherein (N1-8) is single-molecule marker sequence a, (I0-20) is the connecting sequence, and (N8-16) is single-molecule marker sequence b.
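As an illustration of this layout, the sketch below assembles a probe string in the stated order and counts how many distinct tag pairs two random tags can encode. The function names and the use of `N` as a placeholder for random bases are assumptions made for the example; only the two universal primer recognition sequences come from the text.

```python
UPRS1 = "CTTCAGCTTCCCGATTACGG"  # universal primer recognition sequence 1 (SEQ ID NO.1)
UPRS2 = "GCACGATCCGACGGTAGTGT"  # universal primer recognition sequence 2 (SEQ ID NO.2)


def build_mip(arm_a: str, arm_b: str, tag_a_len: int, tag_b_len: int, linker: str = "") -> str:
    """Concatenate the probe parts in order: A, tag a, UPRS1, linker, UPRS2, tag b, B."""
    if len(linker) > 20:
        raise ValueError("connecting sequence must be 0-20 bases")
    tag_a = "N" * tag_a_len  # degenerate positions, written as N
    tag_b = "N" * tag_b_len
    return arm_a + tag_a + UPRS1 + linker + UPRS2 + tag_b + arm_b


def tag_combinations(tag_a_len: int, tag_b_len: int) -> int:
    """Number of distinct (tag a, tag b) pairs with four possible bases per position."""
    return 4 ** (tag_a_len + tag_b_len)
```

For example, two 6-base tags give 4^12 = 16,777,216 possible pairs, which is the sense in which two short tags together provide a large labelling space.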
Specifically, the forward primer for PCR amplification comprises the nucleotide sequence 5'-AATGATACGGCGACCACACCGAGATCTACACXXXXXXXXACACTCTTTCCCTACACGACGCTCTTCCGATCTCGTAATCGGGAAGCTGAAG-3' (SEQ ID NO.4), in which XXXXXXXX is sample barcode sequence 1; the reverse primer for PCR amplification comprises the nucleotide sequence 5'-CAAGCAGAAGACGGCATACGAGATCXXXXXXXXGTGACTGGAGGTTCAGACGTGTGCTCTTCCGATCTGCACGATCCGACGGTAGTGT-3' (SEQ ID NO.5), in which XXXXXXXX is sample barcode sequence 2; sample barcode sequence 1 and sample barcode sequence 2 differ in nucleotide sequence.
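Two properties of these primers can be checked with a few lines of code: the 3' end of the forward primer is the reverse complement of universal primer recognition sequence 1 (so it can anneal to the circularized probe), and the run of X placeholders is where a sample barcode is substituted. The sketch below is illustrative; `revcomp` and `fill_barcode` are hypothetical helpers, and the short template used in the usage note is a toy string, not a primer from the text.

```python
import re

UPRS1 = "CTTCAGCTTCCCGATTACGG"       # universal primer recognition sequence 1
FWD_PRIMER_3P = "CGTAATCGGGAAGCTGAAG"  # last 19 bases of the forward primer as printed

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def fill_barcode(template: str, barcode: str) -> str:
    """Replace the single run of X placeholders in a primer template with a barcode."""
    match = re.search(r"X+", template)
    if match is None:
        raise ValueError("template has no X placeholder")
    if len(barcode) != match.end() - match.start():
        raise ValueError("barcode length does not match the placeholder")
    return template[:match.start()] + barcode + template[match.end():]


# The forward primer's 3' end pairs with recognition sequence 1 on the probe.
assert revcomp(UPRS1).endswith(FWD_PRIMER_3P)
```

With a toy template, `fill_barcode("AAXXXXXXXXTT", "ACGTACGT")` returns `"AAACGTACGTTT"`.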
Specifically, the target region template is DNA or RNA.
Further, the DNA ligase related to the present invention includes various DNA ligases such as T4 DNA ligase, e.
Further, the DNA polymerases of the present invention include various DNA polymerases, such as high-fidelity DNA polymerases including Pfu DNA polymerase, Phusion DNA polymerase, Q5 DNA polymerase, and the like.
The application of the single-molecule probe multi-target capture library consists in constructing the library by the above method and using it to detect tumor-related low-frequency variation and CpG island methylation.
Furthermore, the test sample may be a body fluid, tissue, whole blood, or formalin-fixed paraffin-embedded sample from a tumor patient, wherein tumor patients include individuals with any solid tumor, and solid tumors include lung cancer, breast cancer, gynecological tumors, digestive system tumors, urinary system tumors, nervous system tumors, head and neck tumors, and the like.
The invention has the beneficial effects that:
(1) the invention provides a method for constructing a single-molecule probe multi-target capture library whose capture region is flexible and can be wide or narrow: the connecting sequence is 0-20 bases long and single-molecule marker sequences a and b are each 4-12 bases long, which improves the flexibility and spanning ability of the whole MIP, so that target regions of 1-500 bp can be captured and thousands of different target regions can be captured in a single reaction;
(2) the requirement on DNA sample integrity is not strict: each probe arm binds at most about 35 bp of the target region, so the method is suitable for partially degraded DNA samples, in particular formalin-fixed paraffin-embedded samples;
(3) the invention has high specificity: complementary probes are designed on both the sense and antisense strands of the template, which increases coverage of the target region, and exonuclease degradation of linear probes and genomic DNA in the reaction system effectively reduces the influence of leftover probes on PCR amplification and the background interference in sequencing results;
(4) the invention is simple and fast to operate: with the capture-first, library-second approach no DNA fragmentation step is needed, library construction requires only PCR amplification, and the product can be loaded directly onto the sequencer without a separate library construction step, which greatly simplifies the procedure and shortens the operating time;
(5) the method requires little DNA and has low cost: target sequences captured by the single-molecule probe multi-target capture amplification technique can be sequenced directly after PCR amplification, and the procedure can be completed with only a small amount of DNA sample; in addition, although the total cost of MIP technology is relatively high, compared with other enrichment technologies it has good specificity, little background noise, high capture efficiency, and good sequencing data quality, which lowers the detection cost per sample;
(6) the invention reduces primer dimers: it retains the advantages of molecular inversion probes and, compared with linear probes, can exponentially reduce the cross-reactions and dimers caused by linear primer sequences;
(7) the invention provides a single-molecule probe multi-target capture library suitable for detecting tumor-related low-frequency variation and CpG island methylation.
Drawings
FIG. 1 is a structural diagram of a molecular inversion probe of the present invention;
FIG. 2 is a flow chart of the construction method of the single molecular probe multiple target capture library of the present invention;
FIG. 3 is a schematic diagram of amplification in Example 1 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method for constructing the single-molecule probe multi-target capture library comprises the following steps:
1) design of the Probe
Single targeted molecular inversion probes are designed according to the target regions in the genome, the target regions including tumor-related low-frequency variation target region sequences, methylation regions, and the like.
2) Composition of the Probe
The probes are a set of oligonucleotides 120-150 nt in length whose 5' ends must be phosphorylated. The structure of each probe is shown in FIG. 1 and comprises target sequence complementary regions A and B, universal primer recognition regions 1 and 2, single-molecule marker sequences a and b, and a connecting sequence.
3) Sources of DNA
The DNA may come from various sources depending on the purpose of the assay, including genomic DNA from whole blood, cfDNA from plasma, DNA from pleural effusion or ascites, DNA from fresh tumor tissue, or DNA from formalin-fixed paraffin-embedded (FFPE) tissue specimens.
4) Multiple target capture procedure
(a) For methylation detection the DNA template must first be treated with bisulfite before the downstream multiplex probe capture reaction. The hybridization capture requires 100-500 ng of DNA template, to which 300 fmol of probe mixture and 1 µL of 10X Chosenligase DNA enzyme reaction buffer are added. Hybridization is carried out on a thermal cycler: denaturation at 98 °C for 3 min, then 85 °C for 30 min, 60 °C for 60 min, and finally 56 °C for 120 min;
(b) after hybridization, 5 µL of Chosenmix1 mixture, comprising DNA polymerase, DNA ligase, dNTPs, NAD+, and buffer, is added, and the reaction is incubated on a thermal cycler at 56 °C for 60 min and then at 72 °C for 20 min. In this step DNA polymerase extends the 3' end of the hybridized probe using the target region DNA as a template; when the extension reaches the 5' end of the probe, DNA ligase uses the 5' phosphate group to close the ring covalently, forming a closed single-stranded circular DNA (see FIG. 2);
(c) after the reaction, the Chosenexion mixture is added; in this step linear probes and genomic DNA are removed by exonuclease digestion, with incubation at 37 °C for 60-90 min, and the exonuclease is then heat-inactivated;
(d) a Chosenmix2 mixture containing DNA polymerase and its buffer, dNTPs, universal primers a and b, and the like is added to the extended and circularized capture probe product, and PCR is then carried out. PCR conditions: 95 °C for 2 min; 12-15 cycles of 98 °C for 15 s, 65 °C for 45 s, and 72 °C for 45 s. The amplification scheme is shown in FIG. 3;
(e) the amplified product is purified with magnetic beads and washed twice with 70% or 80% ethanol. Each amplified sample carries a different sequencing barcode, so multiple samples can be sequenced together; amplification products from different samples can also be pooled before purification to obtain the target product. The products are quantified with Qubit, and the library fragment distribution is checked with a QIAxcel Advanced or Agilent 2100 Bioanalyzer.
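The thermal cycler programs in steps (a), (b), and (d) can be written down as data, which makes the hold times easy to total. This is only a bookkeeping sketch: the variable and function names are hypothetical, ramp times between steps are ignored, and the exonuclease inactivation step is left out because its temperature and duration are not given.

```python
# (temperature in degrees C, hold time in minutes), taken from steps (a) and (b)
HYBRIDIZATION = [(98, 3), (85, 30), (60, 60), (56, 120)]
EXTENSION_LIGATION = [(56, 60), (72, 20)]


def total_minutes(steps) -> int:
    """Sum of hold times for a list of (temperature, minutes) steps."""
    return sum(minutes for _temperature, minutes in steps)


def pcr_minutes(cycles: int) -> float:
    """Hold time of the PCR in step (d): 2 min, then cycles of 15 s + 45 s + 45 s."""
    if not 12 <= cycles <= 15:
        raise ValueError("the protocol uses 12-15 cycles")
    return 2 + cycles * (15 + 45 + 45) / 60
```

On these numbers the hybridization program holds for 213 min and the extension and ligation program for 80 min, so hybridization dominates the bench time.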
5) Sequencing of target capture products
The PCR product library is sequenced single-ended or paired-end by next-generation sequencing (NGS). With single-ended sequencing, a read consisting of universal primer recognition sequence 1, single-molecule marker sequence a, and the captured target sequence is sufficient for downstream data analysis.
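A read with this layout can be split into its parts by position. The sketch below is a simplified illustration with a hypothetical function name: it assumes the read starts with an exact copy of the primer-site sequence in the orientation given, followed by a tag of known length. Real reads may instead carry the reverse complement and sequencing errors, which would call for orientation handling and mismatch-tolerant matching.

```python
from typing import Optional, Tuple


def split_read(read: str, primer_site: str, tag_len: int) -> Optional[Tuple[str, str]]:
    """Split a read into (tag, captured sequence); return None if the layout does not fit."""
    if not read.startswith(primer_site):
        return None
    tag_start = len(primer_site)
    tag = read[tag_start:tag_start + tag_len]
    if len(tag) < tag_len:
        return None  # read too short to contain the whole tag
    return tag, read[tag_start + tag_len:]
```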
6) Quality control analysis of the raw sequencing data comprises the following steps:
A. Data splitting: the bcl files produced by the sequencing run are demultiplexed with bcl2fastq, yielding FASTQ files.
B. Data preprocessing: the FASTQ files are preprocessed by removing universal primers and adapter sequences from the sequencing reads, filtering low-quality reads, and the like.
C. Sequence alignment: (1) the quality-controlled reads are aligned to the reference genome (human hg19) with BWA, giving alignment results in SAM format; (2) the SAM alignments are converted to BAM format with samtools, then sorted and indexed, and the BAM file is preprocessed with samtools; (3) PCR duplicates are removed with the MarkDuplicates module of GATK4 to eliminate the effect of PCR duplication; (4) regions near indels are realigned with the Assembly Based Realigner to correct alignment errors caused by insertions and deletions; (5) base quality score recalibration (BQSR) is performed with BaseRecalibrator and ApplyBQSR from GATK4.
D. Quality metrics of the alignment results are computed with samtools and bedtools, including Q30 for each sample, the mean sequencing depth (Depth) of the target region, the fraction of the target covered at 100X or more (Target_Coverage_100X), and the like; each sample is quality-controlled separately, and bioinformatic analysis continues only if the data meet all requirements in the quality control table.
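The alignment part of this workflow can be sketched as a list of command lines. The example below only builds the commands and does not run them. The file names and the `build_commands` helper are hypothetical, the options shown are a minimal assumed set rather than the authors' settings, and the indel realignment step is omitted because its command line is not given in the text.

```python
def build_commands(sample: str, reference: str, fastq: str, known_sites: str) -> list:
    """Return argv lists for alignment, sorting, duplicate marking, and BQSR."""
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    recal_table = f"{sample}.recal.table"
    final_bam = f"{sample}.bqsr.bam"
    return [
        # Align reads to the reference and write SAM output.
        ["bwa", "mem", "-o", sam, reference, fastq],
        # Convert to coordinate-sorted BAM, then index it.
        ["samtools", "sort", "-o", sorted_bam, sam],
        ["samtools", "index", sorted_bam],
        # Duplicates are flagged by default; removing them needs an extra option.
        ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup_bam,
         "-M", f"{sample}.dup_metrics.txt"],
        # Base quality score recalibration in two passes.
        ["gatk", "BaseRecalibrator", "-I", dedup_bam, "-R", reference,
         "--known-sites", known_sites, "-O", recal_table],
        ["gatk", "ApplyBQSR", "-I", dedup_bam, "-R", reference,
         "--bqsr-recal-file", recal_table, "-O", final_bam],
    ]
```

Each list could be passed to `subprocess.run(cmd, check=True)` in order; keeping command construction separate from execution makes the pipeline easy to inspect before anything is run.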
7) Low frequency variation analysis
(1) Mutation sites are detected in all target regions with Mutect2 from GATK4; (2) CollectSequencingArtifactMetrics from GATK4 is used to score Pre-adapter errors (introduced by laboratory operations) and Bait-bias errors (related to sequencing target selection and strand specificity); the detail and summary metrics of Pre-adapter and Bait-bias are output, and the sample vcf file is filtered with the FilterByOrientationBias module according to these metrics; (3) the mutation sites are then filtered with VariantFiltration according to mutation quality and strand bias thresholds; (4) finally, a custom script splits sample.MuTect2.raw.filter.oxo.filter.vcf so that each line of the vcf file corresponds to one mutation, giving the file sample.MuTect2.raw.filter.oxo.split.vcf. Owing to the molecular tag design, mutations with a frequency as low as 0.1% can be detected. The low-frequency mutation sites detected in the reference standard HD799 are shown in the following table.
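[Table provided as image BDA0002688545270000121 in the original: low-frequency mutation sites detected in reference standard HD799]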
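The text does not spell out how the molecular tags are used in variant calling. One common way to exploit such tags, shown here purely as an assumed illustration, is to group observations that share a probe and tag into a family, take the majority base within each family, and compute the variant fraction over families rather than raw reads. All names below are hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_tag(observations):
    """Collapse (probe_id, tag, base) observations to one majority base per (probe_id, tag)."""
    families = defaultdict(Counter)
    for probe_id, tag, base in observations:
        families[(probe_id, tag)][base] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in families.items()}


def variant_fraction(consensus, ref_base: str) -> float:
    """Fraction of tag families whose consensus base differs from the reference base."""
    if not consensus:
        return 0.0
    alt = sum(1 for base in consensus.values() if base != ref_base)
    return alt / len(consensus)
```

Collapsing by family means that an error seen in only a minority of one family's reads does not count as a variant, which is one route by which tags can lower background noise.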
8) Methylation analysis
The corrected reads are aligned to the bisulfite-converted human genome (hg19) with the Bismark methylation alignment software. Uniquely aligned reads (SAM files) are checked and counted per unique molecular tag for each targeted site with a unique probe capture sequence. The uniquely aligned reads (SAM files) are finally run through the Bismark methylation extractor to determine the methylation status of the sample. The methylation ratio of CpG sites across the sample is the number of methylated C in CpG divided by the sum of methylated C in CpG and unmethylated C in CpG.
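The ratio defined in the last sentence is straightforward to compute from the two counts. The function name below is hypothetical, and returning `None` when no CpG cytosines were observed is a design choice made for this sketch.

```python
from typing import Optional


def cpg_methylation_ratio(methylated: int, unmethylated: int) -> Optional[float]:
    """Methylated CpG cytosines divided by all CpG cytosines observed."""
    if methylated < 0 or unmethylated < 0:
        raise ValueError("counts must be non-negative")
    total = methylated + unmethylated
    if total == 0:
        return None  # no coverage, so the ratio is undefined
    return methylated / total
```

For example, 1 methylated and 3 unmethylated CpG cytosines give a ratio of 0.25.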
Finally, the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all of them should be covered in the claims of the present invention.
Sequence listing
<110> Beijing Qiuzhen Medical Laboratory Co., Ltd.
<120> Construction method and application of single-molecule probe multi-target capture library
<160> 6
<170> SIPOSequenceListing 1.0
<210> 1
<211> 20
<212> DNA
<213> universal primer recognition sequence 1 (artificial sequence)
<400> 1
cttcagcttc ccgattacgg 20
<210> 2
<211> 20
<212> DNA
<213> universal primer recognition sequence 2 (artificial sequence)
<400> 2
gcacgatccg acggtagtgt 20
<210> 3
<211> 20
<212> DNA
<213> universal primer recognition sequence 1 in MIP (artificial sequence)
<400> 3
cttcagcttc ccgattacgg 20
<210> 4
<211> 20
<212> DNA
<213> universal primer recognition sequence 2 in MIP (artificial sequence)
<400> 4
gcacgatccg acggtagtgt 20
<210> 5
<211> 91
<212> DNA
<213> sequence of forward primer for PCR amplification (artificial sequence)
<400> 5
aatgatacgg cgaccacacc gagatctaca cnnnnnnnna cactctttcc ctacacgacg 60
ctcttccgat ctcgtaatcg ggaagctgaa g 91
<210> 6
<211> 88
<212> DNA
<213> sequence of reverse primer for PCR amplification (artificial sequence)
<400> 6
caagcagaag acggcatacg agatcnnnnn nnngtgactg gaggttcaga cgtgtgctct 60
tccgatctgc acgatccgac ggtagtgt 88

Claims (9)

1. A method for constructing a single-molecule probe multi-target capture library, characterized by comprising the following steps:
(1) designing and synthesizing molecular inversion probes (MIPs) according to the target region sequence, wherein each MIP comprises, in order: a target sequence complementary region probe A, a single-molecule marker sequence a, a universal primer recognition sequence 1, a connecting sequence, a universal primer recognition sequence 2, a single-molecule marker sequence b, and a target sequence complementary region probe B; the universal primer recognition sequence 1 and the universal primer recognition sequence 2 are not complementary to any sequence of the target region, are each 14-30 base pairs in length, and are partially or completely complementary to the forward primer and the reverse primer for PCR amplification, respectively;
(2) for the target region to be captured, the target sequence complementary region probe A and the target sequence complementary region probe B bind specifically and complementarily to the target region template, forming a circular structure containing a gap of one to several hundred nucleotides;
(3) adding the four free nucleotides dATP, dTTP, dGTP and dCTP, filling the gap under the action of DNA polymerase using the target sequence as the template, and using DNA ligase to catalyze formation of a 3',5'-phosphodiester bond between the two ends of the probe to form a complete circularized probe; degrading unreacted probes and linear genomic DNA in the reaction system with exonuclease so that only circular probes are retained; and directly generating the sequencing library of the target region by PCR amplification, the PCR amplification product being the sequencing library; wherein the forward primer for PCR amplification comprises an Illumina sequencing adapter and a forward universal primer, and the reverse primer for PCR amplification comprises an Illumina sequencing adapter and a reverse universal primer.
2. The method for constructing the single-molecule probe multi-target capture library according to claim 1, wherein the target sequence complementary region probe A and the target sequence complementary region probe B in each MIP are sequences specific to the target region and complementary to the template target sequence region; the distance between the target sequence complementary region probe A and the target sequence complementary region probe B is 1-500 bp; the target sequence complementary region probe A and the target sequence complementary region probe B are each 14-35 base pairs in length; the single-molecule marker sequence a and the single-molecule marker sequence b in each MIP differ in sequence and are each 4-12 base pairs in length.
3. The method for constructing the single-molecule probe multi-target capture library according to claim 2, wherein the distance between the target sequence complementary region probe A and the target sequence complementary region probe B is 90-200 bp, the melting temperature is 55-65 ℃, and the GC content is 35-75%.
4. The method for constructing the single-molecule probe multi-target capture library according to claim 1, wherein the universal primer recognition sequence 1 comprises a nucleotide sequence of 5'-CTTCAGCTTCCCGATTACGG-3' (SEQ ID NO.1), and the universal primer recognition sequence 2 comprises a nucleotide sequence of 5'-GCACGATCCGACGGTAGTGT-3' (SEQ ID NO. 2).
5. The method for constructing the single-molecule probe multi-target capture library according to claim 1, wherein the connecting sequence is an artificially synthesized random sequence or a sequence taken from a non-target sequence, and joins the segment "target sequence complementary region probe A - single-molecule marker sequence a - universal primer recognition sequence 1" to the segment "universal primer recognition sequence 2 - single-molecule marker sequence b - target sequence complementary region probe B" to form a complete MIP; the length of the connecting sequence is 0-20 base pairs.
6. The method for constructing the single-molecule probe multi-target capture library according to claim 1, wherein each MIP comprises the nucleotide sequence (N1-8) CTTCAGCTTCCCGATTACGG (I0-20) GCACGATCCGACGGTAGTGT (N8-16) (SEQ ID NO.3), wherein (N1-8) is the single-molecule marker sequence a, (I0-20) is the connecting sequence, and (N8-16) is the single-molecule marker sequence b.
7. The method for constructing the single-molecule probe multi-target capture library according to claim 1, wherein the forward primer for PCR amplification comprises the nucleotide sequence 5'-AATGATACGGCGACCACACCGAGATCTACACXXXXXXXXACACTCTTTCCCTACACGACGCTCTTCCGATCTCGTAATCGGGAAGCTGAAG-3' (SEQ ID NO.4), in which XXXXXXXX is the sample barcode sequence 1; the reverse primer for PCR amplification comprises the nucleotide sequence 5'-CAAGCAGAAGACGGCATACGAGATCXXXXXXXXGTGACTGGAGGTTCAGACGTGTGCTCTTCCGATCTGCACGATCCGACGGTAGTGT-3' (SEQ ID NO.5), in which XXXXXXXX is the sample barcode sequence 2; the sample barcode sequence 1 and the sample barcode sequence 2 differ in nucleotide sequence.
8. The method for constructing a single-molecule probe multi-target capture library according to claim 1, wherein the target region template is DNA or RNA.
9. Use of a single-molecule probe multi-target capture library, characterized in that the library is constructed by the method of any one of claims 1 to 8 and is suitable for detecting tumor-related low-frequency variants and CpG island methylation.
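The probe layout in claim 1 and the numeric ranges in claims 2 and 5 can be illustrated with a minimal Python sketch. It assembles one MIP in the claimed order and reports which ranges a design violates. The class `MipDesign`, its field names, and the example arm and marker sequences are hypothetical and chosen only for illustration; only the two universal primer recognition sequences are taken from claim 4.

```python
from dataclasses import dataclass
from typing import List

# Universal primer recognition sequences as given in claim 4.
UNIVERSAL_1 = "CTTCAGCTTCCCGATTACGG"
UNIVERSAL_2 = "GCACGATCCGACGGTAGTGT"


@dataclass
class MipDesign:
    """One molecular inversion probe (hypothetical container, not from the patent)."""
    arm_a: str   # target sequence complementary region probe A
    umi_a: str   # single-molecule marker sequence a
    linker: str  # connecting sequence (may be empty)
    umi_b: str   # single-molecule marker sequence b
    arm_b: str   # target sequence complementary region probe B
    gap_bp: int  # distance between the two arms on the target

    def sequence(self) -> str:
        """Concatenate the parts in the order listed in claim 1, step (1)."""
        return (self.arm_a + self.umi_a + UNIVERSAL_1 + self.linker
                + UNIVERSAL_2 + self.umi_b + self.arm_b)

    def problems(self) -> List[str]:
        """Return a description of every claimed range this design violates."""
        found = []
        for name, arm in (("arm A", self.arm_a), ("arm B", self.arm_b)):
            if not 14 <= len(arm) <= 35:
                found.append(f"{name} length {len(arm)} outside 14-35")
        for name, umi in (("marker a", self.umi_a), ("marker b", self.umi_b)):
            if not 4 <= len(umi) <= 12:
                found.append(f"{name} length {len(umi)} outside 4-12")
        if self.umi_a == self.umi_b:
            found.append("marker a and marker b are identical")
        if len(self.linker) > 20:
            found.append(f"connecting sequence length {len(self.linker)} exceeds 20")
        if not 1 <= self.gap_bp <= 500:
            found.append(f"gap {self.gap_bp} bp outside 1-500")
        return found


# Hypothetical example values; the arms are not real target sequences.
probe = MipDesign(
    arm_a="ACGTTGCAAGGCTTAACCGT", umi_a="ACGTAC", linker="",
    umi_b="TTGCAA", arm_b="GGATCCTTAGCAGTCAAGCT", gap_bp=120,
)
print(probe.sequence())
print(probe.problems())
```

The melting temperature and GC content ranges of claim 3 are left out of this sketch because the claim does not state which part of the probe they apply to.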
CN202010983954.4A 2020-09-18 2020-09-18 Construction method and application of single-molecule probe multi-target capture library Pending CN112063690A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010983954.4A CN112063690A (en) 2020-09-18 2020-09-18 Construction method and application of single-molecule probe multi-target capture library

Publications (1)

Publication Number Publication Date
CN112063690A (en) 2020-12-11

Family

ID=73681171

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010983954.4A Pending CN112063690A (en) 2020-09-18 2020-09-18 Construction method and application of single-molecule probe multi-target capture library

Country Status (1)

Country Link
CN (1) CN112063690A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112458085A (en) * 2020-12-10 2021-03-09 Beijing Qiuzhen Medical Laboratory Co Ltd Novel molecular capture optimization probe and library construction method thereof

Citations (6)

Publication number Priority date Publication date Assignee Title
WO2013173774A2 (en) * 2012-05-18 2013-11-21 Pathogenica, Inc. Molecular inversion probes
US20140357497A1 (en) * 2011-04-27 2014-12-04 Kun Zhang Designing padlock probes for targeted genomic sequencing
CN105714383A (en) * 2014-12-22 2016-06-29 BGI Shenzhen (深圳华大基因研究院) Sequencing library building method and reagent based on molecular inverse probe
CN108350500A (en) * 2015-07-29 2018-07-31 Progenity, Inc. (普罗格尼迪公司) Nucleic acid for detecting chromosome abnormality and method
CN108396057A (en) * 2018-02-28 2018-08-14 Chongqing Cancer Institute (重庆市肿瘤研究所) Nucleic acid target capture sequencing library preparation method based on long-chain molecule inversion probes
CN109252224A (en) * 2017-07-14 2019-01-22 BGI Genomics Co., Ltd. (深圳华大基因股份有限公司) A kind of cycling probe and the sequencing library construction method based on cycling probe capture

Non-Patent Citations (2)

Title
M. NIEDZICKA et al.: "Molecular Inversion Probes for targeted resequencing in non-model", SCIENTIFIC REPORTS *
LUO Zhimei et al.: "Research progress and applications of molecular inversion probe technology", Biotechnology Bulletin (生物技术通报) *

Similar Documents

Publication Publication Date Title
US10947589B2 (en) Varietal counting of nucleic acids for obtaining genomic copy number information
US20220042090A1 (en) PROGRAMMABLE RNA-TEMPLATED SEQUENCING BY LIGATION (rSBL)
CN105934523B (en) Multiplex detection of nucleic acids
JP6803327B2 (en) Digital measurements from targeted sequencing
EP2619329B1 (en) Direct capture, amplification and sequencing of target dna using immobilized primers
CN105039313B (en) For the high throughput identification of polymorphism and the strategy of detection
EP1713936B1 (en) Genetic analysis by sequence-specific sorting
EP3555305B1 (en) Method for increasing throughput of single molecule sequencing by concatenating short dna fragments
CN110628880B (en) Method for detecting gene variation by synchronously using messenger RNA and genome DNA template
CN115927563A (en) Compositions and methods for analyzing modified nucleotides
CN109593757B (en) Probe and method for enriching target region by using same and applicable to high-throughput sequencing
JPWO2006085616A1 (en) Nucleic acid sequence amplification method
KR20160096633A (en) Nucleic acid probe and method of detecting genomic fragments
JP7051677B2 (en) High Molecular Weight DNA Sample Tracking Tag for Next Generation Sequencing
CN109536579A (en) The construction method of single-stranded sequencing library and its application
CN110959045B (en) Improved methods and kits for generating large-scale parallel sequenced DNA libraries
CN110546272A (en) Method of attaching adapters to sample nucleic acids
CN116445581A (en) Preparation method of oligodendroglioma related gene high-throughput amplicon library, multiple PCR primer pair and application
CN115011672A (en) Ultralow frequency gene mutation detection method
CN108359723B (en) Method for reducing deep sequencing errors
CN112063690A (en) Construction method and application of single-molecule probe multi-target capture library
CN112301430B (en) Library building method and application
WO2019002366A1 (en) Modular nucleic acid adapters
CN108166067A (en) A kind of Novel DNA banking process and its application
TWI771847B (en) Method of amplifying and determining target nucleotide sequence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201211