CN106119356B

CN106119356B - Method for preparing a molecular tag

Info

Publication number: CN106119356B
Application number: CN201610496676.3A
Authority: CN
Inventors: 王海龙; 胡妮; 徐健; 唐元华
Original assignee: Hefei First Gene Technology Co Ltd; Suzhou For First Time Gene Technology LLC; First Biotechnology (suzhou) Co Ltd
Current assignee: Hefei First Gene Technology Co Ltd; Suzhou For First Time Gene Technology LLC; First Biotechnology (suzhou) Co Ltd
Priority date: 2016-06-30
Filing date: 2016-06-30
Publication date: 2019-09-24
Anticipated expiration: 2036-06-30
Also published as: CN106119356A

Abstract

The present invention relates to a method for preparing a molecular tag, comprising the following steps: introducing a barcode sequence at the 5' adapter end of a template DNA molecule; extending the 3' adapter end with a DNA polymerase so that it is complementary to the barcode sequence, giving a filled-in sequence; adding a DNA polymerase to the filled-in sequence and introducing a thymine at its 3' adapter end to obtain a modified adapter; fragmenting the DNA to be tested into short DNA fragments and ligating the short DNA fragments to the modified adapter with a DNA ligase to obtain a DNA ligation product; amplifying the DNA ligation product by polymerase chain reaction to obtain an amplification product; sequencing the amplification product, aligning the reads to the human reference genome, and selecting, among the multiple copies of a DNA template, the DNA sequence with the lowest base error rate as the molecular tag. By varying the barcode length, the molecular tag technique removes the fixed sequence, reduces the waste of sequencing data, improves the reaction rate, and effectively distinguishes errors introduced during DNA amplification from true DNA variation.

Description

Method for preparing a molecular tag

Technical field

The present invention relates to the field of DNA variant detection, and in particular to a method for preparing a molecular tag.

Background

Compared with traditional detection methods, liquid biopsy has few side effects, is simple to perform, and allows repeated sampling, and it is currently one of the most sought-after techniques in precision oncology. Its main detection targets include circulating tumor cells (CTC), circulating tumor DNA (ctDNA), and tumor exosomes, which can indicate tumor progression, drug resistance, and similar information, and can guide individualized drug therapy.

ctDNA is extracellular free single-stranded or double-stranded DNA present in body fluids such as plasma, serum, and cerebrospinal fluid. It derives mainly from necrotic or apoptotic tumor cells, from exosomes secreted by tumor cells, and from circulating tumor cells. ctDNA fragments are typically 160-180 bp long and have a half-life of about 2 hours in blood. They carry mutation information such as point mutations, insertions and deletions, copy number changes, and fusion genes, and circulate continuously in the human bloodstream, so they can reflect a tumor patient's current status in real time. However, in the early stage of most solid tumors and in the middle and late stages of some solid tumors, the ctDNA content of plasma is extremely low. The fraction of usable data obtained by high-throughput sequencing (NGS) is therefore low and the experimental requirements are high, which limits the amount of sample available per experiment and the number of experiments. In addition, ctDNA samples contain a large amount of genomic DNA, so the background noise of high-throughput sequencing is very high. With conventional library construction and sequencing methods the tumor signal is completely submerged in the background noise, and the accuracy of ctDNA detection can only be improved by increasing sequencing depth. This raises the sequencing cost, and amplification errors introduced by PCR during the experiment remain unavoidable.

Because polymerase amplification errors arise in PCR (polymerase chain reaction) during library construction, cluster generation, and sequencing by synthesis, the per-base error rate of NGS is 5 × 10^-4 to 10^-2. Variants present at frequencies below this range therefore cannot be distinguished from experimental error. Researchers have reduced the error rate of NGS detection through computational algorithms and molecular experimental techniques. CAPP-seq (cancer personalized profiling by deep sequencing) uses a statistical model to distinguish real variants, but it is limited to known variant sites. A technique known as UID (unique identifier), also called molecular tagging, is currently used for NGS error correction. Before PCR amplification, it attaches an identifiable unique sequence code (barcode) to every molecule. The unique code makes it possible to tell which sequences were produced by PCR amplification and which are original molecules in the sample, which raises the fraction of usable sequencing data. The same code can also be used to distinguish real variants from experimental errors and so improve sequencing accuracy, which is especially significant for low-frequency mutations in ctDNA samples.

Two main UID approaches are currently in use.

The first approach introduces a double-stranded barcode sequence into the double-stranded DNA template. The most typical example is duplex sequencing, which works as follows: (1) two single-stranded adapter sequences are synthesized, one of which carries a 12-base Duplex Tag sequence and a 4-base fixed sequence GTCA, and the two adapter sequences are annealed to each other; (2) the complementary bases are synthesized by extension with a DNA polymerase; (3) an A base is added to the 3' end of the duplex by a DNA polymerase. During library construction, each DNA fragment molecule is tagged with two different tag sequences (denoted α and β), and PCR amplification yields two kinds of PCR product, the αβ family and the βα family. Only variant sites that are present in all reads of both the αβ family and the βα family, that is, on both strands of the double-stranded DNA fragment, are regarded as real variant sites. An improvement to duplex sequencing followed (Nature Protocols, 2014, 9, 2586-2606), in which the Y-shaped adapter carrying the barcode sequence was changed from a 3'-end A base to a 3'-end T base, improving the ligation efficiency between DNA fragments and adapters during library construction.

The second approach adds tags to the amplification primers to identify each DNA sequence, and then performs high-throughput sequencing of the amplicons.

As the technology has continued to improve, the iDES method (integrated digital error suppression) has been combined with CAPP-seq to detect ctDNA (Nature Biotechnology, 2016), with a detection limit of 0.0025%. This technique combines the two molecular tagging approaches above: insert sequence codes are introduced by the duplex sequencing approach and index sequence codes are introduced by PCR, which further improves detection accuracy.

Duplex sequencing uses a 12 bp random sequence and a 4 bp or 5 bp fixed sequence, so the first 16-17 bp of each DNA sequencing read are unusable, which lowers the fraction of usable data. The fixed sequence CAGT does not help to distinguish DNA molecules and reduces sequencing quality. The method also accepts as real variant sites only those present in all reads of both the αβ family and the βα family, that is, on both strands of the double-stranded DNA fragment. The improved version of duplex sequencing introduces the T base at the 3' end by restriction enzyme digestion, with a reaction time of 16 h; the operating cycle is long and the digestion efficiency is low.

In view of the above shortcomings, the inventors have actively pursued research and innovation in order to create a novel method for preparing a molecular tag with greater industrial utility.

Summary of the invention

To solve the above technical problems, the object of the present invention is to provide a method for preparing a molecular tag. By varying the barcode length, the method removes the fixed sequence, and the barcode is not limited to two fixed sequences, which reduces the waste of sequencing data. Thymine (T) is introduced at the adapter end using a DNA polymerase, which improves the reaction rate. The added DNA sequence is used to identify the multiple copies generated from the same DNA template during DNA amplification, so that errors introduced during amplification are effectively distinguished from true DNA variation and the bias of DNA amplification is assessed more accurately.

The method for preparing a molecular tag according to the invention comprises the following steps:

(1) introducing a barcode sequence at the 5' adapter end of a template DNA molecule;

(2) extending the 3' adapter end with a DNA polymerase so that it is complementary to the barcode sequence, giving a filled-in sequence;

(3) adding a DNA polymerase to the filled-in sequence obtained in step (2) and introducing a thymine at the 3' adapter end of the filled-in sequence to obtain a modified adapter;

(4) fragmenting the DNA to be tested into short DNA fragments and ligating the short DNA fragments to the modified adapter with a DNA ligase to obtain a DNA ligation product;

(5) amplifying the DNA ligation product by polymerase chain reaction to obtain an amplification product;

(6) sequencing the amplification product, aligning the reads to the human reference genome, and selecting, among the multiple copies of a DNA template, the DNA sequence with the lowest base error rate as the molecular tag.

Further, in step (1), the length of the barcode sequence is 2-20 bp.

Further, in step (1), the barcode sequence may be a known or an unknown sequence.

Further, in step (2), the DNA polymerase is Taq DNA polymerase or Klenow enzyme.

Further, in step (3), the DNA polymerase is Taq DNA polymerase or Klenow enzyme.

Further, in step (4), the length of the short DNA fragments is 150-300 bp.

Further, in step (4), the DNA ligase is T4 DNA ligase or E. coli DNA ligase.

Further, in step (6), the added DNA sequence is used to identify the multiple copies generated from the same DNA template during PCR amplification.

Further, in step (6), the amplification product is subjected to paired-end or single-end sequencing.

Further, in step (6), when the DNA sequences are aligned to the reference genome, sequences whose DNA fragments have the same start and end positions on the reference genome are placed in one group. The barcode sequences at both ends of the DNA sequences within a group are then compared, and DNA sequences with identical barcode sequences are considered to derive from the same DNA template. The multiple similar or identical sequences deriving from the same DNA template are merged into one sequence, or only one of them is retained.

Further, in step (6), for the sequences of the multiple DNA copies generated from the same DNA template, the probability of a base error at each position is calculated and the DNA sequence with the lowest base error rate is selected as the molecular tag.

Further, the molecular tag is used for genetic variant detection.
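The grouping rule in step (6) can be sketched in a few lines. This is an illustrative sketch only, not the inventors' software: the `Read` record and its field names (`chrom`, `start`, `end`, `barcode5`, `barcode3`) are hypothetical, and the sketch assumes the alignment coordinates and the barcodes at both ends have already been extracted from each aligned read.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical record for one aligned read (field names are assumptions)."""
    chrom: str     # reference sequence name
    start: int     # start position on the reference genome
    end: int       # end position on the reference genome
    barcode5: str  # barcode read at one end of the fragment
    barcode3: str  # barcode read at the other end
    seq: str       # insert sequence


def group_by_template(reads):
    """Group reads that share start, end and both barcodes.

    Reads in the same group are treated as PCR copies of one DNA template.
    """
    families = defaultdict(list)
    for read in reads:
        key = (read.chrom, read.start, read.end, read.barcode5, read.barcode3)
        families[key].append(read)
    return dict(families)
```

Each value in the returned dictionary is one family of copies, which can then be merged into a single sequence or reduced to one representative.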

With the above scheme, the invention has the following advantages:

The barcode sequence is a random sequence with a minimum length of 2 bp, which reduces the waste of sequencing data, and the barcode sequence is not limited to a fixed sequence. A T base is introduced at the adapter end to form a sticky end, which improves the reaction rate. Random sequence tags are used to identify the sequencing reads that derive from the same DNA template, and a variant supported jointly by 2 or more DNA templates is regarded as a true variant. The multiple copies generated from the same DNA template during DNA amplification can be identified effectively, so that errors introduced during amplification are distinguished from true DNA variation and the bias of DNA amplification is assessed more accurately.
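As a rough illustration of how barcode length trades off against tag diversity, the number of distinct random barcodes of length n is 4^n. The short sketch below computes this for a few lengths in the 2-20 bp range stated above and enumerates the shortest case; the function names are illustrative only.

```python
from itertools import product

BASES = "ATGC"


def barcode_space(length: int) -> int:
    """Number of distinct random barcodes of the given length (4 bases per position)."""
    return 4 ** length


def all_barcodes(length: int):
    """Enumerate every barcode of the given length (practical only for short barcodes)."""
    return ["".join(p) for p in product(BASES, repeat=length)]


for n in (2, 6, 20):
    print(n, barcode_space(n))
```

A 2 bp barcode gives only 16 distinct tags, so separating templates with such a short barcode relies on also using the start and end positions of each fragment, as in the grouping step of the method.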

The above is only an overview of the technical scheme of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the specification, preferred embodiments of the invention are described in detail below with reference to the drawings.

Brief description of the drawings

Fig. 1 is a schematic diagram of the adapter modification process of the present invention;

Fig. 2 is a schematic diagram of the process of aligning DNA to the human reference genome in the present invention.

Detailed description of the embodiments

Specific embodiments of the present invention are described in further detail below with reference to the drawings and examples. The following embodiments illustrate the present invention and are not intended to limit its scope.

Embodiment 1

1. Synthesis and annealing of the adapter

(1) Take one tube each of sequencing adapter DNA with the sequences 5'-TACACTCTTTCCCTACACGACGCTCTTCCGATCT-3' and 5'-TCTTCTACAGTCANNNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC-3', where NNNNNN is a 6 bp random barcode and N denotes any one of the bases A, T, G, or C. The second tube of adapter DNA therefore contains at most 4^6 (4096) different sequences, which differ only in the bases at the N positions. Centrifuge at 20000 g for 10 min at 4 °C to make sure the adapter powder collects at the bottom of the tube, and label the tubes adapter-1 and adapter-2 respectively.

(2) Open the tube caps carefully, taking care that the adapter powder is not blown out of the tube. Add nuclease-free water to adapter-1 and adapter-2 to dissolve the powder to a concentration of 100 nmol/mL, i.e. 100 μM. Store at -20 °C, or at -80 °C if the solution will not be used for a long time.

(3) Take 25 μL of adapter-1 (100 μM) and 25 μL of adapter-2 (100 μM) and mix thoroughly to obtain 50 μL of stock solution at a concentration of 50 μM. Anneal by gradient cooling on a PCR instrument (ABI 9700); the annealing program is shown in Table 1.

Table 1. Annealing program

(4) Store the annealed stock solution at -20 °C and dilute it to the working concentration as needed when it is used.

2. Quality control of the annealed adapter

Take annealed stock solution, dilute it with water to 1 μM (1:49 dilution in water), and run it on an Agilent 2100; the peak is at 90-100 bp.
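The concentrations quoted for the stock and for the quality-control dilution follow from simple volume arithmetic. The sketch below reproduces them; the function names are illustrative, and the calculation assumes ideal mixing with additive volumes.

```python
def mixed_concentration(conc_um: float, vol_ul: float, total_ul: float) -> float:
    """Concentration (μM) of one component after mixing into a total volume."""
    return conc_um * vol_ul / total_ul


def diluted_concentration(stock_um: float, parts_stock: float, parts_water: float) -> float:
    """Concentration (μM) after diluting parts_stock of stock with parts_water of water."""
    return stock_um * parts_stock / (parts_stock + parts_water)


# 25 μL of each 100 μM adapter in a 50 μL mix: each strand is at 50 μM.
stock = mixed_concentration(100, 25, 50)

# A 1:49 dilution of the 50 μM stock gives the 1 μM quality-control sample.
working = diluted_concentration(stock, 1, 49)

print(stock, working)
```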

3. Adapter fill-in

The adapter is filled in using TAKARA PrimeSTAR HS DNA Polymerase (R010Q). The 50 μL of adapter is divided between two tubes, and a 50 μL PCR reaction is prepared in each; the components of each tube are listed in Table 2. The PCR reaction mixture can be prepared at room temperature, with each component kept on ice while the mixture is prepared. After preparation, incubate at 68 °C for 30 min.

Table 2. Reaction system in each tube

Component	Volume (μL)	Final concentration
5× PrimeSTAR Buffer (Mg2+ Plus)	10	1×
dNTP Mixture (2.5 mM each)	4	200 μM each
Template (adapter)	25	~82.5 ng
PrimeSTAR HS DNA Polymerase (2.5 U/μL)	0.5	1.25 U/50 μL
Sterile purified water	10.5
Total	50
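As a quick consistency check on Table 2, the volumes should sum to 50 μL and the final concentrations should follow from stock concentration × volume / total volume. The sketch below performs that check; the dictionary layout and variable names are illustrative only.

```python
# Volumes (μL) per tube, copied from Table 2.
TABLE_2 = {
    "5x PrimeSTAR Buffer (Mg2+ Plus)": 10.0,
    "dNTP Mixture (2.5 mM each)": 4.0,
    "Template (adapter)": 25.0,
    "PrimeSTAR HS DNA Polymerase (2.5 U/uL)": 0.5,
    "Sterile purified water": 10.5,
}


def final_concentration(stock: float, volume_ul: float, total_ul: float) -> float:
    """Final concentration in the same unit as the stock concentration."""
    return stock * volume_ul / total_ul


total_ul = sum(TABLE_2.values())

# 5x buffer, 10 μL in 50 μL -> 1x
buffer_x = final_concentration(5, TABLE_2["5x PrimeSTAR Buffer (Mg2+ Plus)"], total_ul)

# 2.5 mM (2500 μM) of each dNTP, 4 μL in 50 μL -> 200 μM each
dntp_um = final_concentration(2500, TABLE_2["dNTP Mixture (2.5 mM each)"], total_ul)

# 2.5 U/μL enzyme, 0.5 μL -> 1.25 U per 50 μL reaction
enzyme_units = 2.5 * TABLE_2["PrimeSTAR HS DNA Polymerase (2.5 U/uL)"]

print(total_ul, buffer_x, dntp_um, enzyme_units)
```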

4. Purification after adapter fill-in

The filled-in adapter is then purified by ethanol precipitation, as follows:

(1) Add 30 μL of sodium acetate and 250 μL of absolute ethanol, mix, and leave at -20 °C overnight;

(2) Centrifuge at 4 °C, 16000 rcf, for 30 min;

(3) Discard the supernatant, add 750 μL of chilled 80% ethanol, and centrifuge at 4 °C, 16000 rcf, for 5 min;

(4) Discard the supernatant, then spin briefly, stopping once 1000 rcf is reached;

(5) Remove the remaining liquid, open the cap, and air-dry at room temperature for 5 min;

(6) Add 30 μL of nuclease-free water, dissolve the pellet thoroughly, mix, and spin briefly, stopping once 1000 rcf is reached.

5. Adding "T" to the adapter

TAKARA Ex Taq DNA Polymerase (RR001Q) and TAKARA dTTP (4029Q) are used in a 50 μL reaction; the components are listed in Table 3. Incubate the reaction at 72 °C for 2 h.

Table 3. Adapter "T"-addition reaction system

Component	Volume (μL)	Final concentration
10× Ex Taq Buffer (Mg2+ Plus)	5	1×
dTTP (20 μmol, 100 mM)	1	2 mM
Template (adapter)	31	~82.5 ng
TaKaRa Ex Taq (5 U/μL)	1	5 U
Sterile purified water	12
Total	50

6. Purification after adding "T" to the adapter

The adapter carrying the introduced thymine is purified by ethanol precipitation, as follows:

(1) Add 30 μL of sodium acetate and 125 μL of absolute ethanol, mix, and leave at -80 °C for 2 h;

(2) Centrifuge at 4 °C, 16000 rcf, for 30 min;

(3) Discard the supernatant, add 750 μL of chilled 80% ethanol, and centrifuge at 4 °C, 16000 rcf, for 5 min;

(4) Discard the supernatant, then spin briefly, stopping once 1000 rcf is reached;

(5) Remove the remaining liquid, open the cap, and air-dry at room temperature for 5 min;

(6) Add 30 μL of nuclease-free water, dissolve the pellet, mix, and spin briefly, stopping once 1000 rcf is reached; store at -20 °C.

Once the adapter has been prepared, the DNA library is built by the routine Illumina library construction procedure: the DNA to be tested is fragmented with a DNA shearing instrument into short DNA fragments 200 bp long; the short DNA fragments are ligated to the modified adapter with T4 DNA ligase to obtain a DNA ligation product; the DNA ligation product is amplified by PCR to obtain an amplification product; and the amplification product is subjected to paired-end sequencing on an Illumina HiSeq 4000 sequencer. The DNA sequences obtained by sequencing are then aligned to the human reference genome hg19 with the bwa alignment software, as follows:

As shown in Fig. 2, in a tagged sequence "~" represents the DNA sequence amplified from the sample, and the "-" at each end of "~" represents the tag sequence (barcode sequence) introduced by the adapter; the "-" at the two ends may be the same or different. After alignment, the DNA sequences are divided into two groups, A and B, according to their positions on the hg19 reference genome, and lowercase letters denote the names of the barcode sequences. When the DNA sequences are aligned to the reference genome, sequences whose DNA fragments have the same start and end positions on the reference genome are placed in one group. In Fig. 2, for example, sequences A-a to A-d and sequences B-j to B-l each comprise several sequences that are multiple copies of the same DNA template. The multiple similar or identical sequences are merged into one sequence, or only one of them is retained, forming new DNA data for analysis. If the sequences of the multiple copies of the same DNA template differ, the base at each differing position is determined by calculating the joint error probability and taking the base type with the lowest error rate; the resulting processed sequence serves as the molecular tag. For sequences B-j to B-l in the figure, for example, the processed sequence may be identical to one of the original sequences, or it may differ from all of them.
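The text does not spell out how the joint error probability is computed. One possible reading, sketched here purely as an assumption, is to take the per-base error probabilities from Phred quality scores and, at each position, multiply the error probabilities of all copies that report the same base; the base whose supporting reads are least likely to be jointly wrong is kept. The function names and the input layout (sequence, list of qualities) are hypothetical.

```python
def phred_to_error(q: int) -> float:
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def consensus(copies):
    """Build one sequence from copies of the same template.

    copies: list of (sequence, qualities) pairs of equal length.
    Assumption: the joint error probability of a base at a position is the
    product of the error probabilities of all copies reporting that base.
    """
    if not copies:
        raise ValueError("at least one copy is required")
    length = len(copies[0][0])
    result = []
    for i in range(length):
        joint_error = {}
        for seq, quals in copies:
            base = seq[i]
            joint_error[base] = joint_error.get(base, 1.0) * phred_to_error(quals[i])
        # Keep the base with the lowest joint error probability.
        result.append(min(joint_error, key=joint_error.get))
    return "".join(result)


# Three copies, each with one low-quality mismatch at a different position.
family = [
    ("TCGT", [10, 30, 30, 30]),
    ("ACTT", [30, 30, 10, 30]),
    ("ACGA", [30, 30, 30, 10]),
]
print(consensus(family))
```

In this made-up family the processed sequence is ACGT, which matches none of the three input copies, illustrating the remark that the processed sequence may differ from every original sequence.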

The molecular tag prepared according to this patent is used to identify the DNA copies generated in the PCR step of DNA or cDNA library preparation and improves the accuracy of genetic variant detection; a variant supported jointly by 2 or more DNA templates is a true variant.
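The final rule, that a variant counts as true only when 2 or more DNA templates support it, can be sketched as a simple filter. The input format, a list of (template identifier, variant) pairs taken from the merged sequences, is an assumption made for illustration.

```python
from collections import defaultdict


def true_variants(calls, min_templates: int = 2):
    """Return variants supported by at least min_templates distinct templates.

    calls: iterable of (template_id, variant) pairs, where each template_id
    identifies one family of PCR copies (hypothetical input format).
    """
    support = defaultdict(set)
    for template_id, variant in calls:
        support[variant].add(template_id)
    return {variant for variant, templates in support.items()
            if len(templates) >= min_templates}
```

Counting distinct templates rather than reads means that many PCR copies of one erroneous molecule still count only once.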

Embodiment 2

1. Synthesis and annealing of the adapter

(1) Take one tube each of sequencing adapter DNA with the sequences 5'-TACACTCTTTCCCTACACGACGCTCTTCCGATCT-3' and 5'-TCTTCTACAGTCANNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC-3', where NN is a 2 bp random barcode and N denotes any one of the bases A, T, G, or C. Centrifuge at 20000 g for 10 min at 4 °C to make sure the adapter powder collects at the bottom of the tube, and label the tubes adapter-1 and adapter-2 respectively.

Table 1. Annealing program

3. Adapter fill-in

Table 2. Reaction system in each tube

4. Purification after adapter fill-in

(2) Centrifuge at 4 °C, 16000 rcf, for 30 min;

(4) Discard the supernatant, then spin briefly, stopping once 1000 rcf is reached;

(5) Remove the remaining liquid, open the cap, and air-dry at room temperature for 5 min;

5. Adding "T" to the adapter

Table 3. Adapter "T"-addition reaction system

6. Purification after adding "T" to the adapter

(2) Centrifuge at 4 °C, 16000 rcf, for 30 min;

(4) Discard the supernatant, then spin briefly, stopping once 1000 rcf is reached;

(5) Remove the remaining liquid, open the cap, and air-dry at room temperature for 5 min;

The above is only a preferred embodiment of the present invention and is not intended to limit it. It should be noted that a person of ordinary skill in the art may make improvements and modifications without departing from the technical principles of the present invention, and such improvements and modifications should also be regarded as falling within the scope of protection of the present invention.

Claims

1. A method for preparing a molecular tag, characterized by comprising the following steps:

(1) mixing adapter-1 and adapter-2 thoroughly and annealing them by gradient cooling;

the sequence of the adapter-1 being 5'-TACACTCTTTCCCTACACGACGCTCTTCCGATCT-3';

the sequence of the adapter-2 being 5'-TCTTCTACAGTCANNNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC-3', wherein NNNNNN is a 6 bp random barcode and N denotes any one of the bases A, T, G, or C; or

the sequence of the adapter-2 being 5'-TCTTCTACAGTCANNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC-3', wherein NN is a 2 bp random barcode and N denotes any one of the bases A, T, G, or C;

(2) extending the 3' adapter end with Taq DNA polymerase so that it is complementary to the barcode sequence, giving a filled-in sequence;

(3) adding Taq DNA polymerase to the filled-in sequence obtained in step (2) and introducing a thymine at the 3' adapter end of the filled-in sequence to obtain a modified adapter;

(4) fragmenting the DNA to be tested into short DNA fragments and ligating the short DNA fragments to the modified adapter with a DNA ligase to obtain a DNA ligation product;

(5) amplifying the DNA ligation product by polymerase chain reaction to obtain an amplification product;

(6) sequencing the amplification product, aligning the reads to the human reference genome, and selecting, among the multiple copies of a DNA template, the DNA sequence with the lowest base error rate.

2. The method for preparing a molecular tag according to claim 1, characterized in that: in step (4), the length of the short DNA fragments is 150-300 bp.

3. The method for preparing a molecular tag according to claim 1, characterized in that: in step (4), the DNA ligase is T4 DNA ligase or E. coli DNA ligase.

4. The method for preparing a molecular tag according to claim 1, characterized in that: in step (6), the amplification product is subjected to paired-end or single-end sequencing.