CN105925671A - Method for enriching nucleic acids with target sequence from nucleic acid sample - Google Patents

Method for enriching nucleic acids with target sequence from nucleic acid sample

Publication number
CN105925671A
Authority
CN
China
Prior art keywords
nucleic acid
bait sequences
sequence
target
bait
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610250133.3A
Other languages
Chinese (zh)
Other versions
CN105925671B (en)
Inventor
蔡万世
王瑞超
屈武斌
杭兴宜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aiji Taikang (Jiaxing) Biotechnology Co., Ltd.
Original Assignee
Igenetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Igenetech Co Ltd
Priority to CN201610250133.3A (CN105925671B)
Publication of CN105925671A
Priority to PCT/CN2016/106595 (WO2017181670A1)
Priority to AU2016102398A (AU2016102398A4)
Priority to AU2016403554A (AU2016403554A1)
Application granted
Publication of CN105925671B
Current legal status: Active
Anticipated expiration

Classifications

    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 - Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay

Abstract

The invention provides a method for enriching nucleic acids with a target sequence from a nucleic acid sample, wherein the method comprises the following steps: providing a nucleic acid sample which contains the target nucleic acid sequence, and a bait sequence which is consistent with the target nucleic acid sequence or specific to the target sequence; conducting in vitro transcription using the bait sequence as a template so as to prepare a nucleic acid analogue, wherein the nucleic acid analogue has a binding moiety; fragmenting the nucleic acid sample; hybridizing the nucleic acid analogue with the nucleic acid sample, so that a nucleic acid analogue/DNA hybrid complex is formed by the nucleic acid analogue and the nucleic acid with the target sequence; and, by means of the binding moiety, separating the nucleic acid analogue/DNA hybrid complex from non-specifically hybridized nucleic acids, so that nucleic acids with non-target sequences are removed. According to a preferred embodiment, the method further comprises a step of amplifying the nucleic acid analogue/DNA hybrid complex, so that the nucleic acids with the target sequence are enriched.

Description

A method for enriching nucleic acids with a target sequence from a nucleic acid sample
Technical field
The present invention relates to the capture, enrichment and analysis of nucleic acid sequences. More particularly, the present invention relates to a target sequence enrichment method based on liquid-phase capture.
Background art
Whole-genome sequencing can detect mutations, insertions, deletions and structural variations across the entire genome. However, because the genome is large, sequencing at 30× coverage produces close to 100G of data. Sequencing low-frequency mutations, such as those associated with tumors, requires at least 1000× coverage; if whole-genome sequencing were used, it would produce up to 3000G of data. Data on this scale not only makes analysis very difficult but also makes sequencing very costly. Target region capture technology emerged in response.
Target region capture technology refers to the directed capture of nucleic acid sequences from target regions by specific technical means, followed by library construction and sequencing, so that the target regions are sequenced deeply while the sequencing cost is greatly reduced. PCR is commonly used to enrich target regions; more commonly, multiplex PCR is used to capture multiple target regions at once. Multiplex PCR is better suited to capturing hot-spot regions or short target regions; for longer target regions, for example those longer than 100 kb, multiplex PCR is no longer suitable in terms of either cost or technical complexity.
Therefore, there is a need in the art for a new method suitable for capturing longer target regions.
Summary of the invention
To solve the above problems, the present invention provides a target sequence enrichment method based on liquid-phase capture.
In a first aspect, the present invention provides a method for enriching nucleic acids with a target sequence from a nucleic acid sample, the method comprising:
a) providing a nucleic acid sample comprising a target nucleic acid sequence, and a bait sequence that is consistent with the target nucleic acid sequence or specific to the target sequence;
b) performing in vitro transcription using the bait sequence as a template to prepare a nucleic acid analog, the nucleic acid analog carrying a binding moiety;
c) fragmenting the nucleic acid sample, preferably preparing a library;
d) hybridizing the nucleic acid analog with the nucleic acid sample, so that the nucleic acid analog and the nucleic acid with the target sequence form a nucleic acid analog/DNA hybrid complex;
e) by means of the binding moiety, separating the nucleic acid analog/DNA hybrid complex from non-specifically hybridized nucleic acids, thereby removing nucleic acids with non-target sequences.
In one embodiment, in the library preparation of step c), adapter sequences are ligated to both ends of the fragments of the nucleic acid sample, and the method further comprises, following step e), a step f) of amplifying the nucleic acid analog/DNA hybrid complex according to the adapter sequences, thereby enriching the nucleic acids with the target sequence.
In one embodiment, the bait sequence has characteristics selected from the following: i) it forms no hairpin structure by itself and no dimers with other bait sequences; ii) its copy number is compensated according to the G/C content and/or spatial structure of the target nucleic acid sequence; iii) when the target region is a region of extremely high or extremely low G/C content, or a low-complexity region, the bait is designed using the regions flanking the target region as substitute regions, with the same design method as for the target region; iv) it does not bind specifically to sequences in the nucleic acid sample other than the target nucleic acid sequence.
In one embodiment, the copy number of the bait sequence is also compensated according to the degree of interest in the target nucleic acid sequence.
In one embodiment, the nucleic acid sample is genomic DNA, RNA, cDNA or mRNA; where the nucleic acid sample is RNA or mRNA, there is a step of reverse-transcribing the RNA or mRNA into DNA before step c).
In one embodiment, the bait sequences are on a solid support, for example on a microarray slide.
In one embodiment, the solid support may also be a plurality of beads or a microarray.
In one embodiment, some or all of the nucleic acid analogs carry a binding moiety.
In one embodiment, in step b) the in vitro transcription is carried out using the nucleic acid analogs GNA, LNA, PNA, TNA or morpholino nucleic acid to prepare the nucleic acid analog; preferably, the nucleic acid analog carries a binding moiety.
In one embodiment, wherein said bound fraction is biotin binding species.
In one embodiment, the copy number of the bait sequence is compensated according to the G/C content of the target sequence: the lower or the higher the G/C content, the more the copy number of the bait sequence corresponding to the target sequence is increased.
In one embodiment, compensating the copy number according to the G/C content of the target nucleic acid sequence means: taking the bait sequence copy number coefficient at a G/C content of 50% as the baseline of 1, for every 1% by which the G/C content deviates from 50% within the range 10%-90%, the bait sequence copy number coefficient increases by 0.08-0.12.
In a specific embodiment, the bait sequence copy number is compensated as follows: target sequences are divided into six tiers by G/C content: tier 1: 10%-30%; tier 2: 30%-40%; tier 3: 40%-60%; tier 4: 60%-70%; tier 5: 70%-90%; tier 6: below 10% or above 90%. The copy number of tier 3 bait sequences is the baseline; the copy number of tier 2 and tier 4 bait sequences is higher than that of tier 3, for example 2.2-2.8 times that of tier 3; the copy number of tier 1 and tier 5 bait sequences is higher still, for example 3-4 times that of tier 3. For tier 6, where the G/C content is below 10% or above 90%, and where the target region is a low-complexity sequence, the bait sequence is designed as follows: probes are designed using the regions flanking the target region as substitute regions, generally choosing regions within 300 bp on either side of the target region, preferably within 150 bp.
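The six-tier scheme can be sketched as a small lookup. This is only an illustrative sketch: the function name `gc_tier`, the table `TIER_MULTIPLIER`, and the way shared boundary values (30%, 40%, 60%, 70%) are assigned to a tier are assumptions, since the text gives overlapping range endpoints without saying which tier owns them.

```python
def gc_tier(gc_percent: float, low_complexity: bool = False) -> int:
    """Return the tier (1-6) for a target region's G/C content in percent.

    Boundary handling is an assumption: 30 goes to tier 2, 40 and 60 to
    tier 3, 70 to tier 4, and 10 and 90 stay inside tiers 1 and 5.
    """
    # Tier 6: extreme G/C content, or a low-complexity target region.
    if low_complexity or gc_percent < 10 or gc_percent > 90:
        return 6
    if gc_percent < 30:
        return 1
    if gc_percent < 40:
        return 2
    if gc_percent <= 60:
        return 3
    if gc_percent <= 70:
        return 4
    return 5


# Example copy-number multipliers relative to tier 3, as (low, high) ranges
# taken from the text. Tier 6 has no multiplier: baits are designed on the
# flanking regions instead.
TIER_MULTIPLIER = {
    1: (3.0, 4.0),
    2: (2.2, 2.8),
    3: (1.0, 1.0),
    4: (2.2, 2.8),
    5: (3.0, 4.0),
}

if __name__ == "__main__":
    for gc in (5, 20, 35, 50, 68, 80, 95):
        tier = gc_tier(gc)
        print(gc, tier, TIER_MULTIPLIER.get(tier, "design on flanking regions"))
```

A caller would typically compute the G/C percentage of each target region first and then use the tier to decide how many copies of the corresponding bait to order or synthesize.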
In one embodiment, the bait sequence is 60-150 bp long, preferably 80-120 bp.
In one embodiment, "consistent with the target nucleic acid sequence or specific to the target sequence" means that the thermodynamic stability of the bait sequence bound to a non-target region is significantly lower than its thermodynamic stability bound to the target region, preferably $T_{m,\text{target}} - T_{m,\text{non-specific}} \ge 5$ °C, more preferably $T_{m,\text{target}} - T_{m,\text{non-specific}} \ge 10$ °C; preferably, the Tm values are calculated by the nearest-neighbor method based on the SantaLucia 2007 thermodynamic parameter table.
In one embodiment, "no dimers" means that any dimer formed between any two bait sequences has $T_m \le 47$ °C, preferably $T_m \le 37$ °C; preferably, the Tm values are calculated by the nearest-neighbor method based on the SantaLucia 2007 thermodynamic parameter table.
In one embodiment, "no hairpin structure" means that any hairpin structure formed by a bait sequence with itself has $T_m \le 47$ °C, preferably $T_m \le 37$ °C; preferably, the Tm values are calculated by the nearest-neighbor method based on the SantaLucia 2007 thermodynamic parameter table.
In one embodiment, for each target region, the bait sequences are the one or more bait sequences with the best composite score for specificity, dimers, hairpin structure and position relative to the target region. The composite score is given by the following scoring function: $S = a \times S_{\text{specificity}} + b \times S_{\text{dimer}} + c \times S_{\text{hairpin}} + d \times S_{\text{distance}}$, where a = 0.26-0.34, b = 0.08-0.12, c = 0.17-0.23 and d = 0.35-0.45. The individual scores are calculated as follows:
Calculation of $S_{\text{specificity}}$: each newly designed bait sequence is aligned against the genome, and for each aligned sequence the Tm between the bait sequence and the aligned sequence is calculated. The difference between the Tm of the bait sequence with the target region and its Tm with any aligned sequence is ≥ 5 °C, preferably ≥ 10 °C. The mean Tm between the bait sequence and all aligned sequences is calculated, and $S_{\text{specificity}} = 1 - T_{m,\text{mean}} / (T_{m,\text{target}} - 5)$, preferably $S_{\text{specificity}} = 1 - T_{m,\text{mean}} / (T_{m,\text{target}} - 10)$, where $T_{m,\text{mean}}$ is the mean Tm of the bait sequence over all non-specific region alignment results and $T_{m,\text{target}}$ is the Tm of the bait sequence with the target region;
Calculation of $S_{\text{dimer}}$: each newly designed bait sequence is analyzed by dimer alignment against every bait sequence already designed, and for each aligned sequence the Tm between the bait sequence and the aligned bait sequence is calculated, this Tm being below 47 °C. The mean Tm between the bait sequence and all aligned bait sequences is calculated, and $S_{\text{dimer}} = (47 - T_{m,\text{mean}}) / 47$; preferably the Tm is below 37 °C, and $S_{\text{dimer}} = (37 - T_{m,\text{mean}}) / 37$;
Calculation of $S_{\text{hairpin}}$: for each bait sequence, its optimal self-alignment structure is computed and the Tm of that structure is calculated, this Tm being below 47 °C, and $S_{\text{hairpin}} = (47 - T_m) / 47$; preferably the Tm is below 37 °C, and $S_{\text{hairpin}} = (37 - T_m) / 37$;
Calculation of $S_{\text{distance}}$: given the target region coordinates, for each newly designed bait sequence the difference $\delta_{\text{distance}}$ between its coordinates and the target region coordinates is calculated; $\delta_{\text{distance}}$ is less than 150, and $S_{\text{distance}} = (150 - \delta_{\text{distance}}) / 150$.
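The composite score described above can be illustrated with a short sketch. Everything named here is hypothetical: the function names, the default weights (the midpoints of the stated ranges, which happen to sum to 1.0), and the choice to score a bait with no off-target or dimer alignments as 1.0 are assumptions, not part of the text. The Tm inputs are assumed to have been computed elsewhere.

```python
from statistics import mean


def s_specificity(tm_target, off_target_tms, margin=5.0):
    """1 - mean(off-target Tm) / (Tm_target - margin); margin is 5 or 10."""
    if not off_target_tms:
        return 1.0  # assumption: no off-target alignments means fully specific
    return 1.0 - mean(off_target_tms) / (tm_target - margin)


def s_dimer(dimer_tms, limit=47.0):
    """(limit - mean dimer Tm) / limit; limit is 47 or 37."""
    if not dimer_tms:
        return 1.0  # assumption: no dimer alignments means the best score
    return (limit - mean(dimer_tms)) / limit


def s_hairpin(hairpin_tm, limit=47.0):
    """(limit - hairpin Tm) / limit; limit is 47 or 37."""
    return (limit - hairpin_tm) / limit


def s_distance(delta, window=150.0):
    """(window - coordinate difference) / window."""
    return (window - delta) / window


def composite_score(spec, dimer, hairpin, dist, a=0.30, b=0.10, c=0.20, d=0.40):
    """Weighted sum S = a*S_specificity + b*S_dimer + c*S_hairpin + d*S_distance."""
    return a * spec + b * dimer + c * hairpin + d * dist


if __name__ == "__main__":
    # Made-up Tm values in degrees Celsius, for illustration only.
    score = composite_score(
        s_specificity(75.0, [35.0, 45.0]),
        s_dimer([20.0, 27.0]),
        s_hairpin(23.5),
        s_distance(30),
    )
    print(round(score, 3))
```

With several candidate baits per target region, the candidates with the highest composite score would be retained.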
In a second aspect, the present invention also provides specific bait sequences for carrying out the method of the invention, the specific bait sequences being the bait sequences referred to in the first aspect of the invention.
In one embodiment, the specific bait sequence is consistent with the target nucleic acid sequence or specific to the target sequence, and: i) it forms no hairpin structure by itself and no dimers with other bait sequences; ii) its copy number is compensated according to the G/C content and/or spatial structure of the target nucleic acid sequence; iii) when the target region is a region of extremely high or extremely low G/C content, or a low-complexity region, probes are designed using the regions flanking the target region as substitute regions, with the same design method as for the target region; iv) it does not bind specifically to sequences in the nucleic acid sample other than the target nucleic acid sequence.
In one embodiment, the copy number of the bait sequence is also compensated according to the degree of interest in the target nucleic acid sequence.
In a third aspect, the present invention also provides a kit comprising the bait sequences of the second aspect of the invention; the kit further includes, but is not limited to, double-stranded adapter molecules and a plurality of different oligonucleotide probes.
In one embodiment, the kit comprises compositions and reagents for carrying out the method of the first aspect of the invention. The kit includes, but is not limited to, double-stranded adapter molecules, a plurality of different oligonucleotide probes, and bait sequences that are consistent with the target nucleic acid sequence or specific to the target sequence, wherein the bait sequences: i) form no hairpin structure by themselves and no dimers with one another; ii) have a copy number compensated according to the G/C content, spatial structure and/or degree of interest of the target nucleic acid sequence; iii) when the target region is a region of extremely high or extremely low G/C content, or a low-complexity region, are designed as probes using the regions flanking the target region as substitute regions, with the same design method as for the target region; iv) do not bind specifically to sequences in the nucleic acid sample other than the target nucleic acid sequence. In certain embodiments, the kit comprises two different double-stranded adapter molecules. The kit may further comprise one or more other components selected from DNA polymerase, T4 polynucleotide kinase, T4 DNA ligase, hybridization solution, wash solution and/or elution solution. In certain embodiments, the kit comprises a magnet. In certain embodiments, the kit comprises one or more enzymes and the corresponding reagents, buffers and the like, for example a restriction enzyme such as MlyI, and buffers/reagents for carrying out restriction digestion with MlyI.
Detailed description of the invention
The present invention provides a target sequence enrichment method based on liquid-phase capture, the method comprising: designing bait sequences; synthesizing the bait sequence nucleic acids (by custom primer synthesis or solid-phase synthesis); preparing nucleic acid analogs by in vitro transcription, the nucleic acid analogs comprising a binding moiety; pre-treating the nucleic acid sample (by a library preparation method), where the sample may be genomic DNA, RNA, cDNA, mRNA or the like; forming nucleic acid analog/DNA hybrid complexes between the nucleic acid analogs and the nucleic acids with the target sequence by complementary base pairing; washing away nucleic acid analog/DNA hybrids with low complementarity, thereby removing nucleic acids with non-target sequences; and specifically amplifying the complementary nucleic acid analog/DNA hybrids according to the adapter sequences added during sample pre-treatment, thereby enriching the nucleic acids with the target sequence.
In the present invention, the term "sample" is used in its broadest sense and is intended to include a sample or culture obtained from any source, preferably from a biological source. Biological samples can be obtained from animals (including humans) and include liquids, solids, tissues and gases. Biological samples include blood products, such as plasma, serum and the like. Accordingly, a "nucleic acid sample" comprises nucleic acids from any source (e.g., DNA, RNA, cDNA, mRNA, tRNA, miRNA, etc.). Where the nucleic acid sample is RNA or mRNA, there is a step of reverse-transcribing the RNA or mRNA into DNA before step c). In this application, the nucleic acid sample preferably originates from a biological source, such as human or non-human cells, tissues and the like. The term "non-human" means all non-human animals and entities, including but not limited to vertebrates such as rodents, non-human primates, sheep, cattle, ruminants, lagomorphs, pigs, goats, horses, dogs, cats, birds and the like. Non-human also includes invertebrates and prokaryotes, such as bacteria, plants, yeast, viruses and the like. Therefore, the nucleic acid samples used in the methods and systems of the invention are nucleic acid samples derived from any organism, whether eukaryotic or prokaryotic.
In the present invention, the inventors found that the G/C content of the target sequence strongly affects the capture efficiency of liquid-phase target sequence capture. To capture multiple target sequences effectively, the bait sequence copy number is preferably compensated according to the G/C content of the target sequence: the lower or the higher the G/C content, the more the copy number of the bait sequence corresponding to the target sequence is increased.
The inventors found that target sequences with a G/C content of about 50%, for example 50% ± 10%, achieve good capture efficiency; for target sequences with other G/C contents, the bait sequence copy number must be compensated to achieve good capture efficiency. Through comprehensive testing on human genomic sequences, the inventors found that, to achieve better capture efficiency, taking the bait sequence copy number coefficient at a G/C content of 50% as the baseline of 1, for every 1% by which the G/C content deviates from 50% within the range 10%-90%, the bait sequence copy number coefficient increases by 0.08-0.12. For example, at a G/C content of 68% the deviation is 18%, and the bait sequence copy number coefficient is 2.44-3.16.
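The linear compensation rule and the 68% worked example can be checked with a minimal sketch. The function names and the default step of 0.10 (the midpoint of the stated 0.08-0.12 range) are assumptions made for illustration.

```python
def copy_number_coefficient(gc_percent: float, step: float = 0.10) -> float:
    """Coefficient = 1 + |GC% - 50| * step, defined for 10% <= GC <= 90%.

    `step` is the per-percent increment; the text gives 0.08-0.12.
    """
    if not 10 <= gc_percent <= 90:
        raise ValueError("outside 10%-90%: design baits on flanking regions instead")
    if not 0.08 <= step <= 0.12:
        raise ValueError("step outside the 0.08-0.12 range given in the text")
    return 1.0 + abs(gc_percent - 50.0) * step


def coefficient_range(gc_percent: float) -> tuple:
    """Return the (low, high) coefficient using steps of 0.08 and 0.12."""
    return (
        copy_number_coefficient(gc_percent, 0.08),
        copy_number_coefficient(gc_percent, 0.12),
    )


if __name__ == "__main__":
    low, high = coefficient_range(68)
    print(round(low, 2), round(high, 2))  # prints: 2.44 3.16
```

The same coefficient applies symmetrically below 50%, so a 32% G/C target would receive the same 2.44-3.16 range as the 68% example.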
When the G/C content is below 10% or above 90%, or the target is a low-complexity sequence, the corresponding bait sequence design method is as follows: probes are designed using the regions flanking the target region as substitute regions, generally choosing regions within 300 bp on either side of the target region, preferably within 150 bp.
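As a rough illustration of choosing substitute regions, the sketch below computes the two flanking intervals for a target. The function name, the 0-based half-open coordinate convention, and the optional `chrom_length` clipping are all assumptions; the text only specifies the 300 bp and 150 bp limits.

```python
def flanking_regions(start, end, flank=150, chrom_length=None):
    """Return (left, right) half-open intervals flanking the target [start, end).

    `flank` is the maximum distance from the target edge: 150 bp (preferred)
    or 300 bp. Intervals are clipped to the chromosome when its length is given.
    """
    left = (max(0, start - flank), start)
    right_end = end + flank
    if chrom_length is not None:
        right_end = min(chrom_length, right_end)
    right = (end, right_end)
    return left, right


if __name__ == "__main__":
    print(flanking_regions(1000, 1200))             # preferred 150 bp window
    print(flanking_regions(1000, 1200, flank=300))  # wider 300 bp window
```

Baits would then be designed inside these intervals with the same rules used for ordinary target regions.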
In the present invention, a low-complexity region refers to a region composed of only a few kinds of elements (such as oligonucleotides), for example a simple repeat sequence such as a microsatellite.
In the present invention, a library is preferably constructed from the fragmented sample DNA.
In one embodiment, the compensation of the bait sequence copy number can be expressed simply as follows: target sequences are divided into six tiers by G/C content: tier 1: 10%-30%; tier 2: 30%-40%; tier 3: 40%-60%; tier 4: 60%-70%; tier 5: 70%-90%; tier 6: below 10% or above 90%. The copy number of tier 3 bait sequences is the baseline; the copy number of the bait sequences corresponding to tier 2 and tier 4 needs to be increased, for example to 2.2-2.8 times that of tier 3; the copy number of tier 1 and tier 5 bait sequences needs to be increased further, for example to 3-4 times that of tier 3. In one embodiment, for tier 6, where the G/C content is below 10% or above 90%, or where the sequence is low-complexity, the bait sequence is designed as follows: probes are designed using the regions flanking the target region as substitute regions, generally choosing regions within 300 bp on either side of the target region, preferably within 150 bp.
In one embodiment, for each target region, the bait sequences are the one or more bait sequences with the best composite score for specificity, dimers, hairpin structure and position relative to the target region. The composite score is given by the following scoring function: $S = a \times S_{\text{specificity}} + b \times S_{\text{dimer}} + c \times S_{\text{hairpin}} + d \times S_{\text{distance}}$, where a = 0.26-0.34, b = 0.08-0.12, c = 0.17-0.23 and d = 0.35-0.45. Each individual score, such as $S_{\text{specificity}}$, is a value between 0 and 1. The individual scores are calculated as follows:
Scoring rule for $S_{\text{specificity}}$: each newly designed bait sequence is aligned against the genome using the BLAT software with default parameters, and the thermodynamic Tm is calculated for each alignment result. If for any result the difference between the target region Tm and the non-specific region Tm is below 5 °C, preferably below 10 °C, the bait sequence is discarded and redesigned; otherwise the mean Tm of all non-specific region alignment results is calculated, and finally $S_{\text{specificity}} = 1 - T_{m,\text{mean}} / (T_{m,\text{target}} - 5)$, preferably $S_{\text{specificity}} = 1 - T_{m,\text{mean}} / (T_{m,\text{target}} - 10)$, where $T_{m,\text{mean}}$ is the mean Tm of the bait sequence over all non-specific region alignment results and $T_{m,\text{target}}$ is the Tm of the bait sequence with the target region;
Scoring rule for $S_{\text{dimer}}$: each newly designed bait sequence is analyzed by dimer alignment against every bait sequence already designed, using the BLAT software with default parameters, and the thermodynamic Tm is calculated for each alignment result. If any $T_m \ge 47$ °C, the bait sequence is discarded and redesigned; otherwise the mean Tm of all alignment results is calculated, and finally $S_{\text{dimer}} = (47 - T_{m,\text{mean}}) / 47$. Preferably, if any $T_m \ge 37$ °C, the bait sequence is discarded and redesigned; otherwise the mean Tm of all alignment results is calculated, and $S_{\text{dimer}} = (37 - T_{m,\text{mean}}) / 37$;
Scoring rule for $S_{\text{hairpin}}$: for each bait sequence, the Smith-Waterman algorithm is used to compute its optimal self-alignment structure, and the thermodynamic Tm is calculated from this structure. If $T_m \ge 47$ °C, the bait sequence is discarded and redesigned; otherwise $S_{\text{hairpin}} = (47 - T_m) / 47$. Preferably, if $T_m \ge 37$ °C, the bait sequence is discarded and redesigned; otherwise $S_{\text{hairpin}} = (37 - T_m) / 37$;
Scoring rule for $S_{\text{distance}}$: given the coordinates of the target region to be designed, for each bait sequence the difference $\delta_{\text{distance}}$ between its coordinates and the target region coordinates is calculated. The acceptable difference is set to 150, which is an empirical value. If the difference is greater than 150, the bait sequence is discarded and redesigned; otherwise $S_{\text{distance}} = (150 - \delta_{\text{distance}}) / 150$. If no suitable bait sequence can be designed within a coordinate difference of 150 from the target region, the acceptable difference may instead be set to 300, giving $S_{\text{distance}} = (300 - \delta_{\text{distance}}) / 300$.
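The discard rules and the 150-to-300 fallback for the distance window can be sketched as follows. The function names and data shapes are hypothetical; the Tm values are assumed to come from an external alignment and thermodynamics step (BLAT and a Tm calculator in the text), which this sketch does not perform.

```python
def passes_filters(tm_target, off_target_tms, dimer_tms, hairpin_tm,
                   margin=5.0, limit=47.0):
    """Return True if a candidate bait survives the discard rules.

    margin: required Tm gap to every off-target alignment (5, preferably 10).
    limit:  dimer/hairpin Tm at or above which a bait is discarded (47, preferably 37).
    """
    if any(tm_target - tm < margin for tm in off_target_tms):
        return False  # not specific enough
    if any(tm >= limit for tm in dimer_tms):
        return False  # forms a stable dimer with an existing bait
    if hairpin_tm >= limit:
        return False  # forms a stable hairpin
    return True


def distance_scores(deltas, windows=(150, 300)):
    """Score candidates by coordinate difference, widening the window if needed.

    Returns (window_used, {candidate_index: score}); (None, {}) if no
    candidate lies within the widest window.
    """
    for window in windows:
        kept = {i: (window - d) / window
                for i, d in enumerate(deltas) if d <= window}
        if kept:
            return window, kept
    return None, {}


if __name__ == "__main__":
    print(passes_filters(75.0, [35.0, 45.0], [20.0, 27.0], 23.5))
    print(distance_scores([200, 30]))
    print(distance_scores([200, 250]))
```

Using the stricter preferred thresholds only requires passing `margin=10.0` and `limit=37.0`.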
In the present invention, the calculation of a sequence's Tm is not limited to a specific method; Tm values calculated by various methods can all be used in the invention. The choice of method does not fundamentally change the effect of the invention; only the degree of the effect may differ. Although the nearest-neighbor method based on the SantaLucia 2007 thermodynamic parameter table can be used to calculate Tm, the Tm values calculated by other methods correspond to it, and a person skilled in the art can compare the Tm values calculated by different methods through simple experiments and thereby choose appropriately among them.
In the inventors' experience, for the coding regions of the human genome, bait sequences suitable for the invention can be designed for more than 99% of target regions, which shows that the above tiering of G/C regions and the filtering by Tm are both reasonable.
In certain embodiments, hybridization between the nucleic acid analog and the target nucleic acid is preferably carried out under stringent conditions sufficient to support hybridization between the nucleic acid analog and DNA, wherein the nucleic acid analog comprises a linking compound and a region complementary to the target nucleic acid sample, so as to provide the nucleic acid analog/DNA hybrid complex. The complex is subsequently captured via the linking compound and washed under conditions sufficient to remove non-specifically bound nucleic acids, and the hybridized target nucleic acid sequence is then eluted from the captured nucleic acid analog/DNA complex.
In certain embodiments, the nucleic acid analog comprises a chemical group or linking compound, for example a binding moiety such as biotin or digoxigenin, which can bind to a solid support. The solid support may comprise a corresponding capture compound, for example streptavidin for biotin or an anti-digoxigenin antibody for digoxigenin. The invention is not limited to the linking compounds used, and alternative linking compounds are equally applicable to the methods, bait sequences and kits of the invention.
In the present invention, the chemical group or linking compound, for example a binding moiety such as biotin or digoxigenin, can be attached to any base of the nucleic acid analog (glycerol nucleic acid GNA, locked nucleic acid LNA, peptide nucleic acid PNA, threose nucleic acid TNA or morpholino nucleic acid). Preferably, the nucleic acid analog chain may include ribose and/or deoxyribose, and the chemical group or linking compound can be attached to a base on the ribose and/or deoxyribose. For example, synthesis of the nucleic acid analog includes the use of labeled ATP, CTP, GTP and/or UTP. Methods for labeling nucleotides with CyDye, DIG, biotin, rhodamine, fluorescein and the like are known in the art. For example, biotin can be used as a nucleic acid probe label: it can be attached to the carbon atom at position 5 of UTP or dUTP of a nucleic acid molecule, and can be detected by binding to avidin. However, the invention is not limited to known labels and labeling methods; labels and labeling methods discovered in the future are also contemplated by the invention.
In embodiments of the invention, the plurality of target nucleic acid molecules preferably comprises the whole genome or at least one chromosome of an organism, or nucleic acid molecules of any size or molecular weight. Preferably, the size of the nucleic acid molecule is at least about 200 kb, at least about 500 kb, at least about 1 Mb, at least about 2 Mb or at least about 5 Mb, more preferably about 100 kb to about 5 Mb, about 200 kb to about 5 Mb, about 500 kb to about 5 Mb, about 1 Mb to about 2 Mb or about 2 Mb to about 5 Mb.
In certain embodiments, the target nucleic acid is from an animal, a plant or a microorganism; in a preferred embodiment, the target nucleic acid molecules are from a human. If the amount of the nucleic acid sample is small (as in some cases with human nucleic acid samples, for example the genome of a developing fetus), the nucleic acid can be amplified, for example by whole genome amplification. Prior amplification may be necessary for carrying out the method of the invention, for example in forensic applications (e.g., for genetic identification purposes in legal proceedings).
In certain embodiments, the plurality of target nucleic acid molecules is a set of genomic DNA molecules. The bait sequences may be selected from, for example: a plurality of bait sequences defining a plurality of exons, introns or regulatory sequences from a plurality of genetic loci; a plurality of bait sequences defining the complete sequence of at least one single genetic locus, the locus being of any size, preferably at least 1 Mb, or at least one of the sizes specified above; a plurality of bait sequences defining single nucleotide polymorphisms (SNPs); or a plurality of bait sequences defining an array, for example a tiling array designed to capture the complete sequence of at least one complete chromosome.
As used herein, the term "hybridization" means the pairing of complementary nucleic acids. Hybridization and the strength of hybridization (i.e. the strength of association between nucleic acids) are affected by many factors, such as the degree of complementarity between the nucleic acids, the stringency of the hybridization conditions used, the melting temperature (Tm) of the hybrid formed, and the G/C content of the nucleic acids. Although the invention is not limited to particular hybridization conditions, stringent hybridization conditions are preferred. Stringent hybridization conditions are sequence-dependent and vary with the hybridization parameters (e.g. salt concentration, presence of organic compounds, etc.). Generally, "stringent" conditions are selected to be about 5 °C to about 20 °C below the Tm of the specific nucleic acid sequence at a defined ionic strength and pH. Preferably, stringent conditions are about 5 °C to 10 °C below the thermal melting point of the specific nucleic acid bound to its complementary nucleic acid. The Tm is the temperature (at a defined ionic strength and pH) at which 50% of the nucleic acid (e.g. the target nucleic acid) is hybridized to a perfectly matched probe.
As used herein, "stringent conditions" may be, for example, hybridization at 42 °C in 50% formamide, 5× SSC (0.75 M NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5× Denhardt's solution, sonicated salmon sperm DNA (50 mg/ml), 0.1% SDS and 10% dextran sulfate, followed by washes at 42 °C in 0.2× SSC (sodium chloride/sodium citrate) and at 55 °C in 50% formamide, and then a wash at 55 °C in 0.1× SSC containing EDTA. As a further example, a buffer comprising 35% formamide, 5× SSC and 0.1% (w/v) sodium dodecyl sulfate (SDS) is contemplated as suitable for hybridization under moderately non-stringent conditions at 45 °C for 16-72 hours.
As used herein, the term "primer" means an oligonucleotide, whether naturally occurring and obtained by purification or restriction digestion, or produced synthetically, that is capable of acting as a point of initiation of synthesis when placed under conditions that induce synthesis of a primer extension product complementary to a nucleic acid strand (e.g. in the presence of nucleotides and an inducing agent such as DNA polymerase, and at a suitable temperature and pH). The primer is preferably single-stranded for maximum amplification efficiency. Preferably, the primer is an oligodeoxynucleotide. The primer must be long enough to prime the synthesis of extension products in the presence of the inducing agent. The exact length of the primer depends on many factors, including temperature, the source of the primer and the method used.
As used herein, the term "bait" or "bait sequence" means an oligonucleotide (e.g. a nucleotide sequence), whether naturally occurring and obtained by purification or restriction digestion, or produced by synthesis, recombination or PCR amplification, that is capable of hybridizing to at least a portion of another oligonucleotide of interest, such as a target nucleic acid sequence. A probe may be single-stranded or double-stranded. Probes can be used for the detection, identification and isolation of specific gene sequences.
As used herein, the term "target nucleic acid molecule" refers to a molecule from a target genomic region or sequence. The preselected probes determine the range of the target nucleic acid molecules. The "target" is thus intended to be distinguished from other nucleic acid sequences. A "fragment" is defined as a nucleic acid region within the target sequence, i.e. a "fragment" or "portion" of a nucleic acid sequence.
As used herein, the term "isolated", when used in relation to a nucleic acid, as in "isolated nucleic acid", means a nucleic acid sequence that has been identified and separated from at least one other component or contaminant with which it is ordinarily associated in its natural source. An isolated nucleic acid is present in a form different from that in which it occurs in nature. In contrast, non-isolated nucleic acids such as DNA and RNA exist in the state in which they occur in nature. The isolated nucleic acid, oligonucleotide or polynucleotide may be present in single-stranded or double-stranded form.
As used herein, the term "bait sequence consistent with the target nucleic acid sequence" refers to a sequence whose complement can hybridize to the target nucleic acid sequence. Preferably, hybridization occurs under stringent conditions. When the target region is a region of extremely high or extremely low GC content, or a low-complexity region, no bait sequence can be designed within it, i.e. the bait coverage is zero. In that case a suitable region flanking the target region on either side can be used to design bait sequences, typically within 300 bp on either side and preferably within 150 bp.
In embodiments of the invention, the transcription primers for the bait sequences used in the capture methods and kits described herein comprise a linking compound, such as a binding moiety. A binding moiety comprises any moiety attached to or introduced at the 5' end of an amplification primer for subsequent capture of the nucleic acid analog/target nucleic acid hybridization complex. The binding moiety may be any sequence introduced at the 5' end of the primer sequence, such as a capturable 6-histidine (6HIS) sequence. For example, a primer comprising a 6HIS sequence can be captured by nickel, for instance in a tube, microwell or purification column that is nickel-coated or contains nickel-coated beads, particles or the like, where the beads are packed into a column and the sample is loaded and passed through the column to capture the reduced-complexity complexes (followed, for example, by elution of the target). Another example of a binding moiety for embodiments of the invention is a hapten such as digoxigenin, attached for example to the 5' end of an amplification primer. Digoxigenin can be captured using an antibody against digoxigenin, for example a substrate coated with or containing an anti-digoxigenin antibody.
In certain embodiments, the binding moiety is biotin, and the capture substrate, for example beads such as paramagnetic particles, is coated with streptavidin and used to separate the target nucleic acid/transcript complexes from non-specifically hybridized target nucleic acids. For example, when biotin is the binding moiety, a streptavidin (SA)-coated substrate, such as SA-coated beads (e.g. magnetic beads/particles), is used to capture the biotin-labeled nucleic acid analog/target complexes. The SA-bound complexes are washed, and the hybridized target nucleic acids are eluted from the complexes and sequenced.
Maskless array synthesis technology can be used to provide, in parallel, bait sequences corresponding in sequence to at least one region of the genome. Alternatively, probes can be obtained serially using a standard DNA synthesizer and applied to the solid support, or can be obtained from an organism and immobilized on the solid support. After hybridization, nucleic acids that have not hybridized, or have hybridized non-specifically, to the nucleic acid analogs are separated from the support-bound nucleic acid analogs by washing. The remaining nucleic acids, which are bound specifically to the nucleic acid analogs, are eluted from the solid support, for example in hot water or in a nucleic acid elution buffer containing, for example, TRIS buffer and/or EDTA, to produce an eluate enriched in the target nucleic acid molecules.
Alternatively, the bait sequences for the target molecules can be synthesized on a solid support as described above, released from the solid support as a pool of bait sequences, and amplified. The released and transcribed pool of nucleic acid analogs can be immobilized covalently or non-covalently on a support such as glass, metal, ceramic or polymeric beads, or another solid support. The nucleic acid analogs may be designed for convenient release from the solid support, for example by providing an acid-labile or alkali-labile nucleic acid sequence at or near the end of the nucleic acid analog closest to the support, which releases the nucleic acid analog under low or high pH conditions, respectively. A variety of cleavable linking compounds are known in the art. The support can be provided, for example, as a column with a liquid inlet and outlet. Methods for immobilizing nucleic acids on a support are familiar in the art, for example incorporating biotin-labeled nucleotides into the nucleic acid analogs and coating the support with streptavidin, so that the coated support attracts and immobilizes the pool non-covalently. The sample is passed over the support carrying the nucleic acid analogs under hybridization conditions, so that target nucleic acid molecules hybridized to the immobilized support can be eluted for subsequent analysis or other uses.
The term "nucleic acid" may include, for example but without limitation: deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and artificial nucleic acids such as peptide nucleic acid (PNA), morpholino nucleic acid (morpholino), locked nucleic acid (LNA), glycol nucleic acid (GNA) and threose nucleic acid (TNA). As used herein, the terms "nucleic acid", "nucleotide sequence" and "nucleic acid molecule" are to be interpreted broadly and may be, for example, an oligomer or polymer of ribonucleic acid (RNA) or deoxyribonucleic acid (DNA) or of mimetics thereof. The terms include molecules composed of naturally occurring nucleobases, sugars and covalent internucleoside (backbone) linkages, as well as molecules having non-naturally occurring nucleobases, sugars and covalent internucleoside (backbone) linkages that function similarly, or combinations thereof. Such modified or substituted nucleic acids may be preferred over the native forms because of desirable properties, such as enhanced affinity for nucleic acid target molecules and increased stability in the presence of nucleases and other enzymes, and are described herein by the term "nucleic acid analog" or "nucleic acid mimetic". Preferred examples of nucleic acid mimetics are molecules comprising peptide nucleic acid (PNA), locked nucleic acid (LNA), xylo-locked nucleic acid (xylo-LNA), phosphorothioate, 2'-methoxy, 2'-methoxyethoxy, morpholino nucleic acid and phosphoramidate, or functionally similar nucleic acid derivatives.
Embodiment
Embodiment 1: Design of bait sequences
To test the method of the invention, 1000 sites located on exons and introns of the human genome were selected at random (their distribution is shown in Table 1). Bait sequences were designed against these 1000 random target sequences for subsequent testing.
Table 1: Chromosome distribution of the 1000 randomly selected sites
Chromosome Number of sites Chromosome Number of sites
chr1 92 chr12 73
chr2 67 chr13 23
chr3 53 chr14 15
chr4 43 chr15 29
chr5 45 chr16 41
chr6 124 chr17 36
chr7 42 chr18 14
chr8 46 chr19 31
chr9 34 chr20 21
chr10 61 chr21 9
chr11 80 chr22 21
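As a quick consistency check, the per-chromosome counts in Table 1 can be tallied to confirm that they account for all 1000 sites. The following is only a minimal sketch; the dictionary and function names are illustrative and do not come from the source.

```python
# Site counts per chromosome, transcribed from Table 1.
SITES_PER_CHROMOSOME = {
    "chr1": 92, "chr2": 67, "chr3": 53, "chr4": 43, "chr5": 45, "chr6": 124,
    "chr7": 42, "chr8": 46, "chr9": 34, "chr10": 61, "chr11": 80,
    "chr12": 73, "chr13": 23, "chr14": 15, "chr15": 29, "chr16": 41,
    "chr17": 36, "chr18": 14, "chr19": 31, "chr20": 21, "chr21": 9, "chr22": 21,
}


def total_sites(counts):
    """Sum the per-chromosome site counts."""
    return sum(counts.values())


def busiest_chromosome(counts):
    """Return the chromosome carrying the most selected sites."""
    return max(counts, key=counts.get)


if __name__ == "__main__":
    print(total_sites(SITES_PER_CHROMOSOME), busiest_chromosome(SITES_PER_CHROMOSOME))
```

The table lists only the 22 autosomes, so the tally assumes that no sites were drawn from the sex chromosomes.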
Bait sequence design comprises the following steps:
1. First, the characteristics of the target sequences are analyzed, comprising the following steps:
A) Target sequences are divided into 5 grades by G/C content, where grade 1: 10%-30%; grade 2: 30%-40%; grade 3: 40%-60%; grade 4: 60%-70%; grade 5: 70%-90%;
B) The spatial structure of each target sequence is analyzed, and target sequences that can form a stable spatial structure are flagged;
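The G/C grading in step A) can be sketched as a small classifier. This is an illustration only: the source does not say which grade a boundary value such as exactly 30% belongs to, so the sketch assumes that each shared boundary goes to the higher grade, and the function names are hypothetical.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc_grade(gc):
    """Map a G/C percentage to grades 1-5; return None outside 10%-90%.

    Assumption (not stated in the source): the shared boundaries
    30, 40, 60 and 70 are assigned to the higher grade.
    """
    if gc < 10 or gc > 90:
        return None
    for grade, upper in enumerate((30, 40, 60, 70), start=1):
        if gc < upper:
            return grade
    return 5
```

Under these assumptions a target with 50% G/C falls in grade 3, the reference grade used later for copy number compensation.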
2. Secondly, standards are set and the bait sequences are scored:
A) The target sequence length is in the range of 60-150 bp;
B) Specificity is maintained. The principle of specificity is that the thermodynamic stability of a bait sequence bound to a non-target region is significantly lower than its thermodynamic stability bound to the target region. The index generally analyzed is Tm(target region) - Tm(non-specific region) ≥ 5 °C; for part of the data, Tm(target region) - Tm(non-specific region) ≥ 10 °C was applied for comparison (strong specificity restriction). Different thermodynamic calculation methods have a considerable effect on the result; here the calculation uses the nearest-neighbor algorithm based on the SantaLucia 2007 thermodynamic parameter table;
C) No secondary structure is produced. Secondary structures include dimers and hairpin structures, i.e. the designed bait sequences are not permitted to form dimers or hairpin structures. For a dimer formed between any two bait sequences, Tm ≤ 47 °C; for part of the data, ≤ 37 °C was applied for comparison (strict dimer restriction). For a hairpin structure formed by any bait sequence with itself, Tm ≤ 47 °C; for part of the data, ≤ 37 °C was applied for comparison (strict hairpin structure restriction). Different thermodynamic calculation methods have a considerable effect on the result; here the calculation uses the nearest-neighbor algorithm based on the SantaLucia 2007 thermodynamic parameter table;
D) For each target region, the candidate bait sequences are analyzed and given a combined score according to each candidate's specificity, dimer formation, hairpin structure and position relative to the target region. According to the scores, the one or more optimal bait sequences (i.e. those with the largest scoring function value) are selected: $S = a \times S_{\mathrm{specificity}} + b \times S_{\mathrm{dimer}} + c \times S_{\mathrm{hairpin}} + d \times S_{\mathrm{distance}}$, where a = 0.26-0.34, b = 0.08-0.12, c = 0.17-0.23 and d = 0.35-0.45. The scores are computed by in-house software according to the following rules:
$S_{\mathrm{specificity}}$ scoring rule: each newly designed bait sequence is aligned against the genome using BLAT software with default parameters, and the thermodynamic Tm is calculated for each alignment result. If for any result the difference Tm(target region) - Tm(non-specific region) is < 5 °C, the bait sequence is discarded and redesigned (for part of the data, < 10 °C was applied for comparison). Otherwise the average Tm of all alignment results is calculated, and finally $S_{\mathrm{specificity}} = 1 - Tm_{\mathrm{mean}} / (Tm_{\mathrm{target}} - 5)$; for part of the data, $S_{\mathrm{specificity}} = 1 - Tm_{\mathrm{mean}} / (Tm_{\mathrm{target}} - 10)$ was used for comparison. Here $Tm_{\mathrm{mean}}$ is the average Tm of the bait sequence over all non-specific alignment results, and $Tm_{\mathrm{target}}$ is the Tm of the bait sequence with the target region;
$S_{\mathrm{dimer}}$ scoring rule: each newly designed bait sequence is subjected to dimer alignment analysis against every bait sequence already designed, using BLAT software with default parameters, and the thermodynamic Tm is calculated for each alignment result. If any Tm ≥ 47 °C, the bait sequence is discarded and redesigned; otherwise the average Tm of all alignment results is calculated, and finally $S_{\mathrm{dimer}} = (47 - Tm_{\mathrm{mean}}) / 47$. For part of the data, the threshold Tm ≥ 37 °C was applied for comparison: if it is met, the bait sequence is discarded and redesigned; otherwise the average Tm of all alignment results is calculated and $S_{\mathrm{dimer}} = (37 - Tm_{\mathrm{mean}}) / 37$;
$S_{\mathrm{hairpin}}$ scoring rule: for each bait sequence, the Smith-Waterman algorithm is used to compute its best self-alignment structure, and the thermodynamic Tm is calculated from that structure. If Tm ≥ 47 °C, the bait sequence is discarded and redesigned; otherwise $S_{\mathrm{hairpin}} = (47 - Tm) / 47$. For part of the data, the threshold Tm ≥ 37 °C was applied for comparison: if it is met, the bait sequence is discarded and redesigned; otherwise $S_{\mathrm{hairpin}} = (37 - Tm_{\mathrm{mean}}) / 37$;
$S_{\mathrm{distance}}$ scoring rule: the coordinates of the target region to be designed against are known. For each bait sequence, the difference $\delta_{\mathrm{distance}}$ between its coordinates and those of the target region is calculated. The acceptable difference is set to 150, which is an empirical value. If the difference is greater than 150, the bait sequence is discarded and redesigned; otherwise $S_{\mathrm{distance}} = (150 - \delta_{\mathrm{distance}}) / 150$. Where no suitable bait sequence can be designed within 150 of the target region coordinates, the acceptable difference for those cases was set to 300, with $S_{\mathrm{distance}} = (300 - \delta_{\mathrm{distance}}) / 300$.
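The scoring rules above can be sketched in code as follows. This is a hedged illustration, not the inventors' in-house software: the Tm values are taken as inputs (no nearest-neighbor calculation or BLAT/Smith-Waterman alignment is performed), the weights are assumed to be the midpoints of the stated ranges, a score of 1.0 is assumed when there are no off-target or dimer hits, and all function names are hypothetical. A return value of None stands for "discard and redesign".

```python
from statistics import mean

# Assumed weights: midpoints of the ranges given in the text (they sum to 1.0).
WEIGHTS = {"a": 0.30, "b": 0.10, "c": 0.20, "d": 0.40}


def s_specificity(tm_target, off_target_tms, margin=5.0):
    """1 - Tm_mean / (Tm_target - margin); None means discard and redesign."""
    if any(tm_target - tm < margin for tm in off_target_tms):
        return None
    if not off_target_tms:
        return 1.0  # assumption: no off-target hit gives the best score
    return 1.0 - mean(off_target_tms) / (tm_target - margin)


def s_dimer(dimer_tms, limit=47.0):
    """(limit - Tm_mean) / limit over all bait-bait dimer hits."""
    if any(tm >= limit for tm in dimer_tms):
        return None
    if not dimer_tms:
        return 1.0  # assumption: no dimer hit gives the best score
    return (limit - mean(dimer_tms)) / limit


def s_hairpin(tm, limit=47.0):
    """(limit - Tm) / limit for the best self-alignment structure."""
    return None if tm >= limit else (limit - tm) / limit


def s_distance(delta, limit=150.0):
    """(limit - delta) / limit for the coordinate offset from the target."""
    return None if delta > limit else (limit - delta) / limit


def combined_score(tm_target, off_target_tms, dimer_tms, hairpin_tm, delta, w=WEIGHTS):
    """Weighted sum S of the four component scores, or None if any rule fails."""
    parts = (
        s_specificity(tm_target, off_target_tms),
        s_dimer(dimer_tms),
        s_hairpin(hairpin_tm),
        s_distance(delta),
    )
    if any(p is None for p in parts):
        return None
    return sum(wi * p for wi, p in zip((w["a"], w["b"], w["c"], w["d"]), parts))
```

The stricter comparison settings described in the text correspond to calling the component functions with margin=10.0, limit=37.0 and, for the distance score, limit=300.0.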
3. Thirdly, the bait sequence copy number is compensated according to the circumstances of each target region:
A) According to the stability grade of the target sequence, the bait sequence copy number for grade 3 is taken as the reference copy number (i.e. baseline 1); the bait sequences corresponding to grades 1 and 5 require more copies, 2.5 times that of grade 3; next are grades 2 and 4, whose bait sequences also require somewhat more copies, 3.5 times that of grade 3;
B) For target sequences that can form a stable spatial structure, the bait sequence copy number is doubled;
C) When the target region is a region of particular interest, for example a region in which a fusion event may occur, the bait sequence copy number is doubled;
D) In addition, a parallel test without bait sequence copy number compensation was carried out under the same conditions as a control.
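The compensation rules of step 3 can be sketched as a simple multiplier lookup. The grade multipliers below are copied from the text as written; the assumption that the two doublings in B) and C) compound when both apply is not stated in the source, and the names are hypothetical.

```python
# Multipliers relative to grade 3, taken from step 3 A) as stated in the text.
GRADE_FACTOR = {1: 2.5, 2: 3.5, 3: 1.0, 4: 3.5, 5: 2.5}


def copy_number_factor(grade, stable_structure=False, key_region=False, compensate=True):
    """Relative bait copy number for one target (grade 3 baseline = 1.0).

    Assumption: the doublings for stable structure and for regions of
    particular interest multiply together when both apply.
    """
    if not compensate:
        return 1.0  # uncompensated control, as in step 3 D)
    factor = GRADE_FACTOR[grade]
    if stable_structure:
        factor *= 2
    if key_region:
        factor *= 2
    return factor
```

For example, under these assumptions a grade 1 target that forms a stable structure would receive 5 times the grade 3 reference copy number.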
4. Finally, there are cases in which no probe can be designed for the target sequence, for example when the target region is a region of extremely high or extremely low G/C content, or a low-complexity region (a low-complexity region is one composed of few kinds of elements, such as oligonucleotide repeats, for example simple repeat sequences such as microsatellites). Since no bait sequence can be designed within such a region, i.e. the bait coverage is zero, a suitable region flanking the target region on either side can be used to design bait sequences. Bait sequences are generally designed within 300 bp on either side; where a suitable bait sequence could be designed within 150 bp, this was recorded for comparison. Of the randomly selected target sequences in this embodiment, 138 fell into this category: for 68 of them bait sequences were successfully designed within 150 bp on either side, for another 22 bait sequences were successfully designed within 150-300 bp on either side, and for the remaining 48 no probe could be designed in any of these regions.
5. The final bait sequence design results are shown in Table 2.
Table 2: Bait sequence design results
The conditions of the strict scoring function restriction are: Tm(target region) - Tm(non-specific region) ≥ 10 °C, with $S_{\mathrm{specificity}} = 1 - Tm_{\mathrm{mean}} / (Tm_{\mathrm{target}} - 10)$; Tm < 37 °C, with $S_{\mathrm{dimer}} = (37 - Tm_{\mathrm{mean}}) / 37$; Tm < 37 °C, with $S_{\mathrm{hairpin}} = (37 - Tm_{\mathrm{mean}}) / 37$.
Embodiment 2: Preparation of bait sequences
The bait sequences designed in Embodiment 1 were prepared as follows:
1. A specific sequence 20 bases long is added to each of the 5' end and the 3' end of the bait sequence. The design principles for the specific sequences are: 1) they produce no non-specific amplification products on the target (to-be-captured) genome; 2) their G/C content is between 30% and 70%, preferably between 40% and 60%; 3) the two sequences do not form a dimer with each other, or the Tm of any dimer formed is ≤ 47 °C, preferably ≤ 37 °C. This gives the sequences to be synthesized; all bait sequences carry the same pair of specific sequences, for example:
5' end sequence - bait sequence (60-150 bp) - 3' end sequence (SEQ ID NO. 1):
ATATAGATGCCGTCCTAGCG-NNNNNNNNNN......NNNNNNNNNN-TGGGCACAGGAAAGATACTT, where "NNNNNNNNNN......NNNNNNNNNN" represents the bait sequence.
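The construct layout above can be illustrated with a short sketch that flanks a bait with the two 20-base specific sequences and checks their G/C content against the stated design principle. The flank sequences and the T7 sequence are taken from the text; the function names and the example bait are hypothetical.

```python
FIVE_PRIME_FLANK = "ATATAGATGCCGTCCTAGCG"   # 5' specific sequence from SEQ ID NO. 1
THREE_PRIME_FLANK = "TGGGCACAGGAAAGATACTT"  # 3' specific sequence from SEQ ID NO. 1
T7_SEQUENCE = "TAATACGACTCACTATAGGG"        # T7 sequence added by PCR in step 4


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def build_oligo(bait):
    """Flank a bait (60-150 nt) with the two 20-nt specific sequences."""
    if not 60 <= len(bait) <= 150:
        raise ValueError("bait length must be 60-150 nt")
    return FIVE_PRIME_FLANK + bait.upper() + THREE_PRIME_FLANK


if __name__ == "__main__":
    example_bait = "ACGT" * 15  # hypothetical 60-nt bait, for illustration only
    oligo = build_oligo(example_bait)
    print(len(oligo), gc_fraction(FIVE_PRIME_FLANK), gc_fraction(THREE_PRIME_FLANK))
```

Both flanks fall inside the preferred 40%-60% G/C window. With the 20-base T7 sequence added, a 60-150 nt bait gives a product of 120-210 bp, which agrees with the 120-210 bp gel recovery range stated in step 4.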
2. The specific sequences are generated by the solution hybridization capture sequencing probe design software developed independently by the present inventors.
3. The sequences to be synthesized are synthesized as oligonucleotides on a large scale by chip methods well known in the art; the oligonucleotides are then eluted from the chip with ammonia, purified and dissolved in distilled water to form an oligonucleotide pool.
4. With the oligonucleotide pool as template, and with a 5' end primer and a 3' end primer complementary to the 5' end sequence and the 3' end sequence as primers, polymerase chain reaction (PCR) amplification is carried out using Taq polymerase (JumpStart Taq DNA Polymerase, purchased from Sigma, Catalog No. D6558) to obtain a large quantity of double-stranded DNA pool. The specific steps are as follows:
1) The reaction system is as follows:
2) The reaction conditions are as follows:
3) The PCR product is purified with the QIAGEN PCR purification kit (QIAGEN, Cat No./ID 28104) according to its operating instructions:
4) With the 5' end primer carrying a T7 sequence (TAATACGACTCACTATAGGG) at its 5' end as forward primer and the 3' end primer as reverse primer, polymerase chain reaction (PCR) amplification is carried out using Taq polymerase (JumpStart Taq DNA Polymerase, purchased from Sigma, Catalog No. D6558) to form a double-stranded DNA pool carrying the T7 sequence at the 5' end. The procedure is as follows:
5) reaction system:
Reagent name Volume
Water 37μl
10×PCR Buffer 5μl
10mM dATP 1μl
10mM dCTP 1μl
10mM dGTP 1μl
10mM TTP 1μl
BAITS_5_PRIMER_N-T7(10μM) 1μl
BAITS_3_PRIMER_N(10μM) 1μl
JumpStart Taq DNA Polymerase 1μl
Oligonucleotide pond 1μl
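The reaction system in the table above can be checked with a short calculation: the component volumes should add up to the total reaction volume, and the final concentration of each stock follows from the usual C1 x V1 = C2 x V2 dilution relation. This is a minimal sketch; the dictionary keys restate the table rows and the function names are illustrative.

```python
# Component volumes in microliters, transcribed from the reaction system table.
VOLUMES_UL = {
    "Water": 37,
    "10x PCR Buffer": 5,
    "10 mM dATP": 1,
    "10 mM dCTP": 1,
    "10 mM dGTP": 1,
    "10 mM TTP": 1,
    "BAITS_5_PRIMER_N-T7 (10 uM)": 1,
    "BAITS_3_PRIMER_N (10 uM)": 1,
    "JumpStart Taq DNA Polymerase": 1,
    "Oligonucleotide pool": 1,
}


def total_volume(volumes):
    """Total reaction volume in microliters."""
    return sum(volumes.values())


def final_concentration(stock, volume, total):
    """Final concentration after dilution, in the same unit as the stock."""
    return stock * volume / total


if __name__ == "__main__":
    total = total_volume(VOLUMES_UL)
    print(total, final_concentration(10, 1, total))
```

With a 50 μl total, each 10 mM nucleotide stock ends up at 0.2 mM, each 10 μM primer at 0.2 μM, and the 10× buffer at 1×.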
6) The reaction conditions are as follows:
The PCR product of the previous step is separated by gel electrophoresis to remove non-specific bands; fragments in the 120-210 bp range are recovered and purified with a Qiagen gel extraction kit (QIAquick Gel Extraction Kit, Cat No./ID 28704);
7) Using the T7 High Yield RNA Transcription Kit (Vazyme, TR101-01/02), with NTPs of a nucleic acid analog (glycol nucleic acid GNA, locked nucleic acid LNA, peptide nucleic acid PNA, threose nucleic acid TNA or morpholino nucleic acid) and biotin-labeled UTP as substrates, the gel-purified product of the previous step is transcribed in vitro to prepare a pool of biotin-labeled nucleic acid analogs:
Incubate at 37 °C for 8-12 hours to obtain the maximum yield of the nucleic acid analog pool; after purification, dilute to 500 ng/μl and store in a -80 °C freezer.
In addition, a parallel test under the same conditions using the standard nucleotides ATP, CTP, GTP, UTP and Biotin-UTP was carried out as a control.
Embodiment 3: Capture of the target-region library
1. Preparation of the DNA library for high-throughput capture sequencing:
1) Take 1 μg of genomic DNA from the species to be tested and shear it randomly into small fragments of 150-250 bp using a Bioruptor Pico sonicator;
2) Prepare the pre-capture small-fragment library using the Illumina TruSeq DNA library preparation kit.
2. use the nucleic acid analog pond of preparation and the small fragment library of target species to carry out target area Library hybridization and catch Obtain:
1) close primer to prepare:
Synthesize according to above primer sequence, every kind of synthesis 100 OD, every kind of primer is diluted to 1000 μMs, and according to Equal-volume mixes, named Block 1;
2) cot-1 DNA and salmon sperm DNA is diluted to 100ng/ μ l, and equal-volume mixing, it is labeled as Block 2;
3) take 6 μ l Block 1 to mix with 5 μ l Block 2, be labeled as Block Mix;
4) take 1 μ g small fragment genomic library and 11 μ l Block Mix mixing, and use frozen drying centrifuge Carry out being concentrated into 9 μ l, be labeled as reagent S1, be placed in the most stand-by;
6) take 20 μ l hybridization solutions (20 × SSPE, 2 × Dennard`s, 1mM EDTA, 1%SDS) to be placed on 65 DEG C of metal baths Preheating, is labeled as S2;
7) taking 5 μ l pure water, add 2 μ l 500ng/ μ l nucleic acid analog ponds after mixing, slowly suction is beaten and is mixed for several times, labelling For S3, it is placed in the most stand-by;
8) PCR instrument parameter is arranged to 95 DEG C, 5min;65 DEG C, 16h;65 DEG C, constant temperature;105 DEG C of heat lid;
9) being placed in by S1 in PCR module, start PCR program, program is run to 65 DEG C of 5min, and S2 puts into PCR instrument mould Block, after continuing to hatch 5min, puts into S3 PCR instrument module, continues to hatch 2min;
10) pipettor is adjusted to 13 μ l, takes 13 μ l S2 and be transferred to S3, take 9 μ l S1 and be transferred to S3, slowly inhale to beat and fill for several times Divide mixing mixture, seal lid, cover PCR heat lid, hatch 16 hours and carry out probe and Library hybridization;
11) Place 50 μl Dynabeads MyOne Streptavidin T1 (Invitrogen, Cat. No. 65601) in a 1.5 ml low-binding centrifuge tube, add 200 μl binding buffer [0.5 M NaCl (Ambion, Cat. No. AM9760G), 2 mM Tris-HCl, pH 8.0 (Ambion, Cat. No. AM9855G), 0.2 mM EDTA (Ambion, Cat. No. AM9260G)], mix by pipetting, place on a magnetic rack for 1 min and remove the supernatant;
12) Remove the centrifuge tube from the magnetic rack, add 200 μl binding buffer, mix by pipetting, place on the magnetic rack for 1 min and remove the supernatant;
13) Repeat step 11 twice, for a total of 3 bead washes, and finally resuspend the beads in 200 μl binding buffer;
14) Transfer the probe/library hybridization mixture (product of step 9) into the bead suspension, seal the cap and place on a rotary mixer to bind for 30 min;
15) Place the centrifuge tube on the magnetic rack for 2 min and remove the supernatant;
16) Remove the centrifuge tube from the magnetic rack, add 200 μl wash buffer 1 [10× SSC (Ambion, Cat. No. AM9763), 1% SDS (Invitrogen, Cat. No. 24730020)] to resuspend the beads, seal the cap and wash on a rotary mixer for 10 min;
17) Place the centrifuge tube on the magnetic rack for 2 min and remove the supernatant;
18) Remove the centrifuge tube from the magnetic rack, add 200 μl wash buffer 2 preheated to 65 °C [1× SSC (Ambion, Cat. No. AM9763), 5% SDS (Invitrogen, Cat. No. 24730020)] to resuspend the beads, and incubate in the PCR block at 65 °C for 10 min;
19) Place the centrifuge tube on the magnetic rack for 2 min and remove the supernatant;
20) Repeat steps 17-18 twice, for a total of 3 washes;
21) Add 200 μl 80% ethanol solution to the centrifuge tube, let stand for 30 s, remove all the ethanol, dry at room temperature for 2 min, then add 20 μl pure water and resuspend the beads by slow pipetting;
3. PCR enrichment of the target-region capture product, using the NEB high-fidelity PCR kit (High-Fidelity PCR Kit, New England Biolabs, Catalog #E0553S):
1) Reaction system:
Reagent name Volume
5×Phusion HF 10μl
10mM dNTPs 1μl
Post Primer Mix (10 μM each) 1μl
Resuspended magnetic beads (step 20) 20μl
Phusion DNA polymerase 0.5μl
H2O 17.5μl
2) The reaction conditions are as follows:
3) Purify the PCR product using the Beckman Agencourt AMPure XP Kit [Beckman (p/n A63880)];
4) Sequence the target-region capture library by high-throughput sequencing on an Illumina sequencing platform; PE150 mode is recommended for the read length.
4. Results
1) The sequencing library was sequenced on an Illumina HiSeq 4000 high-throughput sequencer to obtain sequencing data for the 1000 sites;
2) Using the BWA MEM software, the sequencing data were aligned to the human reference genome HG19 with the parameters: bwa mem -M -k 40 -t 8 -R "@RG\tID:Hiseq\tPL:Illumina\tSM:sample", thereby identifying single nucleotide polymorphisms, insertions or deletions that differ from the reference genome, i.e. detecting gene mutations.
3) The samtools stats tool of the samtools-1.2 software was used to compute the data size, alignment rate, duplication rate and quality values; the samtools depth tool of the same software was then used to calculate the sequencing depth at each position of the target region;
4) From the sequencing depth at each position of the target region, the numbers of bases with sequencing depth ≥1, ≥4, ≥10 and ≥20 were counted, and each count was divided by the total number of bases in the target region to give the 1× coverage, 4× coverage, 10× coverage and 20× coverage.
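The coverage calculation in step 4) can be sketched as follows, assuming per-position depths in the three-column, tab-separated text format written by samtools depth (chromosome, position, depth). The function name and arguments are illustrative. Because zero-depth positions may be absent from the depth output, the total number of target bases is passed in separately rather than inferred from the number of lines.

```python
def coverage_fractions(depth_lines, target_bases, thresholds=(1, 4, 10, 20)):
    """Compute N-fold coverage from per-position depth records.

    Each record holds chromosome, 1-based position and depth, tab-separated.
    Returns {threshold: fraction of target bases with depth >= threshold}.
    """
    counts = {t: 0 for t in thresholds}
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depth = int(fields[2])
        for t in thresholds:
            if depth >= t:
                counts[t] += 1
    return {t: counts[t] / target_bases for t in thresholds}


if __name__ == "__main__":
    # Hypothetical toy data: 5 target bases, one of them with no coverage.
    example = ["chr1\t100\t25", "chr1\t101\t12", "chr1\t102\t5", "chr1\t103\t1"]
    print(coverage_fractions(example, target_bases=5))
```

In practice the records would be read from a depth file restricted to the target regions, and target_bases would be the summed length of those regions.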
Table 3: Capture sequencing results for the 1000 sites
As can be seen from Table 3, taking LNA as an example, the mean depth is 451.53×; the 4× coverage is 94.35% and the 20× coverage is still 93.64%, showing good coverage and uniformity, while the total data volume is only 8.52 Mb reads. The benefits of such results are: 1) the sequencing volume is small, which effectively reduces cost; 2) the mean sequencing depth is high, i.e. each target site is sequenced many times, so data accuracy is high; 3) coverage is high, with few sites missed; 4) uniformity is good, i.e. the great majority of sites have similar coverage depth.
According to the analysis of the comparison data subsets, relative to LNA: without bait sequence copy number compensation, coverage and uniformity fall by 4.5 and 5.1 percentage points respectively; with the strong specificity restriction, strict dimer restriction, strict hairpin structure restriction and strict scoring function restriction, coverage and uniformity increase by 6.3 and 7.8 percentage points respectively; coverage and uniformity for regions within 150 bp are 2.3 and 3.8 percentage points greater, respectively, than for regions within 150-300 bp; and in the parallel test with the standard nucleotides ATP, CTP, GTP, UTP and Biotin-UTP in the same proportions, coverage and uniformity are reduced by 5.3 and 4.8 percentage points respectively.
Although the invention has been described in connection with preferred embodiments, it should be understood that the scope of protection of the invention is not limited to the embodiments described herein. Other embodiments of the invention will be readily apparent to those skilled in the art from the description and practice of the invention disclosed herein. The description and embodiments are to be regarded as exemplary only, and the true scope and spirit of the invention are defined by the claims.

Claims (10)

1. A method for enriching nucleic acids having a target sequence from a nucleic acid sample, the method comprising:
A) providing a nucleic acid sample comprising a target nucleic acid sequence, and bait sequences that are consistent with the target nucleic acid sequence or specific for the target sequence;
B) carrying out in vitro transcription with the bait sequences as template to prepare nucleic acid analogs, the nucleic acid analogs carrying a binding moiety, for example a biotin binding moiety;
C) fragmenting the nucleic acid sample, preferably preparing a library;
D) hybridizing the nucleic acid analogs with the nucleic acid sample, so that the nucleic acid analogs and the target-sequence nucleic acids form nucleic acid analog/DNA hybridization complexes;
E) by means of the binding moiety, separating the nucleic acid analog/DNA hybridization complexes from non-specifically hybridized nucleic acids and removing non-target-sequence nucleic acids.
2. The method according to claim 1, further comprising step f): amplifying the nucleic acid analog/DNA hybridization complexes, so as to enrich the target sequence nucleic acids.
3. The method according to claim 1, wherein in step b) the in vitro transcription is carried out using the nucleic acid analog GNA, LNA, PNA, TNA or morpholino nucleic acid to prepare the nucleic acid analogs.
4. The method according to claim 1, wherein the nucleic acid sample is genomic DNA, RNA, cDNA or mRNA, and, where the nucleic acid sample is RNA or mRNA, the method includes, before step c), a step of reverse transcribing the RNA or mRNA into DNA.
5. The method according to claim 1, wherein the bait sequences have characteristics selected from the following: i) they form no hairpin structures themselves and no dimers with one another; ii) their copy number is compensated according to the GC content and/or spatial structure of the target nucleic acid sequence; iii) when the target region has extremely high or extremely low GC content, or is a low-complexity region, baits are designed against the regions flanking the target region as substitute regions, using the same design method as for the target region; iv) they do not bind specifically to sequences in the nucleic acid sample other than the target nucleic acid sequence.
6. The method according to claim 4, wherein the copy number compensation according to the GC content of the target nucleic acid sequence in ii) means: taking the bait sequence copy number coefficient at a GC content of 50% as the baseline of 1, for GC content between 10% and 90%, each 1% of deviation increases the bait sequence copy number coefficient by 0.08-0.12.
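The GC compensation rule in the claim above can be illustrated with a short Python sketch. It is only one possible reading, not the patentees' implementation: it assumes the coefficient grows linearly and symmetrically with the absolute deviation from 50% GC, and the function names `gc_percent` and `copy_number_coefficient`, as well as the default per-percent step of 0.10 (the midpoint of the claimed 0.08-0.12 range), are hypothetical.

```python
def gc_percent(seq: str) -> float:
    """Return the GC content of a nucleotide sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def copy_number_coefficient(gc: float, step: float = 0.10) -> float:
    """Bait copy number coefficient for a target with the given GC percentage.

    Assumption: the coefficient is 1 at 50% GC and rises by `step` for every
    1% of deviation, in either direction, within the claimed 10%-90% window.
    """
    if not 10.0 <= gc <= 90.0:
        raise ValueError("rule is only stated for GC content between 10% and 90%")
    if not 0.08 <= step <= 0.12:
        raise ValueError("step must lie in the claimed 0.08-0.12 range")
    return 1.0 + step * abs(gc - 50.0)


if __name__ == "__main__":
    target = "ATGCGCGCTA"  # made-up example sequence
    gc = gc_percent(target)
    print(gc, copy_number_coefficient(gc))
```

Under this reading, a target at 60% GC and one at 40% GC receive the same coefficient; whether the original method treats high-GC and low-GC targets symmetrically is not stated in the claim.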
7. The bait sequences described are on a solid support, for example on a microarray slide.
8. The method according to claim 1, wherein, for each target region, the bait sequences are the one or more bait sequences with the best composite score in terms of specificity, dimers, hairpin structure and position relative to the target region, the composite score being computed by the following scoring function: S = a × S_specificity + b × S_dimer + c × S_hairpin + d × S_distance, where a = 0.26-0.34, b = 0.08-0.12, c = 0.17-0.23 and d = 0.35-0.45. The individual scores are calculated as follows:
S_specificity: for any newly designed bait sequence, align it against the genome and, for each aligned sequence, calculate the Tm between the bait sequence and that aligned sequence. The Tm between the bait sequence and the target region must exceed its Tm with every aligned sequence by ≥ 5 °C, preferably ≥ 10 °C. Calculate the mean Tm between the bait sequence and all aligned sequences; then S_specificity = 1 - Tm_mean/(Tm_target - 5), preferably S_specificity = 1 - Tm_mean/(Tm_target - 10), where Tm_mean is the mean Tm of the bait sequence over all non-specific alignment hits and Tm_target is the Tm between the bait sequence and the target region;
S_dimer: for any newly designed bait sequence, perform a dimer alignment analysis against every bait sequence already designed and, for each aligned sequence, calculate the Tm between the bait sequence and the aligned bait sequence. Said Tm is < 47 °C; calculate the mean Tm between the bait sequence and all aligned bait sequences, and S_dimer = (47 - Tm_mean)/47. Preferably, said Tm is < 37 °C and S_dimer = (37 - Tm_mean)/37;
S_hairpin: for any bait sequence, calculate its optimal self-alignment structure and the Tm of that structure. Said Tm is < 47 °C and S_hairpin = (47 - Tm)/47; or said Tm is < 47 °C and S_hairpin = (37 - Tm_mean)/37;
S_distance: relative to the target region coordinates, for any newly designed bait sequence, calculate the difference δ_distance between its coordinate and the target region coordinate. δ_distance is less than 150, and S_distance = (150 - δ_distance)/150.
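The scoring function of the preceding claim can be sketched in Python as follows. This is an illustrative reading rather than the patentees' code. All function names are hypothetical, the Tm values are assumed to come from an external thermodynamic tool that is not shown, and the default weights 0.30, 0.10, 0.20 and 0.40 are simply the midpoints of the claimed ranges. Three further assumptions: a bait that violates a stated threshold is rejected (returned as `None`), a bait with no alignment hits gets the maximum sub-score of 1, and the hairpin score uses the single structure Tm with the same limit in the numerator and denominator.

```python
from statistics import mean
from typing import Optional, Sequence


def specificity_score(off_target_tms: Sequence[float], tm_target: float,
                      margin: float = 5.0) -> Optional[float]:
    """S_specificity = 1 - Tm_mean / (Tm_target - margin); None if rejected."""
    if tm_target <= margin:
        raise ValueError("tm_target must exceed the margin")
    # Every off-target Tm must sit at least `margin` degrees below the target Tm.
    if any(tm_target - tm < margin for tm in off_target_tms):
        return None
    if not off_target_tms:
        return 1.0  # assumption: no off-target hits means perfect specificity
    return 1.0 - mean(off_target_tms) / (tm_target - margin)


def dimer_score(dimer_tms: Sequence[float], limit: float = 47.0) -> Optional[float]:
    """S_dimer = (limit - Tm_mean) / limit; None if any dimer Tm reaches the limit."""
    if any(tm >= limit for tm in dimer_tms):
        return None
    if not dimer_tms:
        return 1.0  # assumption: no dimer hits means the best possible score
    return (limit - mean(dimer_tms)) / limit


def hairpin_score(hairpin_tm: float, limit: float = 47.0) -> Optional[float]:
    """S_hairpin = (limit - Tm) / limit; None if the hairpin Tm reaches the limit."""
    if hairpin_tm >= limit:
        return None
    return (limit - hairpin_tm) / limit


def distance_score(delta: float, max_delta: float = 150.0) -> Optional[float]:
    """S_distance = (max_delta - delta) / max_delta; None if delta is too large."""
    delta = abs(delta)
    if delta >= max_delta:
        return None
    return (max_delta - delta) / max_delta


def composite_score(s_spec: Optional[float], s_dimer: Optional[float],
                    s_hairpin: Optional[float], s_dist: Optional[float],
                    weights: Sequence[float] = (0.30, 0.10, 0.20, 0.40)
                    ) -> Optional[float]:
    """S = a*S_specificity + b*S_dimer + c*S_hairpin + d*S_distance."""
    parts = (s_spec, s_dimer, s_hairpin, s_dist)
    if any(p is None for p in parts):
        return None  # bait failed at least one threshold
    return sum(w * p for w, p in zip(weights, parts))


if __name__ == "__main__":
    # Made-up Tm values, for illustration only.
    score = composite_score(
        specificity_score([35.0], tm_target=75.0),
        dimer_score([20.0, 27.0]),
        hairpin_score(23.5),
        distance_score(75),
    )
    print(score)
```

The stricter "preferred" variants in the claim correspond to calling `specificity_score(..., margin=10.0)` and `dimer_score(..., limit=37.0)`. Candidate baits for a target region could then be ranked by `composite_score`, skipping any that return `None`.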
9. The bait sequences involved in any one of claims 1-8.
10. A kit comprising the bait sequences of claim 9, the kit including, but not limited to, double-stranded adapter molecules and multiple different oligonucleotide probes.
CN201610250133.3A 2016-04-22 2016-04-22 Method for enriching nucleic acids with target sequence from nucleic acid sample Active CN105925671B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201610250133.3A CN105925671B (en) 2016-04-22 2016-04-22 Method for enriching nucleic acids with target sequence from nucleic acid sample
PCT/CN2016/106595 WO2017181670A1 (en) 2016-04-22 2016-11-21 Method for enriching target nucleic acid sequence from nucleic acid sample
AU2016102398A AU2016102398A4 (en) 2016-04-22 2016-11-21 Method for enriching target nucleic acid sequence from nucleic acid sample
AU2016403554A AU2016403554A1 (en) 2016-04-22 2016-11-21 Method for enriching target nucleic acid sequence from nucleic acid sample

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610250133.3A CN105925671B (en) 2016-04-22 2016-04-22 Method for enriching nucleic acids with target sequence from nucleic acid sample

Publications (2)

Publication Number Publication Date
CN105925671A true CN105925671A (en) 2016-09-07
CN105925671B CN105925671B (en) 2019-07-23

Family

ID=56839769

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610250133.3A Active CN105925671B (en) 2016-04-22 2016-04-22 Method for enriching nucleic acids with target sequence from nucleic acid sample

Country Status (3)

Country Link
CN (1) CN105925671B (en)
AU (2) AU2016102398A4 (en)
WO (1) WO2017181670A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106676169A (en) * 2016-11-15 2017-05-17 上海派森诺医学检验所有限公司 Hybrid capture kit and method for detecting mutation of breast cancer susceptibility genes BRCA1 and BRCA2
WO2017181670A1 (en) * 2016-04-22 2017-10-26 艾吉泰康生物科技(北京)有限公司 Method for enriching target nucleic acid sequence from nucleic acid sample
CN108546739A (en) * 2018-04-20 2018-09-18 曹顺 Method for nucleic acid target sequence enrichment for NGS sequencing
CN111723261A (en) * 2019-03-22 2020-09-29 昆明逆火科技股份有限公司 Search engine-based DNA comparison algorithm

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110343756B (en) * 2019-06-25 2023-02-24 广西识远医学检验实验室有限公司 Group of probes for detecting thalassemia, related kit and application
JP2023519898A (en) * 2020-03-26 2023-05-15 インテグレーティッド ディーエヌエイ テクノロジーズ インコーポレーティッド Hybridization capture methods and compositions

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003093509A1 (en) * 2002-05-01 2003-11-13 Seegene, Inc. Methods and compositions for improving specificity of pcr amplication
US8192937B2 (en) * 2004-04-07 2012-06-05 Exiqon A/S Methods for quantification of microRNAs and small interfering RNAs
CN103602658A (en) * 2013-10-15 2014-02-26 东南大学 Novel capture and enrichment technology for targeting nucleic acid molecules

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105925671B (en) * 2016-04-22 2019-07-23 艾吉泰康(嘉兴)生物科技有限公司 Method for enriching nucleic acids with target sequence from nucleic acid sample

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017181670A1 (en) * 2016-04-22 2017-10-26 艾吉泰康生物科技(北京)有限公司 Method for enriching target nucleic acid sequence from nucleic acid sample
CN106676169A (en) * 2016-11-15 2017-05-17 上海派森诺医学检验所有限公司 Hybrid capture kit and method for detecting mutation of breast cancer susceptibility genes BRCA1 and BRCA2
CN108546739A (en) * 2018-04-20 2018-09-18 曹顺 Method for nucleic acid target sequence enrichment for NGS sequencing
CN111723261A (en) * 2019-03-22 2020-09-29 昆明逆火科技股份有限公司 Search engine-based DNA comparison algorithm
CN111723261B (en) * 2019-03-22 2021-08-13 昆明逆火科技股份有限公司 Search engine-based DNA comparison algorithm

Also Published As

Publication number Publication date
AU2016403554A1 (en) 2018-12-13
CN105925671B (en) 2019-07-23
WO2017181670A1 (en) 2017-10-26
AU2016102398A4 (en) 2019-05-02

Similar Documents

Publication Publication Date Title
CN105925671B (en) Method for enriching nucleic acids with target sequence from nucleic acid sample
CN109310784B (en) Methods and compositions for making and using guide nucleic acids
US8986958B2 (en) Methods for generating target specific probes for solution based capture
CN101835907B (en) Methods and systems for solution based sequence enrichment and analysis of genomic regions
AU2015364286B2 (en) Compositions and methods for targeted depletion, enrichment, and partitioning of nucleic acids using CRISPR/Cas system proteins
KR102531677B1 (en) Methods of analyzing nucleic acids from individual cells or cell populations
CN105647907B (en) Preparation method of modified DNA hybridization probes for targeted hybrid capture
AU2014409073B2 (en) Linker element and method of using same to construct sequencing library
EP3555305B1 (en) Method for increasing throughput of single molecule sequencing by concatenating short dna fragments
CN107446995A (en) Primer sets for amplifying multiple target DNA sequences in a sample and application thereof
AU2021240263A1 (en) Isothermal methods and related compositions for preparing nucleic acids
CN102317476A (en) Method and systems for enrichment of target genomic sequences
CA3101648A1 (en) Compositions and methods for making guide nucleic acids
CN106191256A (en) Method for DNA methylation sequencing of target regions
CN103374759B (en) Method for detecting SNPs significantly associated with lung cancer metastasis and application thereof
US20110091939A1 (en) Methods and Compositions for Removing Specific Target Nucleic Acids
US9315807B1 (en) Genome selection and conversion method
CN1434873A (en) Method for selectively isolating nucleic acid
CN113454235A (en) Improved nucleic acid target enrichment and related methods
EP1221491B8 (en) SUSPENSION SYSTEM FOR SEQUENCING GENETIC SUBSTANCE, METHOD OF SEQUENCING GENETIC SUBSTANCE BY USING THE SUSPENSION SYSTEM, AND METHOD OF HIGH-SPEED SNPs SCORING BY USING THE SUSPENSION SYSTEM

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190624

Address after: 314100 Building No. 2, 371 Hongye Road, Dayun Town, Jiashan County, Jiaxing City, Zhejiang Province 101

Applicant after: Aiji Taikang (Jiaxing) Biotechnology Co., Ltd.

Address before: Room B316-64, Building No. 29, Life Garden Road, Changping District Science and Technology Park, Beijing 102206

Applicant before: IGENETECH CO., LTD.

GR01 Patent grant