CN105925671B - Method for enriching target sequence nucleic acids from a nucleic acid sample - Google Patents
Method for enriching target sequence nucleic acids from a nucleic acid sample
- Publication number
- CN105925671B (application CN201610250133.3A)
- Authority
- CN
- China
- Prior art keywords
- nucleic acid
- bait sequences
- sequence
- target
- sequences
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
Abstract
The present invention provides a method for enriching target sequence nucleic acids from a nucleic acid sample. The method comprises: providing a nucleic acid sample containing a target nucleic acid sequence, together with bait sequences that are consistent with the target nucleic acid sequence or specific for the target sequence; preparing a nucleic acid analog by in vitro transcription using the bait sequences as template, the nucleic acid analog carrying a binding moiety; fragmenting the nucleic acid sample; hybridizing the nucleic acid analog with the nucleic acid sample so that the nucleic acid analog and the target sequence nucleic acid form a nucleic acid analog/DNA hybrid complex; and, by means of the binding moiety, separating the nucleic acid analog/DNA hybrid complex from non-specifically hybridized nucleic acid, thereby removing non-target sequence nucleic acid. In a preferred embodiment, the method further comprises amplifying the nucleic acid analog/DNA hybrid complex in order to enrich the target sequence nucleic acid.
Description
Technical field
The present invention relates to the capture, enrichment, and analysis of nucleic acid sequences. More specifically, it relates to a target sequence enrichment method based on liquid-phase capture.
Background technique
Whole-genome sequencing can detect mutations, insertions, deletions, and structural variation across the entire genome. However, because the genome is large, sequencing at 30× coverage generates close to 100 G of data. Sequencing for low-frequency mutations, such as those associated with tumors, requires at least 1000× coverage, which would generate up to 3000 G of data if done by whole-genome sequencing. Data on this scale not only makes analysis very difficult but also makes sequencing extremely expensive. Target region capture technology emerged in response.
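The data volumes quoted above follow from multiplying genome size by mean coverage. The minimal sketch below reproduces that arithmetic; the 3,000,000,000 bp genome size and the 100 kb target region are illustrative assumptions, not figures stated in the text.

```python
def sequenced_bases(region_size_bp: int, coverage: int) -> int:
    """Total bases that must be sequenced to reach a given mean coverage."""
    return region_size_bp * coverage


# Assumed round figure for the human genome; the text only says the genome is "large".
HUMAN_GENOME_BP = 3_000_000_000
# Hypothetical 100 kb target region, chosen only for comparison.
TARGET_REGION_BP = 100_000

wgs_30x = sequenced_bases(HUMAN_GENOME_BP, 30)        # whole genome at 30x
wgs_1000x = sequenced_bases(HUMAN_GENOME_BP, 1000)    # whole genome at 1000x
panel_1000x = sequenced_bases(TARGET_REGION_BP, 1000) # captured region at 1000x

for label, bases in [("WGS 30x", wgs_30x), ("WGS 1000x", wgs_1000x),
                     ("100 kb target 1000x", panel_1000x)]:
    print(f"{label}: {bases / 1e9:g} G bases")
```

Under these assumptions, deep sequencing of a captured 100 kb region needs several orders of magnitude less data than whole-genome sequencing at the same depth, which is the motivation for target capture.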
Target region capture refers to selectively capturing the nucleic acid sequences of a target region by specific technical means and then constructing a library for sequencing, so that the target region can be sequenced deeply while sequencing cost is greatly reduced. PCR is commonly used to enrich target regions, and multiplex PCR is more often used to capture multiple target regions at once. Multiplex PCR is best suited to capturing hotspot regions or short target regions; for longer target regions, for example those exceeding 100 kb, multiplex PCR is no longer suitable in terms of either cost or technical complexity.
Therefore, the art needs new methods suited to capturing longer target regions.
Summary of the invention
To solve the above problems, the present invention provides a target sequence enrichment method based on liquid-phase capture.
In a first aspect, the present invention provides a method for enriching target sequence nucleic acids from a nucleic acid sample, the method comprising:
a) providing a nucleic acid sample containing a target nucleic acid sequence, and bait sequences that are consistent with the target nucleic acid sequence or specific for the target sequence;
b) preparing a nucleic acid analog by in vitro transcription using the bait sequences as template, the nucleic acid analog carrying a binding moiety;
c) fragmenting the nucleic acid sample, preferably preparing a library;
d) hybridizing the nucleic acid analog with the nucleic acid sample, so that the nucleic acid analog and the target sequence nucleic acid form a nucleic acid analog/DNA hybrid complex;
e) by means of the binding moiety, separating the nucleic acid analog/DNA hybrid complex from non-specifically hybridized nucleic acid, thereby removing non-target sequence nucleic acid.
In one embodiment, adapter sequences are ligated to both ends of the nucleic acid sample fragments when the library is prepared in step c), and the method further comprises, after step e), a step f) of amplifying the nucleic acid analog/DNA hybrid complex by means of the adapter sequences, thereby enriching the target sequence nucleic acid.
In one embodiment, the bait sequences have characteristics selected from the following: i) they do not form hairpin structures themselves and do not form dimers with one another; ii) their copy number is compensated according to the GC content and/or spatial structure of the target nucleic acid sequence; iii) when the target region is a region of extremely high or extremely low GC content, or a low-complexity region, the regions flanking the target region are used as replacement regions for bait design, with the same design method as for the target region; iv) they do not bind specifically to sequences in the nucleic acid sample other than the target nucleic acid sequence.
In one embodiment, the copy number of the bait sequences is also compensated according to the degree of interest in the target nucleic acid sequence.
In one embodiment, the nucleic acid sample is genomic DNA, RNA, cDNA, or mRNA; where the nucleic acid sample is RNA or mRNA, step c) is preceded by a step of reverse transcribing the RNA or mRNA into DNA.
In one embodiment, the bait sequences are on a solid support, for example on a microarray slide.
In one embodiment, the solid support may alternatively be a plurality of beads or a microarray.
In one embodiment, some or all of the nucleic acid analogs carry a binding moiety.
In one embodiment, in step b) the in vitro transcription is carried out using the nucleic acid analog GNA, LNA, PNA, TNA, or morpholino nucleic acid to prepare the nucleic acid analog; preferably, the nucleic acid analog carries a binding moiety.
In one embodiment, the binding moiety is a biotin binding moiety.
In one embodiment, the bait sequence copy number is compensated according to the GC content of the target sequence: the lower or higher the GC content, the more the copy number of the corresponding bait sequences is increased.
In one embodiment, compensating the copy number according to the GC content of the target nucleic acid sequence means: taking the copy number coefficient of a bait sequence at 50% GC content as the baseline of 1, for every 1% that the GC content deviates from 50% within the range 10%-90%, the bait sequence copy number coefficient increases by 0.08-0.12.
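The linear compensation rule above can be written as coefficient = 1 + k × |GC − 50|, with k between 0.08 and 0.12 per percentage point. The following is a minimal sketch of that rule; the function names and the default k = 0.10 (the midpoint of the stated range) are assumptions made for illustration.

```python
def gc_percent(seq: str) -> float:
    """GC content of a sequence, as a percentage of its length."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def copy_number_coefficient(gc: float, step: float = 0.10) -> float:
    """Bait copy number coefficient relative to a bait at 50% GC (coefficient 1).

    `step` is the increase per 1% deviation from 50% GC; the text gives
    0.08-0.12, and 0.10 is an assumed default.
    """
    if not 10.0 <= gc <= 90.0:
        # Outside 10%-90% the text uses flanking replacement regions instead.
        raise ValueError("GC content outside 10%-90%: rule does not apply")
    if not 0.08 <= step <= 0.12:
        raise ValueError("step outside the 0.08-0.12 range given in the text")
    return 1.0 + step * abs(gc - 50.0)


low = copy_number_coefficient(68.0, step=0.08)
high = copy_number_coefficient(68.0, step=0.12)
print(f"68% GC: coefficient between {low:.2f} and {high:.2f}")
```

For a target at 68% GC (18 points from 50%), the two ends of the step range give coefficients of 1 + 18 × 0.08 and 1 + 18 × 0.12.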
In a specific embodiment, the bait sequence copy number is compensated as follows. Target sequences are divided into 6 tiers according to GC content: tier 1, 10%-30%; tier 2, 30%-40%; tier 3, 40%-60%; tier 4, 60%-70%; tier 5, 70%-90%; tier 6, less than 10% or greater than 90%. The copy number of tier 3 bait sequences is the baseline copy number. Tier 2 and tier 4 bait sequences have a higher copy number than tier 3, for example 2.2-2.8 times that of tier 3; tier 1 and tier 5 bait sequences have a still higher copy number, for example 3-4 times that of tier 3. For tier 6, where the GC content is less than 10% or greater than 90%, and where the target region is a low-complexity sequence, the bait sequences are designed using the regions flanking the target region as replacement regions for probe design; the replacement region is generally chosen within 300 bp on either side of the target region, preferably within 150 bp.
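The six-tier scheme can be sketched as a lookup from GC content to a tier and a copy number multiplier range. The text does not say which tier the shared boundary values (30%, 40%, 60%, 70%) belong to, so the symmetric boundary handling below is an assumption, as are the function and table names.

```python
from typing import Optional, Tuple


def gc_tier(gc: float) -> int:
    """Map GC content (percent) to tiers 1-6 as listed in the text.

    Boundary assignment is an assumption: tier 3 is taken as the closed
    interval 40%-60%, and the other tiers are assigned symmetrically.
    """
    if gc < 10.0 or gc > 90.0:
        return 6
    if gc < 30.0:
        return 1
    if gc < 40.0:
        return 2
    if gc <= 60.0:
        return 3
    if gc <= 70.0:
        return 4
    return 5


# Multiplier ranges relative to tier 3, using the example values in the text.
TIER_MULTIPLIER = {
    1: (3.0, 4.0),
    2: (2.2, 2.8),
    3: (1.0, 1.0),
    4: (2.2, 2.8),
    5: (3.0, 4.0),
}


def copy_multiplier_range(gc: float) -> Optional[Tuple[float, float]]:
    """Return the (min, max) multiplier, or None for tier 6.

    Tier 6 targets get no multiplier: baits are designed on flanking
    replacement regions instead.
    """
    return TIER_MULTIPLIER.get(gc_tier(gc))


print(gc_tier(35.0), copy_multiplier_range(35.0))
print(gc_tier(95.0), copy_multiplier_range(95.0))
```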
In one embodiment, the bait sequences are 60-150 bp long, preferably 80-120 bp.
In one embodiment, being consistent with the target nucleic acid sequence or specific for the target sequence means that the thermodynamic stability of a bait sequence bound to a non-target region is significantly lower than that of the bait sequence bound to the target region; preferably the difference between the target-region Tm and the non-specific-region Tm is ≥ 5 °C, more preferably ≥ 10 °C. Preferably, Tm values are calculated by the nearest-neighbor method based on the SantaLucia 2007 thermodynamic parameter table.
In one embodiment, the absence of dimer formation means that any dimer formed between any two bait sequences has Tm ≤ 47 °C, preferably ≤ 37 °C. Preferably, Tm values are calculated by the nearest-neighbor method based on the SantaLucia 2007 thermodynamic parameter table.
In one embodiment, the absence of hairpin formation means that any hairpin structure formed by a bait sequence on its own has Tm ≤ 47 °C, preferably ≤ 37 °C. Preferably, Tm values are calculated by the nearest-neighbor method based on the SantaLucia 2007 thermodynamic parameter table.
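The three Tm criteria above (specificity margin, dimer limit, hairpin limit) can be combined into a single acceptance check. The sketch below assumes the Tm values have already been computed by some external tool; the `BaitTm` container and `passes_tm_filters` function are hypothetical names, and no Tm calculation is attempted here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BaitTm:
    """Precomputed melting temperatures (°C) for one candidate bait (hypothetical container)."""
    target_tm: float                                            # bait vs. its target region
    off_target_tms: List[float] = field(default_factory=list)   # bait vs. non-target hits
    dimer_tms: List[float] = field(default_factory=list)        # bait vs. other baits
    hairpin_tm: float = 0.0                                     # bait's best self-structure


def passes_tm_filters(bait: BaitTm, min_margin: float = 5.0,
                      max_structure_tm: float = 47.0) -> bool:
    """Apply the thresholds from the text: margin >= 5 °C, dimer and hairpin Tm <= 47 °C.

    Pass min_margin=10.0 and max_structure_tm=37.0 for the preferred thresholds.
    """
    specific = all(bait.target_tm - tm >= min_margin for tm in bait.off_target_tms)
    no_dimer = all(tm <= max_structure_tm for tm in bait.dimer_tms)
    no_hairpin = bait.hairpin_tm <= max_structure_tm
    return specific and no_dimer and no_hairpin


candidate = BaitTm(target_tm=75.0, off_target_tms=[62.0, 40.0],
                   dimer_tms=[30.0], hairpin_tm=41.0)
print(passes_tm_filters(candidate))                       # default thresholds
print(passes_tm_filters(candidate, 10.0, 37.0))           # preferred thresholds
```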
In one embodiment, for each target region, the bait sequences are the one or more bait sequences with the best composite score for specificity, dimer formation, hairpin structure, and position relative to the target region. The composite score is given by the following scoring function: S = a × S_specificity + b × S_dimer + c × S_hairpin + d × S_distance, where a = 0.26-0.34, b = 0.08-0.12, c = 0.17-0.23, and d = 0.35-0.45. The component scores are calculated as follows:
S_specificity: each newly designed bait sequence is aligned against the genome, and for every sequence it aligns to, the Tm between the bait sequence and that aligned sequence is calculated. The difference between the bait sequence's Tm with the target region and its Tm with any aligned sequence is ≥ 5 °C, preferably ≥ 10 °C. The average Tm between the bait sequence and all aligned sequences is calculated, and S_specificity = 1 - Tm_avg/(Tm_target - 5), preferably S_specificity = 1 - Tm_avg/(Tm_target - 10), where Tm_avg is the average Tm of the bait sequence over all non-specific-region alignment results and Tm_target is the Tm of the bait sequence with the target region;
S_dimer: each newly designed bait sequence is analyzed for dimer alignment against every bait sequence already designed, and for every bait sequence it aligns to, the Tm between the two bait sequences is calculated. That Tm is < 47 °C; the average Tm between the bait sequence and all aligned bait sequences is calculated, and S_dimer = (47 - Tm_avg)/47. Preferably, the Tm is < 37 °C, and S_dimer = (37 - Tm_avg)/37;
S_hairpin: for each bait sequence, its optimal self-alignment structure is computed and the Tm of that structure is calculated. The Tm is < 47 °C, and S_hairpin = (47 - Tm)/47; preferably the Tm is < 37 °C, and S_hairpin = (37 - Tm_avg)/37;
S_distance (relative distance): for given target region coordinates, the coordinate difference δ between each newly designed bait sequence and the target region is calculated; δ is less than 150, and S_distance = (150 - δ)/150.
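The composite score is a weighted sum of four component scores. The sketch below is one way to compute it from precomputed Tm values and a coordinate difference. The default weights are the midpoints of the stated ranges (0.30, 0.10, 0.20, 0.40), which happen to sum to 1; that choice, the function names, and the treatment of an empty alignment list as a perfect score are all assumptions.

```python
from statistics import mean
from typing import Sequence


def s_specificity(target_tm: float, off_target_tms: Sequence[float],
                  margin: float = 5.0) -> float:
    """1 - Tm_avg / (Tm_target - margin); margin is 5 (or 10, preferred)."""
    if not off_target_tms:
        return 1.0  # assumption: no off-target alignments means best score
    return 1.0 - mean(off_target_tms) / (target_tm - margin)


def s_dimer(dimer_tms: Sequence[float], limit: float = 47.0) -> float:
    """(limit - Tm_avg) / limit; limit is 47 (or 37, preferred)."""
    if not dimer_tms:
        return 1.0  # assumption: no dimer alignments means best score
    return (limit - mean(dimer_tms)) / limit


def s_hairpin(hairpin_tm: float, limit: float = 47.0) -> float:
    """(limit - Tm) / limit for the bait's best self-alignment structure."""
    return (limit - hairpin_tm) / limit


def s_distance(delta: float, window: float = 150.0) -> float:
    """(window - delta) / window for the coordinate difference delta."""
    return (window - delta) / window


def composite_score(spec: float, dimer: float, hairpin: float, dist: float,
                    a: float = 0.30, b: float = 0.10,
                    c: float = 0.20, d: float = 0.40) -> float:
    """S = a*S_specificity + b*S_dimer + c*S_hairpin + d*S_distance."""
    return a * spec + b * dimer + c * hairpin + d * dist


score = composite_score(
    s_specificity(75.0, [40.0, 30.0]),
    s_dimer([23.5]),
    s_hairpin(23.5),
    s_distance(75.0),
)
print(f"S = {score:.3f}")
```

With these weights the relative-distance term carries the most influence and the dimer term the least, mirroring the ordering of the ranges in the text.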
In a second aspect, the present invention also provides specific bait sequences for carrying out the method of the invention, the specific bait sequences being the bait sequences referred to in the first aspect of the invention.
In one embodiment, the specific bait sequences are consistent with the target nucleic acid sequence or specific for the target sequence, and: i) they do not form hairpin structures themselves and do not form dimers with one another; ii) their copy number is compensated according to the GC content and/or spatial structure of the target nucleic acid sequence; iii) when the target region is a region of extremely high or extremely low GC content, or a low-complexity region, the regions flanking the target region are used as replacement regions for probe design, with the same design method as for the target region; iv) they do not bind specifically to sequences in the nucleic acid sample other than the target nucleic acid sequence.
In one embodiment, the copy number of the bait sequences is also compensated according to the degree of interest in the target nucleic acid sequence.
In a third aspect, the present invention also provides a kit comprising the bait sequences of the second aspect of the invention; the kit further includes, but is not limited to, double-stranded adapter molecules and a plurality of different oligonucleotide probes.
In one embodiment, the kit includes compositions and reagents for carrying out the method of the first aspect of the invention. The kit includes, but is not limited to, double-stranded adapter molecules, a plurality of different oligonucleotide probes, and bait sequences that are consistent with the target nucleic acid sequence or specific for the target sequence, where the bait sequences: i) do not form hairpin structures themselves and do not form dimers with one another; ii) have a copy number compensated according to the GC content, spatial structure, and/or degree of interest of the target nucleic acid sequence; iii) when the target region is a region of extremely high or extremely low GC content, or a low-complexity region, are designed using the regions flanking the target region as replacement regions for probe design, with the same design method as for the target region; iv) do not bind specifically to sequences in the nucleic acid sample other than the target nucleic acid sequence. In certain embodiments, the kit includes two different double-stranded adapter molecules. The kit may further include one or more other components selected from DNA polymerase, T4 polynucleotide kinase, T4 DNA ligase, hybridization solution, wash solution, and/or elution solution. In certain embodiments, the kit includes a magnet. In certain embodiments, the kit includes one or more enzymes with their corresponding reagents and buffers, for example a restriction enzyme such as MlyI together with the buffers and reagents for carrying out the restriction endonuclease reaction with MlyI.
Specific embodiment
The present invention provides a target sequence enrichment method based on liquid-phase capture, comprising: designing bait sequences; synthesizing the bait sequence nucleic acids (by custom primer synthesis or solid-phase synthesis); preparing nucleic acid analogs by in vitro transcription, the nucleic acid analogs comprising a binding moiety; pre-treating the nucleic acid sample (by library preparation), where the sample may be genomic DNA, RNA, cDNA, mRNA, etc.; forming nucleic acid analog/DNA hybrid complexes between the nucleic acid analogs and the target sequence nucleic acid by complementary base pairing; washing to remove nucleic acid analog/DNA hybrids with low complementarity, thereby removing non-target sequence nucleic acid; and specifically amplifying the complementary nucleic acid analog/DNA hybrids by means of the adapter sequences added during sample pre-treatment, thereby enriching the target sequence nucleic acid.
In the present invention, the term "sample" is used in its broadest sense and is intended to include a specimen or culture obtained from any source, preferably from a biological source. Biological samples may be obtained from animals (including humans) and include liquids, solids, tissues, and gases. Biological samples include blood products such as plasma and serum. Accordingly, "nucleic acid sample" includes nucleic acid from any source (e.g., DNA, RNA, cDNA, mRNA, tRNA, miRNA). Where the nucleic acid sample is RNA or mRNA, step c) is preceded by a step of reverse transcribing the RNA or mRNA into DNA. In this application, the nucleic acid sample is preferably derived from a biological source, such as human or non-human cells or tissues. The term "non-human" means all non-human animals and entities, including but not limited to vertebrates such as rodents, non-human primates, sheep, cattle, ruminants, lagomorphs, pigs, goats, horses, dogs, cats, and birds. Non-human also includes invertebrates and prokaryotes, such as bacteria, plants, yeast, and viruses. Accordingly, the nucleic acid samples used in the methods and systems of the invention are nucleic acid samples from any organism, whether eukaryotic or prokaryotic.
In the present invention, the inventors found that the GC content of a target sequence has a large effect on its capture efficiency in liquid-phase capture. To capture multiple target sequences effectively, the bait sequence copy number is preferably compensated according to the GC content of the target sequence: the lower or higher the GC content, the more the copy number of the corresponding bait sequences is increased.
The inventors found that target sequences with a GC content around 50%, for example within ± 10%, are captured efficiently, whereas target sequences with other GC contents require bait sequence copy number compensation to achieve good capture efficiency. Through comprehensive testing with human genomic sequences, the inventors found that, to achieve better target sequence capture efficiency, the copy number coefficient of a bait sequence at 50% GC content should be taken as the baseline of 1, and for every 1% that the GC content deviates from 50% within the range 10%-90%, the bait sequence copy number coefficient should increase by 0.08-0.12. For example, a GC content of 68% deviates by 18%, so the bait sequence copy number coefficient is 2.44-3.16.
Where the GC content is less than 10% or greater than 90%, or the sequence is a low-complexity sequence, the corresponding bait sequences are designed as follows: when the target region is a region of extremely high or extremely low GC content, or a low-complexity region, the regions flanking the target region are used as replacement regions for probe design; the replacement region is generally chosen within 300 bp on either side of the target region, preferably within 150 bp.
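Selecting the flanking replacement regions is a simple interval calculation. The sketch below assumes 0-based, half-open coordinates and an optional chromosome length for clipping; both conventions and the function name are assumptions, since the text specifies only the 150 bp (preferred) and 300 bp flank sizes.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based, half-open (assumed convention)


def flanking_regions(start: int, end: int, flank: int = 150,
                     chrom_length: Optional[int] = None) -> Tuple[Interval, Interval]:
    """Return the left and right replacement regions next to a target interval.

    `flank` is 150 bp by default (preferred) and may be widened to 300 bp.
    Regions are clipped at position 0 and, if given, at `chrom_length`.
    """
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    left = (max(0, start - flank), start)
    right_end = end + flank
    if chrom_length is not None:
        right_end = min(right_end, chrom_length)
    return left, (end, right_end)


# A hypothetical low-complexity target at positions 1000-1200.
print(flanking_regions(1000, 1200))
print(flanking_regions(1000, 1200, flank=300))
```

Baits would then be designed on the returned intervals with the same design method used for ordinary target regions.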
In the present invention, a low-complexity region is a region composed of very few types of elements (e.g., oligonucleotides), such as a simple repeat sequence like a microsatellite.
In the present invention, a library is preferably constructed from the sample DNA fragments after fragmentation.
In one embodiment, the bait sequence copy number compensation can be expressed simply as follows. Target sequences are divided into 6 tiers according to GC content: tier 1, 10%-30%; tier 2, 30%-40%; tier 3, 40%-60%; tier 4, 60%-70%; tier 5, 70%-90%; tier 6, less than 10% or greater than 90%. The copy number of tier 3 bait sequences is the baseline copy number. The copy number of the bait sequences for tiers 2 and 4 needs to be increased, for example to 2.2-2.8 times that of tier 3; the copy number of the bait sequences for tiers 1 and 5 needs to be increased further, for example to 3-4 times that of tier 3. In one embodiment, for tier 6, where the GC content is less than 10% or greater than 90%, or where the sequence is a low-complexity sequence, the bait sequences are designed using the regions flanking the target region as replacement regions for probe design; the replacement region is generally chosen within 300 bp on either side of the target region, preferably within 150 bp.
In one embodiment, for each target region, the bait sequences are the one or more bait sequences with the best composite score for specificity, dimer formation, hairpin structure, and position relative to the target region. The composite score is given by the following scoring function: S = a × S_specificity + b × S_dimer + c × S_hairpin + d × S_distance, where a = 0.26-0.34, b = 0.08-0.12, c = 0.17-0.23, and d = 0.35-0.45. S_specificity and the other component scores are values between 0 and 1. The specific scoring rules are as follows:
S_specificity scoring rule: each newly designed bait sequence is aligned against the genome using BLAT with default parameters, and the thermodynamic Tm is calculated for each alignment result. If for any alignment the difference between the target-region Tm and the non-specific-region Tm is < 5 °C (preferably < 10 °C), the bait sequence is discarded and redesigned. Otherwise, the average Tm over all non-specific-region alignment results is calculated, and S_specificity = 1 - Tm_avg/(Tm_target - 5), preferably S_specificity = 1 - Tm_avg/(Tm_target - 10), where Tm_avg is the average Tm of the bait sequence over all non-specific-region alignment results and Tm_target is the Tm of the bait sequence with the target region;
S_dimer scoring rule: each newly designed bait sequence is analyzed for dimer alignment against every bait sequence already designed, using BLAT with default parameters, and the thermodynamic Tm is calculated for each alignment result. If any Tm is ≥ 47 °C, the bait sequence is discarded and redesigned; otherwise the average Tm of all alignment results is calculated, and S_dimer = (47 - Tm_avg)/47. Preferably, if any Tm is ≥ 37 °C, the bait sequence is discarded and redesigned; otherwise the average Tm of all alignment results is calculated, and S_dimer = (37 - Tm_avg)/37;
S_hairpin scoring rule: for each bait sequence, its optimal self-alignment structure is computed with the Smith-Waterman algorithm, and the thermodynamic Tm is calculated from that structure. If the Tm is ≥ 47 °C, the bait sequence is discarded and redesigned; otherwise S_hairpin = (47 - Tm)/47. Preferably, if the Tm is ≥ 37 °C, the bait sequence is discarded and redesigned; otherwise S_hairpin = (37 - Tm_avg)/37;
S_distance scoring rule: given the coordinates of the target region to be designed for, the coordinate difference δ between each bait sequence and the target region is calculated. The acceptable difference is set to 150, which is an empirical value. If the difference is greater than 150, the bait sequence is discarded and redesigned; otherwise S_distance = (150 - δ)/150. If no suitable bait sequence can be designed within a coordinate difference of 150 from the target region, the acceptable difference can instead be set to 300, with S_distance = (300 - δ)/300.
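The relative-distance rule includes a fallback: candidates are first scored against a 150 window, and only if none qualifies is the window widened to 300. A minimal sketch of that fallback follows; the function name and the representation of candidates as a list of coordinate differences are assumptions.

```python
from typing import Dict, Optional, Sequence, Tuple


def distance_scores(deltas: Sequence[float],
                    windows: Sequence[float] = (150.0, 300.0)
                    ) -> Tuple[Optional[float], Dict[int, float]]:
    """Score candidate baits by coordinate difference, widening the window if needed.

    `deltas[i]` is the coordinate difference of candidate i from the target
    region. Returns the window actually used and a mapping from candidate
    index to (window - delta) / window for the candidates that were kept.
    Candidates with delta greater than the window are discarded.
    """
    for window in windows:
        kept = {i: (window - d) / window
                for i, d in enumerate(deltas) if d <= window}
        if kept:
            return window, kept
    return None, {}  # nothing usable even in the widest window


window, scores = distance_scores([200.0, 30.0, 160.0])
print(window, scores)            # only the candidate at delta=30 fits the 150 window

window, scores = distance_scores([200.0, 290.0])
print(window, scores)            # falls back to the 300 window
```

Note that scores computed under the two windows are not directly comparable, since the same coordinate difference scores higher under the wider window.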
In the present invention, the calculation of sequence Tm is not limited to a particular method. Tm values calculated by various methods can be used in the invention; the choice of method does not fundamentally reverse the effect of the invention, and only the degree of the effect may differ. Although the nearest-neighbor method based on the SantaLucia 2007 thermodynamic parameter table can be used to calculate Tm, Tm values calculated by other methods can be mapped onto it. A person skilled in the art can compare the Tm values calculated by different methods through simple tests and so make an appropriate choice among them.
In the inventors' experience, suitable bait sequences of the invention can be designed for more than 99% of the target regions in the coding regions of the human genome, which shows that the GC-content tiering and the Tm filtering described above are both reasonable.
In certain embodiments, hybridization between the nucleic acid analog and the target nucleic acid is preferably carried out under stringent conditions sufficient to support hybridization between the nucleic acid analog and DNA, where the nucleic acid analog comprises a linking compound and a region complementary to the target nucleic acid sample, so as to give the nucleic acid analog/DNA hybrid complex. The complex is then captured via the linking compound and washed under conditions sufficient to remove non-specifically bound nucleic acid, after which the hybridized target nucleic acid sequence is eluted from the captured nucleic acid analog/DNA complex.
In certain embodiments, the nucleic acid analog comprises a chemical group or linking compound, for example a binding moiety such as biotin or digoxigenin, that can bind to a solid support. The solid support may comprise the corresponding capture compound, for example streptavidin for biotin or an anti-digoxigenin antibody for digoxigenin. The invention is not limited to the linking compounds used, and alternative linking compounds are equally applicable to the methods, bait sequences, and kits of the invention.
In the present invention, the chemical group or linking compound, for example a binding moiety such as biotin or digoxigenin, may be attached to any base in the nucleic acid analog (glycerol nucleic acid GNA, locked nucleic acid LNA, peptide nucleic acid PNA, threose nucleic acid TNA, or morpholino nucleic acid). Preferably, the nucleic acid analog strand may contain ribose and/or deoxyribose, and the chemical group or linking compound may be attached to a base on the ribose and/or deoxyribose. For example, labeled ATP, CTP, GTP, and/or UTP are used in synthesizing the nucleic acid analog. Methods for labeling nucleotides with labels such as CyDye, DIG, biotin, rhodamine, and fluorescein are known in the art. For example, biotin can be used as a nucleic acid probe label: it can be attached to the carbon atom at the 5 position of the UTP or dUTP of a nucleic acid molecule, and it can be detected through binding to avidin. However, the invention is not limited to known labels and labeling methods; labels and labeling methods discovered in the future are also within the contemplation of the invention.
In embodiments of the invention, the plurality of target nucleic acid molecules preferably comprises the whole genome of an organism, or at least one chromosome, or a nucleic acid molecule of any size. Preferably, the nucleic acid molecule is at least about 200 kb, at least about 500 kb, at least about 1 Mb, at least about 2 Mb, or at least about 5 Mb in size, more preferably about 100 kb to about 5 Mb, about 200 kb to about 5 Mb, about 500 kb to about 5 Mb, about 1 Mb to about 2 Mb, or about 2 Mb to about 5 Mb.
In certain embodiments, the target nucleic acid is from an animal, plant, or microorganism; in preferred embodiments, the target nucleic acid molecules are from humans. If the amount of nucleic acid sample is small (as with human nucleic acid samples obtained in some circumstances, such as the genome of a developing fetus), the nucleic acid may be amplified before carrying out the method of the invention, for example by whole-genome amplification. Prior amplification may be necessary in order to carry out the method of the invention, for example in forensic applications (such as genetic identification in forensic science).
In certain embodiments, the plurality of target nucleic acid molecules is a set of genomic DNA molecules. The bait sequences may be selected from, for example: a plurality of bait sequences defining a plurality of exons, introns, or regulatory sequences from a plurality of genetic loci; a plurality of bait sequences defining the complete sequence of at least one single genetic locus, the locus being of any size, preferably at least 1 Mb or at least one of the sizes specified above; a plurality of bait sequences defining single nucleotide polymorphisms (SNPs); or a plurality of bait sequences defining an array, such as a tiling array designed to capture the complete sequence of at least one complete chromosome.
Herein, the term "hybridization" means the pairing of complementary nucleic acids. Hybridization and the strength of hybridization (i.e., the strength of binding between nucleic acids) are influenced by many factors, such as the degree of complementarity between the nucleic acids, the stringency of the hybridization conditions used, the melting temperature (Tm) of the hybrid formed, and the GC content of the nucleic acids. Although the invention is not limited to particular hybridization conditions, stringent hybridization conditions are preferably used. Stringent hybridization conditions depend on the sequence and vary with the hybridization parameters (e.g., salt concentration, presence of organic substances). In general, "stringent" conditions are selected to be about 5 °C to about 20 °C below the Tm of the specific nucleic acid sequence at a defined ionic strength and pH. Preferably, stringent conditions are about 5 °C to 10 °C below the melting temperature of the specific nucleic acid bound to its complementary nucleic acid. The Tm is the temperature (at a defined ionic strength and pH) at which 50% of a nucleic acid (e.g., the target nucleic acid) hybridizes to a perfectly matched probe.
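The stringent temperature window described above is a fixed offset below Tm. The short sketch below computes that window for a given Tm; the function name is hypothetical, and the Tm itself is assumed to come from an external calculation.

```python
from typing import Tuple


def stringent_temperature_range(tm_celsius: float,
                                low_offset: float = 5.0,
                                high_offset: float = 20.0) -> Tuple[float, float]:
    """Return (coolest, warmest) stringent hybridization temperatures in °C.

    By default the window is Tm - 20 °C to Tm - 5 °C; set high_offset=10.0
    for the preferred window of Tm - 10 °C to Tm - 5 °C.
    """
    if low_offset > high_offset:
        raise ValueError("low_offset must not exceed high_offset")
    return tm_celsius - high_offset, tm_celsius - low_offset


# Hypothetical hybrid with Tm = 75 °C.
print(stringent_temperature_range(75.0))                    # general window
print(stringent_temperature_range(75.0, high_offset=10.0))  # preferred window
```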
Herein, "stringent conditions" may be, for example: hybridization at 42 °C in 50% formamide, 5× SSC (0.75 M NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5× Denhardt's solution, sonicated salmon sperm DNA (50 mg/ml), 0.1% SDS, and 10% dextran sulfate; washing at 42 °C with 0.2× SSC (sodium chloride/sodium citrate) and at 55 °C with 50% formamide; followed by washing at 55 °C with 0.1× SSC containing EDTA. As another example, a buffer containing 35% formamide, 5× SSC, and 0.1% (w/v) sodium dodecyl sulfate (SDS) is expected to be suitable for hybridization at 45 °C for 16-72 hours under moderately non-stringent conditions.
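The SSC concentrations in these conditions scale linearly. Since 5× SSC is given as 0.75 M NaCl and 0.075 M sodium citrate, 1× SSC corresponds to 0.15 M and 0.015 M, and the wash buffers follow by multiplication. The sketch below performs that conversion; the function and constant names are assumptions.

```python
from typing import Dict

# 1x SSC, derived from the 5x values given in the text (0.75 M / 5, 0.075 M / 5).
SSC_1X_MOLAR: Dict[str, float] = {"NaCl": 0.15, "sodium citrate": 0.015}


def ssc_molarity(fold: float) -> Dict[str, float]:
    """Molar concentration of each SSC component at the given fold strength."""
    if fold < 0:
        raise ValueError("fold strength must be non-negative")
    return {component: conc * fold for component, conc in SSC_1X_MOLAR.items()}


for fold in (5, 0.2, 0.1):
    conc = ssc_molarity(fold)
    print(f"{fold}x SSC: {conc['NaCl']:.4g} M NaCl, "
          f"{conc['sodium citrate']:.4g} M sodium citrate")
```

The lower-salt washes (0.2× and 0.1× SSC) are the more stringent steps, because reduced ionic strength destabilizes imperfectly matched hybrids.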
Herein, the term "primer" means an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, that can act as a point of initiation of synthesis when placed under conditions that induce synthesis of a primer extension product complementary to a nucleic acid strand (e.g., in the presence of nucleotides and an inducing agent such as DNA polymerase, and at a suitable temperature and pH). The primer is preferably single-stranded for maximum amplification efficiency. Preferably, the primer is an oligodeoxynucleotide. The primer must be long enough to prime the synthesis of extension products in the presence of the inducing agent. The exact length of the primer depends on many factors, including temperature, primer source, and the method used.
Herein, the term "bait" or "bait sequence" means an oligonucleotide (e.g., a nucleotide sequence), whether naturally occurring (purified or obtained after digestion) or produced by synthesis, recombination or PCR amplification, that can hybridize with at least a portion of another oligonucleotide of interest, such as a target nucleic acid sequence. A probe may be single-stranded or double-stranded. Probes can be used for the detection, identification and isolation of specific gene sequences.
Herein, the term "target nucleic acid molecule" refers to a molecule or sequence from a target genomic region. The preselected probes determine the range of target nucleic acid molecules. The "target" is therefore intended to be distinguished from other nucleic acid sequences. A "segment" is defined as a nucleic acid region within the target sequence, such as a "segment" or a "portion" of a nucleic acid sequence.
Herein, the term "isolated", when used in relation to a nucleic acid, as in "isolated nucleic acid", means a nucleic acid sequence that has been identified and separated from at least one other component or contaminant with which it is ordinarily associated in its natural source. An isolated nucleic acid exists in a form different from that in which it occurs in nature. In contrast, non-isolated nucleic acids such as DNA and RNA exist in the state in which they occur in nature. The isolated nucleic acid, oligonucleotide or polynucleotide may exist in single-stranded or double-stranded form.
Herein, the term "bait sequence consistent with the target nucleic acid sequence" refers to a sequence whose complementary sequence can hybridize with the target nucleic acid sequence. Hybridization is preferably carried out under stringent conditions. When the target region has extremely high or extremely low GC content, or when the target region is a low-complexity region, no bait sequence can be designed for the region itself, i.e. the bait coverage is zero; in that case a suitable region for bait design can be sought on either side of the target region. Bait sequences can generally be designed within 300 bp on either side, preferably within 150 bp.
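The flanking-region fallback described above can be sketched as interval arithmetic. This is only an illustration under stated assumptions: the function name `candidate_windows`, the 0-based half-open coordinate convention, and the choice to try the preferred 150 bp zone before the outer 150-300 bp zone are all hypothetical; chromosome-end clipping on the right side is not handled.

```python
def candidate_windows(start, end):
    """List flanking windows for a target region [start, end).

    Returns two (left, right) pairs of half-open intervals:
    first the preferred zone within 150 bp of the target,
    then the outer zone between 150 bp and 300 bp away.
    """
    inner_left = (max(0, start - 150), start)
    inner_right = (end, end + 150)
    outer_left = (max(0, start - 300), max(0, start - 150))
    outer_right = (end + 150, end + 300)
    return [(inner_left, inner_right), (outer_left, outer_right)]


# Example: a target occupying positions 1000-1100.
for left, right in candidate_windows(1000, 1100):
    print(left, right)
```

A designer would search the first pair for an acceptable bait and fall back to the second pair only if none is found.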
In embodiments of the invention, the primers used to transcribe the bait sequences employed in the capture methods and kits described herein comprise a linking compound, such as a binding moiety. Binding moieties include any moiety attached to or introduced at the 5' end of an amplification primer for subsequent capture of the nucleic acid analog/target nucleic acid hybridization complex. A binding moiety may be any sequence introduced at the 5' end of a primer sequence, such as a capturable 6-histidine (6HIS) sequence. For example, a primer comprising a 6HIS sequence can be captured by nickel, for example in nickel-coated tubes, microwells or purification columns, or in those containing nickel-coated beads, particles and the like, where the beads are packed into a column and the sample is loaded and passed through the column to capture the reduced-complexity complex (e.g., for subsequent target elution). Another example of a binding moiety for embodiments of the invention is a hapten, such as digoxigenin, attached for example to the 5' end of the amplification primer. Digoxigenin can be captured with an anti-digoxigenin antibody, for example with a matrix coated with or comprising an anti-digoxigenin antibody.
In certain embodiments, the binding moiety is biotin, and the capture matrix, for example beads such as paramagnetic particles, is coated with streptavidin and used to separate the target nucleic acid/transcription product complex from non-specifically hybridized target nucleic acids. For example, when biotin is the binding moiety, a streptavidin (SA)-coated matrix, such as SA-coated beads (e.g., magnetic beads/particles), is used to capture the biotin-labeled nucleic acid analog/target complex. The SA-bound complex is washed, and the hybridized target nucleic acid is eluted from the complex for sequencing.
Maskless array synthesis technology can be used to provide, in parallel on a solid support, bait sequences corresponding to at least one region of the genome. Alternatively, probes can be obtained serially using a standard DNA synthesizer and applied to the solid support, or can be obtained from an organism and immobilized on the solid support. After hybridization, nucleic acids that are not hybridized or are non-specifically hybridized to the nucleic acid analogs are separated from the support-bound nucleic acid analogs by washing. The remaining nucleic acids, which are specifically bound to the nucleic acid analogs, are eluted from the solid support, for example in hot water or in a nucleic acid elution buffer containing, for example, TRIS buffer and/or EDTA, to produce an eluate enriched in target nucleic acid molecules.
Alternatively, the bait sequences for the target molecules can be synthesized on a solid support as described above, then released from the solid support as a bait sequence pool and amplified. The released, transcribed nucleic acid analog pool can be covalently or non-covalently immobilized on a support, such as glass, metal, ceramic or polymer beads or another solid support. The nucleic acid analogs may be designed for convenient release from the solid support, for example by providing an acid-labile or base-labile nucleic acid sequence at or near the end of the nucleic acid analog closest to the support, so that the nucleic acid analog is released under low or high pH conditions, respectively. A variety of cleavable linking compounds are known in the art. The support may, for example, be provided as a column with a liquid inlet and outlet. Methods for attaching nucleic acids to a support are well known in the art, for example incorporating biotin-labeled nucleotides into the nucleic acid analogs and coating the support with streptavidin, so that the coated support non-covalently attracts and immobilizes the nucleic acid analogs in the pool. The sample is passed over the support comprising the nucleic acid analogs under hybridization conditions, so that target nucleic acid molecules hybridized to the immobilized support can be eluted for later analysis or other purposes.
The term "nucleic acid" may include, for example but without limitation: deoxyribonucleic acid (DNA), ribonucleic acid (RNA) and artificial nucleic acids such as peptide nucleic acid (PNA), morpholino nucleic acid (morpholino), locked nucleic acid (LNA), glycol nucleic acid (GNA) and threose nucleic acid (TNA). Herein, the terms "nucleic acid", "nucleic acid sequence" and "nucleic acid molecule" should be interpreted broadly, and may be, for example, an oligomer or polymer of ribonucleic acid (RNA) or deoxyribonucleic acid (DNA) or mimetics thereof. The terms include molecules composed of natural nucleobases, sugars and covalent internucleoside (backbone) linkages, as well as molecules with similar functions composed of non-natural nucleobases, sugars and covalent internucleoside (backbone) linkages, or combinations thereof. Because of desirable properties, such as enhanced affinity for nucleic acid target molecules and increased stability in the presence of nucleases and other enzymes, such modified or substituted nucleic acids may be preferred over the native forms, and are described herein by the terms "nucleic acid analog" or "nucleic acid mimic". Preferred examples of nucleic acid mimics are molecules comprising peptide nucleic acid (PNA), locked nucleic acid (LNA), xylo-locked nucleic acid (xylo-LNA), phosphorothioate, 2'-methoxy, 2'-methoxyethoxy, morpholino nucleic acid and phosphoramidate, or functionally similar nucleic acid derivatives.
Examples
Example 1: Design of bait sequences
1000 sites were randomly selected from exons and introns of the human genome (the distribution of these sites is shown in Table 1) to test the method of the invention. Bait sequences were designed for these 1000 random target sequences for use in the subsequent tests.
Table 1: Chromosome distribution of the 1000 randomly selected sites
Chromosome | Number | Chromosome | Number |
chr1 | 92 | chr12 | 73 |
chr2 | 67 | chr13 | 23 |
chr3 | 53 | chr14 | 15 |
chr4 | 43 | chr15 | 29 |
chr5 | 45 | chr16 | 41 |
chr6 | 124 | chr17 | 36 |
chr7 | 42 | chr18 | 14 |
chr8 | 46 | chr19 | 31 |
chr9 | 34 | chr20 | 21 |
chr10 | 61 | chr21 | 9 |
chr11 | 80 | chr22 | 21 |
The design of the bait sequences comprises the following steps:
1. First, the characteristics of the target sequences are analyzed, as follows:
A) The target sequences are divided into 5 grades according to their GC content: grade 1, 10%-30%; grade 2, 30%-40%; grade 3, 40%-60%; grade 4, 60%-70%; grade 5, 70%-90%;
B) The spatial structure of each target sequence is analyzed, and target sequences that can form a stable spatial structure are marked;
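The GC grading in step 1 A) can be sketched as follows. This is a minimal illustration, not the inventors' software: the function names `gc_content` and `gc_grade` are hypothetical, the treatment of the shared boundaries (lower bound inclusive, and 90% included in grade 5) is an assumption because the text lists overlapping endpoints, and sequences outside 10%-90% are returned as ungraded.

```python
def gc_content(seq):
    """Return the GC fraction (0.0-1.0) of a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_grade(seq):
    """Return the GC grade (1-5) of a target sequence, or None if ungraded."""
    gc = gc_content(seq) * 100.0
    # (lower bound, upper bound, grade); boundary handling is an assumption.
    bounds = [(10, 30, 1), (30, 40, 2), (40, 60, 3), (60, 70, 4), (70, 90, 5)]
    for low, high, grade in bounds:
        if low <= gc < high or (grade == 5 and gc == high):
            return grade
    return None  # GC content below 10% or above 90%


print(gc_grade("ATGC"))        # 50% GC -> grade 3
print(gc_grade("AAAAAAAAGC"))  # 20% GC -> grade 1
```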
2. Second, criteria and scoring are established for the bait sequences:
A) The target sequence length is in the range of 60-150 bp;
B) Specificity is maintained. The principle of specificity is that the thermodynamic stability of a bait sequence bound to a non-target region is markedly lower than that of the bait bound to the target region. The index generally analyzed is $T_m(\text{target region}) - T_m(\text{non-specific region}) \geq 5$ °C; for part of the data, $T_m(\text{target region}) - T_m(\text{non-specific region}) \geq 10$ °C is used as a comparison (strong specificity restriction). Different thermodynamic calculation methods strongly affect the calculated result; here the calculation uses the nearest-neighbor algorithm based on the SantaLucia 2007 thermodynamic parameter table;
C) No secondary structure is generated. Secondary structure includes dimers and hairpin structures, i.e. the designed bait sequences are not allowed to form dimers or hairpins. For a dimer formed between any two bait sequences, $T_m \leq 47$ °C; for part of the data, $T_m \leq 37$ °C is used as a comparison (strict dimer restriction). For a hairpin formed by any bait sequence itself, $T_m \leq 47$ °C; for part of the data, $T_m \leq 37$ °C is used as a comparison (strict hairpin restriction). Different thermodynamic calculation methods strongly affect the calculated result; here the calculation uses the nearest-neighbor algorithm based on the SantaLucia 2007 thermodynamic parameter table;
D) For each target region, the candidate bait sequences are analyzed. A composite score is computed from the specificity, dimer and hairpin properties of each candidate sequence and its position relative to the target region, and the one or more best bait sequences (i.e. those with the largest scoring function value) are selected according to the score: $S = a \times S_{\text{specificity}} + b \times S_{\text{dimer}} + c \times S_{\text{hairpin}} + d \times S_{\text{distance}}$, where $a = 0.26$-$0.34$, $b = 0.08$-$0.12$, $c = 0.17$-$0.23$ and $d = 0.35$-$0.45$. The scores are calculated by in-house software according to the following rules:
Scoring rule for $S_{\text{specificity}}$: each newly designed bait sequence is aligned against the genome using BLAT with default parameters, and the thermodynamic Tm is calculated for each alignment result. If the difference between the target-region Tm and any non-specific-region Tm is < 5 °C, the bait sequence is discarded and redesigned (for part of the data, < 10 °C is used as a comparison). Otherwise the average Tm of all alignment results is calculated, and finally $S_{\text{specificity}} = 1 - T_{m,\text{mean}} / (T_{m,\text{target}} - 5)$; for part of the data, $S_{\text{specificity}} = 1 - T_{m,\text{mean}} / (T_{m,\text{target}} - 10)$ is used as a comparison. Here $T_{m,\text{mean}}$ is the average Tm of the bait sequence over all non-specific-region alignment results, and $T_{m,\text{target}}$ is the Tm of the bait sequence with the target region;
Scoring rule for $S_{\text{dimer}}$: each newly designed bait sequence is compared for dimer formation against every bait sequence already designed, using BLAT with default parameters, and the thermodynamic Tm is calculated for each alignment result. If any $T_m \geq 47$ °C, the bait sequence is discarded and redesigned; otherwise the average Tm of all alignment results is calculated, and finally $S_{\text{dimer}} = (47 - T_{m,\text{mean}}) / 47$. For part of the data, as a comparison, the bait sequence is discarded and redesigned if any $T_m \geq 37$ °C; otherwise the average Tm of all alignment results is calculated and $S_{\text{dimer}} = (37 - T_{m,\text{mean}}) / 37$;
Scoring rule for $S_{\text{hairpin}}$: for each bait sequence, its optimal self-alignment structure is computed with the Smith-Waterman algorithm, and the thermodynamic Tm of that structure is calculated. If $T_m \geq 47$ °C, the bait sequence is discarded and redesigned; otherwise $S_{\text{hairpin}} = (47 - T_m) / 47$. For part of the data, as a comparison, the bait sequence is discarded and redesigned if $T_m \geq 37$ °C; otherwise $S_{\text{hairpin}} = (37 - T_{m,\text{mean}}) / 37$;
Scoring rule for $S_{\text{distance}}$: given the coordinates of the target region to be designed, the coordinate difference $\delta_{\text{distance}}$ between each bait sequence and the target region is calculated. The acceptable difference is set to 150, which is an empirical value. If the difference is greater than 150, the bait sequence is discarded and redesigned; otherwise $S_{\text{distance}} = (150 - \delta_{\text{distance}}) / 150$. Where no suitable bait sequence can be designed within a coordinate difference of 150 from the target region, the difference is set to 300 for that part of the data as a comparison, with $S_{\text{distance}} = (300 - \delta_{\text{distance}}) / 300$.
3. Third, the bait copy number is compensated according to the specific situation of each target region:
A) According to the stability classification of the target sequences, the copy number of the grade 3 bait sequences is taken as the reference copy number (i.e. baseline 1); the bait sequences for grades 1 and 5 need more copies, 2.5 times that of grade 3; next, the bait sequences for grades 2 and 4 also need somewhat more copies, 3.5 times that of grade 3;
B) For target sequences that form a stable spatial structure, the bait copy number is doubled;
C) When the target region is a region of particular interest, for example a region where a fusion event may occur, the bait copy number is doubled;
D) In addition, a parallel test without bait copy-number compensation is carried out under the same conditions as a control.
4. Finally, a probe may be impossible to design for a target sequence, for example when the target region has extremely high or extremely low GC content, or when the target region is a low-complexity region (a low-complexity region is a region composed of very few kinds of elements, such as oligonucleotides, for example simple repeats such as microsatellites). Because no bait sequence can be designed for such a region, i.e. the bait coverage is zero, a suitable region for bait design can be sought on either side of the target region. Bait sequences are generally designed within 300 bp on either side; if a suitable bait sequence can be designed within 150 bp, this is recorded as a comparison. In this example, 138 of the randomly selected target sequences fell into this category: for 68 of them bait sequences were successfully designed within 150 bp on either side, for a further 22 bait sequences were successfully designed at 150-300 bp on either side, and for the remaining 48 no probe could be designed in any of these regions.
5. The final bait sequence design results are shown in Table 2.
Table 2: Bait sequence design results
The conditions of the strict scoring function restriction are: difference between the target-region Tm and the non-specific-region Tm $\geq 10$ °C, $S_{\text{specificity}} = T_{m,\text{mean}} / 37$; $T_m < 37$ °C, $S_{\text{dimer}} = (37 - T_{m,\text{mean}}) / 37$; $T_m < 37$ °C, $S_{\text{hairpin}} = (37 - T_{m,\text{mean}}) / 37$.
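The composite scoring rule of step 2 D) in Example 1 (default thresholds 5 °C, 47 °C and 150) can be sketched as follows. This is a hedged illustration rather than the in-house software mentioned in the text: the function name `composite_score` and its parameters are hypothetical, the Tm values are assumed to have been computed beforehand (for example by a nearest-neighbor method), the weights are the midpoints of the stated ranges, and treating an empty list of off-target or dimer hits as an average Tm of 0 is an assumption.

```python
from statistics import mean


def composite_score(tm_target, offtarget_tms, dimer_tms, hairpin_tm, distance,
                    weights=(0.30, 0.10, 0.20, 0.40),
                    spec_margin=5.0, tm_cap=47.0, max_distance=150.0):
    """Return the composite score S, or None if the bait must be discarded."""
    # Rejection rules: any violation means discard and redesign.
    if any(tm_target - tm < spec_margin for tm in offtarget_tms):
        return None
    if any(tm >= tm_cap for tm in dimer_tms):
        return None
    if hairpin_tm >= tm_cap:
        return None
    if distance > max_distance:
        return None

    # Assumption: with no hits, the average Tm is taken as 0.
    tm_off_mean = mean(offtarget_tms) if offtarget_tms else 0.0
    tm_dimer_mean = mean(dimer_tms) if dimer_tms else 0.0

    s_spec = 1.0 - tm_off_mean / (tm_target - spec_margin)
    s_dimer = (tm_cap - tm_dimer_mean) / tm_cap
    s_hairpin = (tm_cap - hairpin_tm) / tm_cap
    s_dist = (max_distance - distance) / max_distance

    a, b, c, d = weights
    return a * s_spec + b * s_dimer + c * s_hairpin + d * s_dist


# Illustrative values only (not data from the patent).
score = composite_score(75.0, [35.0], [23.5], 23.5, 0.0)
print(round(score, 3))  # 0.7
```

Passing `spec_margin=10.0` and `tm_cap=37.0` would give the stricter comparison variants; candidates for one target region would then be ranked by the returned value.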
Example 2: Preparation of bait sequences
The bait sequences designed in Example 1 are prepared as follows:
1. A specific sequence 20 bases long is added to each of the 5' end and the 3' end of the bait sequence. The design principles for the specific sequences are: 1) no non-specific amplification product is generated on the target (to-be-captured) genome; 2) the GC content is between 30% and 70%, preferably between 40% and 60%; 3) the two sequences do not form a dimer, or the dimer formed has a free energy ≤ 47 °C, preferably ≤ 37 °C. This gives the sequence to be synthesized. All bait sequences use the same pair of specific sequences, exemplified as follows:
5' end specific sequence - bait sequence (60-150 bp) - 3' end specific sequence (SEQ ID NO.1):
ATATAGATGCCGTCCTAGCG-NNNNNNNNNN......NNNNNNNNNN-TGGGCACAGGAAAGATACTT.
Here "NNNNNNNNNN......NNNNNNNNNN" denotes the bait sequence.
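Assembling the sequence to be synthesized from a bait and the two flanking specific sequences is plain string concatenation. The sketch below is an illustration only: the function names `build_oligo` and `gc_percent` are hypothetical, the two 20-base flanks are copied from SEQ ID NO.1 above, and the length and alphabet checks reflect the 60-150 bp range stated in the text.

```python
FIVE_PRIME_SPECIFIC = "ATATAGATGCCGTCCTAGCG"   # 5' end specific sequence
THREE_PRIME_SPECIFIC = "TGGGCACAGGAAAGATACTT"  # 3' end specific sequence


def gc_percent(seq):
    """Return the GC content of a sequence as a percentage."""
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def build_oligo(bait):
    """Return 5' specific sequence + bait + 3' specific sequence."""
    bait = bait.upper()
    if not 60 <= len(bait) <= 150:
        raise ValueError("bait length must be 60-150 bp")
    if set(bait) - set("ACGT"):
        raise ValueError("bait must contain only A, C, G, T")
    return FIVE_PRIME_SPECIFIC + bait + THREE_PRIME_SPECIFIC


# Placeholder 60 bp bait used purely for illustration.
oligo = build_oligo("ACGT" * 15)
print(len(oligo))                        # 100
print(gc_percent(FIVE_PRIME_SPECIFIC))   # 50.0
print(gc_percent(THREE_PRIME_SPECIFIC))  # 45.0
```

Both flanks fall within the preferred 40%-60% GC range given in design principle 2).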
2. The specific sequences are generated by the probe design software for solution hybridization capture sequencing independently developed by the inventors.
3. The sequences to be synthesized are synthesized on a large scale as oligonucleotides by a chip method well known in the art. The oligonucleotides are then eluted from the chip with ammonia water, purified and dissolved in distilled water to form an oligonucleotide pool.
4. Using the oligonucleotide pool as template, and the 5' end primer and 3' end primer complementary to the 5' end specific sequence and the 3' end specific sequence as primers, polymerase chain reaction amplification is carried out with Taq polymerase (JumpStart Taq DNA Polymerase, purchased from Sigma, Catalog No. D6558) to obtain a large double-stranded DNA pool. The specific steps are as follows:
1) The reaction system is as follows:
2) The reaction conditions are as follows:
3) The PCR product is purified with the QIAGEN PCR purification kit (QIAGEN, Cat No./ID 28104) according to its operating manual;
4) Using the 5' end primer carrying a T7 sequence (TAATACGACTCACTATAGGG) at its 5' end as the forward primer and the 3' end primer as the reverse primer, polymerase chain reaction amplification is carried out with Taq polymerase (JumpStart Taq DNA Polymerase, purchased from Sigma, Catalog No. D6558) to form a double-stranded DNA pool carrying the T7 sequence at the 5' end. The procedure is as follows:
5) Reaction system:
Reagent | Volume |
Water | 37 μl |
10×PCR Buffer | 5 μl |
10 mM dATP | 1 μl |
10 mM dCTP | 1 μl |
10 mM dGTP | 1 μl |
10 mM TTP | 1 μl |
BAITS_5_PRIMER_N-T7 (10 μM) | 1 μl |
BAITS_3_PRIMER_N (10 μM) | 1 μl |
JumpStart Taq DNA Polymerase | 1 μl |
Oligonucleotide pool | 1 μl |
6) The reaction conditions are as follows:
The PCR product of the previous step is separated by gel electrophoresis, non-specific bands are removed, and fragments in the 120-210 bp region are recovered and purified with the Qiagen gel extraction kit (QIAquick Gel Extraction Kit, Cat No./ID 28704);
7) Using the T7 High Yield RNA Transcription Kit (Vazyme, TR101-01/02), with NTPs of a nucleic acid analog (glycol nucleic acid GNA, locked nucleic acid LNA, peptide nucleic acid PNA, threose nucleic acid TNA or morpholino nucleic acid) and biotin-labeled UTP as substrates, the gel-purified product of the previous step is transcribed in vitro to prepare a biotin-labeled nucleic acid analog pool:
The reaction is incubated at 37 °C for 8-12 hours to obtain the maximum yield of the nucleic acid analog pool, which is purified, diluted to 500 ng/μl and stored in a -80 °C freezer.
In addition, a parallel test with the standard nucleotides ATP, CTP, GTP, UTP and Biotin-UTP is carried out under the same conditions as a control.
Example 3: Capture of the target region library
1. Preparation of the DNA library for high-throughput capture sequencing:
1) 1 μg of genomic DNA of the tested species is taken and randomly sheared into small fragments of 150-250 bp with a Bioruptor Pico sonicator;
2) The pre-capture small-fragment library is prepared with the Illumina TruSeq DNA library preparation kit.
2. Hybridization capture of the target region library is carried out using the prepared nucleic acid analog pool and the small-fragment library of the target species:
1) Preparation of blocking primers:
The primers are synthesized according to the above primer sequences, 100 OD of each; each primer is diluted to 1000 μM and the primers are mixed in equal volumes, named Block 1;
2) Cot-1 DNA and salmon sperm DNA are diluted to 100 ng/μl and mixed in equal volumes, labeled Block 2;
3) 6 μl of Block 1 and 5 μl of Block 2 are mixed, labeled Block Mix;
4) 1 μg of the small-fragment genomic library is mixed with 11 μl of Block Mix and concentrated to 9 μl in a freeze-drying centrifuge, labeled reagent S1, and placed on ice until use;
6) 20 μl of hybridization solution (20×SSPE, 2×Denhardt's, 1 mM EDTA, 1% SDS) is preheated on a 65 °C metal bath, labeled S2;
7) 2 μl of the 500 ng/μl nucleic acid analog pool is added to 5 μl of pure water and mixed by slowly pipetting several times, labeled S3, and placed on ice until use;
8) The PCR instrument is set to: 95 °C, 5 min; 65 °C, 16 h; 65 °C, hold; heated lid 105 °C;
9) S1 is placed in the PCR block and the program is started. When the program has run at 65 °C for 5 min, S2 is placed in the PCR block; after a further 5 min of incubation, S3 is placed in the PCR block and incubation is continued for 2 min;
10) The pipette is set to 13 μl; 13 μl of S2 is transferred to S3, then 9 μl of S1 is transferred to S3, and the mixture is mixed thoroughly by slowly pipetting several times. The tube cap is sealed, the heated lid of the PCR instrument is closed, and the mixture is incubated for 16 hours for hybridization of the probes with the library;
11) 50 μl of Dynabeads MyOne Streptavidin T1 (Invitrogen, Cat. No. 65601) is placed in a 1.5 ml low-binding centrifuge tube, 200 μl of binding buffer [0.5 M NaCl (Ambion, Cat. No. AM9760G), 2 mM Tris-HCl, pH 8.0 (Ambion, Cat. No. AM9855G), 0.2 mM EDTA (Ambion, Cat. No. AM9260G)] is added, the beads are mixed by pipetting, the tube is placed on a magnetic rack for 1 min, and the supernatant is removed;
12) The centrifuge tube is removed from the magnetic rack, 200 μl of binding buffer is added, the beads are mixed by pipetting, the tube is placed on the magnetic rack for 1 min, and the supernatant is removed;
13) Step 11 is repeated twice, for 3 bead washes in total, and the beads are finally resuspended in 200 μl of binding buffer;
14) The probe/library hybridization mixture (product of step 9) is transferred into the resuspended beads, the tube cap is sealed, and the tube is placed on a rotary mixer to mix and bind for 30 min;
15) The centrifuge tube is placed on the magnetic rack for 2 min and the supernatant is removed;
16) The centrifuge tube is removed from the magnetic rack, 200 μl of wash solution 1 [10×SSC (Ambion, Cat. No. AM9763), 1% SDS (Invitrogen, Cat. No. 24730020)] is added to resuspend the beads, the tube cap is sealed, and the tube is placed on the rotary mixer and washed for 10 min;
17) The centrifuge tube is placed on the magnetic rack for 2 min and the supernatant is removed;
18) The centrifuge tube is removed from the magnetic rack, 200 μl of wash solution 2 [1×SSC (Ambion, Cat. No. AM9763), 5% SDS (Invitrogen, Cat. No. 24730020)] preheated to 65 °C is added to resuspend the beads, and the tube is incubated in the PCR block at 65 °C for 10 min;
19) The centrifuge tube is placed on the magnetic rack for 2 min and the supernatant is removed;
20) Steps 17-18 are repeated twice, for 3 washes in total;
21) 200 μl of 80% ethanol solution is added to the centrifuge tube and left to stand for 30 s, all the ethanol is removed, the beads are air-dried at room temperature for 2 min, and 20 μl of pure water is added and slowly pipetted several times to resuspend the beads;
3. PCR enrichment of the target region capture product, using the NEB high-fidelity PCR kit (High-Fidelity PCR Kit, New England Biolabs, Catalog #E0553S):
1) Reaction system:
Reagent | Volume |
5×Phusion HF | 10 μl |
10 mM dNTPs | 1 μl |
Post Primer Mix (10 μM each) | 1 μl |
Resuspended magnetic beads (step 20) | 20 μl |
Phusion DNA polymerase | 0.5 μl |
H2O | 17.5 μl |
2) The reaction conditions are as follows:
3) The PCR product is purified with the Beckman Agencourt AMPure XP Kit [Beckman (p/n A63880)];
4) The target region capture library is subjected to high-throughput sequencing on an Illumina sequencing platform; PE150 mode is recommended for the read length.
4. Results
1) The sequencing library was sequenced on an Illumina HiSeq 4000 high-throughput sequencer to obtain sequencing data for the 1000 sites;
2) Using the BWA MEM software, the sequencing data were aligned to the human reference genome HG19 with the parameters: bwa mem -M -k 40 -t 8 -R "@RG\tID:Hiseq\tPL:Illumina\tSM:sample", to obtain the single nucleotide polymorphisms, insertions or deletions that differ from the reference genome, i.e. the detected gene mutations.
3) The samtools stats tool in samtools-1.2 was used to compute the data size, mapping rate, duplication rate and quality values, and the samtools depth tool in the same software was then used to calculate the sequencing depth at each position in the target region;
4) From the sequencing depth at each position in the target region, the numbers of bases with sequencing depth ≥ 1, ≥ 4, ≥ 10 and ≥ 20 were counted, and each number was divided by the total number of bases in the target region to obtain the 1× coverage, 4× coverage, 10× coverage and 20× coverage.
Table 3: Capture sequencing results for the 1000 sites
As can be seen from Table 3 above, taking LNA as an example, the mean depth is 451.53×, the 4× coverage is 94.35% and the 20× coverage is still 93.64%, showing good coverage and uniformity, while the total amount of data is only 8.52 Mb of reads. The beneficial effects of such a result are: 1) the amount of sequencing is small, which effectively reduces cost; 2) the average sequencing depth is high, i.e. each target site is sequenced many times, so data accuracy is high; 3) the coverage is high, so few sites are missed; 4) the uniformity is good, i.e. most sites have a similar depth of coverage.
According to the analysis of the comparison data subsets and the control data, compared with LNA: without bait copy-number compensation, coverage and uniformity decreased by 4.5 and 5.1 percentage points respectively; under the strong specificity restriction, strict dimer restriction, strict hairpin restriction and strict scoring function restriction, coverage and uniformity increased by 6.3 and 7.8 percentage points respectively; coverage and uniformity for regions within 150 bp were 2.3 and 3.8 percentage points higher, respectively, than for regions at 150-300 bp; and in the parallel test with the standard nucleotides ATP, CTP, GTP, UTP and Biotin-UTP at the same ratio, coverage and uniformity were reduced by 5.3 and 4.8 percentage points respectively.
Although the invention has been described with reference to preferred embodiments, it is to be understood that the scope of protection of the invention is not limited to the embodiments described herein. Other embodiments of the invention will be readily apparent to and understood by those skilled in the art from the description and practice of the invention disclosed here. The description and examples are to be regarded as exemplary only; the true scope and spirit of the invention are defined by the claims.
Claims (10)
1. A method for enriching target sequence nucleic acids from a nucleic acid sample, the method comprising:
a) providing a nucleic acid sample comprising a target nucleic acid sequence, and bait sequences that are consistent with the target nucleic acid sequence or characteristic of the target sequence, wherein, for each target region, the bait sequences are the one or more bait sequences with the best composite score in terms of specificity, dimer, hairpin structure and position relative to the target region, the composite score being computed by the following scoring function: $S = a \times S_{\text{specificity}} + b \times S_{\text{dimer}} + c \times S_{\text{hairpin}} + d \times S_{\text{distance}}$, where $a = 0.26$-$0.34$, $b = 0.08$-$0.12$, $c = 0.17$-$0.23$ and $d = 0.35$-$0.45$, and the scores are calculated as follows:
calculation of $S_{\text{specificity}}$: each newly designed bait sequence is aligned against the genome; for each aligned sequence, the Tm between the bait sequence and the aligned sequence is calculated; the difference between the Tm of the bait sequence with the target region and its Tm with any aligned sequence is ≥ 5 °C; the average Tm between the bait sequence and all aligned sequences is calculated, and $S_{\text{specificity}} = 1 - T_{m,\text{mean}} / (T_{m,\text{target}} - 5)$, where $T_{m,\text{mean}}$ is the average Tm of the bait sequence over all non-specific-region alignment results and $T_{m,\text{target}}$ is the Tm of the bait sequence with the target region;
calculation of $S_{\text{dimer}}$: each newly designed bait sequence is compared for dimer formation against every bait sequence already designed; for each aligned sequence, the Tm between the bait sequence and the aligned bait sequence is calculated; said Tm is < 47 °C; the average Tm between the bait sequence and all aligned bait sequences is calculated, and $S_{\text{dimer}} = (47 - T_{m,\text{mean}}) / 47$;
calculation of $S_{\text{hairpin}}$: for each bait sequence, its optimal self-alignment structure is computed and the Tm of that structure is calculated; said Tm is < 47 °C, and $S_{\text{hairpin}} = (47 - T_m) / 47$; Tm < 47 °C, and $S_{\text{hairpin}} = (37 - T_{m,\text{mean}}) / 37$;
calculation of $S_{\text{distance}}$: for the coordinates of the target region, the coordinate difference $\delta_{\text{distance}}$ between each newly designed bait sequence and the target region is calculated; $\delta_{\text{distance}}$ is less than 150, and $S_{\text{distance}} = (150 - \delta_{\text{distance}}) / 150$;
b) performing in vitro transcription using the bait sequences as template with the nucleic acid analog GNA, LNA, PNA, TNA or morpholino nucleic acid to prepare nucleic acid analogs, the nucleic acid analogs having a binding moiety;
c) fragmenting the nucleic acid sample;
d) hybridizing the nucleic acid analogs with the nucleic acid sample, so that the nucleic acid analogs and the target sequence nucleic acids form nucleic acid analog/DNA hybridization complexes;
e) separating, by means of the binding moiety, the nucleic acid analog/DNA hybridization complexes from non-specifically hybridized nucleic acids, and removing non-target-sequence nucleic acids.
2. the method according to claim 1, SSpecificityMarking calculate: to newly-designed any bar bait sequences, in the genome
Sequence alignment is carried out to it, the sequence compared to its each calculates separately between the bait sequences and the sequence compared
Tm, the bait sequences and target area TmIt compares upper sequence T with anymDifference >=10 DEG C, calculate the bait sequences with
Average Tm, S between sequence in all comparisonsSpecificity=1-TmAverage value/(TmTarget- 10), wherein TmAverage valueIt is bait sequences and institute
There are the average Tm value of non-specific region comparison result, TmTargetIt is bait sequences and target area Tm。
3. The method according to claim 1, wherein the S_dimer score is calculated as follows: any newly designed bait sequence is subjected to dimer alignment analysis against each bait sequence already designed; for each sequence it aligns to, the Tm between the bait sequence and the aligned bait sequence is calculated, this Tm being < 37 °C; the average Tm between the bait sequence and all aligned bait sequences is calculated; S_dimer = (37 - Tm_average)/37.
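A minimal Python sketch of the dimer score in claim 3, again assuming the pairwise Tm values against the already designed baits are supplied by an external calculation. The function name, the `None` return for a bait with any dimer Tm of 37 °C or higher, and the score of 1.0 when no dimer alignment is found are illustrative assumptions.

```python
from statistics import mean
from typing import Optional, Sequence

DIMER_TM_LIMIT = 37.0  # degrees C, the limit stated in the claim


def dimer_score(dimer_tms: Sequence[float]) -> Optional[float]:
    """Return S_dimer = (37 - Tm_average) / 37."""
    if not dimer_tms:
        # Assumption: a bait that aligns to no other bait gets the best score.
        return 1.0
    if any(tm >= DIMER_TM_LIMIT for tm in dimer_tms):
        return None  # assumption: the bait fails the Tm < 37 condition
    return (DIMER_TM_LIMIT - mean(dimer_tms)) / DIMER_TM_LIMIT
```

With dimer Tm values of 10 and 20 °C the average is 15 °C, giving a score of 22/37.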
4. The method according to any one of claims 1-3, further comprising step f): amplifying the nucleic acid analog/DNA hybridization complex, thereby enriching the target sequence nucleic acid.
5. The method according to any one of claims 1-3, wherein in step b), the binding moiety is a biotin binding moiety.
6. The method according to any one of claims 1-3, wherein the nucleic acid sample is genomic DNA, RNA, cDNA or mRNA, and where the nucleic acid sample is RNA or mRNA, the method has, before step c), a step of reverse-transcribing the RNA or mRNA into DNA.
7. The method according to any one of claims 1-3, wherein in step c), the nucleic acid sample is fragmented to prepare a library.
8. The method according to any one of claims 1-3, wherein the bait sequences are on a solid support.
9. The method according to claim 8, wherein the solid support is a microarray slide.
10. The method according to any one of claims 1-3, wherein the bait sequences have characteristics selected from the following: i) they do not form hairpin structures themselves and do not form dimers with one another; ii) their copy number is compensated according to the GC content and/or spatial structure of the target nucleic acid sequence; iii) when the target region is a region of high or extremely low GC content, or when the target region is a low-complexity region, the regions flanking the target region are used as replacement regions for designing baits, the design method being the same as for the target region; iv) they do not specifically bind to sequences in the nucleic acid sample other than the target nucleic acid sequence.
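Item iii) of claim 10 depends on recognizing target regions with extreme GC content. The following Python sketch shows one way such a check might look. The cut-off values of 0.25 and 0.75 are hypothetical placeholders, since the claim gives no numeric thresholds, and the function names are illustrative; detection of low-complexity regions is not covered.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def needs_flanking_design(seq: str, low: float = 0.25, high: float = 0.75) -> bool:
    """Flag a target region whose GC content is outside [low, high].

    The default thresholds are hypothetical; the claim does not specify them.
    """
    gc = gc_fraction(seq)
    return gc < low or gc > high
```

A region flagged this way would, per item iii), have its baits designed on the flanking regions instead.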
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610250133.3A CN105925671B (en) | 2016-04-22 | 2016-04-22 | A method of target sequence nucleotides are enriched with from nucleic acid samples |
PCT/CN2016/106595 WO2017181670A1 (en) | 2016-04-22 | 2016-11-21 | Method for enriching target nucleic acid sequence from nucleic acid sample |
AU2016102398A AU2016102398A4 (en) | 2016-04-22 | 2016-11-21 | Method for enriching target nucleic acid sequence from nucleic acid sample |
AU2016403554A AU2016403554A1 (en) | 2016-04-22 | 2016-11-21 | Method for enriching target nucleic acid sequence from nucleic acid sample |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610250133.3A CN105925671B (en) | 2016-04-22 | 2016-04-22 | A method of target sequence nucleotides are enriched with from nucleic acid samples |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105925671A (en) | 2016-09-07 |
CN105925671B (en) | 2019-07-23 |
Family
ID=56839769
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610250133.3A Active CN105925671B (en) | 2016-04-22 | 2016-04-22 | A method of target sequence nucleotides are enriched with from nucleic acid samples |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN105925671B (en) |
AU (2) | AU2016102398A4 (en) |
WO (1) | WO2017181670A1 (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105925671B (en) * | 2016-04-22 | 2019-07-23 | 艾吉泰康(嘉兴)生物科技有限公司 | A method of target sequence nucleotides are enriched with from nucleic acid samples |
CN106676169B (en) * | 2016-11-15 | 2021-01-12 | 上海派森诺医学检验所有限公司 | Hybridization capture kit for detecting breast cancer susceptibility genes BRCA1 and BRCA2 mutation and method thereof |
CN108546739A (en) * | 2018-04-20 | 2018-09-18 | 曹顺 | A method of the nucleic acid target sequence enrichment for NGS sequencings |
CN111723261B (en) * | 2019-03-22 | 2021-08-13 | 昆明逆火科技股份有限公司 | Search engine-based DNA comparison algorithm |
CN110343756B (en) * | 2019-06-25 | 2023-02-24 | 广西识远医学检验实验室有限公司 | Group of probes for detecting thalassemia, related kit and application |
JP2023519898A (en) * | 2020-03-26 | 2023-05-15 | インテグレーティッド ディーエヌエイ テクノロジーズ インコーポレーティッド | Hybridization capture methods and compositions |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2003093509A1 (en) * | 2002-05-01 | 2003-11-13 | Seegene, Inc. | Methods and compositions for improving specificity of pcr amplication |
US8192937B2 (en) * | 2004-04-07 | 2012-06-05 | Exiqon A/S | Methods for quantification of microRNAs and small interfering RNAs |
CN103602658A (en) * | 2013-10-15 | 2014-02-26 | 东南大学 | Novel capture and enrichment technology for targeting nucleic acid molecules |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105925671B (en) * | 2016-04-22 | 2019-07-23 | 艾吉泰康(嘉兴)生物科技有限公司 | A method of target sequence nucleotides are enriched with from nucleic acid samples |
-
2016
- 2016-04-22 CN CN201610250133.3A patent/CN105925671B/en active Active
- 2016-11-21 WO PCT/CN2016/106595 patent/WO2017181670A1/en active Application Filing
- 2016-11-21 AU AU2016102398A patent/AU2016102398A4/en active Active
- 2016-11-21 AU AU2016403554A patent/AU2016403554A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
AU2016403554A1 (en) | 2018-12-13 |
WO2017181670A1 (en) | 2017-10-26 |
CN105925671A (en) | 2016-09-07 |
AU2016102398A4 (en) | 2019-05-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105925671B (en) | A method of target sequence nucleotides are enriched with from nucleic acid samples | |
CN109310784B (en) | Methods and compositions for making and using guide nucleic acids | |
US20220282242A1 (en) | Contiguity Preserving Transposition | |
US8986958B2 (en) | Methods for generating target specific probes for solution based capture | |
EP3234200B1 (en) | Method for targeted depletion of nucleic acids using crispr/cas system proteins | |
CN105647907B (en) | It is a kind of for targeting the preparation method of the modified DNA hybridization probe of hybrid capture | |
JP7282692B2 (en) | Preparation and Use of Guide Nucleic Acids | |
AU2014409073B2 (en) | Linker element and method of using same to construct sequencing library | |
CN107446995A (en) | For expanding the primer sets of multiple target dna sequences and its application in sample | |
CN103333949A (en) | High throughput physical mapping using aflp | |
CN102317476A (en) | Method and systems for enrichment of target genomic sequences | |
AU2021240263A1 (en) | Isothermal methods and related compositions for preparing nucleic acids | |
CN107760772A (en) | For the method for nucleic acid match end sequencing, composition, system, instrument and kit | |
CA3101648A1 (en) | Compositions and methods for making guide nucleic acids | |
CN106191256A (en) | A kind of method carrying out DNA methylation order-checking for target area | |
US20110091939A1 (en) | Methods and Compositions for Removing Specific Target Nucleic Acids | |
US11414686B2 (en) | Stoichiometric nucleic acid purification using randomer capture probe libraries | |
US9315807B1 (en) | Genome selection and conversion method | |
CN113454235A (en) | Improved nucleic acid target enrichment and related methods |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | ||
TA01 | Transfer of patent application right | Effective date of registration: 20190624; Address after: 314100 Building No. 2, 371 Hongye Road, Dayun Town, Jiashan County, Jiaxing City, Zhejiang Province 101; Applicant after: Aiji Taikang (Jiaxing) Biotechnology Co., Ltd.; Address before: Room B316-64, Building No. 29, Life Garden Road, Changping District Science and Technology Park, Beijing 102206; Applicant before: IGENETECH CO., LTD. |
GR01 | Patent grant | ||
GR01 | Patent grant |