CN105925671B - Method for enriching target sequence nucleic acids from a nucleic acid sample - Google Patents


Publication number
CN105925671B
Authority
CN
China
Prior art keywords
nucleic acid
bait sequences
sequence
target
sequences
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610250133.3A
Other languages
Chinese (zh)
Other versions
CN105925671A (en
Inventor
蔡万世
王瑞超
屈武斌
杭兴宜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aiji Taikang (Jiaxing) Biotechnology Co., Ltd.
Original Assignee
Aiji Taikang (jiaxing) Biotechnology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aiji Taikang (Jiaxing) Biotechnology Co., Ltd.
Priority to CN201610250133.3A
Publication of CN105925671A
Priority to PCT/CN2016/106595 (WO2017181670A1)
Priority to AU2016102398A (AU2016102398A4)
Priority to AU2016403554A (AU2016403554A1)
Application granted
Publication of CN105925671B
Legal status: Active
Anticipated expiration

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids

Abstract

The present invention provides a method for enriching target sequence nucleic acids from a nucleic acid sample, the method comprising: providing a nucleic acid sample containing a target nucleic acid sequence, and bait sequences that are identical to the target nucleic acid sequence or specific for the target sequence; preparing a nucleic acid analog by in vitro transcription using the bait sequences as templates, the nucleic acid analog carrying a binding moiety; fragmenting the nucleic acid sample; hybridizing the nucleic acid analog with the nucleic acid sample so that the nucleic acid analog and the target sequence nucleic acid form a nucleic acid analog/DNA hybridization complex; and, by means of the binding moiety, separating the nucleic acid analog/DNA hybridization complex from non-specifically hybridized nucleic acids, thereby removing non-target sequence nucleic acids. In a preferred embodiment, the method further comprises amplifying the nucleic acid analog/DNA hybridization complex, thereby enriching the target sequence nucleic acids.

Description

Method for enriching target sequence nucleic acids from a nucleic acid sample
Technical field
The present invention relates to the capture, enrichment and analysis of nucleic acid sequences. More specifically, the present invention relates to a target sequence enrichment method based on liquid-phase capture.
Background art
Whole-genome sequencing can detect mutations, insertions, deletions and structural variations across the entire genome. However, because the genome is large, sequencing at 30× coverage generates close to 100 Gb of data. Sequencing of low-frequency mutations, such as those associated with tumors, requires at least 1000× coverage, and whole-genome sequencing at that depth would generate up to 3000 Gb of data. Data on this scale not only makes analysis very difficult but also makes sequencing extremely expensive. Target region capture technology was developed in response to this problem.
Target region capture refers to selectively capturing the nucleic acid sequences of a target region by specific technical means and then preparing a library and sequencing it, so that the target region is sequenced deeply while sequencing cost is greatly reduced. PCR is commonly used to enrich target regions; more often, multiplex PCR is used to capture several target regions at once. Multiplex PCR is best suited to capturing hotspot regions or short target regions. For long target regions, for example target regions longer than 100 kb, multiplex PCR is no longer suitable in terms of either cost or technical complexity.
Therefore, there is a need in the art for a new method suitable for capturing long target regions.
Summary of the invention
To solve the above problem, the present invention provides a target sequence enrichment method based on liquid-phase capture.
In a first aspect, the present invention provides a method for enriching target sequence nucleic acids from a nucleic acid sample, the method comprising:
a) providing a nucleic acid sample containing a target nucleic acid sequence, and bait sequences that are identical to the target nucleic acid sequence or specific for the target sequence;
b) preparing a nucleic acid analog by in vitro transcription using the bait sequences as templates, the nucleic acid analog carrying a binding moiety;
c) fragmenting the nucleic acid sample, preferably preparing a library;
d) hybridizing the nucleic acid analog with the nucleic acid sample so that the nucleic acid analog and the target sequence nucleic acid form a nucleic acid analog/DNA hybridization complex;
e) by means of the binding moiety, separating the nucleic acid analog/DNA hybridization complex from non-specifically hybridized nucleic acids, thereby removing non-target sequence nucleic acids.
In one embodiment, adapter sequences are ligated to both ends of the nucleic acid sample fragments during the library preparation of step c), and step e) is followed by a step f) in which the nucleic acid analog/DNA hybridization complex is amplified according to the adapter sequences, thereby enriching the target sequence nucleic acids.
In one embodiment, the bait sequences have characteristics selected from the following: i) they do not form hairpin structures themselves and do not form dimers with one another; ii) their copy number is compensated according to the GC content and/or spatial structure of the target nucleic acid sequence; iii) when the target region has an extremely high or extremely low GC content, or is a low-complexity region, the regions flanking the target region are used as replacement regions for bait design, with the same design method as for the target region; iv) they do not bind specifically to sequences in the nucleic acid sample other than the target nucleic acid sequence.
In one embodiment, the copy number of the bait sequences is also compensated according to the degree of interest in the target nucleic acid sequence.
In one embodiment, the nucleic acid sample is genomic DNA, RNA, cDNA or mRNA; where the nucleic acid sample is RNA or mRNA, step c) is preceded by a step of reverse transcribing the RNA or mRNA into DNA.
In one embodiment, the bait sequences are located on a solid support, for example on a microarray slide.
In one embodiment, the solid support may also be a plurality of beads or a microarray.
In one embodiment, some or all of the nucleic acid analogs carry a binding moiety.
In one embodiment, the in vitro transcription in step b) is performed using the nucleic acid analog GNA, LNA, PNA, TNA or morpholino nucleic acid to prepare the nucleic acid analog; preferably the nucleic acid analog carries a binding moiety.
In one embodiment, the binding moiety is a biotin binding moiety.
In one embodiment, the copy number of the bait sequences is compensated according to the GC content of the target sequence: the lower or higher the GC content, the more the copy number of the bait sequences corresponding to that target sequence is increased.
In one embodiment, compensating the copy number according to the GC content of the target nucleic acid sequence means the following: taking the copy number coefficient of a bait sequence with 50% GC content as the baseline value of 1, for GC contents between 10% and 90% the bait sequence copy number coefficient increases by 0.08-0.12 for every 1% that the GC content deviates from 50%.
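The linear rule above can be written as a short function. The following Python is a minimal sketch, not the patent's own implementation: the function name `copy_number_coefficient`, the parameter names and the default slope of 0.10 (the midpoint of the stated 0.08-0.12 range) are assumptions made for illustration.

```python
def copy_number_coefficient(gc_percent: float, step: float = 0.10) -> float:
    """Relative bait copy number for a target with the given GC content (%).

    The coefficient is 1 at 50% GC and grows by `step` for every
    percentage point of deviation from 50%.
    """
    if not 10.0 <= gc_percent <= 90.0:
        # Outside 10%-90% the text uses flanking regions instead.
        raise ValueError("the linear rule only covers 10%-90% GC content")
    if not 0.08 <= step <= 0.12:
        raise ValueError("step must lie in the stated 0.08-0.12 range")
    return 1.0 + step * abs(gc_percent - 50.0)
```

For a target with 68% GC content (18 percentage points from 50%), this gives a coefficient of about 2.44 with a step of 0.08 and about 3.16 with a step of 0.12, which matches the worked example given later in the description.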
In a specific embodiment, the bait sequence copy number is compensated as follows: target sequences are divided into six tiers according to GC content, namely tier 1: 10%-30%; tier 2: 30%-40%; tier 3: 40%-60%; tier 4: 60%-70%; tier 5: 70%-90%; tier 6: less than 10% or greater than 90%. The copy number of tier 3 bait sequences is the baseline copy number. The copy number of tier 2 and tier 4 bait sequences is higher than that of tier 3, for example 2.2-2.8 times that of tier 3; the copy number of tier 1 and tier 5 bait sequences is higher still, for example 3-4 times that of tier 3. For tier 6, where the GC content is less than 10% or greater than 90% and the target region is a low-complexity sequence, the bait sequences are designed as follows: the regions flanking the target region are used as replacement regions for probe design, generally the regions within 300 bp on either side of the target region, preferably the regions within 150 bp.
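The tiering scheme above can be sketched as a lookup. This Python example is illustrative only: the names `gc_tier` and `copy_number_fold_range` are hypothetical, and because the stated tier ranges share their end points, the handling of boundaries (40% and 60% assigned to tier 3, 30% to tier 2, 70% to tier 4) is an assumption rather than something the text specifies.

```python
from typing import Optional, Tuple

# Fold change relative to the tier 3 baseline, as (low, high).
# Tier 6 has no entry: baits are designed on flanking regions instead.
_FOLD_RANGE = {
    1: (3.0, 4.0),
    2: (2.2, 2.8),
    3: (1.0, 1.0),
    4: (2.2, 2.8),
    5: (3.0, 4.0),
}


def gc_tier(gc_percent: float) -> int:
    """Map a GC content in percent to tier 1-6 (boundary handling assumed)."""
    if not 0.0 <= gc_percent <= 100.0:
        raise ValueError("GC content must be between 0 and 100 percent")
    if gc_percent < 10.0 or gc_percent > 90.0:
        return 6
    if gc_percent < 30.0:
        return 1
    if gc_percent < 40.0:
        return 2
    if gc_percent <= 60.0:
        return 3
    if gc_percent <= 70.0:
        return 4
    return 5


def copy_number_fold_range(gc_percent: float) -> Optional[Tuple[float, float]]:
    """Return the example fold range for the tier, or None for tier 6."""
    return _FOLD_RANGE.get(gc_tier(gc_percent))
```

Returning `None` for tier 6 makes the caller handle that case explicitly, since the text switches to a different design strategy there rather than a larger copy number.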
In one embodiment, the bait sequences are 60-150 bp long, preferably 80-120 bp.
In one embodiment, being identical to the target nucleic acid sequence or specific for the target sequence means that the thermodynamic stability of a bait sequence bound to a non-target region is significantly lower than the thermodynamic stability of the bait sequence bound to the target region; preferably the Tm with the target region exceeds the Tm with any non-specific region by ≥ 5 °C, more preferably by ≥ 10 °C. Preferably, Tm values are calculated by the nearest-neighbor method based on the SantaLucia 2007 thermodynamic parameter table.
In one embodiment, no dimer formation means that any dimer formed between any two bait sequences has a Tm ≤ 47 °C, preferably ≤ 37 °C. Preferably, Tm values are calculated by the nearest-neighbor method based on the SantaLucia 2007 thermodynamic parameter table.
In one embodiment, no hairpin formation means that any hairpin structure formed by a bait sequence itself has a Tm ≤ 47 °C, preferably ≤ 37 °C. Preferably, Tm values are calculated by the nearest-neighbor method based on the SantaLucia 2007 thermodynamic parameter table.
In one embodiment, for each target region, the bait sequences are the one or more bait sequences with the best composite score for specificity, dimer formation, hairpin structure and position relative to the target region. The composite score is computed with the following scoring function: $S = a \times S_{\text{specificity}} + b \times S_{\text{dimer}} + c \times S_{\text{hairpin}} + d \times S_{\text{distance}}$, where a = 0.26-0.34, b = 0.08-0.12, c = 0.17-0.23 and d = 0.35-0.45. The individual scores are calculated as follows:
$S_{\text{specificity}}$: any newly designed bait sequence is aligned against the genome, and for each aligned sequence the Tm between the bait sequence and that aligned sequence is calculated. The Tm of the bait sequence with the target region exceeds its Tm with every aligned sequence by ≥ 5 °C, preferably ≥ 10 °C. The mean Tm between the bait sequence and all aligned sequences is calculated, and $S_{\text{specificity}} = 1 - T_{m,\text{mean}} / (T_{m,\text{target}} - 5)$, preferably $S_{\text{specificity}} = 1 - T_{m,\text{mean}} / (T_{m,\text{target}} - 10)$, where $T_{m,\text{mean}}$ is the mean Tm of the bait sequence over all non-specific alignment results and $T_{m,\text{target}}$ is the Tm of the bait sequence with the target region;
$S_{\text{dimer}}$: any newly designed bait sequence is compared with every bait sequence already designed in a dimer alignment analysis, and for each aligned bait sequence the Tm between the two bait sequences is calculated. Each Tm is < 47 °C; the mean Tm between the bait sequence and all aligned bait sequences is calculated, and $S_{\text{dimer}} = (47 - T_{m,\text{mean}}) / 47$. Preferably each Tm is < 37 °C, and $S_{\text{dimer}} = (37 - T_{m,\text{mean}}) / 37$;
$S_{\text{hairpin}}$: for any bait sequence, its optimal self-alignment structure is computed and the Tm of that structure is calculated. The Tm is < 47 °C, and $S_{\text{hairpin}} = (47 - T_m) / 47$; preferably the Tm is < 37 °C, and $S_{\text{hairpin}} = (37 - T_m) / 37$;
$S_{\text{distance}}$: given the target region coordinates, for any newly designed bait sequence the difference $\delta_{\text{distance}}$ between its coordinates and the target region coordinates is calculated. $\delta_{\text{distance}}$ is less than 150, and $S_{\text{distance}} = (150 - \delta_{\text{distance}}) / 150$.
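The weighted sum above is straightforward to compute once the four sub-scores are known. The Python below is a minimal sketch under stated assumptions: the class name `Weights`, the function name `composite_score` and the default weights (the midpoints 0.30, 0.10, 0.20 and 0.40 of the stated ranges) are chosen for illustration and are not prescribed by the text. The check that each sub-score lies between 0 and 1 follows the description's statement that the sub-scores are values in that interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Weights of the scoring function; defaults are range midpoints."""
    a: float = 0.30  # specificity, stated range 0.26-0.34
    b: float = 0.10  # dimer, stated range 0.08-0.12
    c: float = 0.20  # hairpin, stated range 0.17-0.23
    d: float = 0.40  # relative distance, stated range 0.35-0.45


def composite_score(s_specificity: float, s_dimer: float,
                    s_hairpin: float, s_distance: float,
                    w: Weights = Weights()) -> float:
    """Return S = a*S_specificity + b*S_dimer + c*S_hairpin + d*S_distance."""
    for s in (s_specificity, s_dimer, s_hairpin, s_distance):
        if not 0.0 <= s <= 1.0:
            raise ValueError("each sub-score is expected to lie in [0, 1]")
    return (w.a * s_specificity + w.b * s_dimer
            + w.c * s_hairpin + w.d * s_distance)
```

With these weights, position relative to the target contributes most to the ranking and dimer formation least, so two candidates with similar thermodynamic profiles are separated mainly by how close they sit to the target region.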
In a second aspect, the present invention also provides specific bait sequences for carrying out the method of the invention, the specific bait sequences being the bait sequences referred to in the first aspect of the invention.
In one embodiment, the specific bait sequences are identical to the target nucleic acid sequence or specific for the target sequence, and i) do not form hairpin structures themselves and do not form dimers with one another; ii) have a copy number compensated according to the GC content and/or spatial structure of the target nucleic acid sequence; iii) when the target region has an extremely high or extremely low GC content, or is a low-complexity region, are designed by using the regions flanking the target region as replacement regions for probe design, with the same design method as for the target region; iv) do not bind specifically to sequences in the nucleic acid sample other than the target nucleic acid sequence.
In one embodiment, the copy number of the bait sequences is also compensated according to the degree of interest in the target nucleic acid sequence.
In a third aspect, the present invention also provides a kit comprising the bait sequences of the second aspect of the invention; the kit further includes, but is not limited to, double-stranded adapter molecules and a plurality of different oligonucleotide probes.
In one embodiment, the kit comprises compositions and reagents for carrying out the method of the first aspect of the invention. The kit includes, but is not limited to, double-stranded adapter molecules, a plurality of different oligonucleotide probes, and bait sequences that are identical to the target nucleic acid sequence or specific for the target sequence, the bait sequences: i) not forming hairpin structures themselves and not forming dimers with one another; ii) having a copy number compensated according to the GC content, spatial structure and/or degree of interest of the target nucleic acid sequence; iii) when the target region has an extremely high or extremely low GC content, or is a low-complexity region, being designed by using the regions flanking the target region as replacement regions for probe design, with the same design method as for the target region; iv) not binding specifically to sequences in the nucleic acid sample other than the target nucleic acid sequence. In certain embodiments, the kit comprises two different double-stranded adapter molecules. The kit may further comprise one or more other components selected from DNA polymerase, T4 polynucleotide kinase, T4 DNA ligase, hybridization solution, wash solution and/or elution solution. In certain embodiments, the kit comprises a magnet. In certain embodiments, the kit comprises one or more enzymes together with the corresponding reagents, buffers and the like, for example a restriction enzyme such as MlyI and the buffers/reagents for carrying out a restriction digest with MlyI.
Detailed description of embodiments
The present invention provides a target sequence enrichment method based on liquid-phase capture, the method comprising: designing bait sequences; synthesizing the bait sequence nucleic acids (by custom primer synthesis or solid-phase synthesis); preparing a nucleic acid analog by in vitro transcription, the nucleic acid analog comprising a binding moiety; pre-treating the nucleic acid sample (by a library preparation method), where the sample may be genomic DNA, RNA, cDNA, mRNA and so on; forming a nucleic acid analog/DNA hybridization complex between the nucleic acid analog and the target sequence nucleic acid by complementary base pairing; washing away nucleic acid analog/DNA hybrids with low complementarity, thereby removing non-target sequence nucleic acids; and specifically amplifying the complementarily paired nucleic acid analog/DNA according to the adapter sequences added during sample pre-treatment, thereby enriching the target sequence nucleic acids.
In the present invention, the term "sample" is used in its broadest sense and is intended to include a sample or culture obtained from any source, preferably from a biological source. Biological samples may be obtained from animals (including humans) and include liquids, solids, tissues and gases. Biological samples include blood products such as plasma, serum and the like. Accordingly, a "nucleic acid sample" includes nucleic acid from any source (e.g. DNA, RNA, cDNA, mRNA, tRNA, miRNA and so on). Where the nucleic acid sample is RNA or mRNA, step c) is preceded by a step of reverse transcribing the RNA or mRNA into DNA. In this application, the nucleic acid sample preferably originates from a biological source, such as human or non-human cells, tissues and the like. The term "non-human" means all non-human animals and entities, including but not limited to vertebrates such as rodents, non-human primates, sheep, cattle, ruminants, lagomorphs, pigs, goats, horses, dogs, cats, birds and so on. Non-human also includes invertebrates and prokaryotes, such as bacteria, plants, yeasts, viruses and so on. Accordingly, a nucleic acid sample used in the methods and systems of the invention is a nucleic acid sample from any organism, whether eukaryotic or prokaryotic.
In the present invention, the inventors found that the GC content of a target sequence has a large effect on the capture efficiency of liquid-phase target sequence capture. To capture multiple target sequences effectively, the bait sequence copy number is preferably compensated according to the GC content of the target sequence: the lower or higher the GC content, the more the copy number of the bait sequences corresponding to that target sequence is increased.
The inventors found that target sequences with a GC content of about 50%, for example 50% ± 10%, are captured with good efficiency, whereas target sequences with other GC contents require bait sequence copy number compensation to achieve good capture efficiency. Through comprehensive testing on human genomic sequences, the inventors found that, to achieve better target sequence capture efficiency, the copy number coefficient of a bait sequence with 50% GC content should be taken as the baseline value of 1, and for GC contents between 10% and 90% the bait sequence copy number coefficient should increase by 0.08-0.12 for every 1% that the GC content deviates from 50%. For example, a GC content of 68% deviates by 18%, so the bait sequence copy number coefficient is 2.44-3.16.
Sequences with a GC content of less than 10% or greater than 90% are low-complexity sequences. In this case the bait sequences are designed as follows: when the target region has an extremely high or extremely low GC content, or is a low-complexity region, the regions flanking the target region are used as replacement regions for probe design, generally the regions within 300 bp on either side of the target region, preferably the regions within 150 bp.
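Choosing the replacement regions described above is simple interval arithmetic. The following Python sketch is illustrative only: the function name `flanking_regions`, the 0-based half-open coordinate convention and the optional `chrom_length` argument are assumptions, since the text does not specify a coordinate system.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # 0-based, half-open: [start, end)


def flanking_regions(start: int, end: int, flank: int = 150,
                     chrom_length: Optional[int] = None
                     ) -> Tuple[Interval, Interval]:
    """Return the left and right replacement regions beside a target.

    `flank` is the window on each side: 150 bp is the preferred value
    in the text and 300 bp the general upper bound.
    """
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    if not 0 < flank <= 300:
        raise ValueError("flank should be at most 300 bp")
    left = (max(0, start - flank), start)
    right_end = end + flank
    if chrom_length is not None:
        # Clip at the chromosome end when its length is known.
        right_end = min(right_end, chrom_length)
    return left, (end, right_end)
```

For a target occupying positions 1000 to 1200, the default call yields the intervals (850, 1000) and (1200, 1350), on which baits would then be designed with the same rules as for an ordinary target region.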
In the present invention, a low-complexity region is a region composed of very few types of elements (e.g. oligonucleotides), such as a simple repeat sequence like a microsatellite.
In the present invention, a library is preferably prepared from the sample DNA fragments after fragmentation.
In one embodiment, the bait sequence copy number compensation can be expressed simply as follows: target sequences are divided into six tiers according to GC content, namely tier 1: 10%-30%; tier 2: 30%-40%; tier 3: 40%-60%; tier 4: 60%-70%; tier 5: 70%-90%; tier 6: less than 10% or greater than 90%. The copy number of tier 3 bait sequences is the baseline copy number. The copy number of the bait sequences for tiers 2 and 4 needs to be increased, for example to 2.2-2.8 times that of tier 3; the copy number of the bait sequences for tiers 1 and 5 needs to be increased further, for example to 3-4 times that of tier 3. In one embodiment, for tier 6, where the GC content is less than 10% or greater than 90%, or where the sequence is a low-complexity sequence, the bait sequences are designed as follows: the regions flanking the target region are used as replacement regions for probe design, generally the regions within 300 bp on either side of the target region, preferably the regions within 150 bp.
In one embodiment, for each target region, the bait sequences are the one or more bait sequences with the best composite score for specificity, dimer formation, hairpin structure and position relative to the target region. The composite score is computed with the following scoring function: $S = a \times S_{\text{specificity}} + b \times S_{\text{dimer}} + c \times S_{\text{hairpin}} + d \times S_{\text{distance}}$, where a = 0.26-0.34, b = 0.08-0.12, c = 0.17-0.23 and d = 0.35-0.45. $S_{\text{specificity}}$ and the other sub-scores are values between 0 and 1, calculated as follows:
$S_{\text{specificity}}$ scoring rule: any newly designed bait sequence is aligned against the genome using the BLAT software with default parameters, and the thermodynamic Tm is calculated for each alignment result. If, for any non-specific region, the difference between the Tm with the target region and the Tm with that region is < 5 °C, preferably < 10 °C, the bait sequence is discarded and redesigned. Otherwise the mean Tm of all non-specific alignment results is calculated, and finally $S_{\text{specificity}} = 1 - T_{m,\text{mean}} / (T_{m,\text{target}} - 5)$, preferably $S_{\text{specificity}} = 1 - T_{m,\text{mean}} / (T_{m,\text{target}} - 10)$, where $T_{m,\text{mean}}$ is the mean Tm of the bait sequence over all non-specific alignment results and $T_{m,\text{target}}$ is the Tm of the bait sequence with the target region;
$S_{\text{dimer}}$ scoring rule: any newly designed bait sequence is compared with every bait sequence already designed in a dimer alignment analysis using the BLAT software with default parameters, and the thermodynamic Tm is calculated for each alignment result. If any Tm is ≥ 47 °C, the bait sequence is discarded and redesigned; otherwise the mean Tm of all alignment results is calculated, and finally $S_{\text{dimer}} = (47 - T_{m,\text{mean}}) / 47$. Preferably, if any Tm is ≥ 37 °C, the bait sequence is discarded and redesigned; otherwise the mean Tm of all alignment results is calculated, and $S_{\text{dimer}} = (37 - T_{m,\text{mean}}) / 37$;
$S_{\text{hairpin}}$ scoring rule: for any bait sequence, its optimal self-alignment structure is computed with the Smith-Waterman algorithm and the thermodynamic Tm of that structure is calculated. If the Tm is ≥ 47 °C, the bait sequence is discarded and redesigned; otherwise $S_{\text{hairpin}} = (47 - T_m) / 47$. Preferably, if the Tm is ≥ 37 °C, the bait sequence is discarded and redesigned; otherwise $S_{\text{hairpin}} = (37 - T_m) / 37$;
$S_{\text{distance}}$ scoring rule: given the coordinates of the target region to be designed for, the difference $\delta_{\text{distance}}$ between the coordinates of any bait sequence and the target region coordinates is calculated. The acceptable difference is set to 150, which is an empirical value. If the difference is greater than 150, the bait sequence is discarded and redesigned; otherwise $S_{\text{distance}} = (150 - \delta_{\text{distance}}) / 150$. If no suitable bait sequence can be designed within a coordinate difference of 150 from the target region, the acceptable difference may instead be set to 300, with $S_{\text{distance}} = (300 - \delta_{\text{distance}}) / 300$.
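The four scoring rules above combine a rejection test with a formula, which can be sketched as functions that return `None` for a bait that must be discarded and redesigned. This Python is an illustration under stated assumptions, not the inventors' code: all function and parameter names are hypothetical, the Tm values are assumed to have been computed elsewhere (the text uses BLAT alignments and a nearest-neighbor Tm model), Tm values are assumed to be non-negative in °C, and a score of 1.0 for a bait with no off-target or dimer hits at all is an assumption the text does not address.

```python
from statistics import mean
from typing import Optional, Sequence


def specificity_score(tm_target: float, off_target_tms: Sequence[float],
                      min_gap: float = 5.0) -> Optional[float]:
    """1 - mean(off-target Tm) / (target Tm - min_gap), or None if rejected."""
    if any(tm_target - tm < min_gap for tm in off_target_tms):
        return None  # an off-target hit binds too strongly: redesign
    if not off_target_tms:
        return 1.0  # assumption: no off-target hits means a perfect score
    # Assumes tm_target > min_gap so the denominator is positive.
    return 1.0 - mean(off_target_tms) / (tm_target - min_gap)


def dimer_score(dimer_tms: Sequence[float],
                limit: float = 47.0) -> Optional[float]:
    """(limit - mean dimer Tm) / limit, or None if any Tm reaches the limit."""
    if any(tm >= limit for tm in dimer_tms):
        return None
    if not dimer_tms:
        return 1.0  # assumption: no dimer hits means a perfect score
    return (limit - mean(dimer_tms)) / limit


def hairpin_score(hairpin_tm: float, limit: float = 47.0) -> Optional[float]:
    """(limit - hairpin Tm) / limit, or None if the Tm reaches the limit."""
    if hairpin_tm >= limit:
        return None
    return (limit - hairpin_tm) / limit


def distance_score(delta: float, limit: float = 150.0) -> Optional[float]:
    """(limit - delta) / limit, or None if delta exceeds the limit."""
    if delta > limit:
        return None
    return (limit - delta) / limit
```

Passing `min_gap=10.0` or `limit=37.0` selects the preferred, stricter thresholds, and `distance_score(delta, limit=300.0)` corresponds to the relaxed window used when no bait can be found within 150.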
In the present invention, the calculation of the Tm of a sequence is not limited to a particular method. Tm values calculated by various methods can be used in the invention; the choice of method does not fundamentally change the effect of the invention, only its degree. Although the nearest-neighbor method based on the SantaLucia 2007 thermodynamic parameter table can be used to calculate Tm, Tm values calculated by other methods can be mapped onto it, and a person skilled in the art can compare the Tm values calculated by different methods through simple tests and select an appropriate one.
In the inventors' experience, for the coding regions of the human genome, bait sequences suitable for the invention can be designed for more than 99% of target regions, which indicates that the GC content tiering and Tm filtering described above are reasonable.
In certain embodiments, hybridization between the nucleic acid analog and the target nucleic acid is preferably carried out under stringent conditions sufficient to support hybridization between the nucleic acid analog and the DNA, where the nucleic acid analog comprises a linker compound and a region complementary to the target nucleic acid sample, so as to provide the nucleic acid analog/DNA hybridization complex. The complex is then captured by means of the linker compound and washed under conditions sufficient to remove non-specifically bound nucleic acids, after which the hybridized target nucleic acid sequence is eluted from the captured nucleic acid analog/DNA complex.
In certain embodiments, the nucleic acid analog comprises a chemical group or linker compound, for example a binding moiety such as biotin, digoxigenin and the like, that can bind to a solid support. The solid support may carry the corresponding capture compound, such as streptavidin for biotin or an anti-digoxigenin antibody for digoxigenin. The invention is not limited to the linker compounds mentioned, and alternative linker compounds are equally applicable to the methods, bait sequences and kits of the invention.
In the present invention, the chemical group or linker compound, for example a binding moiety such as biotin, digoxigenin and the like, can be attached to any base in the nucleic acid analog (glycol nucleic acid GNA, locked nucleic acid LNA, peptide nucleic acid PNA, threose nucleic acid TNA or morpholino nucleic acid). Preferably, the nucleic acid analog chain may contain ribose and/or deoxyribose, and the chemical group or linker compound, for example a binding moiety such as biotin, digoxigenin and the like, can be attached to a base on the ribose and/or deoxyribose. For example, labeled ATP, CTP, GTP and/or UTP are used in the synthesis of the nucleic acid analog. Methods for labeling nucleotides with labels such as CyDye, DIG, biotin, rhodamine, fluorescein and the like are known in the art. For example, biotin can be used as a nucleic acid probe label: it can be attached to the C atom at the 5 position of the UTP or dUTP of a nucleic acid molecule, and can be detected through its binding to avidin. However, the invention is not limited to known labels and labeling methods; labels and labeling methods discovered in the future are also within the scope of the invention.
In embodiments of the invention, the plurality of target nucleic acid molecules preferably comprises the whole genome of an organism, at least one chromosome, or a nucleic acid molecule of any size. Preferably, the size of the nucleic acid molecule is at least about 200 kb, at least about 500 kb, at least about 1 Mb, at least about 2 Mb or at least about 5 Mb; more preferably the size is about 100 kb to about 5 Mb, about 200 kb to about 5 Mb, about 500 kb to about 5 Mb, about 1 Mb to about 2 Mb or about 2 Mb to about 5 Mb.
In certain embodiments, the target nucleic acid is from an animal, a plant or a microorganism; in a preferred embodiment, the target nucleic acid molecules are from a human. If the amount of nucleic acid sample is small (for example, human nucleic acid samples obtained in certain circumstances, such as the genome of a developing fetus), the nucleic acid can be amplified before the method of the invention is carried out, for example by whole-genome amplification. Prior amplification may be necessary for carrying out the method of the invention, for example in forensic applications (such as genetic identification in forensic medicine).
In certain embodiments, the plurality of target nucleic acid molecules is a set of genomic DNA molecules. The bait sequences may be selected from, for example: a plurality of bait sequences defining a plurality of exons, introns or regulatory sequences from a plurality of genetic loci; a plurality of bait sequences defining the complete sequence of at least one single genetic locus, the locus being of any size, preferably at least 1 Mb or at least one of the sizes specified above; a plurality of bait sequences defining single nucleotide polymorphisms (SNPs); or a plurality of bait sequences defining an array, for example a tiling array designed to capture the complete sequence of at least one complete chromosome.
As used herein, the term "hybridization" means the pairing of complementary nucleic acids. Hybridization and the strength of hybridization (i.e. the strength of binding between nucleic acids) are influenced by many factors, such as the degree of complementarity between the nucleic acids, the stringency of the hybridization conditions used, the melting temperature (Tm) of the hybrid formed and the G/C content of the nucleic acids. Although the invention is not limited to particular hybridization conditions, stringent hybridization conditions are preferred. Stringent hybridization conditions are sequence-dependent and vary with the hybridization parameters (e.g. salt concentration, presence of organic compounds and so on). In general, "stringent" conditions are selected to be about 5 °C to about 20 °C below the Tm of the specific nucleic acid sequence at a defined ionic strength and pH. Preferably, stringent conditions are about 5 °C to 10 °C below the melting temperature of the specific nucleic acid bound to its complementary nucleic acid. The Tm is the temperature (at a defined ionic strength and pH) at which 50% of the nucleic acid (e.g. the target nucleic acid) is hybridized to a perfectly matched probe.
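The temperature rule of thumb above amounts to subtracting fixed offsets from the Tm. The short Python sketch below only illustrates that arithmetic; the function name `stringent_temperature_window` and its `preferred` flag are hypothetical, and real hybridization temperatures also depend on the buffer composition discussed in the text.

```python
from typing import Tuple


def stringent_temperature_window(tm_celsius: float,
                                 preferred: bool = False
                                 ) -> Tuple[float, float]:
    """Return (lowest, highest) stringent hybridization temperature in deg C.

    General window: 5 to 20 deg C below Tm.
    Preferred window: 5 to 10 deg C below Tm.
    """
    max_offset = 10.0 if preferred else 20.0
    return (tm_celsius - max_offset, tm_celsius - 5.0)
```

For a hybrid with a Tm of 72 °C, the general window is 52 °C to 67 °C and the preferred window is 62 °C to 67 °C.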
Herein, "stringent conditions" may be, for example, hybridization at 42 °C in 50% formamide, 5× SSC (0.75 M NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5× Denhardt's solution, sonicated salmon sperm DNA (50 mg/ml), 0.1% SDS, and 10% dextran sulfate, with washes at 42 °C in 0.2× SSC (sodium chloride/sodium citrate) and at 55 °C in 50% formamide, followed by a wash at 55 °C in 0.1× SSC containing EDTA. For example, a buffer containing 35% formamide, 5× SSC, and 0.1% (w/v) sodium dodecyl sulfate (SDS) is expected to be suitable for hybridization at 45 °C for 16-72 hours under moderately non-stringent conditions.
Herein, the term "primer" means an oligonucleotide, whether naturally occurring (as in a purified restriction digest) or produced synthetically, that can act as a point of initiation of synthesis when placed under conditions that induce synthesis of a primer extension product complementary to a nucleic acid strand (for example, in the presence of nucleotides and an inducing agent such as DNA polymerase, at a suitable temperature and pH). The primer is preferably single-stranded for maximum amplification efficiency. Preferably, the primer is an oligodeoxynucleotide. The primer must be long enough to prime the synthesis of extension products in the presence of the inducing agent. The exact length of the primer depends on many factors, including temperature, primer source, and the method used.
Herein, the term "bait" or "bait sequence" means an oligonucleotide (e.g., a nucleotide sequence), whether naturally occurring (as in a purified restriction digest) or produced by synthesis, recombination, or PCR amplification, that can hybridize with at least part of another oligonucleotide of interest, such as a target nucleic acid sequence. A probe can be single-stranded or double-stranded. Probes can be used for the detection, identification, and isolation of specific gene sequences.
Herein, the term "target nucleic acid molecule" refers to a molecule or sequence from a target genomic region. The preselected probes determine the range of target nucleic acid molecules. The "target" is thus sought to be distinguished from other nucleic acid sequences. A "segment" is defined as a nucleic acid region within the target sequence, for example a "fragment" or "portion" of a nucleic acid sequence.
Herein, the term "isolated", when used in relation to a nucleic acid, as in "isolated nucleic acid", means a nucleic acid sequence that has been identified and separated from at least one other component or contaminant with which it is ordinarily associated in its natural source. An isolated nucleic acid exists in a form different from that in which it occurs in nature. In contrast, non-isolated nucleic acids such as DNA and RNA exist in the state in which they occur in nature. The isolated nucleic acid, oligonucleotide, or polynucleotide can exist in single-stranded or double-stranded form.
Herein, the term "bait sequence consistent with the target nucleic acid sequence" refers to a sequence whose complementary sequence can hybridize with the target nucleic acid sequence, preferably under stringent conditions. When the target region has a very high or very low GC content, or is a low-complexity region, no bait sequence can be designed within the region itself (that is, bait sequence coverage is zero); in that case, a suitable region for bait design can be sought on either side of the target region. Bait sequences are generally designed within 300 bp on either side, preferably within 150 bp.
In embodiments of the invention, the bait sequences used in the capture methods and kits described herein, and the primers used for transcription, include a linking compound such as a binding moiety. Binding moieties include any moiety attached to or introduced at the 5' end of an amplification primer that can subsequently be used to capture the nucleic acid analog/target nucleic acid hybridization complex. A binding moiety can be any sequence introduced at the 5' end of a primer sequence, such as a capturable six-histidine (6HIS) sequence. For example, a primer containing a 6HIS sequence can be captured by nickel, for example in a nickel-coated tube, microwell, or purification column, or one containing nickel-coated beads, particles, or the like, where the beads are packed into a column and the sample is loaded onto and passed through the column to capture the reduced-complexity complex (for example, for subsequent target elution). Another example of a binding moiety for embodiments of the invention is a hapten, such as digoxigenin, attached for example to the 5' end of an amplification primer. Digoxigenin can be captured with an anti-digoxigenin antibody, for example a matrix coated with or containing anti-digoxigenin antibody.
In certain embodiments, the binding moiety is biotin, and the capture matrix, for example beads such as paramagnetic particles, is coated with streptavidin and used to separate the target nucleic acid/transcription product complex from non-specifically hybridized target nucleic acid. For example, when biotin is the binding moiety, a streptavidin (SA)-coated matrix, such as SA-coated beads (e.g., magnetic beads/particles), is used to capture the biotin-labeled nucleic acid analog/target complex. The SA-bound complex is washed, and the hybridized target nucleic acid is eluted from the complex for sequencing.
Bait sequences corresponding to at least one region of the genome can be provided in parallel on a solid carrier using maskless array synthesis technology. Alternatively, probes can be obtained serially using a standard DNA synthesizer and applied to the solid carrier, or obtained from an organism and immobilized on the solid carrier. After hybridization, nucleic acids that did not hybridize, or hybridized non-specifically, with the nucleic acid analogs are separated from the carrier-bound nucleic acid analogs by washing. The remaining nucleic acids, specifically bound to the nucleic acid analogs, are eluted from the solid carrier, for example in hot water or in a nucleic acid elution buffer containing, for example, TRIS buffer and/or EDTA, to produce an eluate enriched in target nucleic acid molecules.
Alternatively, bait sequences for the target molecules can be synthesized on a solid carrier as described above, then released from the solid carrier as a bait sequence pool and amplified. The released, transcribed nucleic acid analog pool can be covalently or non-covalently immobilized on a carrier such as glass, metal, ceramic, or polymer beads, or another solid carrier. The nucleic acid analogs can be designed for easy release from the solid carrier, for example by providing an acid- or base-labile nucleic acid sequence at or near the end of the nucleic acid analog closest to the carrier, so that the nucleic acid analog is released under low or high pH conditions, respectively. A variety of cleavable linking compounds are known in the art. The carrier can be provided, for example, as a column with a liquid inlet and outlet. Methods for immobilizing nucleic acids on a carrier are well known in the art, for example incorporating biotin-labeled nucleotides into the nucleic acid analogs and coating the carrier with streptavidin, so that the coated carrier non-covalently attracts and immobilizes the nucleic acid analogs in the pool. The sample is passed over the carrier containing the nucleic acid analogs under hybridization conditions, and the target nucleic acid molecules that hybridize with the immobilized carrier can then be eluted for later analysis or other purposes.
The term "nucleic acid" may include, for example but without limitation: deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and artificial nucleic acids such as peptide nucleic acid (PNA), morpholino nucleic acid, locked nucleic acid (LNA), glycol nucleic acid (GNA), and threose nucleic acid (TNA). Herein, the terms "nucleic acid", "nucleic acid sequence", and "nucleic acid molecule" should be interpreted broadly, for example as an oligomer or polymer of ribonucleic acid (RNA) or deoxyribonucleic acid (DNA) or mimetics thereof. The terms include molecules composed of naturally occurring nucleobases, sugars, and covalent internucleoside (backbone) linkages, as well as molecules with similar function composed of non-naturally occurring nucleobases, sugars, and covalent internucleoside (backbone) linkages, or combinations thereof. Such modified or substituted nucleic acids may be preferred over the native forms because of desirable properties such as enhanced affinity for the nucleic acid target molecule and increased stability in the presence of nucleases and other enzymes; they are described herein by the terms "nucleic acid analog" or "nucleic acid mimetic". Preferred examples of nucleic acid mimetics are molecules containing peptide nucleic acid (PNA), locked nucleic acid (LNA), xylo-locked nucleic acid (xylo-LNA), phosphorothioate, 2'-methoxy, 2'-methoxyethoxy, morpholino, or phosphoramidate, or functionally similar nucleic acid derivatives.
Embodiment
Embodiment 1: the design of bait sequences
1000 sites on exons and introns of the human genome were randomly selected (their distribution is shown in Table 1) to test the method of the invention. Bait sequences were designed for these 1000 random target sequences for the subsequent tests.
Table 1: Chromosome distribution of the 1000 randomly selected sites
Chromosome Number of sites Chromosome Number of sites
chr1 92 chr12 73
chr2 67 chr13 23
chr3 53 chr14 15
chr4 43 chr15 29
chr5 45 chr16 41
chr6 124 chr17 36
chr7 42 chr18 14
chr8 46 chr19 31
chr9 34 chr20 21
chr10 61 chr21 9
chr11 80 chr22 21
Bait sequence design includes the following steps:
1. First, the target sequence characteristics are analyzed, as follows:
A) The target sequences are divided into 5 grades by GC content: grade 1, 10%-30%; grade 2, 30%-40%; grade 3, 40%-60%; grade 4, 60%-70%; grade 5, 70%-90%;
B) The spatial structure of each target sequence is analyzed, and target sequences that can form a stable spatial structure are flagged;
2. Second, criteria and scoring are set for the bait sequences:
A) The target sequence length is in the range of 60-150 bp;
B) Specificity is maintained. The principle of specificity is that the thermodynamic stability of bait sequence binding at non-target regions is significantly lower than that of binding at the target region. The index generally analyzed is Tm(target region) - Tm(non-specific region) ≥ 5 °C; for a subset of the data, Tm(target region) - Tm(non-specific region) ≥ 10 °C was applied as a comparison (strong specificity restriction). Different thermodynamic calculation methods strongly affect the result; here the calculation uses the nearest-neighbor method based on the SantaLucia 2007 thermodynamic parameter table;
C) No secondary structure is formed. Secondary structure includes dimers and hairpins; that is, the designed bait sequences are not allowed to form dimers or hairpins. For a dimer formed between any two bait sequences, Tm ≤ 47 °C, with ≤ 37 °C applied to a subset of the data as a comparison (stringent dimer restriction). For a hairpin formed by any bait sequence with itself, Tm ≤ 47 °C, with ≤ 37 °C applied to a subset of the data as a comparison (stringent hairpin restriction). Different thermodynamic calculation methods strongly affect the result; here the calculation uses the nearest-neighbor method based on the SantaLucia 2007 thermodynamic parameter table;
D) For each target region, the candidate bait sequences are analyzed and given a composite score based on each candidate's specificity, dimer formation, hairpin formation, and position relative to the target region. The one or more best bait sequences (those with the largest scoring function value) are then selected according to the score: S = a × S_specificity + b × S_dimer + c × S_hairpin + d × S_distance, where a = 0.26-0.34, b = 0.08-0.12, c = 0.17-0.23, and d = 0.35-0.45. The scores are calculated by in-house software according to the following rules:
S_specificity scoring rule: each newly designed bait sequence is aligned against the genome using BLAT with default parameters, and the thermodynamic Tm is calculated for each alignment result. If the difference between the target-region Tm and any non-specific-region Tm is < 5 °C, the bait sequence is discarded and redesigned (< 10 °C for the comparison subset of the data). Otherwise the average Tm of all alignment results is calculated, and finally S_specificity = 1 - Tm_avg/(Tm_target - 5), or S_specificity = 1 - Tm_avg/(Tm_target - 10) for the comparison subset, where Tm_avg is the average Tm of the bait sequence over all non-specific-region alignment results and Tm_target is the Tm of the bait sequence with the target region;
S_dimer scoring rule: each newly designed bait sequence is compared for dimer formation against every bait sequence already designed, using BLAT with default parameters, and the thermodynamic Tm is calculated for each comparison result. If any Tm ≥ 47 °C, the bait sequence is discarded and redesigned; otherwise the average Tm of all comparison results is calculated, and finally S_dimer = (47 - Tm_avg)/47. For the comparison subset of the data, the bait sequence is discarded and redesigned if any Tm ≥ 37 °C; otherwise the average Tm of all comparison results is calculated, and S_dimer = (37 - Tm_avg)/37;
S_hairpin scoring rule: for each bait sequence, its optimal self-alignment structure is calculated with the Smith-Waterman algorithm, and the thermodynamic Tm is calculated from this structure. If Tm ≥ 47 °C, the bait sequence is discarded and redesigned; otherwise S_hairpin = (47 - Tm)/47. For the comparison subset of the data, the bait sequence is discarded and redesigned if Tm ≥ 37 °C; otherwise S_hairpin = (37 - Tm_avg)/37;
S_distance scoring rule: given the coordinates of the target region to be designed for, the coordinate difference δ_distance between each bait sequence and the target region is calculated. The acceptable difference is set to 150, an empirical value. If the difference is greater than 150, the bait sequence is discarded and redesigned; otherwise S_distance = (150 - δ_distance)/150. Where no suitable bait sequence could be designed within a coordinate difference of 150 from the target region, the acceptable difference was set to 300 for that subset as a comparison, with S_distance = (300 - δ_distance)/300.
3. Third, the bait sequence copy number is compensated according to the specific characteristics of each target region:
A) According to the grade classification of the target sequence, the bait sequence copy number for grade 3 is used as the reference copy number (baseline of 1); the bait sequences for grades 1 and 5 need more copies, 2.5 times that of grade 3; next are grades 2 and 4, whose bait sequences also need somewhat more copies, 3.5 times that of grade 3;
B) For target sequences that form a stable spatial structure, the bait sequence copy number is doubled;
C) When the target region is a region of particular interest, for example a region where a fusion event may occur, the bait sequence copy number is doubled;
D) In addition, a parallel test without bait sequence copy number compensation is run under the same conditions as a control.
4. Finally, when no probe can be designed for a target sequence, for example when the target region has a very high or very low GC content or is a low-complexity region (a region composed of very few kinds of elements, such as oligonucleotides; simple repeats such as microsatellites are an example), no bait sequence can be designed within the region itself, that is, bait sequence coverage is zero. In that case a suitable region for bait design can be sought on either side of the target region, generally within 300 bp on either side; cases where a suitable bait sequence can be designed within 150 bp are recorded as a control. In this embodiment, 138 of the randomly selected target sequences fell into this category: for 68, bait sequences were successfully designed within 150 bp on either side; for another 22, bait sequences were successfully designed 150-300 bp on either side; and for the remaining 48, no probe could be designed in these regions.
5. The final bait sequence designs are summarized in Table 2.
Table 2: bait sequences design conditions
Here, the stringent scoring function restriction means: Tm(target region) - Tm(non-specific region) ≥ 10 °C, with S_specificity = Tm_avg/37; dimer Tm < 37 °C, with S_dimer = (37 - Tm_avg)/37; hairpin Tm < 37 °C, with S_hairpin = (37 - Tm_avg)/37.
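The scoring rules of step 2 can be sketched in a few lines of Python. This is an illustrative sketch only, not the in-house software mentioned in the text: it assumes the Tm values have already been computed elsewhere (the text uses BLAT alignments and a nearest-neighbor calculation), it uses the midpoints of the stated weight ranges, and it treats an empty list of hits as a perfect component score. All function names are hypothetical.

```python
from statistics import mean

# Midpoints of the weight ranges a, b, c, d given in the text (assumption:
# any values inside the stated ranges could be used instead).
WEIGHTS = {"specificity": 0.30, "dimer": 0.10, "hairpin": 0.20, "distance": 0.40}


def specificity_score(tm_target, off_target_tms, min_gap=5.0):
    """S_specificity, or None if any off-target Tm is within min_gap of the target Tm."""
    if not off_target_tms:
        return 1.0  # assumption: no off-target alignments gives the top score
    if any(tm_target - tm < min_gap for tm in off_target_tms):
        return None  # discard and redesign
    return 1.0 - mean(off_target_tms) / (tm_target - min_gap)


def capped_score(tms, limit=47.0):
    """Shared form of S_dimer and S_hairpin: (limit - mean Tm) / limit."""
    if not tms:
        return 1.0  # assumption: no dimer or hairpin found gives the top score
    if any(tm >= limit for tm in tms):
        return None  # discard and redesign
    return (limit - mean(tms)) / limit


def distance_score(delta, max_delta=150.0):
    """S_distance, or None if the bait lies too far from the target region."""
    if delta > max_delta:
        return None  # discard and redesign
    return (max_delta - delta) / max_delta


def composite_score(tm_target, off_target_tms, dimer_tms, hairpin_tm, delta,
                    weights=WEIGHTS):
    """Weighted sum S, or None if any hard filter rejects the candidate."""
    parts = {
        "specificity": specificity_score(tm_target, off_target_tms),
        "dimer": capped_score(dimer_tms),
        "hairpin": capped_score([hairpin_tm]),
        "distance": distance_score(delta),
    }
    if any(value is None for value in parts.values()):
        return None
    return sum(weights[name] * parts[name] for name in parts)


# Hypothetical candidate: every component score works out to 0.5.
score = composite_score(75.0, [40.0, 30.0], [20.0, 27.0], 23.5, 75.0)
# Hypothetical candidate rejected because one off-target Tm is only 3 deg C lower.
rejected = composite_score(75.0, [72.0], [20.0], 23.5, 75.0)
```

Returning `None` for a failed hard filter keeps "discard and redesign" separate from a merely low score; the stringent variants correspond to calling the helpers with `min_gap=10.0`, `limit=37.0`, or `max_delta=300.0`.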
Embodiment 2: the preparation of bait sequences
Bait sequences designed according to Embodiment 1 were prepared as follows:
1. A specific sequence 20 bases long is added to each of the 5' and 3' ends of the bait sequence. The design principles for the specific sequences are: 1) they do not generate non-specific amplification products on the target (to-be-captured) genome; 2) their GC content is between 30% and 70%, preferably between 40% and 60%; 3) the two do not form a dimer, or the free energy of any dimer formed is ≤ 47 °C, preferably ≤ 37 °C. This gives the sequence to be synthesized; all bait sequences are illustrated below with one pair of specific sequences:
5'-end specific sequence - bait sequence (60-150 bp, etc.) - 3'-end specific sequence (SEQ ID NO. 1):
ATATAGATGCCGTCCTAGCG-NNNNNNNNNN......NNNNNNNNNN-TGGGCACAGGAAAGATACTT, where "NNNNNNNNNN......NNNNNNNNNN" denotes the bait sequence.
2. The specific sequences were generated by solution hybridization capture sequencing probe design software developed independently by the inventors.
3. The sequences to be synthesized were synthesized on a large scale as oligonucleotides using chip-based methods well known in the art; the oligonucleotides were then eluted from the chip with ammonia water, purified, and dissolved in distilled water to form an oligonucleotide pool.
4. Using the oligonucleotide pool as template and the 5'-end and 3'-end primers complementary to the 5'-end and 3'-end specific sequences as primers, polymerase chain reaction amplification is carried out with Taq polymerase (JumpStart Taq DNA Polymerase, purchased from Sigma, Catalog No. D6558) to obtain a large double-stranded DNA pool. The specific steps are as follows:
1) reaction system is as follows:
2) reaction condition is as follows:
3) The PCR product is purified with the QIAGEN PCR purification kit (QIAGEN, Cat No./ID 28104) according to its instruction manual;
4) Using the 5'-end primer carrying a T7 sequence (TAATACGACTCACTATAGGG) at its 5' end as the forward primer and the 3'-end primer as the reverse primer, polymerase chain reaction amplification is carried out with Taq polymerase (JumpStart Taq DNA Polymerase, purchased from Sigma, Catalog No. D6558) to form a double-stranded DNA pool with the T7 sequence at the 5' end. The procedure is as follows:
5) reaction system:
Reagent name Volume
Water 37μl
10×PCR Buffer 5μl
10mM dATP 1μl
10mM dCTP 1μl
10mM dGTP 1μl
10mM TTP 1μl
BAITS_5_PRIMER_N-T7(10μM) 1μl
BAITS_3_PRIMER_N(10μM) 1μl
JumpStart Taq DNA Polymerase 1μl
Oligonucleotides pond 1μl
6) reaction condition is as follows:
The PCR product from the previous step is separated by gel electrophoresis to remove non-specific bands, and fragments in the 120-210 bp region are recovered and purified with a Qiagen gel extraction kit (QIAquick Gel Extraction Kit, Cat No./ID 28704);
7) Using the T7 High Yield RNA Transcription Kit (Vazyme, TR101-01/02), with NTPs of a nucleic acid analog (glycol nucleic acid GNA, locked nucleic acid LNA, peptide nucleic acid PNA, threose nucleic acid TNA, or morpholino nucleic acid) and biotin-labeled UTP as substrates, the gel-purified product from the previous step is transcribed in vitro to prepare a biotin-labeled nucleic acid analog pool:
Incubate at 37 °C for 8-12 hours to obtain the maximum yield of the nucleic acid analog pool; after purification, dilute to 500 ng/μl and store in a -80 °C freezer.
In addition, a parallel test with standard nucleotides ATP, CTP, GTP, UTP, and Biotin-UTP was run under the same conditions as a control.
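The oligonucleotide construct and primers used in this embodiment can be sketched as follows. The flanking sequences and the T7 sequence are taken from the text; the primer orientation (forward primer matching the 5' specific sequence, reverse primer as the reverse complement of the 3' specific sequence) is an assumption based on standard PCR design, and all function names are hypothetical.

```python
FLANK_5 = "ATATAGATGCCGTCCTAGCG"      # 5'-end specific sequence (SEQ ID NO. 1)
FLANK_3 = "TGGGCACAGGAAAGATACTT"      # 3'-end specific sequence (SEQ ID NO. 1)
T7_SEQUENCE = "TAATACGACTCACTATAGGG"  # T7 sequence added to the forward primer

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def build_oligo(bait):
    """5' specific sequence + bait + 3' specific sequence, as synthesized on the chip."""
    if not 60 <= len(bait) <= 150:
        raise ValueError("bait length outside the 60-150 bp range used in the example")
    return FLANK_5 + bait + FLANK_3


def amplification_primers(with_t7=False):
    """Return (forward, reverse) primers; orientation is an assumption, see lead-in."""
    forward = (T7_SEQUENCE if with_t7 else "") + FLANK_5
    reverse = reverse_complement(FLANK_3)
    return forward, reverse


def t7_product_length(bait_length):
    """Expected length of the T7-tagged double-stranded product."""
    return len(T7_SEQUENCE) + len(FLANK_5) + bait_length + len(FLANK_3)


# Placeholder 80-nt bait (illustrative, not a real bait sequence).
oligo = build_oligo("ACGT" * 20)
forward_t7, reverse = amplification_primers(with_t7=True)
```

With 20-base flanks and a 20-base T7 sequence, baits of 60-150 bp give T7-tagged products of 120-210 bp, which matches the fragment range recovered from the gel in this embodiment. Both flanking sequences also fall inside the preferred 40%-60% GC range (50% and 45%).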
Embodiment 3: Target region library capture
1. DNA library preparation for high-throughput capture sequencing:
1) Take 1 μg of genomic DNA from the species under test and shear it randomly into small fragments of 150-250 bp with a Bioruptor Pico sonicator;
2) Prepare the pre-capture small-fragment library with the Illumina TruSeq DNA library preparation kit.
2. Target region library hybridization capture with the prepared nucleic acid analog pool and the small-fragment library of the target species:
1) Blocking primer preparation:
The primers are synthesized according to the above primer sequences, 100 OD of each; each primer is diluted to 1000 μM and the primers are mixed in equal volumes, named Block 1;
2) Cot-1 DNA and salmon sperm DNA are each diluted to 100 ng/μl and mixed in equal volumes, labeled Block 2;
3) 6 μl of Block 1 and 5 μl of Block 2 are mixed, labeled Block Mix;
4) 1 μg of the small-fragment genomic library is mixed with 11 μl of Block Mix and concentrated to 9 μl in a freeze-drying centrifuge, labeled reagent S1, and kept on ice until use;
6) 20 μl of hybridization solution (20× SSPE, 2× Denhardt's, 1 mM EDTA, 1% SDS) is preheated on a 65 °C metal bath, labeled S2;
7) 5 μl of pure water is mixed with 2 μl of the 500 ng/μl nucleic acid analog pool by slowly pipetting up and down several times, labeled S3, and kept on ice until use;
8) The PCR instrument is programmed as follows: 95 °C, 5 min; 65 °C, 16 h; 65 °C, hold; heated lid 105 °C;
9) S1 is placed in the PCR block and the program is started. When the program has run at 65 °C for 5 min, S2 is placed in the PCR block; after a further 5 min of incubation, S3 is placed in the PCR block and incubated for another 2 min;
10) With the pipette set to 13 μl, 13 μl of S2 is transferred into S3, then 9 μl of S1 is transferred into S3; the mixture is mixed thoroughly by slowly pipetting up and down several times, the tube is capped, the heated lid is closed, and the probes and library are hybridized for 16 hours;
11) 50 μl of Dynabeads MyOne Streptavidin T1 (Invitrogen, Cat. No. 65601) is placed in a 1.5 ml low-binding centrifuge tube, 200 μl of binding buffer [0.5 M NaCl (Ambion, Cat. No. AM9760G), 2 mM Tris-HCl, pH 8.0 (Ambion, Cat. No. AM9855G), 0.2 mM EDTA (Ambion, Cat. No. AM9260G)] is added, and the beads are mixed by pipetting, placed on a magnetic rack for 1 min, and the supernatant is removed;
12) The centrifuge tube is removed from the magnetic rack, 200 μl of binding buffer is added, the beads are mixed by pipetting and placed on the magnetic rack for 1 min, and the supernatant is removed;
13) Step 11 is repeated twice, for 3 bead washes in total, and the beads are finally resuspended in 200 μl of binding buffer;
14) The probe/library hybridization mixture (product of step 9) is transferred into the bead suspension, the tube is capped, and the tube is placed on a rotary mixer to bind for 30 min;
15) The centrifuge tube is placed on the magnetic rack for 2 min and the supernatant is removed;
16) The centrifuge tube is removed from the magnetic rack, the beads are resuspended in 200 μl of wash buffer 1 [10× SSC (Ambion, Cat. No. AM9763), 1% SDS (Invitrogen, Cat. No. 24730020)], the tube is capped, and the beads are washed on a rotary mixer for 10 min;
17) The centrifuge tube is placed on the magnetic rack for 2 min and the supernatant is removed;
18) The centrifuge tube is removed from the magnetic rack, the beads are resuspended in 200 μl of wash buffer 2 [1× SSC (Ambion, Cat. No. AM9763), 5% SDS (Invitrogen, Cat. No. 24730020)] preheated to 65 °C, and incubated at 65 °C in the PCR block for 10 min;
19) The centrifuge tube is placed on the magnetic rack for 2 min and the supernatant is removed;
20) Steps 17-18 are repeated twice, for 3 washes in total;
21) 200 μl of 80% ethanol is added to the centrifuge tube and left to stand for 30 s; all of the ethanol is removed, the beads are air-dried at room temperature for 2 min, and 20 μl of pure water is added and the beads are resuspended by slowly pipetting up and down several times;
3. PCR enrichment of the target region capture product, using the NEB high-fidelity PCR kit (High-Fidelity PCR Kit, New England Biolabs, Catalog #E0553S):
1) reaction system:
Reagent name Volume
5×Phusion HF 10μl
10mM dNTPs 1μl
Post Primer Mix (10 μM each) 1μl
Resuspended magnetic beads (step 20) 20μl
Phusion DNA polymerase 0.5μl
H2O 17.5μl
2) reaction condition is as follows:
3) The PCR product is purified with the Beckman Agencourt AMPure XP Kit [Beckman (p/n A63880)];
4) The target region capture library is sequenced on an Illumina sequencing platform; the PE150 mode is recommended for read length.
4. Results
1) The sequencing library was sequenced on an Illumina HiSeq 4000 high-throughput sequencer to obtain sequencing data for the 1000 sites;
2) The sequencing data were aligned to the human reference genome HG19 with BWA MEM, using the parameters: bwa mem -M -k 40 -t 8 -R "@RG\tID:Hiseq\tPL:Illumina\tSM:sample", to identify single nucleotide polymorphisms, insertions, or deletions that differ from the reference genome, i.e., the detected gene mutations.
3) The samtools stats tool in samtools-1.2 was used to tabulate data size, alignment rate, duplication rate, and quality values; the samtools depth tool in the same software was then used to calculate the sequencing depth at each position in the target region;
4) From the sequencing depth at each position in the target region, the number of bases with sequencing depth ≥1, ≥4, ≥10, and ≥20 was counted, and each count was divided by the total number of bases in the target region to obtain the 1× coverage, 4× coverage, 10× coverage, and 20× coverage.
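The coverage calculation in step 4) can be sketched as below. This is a minimal illustration, assuming the per-position depths have already been loaded into a list with one entry per target base (including positions with zero depth, so that the denominator is the total number of target bases); the function name and sample depths are hypothetical.

```python
def coverage_at_thresholds(depths, thresholds=(1, 4, 10, 20)):
    """Fraction of target positions whose sequencing depth is >= each threshold."""
    total = len(depths)
    if total == 0:
        raise ValueError("no target positions given")
    return {t: sum(1 for d in depths if d >= t) / total for t in thresholds}


# Ten hypothetical target positions (illustrative depths only).
depths = [0, 3, 5, 12, 25, 30, 8, 40, 1, 22]
coverage = coverage_at_thresholds(depths)
```

For these ten illustrative positions the 1×, 4×, 10×, and 20× coverage values are 0.9, 0.7, 0.5, and 0.4.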
Table 3: Capture sequencing results for the 1000 sites
As Table 3 above shows, taking LNA as an example, the average depth is 451.53×; the 4× coverage is 94.35% and the 20× coverage is still 93.64%, indicating good coverage and uniformity, while the total data volume is only 8.52 Mb of reads. The benefits of such a result are: 1) a small sequencing volume, which effectively reduces cost; 2) a high average sequencing depth, meaning each target site is sequenced many times, so data accuracy is high; 3) high coverage, with few missed sites; 4) good uniformity, meaning most sites have a similar depth of coverage.
According to the analysis of the comparison data subsets and control data, relative to LNA: without bait sequence copy number compensation, coverage and uniformity decreased by 4.5 and 5.1 percentage points, respectively; with the strong specificity restriction, stringent dimer restriction, stringent hairpin restriction, and stringent scoring function restriction, coverage and uniformity increased by 6.3 and 7.8 percentage points, respectively; coverage and uniformity for regions within 150 bp were 2.3 and 3.8 percentage points higher, respectively, than for regions at 150-300 bp; and in the parallel test with the same proportions of standard nucleotides ATP, CTP, GTP, UTP, and Biotin-UTP, coverage and uniformity were 5.3 and 4.8 percentage points lower, respectively.
Although the invention has been described in connection with preferred embodiments, it should be understood that the scope of protection of the invention is not limited to the embodiments described herein. Other embodiments of the invention will be readily apparent to and understood by those skilled in the art in light of the description and practice of the invention disclosed herein. The description and embodiments are to be regarded as exemplary only, with the true scope and spirit of the invention defined by the claims.

Claims (10)

1. A method for enriching target sequence nucleotides from a nucleic acid sample, the method comprising:
a) providing a nucleic acid sample containing a target nucleic acid sequence, and bait sequences that are consistent with the target nucleic acid sequence or characteristic of the target sequence, wherein, for each target region, the bait sequences are the one or more bait sequences with the best composite score for specificity, dimer formation, hairpin formation, and position relative to the target region, the composite score being given by the following scoring function: S = a × S_specificity + b × S_dimer + c × S_hairpin + d × S_distance, where a = 0.26-0.34, b = 0.08-0.12, c = 0.17-0.23, and d = 0.35-0.45, the individual scores being calculated as follows:
S_specificity score: each newly designed bait sequence is aligned against the genome; for each aligned sequence, the Tm between the bait sequence and the aligned sequence is calculated; the difference between the Tm of the bait sequence with the target region and its Tm with any aligned sequence is ≥ 5 °C; the average Tm between the bait sequence and all aligned sequences is calculated, and S_specificity = 1 - Tm_avg/(Tm_target - 5), where Tm_avg is the average Tm of the bait sequence over all non-specific-region alignment results and Tm_target is the Tm of the bait sequence with the target region;
S_dimer score: each newly designed bait sequence is compared for dimer formation against every bait sequence already designed; for each aligned sequence, the Tm between the bait sequence and the aligned bait sequence is calculated; the Tm is < 47 °C; the average Tm between the bait sequence and all aligned bait sequences is calculated, and S_dimer = (47 - Tm_avg)/47;
S_hairpin score: for each bait sequence, its optimal self-alignment structure is calculated, and the Tm of that structure is calculated; the Tm is < 47 °C, and S_hairpin = (47 - Tm)/47; Tm < 47 °C, and S_hairpin = (37 - Tm_avg)/37;
S_distance score: given the target region coordinates, the coordinate difference δ_distance between each newly designed bait sequence and the target region is calculated; δ_distance is less than 150, and S_distance = (150 - δ_distance)/150;
b) preparing nucleic acid analogs by in vitro transcription using the bait sequences as template and the nucleic acid analog GNA, LNA, PNA, TNA, or morpholino nucleic acid, the nucleic acid analogs carrying a binding moiety;
c) fragmenting the nucleic acid sample;
d) hybridizing the nucleic acid analogs with the nucleic acid sample, so that the nucleic acid analogs and the target sequence nucleotides form nucleic acid analog/DNA hybridization complexes;
e) separating the nucleic acid analog/DNA hybridization complexes from non-specifically hybridized nucleic acid by means of the binding moiety, and removing non-target sequence nucleic acid.
2. The method according to claim 1, wherein the S_specificity score is calculated as follows: each newly designed bait sequence is aligned against the genome; for each aligned sequence, the Tm between the bait sequence and the aligned sequence is calculated; the difference between the Tm of the bait sequence with the target region and its Tm with any aligned sequence is ≥ 10 °C; the average Tm between the bait sequence and all aligned sequences is calculated, and S_specificity = 1 - Tm_avg/(Tm_target - 10), where Tm_avg is the average Tm of the bait sequence over all non-specific-region alignment results and Tm_target is the Tm of the bait sequence with the target region.
3. The method according to claim 1, wherein the S_dimer score is calculated as follows: each newly designed bait sequence is compared for dimer formation against every bait sequence already designed; for each aligned sequence, the Tm between the bait sequence and the aligned bait sequence is calculated; the Tm is < 37 °C; the average Tm between the bait sequence and all aligned bait sequences is calculated, and S_dimer = (37 - Tm_avg)/37.
4. The method according to any one of claims 1-3, further comprising step f): amplifying the nucleic acid analog/DNA hybridization complexes, thereby enriching the target sequence nucleotides.
5. The method according to any one of claims 1-3, wherein in step b) the binding moiety is a biotin binding moiety.
6. The method according to any one of claims 1-3, wherein the nucleic acid sample is genomic DNA, RNA, cDNA or mRNA, and, where the nucleic acid sample is RNA or mRNA, the method comprises, before step c), a step of reverse transcribing the RNA or mRNA into DNA.
7. The method according to any one of claims 1-3, wherein in step c) the nucleic acid sample is fragmented and a library is prepared.
8. The method according to any one of claims 1-3, wherein the bait sequences are on a solid carrier.
9. The method according to claim 8, wherein the solid carrier is a microarray slide.
10. The method according to any one of claims 1-3, wherein the bait sequences have a characteristic selected from the following: i) they do not form hairpin structures themselves and do not form dimers with one another; ii) their copy number is compensated according to the GC content and/or spatial structure of the target nucleic acid sequence; iii) when the target region is a region of high or extremely low GC content, or when the target region is a low-complexity region, the regions flanking the target region are used as replacement regions for designing baits, the design method being the same as for the target region; iv) they do not bind specifically to sequences in the nucleic acid sample other than the target nucleic acid sequence.
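The score formulas recited in claims 1 to 3 can be illustrated with a minimal Python sketch. It is not the patentee's implementation. The function names, the choice to return None when a bait fails a stated threshold, and the errors raised on empty input are all assumptions made for illustration. Tm values are taken to be in °C, and the unit of the distance is not stated in the claims. How the Tm values themselves are obtained (alignment and thermodynamic calculation) is outside the sketch.

```python
from statistics import mean
from typing import Optional, Sequence


def s_dimer(tms: Sequence[float], limit: float = 47.0) -> Optional[float]:
    """Dimer score (limit - mean Tm) / limit.

    tms: Tm values (deg C) between the new bait and each aligned bait.
    limit: 47 in claim 1, 37 in claim 3.
    Returns None (an assumption) if any Tm is not below the limit.
    """
    if not tms:
        raise ValueError("at least one Tm value is required")
    if any(tm >= limit for tm in tms):
        return None
    return (limit - mean(tms)) / limit


def s_hairpin(tm: float, limit: float = 47.0) -> Optional[float]:
    """Hairpin score (limit - Tm) / limit for the best self-alignment."""
    if tm >= limit:
        return None
    return (limit - tm) / limit


def s_relative_distance(delta: float, limit: float = 150.0) -> Optional[float]:
    """Relative-distance score (150 - delta) / 150, requiring delta < 150."""
    if delta >= limit:
        return None
    return (limit - delta) / limit


def s_specificity(tm_target: float, off_target_tms: Sequence[float],
                  margin: float = 10.0) -> Optional[float]:
    """Specificity score 1 - mean(off-target Tm) / (Tm_target - margin).

    Returns None (an assumption) if any off-target Tm comes within
    `margin` deg C of the on-target Tm.
    """
    if not off_target_tms:
        raise ValueError("at least one off-target Tm value is required")
    if tm_target - margin <= 0:
        raise ValueError("tm_target must exceed the margin")
    if any(tm_target - tm < margin for tm in off_target_tms):
        return None
    return 1.0 - mean(off_target_tms) / (tm_target - margin)


if __name__ == "__main__":
    print(s_dimer([27.0, 37.0]))              # claim 1 threshold (47)
    print(s_dimer([18.5], limit=37.0))        # claim 3 threshold (37)
    print(s_hairpin(23.5))
    print(s_relative_distance(75))
    print(s_specificity(70.0, [20.0, 40.0]))
```

Each score approaches 1 as the undesired Tm (or the distance) approaches 0, and approaches 0 at the stated threshold, so a higher score indicates a more favourable bait under that criterion.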
CN201610250133.3A 2016-04-22 2016-04-22 A method of target sequence nucleotides are enriched with from nucleic acid samples Active CN105925671B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201610250133.3A CN105925671B (en) 2016-04-22 2016-04-22 A method of target sequence nucleotides are enriched with from nucleic acid samples
PCT/CN2016/106595 WO2017181670A1 (en) 2016-04-22 2016-11-21 Method for enriching target nucleic acid sequence from nucleic acid sample
AU2016102398A AU2016102398A4 (en) 2016-04-22 2016-11-21 Method for enriching target nucleic acid sequence from nucleic acid sample
AU2016403554A AU2016403554A1 (en) 2016-04-22 2016-11-21 Method for enriching target nucleic acid sequence from nucleic acid sample

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610250133.3A CN105925671B (en) 2016-04-22 2016-04-22 A method of target sequence nucleotides are enriched with from nucleic acid samples

Publications (2)

Publication Number Publication Date
CN105925671A CN105925671A (en) 2016-09-07
CN105925671B true CN105925671B (en) 2019-07-23

Family

ID=56839769

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610250133.3A Active CN105925671B (en) 2016-04-22 2016-04-22 A method of target sequence nucleotides are enriched with from nucleic acid samples

Country Status (3)

Country Link
CN (1) CN105925671B (en)
AU (2) AU2016102398A4 (en)
WO (1) WO2017181670A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105925671B (en) * 2016-04-22 2019-07-23 艾吉泰康(嘉兴)生物科技有限公司 A method of target sequence nucleotides are enriched with from nucleic acid samples
CN106676169B (en) * 2016-11-15 2021-01-12 上海派森诺医学检验所有限公司 Hybridization capture kit for detecting breast cancer susceptibility genes BRCA1 and BRCA2 mutation and method thereof
CN108546739A (en) * 2018-04-20 2018-09-18 曹顺 A method of the nucleic acid target sequence enrichment for NGS sequencings
CN111723261B (en) * 2019-03-22 2021-08-13 昆明逆火科技股份有限公司 Search engine-based DNA comparison algorithm
CN110343756B (en) * 2019-06-25 2023-02-24 广西识远医学检验实验室有限公司 Group of probes for detecting thalassemia, related kit and application
JP2023519898A (en) * 2020-03-26 2023-05-15 インテグレーティッド ディーエヌエイ テクノロジーズ インコーポレーティッド Hybridization capture methods and compositions

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003093509A1 (en) * 2002-05-01 2003-11-13 Seegene, Inc. Methods and compositions for improving specificity of pcr amplication
US8192937B2 (en) * 2004-04-07 2012-06-05 Exiqon A/S Methods for quantification of microRNAs and small interfering RNAs
CN103602658A (en) * 2013-10-15 2014-02-26 东南大学 Novel capture and enrichment technology for targeting nucleic acid molecules

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105925671B (en) * 2016-04-22 2019-07-23 艾吉泰康(嘉兴)生物科技有限公司 A method of target sequence nucleotides are enriched with from nucleic acid samples

Also Published As

Publication number Publication date
AU2016403554A1 (en) 2018-12-13
WO2017181670A1 (en) 2017-10-26
CN105925671A (en) 2016-09-07
AU2016102398A4 (en) 2019-05-02

Similar Documents

Publication Publication Date Title
CN105925671B (en) A method of target sequence nucleotides are enriched with from nucleic acid samples
CN109310784B (en) Methods and compositions for making and using guide nucleic acids
US20220282242A1 (en) Contiguity Preserving Transposition
US8986958B2 (en) Methods for generating target specific probes for solution based capture
EP3234200B1 (en) Method for targeted depletion of nucleic acids using crispr/cas system proteins
CN105647907B (en) It is a kind of for targeting the preparation method of the modified DNA hybridization probe of hybrid capture
JP7282692B2 (en) Preparation and Use of Guide Nucleic Acids
AU2014409073B2 (en) Linker element and method of using same to construct sequencing library
CN107446995A (en) For expanding the primer sets of multiple target dna sequences and its application in sample
CN103333949A (en) High throughput physical mapping using aflp
CN102317476A (en) Method and systems for enrichment of target genomic sequences
AU2021240263A1 (en) Isothermal methods and related compositions for preparing nucleic acids
CN107760772A (en) For the method for nucleic acid match end sequencing, composition, system, instrument and kit
CA3101648A1 (en) Compositions and methods for making guide nucleic acids
CN106191256A (en) A kind of method carrying out DNA methylation order-checking for target area
US20110091939A1 (en) Methods and Compositions for Removing Specific Target Nucleic Acids
US11414686B2 (en) Stoichiometric nucleic acid purification using randomer capture probe libraries
US9315807B1 (en) Genome selection and conversion method
CN113454235A (en) Improved nucleic acid target enrichment and related methods

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190624

Address after: 314100 Building No. 2, 371 Hongye Road, Dayun Town, Jiashan County, Jiaxing City, Zhejiang Province 101

Applicant after: Aiji Taikang (Jiaxing) Biotechnology Co., Ltd.

Address before: Room B316-64, Building No. 29, Life Garden Road, Changping District Science and Technology Park, Beijing 102206

Applicant before: IGENETECH CO., LTD.

GR01 Patent grant