CN107435070A - Detection and classification of copy number variation - Google Patents

Detection and classification of copy number variation

Publication number
CN107435070A
Authority
CN
China
Prior art keywords
chromosome
sequence
interested
sample
normalization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710644858.5A
Other languages
Chinese (zh)
Inventor
Richard P. Rava
Anupama Srinivasan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Verinata Health Inc
Original Assignee
Verinata Health Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US13/445,778 (US9447453B2)
Priority claimed from US13/482,964 (US20120270739A1)
Priority claimed from US13/555,037 (US9260745B2)
Application filed by Verinata Health Inc
Publication of CN107435070A
Legal status: Pending

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1096 Processes for the isolation, preparation or purification of DNA or RNA cDNA Synthesis; Subtracted cDNA library construction, e.g. RT, RT-PCR
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00 Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06 Biochemical methods, e.g. using enzymes or whole viable microorganisms

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Analytical Chemistry (AREA)
  • Biomedical Technology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Immunology (AREA)
  • Plant Pathology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The background of maternal DNA in a maternal sample places a practical limit on the sensitivity of any assay that attempts to distinguish fetal chromosomal DNA from the maternal genome of the sample. Fetal fraction is therefore an important parameter to consider in diagnostics and routine assays that rely on quantitative and/or qualitative differences between the fetal and maternal genomes. The invention provides a method for determining the fetal fraction in a maternal sample. The method obtains the fetal fraction as a function of normalized chromosome values or normalized chromosome segment values. The method for determining fetal fraction can be combined with other methods, for example methods that obtain the fetal fraction as a function of polymorphism-based allelic imbalance information, to classify copy number variations of fetal chromosomes or chromosome segments in a maternal sample. The invention also provides apparatus and kits for practicing the methods.

Description

Detection and classification of copy number variation
This application is a divisional application of the invention patent application filed on November 7, 2012, with application number 201210441134.8, entitled "Detection and classification of copy number variation".
Background
One of the critical endeavors in human medical research is the discovery of genetic abnormalities that are central to adverse health outcomes. In many cases, specific genes and/or critical diagnostic markers have been identified in portions of the genome that are present at abnormal copy numbers. For example, in prenatal diagnosis, extra or missing copies of whole chromosomes are frequently occurring genetic lesions. In cancer, deletion or multiplication of copies of whole chromosomes or chromosome segments, and higher-level amplification of specific regions of the genome, are common occurrences.
Most information about copy number variation has been provided by cytogenetic resolution, which has permitted recognition of structural abnormalities. Conventional procedures for genetic screening and biological dosimetry have used invasive procedures (e.g. amniocentesis) to obtain cells for karyotype analysis. In recognition of the need for more rapid testing methods that do not require cell culture, fluorescence in situ hybridization (FISH), quantitative fluorescence PCR (QF-PCR), and array comparative genomic hybridization (array-CGH) have been developed as molecular cytogenetic methods for analyzing copy number variation.
The advent of technologies that allow entire genomes to be sequenced in a relatively short time, and the discovery of circulating cell-free DNA (cfDNA), have provided the opportunity to compare genetic material originating from one chromosome with that of another without the risks associated with invasive sampling. However, the limitations of existing methods, which include insufficient sensitivity stemming from the limited levels of cfDNA and sequencing bias stemming from the inherent nature of genomic information, underlie the continuing need for noninvasive methods that provide any or all of specificity, sensitivity, and applicability, so that copy number changes can be determined reliably in a variety of clinical settings.
Embodiments disclosed herein fulfill some of the above needs and, in particular, offer an advantage in providing a reliable method that is applicable at least to the practice of noninvasive prenatal diagnostics and to the diagnosis and monitoring of metastatic progression in cancer patients.
Summary
The background of maternal DNA in a maternal sample places a practical limit on the sensitivity of any assay that attempts to distinguish fetal chromosomal DNA from the maternal genome of the sample. Fetal fraction is therefore an important parameter to consider in diagnostics and routine assays that rely on quantitative and/or qualitative differences between the fetal and maternal genomes. The invention provides a method for determining the fetal fraction in a maternal sample. The method obtains the fetal fraction as a function of normalized chromosome values or normalized chromosome segment values. The method for determining fetal fraction can be combined with other methods, for example methods that obtain the fetal fraction as a function of polymorphism-based allelic imbalance information, to classify copy number variations of fetal chromosomes or chromosome segments in a maternal sample. The invention also provides apparatus and kits for practicing the methods.
Methods are provided for determining copy number variation (CNV) of a sequence of interest in a test sample comprising a mixture of nucleic acids that are known or suspected to differ in the amount of one or more sequences of interest. The method comprises a statistical approach that accounts for the accrued variability stemming from process-related, interchromosomal, and inter-sequencing variability. The method is applicable to determining the CNV of any fetal aneuploidy, as well as CNVs known or suspected to be associated with a variety of medical conditions. CNVs that can be determined according to the method include trisomies and monosomies of any one or more of chromosomes 1-22, X, and Y, other chromosomal polysomies, and deletions and/or duplications of segments of any one or more of these chromosomes, all of which can be detected by sequencing the nucleic acids of a test sample only once. Any aneuploidy can be determined from sequencing information obtained by sequencing the nucleic acids of a test sample only once.
In one embodiment, a method is provided for determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the maternal test sample; (b) using the sequence information to identify a number of sequence tags for each of any four or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of the four or more chromosomes of interest; (c) using the number of sequence tags identified for each of the four or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence to calculate a single chromosome dose for each of the four or more chromosomes of interest; and (d) comparing each single chromosome dose for each of the four or more chromosomes of interest with a threshold value for each of the four or more chromosomes of interest, thereby determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in the maternal test sample. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of the test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for that chromosome of interest. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for that chromosome in step (b) to the length of that chromosome; (ii) calculating a sequence tag density ratio for each normalizing chromosome sequence by relating the number of sequence tags identified for that sequence in step (b) to the length of that sequence; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for the chromosome of interest to the sequence tag density ratio for the normalizing chromosome sequence for that chromosome of interest.
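As an illustration of step (c), the two ways of computing a single chromosome dose described above can be sketched in Python. This is a minimal sketch and not the patented implementation: the function names, tag counts, and sequence lengths are hypothetical, and the sketch assumes that relating a tag count to a sequence length means dividing the count by the length.

```python
def chromosome_dose(tags_interest: int, tags_normalizing: int) -> float:
    """Dose as the plain ratio of sequence tag counts (first variant of step (c))."""
    if tags_normalizing <= 0:
        raise ValueError("the normalizing sequence needs at least one tag")
    return tags_interest / tags_normalizing


def tag_density_ratio(tags: int, length_bp: int) -> float:
    """Tag count related to sequence length; assumed here to be tags per base pair."""
    if length_bp <= 0:
        raise ValueError("sequence length must be positive")
    return tags / length_bp


def chromosome_dose_from_density(
    tags_interest: int,
    length_interest: int,
    tags_normalizing: int,
    length_normalizing: int,
) -> float:
    """Dose as the ratio of the two tag density ratios (sub-steps (i) to (iii))."""
    density_interest = tag_density_ratio(tags_interest, length_interest)
    density_normalizing = tag_density_ratio(tags_normalizing, length_normalizing)
    if density_normalizing == 0:
        raise ValueError("the normalizing sequence needs at least one tag")
    return density_interest / density_normalizing


# Hypothetical counts and lengths, for illustration only.
dose = chromosome_dose(tags_interest=30_000, tags_normalizing=120_000)
density_dose = chromosome_dose_from_density(
    tags_interest=30_000,
    length_interest=48_000_000,
    tags_normalizing=120_000,
    length_normalizing=160_000_000,
)
print(dose, density_dose)
```

The density variant differs from the count ratio only by the length factor, which matters when the chromosome of interest and its normalizing sequence differ substantially in size.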
In another embodiment, a method is provided for determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the maternal test sample; (b) using the sequence information to identify a number of sequence tags for each of any four or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of the four or more chromosomes of interest; (c) using the number of sequence tags identified for each of the four or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence to calculate a single chromosome dose for each of the four or more chromosomes of interest; and (d) comparing each single chromosome dose for each of the four or more chromosomes of interest with a threshold value for each of the four or more chromosomes of interest, thereby determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in the maternal test sample, wherein the four or more chromosomes of interest selected from chromosomes 1-22, X, and Y comprise at least twenty chromosomes selected from chromosomes 1-22, X, and Y, and wherein the presence or absence of at least twenty different complete fetal chromosomal aneuploidies is determined. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of the test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for that chromosome of interest. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for that chromosome in step (b) to the length of that chromosome; (ii) calculating a sequence tag density ratio for each normalizing chromosome sequence by relating the number of sequence tags identified for that sequence in step (b) to the length of that sequence; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for the chromosome of interest to the sequence tag density ratio for the normalizing chromosome sequence for that chromosome of interest.
In another embodiment, a method is provided for determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the maternal test sample; (b) using the sequence information to identify a number of sequence tags for each of any four or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of the four or more chromosomes of interest; (c) using the number of sequence tags identified for each of the four or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence to calculate a single chromosome dose for each of the four or more chromosomes of interest; and (d) comparing each single chromosome dose for each of the four or more chromosomes of interest with a threshold value for each of the four or more chromosomes of interest, thereby determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in the sample, wherein the four or more chromosomes of interest selected from chromosomes 1-22, X, and Y are all of chromosomes 1-22, X, and Y, and wherein the presence or absence of complete fetal chromosomal aneuploidies of all of chromosomes 1-22, X, and Y is determined. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of the test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for that chromosome of interest. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for that chromosome in step (b) to the length of that chromosome; (ii) calculating a sequence tag density ratio for each normalizing chromosome sequence by relating the number of sequence tags identified for that sequence in step (b) to the length of that sequence; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for the chromosome of interest to the sequence tag density ratio for the normalizing chromosome sequence for that chromosome of interest.
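Step (d) can be pictured as a per-chromosome comparison over chromosomes 1-22, X, and Y. The sketch below is illustrative only: the text specifies a threshold value for each chromosome of interest but not its form, so the lower and upper bounds, the call labels, and every numeric value are assumptions.

```python
from typing import Dict, Tuple

# Chromosomes 1-22, X, and Y, in the order used for reporting.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y"]


def call_aneuploidies(
    doses: Dict[str, float],
    thresholds: Dict[str, Tuple[float, float]],
) -> Dict[str, str]:
    """Compare each chromosome dose with that chromosome's own threshold.

    thresholds maps a chromosome to hypothetical (lower, upper) bounds.
    """
    calls = {}
    for chrom in CHROMOSOMES:
        if chrom not in doses:
            continue  # chromosome not selected as a chromosome of interest
        lower, upper = thresholds[chrom]
        dose = doses[chrom]
        if dose > upper:
            calls[chrom] = "gain"        # dose above the upper bound
        elif dose < lower:
            calls[chrom] = "loss"        # dose below the lower bound
        else:
            calls[chrom] = "unaffected"  # dose within the bounds
    return calls


# Hypothetical doses and thresholds for three chromosomes of interest.
doses = {"13": 0.200, "18": 0.251, "21": 0.128}
thresholds = {"13": (0.195, 0.205), "18": (0.245, 0.255), "21": (0.120, 0.126)}
calls = call_aneuploidies(doses, thresholds)
print(calls)
```

Keeping one threshold entry per chromosome mirrors the wording of step (d), in which each dose is compared with the threshold for its own chromosome rather than with a single global cutoff.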
In any of the above embodiments, the normalizing chromosome sequence can be a single chromosome selected from chromosomes 1-22, X, and Y. Alternatively, the normalizing chromosome sequence can be a group of chromosomes selected from chromosomes 1-22, X, and Y.
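The difference between a single normalizing chromosome and a group of normalizing chromosomes can be sketched as follows. This is a hedged example: it assumes that the tag count of a group is the sum of the tag counts of its members, and the tag counts and the pairing of chromosome 21 with chromosome 9 or with chromosomes 1 and 10 are invented for illustration.

```python
from typing import Iterable, Mapping


def normalizing_tag_count(tag_counts: Mapping[str, int], normalizing: Iterable[str]) -> int:
    """Total tags for a normalizing sequence made of one or several chromosomes.

    Assumption: a group's tag count is the sum over its member chromosomes.
    """
    return sum(tag_counts[chrom] for chrom in normalizing)


# Hypothetical per-chromosome tag counts from one sequencing run.
tag_counts = {"1": 200_000, "9": 110_000, "10": 120_000, "21": 30_000}

# Single normalizing chromosome (hypothetical choice: chromosome 9).
dose_single = tag_counts["21"] / normalizing_tag_count(tag_counts, ["9"])

# Group of normalizing chromosomes (hypothetical choice: chromosomes 1 and 10).
dose_group = tag_counts["21"] / normalizing_tag_count(tag_counts, ["1", "10"])

print(dose_single, dose_group)
```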
In another embodiment, a method is provided for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of the one or more chromosomes of interest; (c) using the number of sequence tags identified for each of the one or more chromosomes of interest and the number of sequence tags identified for each normalizing segment sequence to calculate a single chromosome dose for each of the one or more chromosomes of interest; and (d) comparing each single chromosome dose for each of the one or more chromosomes of interest with a threshold value for each of the one or more chromosomes of interest, thereby determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in the sample. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of the test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for that chromosome of interest. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for that chromosome in step (b) to the length of that chromosome; (ii) calculating a sequence tag density ratio for each normalizing segment sequence by relating the number of sequence tags identified for that normalizing segment sequence in step (b) to the length of each normalizing chromosome; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for the chromosome of interest to the sequence tag density ratio for the normalizing segment sequence for that chromosome of interest.
In another embodiment, a method is provided for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of the one or more chromosomes of interest; (c) using the number of sequence tags identified for each of the one or more chromosomes of interest and the number of sequence tags identified for each normalizing segment sequence to calculate a single chromosome dose for each of the one or more chromosomes of interest; and (d) comparing each single chromosome dose for each of the one or more chromosomes of interest with a threshold value for each of the one or more chromosomes of interest, thereby determining the presence or absence of one or more different complete fetal chromosomal aneuploidies in the sample, wherein the one or more chromosomes of interest selected from chromosomes 1-22, X, and Y comprise at least twenty chromosomes selected from chromosomes 1-22, X, and Y, and wherein the presence or absence of at least twenty different complete fetal chromosomal aneuploidies is determined. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of the test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for that chromosome of interest. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for that chromosome in step (b) to the length of that chromosome; (ii) calculating a sequence tag density ratio for each normalizing segment sequence by relating the number of sequence tags identified for that normalizing segment sequence in step (b) to the length of each normalizing chromosome; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for the chromosome of interest to the sequence tag density ratio for the normalizing segment sequence for that chromosome of interest.
In another embodiment, a method is provided for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of the one or more chromosomes of interest; (c) using the number of sequence tags identified for each of the one or more chromosomes of interest and the number of sequence tags identified for each normalizing segment sequence to calculate a single chromosome dose for each of the one or more chromosomes of interest; and (d) comparing each single chromosome dose for each of the one or more chromosomes of interest with a threshold value for each of the one or more chromosomes of interest, thereby determining the presence or absence of one or more different complete fetal chromosomal aneuploidies in the sample, wherein the one or more chromosomes of interest selected from chromosomes 1-22, X, and Y are all of chromosomes 1-22, X, and Y, and wherein the presence or absence of complete fetal chromosomal aneuploidies of all of chromosomes 1-22, X, and Y is determined. Step (a) can comprise sequencing at least a portion of the nucleic acid molecules of the test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for that chromosome of interest. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for that chromosome in step (b) to the length of that chromosome; (ii) calculating a sequence tag density ratio for each normalizing segment sequence by relating the number of sequence tags identified for that normalizing segment sequence in step (b) to the length of each normalizing chromosome; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for the chromosome of interest to the sequence tag density ratio for the normalizing segment sequence for that chromosome of interest.
In any of the above embodiments, the different complete chromosomal aneuploidies are selected from complete chromosomal trisomies, complete chromosomal monosomies, and complete chromosomal polysomies. The different chromosomal aneuploidies are selected from complete aneuploidies of any one of chromosomes 1-22, X, and Y. For example, the different complete fetal chromosomal aneuploidies are selected from trisomy 2, trisomy 8, trisomy 9, trisomy 20, trisomy 21, trisomy 13, trisomy 16, trisomy 18, trisomy 22, 47,XXX, 47,XYY, and monosomy X.
In any of the above embodiments, steps (a)-(d) are repeated for test samples from different maternal subjects, and the method comprises determining the presence or absence of any four or more different complete fetal chromosomal aneuploidies in each test sample.
In any of the above embodiments, the method can further comprise calculating a normalized chromosome value (NCV), wherein the NCV relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, of the j-th chromosome dose in a set of qualified samples, and $x_{ij}$ is the observed j-th chromosome dose for test sample i.
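The NCV described above can be sketched as a standardization of the observed dose against the qualified samples, that is, the observed dose minus the estimated mean, divided by the estimated standard deviation. This is a minimal sketch: the function name and the dose values are hypothetical, and using the sample standard deviation (n - 1 denominator) as the estimate is an assumption, since the text only refers to an estimated standard deviation.

```python
from statistics import mean, stdev
from typing import Sequence


def normalized_chromosome_value(test_dose: float, qualified_doses: Sequence[float]) -> float:
    """NCV for one chromosome: (observed dose - estimated mean) / estimated SD."""
    if len(qualified_doses) < 2:
        raise ValueError("at least two qualified samples are needed to estimate the SD")
    mu_hat = mean(qualified_doses)      # estimated mean of the j-th chromosome dose
    sigma_hat = stdev(qualified_doses)  # assumption: sample SD with n - 1 denominator
    if sigma_hat == 0:
        raise ValueError("qualified doses show no variation; NCV is undefined")
    return (test_dose - mu_hat) / sigma_hat


# Hypothetical chromosome doses for one chromosome of interest.
qualified = [0.124, 0.125, 0.126, 0.125, 0.125]
ncv = normalized_chromosome_value(0.128, qualified)
print(round(ncv, 2))
```

Expressing the dose in units of standard deviations puts all chromosomes on a common scale, so values for different chromosomes can be compared against thresholds defined on that scale.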
In another embodiment, a method is provided for determining the presence or absence of different partial fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method include: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more segments of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of the one or more segments of the one or more chromosomes of interest; (c) using the number of sequence tags identified for each of the one or more segments of the one or more chromosomes of interest and the number of sequence tags identified for each normalizing segment sequence to calculate a single segment dose for each of the one or more segments of the one or more chromosomes of interest; and (d) comparing each single segment dose for each of the one or more segments of the one or more chromosomes of interest to a threshold for each of those segments, thereby determining the presence or absence of one or more different partial fetal chromosomal aneuploidies in the sample. Step (a) may include sequencing at least a portion of the nucleic acids of the test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) includes calculating a single segment dose for each of the one or more segments of the one or more chromosomes of interest as the ratio of the number of sequence tags identified for each of those segments to the number of sequence tags identified for the normalizing segment sequence for each of those segments. In some other embodiments, step (c) includes: (i) calculating a sequence tag density ratio for each segment of interest by relating the number of sequence tags identified in step (b) for each segment of interest to the length of that segment of interest; (ii) calculating a sequence tag density ratio for each normalizing segment sequence by relating the number of sequence tags identified in step (b) for each normalizing segment sequence to the length of that normalizing segment sequence; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single segment dose for each segment of interest, wherein the segment dose is calculated as the ratio of the sequence tag density ratio for each segment of interest to the sequence tag density ratio for the normalizing segment sequence for that segment of interest. The method may further include calculating a normalized segment value (NSV), wherein the NSV relates the segment dose to the mean of the corresponding segment dose in a set of qualified samples as:
$$NSV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are, respectively, the estimated mean and standard deviation of the j-th segment dose in a set of qualified samples, and $x_{ij}$ is the j-th segment dose observed for test sample i.
In embodiments of the described methods in which a chromosome dose or segment dose is determined using a normalizing segment sequence, the normalizing segment sequence may be a single segment of any one or more of chromosomes 1-22, X, and Y. Alternatively, the normalizing segment sequence may be a group of segments of any one or more of chromosomes 1-22, X, and Y.
Steps (a)-(d) of the method for determining the presence or absence of a partial fetal chromosomal aneuploidy are repeated for multiple test samples from different maternal subjects, and the method includes determining the presence or absence of different partial fetal chromosomal aneuploidies in each sample. The partial fetal chromosomal aneuploidies that can be determined by the method include partial aneuploidies of any segment of any chromosome. The partial aneuploidies may be selected from partial duplications, partial multiplications, partial insertions, and partial deletions. Examples of partial aneuploidies that can be determined by the method include partial monosomy of chromosome 1, partial monosomy of chromosome 4, partial monosomy of chromosome 5, partial monosomy of chromosome 7, partial monosomy of chromosome 11, partial monosomy of chromosome 15, partial monosomy of chromosome 17, partial monosomy of chromosome 18, and partial monosomy of chromosome 22.
In any one of the above embodiments, the test sample may be a maternal sample selected from blood, plasma, serum, urine, and saliva samples. In any of these embodiments, the test sample may be a plasma sample. The nucleic acid molecules of the maternal sample are fetal and maternal cell-free DNA molecules. The nucleic acids can be sequenced using next-generation sequencing (NGS). In some embodiments, the sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing is sequencing-by-ligation. In still other embodiments, the sequencing is single-molecule sequencing. Optionally, an amplification step is performed before sequencing.
In another embodiment, a method is provided for determining the presence or absence of any twenty or more different complete fetal chromosomal aneuploidies in a maternal plasma test sample comprising a mixture of fetal and maternal cell-free DNA molecules. The steps of the method include: (a) sequencing at least a portion of the cell-free DNA molecules to obtain sequence information for the fetal and maternal cell-free DNA molecules in the sample; (b) using the sequence information to identify a number of sequence tags for each of any twenty or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome for each of the twenty or more chromosomes of interest; (c) using the number of sequence tags identified for each of the twenty or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome to calculate a single chromosome dose for each of the twenty or more chromosomes of interest; and (d) comparing each single chromosome dose for each of the twenty or more chromosomes of interest to a threshold for each of the twenty or more chromosomes of interest, thereby determining the presence or absence of any twenty or more different complete fetal chromosomal aneuploidies in the sample.
In another embodiment, the invention provides a method for identifying copy number variation (CNV) of a sequence of interest (for example, a clinically relevant sequence) in a test sample, the method comprising the steps of: (a) obtaining a test sample and multiple qualified samples, the test sample comprising test nucleic acid molecules and the multiple qualified samples comprising qualified nucleic acid molecules; (b) obtaining sequence information for the fetal and maternal nucleic acids in the sample; (c) based on the sequencing of the qualified nucleic acid molecules, calculating a qualified sequence dose for a qualified sequence of interest in each of the multiple qualified samples, wherein calculating the qualified sequence dose includes determining a parameter for the qualified sequence of interest and at least one qualified normalizing sequence; (d) based on the qualified sequence dose, identifying at least one qualified normalizing sequence, wherein the at least one qualified normalizing sequence has the smallest variability and/or the greatest differentiability in the multiple qualified samples; (e) based on the sequencing of the nucleic acid molecules in the test sample, calculating a test sequence dose for the test sequence of interest, wherein calculating the test sequence dose includes determining a parameter for the test sequence of interest and at least one normalizing test sequence, the at least one normalizing test sequence corresponding to the at least one qualified normalizing sequence; (f) comparing the test sequence dose to at least one threshold; and (g) evaluating the copy number variation of the sequence of interest in the test sample based on the result of step (f). In one embodiment, the parameter for the qualified sequence of interest and the at least one qualified normalizing sequence relates the number of sequence tags mapped to the qualified sequence of interest to the number of tags mapped to the qualified normalizing sequence, and the parameter for the test sequence of interest and the at least one normalizing test sequence relates the number of sequence tags mapped to the test sequence of interest to the number of tags mapped to the normalizing test sequence. In some embodiments, step (b) includes sequencing at least a portion of the qualified and test nucleic acid molecules, wherein the sequencing includes providing multiple mapped sequence tags for a test and a qualified sequence of interest and for at least one test and at least one qualified normalizing sequence; at least a portion of the nucleic acid molecules of the test sample is sequenced to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample. In some embodiments, the sequencing step is performed using a next-generation sequencing method. In some embodiments, the sequencing method may be a massively parallel sequencing method that uses sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing method is sequencing-by-ligation. In some embodiments, the sequencing includes an amplification. In other embodiments, the sequencing is single-molecule sequencing. The CNV of the sequence of interest is an aneuploidy, which may be a chromosomal or a partial aneuploidy. In some embodiments, the chromosomal aneuploidy is selected from trisomy 2, trisomy 8, trisomy 9, trisomy 20, trisomy 16, trisomy 21, trisomy 13, trisomy 18, trisomy 22, Klinefelter's syndrome, 47,XXX, 47,XYY, and monosomy X. In other embodiments, the partial aneuploidy is a partial chromosomal deletion or a partial chromosomal insertion. In some embodiments, the CNV identified by the method is a chromosomal or partial aneuploidy associated with cancer. In some embodiments, the test and qualified samples are biological fluid samples, for example plasma samples obtained from a pregnant subject (such as a pregnant human subject). In other embodiments, the test and qualified biological fluid samples (such as plasma samples) are obtained from a subject known or suspected to have cancer.
Some methods for determining the presence or absence of a fetal chromosomal aneuploidy in a maternal test sample include the following operations: (a) providing sequence reads from fetal and maternal nucleic acids in the maternal test sample, wherein the sequence reads are provided in an electronic format; (b) using a computing device to align the sequence reads to one or more chromosome reference sequences, thereby providing sequence tags corresponding to the sequence reads; (c) computationally identifying the number of sequence tags from one or more chromosomes of interest or chromosome segments of interest, and computationally identifying the number of sequence tags from at least one normalizing chromosome sequence or normalizing chromosome segment sequence for each of the one or more chromosomes of interest or chromosome segments of interest; (d) using the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest and the number of sequence tags identified for each normalizing chromosome sequence or normalizing chromosome segment sequence, computationally calculating a single chromosome or segment dose for each of the one or more chromosomes of interest or chromosome segments of interest; and (e) using the computing device to compare each of the single chromosome doses for each of the one or more chromosomes of interest or chromosome segments of interest to a corresponding threshold for each of the one or more chromosomes of interest or chromosome segments of interest, thereby determining the presence or absence of at least one fetal aneuploidy in the test sample. In some implementations, the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest is at least about 10,000, or at least about 100,000. The disclosed embodiments also provide a computer program product comprising a non-transitory computer-readable medium on which are provided program instructions for performing the described operations and the other computational operations described herein.
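The comparison in operation (e) might be sketched as below. The text states only that each dose is compared with a corresponding threshold; the use of a lower and an upper bound per chromosome, the call labels, and all numeric values are assumptions made for illustration.

```python
def classify_doses(doses: dict[str, float],
                   thresholds: dict[str, tuple[float, float]]) -> dict[str, str]:
    """Compare each chromosome dose with its own (lower, upper) threshold pair."""
    calls = {}
    for chrom, dose in doses.items():
        lower, upper = thresholds[chrom]  # hypothetical per-chromosome bounds
        if dose > upper:
            calls[chrom] = "gain"        # dose elevated, e.g. consistent with a trisomy
        elif dose < lower:
            calls[chrom] = "loss"        # dose reduced, e.g. consistent with a monosomy
        else:
            calls[chrom] = "unaffected"
    return calls


# Hypothetical doses and thresholds for three chromosomes of interest.
doses = {"chr21": 1.08, "chrX": 0.90, "chr18": 1.00}
thresholds = {chrom: (0.95, 1.05) for chrom in doses}
calls = classify_doses(doses, thresholds)
```

Keeping the thresholds in a per-chromosome mapping mirrors the requirement that each chromosome or segment of interest has its own threshold.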
In certain embodiments, the chromosome reference sequences have excluded regions, which occur naturally in chromosomes but do not contribute to the number of sequence tags for any chromosome or chromosome segment. In certain embodiments, a method additionally includes: (i) determining whether a read under consideration aligns to a site on a chromosome reference sequence to which another read from the test sample was previously aligned; and (ii) determining whether to include the read under consideration in the number of sequence tags for a chromosome of interest or a chromosome segment of interest. The chromosome reference sequences may be stored on a computer-readable medium.
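One way to picture the tag-counting rules above is the sketch below. It assumes that alignments arrive as (chromosome, position) pairs, that excluded regions are half-open intervals, and that a read aligning to an already occupied site is simply not counted. That last policy is an assumption: the text says only that the method determines whether to include such a read.

```python
def count_tags(alignments, excluded):
    """Count sequence tags per chromosome, skipping excluded regions and repeated sites.

    alignments: iterable of (chrom, position) pairs, one per aligned read.
    excluded:   dict mapping chrom -> list of half-open (start, end) intervals.
    """
    seen_sites = set()
    counts = {}
    for chrom, pos in alignments:
        # Reads falling in an excluded region never contribute to the tag count.
        if any(start <= pos < end for start, end in excluded.get(chrom, [])):
            continue
        # Assumed policy: a site already hit by an earlier read is counted only once.
        if (chrom, pos) in seen_sites:
            continue
        seen_sites.add((chrom, pos))
        counts[chrom] = counts.get(chrom, 0) + 1
    return counts


# Hypothetical alignments: one duplicate site and one read inside an excluded region.
alignments = [("chr21", 100), ("chr21", 100), ("chr21", 250), ("chr21", 900), ("chr1", 50)]
excluded = {"chr21": [(200, 300)]}
counts = count_tags(alignments, excluded)
```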
In certain embodiments, a method additionally includes sequencing at least a portion of the nucleic acid molecules of the maternal test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample. The sequencing may include massively parallel sequencing of the maternal and fetal nucleic acids from the maternal test sample to produce the sequence reads.
In certain embodiments, a method further includes using a processor to automatically record the presence or absence of a fetal chromosomal aneuploidy, as determined in (d), in a patient medical record of the human subject who provided the maternal test sample. The recording may include recording the chromosome dose and/or a diagnosis based on the chromosome dose in a computer-readable medium. In some cases, the patient medical record is maintained by a laboratory, a physician's office, a hospital, a health maintenance organization (HMO), an insurance company, or a personal medical record website. A method may further include prescribing, initiating, and/or altering treatment of the human subject from whom the maternal test sample was obtained. Additionally or alternatively, the method may include ordering and/or performing one or more additional tests.
Some methods disclosed herein identify a normalizing chromosome sequence or normalizing chromosome segment sequence for a chromosome or chromosome segment of interest. Some such methods include the following operations: (a) providing multiple qualified samples for the chromosome or chromosome segment of interest; (b) repeatedly calculating chromosome doses for the chromosome or chromosome segment of interest using multiple potential normalizing chromosome sequences or normalizing chromosome segment sequences, wherein the repeated calculation is performed with a computing device; and (c) selecting a normalizing chromosome sequence or normalizing chromosome segment sequence, alone or in a combination, that gives the smallest variability and/or the greatest differentiability in the doses calculated for the chromosome or chromosome segment of interest.
The selected normalizing chromosome sequence or normalizing chromosome segment sequence may be part of a combination of normalizing chromosome sequences or normalizing chromosome segment sequences, or it may be provided alone, rather than in combination with other normalizing chromosome sequences or normalizing chromosome segment sequences.
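The selection in operations (a)-(c) can be illustrated with a small search over candidate normalizers. The sketch below is deliberately narrow: it considers single candidates only (not combinations), it uses the coefficient of variation of the dose across qualified samples as the variability measure, and it ignores differentiability. Those choices, the function name, and the tag counts are all assumptions.

```python
from statistics import mean, stdev


def select_normalizer(qualified_counts, chrom_of_interest, candidates):
    """Pick the candidate whose dose varies least across qualified samples.

    qualified_counts: list of dicts, one per qualified sample, mapping chrom -> tag count.
    Returns (best_candidate, coefficient_of_variation).
    """
    best, best_cv = None, float("inf")
    for cand in candidates:
        # Recompute the dose of the chromosome of interest with this candidate normalizer.
        doses = [sample[chrom_of_interest] / sample[cand] for sample in qualified_counts]
        cv = stdev(doses) / mean(doses)
        if cv < best_cv:
            best, best_cv = cand, cv
    return best, best_cv


# Hypothetical tag counts for three qualified samples.
qualified_counts = [
    {"chr21": 100, "chr9": 200, "chr14": 150},
    {"chr21": 110, "chr9": 220, "chr14": 190},
    {"chr21": 90, "chr9": 180, "chr14": 120},
]
best, best_cv = select_normalizer(qualified_counts, "chr21", ["chr9", "chr14"])
```

In this toy data chr9 tracks chr21 exactly, so its dose is constant across samples and it is selected. Extending the search to combinations would mean summing the counts of several candidates before forming the ratio.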
The disclosed embodiments provide a method for classifying a copy number variation in a fetal genome. The operations of the method include: (a) receiving sequence reads from fetal and maternal nucleic acids in a maternal test sample, wherein the sequence reads are provided in an electronic format; (b) using a computing device to align the sequence reads to one or more chromosome reference sequences, thereby providing sequence tags corresponding to the sequence reads; (c) using the computing device to computationally identify the number of sequence tags from one or more chromosomes of interest, and determining that a first chromosome of interest in the fetus harbors a copy number variation; (d) calculating a first fetal fraction value by a first method that does not use information from the tags of the first chromosome of interest; (e) calculating a second fetal fraction value by a second method that uses information from the tags of the first chromosome; and (f) comparing the first fetal fraction value to the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome. In certain embodiments, the method further includes sequencing cell-free DNA from the maternal test sample to provide the sequence reads. In certain embodiments, the method further includes obtaining the maternal test sample from a pregnant organism. In certain embodiments, operation (b) includes using a computing device to align at least about 1,000,000 reads. In certain embodiments, operation (f) may include determining whether the two fetal fraction values are approximately equal.
In certain embodiments, operation (f) may further include determining that the two fetal fraction values are approximately equal, and thereby determining that a ploidy assumption implicit in the second method is true. In certain embodiments, the ploidy assumption implicit in the second method is that the first chromosome of interest has a complete chromosomal aneuploidy. In some of these embodiments, the complete chromosomal aneuploidy of the first chromosome of interest is a monosomy or a trisomy.
In certain embodiments, operation (f) may include determining that the two fetal fraction values are not approximately equal, and further include analyzing the tag information for the first chromosome of interest to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a chimera.
In certain embodiments, this operation may also include binning the sequence of the first chromosome of interest into multiple portions; determining whether any of the portions contains significantly more or significantly fewer nucleic acids than one or more other portions; and, if any of the portions contains significantly more or significantly fewer nucleic acids than one or more other portions, determining that the first chromosome of interest harbors a partial aneuploidy. In one embodiment, the operation may further include determining that the portion of the first chromosome of interest containing significantly more or significantly fewer nucleic acids than one or more other portions harbors the partial aneuploidy.
In one embodiment, operation (f) may also include binning the sequence of the first chromosome of interest into multiple portions; determining whether any of the portions contains significantly more or significantly fewer nucleic acids than one or more other portions; and, if none of the portions contains significantly more or significantly fewer nucleic acids than one or more other portions, determining that the fetus is a chimera.
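The decision logic of operation (f) can be summarized in a short sketch. The text does not define "approximately equal" or "significantly more or fewer", so the relative tolerances used here are placeholders standing in for a proper statistical test, and the function and parameter names are hypothetical.

```python
from statistics import median


def classify_cnv(ff_first: float, ff_second: float, bin_counts: list[int],
                 ff_rel_tol: float = 0.2, bin_rel_tol: float = 0.1) -> str:
    """Classify a CNV from two fetal fraction estimates and per-bin tag counts.

    ff_first:   fetal fraction from a method that ignores the chromosome of interest.
    ff_second:  fetal fraction derived from the chromosome of interest.
    bin_counts: tag counts for consecutive portions (bins) of that chromosome.
    """
    # Placeholder notion of "approximately equal": within a relative tolerance.
    if abs(ff_first - ff_second) <= ff_rel_tol * max(ff_first, ff_second):
        return "complete chromosomal aneuploidy"
    # Placeholder notion of "significantly different": far from the median bin count.
    typical = median(bin_counts)
    deviating = [c for c in bin_counts if abs(c - typical) > bin_rel_tol * typical]
    return "partial aneuploidy" if deviating else "chimera"


# Hypothetical inputs covering the three outcomes described above.
call_complete = classify_cnv(0.10, 0.11, [100, 100, 100, 100, 100])
call_partial = classify_cnv(0.10, 0.04, [100, 101, 99, 130, 100])
call_chimera = classify_cnv(0.10, 0.04, [100, 101, 99, 100, 102])
```

When the two estimates agree, the ploidy assumption behind the second estimate is taken to hold. When they disagree, a localized excess or deficit of tags points to a partial aneuploidy, and a uniform profile points to a chimera.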
Operation (e) may include: (a) calculating the number of sequence tags from the first chromosome of interest and at least one normalizing chromosome sequence to determine a chromosome dose; and (b) calculating the fetal fraction value from the chromosome dose using the second method. In certain embodiments, this operation further includes calculating a normalized chromosome value (NCV), wherein the second method uses the normalized chromosome value, and wherein the NCV relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the chromosome of interest. In another embodiment, operation (d) further includes the first method calculating the first fetal fraction value using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample.
In various embodiments, if the first fetal fraction value and the second fetal fraction value are not approximately equal, the method further includes (i) determining whether the copy number variation results from a partial aneuploidy or a chimera; and (ii) if the copy number variation results from a partial aneuploidy, determining the locus of the partial aneuploidy on the first chromosome of interest. In certain embodiments, determining the locus of the partial aneuploidy on the first chromosome of interest includes dividing the sequence tags for the first chromosome of interest into bins or blocks of nucleic acids in the first chromosome of interest, and counting the mapped tags in each bin.
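The binning and counting step can be sketched as follows. Fixed-width bins, 0-based tag positions, and the rule of reporting the bin whose count lies farthest from the median are all assumptions for illustration; the text does not prescribe a bin size or a way of picking the locus.

```python
from statistics import median


def bin_tag_counts(positions, chrom_length, bin_size):
    """Count mapped tags in consecutive fixed-width bins along one chromosome."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:                  # positions are 0-based and < chrom_length
        counts[pos // bin_size] += 1
    return counts


def most_deviant_bin(counts, bin_size):
    """Return the (start, end) interval of the bin farthest from the median count."""
    typical = median(counts)
    idx = max(range(len(counts)), key=lambda i: abs(counts[i] - typical))
    return idx * bin_size, (idx + 1) * bin_size


# Hypothetical tag positions on a toy 40-base chromosome, binned in steps of 10.
positions = [5, 15, 25, 26, 27, 35]
counts = bin_tag_counts(positions, chrom_length=40, bin_size=10)
locus = most_deviant_bin(counts, bin_size=10)
```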
Operation (e) may further include calculating the fetal fraction value by evaluating the following expression:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
where ff is the second fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample, and $CV_{iU}$ is the coefficient of variation of the dose for the chromosome of interest determined in the qualified samples.
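A direct evaluation of this expression is shown below; the function name and the input values are hypothetical. If the coefficient of variation is taken in its usual sense (standard deviation divided by mean), the product of NCV and CV reduces to the relative shift of the dose from the qualified-sample mean, and doubling it gives the fetal fraction under the assumption that one extra or missing fetal copy shifts the dose by half the fetal fraction.

```python
def fetal_fraction_from_ncv(ncv: float, cv: float) -> float:
    """Second fetal fraction estimate: ff = 2 * |NCV * CV|."""
    return 2 * abs(ncv * cv)


# Hypothetical NCV for an affected sample and CV from the qualified samples.
ff = fetal_fraction_from_ncv(ncv=12.5, cv=0.004)
```

With these illustrative inputs the estimate is about 0.10, that is, a fetal fraction of roughly 10%. The absolute value lets the same expression serve for both gains (positive NCV) and losses (negative NCV).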
In any of the above embodiments, the first chromosome of interest is selected from the group consisting of chromosomes 1 to 22, X, and Y. In any of the above embodiments, operation (f) may classify the copy number variation into a classification selected from the group consisting of complete chromosomal insertion, complete chromosomal deletion, partial chromosomal duplication, partial chromosomal deletion, and chimera.
The disclosed embodiments also provide a computer program product comprising a non-transitory computer-readable medium on which are provided program instructions for classifying a copy number variation in a fetal genome. The computer program product may include: (a) code for receiving sequence reads from fetal and maternal nucleic acids in a maternal test sample, wherein the sequence reads are provided in an electronic format; (b) code for using a computing device to align the sequence reads to one or more chromosome reference sequences, thereby providing sequence tags corresponding to the sequence reads; (c) code for using the computing device to computationally identify the number of sequence tags from one or more chromosomes of interest and for determining that a first chromosome of interest in the fetus harbors a copy number variation; (d) code for calculating a first fetal fraction value by a first method that does not use information from the tags of the first chromosome of interest; (e) code for calculating a second fetal fraction value by a second method that uses information from the tags of the first chromosome; and (f) code for comparing the first fetal fraction value to the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome. In certain embodiments, the computer program product includes code for the various operations and methods in any of the above embodiments of the disclosed methods.
The disclosed embodiments also provide a system for classifying a copy number variation in a fetal genome. The system includes: (a) an interface for receiving at least about 10,000 sequence reads from fetal and maternal nucleic acids in a maternal test sample, wherein the sequence reads are provided in an electronic format; (b) memory for storing, at least temporarily, a plurality of the sequence reads; and (c) a processor designed or configured with program instructions for: (i) aligning the sequence reads to one or more chromosome reference sequences, thereby providing sequence tags corresponding to the sequence reads; (ii) identifying a number of the sequence tags from one or more chromosomes of interest, and determining that a first chromosome of interest in the fetus harbors a copy number variation; (iii) calculating a first fetal fraction value by a first method that does not use information from the tags of the first chromosome of interest; (iv) calculating a second fetal fraction value by a second method that uses information from the tags of the first chromosome; and (v) comparing the first fetal fraction value to the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome. According to various embodiments, the first chromosome of interest is selected from the group consisting of chromosomes 1 to 22, X, and Y. In certain embodiments, the program instructions for (c)(v) include program instructions for classifying the copy number variation into a classification selected from the group consisting of complete chromosomal insertion, complete chromosomal deletion, partial chromosomal duplication, partial chromosomal deletion, and chimera. According to various embodiments, the system may include program instructions for sequencing cell-free DNA from the maternal test sample to provide the sequence reads. According to some embodiments, the program instructions for operation (c)(i) include program instructions for using the computing device to align at least about 1,000,000 reads.
In certain embodiments, the system also includes a sequencer configured to sequence the fetal and maternal nucleic acids in a maternal test sample and to provide the sequence reads in an electronic format. In various embodiments, the sequencer is located in a facility separate from the processor, and the sequencer and the processor are connected by a network.
In various embodiments, the system further includes an apparatus for obtaining the maternal test sample from a pregnant mother. According to some embodiments, the apparatus for obtaining the maternal test sample and the processor are located in separate facilities. In various embodiments, the system also includes an apparatus for extracting cell-free DNA from the maternal test sample. In some embodiments, the apparatus for extracting cell-free DNA is located in the same facility as the sequencer, and the apparatus for obtaining the maternal test sample is located in a remote facility.
According to some embodiments, the program instructions for comparing the first fetal fraction value to the second fetal fraction value also include program instructions for determining whether the two fetal fraction values are approximately equal.
In certain embodiments, the system also includes program instructions for determining, when the two fetal fraction values are approximately equal, that a ploidy assumption implicit in the second method is true. In certain embodiments, the ploidy assumption implicit in the second method is that the first chromosome of interest has a complete chromosomal aneuploidy. In certain embodiments, the complete chromosomal aneuploidy of the first chromosome of interest is a monosomy or a trisomy.
In certain embodiments, the system also includes program instructions for analyzing the tag information for the first chromosome of interest to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a chimera, wherein the program instructions for analyzing are configured to execute when the program instructions for comparing the first fetal fraction value to the second fetal fraction value indicate that the two fetal fraction values are not approximately equal. In certain embodiments, the program instructions for analyzing the tag information for the first chromosome of interest include: program instructions for binning the sequence of the first chromosome of interest into multiple portions; program instructions for determining whether any of the portions contains significantly more or significantly fewer nucleic acids than one or more other portions; and program instructions for determining that the first chromosome of interest harbors a partial aneuploidy if any of the portions contains significantly more or significantly fewer nucleic acids than one or more other portions. In certain embodiments, the system further includes program instructions for determining that the portion of the first chromosome of interest containing significantly more or significantly fewer nucleic acids than one or more other portions harbors the partial aneuploidy.
In certain embodiments, the program instructions for analyzing the tag information for the first chromosome of interest include: program instructions for binning the sequence of the first chromosome of interest into multiple portions; program instructions for determining whether any of the portions contains significantly more or significantly fewer nucleic acids than one or more other portions; and program instructions for determining that the fetus is a chimera if none of the portions contains significantly more or significantly fewer nucleic acids than one or more other portions.
According to various embodiments, the system may include program instructions for the second method of calculating the fetal fraction value, these program instructions including: (a) program instructions for calculating the number of sequence tags from the first chromosome of interest and at least one normalizing chromosome sequence to determine a chromosome dose; and (b) program instructions for calculating the fetal fraction value from the chromosome dose using the second method.
In certain embodiments, the system further comprises program instructions for calculating a normalized chromosome value (NCV), wherein the program instructions for the second method include program instructions for using the normalized chromosome value, and wherein the program instructions for the NCV relate the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples, as:

$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$

where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the chromosome of interest. In various embodiments, the program instructions for the first method include program instructions for calculating the first fetal fraction value using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample.
According to various embodiments, the program instructions for the second method of calculating a fetal fraction value include program instructions for evaluating the following expression:

$$ff = 2 \times \left| NCV_{iA} \times CV_{iU} \right|$$

where ff is the second fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample, and $CV_{iU}$ is the coefficient of variation of the dose of the chromosome of interest determined in the qualified samples.
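The two expressions above (the normalized chromosome value and the fetal fraction derived from it) can be illustrated with a short sketch. The function names and the sample numbers are hypothetical, and the use of the sample standard deviation as the "estimated" standard deviation is an assumption; the text does not say which estimator is used.

```python
from statistics import mean, stdev


def normalized_chromosome_value(test_dose, qualified_doses):
    """NCV = (R_iA - mean(R_iU)) / sd(R_iU), estimated from qualified samples."""
    # Assumption: sample standard deviation (n - 1 denominator).
    return (test_dose - mean(qualified_doses)) / stdev(qualified_doses)


def coefficient_of_variation(qualified_doses):
    """CV_iU = sd(R_iU) / mean(R_iU) for the same qualified samples."""
    return stdev(qualified_doses) / mean(qualified_doses)


def fetal_fraction_from_ncv(test_dose, qualified_doses):
    """ff = 2 * |NCV_iA * CV_iU|."""
    ncv = normalized_chromosome_value(test_dose, qualified_doses)
    cv = coefficient_of_variation(qualified_doses)
    return 2 * abs(ncv * cv)


# Hypothetical doses: three qualified samples and one affected test sample.
qualified = [0.98, 1.00, 1.02]
ff = fetal_fraction_from_ncv(1.05, qualified)
```

Because the standard deviation cancels in the product of NCV and CV, the result equals twice the relative deviation of the test dose from the qualified mean.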
According to various embodiments, the system further comprises: (i) program instructions for determining whether the copy number variation results from a partial aneuploidy or a mosaic; and (ii) program instructions for determining the locus of the partial aneuploidy on the first chromosome of interest if the copy number variation results from a partial aneuploidy, wherein the program instructions in (i) and (ii) are configured to execute when the program instructions for comparing the first fetal fraction value with the second fetal fraction value determine that the first fetal fraction value is not approximately equal to the second fetal fraction value.
In certain embodiments, the program instructions for determining the locus of the partial aneuploidy on the first chromosome of interest include program instructions for dividing the sequence tags of the first chromosome of interest into bins or blocks of nucleic acids in the first chromosome of interest, and program instructions for counting the mapped tags in each bin.
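Counting mapped tags per bin can be sketched as below. The fixed-width bins, the 0-based tag start positions, and the name `count_tags_per_bin` are assumptions for illustration; the text does not specify how bins are defined.

```python
from collections import Counter


def count_tags_per_bin(tag_positions, chrom_length, bin_size):
    """Count mapped tags in consecutive fixed-width bins along one chromosome.

    tag_positions: 0-based start coordinates of tags mapped to the chromosome.
    Returns a list with one count per bin; the last bin may be shorter.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = Counter(pos // bin_size for pos in tag_positions)
    return [counts.get(b, 0) for b in range(n_bins)]


# Hypothetical example: five tags on a 1,200-base sequence, 500-base bins.
per_bin = count_tags_per_bin([5, 150, 160, 999, 1000], chrom_length=1200, bin_size=500)
```

A bin whose count departs markedly from the others then marks a candidate locus for the partial aneuploidy.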
In certain embodiments, methods are provided for identifying the presence of a cancer and/or an increased risk of a cancer in a mammal (e.g., a human), wherein the methods comprise: (a) providing sequence reads of nucleic acids in a test sample from the mammal, wherein the test sample may comprise both genomic nucleic acids from cancerous or precancerous cells and genomic nucleic acids from constitutive (germline) cells, and wherein the sequence reads are provided in an electronic format; (b) aligning the sequence reads to one or more chromosome reference sequences using a computing apparatus, thereby providing sequence tags corresponding to the sequence reads; (c) computationally identifying a number of sequence tags from the fetal and maternal nucleic acids for one or more chromosomes of interest whose amplification or deletion is known to be associated with cancer, or for chromosome segments of interest whose amplification or deletion is known to be associated with cancer, wherein the chromosome or chromosome segment is selected from chromosomes 1 to 22, X, and Y and segments thereof, and computationally identifying a number of sequence tags for at least one normalizing chromosome sequence or normalizing chromosome segment sequence for each of the one or more chromosomes of interest or chromosome segments of interest, wherein the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest is at least about 2,000, or at least about 5,000, or at least about 10,000; (d) computationally calculating, using the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest and the number of sequence tags identified for each normalizing chromosome sequence or normalizing chromosome segment sequence, a single chromosome or segment dose for each of the one or more chromosomes of interest or chromosome segments of interest; and (e) comparing, using the computing apparatus, each of the single chromosome doses for each of the one or more chromosomes of interest or chromosome segments of interest with a corresponding threshold value for each of the one or more chromosomes of interest or chromosome segments of interest, and thereby determining the presence or absence of aneuploidies in the sample, wherein the presence of the aneuploidies and/or an increase in the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest indicates the presence of a cancer and/or an increased risk of a cancer.

In certain embodiments, the increased risk is relative to the same subject at a different time (e.g., earlier), relative to a reference population (e.g., optionally adjusted for sex and/or ethnicity and/or age, etc.), relative to a similar subject lacking a certain risk factor, and the like. In certain embodiments, the chromosomes of interest or chromosome segments of interest comprise whole chromosomes whose amplification and/or deletion is known to be associated with a cancer (e.g., as described herein). In certain embodiments, the chromosomes of interest or chromosome segments of interest comprise chromosome segments whose amplification or deletion is known to be associated with one or more cancers. In certain embodiments, the chromosome segments comprise substantially whole chromosome arms (e.g., as described herein). In certain embodiments, the chromosome segments comprise whole chromosome aneuploidies. In certain embodiments, the whole chromosome aneuploidies comprise a loss, while in certain other embodiments the whole chromosome aneuploidies comprise a gain (e.g., a gain or a loss as shown in Table 1).

In certain embodiments, the chromosome segments of interest are substantially arm-level segments comprising any one or more of the p arms or q arms of chromosomes 1 to 22, X, and Y. In certain embodiments, the aneuploidies comprise an amplification of a substantial arm-level segment of a chromosome or a deletion of a substantial arm-level segment of a chromosome. In certain embodiments, the chromosome segments of interest substantially comprise one or more arms selected from the group consisting of 1q, 3q, 4p, 4q, 5p, 5q, 6p, 6q, 7p, 7q, 8p, 8q, 9p, 9q, 10p, 10q, 12p, 12q, 13q, 14q, 16p, 17p, 17q, 18p, 18q, 19p, 19q, 20p, 20q, 21q, and/or 22q. In certain embodiments, the aneuploidies comprise an amplification of one or more arms selected from the group consisting of 1q, 3q, 4p, 4q, 5p, 5q, 6p, 6q, 7p, 7q, 8p, 8q, 9p, 9q, 10p, 10q, 12p, 12q, 13q, 14q, 16p, 17p, 17q, 18p, 18q, 19p, 19q, 20p, 20q, 21q, and 22q. In certain embodiments, the aneuploidies comprise a deletion of one or more arms selected from the group consisting of 1p, 3p, 4p, 4q, 5q, 6q, 8p, 8q, 9p, 9q, 10p, 10q, 11p, 11q, 13q, 14q, 15q, 16q, 17p, 17q, 18p, 18q, 19p, 19q, and 22q.

In certain embodiments, the chromosome segments of interest are segments that comprise a region and/or a gene shown in Table 3 and/or Table 5 and/or Table 4 and/or Table 6. In certain embodiments, the aneuploidies comprise an amplification of a region and/or a gene shown in Table 3 and/or Table 5. In certain embodiments, the aneuploidies comprise a deletion of a region and/or a gene shown in Table 4 and/or Table 6. In certain embodiments, the chromosome segments of interest are segments known to contain one or more oncogenes and/or one or more tumor suppressor genes. In certain embodiments, the aneuploidies comprise an amplification of one or more regions selected from the group consisting of 20Q13, 19q12, 1q21-1q23, 8p11-p12, and ErbB2. In certain embodiments, the aneuploidies comprise an amplification of one or more regions comprising a gene selected from the group consisting of MYC, ERBB2 (EFGR), CCND1 (Cyclin D1), FGFR1, FGFR2, HRAS, KRAS, MYB, MDM2,
CCNE, KRAS, MET, ERBB1, CDK4, MYCB, ERBB2, AKT2, MDM2, and CDK4, and the like. In certain embodiments, the cancer is selected from the group consisting of leukemia, ALL, brain cancer, breast cancer, colorectal cancer, dedifferentiated liposarcoma, esophageal adenocarcinoma, esophageal squamous cell carcinoma, GIST, glioma, HCC, hepatocellular cancer, lung cancer, lung NSC, lung SC, medulloblastoma, melanoma, MPD, myeloproliferative disorder, cervical cancer, ovarian cancer, prostate cancer, and renal cancer. In certain embodiments, the biological sample comprises a sample selected from the group consisting of whole blood, clot, saliva/oral fluid, urine, a tissue biopsy, pleural fluid, pericardial fluid, cerebrospinal fluid, and peritoneal fluid.

In certain embodiments, the chromosome reference sequences have excluded regions that are present naturally in chromosomes but that do not contribute to the number of sequence tags for any chromosome or chromosome segment. In certain embodiments, the method further comprises determining whether a read under consideration aligns to a site on a chromosome reference sequence where another read was previously aligned; and determining whether to include the read under consideration in the number of sequence tags for a chromosome of interest or a chromosome segment of interest, wherein both determining operations are performed with the computing apparatus. In various embodiments, the method further comprises storing, at least temporarily, sequence information for the nucleic acids in the sample in a computer-readable medium (e.g., a non-transitory medium).

In certain embodiments, step (d) comprises computationally calculating a segment dose for a selected one of the segments of interest as the ratio of the number of sequence tags identified for the selected segment of interest to the number of sequence tags identified for the corresponding at least one normalizing chromosome sequence or normalizing chromosome segment sequence for the selected segment of interest. In certain embodiments, the one or more chromosome segments of interest comprise at least 5, or at least 10, or at least 15, or at least 20, or at least 50, or at least 100 different segments of interest. In certain embodiments, at least 5, or at least 10, or at least 15, or at least 20, or at least 50, or at least 100 different aneuploidies are detected.

In certain embodiments, the at least one normalizing chromosome sequence comprises one or more chromosomes selected from the group consisting of chromosomes 1 to 22, X, and Y. In certain embodiments, for each segment, the at least one normalizing chromosome sequence comprises the chromosome corresponding to the chromosome on which the segment is located. In certain embodiments, for each segment, the at least one normalizing chromosome sequence comprises the chromosome segment corresponding to the chromosome segment being normalized. In certain embodiments, the at least one normalizing chromosome sequence or normalizing chromosome segment sequence is a chromosome or segment selected for an associated chromosome or segment of interest by: (i) identifying a plurality of qualified samples for the segment of interest; (ii) repeatedly calculating chromosome doses for the selected chromosome using a plurality of potential normalizing chromosome sequences or normalizing chromosome segment sequences; and (iii) selecting the normalizing chromosome segment sequence, alone or in a combination, that gives the smallest variability and/or the greatest differentiability in the calculated chromosome doses. In certain embodiments, the method further comprises calculating a normalized segment value (NSV), wherein the NSV relates the segment dose to the mean of the corresponding segment dose in a set of qualified samples, as described herein. In certain embodiments, the normalizing segment sequence is a single segment of any one or more of chromosomes 1 to 22, X, and Y. In certain embodiments, the normalizing segment sequence is a group of segments of any one or more of chromosomes 1 to 22, X, and Y. In certain embodiments, the normalizing segment sequence comprises substantially one arm of any one or more of chromosomes 1 to 22, X, and Y.

In certain embodiments, the method further comprises sequencing at least a portion of the nucleic acid molecules of the test sample to obtain the sequence information. In certain embodiments, the sequencing comprises sequencing cell-free DNA from the test sample to provide the sequence information. In certain embodiments, the sequencing comprises sequencing cellular DNA from the test sample to provide the sequence information. In certain embodiments, the sequencing comprises massively parallel sequencing.

In certain embodiments, the method(s) further comprise automatically recording the presence or absence of an aneuploidy as determined in (d) in a patient medical record for the human subject providing the test sample, wherein the recording is performed using a processor. In certain embodiments, the recording comprises recording the chromosome doses and/or a diagnosis based on the chromosome doses in a computer-readable medium. In various embodiments, the patient medical record is maintained by a laboratory, a physician's office, a hospital, an HMO, an insurance company, or a personal medical record website. In certain embodiments, the determination of the presence or absence and/or the number of the aneuploidies is a factor in a differential diagnosis for cancer.

In certain embodiments, the detection of aneuploidies indicates a positive result, and the method further comprises prescribing, initiating, and/or altering treatment of the human subject from whom the test sample was taken. In certain embodiments, prescribing, initiating, and/or altering treatment of the human subject from whom the test sample was taken comprises prescribing and/or performing further diagnostics to determine the presence and/or severity of a cancer. In certain embodiments, the further diagnostics comprise screening a sample from the subject for a cancer biomarker, and/or imaging the subject for a cancer. In certain embodiments, when the method indicates the presence of neoplastic cells in the mammal, the mammal is treated, or caused to be treated, to remove the neoplastic cells and/or to inhibit the growth or proliferation of the neoplastic cells. In certain embodiments, treating the mammal comprises surgically removing the neoplastic (e.g., tumor) cells. In certain embodiments, treating the mammal comprises performing radiotherapy, or causing radiotherapy to be performed, on the mammal to kill the neoplastic cells. In certain embodiments, treating the mammal comprises administering, or causing to be administered, to the mammal an anti-cancer drug (e.g., matuzumab, erbitux, vectibix, nimotuzumab, matuzumab, panitumumab, fluorouracil, capecitabine, 5-trifluoromethyl-2'-deoxyuridine, methotrexate, raltitrexed, pemetrexed, cytosine arabinoside, 6-mercaptopurine, azathioprine, 6-thioguanine, pentostatin, fludarabine, cladribine, floxuridine (FUDR), cyclophosphamide, neosar, ifosfamide, thiotepa, 1,3-bis(2-chloroethyl)-1-nitrosourea, 1-(2-chloroethyl)-3-cyclohexyl-1-nitrosourea, hexamethylmelamine, busulfan, procarbazine, dacarbazine, chlorambucil, melphalan, cisplatin, carboplatin, oxaliplatin, bendamustine, carmustine, chloromethine, dacarbazine, fotemustine, lomustine, mannosulfan, nedaplatin, nimustine, prednimustine, ranimustine, satraplatin, semustine, streptozocin, temozolomide, treosulfan, triaziquone, triethylene melamine, thiotepa, triplatin tetranitrate, trofosfamide, uramustine, doxorubicin, daunorubicin, mitoxantrone, etoposide, topotecan, teniposide, irinotecan, camptosar, camptothecin, belotecan, rubitecan, vincristine, vinblastine, vinorelbine, vindesine, paclitaxel, docetaxel, abraxane, ixabepilone, larotaxel, ortataxel, tesetaxel, vinflunine, imatinib mesylate, sunitinib malate, sorafenib tosylate, nilotinib hydrochloride monohydrate, tasigna, semaxanib, vandetanib, vatalanib, retinoic acid, a retinoic acid derivative, and the like).
In another embodiment, a computer program product for determining the presence of a cancer and/or an increased risk of a cancer in a mammal is provided. The computer program product typically comprises: (a) code for providing sequence reads of nucleic acids in a test sample from the mammal, wherein the test sample may comprise both genomic nucleic acids from cancerous or precancerous cells and genomic nucleic acids from constitutive (germline) cells, and wherein the sequence reads are provided in an electronic format; (b) code for aligning the sequence reads to one or more chromosome reference sequences using a computing apparatus, thereby providing sequence tags corresponding to the sequence reads; (c) code for computationally identifying a number of sequence tags from the fetal and maternal nucleic acids for one or more chromosomes of interest whose amplification or deletion is known to be associated with cancer, or for chromosome segments of interest whose amplification or deletion is known to be associated with cancer, wherein the chromosome or chromosome segment is selected from chromosomes 1 to 22, X, and Y and segments thereof, and for computationally identifying a number of sequence tags for at least one normalizing chromosome sequence or normalizing chromosome segment sequence for each of the one or more chromosomes of interest or chromosome segments of interest, wherein the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest is at least about 10,000; (d) code for computationally calculating, using the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest and the number of sequence tags identified for each normalizing chromosome sequence or normalizing chromosome segment sequence, a single chromosome or segment dose for each of the one or more chromosomes of interest or chromosome segments of interest; and (e) code for comparing, using the computing apparatus, each of the single chromosome doses for each of the one or more chromosomes of interest or chromosome segments of interest with a corresponding threshold value for each of the one or more chromosomes of interest or chromosome segments of interest, and thereby determining the presence or absence of aneuploidies in the sample, wherein the presence of the aneuploidies and/or an increase in the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest indicates the presence of a cancer and/or an increased risk of a cancer. In various embodiments, the code provides instructions for performing the diagnostic methods described above (and below).
Methods of treating a cancer subject are also provided. In certain embodiments, the methods comprise performing a method as described herein for identifying the presence of a cancer and/or an increased risk of a cancer in a mammal, using a sample from the subject, or receiving the results of such a method performed on the sample; and, when the method, alone or in combination with one or more other indicators from a differential diagnosis for a cancer, indicates the presence of neoplastic cells in the subject, treating the subject, or causing the subject to be treated, to remove the neoplastic cells and/or to inhibit the growth or proliferation of the neoplastic cells. In certain embodiments, treating the subject comprises surgically removing the cells. In certain embodiments, treating the subject comprises performing radiotherapy, or causing radiotherapy to be performed, on the subject to kill the neoplastic cells. In certain embodiments, treating the subject comprises administering, or causing to be administered, to the subject an anti-cancer drug (e.g., matuzumab, erbitux, vectibix, nimotuzumab, matuzumab, panitumumab, fluorouracil, capecitabine, 5-trifluoromethyl-2'-deoxyuridine, methotrexate, raltitrexed, pemetrexed, cytosine arabinoside, 6-mercaptopurine, azathioprine, 6-thioguanine, pentostatin, fludarabine, cladribine, floxuridine (FUDR), cyclophosphamide, neosar, ifosfamide, thiotepa, 1,3-bis(2-chloroethyl)-1-nitrosourea, 1-(2-chloroethyl)-3-cyclohexyl-1-nitrosourea, hexamethylmelamine, busulfan, procarbazine, dacarbazine, chlorambucil, melphalan, cisplatin, carboplatin, oxaliplatin, bendamustine, carmustine, chloromethine, dacarbazine, fotemustine, lomustine, mannosulfan, nedaplatin, nimustine, prednimustine, ranimustine, satraplatin, semustine, streptozocin, temozolomide, treosulfan, triaziquone, triethylene melamine, thiotepa, triplatin tetranitrate, trofosfamide, uramustine, doxorubicin, daunorubicin, mitoxantrone, etoposide, topotecan, teniposide, irinotecan, camptosar, camptothecin, belotecan, rubitecan, vincristine, vinblastine, vinorelbine, vindesine, paclitaxel, docetaxel, abraxane, ixabepilone, larotaxel, ortataxel, tesetaxel, vinflunine, imatinib mesylate, sunitinib malate, sorafenib tosylate, nilotinib hydrochloride monohydrate, tasigna, semaxanib, vandetanib, vatalanib, retinoic acid, a retinoic acid derivative, and the like).
Methods of monitoring the treatment of a cancer patient are also provided. In various embodiments, the methods comprise performing a method as described herein for identifying the presence of a cancer and/or an increased risk of a cancer in a mammal on a sample from the subject before or during treatment, or receiving the results of such a method performed on the sample; and performing the method again on a second sample from the subject at a later time during or after treatment, or receiving the results of such a method performed on the second sample; wherein a reduced number or severity of aneuploidies in the second measurement (e.g., as compared with the first measurement), for example a reduced aneuploidy frequency and/or a reduction or absence of certain aneuploidies, indicates a positive course of treatment, and the same or an increased number or severity of aneuploidies in the second measurement (e.g., as compared with the first measurement) indicates a negative course of treatment; and, when the indication is negative, adjusting the treatment regimen to a more aggressive treatment regimen and/or to a palliative treatment regimen.
Methods are also provided for determining the fraction of fetal nucleic acids in a maternal sample comprising a mixture of fetal and maternal nucleic acids. In one embodiment, the method for determining the fetal fraction in a maternal sample comprises: (a) receiving sequence reads from the fetal and maternal nucleic acids in the maternal test sample; (b) aligning the sequence reads to one or more chromosome reference sequences, thereby providing sequence tags corresponding to the sequence reads; (c) identifying a number of those sequence tags that are from one or more chromosomes of interest or chromosome segments of interest selected from chromosomes 1 to 22, X, and Y and segments thereof, and identifying a number of those sequence tags that are from at least one normalizing chromosome sequence or normalizing chromosome segment sequence for each of the one or more chromosomes of interest or chromosome segments of interest, to determine a chromosome dose or chromosome segment dose, wherein the one or more chromosomes of interest or chromosome segments of interest have a copy number variation; and (d) determining the fetal fraction using the chromosome dose or chromosome segment dose corresponding to the copy number variation identified in step (c). In certain embodiments, the copy number variation is determined by comparing the dose for each chromosome or chromosome segment of the one or more chromosomes of interest or chromosome segments of interest with a corresponding threshold value for that chromosome or chromosome segment. The copy number variation can be selected from the group consisting of complete chromosomal duplications, complete chromosomal deletions, partial duplications, partial multiplications, partial insertions, and partial deletions.
In certain embodiments, the chromosome or segment dose in step (c) is calculated as the ratio of the number of sequence tags identified for the selected chromosome or segment of interest to the number of sequence tags identified for the corresponding at least one normalizing chromosome sequence or normalizing chromosome segment sequence for the selected chromosome or segment of interest. In some embodiments, the chromosome or segment dose in step (c) is calculated as the ratio of the sequence tag density ratio of the selected chromosome or segment of interest to the sequence tag density ratio of the corresponding at least one normalizing chromosome sequence or normalizing chromosome segment sequence for each selected chromosome or segment of interest.
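The tag-count form of the dose can be sketched as a simple ratio. The function name, the dictionary layout, and the chromosome names and counts are hypothetical; summing the counts when the normalizing sequence is a group of chromosomes is an assumption, since the text only says "at least one" normalizing sequence.

```python
def chromosome_dose(tag_counts, interest, normalizing):
    """Dose = tags on the chromosome (or segment) of interest divided by
    the tags on its normalizing sequence(s).

    tag_counts: mapping from sequence name to number of mapped tags.
    normalizing: iterable of one or more normalizing sequence names.
    """
    # Assumption: tags of a multi-sequence normalizer are pooled by summing.
    denominator = sum(tag_counts[name] for name in normalizing)
    if denominator == 0:
        raise ValueError("normalizing sequence(s) have no mapped tags")
    return tag_counts[interest] / denominator


# Hypothetical counts for one sample.
counts = {"chr21": 15000, "chr9": 50000, "chr11": 50000}
dose_21 = chromosome_dose(counts, "chr21", ["chr9", "chr11"])
```

The density-ratio form described in the second sentence would divide each count by the length of its sequence before taking the ratio.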
In certain embodiments, the method further comprises calculating a normalized chromosome value (NCV), wherein calculating the NCV relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples, as:

$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$

where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the i-th chromosome in the test sample, wherein the i-th chromosome is the chromosome of interest. The fetal fraction is then determined according to the following expression:

$$ff = 2 \times \left| NCV_{iA} \times CV_{iU} \right|$$

where ff is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome determined in the qualified samples, wherein the i-th chromosome is the chromosome of interest.
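One way to see where the factor of 2 in this expression comes from is sketched below. It rests on an assumption that the text does not state: the fetus carries one extra copy of chromosome i, so the expected dose in the affected sample exceeds the qualified mean by a factor of 1 + ff/2 (for one missing copy the factor is 1 - ff/2, which is why the absolute value is taken). The numbers in the last line are hypothetical.

```latex
\begin{align*}
R_{iA} &= \bar{R}_{iU}\left(1 + \tfrac{ff}{2}\right)
  && \text{(assumed: one extra fetal copy of chromosome } i\text{)} \\
NCV_{iA} &= \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}
  = \frac{ff}{2}\cdot\frac{\bar{R}_{iU}}{\sigma_{iU}}
  = \frac{ff}{2\,CV_{iU}},
  \qquad CV_{iU} = \frac{\sigma_{iU}}{\bar{R}_{iU}} \\
ff &= 2 \times NCV_{iA} \times CV_{iU} \\
\text{e.g. } NCV_{iA} &= 12.5,\; CV_{iU} = 0.004
  \;\Rightarrow\; ff = 2 \times 12.5 \times 0.004 = 0.10
\end{align*}
```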
In certain embodiments, the fetal fraction is determined using a normalized segment value (NSV), wherein the NSV relates the chromosome segment dose to the mean of the corresponding chromosome segment dose in a set of qualified samples, as:

$$NSV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$

where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the i-th chromosome segment dose in the set of qualified samples, and $R_{iA}$ is the chromosome segment dose calculated for the i-th chromosome segment in the test sample, wherein the i-th chromosome segment is the chromosome segment of interest. The fetal fraction is then determined according to the following expression:

$$ff = 2 \times \left| NSV_{iA} \times CV_{iU} \right|$$

where ff is the fetal fraction value, $NSV_{iA}$ is the normalized chromosome segment value for the i-th chromosome segment in an affected sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome segment determined in the qualified samples, wherein the i-th chromosome segment is the chromosome segment of interest.
In certain embodiments, the chromosome of interest is any one of chromosomes 1 to 22 or the X chromosome of a male fetus, and the chromosome segment of interest is selected from chromosomes 1 to 22 or the X chromosome of a male fetus.
In certain embodiments of the methods for determining fetal fraction, the at least one normalizing chromosome sequence or normalizing chromosome segment sequence is a chromosome or segment selected for an associated chromosome or segment of interest by: (i) identifying a plurality of qualified samples for the chromosome or segment of interest; (ii) repeatedly calculating chromosome doses or chromosome segment doses for the selected chromosome or segment using a plurality of potential normalizing chromosome sequences or normalizing chromosome segment sequences; and (iii) selecting the normalizing chromosome sequence or normalizing chromosome segment sequence, alone or in a combination, that gives the smallest variability or the greatest differentiability in the calculated chromosome doses or chromosome segment doses. The normalizing chromosome sequence can be a single chromosome of any one or more of chromosomes 1 to 22, X, and Y. Alternatively, the normalizing chromosome sequence can be a group of chromosomes of any of chromosomes 1 to 22, X, and Y. Likewise, the normalizing segment sequence can be a single segment of any one or more of chromosomes 1 to 22, X, and Y. Alternatively, the normalizing segment sequence can be a group of segments of any one or more of chromosomes 1 to 22, X, and Y.
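Steps (i) to (iii) can be sketched as an exhaustive search over candidate normalizing sequences. This is only an illustration under stated assumptions: variability is measured as the coefficient of variation of the dose across qualified samples, candidate groups are pooled by summing their tag counts, the group size is capped by a hypothetical `max_group` parameter, and the "greatest differentiability" criterion is not implemented.

```python
from itertools import combinations
from statistics import mean, stdev


def select_normalizing_sequence(qualified_counts, interest, candidates, max_group=2):
    """Pick the candidate (single sequence or group) whose doses vary least.

    qualified_counts: list of per-sample dicts mapping sequence name -> tags.
    Returns (best_group, cv) where best_group is a tuple of sequence names.
    """
    best = None
    for size in range(1, max_group + 1):
        for group in combinations(candidates, size):
            # Step (ii): recompute the dose in every qualified sample.
            doses = [s[interest] / sum(s[c] for c in group) for s in qualified_counts]
            cv = stdev(doses) / mean(doses)
            # Step (iii): keep the candidate with the smallest variability.
            if best is None or cv < best[1]:
                best = (group, cv)
    return best


# Hypothetical tag counts for three qualified samples.
samples = [
    {"chr21": 100, "chr9": 1000, "chr14": 900},
    {"chr21": 110, "chr9": 1100, "chr14": 1200},
    {"chr21": 90, "chr9": 900, "chr14": 1000},
]
best_group, best_cv = select_normalizing_sequence(samples, "chr21", ["chr9", "chr14"])
```

In this toy data the chr21 counts track chr9 exactly, so chr9 alone gives a constant dose and is selected.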
In certain embodiments, the method for determining fetal fraction can further comprise comparing the fetal fraction so obtained with a fetal fraction determined using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample. Methods for determining allelic imbalance are described elsewhere in this application, and include using polymorphic differences between the fetal and maternal genomes (including but not limited to differences detected in SNP or STR sequences) to determine the fetal fraction.
In certain embodiments, this method further comprises temporarily, at least storing sequence reads.
An additional method for classifying a copy number variation in a fetal genome is provided. The additional method includes: (a) obtaining sequence reads of fetal and maternal nucleic acids in a maternal test sample; (b) aligning the sequence reads to one or more chromosome reference sequences, thereby providing sequence tags corresponding to the sequence reads; (c) identifying the number of sequence tags from one or more chromosomes of interest, and determining that a first chromosome of interest in the fetus harbors a copy number variation; (d) calculating a first fetal fraction value by a first method that does not use information from the tags from the first chromosome of interest; (e) calculating a second fetal fraction value by a second method that uses information from the tags from the first chromosome; and (f) comparing the first fetal fraction value with the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome.
In certain embodiments, the first method of calculating a fetal fraction value in step (d) of the additional method includes calculating the first fetal fraction value using information from one or more polymorphisms that exhibit an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample; and the second method of calculating a fetal fraction value in step (e) of the additional method includes: (a) counting the number of sequence tags from the first chromosome of interest and at least one normalizing chromosome sequence to determine a chromosome dose; and (b) calculating the fetal fraction value from the chromosome dose using the second method.
In certain embodiments, the information used by the first method includes sequence tags obtained by sequencing predetermined polymorphic sequences, each of which includes one or more polymorphic sites. In certain embodiments, the information used by the first method is obtained by a non-sequencing method, such as qPCR, digital PCR, mass spectrometry, or capillary gel electrophoresis.
In certain embodiments, the first method includes calculating the first fetal fraction value using tags from a chromosome or chromosome segment that does not harbor a copy number variation. For example, when the first chromosome of interest is chromosome 21, the fetal fraction determined using sequence tags from chromosome 21 can be compared with the fetal fraction determined, in a male fetus, using sequence tags from chromosome X. Any chromosome or chromosome segment that is known not to be aneuploid, or that is determined not to be aneuploid by any method described herein (for example, by calculating its NCV or NSV), can be used to determine the first fetal fraction.
In certain embodiments, the chromosome or segment dose determined by the second method in step (e) is calculated as the ratio of the number of sequence tags identified for the selected chromosome or segment of interest to the number of sequence tags identified for the corresponding at least one normalizing chromosome sequence or normalizing chromosome segment sequence. In certain embodiments, the chromosome dose or segment dose determined in step (e) is calculated as the ratio of the sequence tag density ratio of the selected chromosome or segment of interest to the sequence tag density ratio of the at least one corresponding normalizing chromosome sequence or normalizing chromosome segment sequence.
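The count-based dose described above reduces to a single division, sketched below. The function name and the dictionary layout are hypothetical; when several normalizing chromosomes are used together, the sketch assumes their tag counts are simply summed.

```python
def chromosome_dose(tag_counts, chrom_of_interest, normalizing_chroms):
    """Ratio of tags on the chromosome of interest to tags on the normalizer(s).

    tag_counts: dict mapping chromosome name -> number of mapped sequence tags
    (hypothetical layout). normalizing_chroms: one or more chromosome names.
    """
    # Assumption: a group of normalizing chromosomes contributes its summed count.
    denominator = sum(tag_counts[c] for c in normalizing_chroms)
    if denominator == 0:
        raise ValueError("normalizing chromosomes have no mapped tags")
    return tag_counts[chrom_of_interest] / denominator
```

A density-based dose would follow the same pattern after each count is divided by the length of the corresponding sequence.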
Some embodiments of the additional method further include calculating a normalized chromosome value (NCV), wherein the second method uses the NCV, and wherein calculating the NCV relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the i-th chromosome in the test sample, where the i-th chromosome is the chromosome of interest.
In certain embodiments, the second method of calculating the fetal fraction value includes evaluating the following expression:
$$ff = 2 \times \left| NCV_{iA} \times CV_{iU} \right|$$
where $ff$ is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample or test sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome determined in the qualified samples, where the i-th chromosome is the chromosome of interest.
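The NCV and fetal fraction expressions can be sketched as follows. This is a minimal illustration that assumes the sample standard deviation is used for the qualified set and that the coefficient of variation is the standard deviation divided by the mean; the function names are hypothetical.

```python
from statistics import mean, stdev

def normalized_chromosome_value(test_dose, qualified_doses):
    """NCV: distance of the test dose from the qualified mean, in standard deviations."""
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)  # assumption: sample standard deviation
    return (test_dose - mu) / sigma

def fetal_fraction_from_ncv(test_dose, qualified_doses):
    """ff = 2 * |NCV * CV|, with CV taken as stdev / mean of the qualified doses."""
    ncv = normalized_chromosome_value(test_dose, qualified_doses)
    cv = stdev(qualified_doses) / mean(qualified_doses)
    return 2 * abs(ncv * cv)
```

Under these definitions the standard deviation cancels, so the expression equals twice the relative deviation of the test dose from the qualified mean. A test dose 5% above the qualified mean therefore gives a fetal fraction of about 10%.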
In certain embodiments, the first method of calculating fetal fraction includes: (a) counting the number of sequence tags from a chromosome other than the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose for that other chromosome; and (b) calculating the first fetal fraction value from that chromosome dose by the first method; and the second method includes: (a) counting the number of sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and (b) calculating the second fetal fraction value from that chromosome dose by the second method.
Preferably, the chromosome or segment dose is calculated as the ratio of the number of sequence tags identified for the selected chromosome or segment of interest to the number of sequence tags identified for the corresponding at least one normalizing chromosome sequence or normalizing chromosome segment sequence; or the chromosome dose or segment dose is calculated as the ratio of the sequence tag density ratio of the selected chromosome or segment of interest to the sequence tag density ratio of the at least one corresponding normalizing chromosome sequence or normalizing chromosome segment sequence.
Preferably, the additional method for classifying a copy number variation also includes calculating the corresponding normalized chromosome values (NCVs), and the first method and the second method use the corresponding NCVs. Calculating an NCV relates the determined chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the dose of the i-th chromosome in the set of qualified samples, and $R_{iA}$ is the calculated dose of the i-th chromosome in the test sample. The first method and the second method can use the NCV to calculate fetal fraction by evaluating the following expression:
$$ff = 2 \times \left| NCV_{iA} \times CV_{iU} \right|$$
where $ff$ is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in the test sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome in the qualified samples. In the above expression, for the first method, the i-th chromosome is not the first chromosome of interest; for the second method, the i-th chromosome is the first chromosome of interest.
The first chromosome of interest is selected from the group consisting of chromosomes 1 to 22, X, and Y. The chromosome other than the first chromosome of interest can be any one of chromosomes 1 to 22, or the X chromosome when the fetus is male.
In certain embodiments, step (f) includes determining whether the two fetal fraction values are approximately equal. In certain embodiments, step (f) further includes determining, when the two fetal fraction values are approximately equal, that a ploidy assumption implicit in the second method is true. The ploidy assumption implicit in the second method can be that the first chromosome of interest has a complete chromosomal aneuploidy. For example, the complete chromosomal aneuploidy of the first chromosome of interest is a monosomy or a trisomy.
In certain embodiments, the additional method for classifying a copy number variation further includes a step (g): when the two fetal fraction values are not approximately equal, analyzing the tag information for the first chromosome of interest to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic.
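The comparison in step (f) and the hand-off to step (g) can be sketched as below. The text does not quantify "approximately equal", so the relative tolerance of 20% is purely an illustrative assumption, as are the function name and the returned strings.

```python
import math

def compare_fetal_fractions(ff_first, ff_second, rel_tol=0.2):
    """Classify the CNV from two independently calculated fetal fraction values.

    rel_tol is a hypothetical tolerance; the source gives no numeric threshold.
    """
    if math.isclose(ff_first, ff_second, rel_tol=rel_tol):
        # The ploidy assumption implicit in the second method is taken as true.
        return "complete chromosomal aneuploidy"
    # Step (g): the tag information must be analyzed further.
    return "partial aneuploidy or mosaic"
```

For example, values of 0.10 and 0.105 fall within the assumed tolerance, whereas 0.10 and 0.04 do not and would trigger the further analysis of step (g).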
In certain embodiments, the first method includes calculating the first fetal fraction value using information from one or more polymorphisms that exhibit an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being located on a chromosome other than the first chromosome of interest; and the second method includes calculating the second fetal fraction value using information from one or more polymorphisms that exhibit an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being located on the first chromosome of interest. The comparing step (f) can include: determining that the first chromosome of interest is diploid when the ratio of the second fetal fraction value to the first fetal fraction value is approximately 1; determining that the first chromosome of interest is triploid when the ratio of the second fetal fraction value to the first fetal fraction value is approximately 1.5; and determining that the first chromosome of interest is monoploid when the ratio of the second fetal fraction value to the first fetal fraction value is approximately 0.5. The additional method for classifying a copy number variation can further include a step (g) of analyzing the tag information for the first chromosome of interest when the ratio of the second fetal fraction value to the first fetal fraction value is not approximately 1, 1.5, or 0.5, to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic.
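The ratio test above maps naturally onto a small lookup, sketched here. The absolute tolerance of 0.1 around each target ratio is an assumption made for illustration, since the text says only "approximately"; the function name and labels are hypothetical.

```python
def classify_by_ratio(ff_first, ff_second, tol=0.1):
    """Infer fetal copy number of the chromosome of interest from ff_second / ff_first.

    tol is a hypothetical tolerance around the target ratios 1, 1.5 and 0.5.
    """
    ratio = ff_second / ff_first
    targets = (
        (1.0, "two copies"),    # ratio near 1
        (1.5, "three copies"),  # ratio near 1.5
        (0.5, "one copy"),      # ratio near 0.5
    )
    for target, label in targets:
        if abs(ratio - target) <= tol:
            return label
    # No target matched: step (g) analyzes the tag information further.
    return "partial aneuploidy or mosaic"
```

With the assumed tolerance, a ratio such as 1.25 matches none of the three targets and is passed on for the analysis of step (g).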
In certain embodiments, the information used by the polymorphism-based first method and second method includes sequence tags obtained by sequencing predetermined polymorphic sequences, each of which includes one or more polymorphic sites. Alternatively, the information used by the polymorphism-based first method and second method is obtained by a non-sequencing method, such as qPCR, digital PCR, mass spectrometry, or capillary gel electrophoresis.
In certain embodiments, step (g) of analyzing the tag information for the first chromosome of interest includes: (a) binning the sequence of the first chromosome of interest into a plurality of portions; (b) determining whether any of the portions contains significantly more or significantly less nucleic acid than one or more other portions; and (c) if any of the portions contains significantly more or significantly less nucleic acid than one or more other portions, determining that the first chromosome of interest harbors a partial aneuploidy; or, if none of the portions contains significantly more or significantly less nucleic acid than one or more other portions, determining that the fetus is a mosaic. The additional method can therefore further include determining that the portion of the first chromosome of interest that contains significantly more or significantly less nucleic acid than one or more other portions harbors the partial aneuploidy.
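One possible way to decide whether a portion contains "significantly more or significantly less" nucleic acid is a robust z-score on per-bin tag counts, sketched below. The text does not name a statistical test, so the median and median-absolute-deviation approach, the threshold of 3, and the function names are all assumptions chosen for illustration.

```python
from statistics import median

def outlier_bins(bin_counts, z_threshold=3.0):
    """Indices of bins whose tag count deviates strongly from the other bins."""
    center = median(bin_counts)
    mad = median([abs(c - center) for c in bin_counts])
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    if scale == 0:
        return []  # no spread to compare against; treated as no outliers here
    return [i for i, c in enumerate(bin_counts)
            if abs(c - center) / scale > z_threshold]

def partial_aneuploidy_or_mosaic(bin_counts, z_threshold=3.0):
    """Step (c): outlier bins imply partial aneuploidy, otherwise a mosaic."""
    flagged = outlier_bins(bin_counts, z_threshold)
    if flagged:
        return "partial aneuploidy", flagged
    return "mosaic", []
```

The flagged indices also indicate which portion of the chromosome would carry the partial aneuploidy.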
Step (f) of the method for classifying the copy number variation includes classifying the copy number variation into a category selected from the group consisting of: complete chromosomal duplication or multiplication, complete chromosomal deletion, partial chromosomal duplication, partial chromosomal deletion, and mosaic.
In embodiments in which comparing the first fetal fraction value with the second fetal fraction value in step (f) determines that the two values are not approximately equal, the method further includes:
(i) determining whether the copy number variation results from a partial aneuploidy or from a mosaic; and
(ii) when the copy number variation results from a partial aneuploidy, determining the locus of the partial aneuploidy on the first chromosome of interest.
In certain embodiments, determining the locus of the partial aneuploidy on the first chromosome of interest includes assigning the sequence tags for the first chromosome of interest to bins or blocks of nucleic acid in the first chromosome of interest, and counting the mapped tags in each bin.
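Assigning mapped tags to bins and counting them can be sketched as follows. The sketch assumes fixed-width bins and 0-based tag start positions; the function name and parameters are hypothetical.

```python
def count_tags_per_bin(tag_positions, chrom_length, bin_size):
    """Count mapped tags in consecutive fixed-width bins along one chromosome.

    tag_positions: 0-based start coordinates of tags mapped to the chromosome
    (assumed convention). The last bin may be shorter than bin_size.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in tag_positions:
        if 0 <= pos < chrom_length:  # ignore coordinates outside the chromosome
            counts[pos // bin_size] += 1
    return counts
```

The resulting list of counts can then be scanned for bins that are over- or under-represented in order to locate the partial aneuploidy.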
In certain embodiments, the aligning in step (b) includes aligning at least about 1,000,000 reads.
Any method described herein can further include sequencing the fetal and maternal nucleic acids (e.g., cell-free DNA) in the maternal test sample to obtain the sequence reads. Sequencing the maternal and fetal nucleic acids from the maternal test sample can include massively parallel sequencing to produce the sequence reads. In certain embodiments, the massively parallel sequencing is sequencing-by-synthesis. Sequencing-by-synthesis can be performed using reversible dye terminators. In other embodiments, the massively parallel sequencing is sequencing-by-ligation. In yet other embodiments, the massively parallel sequencing is single-molecule sequencing.
Maternal samples that can be used to determine fetal fraction according to the methods described herein include blood, plasma, serum, or urine samples. In certain embodiments, the maternal sample is a plasma sample. In other embodiments, the maternal sample is a whole blood sample.
Various apparatus are also provided, including apparatus for medical analysis of a sample (such as a maternal sample). These apparatus perform steps of the methods above, for example separately for determining copy number variation, for determining fetal fraction, or for classifying a copy number variation.
Kits are also provided. These kits include reagents for determining copy number variation, alone or in combination with a method for determining the contribution of one of two genomes to a mixture of nucleic acids derived from the two genomes (e.g., the fetal fraction in a maternal sample). The kits can be used in combination with the apparatus described herein.
Although the examples herein concern humans and the language is primarily directed to human concerns, the concepts described herein also apply to genomes from any plant or animal.
Brief Description of the Drawings
Fig. 1 is a flowchart of method 100 for determining the presence or absence of a copy number variation in a test sample that includes a mixture of nucleic acids.
Fig. 2 depicts workflows for preparing a sequencing library according to the full-length Illumina protocol, the abbreviated protocol (ABB), the two-step method, and the one-step method, as described herein. "P" denotes a purification step; "X" indicates that the purification step and/or DNA repair is not included.
Fig. 3 depicts the workflow of an embodiment of a method for preparing a sequencing library on a solid surface.
Fig. 4 shows a flowchart of an embodiment 400 of a method for verifying the integrity of a sample subjected to a multi-step singleplex sequencing bioassay.
Fig. 5 shows a flowchart of an embodiment 500 of a method for verifying the integrity of multiple samples subjected to a multi-step multiplex sequencing bioassay.
Fig. 6 is a flowchart of method 600 for simultaneously determining the presence or absence of aneuploidy and the fetal fraction in a maternal test sample that includes a mixture of fetal and maternal nucleic acids.
Fig. 7 is a flowchart of method 700 for determining fetal fraction in a maternal test sample that includes a mixture of fetal and maternal nucleic acids, using massively parallel sequencing or size separation of polymorphic nucleic acid sequences.
Fig. 8 is a flowchart of method 800 for simultaneously determining the presence or absence of fetal aneuploidy and the fetal fraction in a maternal plasma test sample enriched for polymorphic nucleic acids.
Fig. 9 is a flowchart of method 900 for simultaneously determining the presence or absence of fetal aneuploidy and the fetal fraction in a maternal purified cfDNA test sample enriched for polymorphic nucleic acids.
Figure 10 is a flowchart of method 1000 for simultaneously determining the presence or absence of fetal aneuploidy and the fetal fraction in a sequencing library constructed from fetal and maternal nucleic acids that are derived from a maternal test sample and enriched for polymorphic nucleic acids.
Figure 11 is a flowchart outlining alternative embodiments of the method for determining fetal fraction by massively parallel sequencing shown in Fig. 7.
Figure 12 is a bar chart showing the identification of fetal and maternal polymorphic sequences (SNPs) used to determine fetal fraction in a test sample. It shows the total number of sequence reads (Y-axis) mapped to the SNP sequences identified by rs numbers (X-axis), and the relative level of fetal nucleic acids (*).
Figure 13 is a block diagram depicting the classification of fetal and maternal zygosity states for a given genomic position.
Figure 14 shows a comparison of the results obtained using a mixture model with the known fetal fraction and the estimated fetal fraction.
Figure 15 shows error estimates by sequenced base position over 30 lanes of Illumina GA2 data aligned to human genome HG18 using Eland with default parameters.
Figure 16 shows that using the machine error rate as a known parameter reduces the upward bias by a point.
Figure 17 shows that, in simulated data, using the machine error rate as a known parameter and enhancing the error models for cases 1 and 2 greatly reduces the upward bias to less than a point for fetal fractions below 0.2.
Figure 18 is a flowchart depicting a method of classifying a CNV by comparing fetal fraction values calculated by two different techniques.
Figure 19 is a block diagram of a dispersed system for processing a test sample and ultimately making a diagnosis.
Figure 20 schematically shows how the different operations in processing a test sample can be grouped for handling by different elements of a system.
Figures 21A and 21B show electropherograms of cfDNA sequencing libraries prepared according to the abbreviated protocol described in Example 2a (Figure 21A) and the protocol described in Example 2b (Figure 21B).
Figures 22A to 22C provide graphs showing the average (n=16) of the percentage of the total number of sequence tags mapped to each human chromosome (%ChrN; Figure 22A) when the sequencing library was prepared according to the abbreviated protocol (ABB; ◇) and when it was prepared according to the repair-free two-step method (INSOL), and the percentage of sequence tags as a function of chromosome size (Figure 22B). Figure 22C shows the percentage ratio of tags mapped when libraries were prepared using the two-step method to tags obtained when libraries were prepared using the abbreviated (ABB) method, as a function of the GC content of the chromosomes.
Figures 23A and 23B show bar charts of the mean and standard deviation of the percentage of tags mapped to chromosome X (Figure 23A; %ChrX) and chromosome Y (Figure 23B; %ChrY), obtained by sequencing 10 samples of cfDNA purified from the plasma of 10 pregnant women. Figure 23A shows that the number of tags mapped to the X chromosome when using the repair-free method (two-step) is greater than the number obtained using the abbreviated method (ABB). Figure 23B shows that the percentage of tags mapped to the Y chromosome when using the repair-free two-step method does not differ from the percentage obtained using the abbreviated method (ABB).
Figure 24 shows the ratio of the number of non-excluded sites (NE sites) in the reference genome (hg18) to the total number of tags mapped to the non-excluded sites for each of 5 samples from which cfDNA was prepared and used to construct sequencing libraries according to the abbreviated protocol (ABB) described in Example 2 (solid bars), the in-solution repair-free protocol (two-step; open bars), and the solid-surface repair-free protocol (one-step; gray bars).
Figures 25A and 25B are graphs showing the average (n=5) of the percentage of the total number of sequence tags mapped to each human chromosome (%ChrN; Figure 25A) when the sequencing library was prepared according to the abbreviated protocol (ABB; ◇), according to the repair-free two-step method (), and according to the repair-free one-step method (Δ) on a solid surface, and the percentage of sequence tags as a function of chromosome size (Figure 25B). Also shown are the regression coefficients for mapped tags obtained from sequencing libraries prepared according to the abbreviated protocol (ABB; ◇) and the solid-surface repair-free protocol (two-step; ). Figure 25C shows the percentage ratio of sequence tags mapped per chromosome obtained from sequencing libraries prepared according to the repair-free two-step protocol to tags per chromosome obtained from sequencing libraries prepared according to the abbreviated protocol (ABB), as a function of the percent GC content of each chromosome (◇), and the percentage ratio of sequence tags mapped per chromosome obtained from sequencing libraries prepared according to the repair-free one-step protocol to tags per chromosome obtained from sequencing libraries prepared according to the abbreviated protocol (ABB), as a function of the percent GC content of each chromosome ().
Figures 26A and 26B show a comparison of the mean and standard deviation of the percentage of tags mapped to chromosome X (Figure 26A) and chromosome Y (Figure 26B), obtained by sequencing 5 samples of cfDNA purified from the plasma of 5 pregnant women according to the ABB, two-step, and one-step methods. Figure 26A shows that the number of tags mapped to the X chromosome when using the repair-free methods (two-step and one-step) is greater than the number obtained using the abbreviated method (ABB). Figure 26B shows that the percentage of tags mapped to the Y chromosome when using the repair-free two-step and one-step methods does not differ from the percentage obtained using the abbreviated method.
Figures 27A and 27B show the correlation between the amount of purified cfDNA used to prepare a sequencing library and the amount of resulting library product, for 61 clinical samples prepared in solution using the ABB method (Figure 27A) and for 35 research samples prepared using the repair-free solid-surface (SS) one-step method (Figure 27B).
Figure 28 shows the correlation between the amount of cfDNA used to make a library and the amount of library product obtained using the two-step (), ABB (◇), and one-step (Δ) methods.
Figure 29 shows the percentage of indexed sequence reads obtained when indexed libraries were prepared using the one-step (open bars) and two-step (solid bars) methods and sequenced as 6-plexes (i.e., 6 indexed samples per flow cell lane).
Figures 30A and 30B are graphs showing the average (n=42) of the percentage of the total number of sequence tags mapped to each human chromosome (%ChrN; Figure 30A) when indexed sequencing libraries were prepared on a solid surface according to the one-step method and sequenced as 6-plexes, and the percentage of sequence tags obtained as a function of chromosome size (Figure 30B).
Figure 31 shows the percentage of sequence tags mapped to the Y chromosome (ChrY) relative to the percentage of tags mapped to the X chromosome (ChrX).
Figures 32A and 32B illustrate the distribution of the chromosome dose for chromosome 21 determined by sequencing cfDNA extracted from a set of 48 blood samples obtained from human subjects each pregnant with a male or a female fetus. Chromosome 21 doses for qualified samples (i.e., normal for chromosome 21; (O)) and for trisomy 21 test samples (Δ) are shown for chromosomes 1-12 and X (Figure 32A) and for chromosomes 1-22 and X (Figure 32B).
Figures 33A and 33B illustrate the distribution of the chromosome dose for chromosome 18 determined by sequencing cfDNA extracted from a set of 48 blood samples obtained from human subjects each pregnant with a male or a female fetus. Chromosome 18 doses for qualified samples (i.e., normal for chromosome 18; (O)) and for trisomy 18 test samples (Δ) are shown for chromosomes 1-12 and X (Figure 33A) and for chromosomes 1-22 and X (Figure 33B).
Figures 34A and 34B illustrate the distribution of the chromosome dose for chromosome 13 determined by sequencing cfDNA extracted from a set of 48 blood samples obtained from human subjects each pregnant with a male or a female fetus. Chromosome 13 doses for qualified samples (i.e., normal for chromosome 13; (O)) and for trisomy 13 test samples (Δ) are shown for chromosomes 1-12 and X (Figure 34A) and for chromosomes 1-22 and X (Figure 34B).
Figures 35A and 35B illustrate the distribution of the chromosome dose for chromosome X determined by sequencing cfDNA extracted from a set of 48 test blood samples obtained from human subjects each pregnant with a male or a female fetus. Chromosome X doses for male (46,XY; (O)), female (46,XX; (Δ)), monosomy X (45,X; (+)), and complex karyotype (Cplx (X)) samples are shown for chromosomes 1-12 and X (Figure 35A) and for chromosomes 1-22 and X (Figure 35B).
Figures 36A and 36B illustrate the distribution of the chromosome dose for chromosome Y determined by sequencing cfDNA extracted from a set of 48 test blood samples obtained from human subjects each pregnant with a male or a female fetus. Chromosome Y doses for male (46,XY; (Δ)), female (46,XX; (O)), monosomy X (45,X; (+)), and complex karyotype (Cplx (X)) samples are shown for chromosomes 1-12 (Figure 36A) and for chromosomes 1-22 (Figure 36B).
Figure 37 shows the coefficient of variation (CV) for chromosomes 21 (■), 18 (●), and 13 (▲) determined from the doses shown in Figures 32A and 32B, 33A and 33B, and 34A and 34B, respectively.
Figure 38 shows the coefficient of variation (CV) for chromosomes X (■) and Y (●) determined from the doses shown in Figures 35A and 35B and 36A and 36B, respectively.
Figure 39 shows the cumulative distribution of GC fraction by human chromosome. The vertical axis represents the frequency of chromosomes with a GC content less than the value shown on the horizontal axis.
Figure 40 illustrates the sequence doses (Y-axis) for a segment of chromosome 11 (81000082-103000103bp) determined by sequencing cfDNA extracted from a set of 7 qualified samples (O) and 1 test sample (◆) obtained from pregnant human subjects. A sample from a subject carrying a fetus with a partial aneuploidy of chromosome 11 (◆) was identified.
Figures 41A-41E illustrate the distribution of normalized chromosome doses for chromosome 21 (41A), chromosome 18 (41B), chromosome 13 (41C), chromosome X (41D), and chromosome Y (41E), relative to the standard deviation of the mean (Y-axis) for the corresponding chromosomes in the unaffected samples.
Figure 42 shows the normalized chromosome values for chromosomes 21 (O), 18 (Δ), and 13 () determined in samples from training set 1 using the normalizing chromosomes as described in Example 12.
Figure 43 shows the normalized chromosome values for chromosomes 21 (O), 18 (Δ), and 13 () determined in samples from test set 1 using the normalizing chromosomes as described in Example 12.
Figure 44 shows the normalized chromosome values for chromosomes 21 (O) and 18 (Δ) determined in samples from test set 1 using the normalization method of Chiu et al. (the number of sequence tags identified for the chromosome of interest is normalized to the number of sequence tags obtained for the remaining chromosomes in the sample; see Example 13 elsewhere in this application).
Figure 45 shows the normalized chromosome values for chromosomes 21 (O), 18 (Δ), and 13 () determined in samples from training set 1 using systematically determined normalizing chromosomes (as described in Example 13).
Figure 46 shows the normalized chromosome values for chromosomes X (X-axis) and Y (Y-axis). Arrows point to the 5 (Figure 46A) and 3 (Figure 46B) monosomy X samples identified in the training set and test set, respectively, as described in Example 13.
Figure 47 shows the normalized chromosome values for chromosomes 21 (O), 18 (Δ), and 13 () determined in samples from test set 1 using systematically determined normalizing chromosomes (as described in Example 13).
Figure 48 shows the normalized chromosome values for chromosome 9 (O) determined in samples from test set 1 using systematically determined normalizing chromosomes (as described in Example 13).
Figure 49 shows the normalized chromosome values for chromosomes 1-22 determined in samples from test set 1 using systematically determined normalizing chromosomes (as described in Example 13).
Figure 50 shows flow charts of the design (A) and the random sampling plan (B) for the study described in Example 16.
Figures 51A to 51F show flow charts of the analyses for chromosomes 21, 18 and 13 (Figures 51A to 51C, respectively) and of the gender analyses for female, male and monosomy X (Figures 51D to 51F, respectively). Ovals contain results obtained from the sequencing information from the laboratory, rectangles contain karyotype results, and rectangles with rounded corners show the comparative results used to determine test performance (sensitivity and specificity). The dashed lines in Figures 51A and 51B denote the relationship between the mosaic samples for T21 (n=3) and T18 (n=1), which were censored from the analysis of chromosomes 21 and 18, respectively, but were correctly determined as described in Example 16.
Figure 52 shows normalized chromosome values (NCV) versus karyotype classifications for chromosomes 21 (●), 18 (■) and 13 (▲) for the test samples of the study described in Example 16. Circled samples denote unclassified samples with a trisomy karyotype.
Figure 53 shows normalized chromosome values (NCV) for chromosome X versus karyotype classifications for the gender classification of the test samples of the study described in Example 16. Samples with female karyotypes (○), samples with male karyotypes (●), samples with 45,X () and samples with other karyotypes, i.e. XXX, XXY and XYY (■), are shown.
Figure 54 shows a plot of normalized chromosome values for chromosome Y versus normalized chromosome values for chromosome X for the test samples of the clinical study described in Example 16. Euploid male and female samples (○), XXX samples (●), 45,X samples (X), XYY samples (■) and XXY samples (▲) are shown. The dashed lines show the threshold values used for classifying the samples as described in Example 16.
Figure 55 schematically illustrates one embodiment of the CNV determination method described herein.
Figure 56 shows a plot from Example 17 of the percent "ff" determined using doses of chromosome 21 (ff21) as a function of the percent "ff" determined using doses of chromosome X (ffX) in a synthetic maternal sample (1) comprising DNA from a child with trisomy 21.
Figure 57 shows a plot from Example 17 of the percent "ff" determined using doses of chromosome 7 (ff7) as a function of the percent "ff" determined using doses of chromosome X (ffX) in a synthetic maternal sample (2) comprising DNA from a euploid mother and her child, who carries a partial deletion in chromosome 7.
Figure 58 shows a plot from Example 17 of the percent "ff" determined using doses of chromosome 15 (ff15) as a function of the percent "ff" determined using doses of chromosome X (ffX) in a synthetic maternal sample (3) comprising DNA from a euploid mother and her child, who is 25% mosaic with a partial duplication of chromosome 15.
Figure 59 shows a plot from Example 17 of the percent "ff" determined using doses of chromosome 22 (ff22), and of the NCV derived therefrom, in an artificial sample (4) comprising 0% child DNA (i), 10% DNA from an unaffected twin son known not to have a partial chromosomal aneuploidy of chromosome 22 (ii), and 10% DNA from the affected twin son known to have a partial chromosomal aneuploidy of chromosome 22 (iii).
Figure 60 shows a plot from Example 18 of CNffx versus CNff21 determined in samples comprising a fetal T21 trisomy.
Figure 61 shows a plot from Example 18 of CNffx versus CNff18 determined in samples comprising a fetal T18 trisomy.
Figure 62 shows a plot from Example 18 of CNffx versus CNff13 determined in samples comprising a fetal T13 trisomy.
Figure 63 shows a plot from Example 19 of the NCV values for chromosomes 1 to 22 and X in a test sample.
Figure 64 shows the fetal fractions obtained in Example 18 for samples with a female child with T21.
Figure 65 shows one embodiment of a medical analysis apparatus for determining the fetal fraction as a function of a copy number variation present in the fetal genome.
Figure 66 shows one embodiment of a medical analysis apparatus for determining the fetal fraction in order to classify a copy number variation in the fetal genome.
Figure 67 shows a kit comprising assay control reagents and reagents for tracking and verifying the integrity of maternal cfDNA samples subjected to massively parallel sequencing.
Figure 68 shows a kit comprising a blood collection device, DNA extraction reagents and control reagents for assaying maternal DNA samples.
Figure 69 (A, B, C) shows NCV plots of the in-process positive controls [ ] and the maternal samples [◇] for the copy number variation assays of chromosomes 13, 18 and 21.
Detailed Description
The disclosed embodiments relate to methods, apparatus and systems for determining copy number variations (CNV) of a sequence of interest in a test sample comprising a mixture of nucleic acids that are known or suspected to differ in the amount of one or more sequences of interest. Sequences of interest include genomic segment sequences ranging from, for example, kilobases (kb) to megabases (Mb) to entire chromosomes that are known or suspected to be associated with a genetic or disease condition. Examples of sequences of interest include chromosomes associated with well-known aneuploidies (e.g. trisomy 21) and segments of chromosomes that are multiplied in diseases such as cancer, e.g. partial trisomy 8 in acute myeloid leukemia. CNV that can be determined according to the present method include monosomies and trisomies of any one or more of autosomes 1-22 and of sex chromosomes X and Y (e.g. 45,X, 47,XXX, 47,XXY and 47,XYY), other chromosomal polysomies, i.e. tetrasomy and pentasomy (including but not limited to XXXX, XXXXX, XXXXY and XYYYY), and deletions and/or duplications of segments of any one or more of these chromosomes.
The method is a statistical approach that is implemented on one or more processors and that takes into account the accrued variability stemming from process-related, interchromosomal (intra-run) and inter-sequencing (inter-run) variability. The methods are applicable to determining the CNV of any fetal aneuploidy, and CNV known or suspected to be associated with a variety of medical conditions.
Unless otherwise indicated, the practice of the present invention involves conventional techniques and apparatus commonly used in molecular biology, microbiology, protein purification, protein engineering, protein and DNA sequencing, and recombinant DNA fields, which are within the skill of the art. Such techniques and apparatus are known to those of skill in the art and are described in numerous texts and reference works (see, e.g., Sambrook et al., "Molecular Cloning: A Laboratory Manual", Third Edition (Cold Spring Harbor), [2001]; and Ausubel et al., "Current Protocols in Molecular Biology" [1987]).
Numeric ranges are inclusive of the numbers defining the range. It is intended that every maximum numerical limitation given throughout this specification includes every lower numerical limitation, as if such lower numerical limitations were expressly written herein. Every minimum numerical limitation given throughout this specification will include every higher numerical limitation, as if such higher numerical limitations were expressly written herein. Every numerical range given throughout this specification will include every narrower numerical range that falls within such broader numerical range, as if such narrower numerical ranges were all expressly written herein.
The headings provided herein are not intended to limit the disclosure.
Unless defined otherwise herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Various scientific dictionaries that include the terms included herein are well known and available to those in the art. Although any methods and materials similar or equivalent to those described herein find use in the practice or testing of the embodiments disclosed herein, only some preferred methods and materials are described.
The terms defined immediately below are more fully described by reference to the specification as a whole. It is to be understood that this disclosure is not limited to the particular methodology, protocols and reagents described, as these may vary depending upon the context in which they are used by those of skill in the art.
Definition
As used herein, the singular terms "a", "an" and "the" include the plural reference unless the context clearly indicates otherwise. Unless otherwise indicated, nucleic acids are written left to right in the 5' to 3' orientation and amino acid sequences are written left to right in the amino to carboxy orientation, respectively.
The term "assessing", when used herein in the context of analyzing a nucleic acid sample for CNV, refers to characterizing the status of a chromosomal or segment aneuploidy by one of three types of calls: "normal" or "unaffected", "affected", and "no-call". Thresholds for calling normal and affected are typically set. A parameter related to aneuploidy is measured in a sample and the measured value is compared to the thresholds. For duplication-type aneuploidies, a call of affected is made if a chromosome or segment dose (or other measured value of sequence content) is above a defined threshold set for affected samples. For such aneuploidies, a call of normal is made if the chromosome or segment dose is below a threshold set for normal samples. By contrast, for deletion-type aneuploidies, a call of affected is made if a chromosome or segment dose is below a defined threshold for affected samples, and a call of normal is made if the chromosome or segment dose is above a threshold set for normal samples. For example, in the presence of trisomy the "normal" call is determined by the value of a parameter, e.g. a test chromosome dose, that is below a user-defined threshold of reliability, and the "affected" call is determined by a parameter, e.g. a test chromosome dose, that is above a user-defined threshold of reliability. A "no-call" result is determined by a parameter, e.g. a test chromosome dose, that lies between the thresholds for making a "normal" or an "affected" call. The terms "no-call" and "unclassified" are used interchangeably.
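The three-way call described above can be sketched in a few lines. This is a minimal illustration only: the function name `assess`, the `Call` enumeration and the numeric thresholds in the example are hypothetical and are not taken from the specification, which leaves the thresholds to the user.

```python
from enum import Enum


class Call(Enum):
    NORMAL = "normal"
    AFFECTED = "affected"
    NO_CALL = "no-call"


def assess(dose, normal_threshold, affected_threshold, aneuploidy_type="duplication"):
    """Return a three-way call for a chromosome or segment dose.

    Hypothetical helper. For duplication-type aneuploidies the dose is
    affected above `affected_threshold` and normal below `normal_threshold`;
    for deletion-type aneuploidies the comparisons are reversed.
    Anything that satisfies neither comparison is a no-call.
    """
    if aneuploidy_type == "duplication":
        if dose > affected_threshold:
            return Call.AFFECTED
        if dose < normal_threshold:
            return Call.NORMAL
    elif aneuploidy_type == "deletion":
        if dose < affected_threshold:
            return Call.AFFECTED
        if dose > normal_threshold:
            return Call.NORMAL
    else:
        raise ValueError("aneuploidy_type must be 'duplication' or 'deletion'")
    return Call.NO_CALL


# Illustrative thresholds only (assumed values, not from the specification).
print(assess(4.5, normal_threshold=2.5, affected_threshold=4.0))   # Call.AFFECTED
print(assess(3.0, normal_threshold=2.5, affected_threshold=4.0))   # Call.NO_CALL
print(assess(-5.0, normal_threshold=-2.5, affected_threshold=-4.0,
             aneuploidy_type="deletion"))                          # Call.AFFECTED
```

Keeping two separate thresholds, rather than a single cutoff, is what produces the "no-call" zone between them.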
The term "copy number variation" herein refers to variation in the number of copies of a nucleic acid sequence present in a test sample in comparison with the copy number of the nucleic acid sequence present in a qualified sample. In certain embodiments, the nucleic acid sequence is 1 kb or larger. In some cases, the nucleic acid sequence is a whole chromosome or a significant portion thereof. A "copy number variant" refers to a nucleic acid sequence in which copy number differences are found by comparing a sequence of interest in a test sample with an expected level of the sequence of interest. For example, the level of the sequence of interest in the test sample is compared with the level present in a qualified sample. Copy number variants/variations include deletions (including microdeletions), insertions (including microinsertions), duplications, multiplications, inversions, translocations and complex multi-site variants. CNV encompass chromosomal aneuploidies and partial aneuploidies.
The term "aneuploidy" herein refers to an imbalance of genetic material caused by the loss or gain of a whole chromosome or of part of a chromosome.
The terms "chromosomal aneuploidy" and "complete chromosomal aneuploidy" herein refer to an imbalance of genetic material caused by the loss or gain of a whole chromosome, and include germline aneuploidy and mosaic aneuploidy.
The terms "partial aneuploidy" and "partial chromosomal aneuploidy" herein refer to an imbalance of genetic material caused by the loss or gain of part of a chromosome (e.g. partial monosomy and partial trisomy), and encompass imbalances resulting from translocations, deletions and insertions.
The term "aneuploid sample" herein refers to a sample indicative of a subject whose chromosomal content is not euploid, i.e. the sample is indicative of a subject with an abnormal copy number of chromosomes or of portions of chromosomes.
The term "aneuploid chromosome" herein refers to a chromosome that is known or determined to be present in a sample in an abnormal copy number.
The term "plurality" herein refers to more than one element. For example, the term is used herein in reference to a number of nucleic acid molecules or sequence tags that is sufficient to identify significant differences in copy number variations (e.g. chromosome doses) between test samples and qualified samples using the methods disclosed herein. In some embodiments, at least about 3 × 10^6 sequence tags, at least about 5 × 10^6 sequence tags, at least about 8 × 10^6 sequence tags, at least about 10 × 10^6 sequence tags, at least about 15 × 10^6 sequence tags, at least about 20 × 10^6 sequence tags, at least about 30 × 10^6 sequence tags, at least about 40 × 10^6 sequence tags, or at least about 50 × 10^6 sequence tags comprising reads of between about 20 and 40 bp are obtained for each test sample.
The terms "polynucleotide", "nucleic acid" and "nucleic acid molecule" are used interchangeably and refer to a covalently linked sequence of nucleotides (i.e. ribonucleotides for RNA and deoxyribonucleotides for DNA) in which the 3' position of the pentose of one nucleotide is joined by a phosphodiester group to the 5' position of the pentose of the next. This includes sequences of any form of nucleic acid, including but not limited to RNA and DNA molecules such as cfDNA molecules. The term "polynucleotide" includes, without limitation, single- and double-stranded polynucleotides.
The term "portion" is used herein in reference to the amount of sequence information of fetal and maternal nucleic acid molecules in a biological sample that in sum amounts to less than the sequence information of one human genome.
The term "test sample" herein refers to a sample, typically derived from a biological fluid, cell, tissue, organ or organism, comprising a nucleic acid or a mixture of nucleic acids that includes at least one nucleic acid sequence to be screened for copy number variation. In certain embodiments, the sample comprises at least one nucleic acid sequence whose copy number is suspected of having undergone variation. Such samples include, but are not limited to, sputum/oral fluid, amniotic fluid, blood, a blood fraction, fine needle biopsy samples (e.g. surgical biopsy, fine needle biopsy, etc.), urine, peritoneal fluid, pleural fluid and the like. Although the sample is often taken from a human subject (e.g. a patient), the assays can be used to determine copy number variations (CNV) in samples from any mammal, including but not limited to dogs, cats, horses, goats, sheep, cattle, pigs, etc. The sample may be used directly as obtained from the biological source or following a pretreatment to modify the character of the sample. For example, such pretreatment may include preparing plasma from blood, diluting viscous fluids and so forth. Methods of pretreatment may also include, but are not limited to, filtration, precipitation, dilution, distillation, mixing, centrifugation, freezing, lyophilization, concentration, amplification, nucleic acid fragmentation, inactivation of interfering components, the addition of reagents, lysing, etc. If such methods of pretreatment are employed with respect to the sample, they are typically such that the nucleic acid(s) of interest remain in the test sample, preferably at a concentration proportional to that in an untreated test sample (e.g. a sample that is not subjected to any such pretreatment method). Such "treated" or "processed" samples are still considered to be biological "test" samples with respect to the methods described herein.
The term "qualified sample" herein refers to a sample comprising a mixture of nucleic acids that are present in a known copy number, to which the nucleic acids in a test sample are to be compared, and that is normal, i.e. not aneuploid, for the sequence of interest. In certain embodiments, qualified samples are used to identify one or more normalizing chromosomes or segments for a chromosome under consideration. For example, qualified samples may be used to identify a normalizing chromosome for chromosome 21. In such a case, the qualified sample is a sample that is not a trisomy 21 sample. Qualified samples may also be employed in determining the thresholds for calling affected samples.
The term "training set" herein refers to a set of samples that can comprise affected and unaffected samples and that are used to develop a model for analyzing test samples. The unaffected samples in a training set may be used as the qualified samples to identify normalizing sequences, e.g. normalizing chromosomes, and the chromosome doses of the unaffected samples are used to set the thresholds for each of the sequences of interest, e.g. chromosomes. The affected samples in a training set can be used to verify that affected test samples can be easily differentiated from unaffected samples.
The term "qualified nucleic acid" is used interchangeably with "qualified sequence", which is a sequence against which the amount of a test sequence or test nucleic acid is compared. A qualified sequence is one present in a biological sample, preferably at a known representation, i.e. the amount of the qualified sequence is known. Generally, a qualified sequence is the sequence present in a "qualified sample". A "qualified sequence of interest" is a qualified sequence for which the amount is known in a qualified sample, and is a sequence that is associated with a difference in sequence representation in an individual with a medical condition.
The term "sequence of interest" herein refers to a nucleic acid sequence that is associated with a difference in sequence representation in healthy versus diseased individuals. A sequence of interest can be a sequence on a chromosome that is misrepresented, i.e. over- or under-represented, in a disease or genetic condition. A sequence of interest may be a portion of a chromosome (i.e. a chromosome segment) or a whole chromosome. For example, a sequence of interest can be a chromosome that is over-represented in an aneuploidy condition, or a gene encoding a tumor suppressor that is under-represented in a cancer. Sequences of interest include sequences that are over- or under-represented in the total population, or in a subpopulation, of the cells of a subject. A "qualified sequence of interest" is a sequence of interest in a qualified sample. A "test sequence of interest" is a sequence of interest in a test sample.
The term "normalizing sequence" herein refers to a sequence that is used to normalize the number of sequence tags mapped to a sequence of interest associated with the normalizing sequence. In some embodiments, the normalizing sequence displays a variability in the number of sequence tags that are mapped to it among samples and sequencing runs that approximates the variability of the sequence of interest for which it is used as a normalizing parameter, and it can differentiate an affected sample from one or more unaffected samples. In some implementations, the normalizing sequence best or effectively differentiates an affected sample from one or more unaffected samples when compared to other potential normalizing sequences such as other chromosomes. A "normalizing chromosome" or "normalizing chromosome sequence" is an example of a "normalizing sequence". A "normalizing chromosome sequence" can be composed of a single chromosome or of a group of chromosomes. A "normalizing segment" is another example of a "normalizing sequence". A "normalizing segment sequence" can be composed of a single segment of a chromosome, or it can be composed of two or more segments of the same or of different chromosomes. In certain embodiments, a normalizing sequence is intended to normalize for variability such as process-related variability, interchromosomal (intra-run) variability and inter-sequencing (inter-run) variability.
The term "differentiability" herein refers to a characteristic of a normalizing chromosome that enables one or more unaffected (i.e. normal) samples to be distinguished from one or more affected (i.e. aneuploid) samples.
The term "sequence dose" herein refers to a parameter that relates the number of sequence tags identified for a sequence of interest to the number of sequence tags identified for the normalizing sequence. In some cases, the sequence dose is the ratio of the number of sequence tags identified for a sequence of interest to the number of sequence tags identified for the normalizing sequence. In some cases, the sequence dose refers to a parameter that relates the sequence tag density of a sequence of interest to the tag density of a normalizing sequence. A "test sequence dose" is a parameter that relates the sequence tag density of a sequence of interest, e.g. chromosome 21, to that of a normalizing sequence, e.g. chromosome 9, determined in a test sample. Similarly, a "qualified sequence dose" is a parameter that relates the sequence tag density of a sequence of interest to that of a normalizing sequence determined in a qualified sample.
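The ratio form of the sequence dose can be illustrated as follows. This is a sketch under stated assumptions: the function name `sequence_dose` and the tag counts are hypothetical, and summing the tag counts when the normalizing sequence is a group of chromosomes is an assumption made here for illustration rather than a rule stated in this passage.

```python
def sequence_dose(tag_counts, sequence_of_interest, normalizing_sequences):
    """Ratio of tags mapped to the sequence of interest to tags mapped to
    the normalizing sequence (hypothetical helper).

    `normalizing_sequences` is a list so that a group of chromosomes can be
    used; their tag counts are summed (an assumption of this sketch).
    """
    numerator = tag_counts[sequence_of_interest]
    denominator = sum(tag_counts[name] for name in normalizing_sequences)
    if denominator == 0:
        raise ValueError("no tags mapped to the normalizing sequence")
    return numerator / denominator


# Made-up tag counts for a single test sample.
counts = {"chr21": 150_000, "chr9": 500_000}
test_dose = sequence_dose(counts, "chr21", ["chr9"])
print(test_dose)  # 0.3
```

The same function applied to a qualified sample would give the corresponding qualified sequence dose.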
The term "sequence tag density" herein refers to the number of sequence reads that are mapped to a reference genome sequence. For example, the sequence tag density for chromosome 21 is the number of sequence reads generated by the sequencing method that are mapped to chromosome 21 of the reference genome. The term "sequence tag density ratio" herein refers to the ratio of the number of sequence tags that are mapped to a chromosome of the reference genome, e.g. chromosome 21, to the length of that reference genome chromosome.
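As a small worked example of the definition above, the sequence tag density ratio is simply a tag count divided by a chromosome length. The function name and the numbers below are hypothetical round figures chosen for illustration, not real chromosome lengths.

```python
def sequence_tag_density_ratio(tag_count, chromosome_length_bp):
    """Tags mapped to a reference chromosome divided by its length in bp
    (hypothetical helper)."""
    if chromosome_length_bp <= 0:
        raise ValueError("chromosome length must be positive")
    return tag_count / chromosome_length_bp


# 200,000 tags on a chromosome assumed to be 40,000,000 bp long.
ratio = sequence_tag_density_ratio(200_000, 40_000_000)
print(ratio)  # 0.005 tags per base pair
```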
The term "next generation sequencing (NGS)" herein refers to sequencing methods that allow massively parallel sequencing of clonally amplified molecules and of single nucleic acid molecules. Non-limiting examples of NGS include sequencing-by-synthesis using reversible dye terminators, and sequencing-by-ligation.
The term "parameter" herein refers to a numerical value that characterizes a physical property. Frequently, a parameter numerically characterizes a quantitative data set and/or a numerical relationship between quantitative data sets. For example, a ratio (or a function of a ratio) between the number of sequence tags mapped to a chromosome and the length of the chromosome to which the tags are mapped is a parameter.
The terms "threshold value" and "qualified threshold value" herein refer to any number that is used as a cutoff to characterize a sample, such as a test sample containing a nucleic acid from an organism suspected of having a medical condition. The threshold may be compared to a parameter value to determine whether the sample giving rise to that parameter value suggests that the organism has the medical condition. In certain embodiments, a qualified threshold value is calculated using a qualifying data set and serves as a limit for diagnosing a copy number variation, e.g. an aneuploidy, in an organism. If a threshold is exceeded by results obtained from the methods disclosed herein, a subject can be diagnosed with a copy number variation, e.g. trisomy 21. Appropriate threshold values for the methods described herein can be identified by analyzing normalized values (e.g. chromosome doses, NCVs or NSVs) calculated for a training set of samples. Threshold values can be identified using the qualified (i.e. unaffected) samples in a training set that comprises both qualified (i.e. unaffected) samples and affected samples. The samples in the training set known to have chromosomal aneuploidies (i.e. the affected samples) can be used to confirm that the chosen thresholds are useful in differentiating affected from unaffected samples in a test set (see the Examples herein). The choice of a threshold depends on the level of confidence that the user wishes to have in making the classification. In some embodiments, the training set used to identify appropriate threshold values comprises at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 2000, at least 3000, at least 4000, or more qualified samples. It may be advantageous to use larger sets of qualified samples to improve the diagnostic utility of the threshold values.
The term "normalized value" herein refers to a numerical value that relates the number of sequence tags identified for the sequence of interest (e.g. a chromosome or chromosome segment) to the number of sequence tags identified for the normalizing sequence (e.g. a normalizing chromosome or normalizing chromosome segment). For example, a "normalized value" can be a chromosome dose as described elsewhere herein, or it can be an NCV (normalized chromosome value) as described elsewhere herein, or it can be an NSV (normalized segment value) as described elsewhere herein.
The term "read" refers to a sequence read from a portion of a nucleic acid sample. Typically, though not necessarily, a read represents a short sequence of contiguous base pairs in the sample. The read may be represented symbolically by the base pair sequence (in ATCG) of the sample portion. It may be stored in a memory device and processed as appropriate to determine whether it matches a reference sequence or meets other criteria. A read may be obtained directly from a sequencing apparatus or indirectly from stored sequence information concerning the sample. In some cases, the term "read" refers to a DNA sequence of sufficient length (e.g. at least about 30 bp) that it can be used to identify a larger sequence or region, e.g. that it can be aligned and specifically assigned to a chromosome, a genomic region or a gene.
The term "sequence tag" is used herein interchangeably with the term "mapped sequence tag" to refer to a sequence read that has been specifically assigned, i.e. mapped, to a larger sequence, e.g. a reference genome, by alignment. Mapped sequence tags are uniquely mapped to a reference genome, i.e. they are assigned to a single location in the reference genome. Tags may be provided as data structures or as other assemblages of data. In certain embodiments, a tag contains a read sequence and associated information for that read, such as the location of the sequence in the genome, e.g. the position on a chromosome. In certain embodiments, the location is specified for the positive strand orientation. A tag may be defined to allow a limited amount of mismatch when aligned to a reference genome. Tags that can be mapped to more than one location on a reference genome, i.e. tags that do not map uniquely, may not be included in the analysis.
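The exclusion of non-uniquely mapped tags can be sketched as a simple filter followed by a per-chromosome count. The input format (tuples of read identifier, chromosome and position) and the function name are hypothetical; real aligner output is richer, and this is only an illustration of the rule stated above.

```python
from collections import Counter, defaultdict


def count_unique_tags(alignments):
    """Count tags per chromosome, keeping only reads with one mapped location.

    `alignments` is an iterable of (read_id, chromosome, position) tuples,
    one tuple per reported mapping location (assumed format).
    """
    locations_by_read = defaultdict(set)
    for read_id, chromosome, position in alignments:
        locations_by_read[read_id].add((chromosome, position))

    counts = Counter()
    for locations in locations_by_read.values():
        if len(locations) == 1:  # uniquely mapped: keep as a sequence tag
            (chromosome, _position), = locations
            counts[chromosome] += 1
    return counts


alignments = [
    ("read1", "chr21", 100),
    ("read2", "chr21", 250),
    ("read2", "chr13", 900),  # read2 maps to two locations, so it is dropped
    ("read3", "chr9", 42),
]
print(dict(count_unique_tags(alignments)))
```

The resulting per-chromosome counts are the kind of input from which tag densities and sequence doses are computed.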
" compare (aligned, alignment or aligning) " as used herein, term refer to by reading or label with Reference sequences are compared and thereby determine that whether the reference sequences include the process of the reading sequence.If the reference sequences Include the reading, then the reading maps to reference sequences, or in certain embodiments, is mapped in reference sequences Particular location.In some cases, compare simply inform reading whether be with specific reference to sequence member (i.e. reading exist and also It is to be not present in reference sequences).For example, reading is compared with the reference sequences of human chromosome 13, this will be informed Reading whether there is in the reference sequences of chromosome 13.The test of set member's identity can be determined by providing the instrument of this information Device.In some cases, the position that reading or label are mapped in instruction reference sequences in addition is compared.For example, if ginseng It is whole mankind's genome sequence to examine sequence, then comparison may indicate that reading is present on chromosome 13, and may further indicate that Reading is on the specific stock of chromosome 13 and/or site.
Aligned reads or tags are one or more sequences that are identified as a match, in terms of the order of their nucleic acid molecules, to a known sequence from a reference genome. Alignment can be done manually, although it is typically implemented by a computer algorithm, as it would be impossible to align reads manually in a reasonable time period for implementing the methods disclosed herein. One example of an algorithm for aligning sequences is the Efficient Local Alignment of Nucleotide Data (ELAND) computer program distributed as part of the Illumina Genomics Analysis pipeline. Alternatively, a Bloom filter or a similar set membership tester may be employed to align reads to reference genomes. See U.S. Patent Application No. 61/552,374, filed October 27, 2011, which is incorporated herein by reference in its entirety. The matching of a sequence read in aligning can be a 100% sequence match or less than 100% (a non-perfect match).
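To make the idea of a set membership tester concrete, the following is a minimal Bloom filter sketch. It is a generic textbook construction, not the method of the referenced application: the class, the use of SHA-256 with a seed prefix as the hash family, the filter size and the toy reference string are all assumptions made for illustration. A Bloom filter can report false positives but never false negatives, so a "present" answer is probabilistic while an "absent" answer is definite.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over strings (illustrative sketch)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def index_reference(reference, read_length, bloom):
    """Add every substring of `read_length` bases of the reference to the filter."""
    for start in range(len(reference) - read_length + 1):
        bloom.add(reference[start:start + read_length])


# Toy reference sequence and read length (made-up values).
reference = "ACGTACGGTCA"
bloom = BloomFilter(num_bits=1024, num_hashes=3)
index_reference(reference, 4, bloom)
print("ACGG" in bloom)  # True: ACGG occurs in the reference
```

This sketch only tests exact membership of fixed-length substrings; it does not report the mapped location or tolerate mismatches, both of which a full aligner would handle.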
As used herein, the term "reference genome" or "reference sequence" refers to any particular known genome sequence, whether partial or complete, of any organism or virus that may be used to reference identified sequences from a subject. For example, a reference genome used for human subjects, as well as for many other organisms, is found at the National Center for Biotechnology Information at www.ncbi.nlm.nih.gov. A "genome" refers to the complete genetic information of an organism or virus, expressed in nucleic acid sequences.
In various embodiments, the reference sequence is significantly larger than the reads that are aligned to it. For example, it may be at least about 100 times larger, or at least about 1000 times larger, or at least about 10,000 times larger, or at least about 10^5 times larger, or at least about 10^6 times larger, or at least about 10^7 times larger.
In one example, the reference sequence is that of a full-length human genome. Such sequences may be referred to as genomic reference sequences. In another example, the reference sequence is limited to a specific human chromosome, such as chromosome 13. Such sequences may be referred to as chromosome reference sequences. Other examples of reference sequences include the genomes of other species, as well as the chromosomes, sub-chromosomal regions (such as strands), etc. of any species.
In various embodiments, the reference sequence is a consensus sequence or other combination derived from multiple individuals. However, in certain applications, the reference sequence may be taken from a particular individual.
The term "artificial target sequences genome" herein refers to a group of known sequences that encompass alleles of known polymorphic sites. For example, a "SNP reference genome" is an artificial target sequences genome comprising a group of sequences that encompass alleles of known SNPs.
The term "clinically relevant sequence" herein refers to a nucleic acid sequence that is known or suspected to be associated with or implicated in a genetic or disease condition. Determining the presence or absence of a clinically relevant sequence can be useful in determining or confirming a diagnosis of a medical condition, or in providing a prognosis for the development of a disease.
The term "derived", when used in the context of a nucleic acid or a mixture of nucleic acids, herein refers to the means by which the nucleic acid or nucleic acids are obtained from the source from which they originate. For example, in one embodiment, a mixture of nucleic acids derived from two different genomes means that the nucleic acids, e.g., cfDNA, were naturally released by cells through naturally occurring processes such as necrosis or apoptosis. In another embodiment, a mixture of nucleic acids derived from two different genomes means that the nucleic acids were extracted from two different types of cells from a subject.
The term "patient sample" herein refers to a biological sample obtained from a patient, i.e., a recipient of medical attention, care, or treatment. The patient sample can be any of the samples described herein. In certain embodiments, the patient sample is obtained by a non-invasive procedure, e.g., a peripheral blood sample or a stool sample. The methods described herein need not be limited to humans. Thus, various veterinary applications are contemplated, in which case the patient sample may be a sample from a non-human mammal (e.g., a cat, a pig, a horse, a cow, and the like).
The term "mixed sample" herein refers to a sample containing a mixture of nucleic acids derived from different genomes.
The term "maternal sample" herein refers to a biological sample obtained from a pregnant subject, e.g., a woman.
The term "biological fluid" herein refers to a liquid taken from a biological source and includes, for example, blood, serum, plasma, sputum, lavage fluid, cerebrospinal fluid, urine, semen, sweat, tears, saliva, and the like. As used herein, the terms "blood", "plasma", and "serum" expressly encompass fractions or processed portions thereof. Similarly, where a sample is taken from a biopsy, swab, smear, or the like, the "sample" expressly encompasses a processed fraction or portion derived from the biopsy, swab, smear, or the like.
The terms "maternal nucleic acids" and "fetal nucleic acids" herein refer to the nucleic acids of a pregnant female subject and the nucleic acids of the fetus carried by the pregnant female, respectively.
As used herein, the term "corresponding to" sometimes refers to a nucleic acid sequence, e.g., a gene or a chromosome, that is present in the genomes of different subjects and that does not necessarily have the same sequence in all genomes, but that serves to provide the identity, rather than the genetic information, of a sequence of interest, e.g., a gene or a chromosome.
As used herein, the term "substantially cell free" encompasses preparations of a desired sample from which the cellular components normally associated with it have been removed. For example, a plasma sample is rendered substantially cell free by removing the blood cells, e.g., red blood cells, that are normally associated with it. In some embodiments, substantially cell free samples are processed to remove cells that would otherwise contribute to the genetic material that is to be tested for a CNV.
As used herein, the term "fetal fraction" refers to the fraction of fetal nucleic acids present in a sample comprising fetal and maternal nucleic acids. Fetal fraction is often used to characterize the cfDNA in a mother's blood.
As used herein, the term "chromosome" refers to the heredity-bearing gene carrier of a living cell, which is derived from chromatin and comprises DNA and protein components (especially histones). The conventional, internationally recognized numbering system for individual human genome chromosomes is used herein.
As used herein, the term "polynucleotide length" refers to the absolute number of nucleic acid molecules (nucleotides) in a sequence or in a region of a reference genome. The term "chromosome length" refers to the known length of a chromosome in base pairs, e.g., as provided in the NCBI36/hg18 assembly of the human chromosomes found on the World Wide Web at genome.ucsc.edu/cgi-bin/hgTracksHgsid=167155613&chromInfoPage=.
The term "subject" herein refers to a human subject as well as a non-human subject such as a mammal, an invertebrate, a vertebrate, a fungus, a yeast, a bacterium, or a virus. Although the examples herein concern humans and the language is primarily directed to human concerns, the concepts disclosed herein are applicable to genomes from any plant or animal, and are useful in the fields of veterinary medicine, animal husbandry, research laboratories, and the like.
The term "condition" herein refers to a "medical condition" as a broad term that includes all diseases and disorders, and that may also include [injuries] and normal health situations, such as pregnancy, that might affect a person's health, benefit from medical assistance, or have implications for medical treatment.
The term "complete", when used in reference to a chromosomal aneuploidy, herein refers to a gain or loss of an entire chromosome.
The term "partial", when used in reference to a chromosomal aneuploidy, herein refers to a gain or loss of a portion, i.e., a segment, of a chromosome.
The term "mosaic" herein denotes the presence of two populations of cells with different karyotypes in one individual who has developed from a single fertilized egg. Mosaicism may result from a mutation during development that is propagated to only a subset of the adult cells.
The term "non-mosaic" herein refers to an organism, e.g., a human fetus, composed of cells of one karyotype.
The term "using a chromosome", when used in reference to determining a chromosome dosage, herein refers to using the sequence information obtained for a chromosome, i.e., the number of sequence tags obtained for that chromosome.
The term "sensitivity" as used herein is equal to the number of true positives divided by the sum of true positives and false negatives.
The term "specificity" as used herein is equal to the number of true negatives divided by the sum of true negatives and false positives.
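The two ratios defined above can be written out as a short sketch. The function names are chosen for illustration; the second ratio (true negatives over true negatives plus false positives) is the true-negative rate, commonly called specificity.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """True positives divided by the sum of true positives and false negatives."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """True negatives divided by the sum of true negatives and false positives."""
    return true_negatives / (true_negatives + false_positives)
```

For example, a hypothetical test that detects 9 of 10 affected samples and correctly clears 95 of 100 unaffected samples has `sensitivity(9, 1)` and `specificity(95, 5)` as its two figures of merit.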
The term "hypodiploid" herein refers to a chromosome number that is one or more lower than the normal haploid number of chromosomes characteristic of the species.
A "polymorphic site" is a locus at which nucleotide sequence divergence occurs. The locus may be as small as one base pair. Illustrative markers have at least two alleles, each occurring at a frequency of greater than 1%, and more typically greater than 10% or 20%, of a selected population. A polymorphic site may be the site of a single nucleotide polymorphism (SNP), a small-scale multi-base deletion or insertion, a multi-nucleotide polymorphism (MNP), or a short tandem repeat (STR). The terms "polymorphic locus" and "polymorphic site" are used interchangeably herein.
A "polymorphic sequence" herein refers to a nucleic acid sequence, e.g., a DNA sequence, that comprises one or more polymorphic sites, e.g., one SNP or a tandem SNP. Polymorphic sequences according to the present technology can be used to specifically differentiate between maternal and non-maternal alleles in a maternal sample comprising a mixture of fetal and maternal nucleic acids.
As used herein, a "single nucleotide polymorphism" (SNP) occurs at a polymorphic site occupied by a single nucleotide, which is the site of variation between allelic sequences. The site is usually preceded and followed by highly conserved sequences of the allele (e.g., sequences that vary in fewer than 1/100 or 1/1000 members of the population). A SNP usually arises from the substitution of one nucleotide for another at the polymorphic site. A transition is the replacement of one purine by another purine or of one pyrimidine by another pyrimidine. A transversion is the replacement of a purine by a pyrimidine or of a pyrimidine by a purine. SNPs can also arise from a deletion or an insertion of a nucleotide relative to a reference allele. Single nucleotide polymorphisms (SNPs) are positions at which two alternative bases occur at appreciable frequency (>1%) in the human population, and they are the most common type of human genetic variation.
The term "tandem SNPs" herein refers to two or more SNPs that are present within a polymorphic target nucleic acid sequence.
As used herein, the term "short tandem repeat" or "STR" refers to a class of polymorphism that occurs when a pattern of two or more nucleotides is repeated and the repeated sequences are directly adjacent to each other. The pattern can range in length from 2 to 10 base pairs (bp) (for example, (CATG)n in a genomic region) and is typically located in a non-coding intron region. By examining several STR loci and counting how many repeats of a specific STR sequence there are at a given locus, it is possible to create a unique genetic profile of an individual.
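The counting of repeats at a locus can be illustrated with a minimal sketch. The function name `max_tandem_repeats` and the input sequences are hypothetical; the sketch simply finds the longest run of directly adjacent copies of a given motif in a sequence string, and is not a genotyping method.

```python
import re


def max_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the largest number of directly adjacent copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    # A lookahead is used so that runs starting at every position are examined,
    # including runs that overlap an earlier, shorter match.
    pattern = re.compile(f"(?=((?:{re.escape(motif)})+))")
    longest = 0
    for match in pattern.finditer(sequence):
        longest = max(longest, len(match.group(1)) // len(motif))
    return longest
```

Applied to a sequence containing (CATG)n, the function returns n for the longest uninterrupted run, and 0 when the motif is absent.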
As used herein, the term "miniSTR" refers to a tandem repeat of four or more base pairs that spans less than about 300 base pairs, less than about 250 base pairs, less than about 200 base pairs, less than about 150 base pairs, less than about 100 base pairs, less than about 50 base pairs, or less than about 25 base pairs. "miniSTRs" are STRs that can be amplified from cfDNA templates.
The terms "polymorphic target nucleic acid", "polymorphic sequence", "polymorphic target nucleic acid sequence", and "polymorphic nucleic acid" are used interchangeably herein to refer to a nucleic acid sequence, e.g., a DNA sequence, that comprises one or more polymorphic sites.
The term "plurality of polymorphic target nucleic acids" herein refers to a number of nucleic acid sequences each comprising at least one polymorphic site, e.g., one SNP, such that 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40 or more different polymorphic sites are amplified from the polymorphic target nucleic acids to identify and/or quantify the fetal alleles present in a maternal sample comprising fetal and maternal nucleic acids.
The term "enrich" herein refers to the process of amplifying polymorphic target nucleic acids contained in a portion of a maternal sample and combining the amplified product with the remainder of the maternal sample from which the portion was removed. For example, the remainder of the maternal sample can be the original maternal sample.
The term "original maternal sample" herein refers to a non-enriched biological sample obtained from a pregnant subject, e.g., a woman, that serves as the source from which a portion is removed to amplify polymorphic target nucleic acids. The "original sample" can be any sample obtained from a pregnant subject and the processed fractions thereof, e.g., a purified cfDNA sample extracted from a maternal plasma sample.
As used herein, the term "primer" refers to an isolated oligonucleotide that is capable of acting as a point of initiation of synthesis when placed under conditions that induce synthesis of a primer extension product complementary to a nucleic acid strand (i.e., in the presence of nucleotides and an inducing agent such as DNA polymerase, and at a suitable temperature and pH). For maximum efficiency in amplification, the primer is preferably single stranded, but it may alternatively be double stranded. If double stranded, the primer is first treated to separate its strands before being used to prepare extension products. The primer is preferably an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the inducing agent. The exact length of the primer will depend on many factors, including temperature, source of the primer, use of the method, and the parameters used for primer design.
The phrase "cause to be administered" refers to the actions taken by a medical professional (e.g., a physician), or by a person controlling or directing the medical care of a subject, that control and/or permit the administration of the agent(s)/compound(s) at issue to the subject. Causing to be administered can involve diagnosis and/or determination of an appropriate therapeutic or prophylactic regimen, and/or prescribing particular agent(s)/compound(s) for a subject. Such prescribing can include, for example, drafting a prescription form, annotating a medical record, and the like. Similarly, "cause to be performed", e.g., for a diagnostic procedure, refers to the actions taken by a medical professional (e.g., a physician), or by a person controlling or directing the medical care of a subject, that control and/or permit the performance of one or more diagnostic protocols on the subject.
Introduction
Disclosed herein are methods, apparatus, systems, and kits for determining copy number variations (CNV) of different sequences of interest in a test sample that comprises a mixture of nucleic acids derived from two different genomes and that are known or suspected to differ in the amount of one or more sequences of interest. Also provided are methods, apparatus, systems, and kits for determining the fraction contributed by each of the two genomes to the mixture of nucleic acids. Copy number variations determined by the methods and apparatus disclosed herein include gains or losses of entire chromosomes, alterations involving very large chromosomal segments that are microscopically visible, and an abundance of sub-microscopic copy number variations of DNA segments ranging in size from kilobases (kb) to megabases (Mb). In various embodiments, the methods comprise a machine-implemented statistical approach that accounts for the accrued variability stemming from process-related variability, interchromosomal variability, and inter-sequencing variability. The method is applicable to determining CNV of any fetal aneuploidy, as well as CNVs known or suspected to be associated with a variety of medical conditions. CNV that can be determined according to the present method include trisomies and monosomies of any one or more of chromosomes 1 to 22, X, and Y, other chromosomal polysomies, and deletions and/or duplications of segments of any one or more of the chromosomes, all of which can be detected by sequencing the nucleic acids of a test sample only once. Any aneuploidy can be determined from sequencing information obtained by sequencing the nucleic acids of a test sample only once.
CNV in the human genome significantly influence human diversity and predisposition to disease (Redon et al., Nature 23:444-454 [2006]; Shaikh et al., Genome Res 19:1682-1690 [2009]). CNVs are known to contribute to genetic disease through different mechanisms, resulting in most cases in either an imbalance of gene dosage or gene disruption. In addition to their direct correlation with genetic disorders, CNVs are known to mediate phenotypic changes that can be deleterious. Recently, several studies have reported an increased burden of rare or de novo CNVs in complex disorders such as autism, ADHD, and schizophrenia as compared to normal controls, highlighting the potential pathogenicity of rare or unique CNVs (Sebat et al., 316:445-449 [2007]; Walsh et al., Science 320:539-543 [2008]). CNV arise from genomic rearrangements, primarily owing to deletion, duplication, insertion, and unbalanced translocation events.
The methods and apparatus described herein may employ next generation sequencing (NGS) technology, which performs massively parallel sequencing. In certain embodiments, clonally amplified DNA templates or single DNA molecules are sequenced in a massively parallel fashion within a flow cell (e.g., as described in Volkerding et al., Clin Chem 55:641-658 [2009]; Metzker M, Nature Rev 11:31-46 [2010]). In addition to high-throughput sequence information, NGS provides quantitative information, in that each sequence read is a countable "sequence tag" representing an individual clonal DNA template or a single DNA molecule. NGS sequencing technologies include pyrosequencing, sequencing-by-synthesis with reversible dye terminators, sequencing by oligonucleotide probe ligation, and ion semiconductor sequencing. DNA from individual samples can be sequenced individually (i.e., singleplex sequencing), or DNA from multiple samples can be pooled and sequenced as indexed genomic molecules (i.e., multiplex sequencing) in a single sequencing run, to generate up to several hundred million reads of DNA sequence. Examples of sequencing technologies that can be used to obtain the sequence information according to the present method are described below.
In some embodiments, the methods and apparatus disclosed herein may employ some or all of the operations in the following sequence: obtain a nucleic acid test sample from a patient (typically by a non-invasive procedure); process the test sample in preparation for sequencing; sequence the nucleic acids from the test sample to produce numerous reads (e.g., at least 10,000); align the reads to portions of a reference sequence/genome and determine the amount of DNA (e.g., the number of reads) that maps to defined portions of the reference sequence (e.g., to defined chromosomes or chromosome segments); calculate a dosage for one or more of the defined portions by normalizing the amount of DNA mapping to the defined portions with the amount of DNA mapping to one or more normalization chromosomes or chromosome segments selected for the defined portion; determine whether the dosage indicates that the defined portion is "affected" (e.g., aneuploid or mosaic); report the determination and optionally convert it to a diagnosis; and use the diagnosis or determination to develop a plan of treatment, monitoring, or further testing for the patient.
Determining normalization sequences in qualified samples: normalization chromosome sequences and normalization segment sequences
Normalization sequences are identified using sequence information from a set of qualified samples obtained from subjects known to comprise cells having a normal copy number for any one sequence of interest, e.g., a chromosome or a segment thereof. The determination of normalization sequences is outlined in steps 110, 120, 130, 140, and 145 of the embodiment of the method depicted in Fig. 1. The sequence information obtained from the qualified samples is used to make a statistically meaningful identification of chromosomal aneuploidies in test samples (step 165 of Fig. 1 and the Examples).
Fig. 1 provides a flow chart 100 of one embodiment for determining a CNV of a sequence of interest, e.g., a chromosome or a segment thereof, in a biological sample. In some embodiments, the biological sample is obtained from a subject and comprises a mixture of nucleic acids contributed by different genomes. The different genomes can be contributed to the sample by two individuals, e.g., by a fetus and the mother carrying the fetus. Alternatively, the genomes can be contributed to the sample by aneuploid cancer cells and normal euploid cells from the same subject, e.g., a plasma sample from a cancer patient.
Apart from analyzing a patient's test sample, one or more normalization chromosomes or one or more normalization chromosome segments are selected for each possible chromosome of interest. The normalization chromosomes or segments are identified asynchronously from the normal testing of patient samples, which may take place in a clinical setting. In other words, the normalization chromosomes or segments are identified before patient samples are tested. The associations between normalization chromosomes or segments and chromosomes or segments of interest are stored for use during testing. As explained below, such associations are typically maintained over periods of time that span the testing of many samples. The following discussion concerns embodiments for selecting normalization chromosomes or chromosome segments for individual chromosomes or segments of interest.
A set of qualified samples is obtained to identify qualified normalization sequences and to provide variance values for use in determining a statistically meaningful identification of CNV in test samples. In step 110, a plurality of biological qualified samples is obtained from a plurality of subjects known to comprise cells having a normal copy number for any one sequence of interest. In one embodiment, the qualified samples are obtained from mothers pregnant with a fetus that has been confirmed by cytogenetic means to have a normal copy number of chromosomes. The biological qualified sample may be a biological fluid, e.g., plasma, or any suitable sample as described below. In some embodiments, the qualified sample contains a mixture of nucleic acid molecules, e.g., cfDNA molecules. In some embodiments, the qualified sample is a maternal plasma sample that contains a mixture of fetal and maternal cfDNA molecules. Sequence information for normalization chromosomes and/or segments thereof is obtained by sequencing at least a portion of the nucleic acids, e.g., fetal and maternal nucleic acids, using any known sequencing method. Preferably, any one of the next generation sequencing (NGS) methods described elsewhere herein is used to sequence the fetal and maternal nucleic acids as single or clonally amplified molecules. In various embodiments, the qualified samples are processed as described below prior to and during sequencing. The samples may be processed using the apparatus, systems, and kits disclosed herein.
In step 120, at least a portion of each of the qualified nucleic acids contained in the qualified samples is sequenced to generate millions of sequence reads, e.g., 36bp reads, which are aligned to a reference genome, e.g., hg18. In some embodiments, the sequence reads comprise about 20bp, about 25bp, about 30bp, about 35bp, about 40bp, about 45bp, about 50bp, about 55bp, about 60bp, about 65bp, about 70bp, about 75bp, about 80bp, about 85bp, about 90bp, about 95bp, about 100bp, about 110bp, about 120bp, about 130bp, about 140bp, about 150bp, about 200bp, about 250bp, about 300bp, about 350bp, about 400bp, about 450bp, or about 500bp. It is expected that technological advances will enable single-end reads of greater than 500bp, which in turn will enable reads of greater than about 1000bp when paired-end reads are generated. In one embodiment, the mapped sequence reads comprise 36bp. In another embodiment, the mapped sequence reads comprise 25bp. The sequence reads are aligned to a reference genome, and the reads that map uniquely to the reference genome are known as sequence tags. In one embodiment, at least about 3 x 10^6 qualified sequence tags, at least about 5 x 10^6 qualified sequence tags, at least about 8 x 10^6 qualified sequence tags, at least about 10 x 10^6 qualified sequence tags, at least about 15 x 10^6 qualified sequence tags, at least about 20 x 10^6 qualified sequence tags, at least about 30 x 10^6 qualified sequence tags, at least about 40 x 10^6 qualified sequence tags, or at least about 50 x 10^6 qualified sequence tags comprising reads of between 20 and 40bp are obtained from reads that map uniquely to the reference genome.
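The distinction between aligned reads and uniquely mapped sequence tags can be sketched as follows. The input format, a list of (read_id, chromosome) pairs with one pair per reported alignment, and the function name `unique_sequence_tags` are assumptions made for illustration and do not reflect the output of any particular aligner.

```python
from collections import Counter


def unique_sequence_tags(alignments):
    """Keep only reads that were aligned to exactly one location.

    alignments: iterable of (read_id, chromosome) pairs, one per alignment.
    Returns the (read_id, chromosome) pairs of uniquely mapped reads.
    """
    alignments = list(alignments)
    hits_per_read = Counter(read_id for read_id, _ in alignments)
    return [(read_id, chrom) for read_id, chrom in alignments
            if hits_per_read[read_id] == 1]
```

In this sketch a read reported at two locations is discarded entirely, so that each remaining tag contributes a single count to one chromosome.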
In step 130, all the tags obtained from sequencing the nucleic acids in the qualified samples are counted to determine a qualified sequence tag density. In one embodiment, the sequence tag density is determined as the number of qualified sequence tags mapped to the sequence of interest in the reference genome. In another embodiment, the qualified sequence tag density is determined as the number of qualified sequence tags mapped to the sequence of interest, normalized to the length of the qualified sequence of interest to which they are mapped. Sequence tag densities determined as a ratio of the tag density to the length of the sequence of interest are herein referred to as tag density ratios. Normalization to the length of the sequence of interest is not required, and may be included as a step to reduce the number of digits in a number to simplify it for human interpretation. As all qualified sequence tags are mapped and counted in each of the qualified samples, the sequence tag density for the sequence of interest, e.g., a clinically relevant sequence, is determined in the qualified samples, as are the sequence tag densities for the additional sequences from which normalization sequences are subsequently identified.
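The counting and optional length normalization of step 130 can be sketched as below. The function names and the chromosome lengths passed in are hypothetical; actual lengths would come from the reference assembly in use.

```python
from collections import Counter


def count_tags(mapped_chromosomes):
    """Count sequence tags per chromosome from an iterable of chromosome names."""
    return Counter(mapped_chromosomes)


def tag_density_ratios(tag_counts, lengths_bp):
    """Divide each chromosome's tag count by its length in base pairs."""
    ratios = {}
    for chrom, length in lengths_bp.items():
        if length <= 0:
            raise ValueError(f"length for {chrom} must be positive")
        # Counter returns 0 for chromosomes that received no tags.
        ratios[chrom] = tag_counts[chrom] / length
    return ratios
```

Because the same length appears in every sample for a given chromosome, this normalization rescales the numbers without changing how a chromosome's density varies from sample to sample.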
In some embodiments, the sequence of interest is a chromosome that is associated with a complete chromosomal aneuploidy, e.g., chromosome 21, and the qualified normalization sequence is a complete chromosome that is not associated with a chromosomal aneuploidy and whose variation in sequence tag density approximates that of the sequence (i.e., chromosome) of interest, e.g., chromosome 21. The selected normalization chromosome may be the chromosome or group of chromosomes that best approximates the variation in sequence tag density of the sequence of interest. Any one or more of chromosomes 1-22, X, and Y can be a sequence of interest, and one or more chromosomes can be identified as the normalization sequence for each of chromosomes 1-22, X, and Y in the qualified samples. The normalization chromosome can be an individual chromosome, or it can be a group of chromosomes as described elsewhere herein.
In another embodiment, the sequence of interest is a segment of a chromosome associated with a partial aneuploidy, e.g., a chromosomal deletion or insertion, or an unbalanced chromosomal translocation, and the normalization sequence is a chromosome segment (or a group of segments) that is not associated with the partial aneuploidy and whose variation in sequence tag density approximates that of the chromosome segment associated with the partial aneuploidy. The selected normalization chromosome segment may be the one or more chromosome segments that best approximate the variation in sequence tag density of the sequence of interest. Any one or more segments of any one or more of chromosomes 1-22, X, and Y can be a sequence of interest.
In other embodiments, the sequence of interest is a segment of a chromosome associated with a partial aneuploidy, and the normalization sequence is a whole chromosome or multiple whole chromosomes. In still other embodiments, the sequence of interest is a whole chromosome associated with an aneuploidy, and the normalization sequence is a chromosome segment or multiple chromosome segments that are not associated with the aneuploidy.
Whether a single sequence or a group of sequences is identified in the qualified samples as the normalization sequence for any one or more sequences of interest, the qualified normalization sequence may be chosen so that its variation in sequence tag density best or effectively approximates that of the sequence of interest as determined in the qualified samples. For example, a qualified normalization sequence is a sequence that produces the smallest variability across the qualified samples when used to normalize the sequence of interest, i.e., the variability of the normalization sequence is closest to that of the sequence of interest determined in the qualified samples. Stated another way, the qualified normalization sequence is the sequence selected to produce the least variation in sequence dosage (for the sequence of interest) across the qualified samples. Thus, the process selects a sequence that, when used as a normalization chromosome, is expected to produce the smallest run-to-run variability in the chromosome dosage for the sequence of interest.
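One way to make the selection criterion concrete is sketched below. The use of the coefficient of variation as the measure of variability, the restriction to single-chromosome candidates, and all function and variable names are assumptions made for illustration; the text above states only that the selected sequence should minimize the variation in dosage across the qualified samples.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def select_normalization_chromosome(samples, chromosome_of_interest, candidates):
    """Pick the candidate whose use as denominator gives the least variable dosage.

    samples: list of dicts mapping chromosome name -> sequence tag count,
             one dict per qualified sample (at least two samples).
    Returns (best_candidate, coefficient_of_variation_of_its_dosages).
    """
    best, best_cv = None, float("inf")
    for candidate in candidates:
        dosages = [s[chromosome_of_interest] / s[candidate] for s in samples]
        cv = coefficient_of_variation(dosages)
        if cv < best_cv:
            best, best_cv = candidate, cv
    return best, best_cv
```

A candidate whose tag counts rise and fall in step with those of the chromosome of interest yields nearly constant dosages and is therefore preferred over a candidate whose counts vary independently.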
The normalization sequence identified in the qualified samples for any one or more sequences of interest remains the normalization sequence of choice for determining the presence or absence of aneuploidy in test samples over days, weeks, months, and possibly years, provided that the procedures needed to generate sequencing libraries and to sequence the samples remain essentially unaltered over time. As described above, a normalization sequence for determining the presence of aneuploidy is chosen because (possibly among other reasons) the variability in the number of sequence tags mapped to it, across samples (e.g., different samples) and across sequencing runs (e.g., sequencing runs performed on the same day and/or on different days), best approximates the variability of the sequence of interest for which it is used as a normalization parameter. Substantial alterations in these procedures will affect the number of tags mapped to all sequences, which in turn will determine which sequence or group of sequences has a variability across samples, in the same and/or different sequencing runs, on the same day or on different days, that most closely approximates that of the sequence of interest; this would require that the set of normalization sequences be re-determined. Substantial alterations in procedures include changes in the laboratory protocol used for preparing the sequencing library, including changes related to preparing samples for multiplex sequencing instead of singleplex sequencing, and changes in the sequencing platform, including changes in the chemistry used for sequencing.
In some embodiments, the normalization sequence is the sequence that best distinguishes one or more qualified samples from one or more affected samples, which means that the normalization sequence is the sequence with the greatest differentiability, i.e., the differentiability of the normalization sequence is such that it provides optimal differentiation of the sequence of interest in an affected test sample, so that the affected test sample can be readily distinguished from other, unaffected samples. In other embodiments, the normalization sequence is the sequence that has a combination of the smallest variability and the greatest differentiability.
The level of differentiability can be determined as the statistical difference between the sequence dosages, e.g., chromosome dosages or segment dosages, in a population of qualified samples and the chromosome dosage(s) in one or more test samples, as described below and shown in the Examples. For example, differentiability can be represented numerically as a t-test value, which represents the statistical difference between the chromosome dosages in a population of qualified samples and the chromosome dosage(s) in one or more test samples. Alternatively, differentiability can be represented numerically as a Normalized Chromosome Value (NCV), which is a z-score for chromosome dosages as long as the distribution of the NCV is normal. Similarly, differentiability can be represented numerically as a t-test value that represents the statistical difference between the segment dosages in a population of qualified samples and the segment dosage(s) in one or more test samples. In the case where chromosome segments are the sequences of interest, differentiability of segment dosages can be represented numerically as a Normalized Segment Value (NSV), which is a z-score for chromosome segment dosages as long as the distribution of the NSV is normal. In determining the z-score, the mean and standard deviation of the chromosome or segment dosages in a set of qualified samples can be used. Alternatively, the mean and standard deviation of the chromosome or segment dosages in a training set comprising qualified samples and affected samples can be used. In other embodiments, the normalization sequence is the sequence that has the smallest variability and the greatest differentiability, or an optimal combination of small variability and large differentiability.
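The z-score described above can be sketched as follows. The function name and the use of the sample (n-1) standard deviation are assumptions made for illustration; the text states only that the mean and standard deviation of the dosages in a set of qualified samples (or in a training set) are used.

```python
from statistics import mean, stdev


def normalized_chromosome_value(test_dosage, qualified_dosages):
    """z-score of a test sample's chromosome dosage against qualified samples.

    qualified_dosages: chromosome dosages for the same chromosome of interest,
    computed with the same normalization sequence, in the qualified samples.
    """
    mu = mean(qualified_dosages)
    sigma = stdev(qualified_dosages)  # sample standard deviation (assumption)
    if sigma == 0:
        raise ValueError("qualified dosages show no variation; z-score undefined")
    return (test_dosage - mu) / sigma
```

The same calculation applied to segment dosages gives a Normalized Segment Value. A value far from zero indicates a dosage that is unusual relative to the qualified population; the decision thresholds are not part of this sketch.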
This method identifies sequences that inherently have similar characteristics and that are prone to similar variation among samples and sequencing runs, which makes them useful for determining sequence dosages in test samples.
Determination of sequence dosages (i.e., chromosome dosages or segment dosages) in qualified samples
In step 140, based on the calculated qualified label densities, a qualified sequence dosage (i.e., a chromosome dosage or a segment dosage) for a sequence of interest is determined as the ratio of the sequence label density for the sequence of interest to the qualified sequence label density for additional sequences, from which normalization sequences are subsequently identified in step 145. The identified normalization sequences are then used to determine sequence dosages in test samples.
In one embodiment, the sequence dosage in the qualified samples is a chromosome dosage, calculated as the ratio of the number of sequence labels for a chromosome of interest to the number of sequence labels for a normalization chromosome sequence in a qualified sample. The normalization chromosome sequence can be a single chromosome, a group of chromosomes, a segment of one chromosome, or a group of segments from different chromosomes. Accordingly, the chromosome dosage for a chromosome of interest is determined in a qualified sample as the ratio of the number of labels for the chromosome of interest to the number of labels for: (i) a normalization chromosome sequence composed of a single chromosome; (ii) a normalization chromosome sequence composed of two or more chromosomes; (iii) a normalization segment sequence composed of a single segment of a chromosome; (iv) a normalization segment sequence composed of two or more segments from one chromosome; or (v) a normalization segment sequence composed of two or more segments of two or more chromosomes. Examples of determining the chromosome dosage for a chromosome of interest according to (i)-(v) are as follows. The chromosome dosage for a chromosome of interest (e.g., chromosome 21) is determined as the ratio of the sequence label density of chromosome 21 to one of the following sequence label densities: (i) each of all the remaining chromosomes, i.e., chromosomes 1-20, chromosome 22, chromosome X, and chromosome Y; (ii) all possible combinations of two or more of the remaining chromosomes; (iii) a segment of another chromosome (e.g., chromosome 9); (iv) two segments of one other chromosome (e.g., two segments of chromosome 9); and (v) two segments of two different chromosomes (e.g., a segment of chromosome 9 and a segment of chromosome 14).
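As a compact restatement of cases (i)-(v), the chromosome dosage can be written as a single ratio. This is only a sketch under assumptions: $N_c$ (the number of labels, or the label density, for chromosome or segment c), $S$ (the set of chromosomes or segments that make up the normalization sequence), and the pooling of a group by summation are notational choices made here, not definitions taken from the text.

```latex
R_{21} = \frac{N_{21}}{\sum_{k \in S} N_k}
```

When S contains a single chromosome or segment k, the expression reduces to $N_{21} / N_k$, which corresponds to cases (i) and (iii).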
In another embodiment, the sequence dosage in the qualified samples is a segment dosage, calculated as the ratio of the number of sequence labels for a segment of interest that is not a whole chromosome to the number of sequence labels for a normalization segment sequence in a qualified sample. The normalization segment sequence can be, for example, a whole chromosome, a group of whole chromosomes, a segment of one chromosome, or a group of segments from different chromosomes. For example, the segment dosage for a segment of interest is determined in a qualified sample as the ratio of the number of labels for the segment of interest to the number of labels for: (i) a normalization segment sequence composed of a single segment of a chromosome; (ii) a normalization segment sequence composed of two or more segments of one chromosome; or (iii) a normalization segment sequence composed of two or more segments of two or more chromosomes.
Chromosome dosages for one or more chromosomes of interest are determined in all qualified samples, and a normalization chromosome sequence is identified in step 145. Similarly, segment dosages for one or more segments of interest are determined in all qualified samples, and a normalization segment sequence is identified in step 145.
Identification of normalization sequences from qualified sequence dosages
In step 145, a normalization sequence is identified for a sequence of interest, based on the calculated sequence dosages, as, for example, the sequence that results in the smallest variability in the sequence dosage for the sequence of interest across all qualified samples. This method identifies sequences that inherently have similar characteristics and that are prone to similar variation among samples and sequencing runs, which makes them useful for determining sequence dosages in test samples.
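The selection criterion of step 145 can be sketched in a few lines of Python for the simplest case, where each candidate normalization sequence is a single chromosome. This is a minimal sketch under assumptions: the function names, the use of raw label counts in place of label densities, the coefficient of variation as the variability measure, and the sample counts are all illustrative and are not taken from the text.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def best_single_normalizer(qualified_samples, interest):
    """Return the chromosome whose use as denominator gives the least
    variable dosage for `interest` across the qualified samples.

    qualified_samples: list of dicts mapping chromosome name -> label count.
    """
    candidates = [c for c in qualified_samples[0] if c != interest]

    def dosage_cv(candidate):
        # Dosage = labels for the chromosome of interest / labels for candidate.
        dosages = [s[interest] / s[candidate] for s in qualified_samples]
        return coefficient_of_variation(dosages)

    return min(candidates, key=dosage_cv)


# Illustrative counts only (not data from the document).
qualified = [
    {"chr21": 100, "chr14": 200, "chr9": 300},
    {"chr21": 110, "chr14": 220, "chr9": 300},
    {"chr21": 90, "chr14": 180, "chr9": 310},
]
print(best_single_normalizer(qualified, "chr21"))
```

In this toy input the chr21/chr14 ratio is identical in every sample, so chr14 gives the least variable dosage and is selected.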
Normalization sequences for one or more sequences of interest can be identified in a set of qualified samples, and the sequences identified in the qualified samples are subsequently used to calculate sequence dosages for the one or more sequences of interest in each test sample (step 150), to determine the presence or absence of aneuploidy in each test sample. The normalization sequence identified for a chromosome or segment of interest may differ when different sequencing platforms are used, and/or when differences exist in the purification of the nucleic acid to be sequenced and/or in the preparation of the sequencing library. Using normalization sequences according to the methods described herein provides a specific and sensitive measure of the copy number variation of a chromosome or a segment thereof, irrespective of the sample preparation and/or sequencing platform used.
In some embodiments, more than one normalization sequence is identified; i.e., different normalization sequences can be determined for one sequence of interest, and multiple sequence dosages can be determined for one sequence of interest. For example, the variation (e.g., coefficient of variation, CV) in the chromosome dosage for chromosome 21, as the chromosome of interest, is smallest when the sequence label density of chromosome 14 is used. However, two, three, four, five, six, seven, eight, or more normalization sequences can be identified for use in determining the sequence dosage for a sequence of interest in a test sample. As an example, a second dosage for chromosome 21 in any one test sample can be determined using chromosome 7, chromosome 9, chromosome 11, or chromosome 12 as the normalization chromosome sequence, because these chromosomes all have a CV close to that of chromosome 14 (see Table 10 of Example 8). Preferably, when a single chromosome is chosen as the normalization chromosome sequence for a chromosome of interest, the normalization chromosome is the chromosome that results in chromosome dosages for the chromosome of interest that have the smallest variability across all samples tested (e.g., qualified samples).
Normalization chromosome sequences as normalization sequences for chromosomes
In other embodiments, a normalization chromosome sequence can be a single sequence or a group of sequences. For example, in some embodiments, the normalization sequence is a group of sequences, e.g., a group of chromosomes, identified as a normalization sequence for any one or more of chromosomes 1-22, X, and Y. The group of chromosomes that composes the normalization sequence for a chromosome of interest (i.e., a normalization chromosome sequence) can be a group of two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty-one, or twenty-two chromosomes, including or excluding one or both of chromosomes X and Y. The group of chromosomes identified as the normalization chromosome sequence is the group that results in chromosome dosages for the chromosome of interest that have the smallest variability across all samples tested (i.e., qualified samples). Preferably, single chromosomes and groups of chromosomes are tested together for their ability to best mimic the behavior of the sequence of interest, and are chosen as normalization chromosome sequences on that basis.
In one embodiment, the normalization sequence for chromosome 21 is selected from chromosome 9, chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, chromosome 16, and chromosome 17. In another embodiment, the normalization sequence for chromosome 21 is selected from chromosome 9, chromosome 1, chromosome 2, chromosome 11, chromosome 12, and chromosome 14. Alternatively, the normalization sequence for chromosome 21 is a group of chromosomes selected from chromosome 9, chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, chromosome 16, and chromosome 17. In another embodiment, the group of chromosomes is a group selected from chromosome 9, chromosome 1, chromosome 2, chromosome 11, chromosome 12, and chromosome 14.
In some embodiments, the method is further improved by using a normalization sequence that is determined by systematic calculation of all chromosome dosages, using each chromosome individually and in all possible combinations with all remaining chromosomes (see Example 13). For example, a systematically determined normalization chromosome can be identified for each chromosome of interest by systematically calculating all possible chromosome dosages, using any one of chromosomes 1-22, X, and Y and combinations of two or more of chromosomes 1-22, X, and Y, to determine which single chromosome or group of chromosomes is the normalization chromosome that results in the smallest variability in the chromosome dosage for the chromosome of interest across a set of qualified samples (see Example 13). Accordingly, in one embodiment, the systematically calculated normalization sequence for chromosome 21 is a group of chromosomes consisting of chromosome 4, chromosome 14, chromosome 16, chromosome 20, and chromosome 22. Single chromosomes or groups of chromosomes can be determined for all chromosomes in a genome.
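The systematic calculation over single chromosomes and groups of chromosomes can be sketched as an exhaustive search. The sketch below rests on several assumptions that are not stated in the text: a group is pooled by summing its label counts, variability is measured as the coefficient of variation, and the search is capped at a hypothetical `max_group_size` because the number of possible groups grows exponentially with the number of candidate chromosomes. All names and counts are illustrative.

```python
from itertools import combinations
from statistics import mean, stdev


def dosage(sample, interest, group):
    """Labels for the chromosome of interest over the pooled labels of a group."""
    return sample[interest] / sum(sample[c] for c in group)


def best_normalizer_group(qualified_samples, interest, max_group_size=2):
    """Search single chromosomes and groups up to `max_group_size` for the
    normalization sequence giving the smallest CV of the dosage."""
    candidates = sorted(c for c in qualified_samples[0] if c != interest)
    best_group, best_cv = None, float("inf")
    for size in range(1, max_group_size + 1):
        for group in combinations(candidates, size):
            doses = [dosage(s, interest, group) for s in qualified_samples]
            cv = stdev(doses) / mean(doses)
            if cv < best_cv:  # keep the first group that reaches the minimum
                best_group, best_cv = group, cv
    return best_group, best_cv


# Illustrative counts only: chr4 + chr14 together track chr21 exactly,
# although neither does so on its own.
qualified = [
    {"chr21": 100, "chr4": 150, "chr14": 50, "chr9": 210},
    {"chr21": 110, "chr4": 120, "chr14": 100, "chr9": 200},
    {"chr21": 90, "chr4": 100, "chr14": 80, "chr9": 205},
]
group, cv = best_normalizer_group(qualified, "chr21")
print(group, cv)
```

Raising `max_group_size` widens the search toward the full set of combinations described above, at a rapidly increasing computational cost.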
In one embodiment, the normalization sequence for chromosome 18 is selected from chromosome 8, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, and chromosome 14. Preferably, the normalization sequence for chromosome 18 is selected from chromosome 8, chromosome 2, chromosome 3, chromosome 5, chromosome 6, chromosome 12, and chromosome 14. In one embodiment, the normalization sequence for chromosome 18 is a group of chromosomes selected from chromosome 8, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, and chromosome 14. Preferably, the group of chromosomes is a group selected from chromosome 8, chromosome 2, chromosome 3, chromosome 5, chromosome 6, chromosome 12, and chromosome 14.
In another embodiment, the normalization sequence for chromosome 18 is determined by systematic calculation of all possible chromosome dosages, using each possible normalization chromosome individually and in all possible combinations of normalization chromosomes (as explained elsewhere in this application). Accordingly, in one embodiment, the normalization sequence for chromosome 18 is a normalization chromosome consisting of a group of chromosomes, the group consisting of chromosome 2, chromosome 3, chromosome 5, and chromosome 7.
In one embodiment, the normalization sequence for chromosome X is selected from chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, and chromosome 16. Preferably, the normalization sequence for chromosome X is selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, and chromosome 8. In one embodiment, the normalization sequence for chromosome X is a group of chromosomes selected from chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, and chromosome 16. Preferably, the group of chromosomes is a group selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, and chromosome 8.
In another embodiment, the normalization sequence for chromosome X is determined by systematic calculation of all possible chromosome dosages, using each possible normalization chromosome individually and in all possible combinations of normalization chromosomes (as explained elsewhere in this application). Accordingly, in one embodiment, the normalization sequence for chromosome X is a normalization chromosome consisting of the group of chromosome 4 and chromosome 8.
In one embodiment, the normalization sequence for chromosome 13 is a chromosome selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 14, chromosome 18, and chromosome 21. Preferably, the normalization sequence for chromosome 13 is a chromosome selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, and chromosome 8. In another embodiment, the normalization sequence for chromosome 13 is a group of chromosomes selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 14, chromosome 18, and chromosome 21. Preferably, the group of chromosomes is a group selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, and chromosome 8.
In another embodiment, the normalization sequence for chromosome 13 is determined by systematic calculation of all possible chromosome dosages, using each possible normalization chromosome individually and in all possible combinations of normalization chromosomes (as explained elsewhere in this application). Accordingly, in one embodiment, the normalization sequence for chromosome 13 is a normalization chromosome comprising the group of chromosome 4 and chromosome 5. In another embodiment, the normalization sequence for chromosome 13 is a normalization chromosome consisting of the group of chromosome 4 and chromosome 5.
The variation in the chromosome dosage for chromosome Y is greater than 30, independently of which normalization chromosome is used in determining the chromosome Y dosage. Therefore, any group of two or more chromosomes selected from chromosomes 1-22 and chromosome X can be used as the normalization sequence for chromosome Y. In one embodiment, the at least one normalization chromosome is a group of chromosomes consisting of chromosomes 1-22 and chromosome X. In another embodiment, the group of chromosomes consists of chromosome 2, chromosome 3, chromosome 4, chromosome 5, and chromosome 6.
In another embodiment, the normalization sequence for chromosome Y is determined by systematic calculation of all possible chromosome dosages, using each possible normalization chromosome individually and in all possible combinations of normalization chromosomes (as explained elsewhere in this application). Accordingly, in one embodiment, the normalization sequence for chromosome Y is a normalization chromosome comprising a group of chromosomes consisting of chromosome 4 and chromosome 6. In another embodiment, the normalization sequence for chromosome Y is a normalization chromosome consisting of a group of chromosomes, the group consisting of chromosome 4 and chromosome 6.
The normalization sequence used to calculate the dosage for different chromosomes of interest, or for different segments of interest, can be the same, or it can be a different normalization sequence for each chromosome or segment. For example, the normalization sequence (e.g., a normalization chromosome, one or a group) for chromosome of interest A can be the same as, or different from, the normalization sequence (e.g., a normalization chromosome, one or a group) for chromosome of interest B.
The normalization sequence for a complete chromosome can be a complete chromosome or a group of complete chromosomes, or it can be a segment of a chromosome or a group of segments of one or more chromosomes.
Normalization segment sequences as normalization sequences for chromosomes
In another embodiment, the normalization sequence for a chromosome can be a normalization segment sequence. The normalization segment sequence can be a single segment, a group of segments of one chromosome, or segments from two or more different chromosomes. A normalization segment sequence can be determined by systematic calculation of all combinations of segment sequences in a genome. For example, the normalization segment sequence for chromosome 21 can be a single segment that is larger or smaller than chromosome 21, which is approximately 47 Mbp (million base pairs) in size; for example, the normalization segment can be a segment of chromosome 9, which is approximately 140 Mbp. Alternatively, the normalization sequence for chromosome 21 can be a combination of segment sequences from two different chromosomes, e.g., from chromosome 1 and from chromosome 12.
In one embodiment, the normalization sequence for chromosome 21 is a normalization segment sequence consisting of one segment, or a group of two or more segments, of chromosomes 1-20, 22, X, and Y. In another embodiment, the normalization sequence for chromosome 18 is a segment or a group of segments of chromosomes 1-17, 19-22, X, and Y. In another embodiment, the normalization sequence for chromosome 13 is a segment or a group of segments of chromosomes 1-12, 14-22, X, and Y. In another embodiment, the normalization sequence for chromosome X is a segment or a group of segments of chromosomes 1-22 and Y. In another embodiment, the normalization sequence for chromosome Y is a segment or a group of segments of chromosomes 1-22 and X. Normalization sequences consisting of a single segment or a group of segments can be determined for all chromosomes in a genome. The two or more segments of a normalization segment sequence can be segments from one chromosome, or they can be segments of two or more different chromosomes. As described for normalization chromosome sequences, a normalization segment sequence can be the same for two or more different chromosomes.
Normalization segment sequences as normalization sequences for chromosome segments
The presence or absence of a CNV of a sequence of interest can be determined when the sequence of interest is a segment of a chromosome. Variation in the copy number of a chromosome segment allows the presence or absence of a partial chromosomal aneuploidy to be determined. Examples of partial chromosomal aneuploidies associated with various fetal abnormalities and disease conditions are described below. A segment of a chromosome can be of any length; for example, it can range from a kilobase to hundreds of millions of bases. The human genome alone comprises just over 3 billion DNA bases, which can be divided into tens, thousands, hundreds of thousands, and millions of segments of different sizes, whose copy numbers can be determined according to the present method. The normalization sequence for a chromosome segment is a normalization segment sequence, which can be a single segment from any one of chromosomes 1-22, X, and Y, or a group of segments from any one or more of chromosomes 1-22, X, and Y.
The normalization sequence for a segment of interest is a sequence whose variability across chromosomes and across samples is closest to that of the segment of interest. When the normalization sequence is a group of segments from any one or more of chromosomes 1-22, X, and Y, it can be determined as described for determining the normalization sequence for a chromosome of interest. A normalization segment sequence consisting of one segment or a group of segments can be identified by calculating segment dosages for the segment of interest in each sample of a set of qualified samples (i.e., samples known to be diploid for the segment of interest), using one segment, and all possible combinations of two or more segments, as the normalization sequence. The normalization sequence is then determined to be the one that provides the segment dosage with the smallest variability for the segment of interest across all qualified samples, as described above for normalization chromosome sequences.
For example, for a segment of interest that is 1 Mb (megabase) in size, the remaining 3 million segments of the approximately 3 Gb human genome (minus the 1 Mb segment of interest) can be used individually or in combination with each other to calculate segment dosages for the segment of interest in a set of qualified samples, to determine which segment or group of segments will serve as the normalization segment sequence for qualified and test samples. Segments of interest can range from about 1,000 bases to tens of millions of bases. A normalization segment sequence can be composed of one or more segments of the same size as the sequence of interest. In other embodiments, the normalization segment sequence can be composed of segments that differ in size from the sequence of interest and/or from each other. For example, the normalization sequence for a sequence 100,000 bases long can be 20,000 bases long and can comprise a combination of sequences of different lengths, e.g., 7,000 + 8,000 + 5,000 bases. As described elsewhere in this application for normalization chromosome sequences, a normalization segment sequence can be determined by systematically calculating all possible chromosome and/or segment dosages, using each possible normalization chromosome segment individually and in all possible combinations of normalization segments. Single segments or groups of segments can be determined for all segments and/or chromosomes in a genome.
The normalization sequence used to calculate the dosage for different chromosome segments of interest can be the same, or it can be a different normalization sequence for each chromosome segment of interest. For example, the normalization sequence (e.g., a normalization segment, one or a group) for chromosome segment of interest A can be the same as, or different from, the normalization sequence (e.g., a normalization segment, one or a group) for chromosome segment of interest B.
Normalization chromosome sequences as normalization sequences for chromosome segments
In another embodiment, the copy number variation of a chromosome segment can be determined using a normalization chromosome, which can be a single chromosome or a group of chromosomes, as described above. The normalization chromosome sequence can be the normalization chromosome or group of chromosomes identified for the chromosome of interest in a set of qualified samples by systematically determining which chromosome or group of chromosomes results in the smallest variability in the chromosome dosage across the set of qualified samples. For example, to determine the presence or absence of a partial deletion of chromosome 7, the normalization chromosome or group of chromosomes used in the analysis of the partial deletion is first identified in a set of qualified samples as the chromosome or group of chromosomes that provides the chromosome dosage for the entire chromosome 7 with the smallest variability. As described elsewhere herein for normalization chromosome sequences for chromosomes of interest, the normalization chromosome sequence for a chromosome segment can be determined by systematically calculating all possible chromosome dosages, using each possible normalization chromosome individually and in all possible combinations of normalization chromosomes. Single chromosomes or groups of chromosomes can be determined for all chromosome segments in a genome. Examples illustrating the use of normalization chromosomes to determine the presence of a partial chromosomal deletion and a partial chromosomal duplication are provided as Examples 17 and 18.
In certain embodiments, the CNV of a chromosome segment is determined by first subdividing the chromosome of interest into segments, or bins, of variable length. The bin length can be at least about 1 kbp, at least about 10 kbp, at least about 100 kbp, at least about 1 Mbp, at least about 10 Mbp, or at least about 100 Mbp. The smaller the bin length, the greater the resolution with which the CNV of a segment can be localized in the chromosome of interest.
Determining the presence or absence of a CNV of a segment of the chromosome of interest can be accomplished by comparing the dosage for each bin of the chromosome of interest in a test sample to the mean of the corresponding bin dosages determined for each bin of equal length in a set of qualified samples. A normalized value for each bin can be calculated as a normalized bin value (NBV), as described above for the normalized segment value, which relates the bin dosage in a test sample to the mean of the corresponding bin dosages in a set of qualified samples. The NBV is calculated as:
$$NBV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, of the j-th bin dosage in a set of qualified samples, and $x_{ij}$ is the observed j-th bin dosage for test sample i.
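A minimal sketch of the NBV calculation for each bin (data box) follows, read here as a per-bin z-score: the observed bin dosage minus the mean of the corresponding bin dosages in the qualified samples, divided by their standard deviation. The function name, the list-of-lists layout, the use of the sample (n-1) standard deviation, and the numbers are assumptions made for illustration.

```python
from statistics import mean, stdev


def normalized_bin_values(test_bin_dosages, qualified_bin_dosages):
    """Return one NBV per bin for a single test sample.

    test_bin_dosages: list of bin dosages for the test sample, one per bin.
    qualified_bin_dosages: list of per-sample lists, in the same bin order.
    """
    nbvs = []
    for j, x_ij in enumerate(test_bin_dosages):
        # Dosages of bin j across all qualified samples.
        column = [sample[j] for sample in qualified_bin_dosages]
        nbvs.append((x_ij - mean(column)) / stdev(column))
    return nbvs


# Illustrative dosages only (three qualified samples, two bins).
qualified_bins = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
]
print(normalized_bin_values([5.0, 20.0], qualified_bins))
```

A bin whose NBV lies far from zero would be a candidate region for a copy number change; any decision threshold is outside the scope of this sketch.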
Determination of aneuploidies in test samples
Based on the identification of the normalization sequence(s) in qualified samples, a sequence dosage is determined for a sequence of interest in a test sample that comprises a mixture of nucleic acids derived from genomes that differ in one or more sequences of interest.
In step 115, a test sample is obtained from a subject suspected of carrying, or known to carry, a clinically relevant CNV of a sequence of interest. The test sample can be a biological fluid (e.g., plasma) or any suitable sample as described below. As explained, the sample can be obtained using a non-invasive procedure such as a simple blood draw. In some embodiments, the test sample contains a mixture of nucleic acid molecules (e.g., cfDNA molecules). In some embodiments, the test sample is a maternal plasma sample that contains a mixture of fetal and maternal cfDNA molecules.
In step 125, at least a portion of the test nucleic acids in the test sample is sequenced, as described for the qualified samples, to generate millions of sequence reads (e.g., 36 bp reads). As in step 120, the reads generated from sequencing the nucleic acids in the test sample are uniquely mapped or aligned to a reference genome to produce labels. As described for step 120, at least about 3 x 10^6 qualified sequence labels, at least about 5 x 10^6 qualified sequence labels, at least about 8 x 10^6 qualified sequence labels, at least about 10 x 10^6 qualified sequence labels, at least about 15 x 10^6 qualified sequence labels, at least about 20 x 10^6 qualified sequence labels, at least about 30 x 10^6 qualified sequence labels, at least about 40 x 10^6 qualified sequence labels, or at least about 50 x 10^6 qualified sequence labels comprising reads of between 20 and 40 bp are obtained from reads that map uniquely to the reference genome. In certain embodiments, the reads produced by the sequencing apparatus are provided in an electronic format. Alignment is accomplished using a computing device as discussed below. Individual reads are compared against the reference genome, which is often vast (millions of base pairs), to identify sites where the reads uniquely correspond with the reference genome. In some embodiments, the alignment procedure permits limited mismatch between reads and the reference genome. In some cases, 1, 2, or 3 base pairs in a read are permitted to mismatch the corresponding base pairs in the reference genome, and a mapping is still made.
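The idea of unique mapping with a limited number of mismatches can be illustrated with a toy scan over a short reference string. This is a deliberately naive sketch, not the alignment procedure used in the method: the function names, the brute-force comparison, and the 20-base reference are assumptions made for illustration, and a real aligner would use an indexed search.

```python
def candidate_sites(read, reference, max_mismatches):
    """Return every start position where `read` matches `reference`
    with at most `max_mismatches` mismatched bases."""
    sites = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            sites.append(start)
    return sites


def to_label(read, reference, max_mismatches=0):
    """Return the mapped position if the read maps to exactly one site,
    otherwise None (unmapped or ambiguous reads yield no label)."""
    sites = candidate_sites(read, reference, max_mismatches)
    return sites[0] if len(sites) == 1 else None


# Illustrative reference only.
reference = "ACGTTGCAAGGCTTAACCGG"
print(to_label("GCAAGG", reference))                    # exact, unique match
print(to_label("AA", reference))                        # maps to two sites
print(to_label("GCTAGG", reference, max_mismatches=1))  # one mismatch allowed
```

Only reads that yield a single site are turned into labels here, mirroring the requirement that reads map uniquely to the reference genome.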
In step 135, all or most of the labels obtained from sequencing the nucleic acids in the test sample are counted, using a computing device as described below, to determine a test sequence label density. In certain embodiments, each read is aligned to a particular region of the reference genome (a chromosome or segment in most cases), and the read is converted to a label by appending site information to the read. As the process unfolds, the computing device can keep a running count of the number of labels/reads mapping to each region of the reference genome (a chromosome or segment in most cases). The counts are stored for each chromosome or segment of interest and for each corresponding normalization chromosome or segment.
In certain embodiments, the reference genome has one or more excluded regions, which are part of a true biological genome but are not included in the reference genome. Reads that might align to these excluded regions are not counted. Examples of excluded regions include regions of long repeated sequences, regions of similarity between the X and Y chromosomes, and the like.
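The running count per region, with excluded regions skipped, can be sketched as follows. The data layout is an assumption made for illustration: each label is a hypothetical (chromosome, position) pair, and excluded regions are given as half-open (start, end) intervals per chromosome. The coordinates are invented.

```python
from collections import Counter


def count_labels(labels, excluded_regions):
    """Keep a running count of labels per chromosome, skipping any label
    whose position falls inside an excluded region.

    labels: iterable of (chromosome, position) pairs.
    excluded_regions: dict mapping chromosome -> list of (start, end),
        with start inclusive and end exclusive.
    """
    counts = Counter()
    for chromosome, position in labels:
        regions = excluded_regions.get(chromosome, [])
        if any(start <= position < end for start, end in regions):
            continue  # label lies in an excluded region: not counted
        counts[chromosome] += 1
    return counts


# Illustrative labels and regions only.
labels = [("chr21", 100), ("chr21", 5000), ("chr14", 42), ("chrX", 10)]
excluded = {"chr21": [(4000, 6000)], "chrX": [(0, 50)]}
counts = count_labels(labels, excluded)
print(dict(counts))
```

The resulting per-chromosome counts are the quantities from which label densities and dosages would then be derived.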
In certain embodiments, the method determines whether to count a label more than once when multiple reads align to the same site in a reference genome or sequence. There may be occasions when two labels have the same sequence and therefore align to an identical site on a reference sequence. In certain cases, the method employed to count labels can exclude from the count identical labels derived from the same sequenced sample. If a disproportionate number of labels are identical in a given sample, this suggests a strong bias or other defect in the procedure. Therefore, in accordance with certain embodiments, the counting method does not count labels from a given sample that are identical to labels from that sample that were previously counted.
Various criteria can be set for choosing when to disregard an identical label from a single sample. In certain embodiments, a defined percentage of the labels that are counted must be unique. If more labels than this threshold are not unique, they are disregarded. For example, if the defined percentage requires that at least 50% are unique, identical labels are not counted until the percentage of unique labels exceeds 50% for the sample. In other embodiments, the threshold percentage of unique labels is at least about 60%. In other embodiments, the threshold percentage of unique labels is at least about 75%, or at least about 90%, or at least about 95%, or at least about 98%, or at least about 99%. A threshold may be set at 90% for chromosome 21. If 30M labels are aligned to chromosome 21, then at least 27M of them must be unique. If 3M counted labels are not unique and the 30 million and first label is not unique, it is not counted.
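One possible reading of the uniqueness threshold is sketched below: a repeated label is counted only while unique labels would still make up at least the threshold fraction of all counted labels. This interpretation, the function name, and the representation of labels as hashable values are assumptions; the text leaves the exact criterion to an appropriate statistical analysis.

```python
def count_with_uniqueness_threshold(labels, min_unique_fraction):
    """Count labels, admitting a duplicate only if the fraction of unique
    labels among counted labels stays at or above `min_unique_fraction`.

    Returns (counted, unique), where `unique` is the number of distinct labels.
    """
    seen = set()
    counted = 0
    unique = 0
    for label in labels:
        if label not in seen:
            # First occurrence: always counted.
            seen.add(label)
            counted += 1
            unique += 1
        elif unique / (counted + 1) >= min_unique_fraction:
            # Duplicate: counted only if the threshold would still hold.
            counted += 1
    return counted, unique


# Illustrative labels only: two distinct labels followed by three repeats.
print(count_with_uniqueness_threshold(["a", "b", "a", "a", "a"], 0.5))
```

With a threshold of 0.9, 27 unique labels admit at most 3 repeats (27/30 = 0.9), which mirrors the proportions of the chromosome 21 example above at a smaller scale.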
The particular threshold or other criterion used to determine when not to count further identical tags can be selected using appropriate statistical analysis. One factor influencing the threshold or other criterion is the amount of sample sequenced relative to the size of the genome to which tags can be aligned. Other factors include the size of the reads and similar considerations.
In one embodiment, the number of test sequence tags mapped to a sequence of interest is normalized to the known length of the sequence of interest to which they are mapped, to provide a test sequence tag density ratio. As described for the qualified samples, normalization to the known length of a sequence of interest is not required; it may be included as a step that reduces the number of digits in a number to simplify it for human interpretation. As all the mapped test sequence tags in the test sample are counted, the sequence tag density for a sequence of interest (e.g., a clinically relevant sequence) in the test samples is determined, as are the sequence tag densities for additional sequences that correspond to at least one normalizing sequence identified in the qualified samples.
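The optional length normalization can be sketched as follows. The function name, the per-megabase scaling, and the example numbers are hypothetical and are chosen only to show how dividing by sequence length yields a more readable figure.

```python
def sequence_tag_density(tag_count, sequence_length_bp, per_bases=1_000_000):
    """Tags per `per_bases` bases of the sequence of interest.

    The per-megabase scale is an illustrative assumption; the text only
    states that tag counts may be normalized to the known sequence length.
    """
    if sequence_length_bp <= 0:
        raise ValueError("sequence length must be positive")
    return tag_count * per_bases / sequence_length_bp


# Hypothetical numbers: 48,000 tags mapped to a 48 Mb sequence.
density = sequence_tag_density(48_000, 48_000_000)
```

Here `density` is 1000.0 tags per megabase.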
In step 150, based on the identity of at least one normalizing sequence in the qualified samples, a test sequence dose is determined for a sequence of interest in the test sample. In various embodiments, the test sequence dose is determined computationally by manipulating the sequence tag densities of the sequence of interest and of the corresponding normalizing sequence as described herein. The computing device responsible for this task electronically accesses the association between the sequence of interest and its associated normalizing sequence, which may be stored in a database, table, or graph, or be included as code in program instructions.
As described elsewhere herein, the at least one normalizing sequence can be a single sequence or a group of sequences. The sequence dose for a sequence of interest in a test sample is the ratio of the sequence tag density determined for the sequence of interest in the test sample to the sequence tag density of at least one normalizing sequence determined in the test sample, wherein the normalizing sequence in the test sample corresponds to the normalizing sequence identified in the qualified samples for the particular sequence of interest. For example, if the normalizing sequence identified for chromosome 21 in the qualified samples is determined to be a chromosome (e.g., chromosome 14), then the test sequence dose for chromosome 21 (the sequence of interest) is determined as the ratio of the sequence tag density for chromosome 21 to the sequence tag density for chromosome 14, each determined in the test sample. Similarly, chromosome doses are determined for chromosomes 13, 18, X, Y, and other chromosomes associated with chromosomal aneuploidies. A normalizing sequence for a chromosome of interest can be one chromosome or a group of chromosomes, or one chromosome segment or a group of chromosome segments. As described above, a sequence of interest can be part of a chromosome, e.g., a chromosome segment. Accordingly, the dose for a chromosome segment can be determined as the ratio of the sequence tag density determined for the segment in the test sample to the sequence tag density for the normalizing chromosome segment in the test sample, wherein the normalizing segment in the test sample corresponds to the normalizing segment (single or a group of segments) identified in the qualified samples for the particular segment of interest. Chromosome segments can range in size from kilobases (kb) to megabases (Mb) (e.g., about 1 kb to 10 kb, or about 10 kb to 100 kb, or about 100 kb to 1 Mb).
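The ratio described above can be sketched in a few lines. The function name and the example densities are hypothetical. Summing the densities of a group of normalizing sequences is an assumption of this sketch, since the text above states only that the normalizing sequence may be a single sequence or a group.

```python
def sequence_dose(density_of_interest, normalizing_densities):
    """Ratio of the tag density of the sequence of interest to the tag
    density of its normalizing sequence(s).

    `normalizing_densities` holds one value for a single normalizing
    sequence or several values for a group; the group is combined by
    summation, which is an assumption of this sketch.
    """
    denominator = sum(normalizing_densities)
    if denominator <= 0:
        raise ValueError("normalizing density must be positive")
    return density_of_interest / denominator


# Hypothetical densities for chromosome 21 (of interest) and chromosome 14
# (normalizing), both measured in the same test sample.
dose_chr21 = sequence_dose(250.0, [500.0])
```

With these illustrative values the dose for chromosome 21 is 0.5. The same function applies to a chromosome segment and its normalizing segment.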
In step 155, threshold values are derived from standard deviation values established for the qualified sequence doses determined in a plurality of qualified samples and for the sequence doses determined for samples known to be aneuploid for a sequence of interest. Note that this operation is typically performed asynchronously with the analysis of patient test samples. It may be performed, for example, concurrently with the selection of normalizing sequences from qualified samples. Accurate classification depends on the differences between the probability distributions for the different classes (i.e., types of aneuploidy). In some examples, thresholds are chosen from the empirical distribution for each type of aneuploidy (e.g., trisomy 21). Possible threshold values were established for classifying trisomy 13, trisomy 18, trisomy 21, and monosomy X aneuploidies, as described in the Examples, which illustrate the use of the method for determining chromosomal aneuploidies by sequencing cfDNA extracted from a maternal sample comprising a mixture of fetal and maternal nucleic acids. The threshold value determined to distinguish samples affected by an aneuploidy of one chromosome can be the same as or different from the threshold value determined to distinguish samples affected by a different aneuploidy. As shown in the Examples, the threshold value for each chromosome of interest is determined from the variability in the dose of the chromosome of interest across samples and sequencing runs. The less variable the chromosome dose for any chromosome of interest, the narrower the spread in the dose for the chromosome of interest across all the unaffected samples, which are used to set the threshold for determining different aneuploidies.
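One simple way to derive a threshold from the spread of doses in unaffected samples is sketched below. The rule "mean plus a multiple of the standard deviation", the multiplier of 3, and the example doses are all assumptions made for illustration; the text above states only that thresholds are derived from standard deviation values and empirical distributions.

```python
from statistics import mean, stdev


def upper_dose_threshold(unaffected_doses, n_sd=3.0):
    """Threshold placed `n_sd` sample standard deviations above the mean
    dose of the unaffected (qualified) samples.

    The multiplier `n_sd` is a hypothetical choice, not a value from the text.
    """
    if len(unaffected_doses) < 2:
        raise ValueError("at least two doses are needed to estimate spread")
    return mean(unaffected_doses) + n_sd * stdev(unaffected_doses)


# Hypothetical qualified doses for one chromosome of interest.
threshold = upper_dose_threshold([0.49, 0.50, 0.51])
```

With these illustrative doses the mean is 0.50 and the sample standard deviation is 0.01, so the threshold is approximately 0.53. A narrower spread across unaffected samples moves the threshold closer to the mean, which is the relationship the paragraph above describes.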
Returning to the process flow associated with classifying a patient test sample, in step 160 the copy number variation of the sequence of interest is determined in the test sample by comparing the test sequence dose for the sequence of interest to at least one threshold value established from the qualified sample doses. This operation may be performed by the same computing device employed to measure sequence tag densities and/or calculate segment doses.
In step 165, the dose calculated for a test sequence of interest is compared to the doses set as threshold values, which are chosen according to a user-defined threshold of reliability, to classify the sample as "normal," "affected," or "no call." The "no call" samples are samples for which a definitive diagnosis cannot be made reliably. Each type of affected sample (e.g., trisomy 21, partial trisomy 21, monosomy X) has its own thresholds, one for calling normal (unaffected) samples and another for calling affected samples (although in some cases the two thresholds coincide). As described elsewhere herein, under some circumstances a no call can be converted to a call (affected or normal) if the fetal fraction of nucleic acid in the test sample is sufficiently high. The classification of the test sequence may be reported by the computing device employed in other operations of this process flow. In some cases, the classification is reported in an electronic format and may be displayed, emailed, texted, etc. to interested persons.
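The three-way classification with two thresholds can be sketched as follows. The sketch assumes an aneuploidy that raises the dose (for example, a trisomy), so that higher doses indicate an affected sample; the function name, the threshold values, and the handling of doses that fall exactly on a threshold are illustrative assumptions.

```python
def classify_dose(dose, normal_threshold, affected_threshold):
    """Classify a test dose as "normal", "affected", or "no call".

    Assumes a copy number gain: doses at or above `affected_threshold`
    are called affected, doses below `normal_threshold` are called normal,
    and doses in between receive no call. When the two thresholds coincide
    there is no "no call" zone.
    """
    if normal_threshold > affected_threshold:
        raise ValueError("normal threshold must not exceed affected threshold")
    if dose >= affected_threshold:
        return "affected"
    if dose < normal_threshold:
        return "normal"
    return "no call"


# Hypothetical thresholds for one chromosome of interest.
result = classify_dose(0.525, normal_threshold=0.52, affected_threshold=0.53)
```

For a copy number loss such as monosomy X the comparisons would run in the opposite direction.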
Certain embodiments provide a method for providing a prenatal diagnosis of a fetal aneuploidy in a biological sample comprising fetal and maternal nucleic acid molecules. The diagnosis is made based on the following steps: obtaining sequence information from sequencing at least a portion of the mixture of fetal and maternal nucleic acid molecules derived from a biological test sample (e.g., a maternal plasma sample); computing from the sequencing data a normalized chromosome dose for one or more chromosomes of interest, and/or a normalized segment dose for one or more segments of interest; determining a statistically significant difference between the chromosome dose for the chromosome of interest and/or the segment dose for the segment of interest, respectively, in the test sample and a threshold value established in a plurality of qualified (normal) samples; and providing the prenatal diagnosis based on the statistical difference. As described in step 165 of the method, a diagnosis of normal or affected is made. A "no call" is provided in the event that a diagnosis of normal or affected cannot be made with confidence.
Samples and Sample Processing
Samples
Samples used for determining a CNV (e.g., chromosomal aneuploidies, partial aneuploidies, and the like) can include samples taken from any cell, tissue, or organ in which copy number variations for one or more sequences of interest are to be determined. Desirably, the samples contain nucleic acids that are present in cells and/or nucleic acids that are "cell-free" (e.g., cfDNA).
In certain embodiments, it is advantageous to obtain cell-free nucleic acids, e.g., cell-free DNA (cfDNA). Cell-free nucleic acids, including cell-free DNA, can be obtained by various methods known in the art from biological samples including, but not limited to, plasma, serum, and urine (see, e.g., Fan et al., Proc Natl Acad Sci 105:16266-16271 [2008]; Koide et al., Prenatal Diagnosis 25:604-607 [2005]; Chen et al., Nature Med. 2:1033-1035 [1996]; Lo et al., Lancet 350:485-487 [1997]; Botezatu et al., Clin Chem. 46:1078-1084 [2000]; and Su et al., J Mol. Diagn. 6:101-107 [2004]). To separate cell-free DNA from cells in a sample, various methods can be used, including, but not limited to, fractionation, centrifugation (e.g., density gradient centrifugation), DNA-specific precipitation, or high-throughput cell sorting and/or other separation methods. Commercial kits for manual and automated separation of cfDNA are available (Roche Diagnostics, Indianapolis, IN; Qiagen, Valencia, CA; Macherey-Nagel, Duren, DE). Biological samples comprising cfDNA have been used in assays to determine the presence or absence of chromosomal abnormalities (e.g., trisomy 21) by sequencing assays that can detect chromosomal aneuploidies and/or various polymorphisms.
In various embodiments, the cfDNA present in the sample can be enriched specifically or non-specifically prior to use (e.g., prior to preparing a sequencing library). Non-specific enrichment of sample DNA refers to the whole genome amplification of the genomic DNA fragments of the sample, which can be used to increase the level of sample DNA prior to preparing a cfDNA sequencing library. Non-specific enrichment can be the selective enrichment of one of the two genomes present in a sample that comprises more than one genome. For example, non-specific enrichment can be selective for the fetal genome in a maternal sample, which can be achieved by known methods to increase the proportion of fetal DNA relative to maternal DNA in the sample. Alternatively, non-specific enrichment can be the non-selective amplification of both genomes present in the sample. For example, non-specific amplification can be the amplification of fetal and maternal DNA in a sample comprising a mixture of DNA from the fetal and maternal genomes. Methods for whole genome amplification are known in the art. Degenerate oligonucleotide-primed PCR (DOP), primer extension PCR (PEP), and multiple displacement amplification (MDA) are examples of whole genome amplification methods. In some embodiments, the sample comprising the mixture of cfDNA from different genomes is not enriched for the cfDNA of the genomes present in the mixture. In other embodiments, the sample comprising the mixture of cfDNA from different genomes is non-specifically enriched for any one of the genomes present in the sample.
The sample comprising the nucleic acid(s) to which the methods described herein are applied typically comprises a biological sample ("test sample"), e.g., as described above. In certain embodiments, the nucleic acid(s) to be screened for one or more CNVs are purified or isolated by any of a number of well-known methods.
Accordingly, in certain embodiments the sample comprises or consists of a purified or isolated polynucleotide, or it can comprise samples such as a tissue sample, a biological fluid sample, a cell sample, and the like. Suitable biological fluid samples include, but are not limited to, blood, plasma, serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspension, vaginal flow, trans-cervical lavage, brain fluid, ascites, milk, secretions of the respiratory, intestinal, and genitourinary tracts, amniotic fluid, and leukapheresis samples. In certain embodiments, the sample is one that is easily obtainable by non-invasive procedures, e.g., blood, plasma, serum, sweat, tears, sputum, urine, ear flow, saliva, or feces. In certain embodiments, the sample is a peripheral blood sample, or the plasma and/or serum fractions of a peripheral blood sample. In other embodiments, the biological sample is a swab or smear, a biopsy specimen, or a cell culture. In another embodiment, the sample is a mixture of two or more biological samples, e.g., a biological sample can comprise two or more of a biological fluid sample, a tissue sample, and a cell culture sample. As used herein, the terms "blood," "plasma," and "serum" expressly encompass fractions or processed portions thereof. Similarly, where a sample is taken from a biopsy, swab, smear, etc., the "sample" expressly encompasses a processed fraction or portion derived from the biopsy, swab, smear, etc.
In certain embodiments, samples can be obtained from multiple sources, including, but not limited to: samples from different individuals, samples from different developmental stages of the same or different individuals, samples from different diseased individuals (e.g., individuals with cancer or suspected of having a genetic disorder), samples from normal individuals, samples obtained at different stages of a disease in an individual, samples obtained from an individual subjected to different treatments for a disease, samples from individuals subjected to different environmental factors, samples from individuals with a predisposition to a pathology, samples from individuals exposed to an infectious disease agent (e.g., HIV), and the like.
In one illustrative, but non-limiting, embodiment, the sample is a maternal sample obtained from a pregnant female, for example a pregnant woman. In this instance, the sample can be analyzed using the methods described herein to provide a prenatal diagnosis of potential chromosomal abnormalities in the fetus. The maternal sample can be a tissue sample, a biological fluid sample, or a cell sample. A biological fluid includes, as non-limiting examples: blood, plasma, serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspension, vaginal flow, trans-cervical lavage, brain fluid, ascites, milk, secretions of the respiratory, intestinal, and genitourinary tracts, and leukapheresis samples.
In another illustrative, but non-limiting, embodiment, the maternal sample is a mixture of two or more biological samples, e.g., the biological sample can comprise two or more of a biological fluid sample, a tissue sample, and a cell culture sample. In some embodiments, the sample is one that is easily obtainable by non-invasive procedures, e.g., blood, plasma, serum, sweat, tears, sputum, urine, milk, ear flow, saliva, and feces. In some embodiments, the biological sample is a peripheral blood sample, and/or the plasma or serum fractions thereof. In other embodiments, the biological sample is a swab or smear, a biopsy specimen, or a sample of a cell culture. As disclosed above, the terms "blood," "plasma," and "serum" expressly encompass fractions or processed portions thereof. Similarly, where a sample is taken from a biopsy, swab, smear, etc., the "sample" expressly encompasses a processed fraction or portion derived from the biopsy, swab, smear, etc.
In certain embodiments, samples can also be obtained from in vitro cultured tissues, cells, or other polynucleotide-containing sources. The cultured samples can be taken from multiple sources, including, but not limited to: cultures (e.g., tissue or cells) maintained in different media and conditions (e.g., pH, pressure, or temperature), cultures (e.g., tissue or cells) maintained for different periods of time, cultures (e.g., tissue or cells) treated with different factors or reagents (e.g., a drug candidate or a modulator), or cultures of different types of tissue and/or cells.
Methods of isolating nucleic acids from biological sources are well known and differ depending on the nature of the source. One of ordinary skill in the art can readily isolate the nucleic acid(s) from a source as needed for the methods described herein. In some instances, it can be advantageous to fragment the nucleic acid molecules in the nucleic acid sample. Fragmentation can be random, or it can be specific, as achieved, for example, using restriction endonuclease digestion. Methods for random fragmentation are well known in the art and include, for example, limited DNase digestion, alkali treatment, and physical shearing. In one embodiment, sample nucleic acids are obtained as cfDNA, which is not subjected to fragmentation.
In other illustrative embodiments, the sample nucleic acids are obtained as genomic DNA, which is fragmented into fragments of about 300 or more, about 400 or more, or about 500 or more base pairs, and to which NGS methods can be readily applied.
Sequencing Library Preparation
In one embodiment, the methods described herein can utilize next generation sequencing (NGS) technologies, which allow multiple samples to be sequenced individually as genomic molecules (i.e., singleplex sequencing) or as pooled samples comprising indexed genomic molecules (e.g., multiplex sequencing) in a single sequencing run. These methods can generate up to several hundred million reads of DNA sequences. In various embodiments, the sequences of genomic nucleic acids and/or of indexed genomic nucleic acids can be determined using, for example, the NGS technologies described herein. In various embodiments, analysis of the large amount of sequence data obtained using NGS can be performed using one or more processors as described herein.
In various embodiments, the use of such sequencing technologies does not involve the preparation of sequencing libraries.
However, in certain embodiments, the sequencing methods contemplated herein involve the preparation of sequencing libraries. In one illustrative approach, sequencing library preparation involves the production of a random collection of adaptor-modified DNA fragments (e.g., polynucleotides) that are ready to be sequenced. Sequencing libraries of polynucleotides can be prepared from DNA or RNA, including equivalents and analogs of either DNA or cDNA, for example, DNA or cDNA that is complementary or copy DNA produced from an RNA template by the action of reverse transcriptase. The polynucleotides may originate in double-stranded form (e.g., dsDNA such as genomic DNA fragments, cDNA, PCR amplification products, and the like) or, in certain embodiments, the polynucleotides may have originated in single-stranded form (e.g., ssDNA, RNA, etc.) and been converted to dsDNA form. For example, in certain embodiments, single-stranded mRNA molecules may be copied into double-stranded cDNAs suitable for use in preparing a sequencing library. The precise sequence of the primary polynucleotide molecules is generally not material to the method of library preparation, and may be known or unknown. In one embodiment, the polynucleotide molecules are DNA molecules. More particularly, in certain embodiments, the polynucleotide molecules represent the entire genetic complement of an organism or substantially the entire genetic complement of an organism, and are genomic DNA molecules (e.g., cellular DNA, cell-free DNA (cfDNA), etc.) that typically include both intron sequences and exon sequences (coding sequences), as well as non-coding regulatory sequences such as promoter and enhancer sequences. In certain embodiments, the primary polynucleotide molecules comprise human genomic DNA molecules, e.g., cfDNA molecules present in the peripheral blood of a pregnant subject.
Preparation of sequencing libraries for some NGS sequencing platforms is facilitated by the use of polynucleotides comprising a specific range of fragment sizes. Preparation of such libraries typically involves the fragmentation of large polynucleotides (e.g., cellular genomic DNA) to obtain polynucleotides in the desired size range.
Fragmentation can be achieved by any of a number of methods known to those of ordinary skill in the art. For example, fragmentation can be achieved by mechanical means including, but not limited to, nebulization, sonication, and hydroshear. However, mechanical fragmentation typically cleaves the DNA backbone at C-O, P-O, and C-C bonds, resulting in a heterogeneous mix of blunt and 3'- and 5'-overhanging ends with broken C-O, P-O, and C-C bonds (see, e.g., Alnemri and Liwack, J Biol. Chem 265:17323-17333 [1990]; Richards and Boyer, J Mol Biol 11:327-240 [1965]). These ends may need to be repaired, as they may lack the 5'-phosphate required for the subsequent enzymatic reactions (e.g., ligation of sequencing adaptors) needed to prepare DNA for sequencing.
In contrast, cfDNA typically exists as fragments of less than about 300 base pairs; consequently, fragmentation is typically not necessary for generating a sequencing library from cfDNA samples.
Typically, whether polynucleotides are forcibly fragmented (e.g., fragmented in vitro) or naturally exist as fragments, they are converted to blunt-ended DNA having 5'-phosphates and 3'-hydroxyls. Standard protocols, e.g., protocols for sequencing using the Illumina platform as described elsewhere herein, instruct users to end-repair sample DNA, to purify the end-repaired products prior to dA-tailing, and to purify the dA-tailed products prior to the adaptor-ligating steps of the library preparation.
Various embodiments of the sequencing library preparation methods described herein obviate the need to perform one or more of the steps typically mandated by standard protocols to obtain a modified DNA product that can be sequenced by NGS. An abbreviated method (ABB method), a 1-STEP method, and a 2-STEP method are described below. Consecutive dA-tailing and adaptor ligation is referred to herein as the 2-STEP method. Consecutive dA-tailing, adaptor ligation, and amplification is referred to herein as the 1-STEP method. In various embodiments, the ABB and 2-STEP methods can be performed in solution or on a solid surface. In certain embodiments, the 1-STEP method is performed on a solid surface.
Fig. 2 diagrams a comparison of a standard method (e.g., Illumina) with the abbreviated method (ABB; Example 2), the 2-STEP method, and the 1-STEP method (Examples 3-6) for preparing DNA molecules for sequencing by NGS according to embodiments of the present invention.
Abbreviated Preparation - ABB
In one embodiment, an abbreviated method (ABB method) for preparing a sequencing library is provided that comprises the consecutive steps of end-repairing, dA-tailing, and adaptor-ligating (ABB). In embodiments for preparing sequencing libraries that do not require the dA-tailing step (see, e.g., protocols for sequencing using the Roche 454 and SOLID™ 3 platforms), the steps of end-repairing and adaptor-ligating can exclude the step of purifying the end-repaired products prior to adaptor-ligating.
The method of preparing sequencing libraries comprising the consecutive steps of end-repairing, dA-tailing, and adaptor-ligating is referred to herein as the abbreviated method (ABB), and was shown to generate sequencing libraries of unexpectedly improved quality while expediting the analysis of samples (see, e.g., Example 2). According to some embodiments of the method, the ABB method can be performed in solution, as exemplified herein. The ABB method can also be performed on a solid surface by first end-repairing and dA-tailing the DNA in solution, and subsequently binding it to a solid surface as described elsewhere herein for the 1-STEP or 2-STEP preparation on a solid surface. The three enzymatic steps, including the step of ligating the adaptors to the dA-tailed DNA, are all performed in the absence of polyethylene glycol. Published protocols for performing ligation reactions, including ligating adaptors to DNA, instruct users to perform ligations in the presence of polyethylene glycol. Applicants determined that ligation of the adaptors to the dA-tailed DNA can be performed in the absence of polyethylene glycol.
In another embodiment, the sequencing library is prepared without end-repairing the cfDNA prior to the dA-tailing step. Applicants have determined that cfDNA, which does not need to be fragmented, also need not be end-repaired, and the preparation of cfDNA sequencing libraries according to embodiments of the present invention excludes the end-repair step and the purification steps, thereby combining enzymatic reactions and further streamlining the preparation of the DNA to be sequenced. cfDNA exists as a mixture of blunt and 3'- and 5'-overhanging ends that are generated in vivo by the action of nucleases, which cleave cellular genomic DNA into cfDNA fragments having termini with a 5'-phosphate and a 3'-hydroxyl group. Elimination of the end-repair step selects for cfDNA molecules that naturally occur as blunt-ended molecules, and for cfDNA molecules that naturally have 5'-overhanging ends, which are filled in by the polymerase activity of the enzyme (e.g., Klenow Exo-) used to attach one or more deoxynucleotides to the 3'-OH as described below (dA-tailing). Elimination of the end-repair step of cfDNA selects against cfDNA molecules that have a 3'-overhanging end (3'-OH). Unexpectedly, exclusion of these 3'-OH cfDNA molecules from the sequencing library does not affect the representation of genomic sequences in the library, demonstrating that the end-repair step for cfDNA molecules may be excluded from the preparation of the sequencing library (see Examples). In addition to cfDNA, other types of unrepaired polynucleotides that can be used for preparing sequencing libraries include DNA molecules resulting from reverse transcription of RNA molecules (e.g., mRNA, siRNA, sRNA) and unrepaired DNA molecules that are amplicons of DNA synthesized from phosphorylated primers. When unphosphorylated primers are used, the DNA reverse transcribed from RNA and/or the DNA amplified from DNA templates (i.e., DNA amplicons) can also be phosphorylated after synthesis by a polynucleotide kinase.
In another embodiment, unrepaired DNA is used for preparing a sequencing library according to the 2-STEP method, wherein end-repair of the DNA is excluded and the unrepaired DNA is subjected to the two consecutive steps of dA-tailing and adaptor-ligating (see Fig. 2). The 2-STEP method can be performed in solution or on a solid surface. When performed in solution, the 2-STEP method comprises utilizing DNA obtained from a biological sample, excluding the step of end-repairing the DNA, and adding a single deoxynucleotide (e.g., deoxyadenosine (A)) to the 3'-ends of the polynucleotides in the sample of unrepaired DNA, for example, by the activity of certain types of DNA polymerase such as Taq polymerase or Klenow Exo- polymerase. The dA-tailed products, which are compatible with the 'T' overhang present on the 3' terminus of each duplex region of commercially available adaptors, are ligated to the adaptors in a subsequent consecutive step. dA-tailing prevents self-ligation of blunt-ended polynucleotides, favoring the formation of adaptor-ligated sequences. Thus, in some embodiments, unrepaired cfDNA is subjected to the consecutive steps of dA-tailing and adaptor-ligating, wherein the dA-tailed DNA is prepared from unrepaired DNA and is not subjected to a purification step following the dA-tailing reaction. Double-stranded adaptors can be ligated to both ends of the dA-tailed DNA. A set of adaptors having the same sequence, or a set of two different adaptors, can be utilized. In various embodiments, one or more different sets of the same or different adaptors can also be used. Adaptors can comprise index sequences to enable multiplexed sequencing of the library DNA. Ligation of adaptors to the dA-tailed DNA can, optionally, be performed in the absence of polyethylene glycol.
2-STEP - Preparation in Solution
In various embodiments, when the 2-STEP method is performed in solution, the products of the adaptor ligation reaction can be purified to remove unligated adaptors and adaptors that may have ligated to one another. The purification can also select a size range of templates for cluster generation, which can optionally be preceded by an amplification, e.g., a PCR amplification. The ligation products can be purified by any of a number of methods including, but not limited to, gel electrophoresis, solid-phase reversible immobilization (SPRI), and the like. In some embodiments, the purified adaptor-ligated DNA is subjected to an amplification (e.g., PCR amplification) prior to sequencing. Some sequencing platforms require that the library DNA be subjected to a further amplification. For example, the Illumina platform requires that a cluster amplification of library DNA be performed as an integral part of the sequencing according to the Illumina technology. In other embodiments, the purified adaptor-ligated DNA is denatured and the single-stranded DNA molecules are attached to the flow cell of the sequencer. Accordingly, in certain embodiments, the method for preparing a sequencing library in solution from unrepaired DNA for NGS sequencing comprises obtaining DNA molecules from a sample, and subjecting the unrepaired DNA molecules obtained from the sample to the consecutive steps of dA-tailing and adaptor-ligating.
As indicated above, in different embodiments these library preparation methods are incorporated into methods for determining copy number variations (CNV) such as aneuploidies. Thus, in one illustrative embodiment, a method is provided for determining the presence or absence of one or more fetal chromosomal aneuploidies, the method including: (a) obtaining a maternal sample comprising a mixture of fetal and maternal cell-free DNA; (b) isolating the mixture of fetal and maternal cfDNA from the sample; (c) preparing a sequencing library from the mixture of fetal and maternal cfDNA, wherein preparing the library includes the consecutive steps of dA-tailing and adaptor-ligating the cfDNA, wherein preparing the library does not include end-repairing the cfDNA, and wherein the preparation is performed in solution; (d) massively parallel sequencing at least a portion of the sequencing library to obtain sequence information for the fetal and maternal cfDNA in the sample; (e) storing the sequence information, at least temporarily, in a computer-readable medium; (f) using the stored sequence information to computationally identify a number of sequence tags for each of one or more chromosomes of interest and a number of sequence tags for a normalizing sequence for each of the one or more chromosomes of interest; (g) computationally calculating, using the number of sequence tags for each of the one or more chromosomes of interest and the number of sequence tags for the normalizing sequence for each of the one or more chromosomes of interest, a chromosome dose for each of the one or more chromosomes of interest; and (h) comparing the chromosome dose for each of the one or more chromosomes of interest to a corresponding threshold for each of the one or more chromosomes of interest, thereby determining the presence or absence of fetal chromosomal aneuploidy in the sample, wherein steps (e)-(h) are performed using one or more processors. The method is exemplified in Examples 3 and 4.
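Steps (g) and (h) can be illustrated with a minimal sketch. It assumes that the chromosome dose is the ratio of the tag count for the chromosome of interest to the tag count for its normalizing sequence, and that a dose above the threshold is the condition flagged; the function names, the tag counts and the threshold value are hypothetical and chosen only for illustration.

```python
from typing import Dict


def chromosome_dose(tag_counts: Dict[str, int], chromosome: str, normalizing: str) -> float:
    """Return tags(chromosome of interest) / tags(normalizing sequence).

    The ratio form is an assumption of this sketch.
    """
    normalizing_tags = tag_counts[normalizing]
    if normalizing_tags == 0:
        raise ValueError("normalizing sequence has no mapped tags")
    return tag_counts[chromosome] / normalizing_tags


def exceeds_threshold(dose: float, threshold: float) -> bool:
    """Step (h): compare a chromosome dose with its corresponding threshold."""
    return dose > threshold


# Made-up tag counts and threshold, for illustration only.
tag_counts = {"chr21": 1_300, "chr9": 100_000}
dose_chr21 = chromosome_dose(tag_counts, "chr21", "chr9")
flagged = exceeds_threshold(dose_chr21, threshold=0.0125)
```

In practice a separate normalizing sequence and a separate threshold would be supplied for each chromosome of interest, as the method statement requires.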
Two-step and one-step preparation in solid phase
In certain embodiments, the sequencing library is prepared on a solid surface according to the two-step method described above for library preparation in solution. Preparing a sequencing library on a solid surface according to the two-step method includes obtaining DNA molecules, e.g. cfDNA, from a sample, and performing the consecutive steps of dA-tailing and adaptor ligation, wherein the adaptor ligation is performed on a solid surface. Repaired or unrepaired DNA can be used. In certain embodiments, the adaptor-ligated products are detached from the solid surface, purified, and amplified before sequencing. In other embodiments, the adaptor-ligated products are detached from the solid surface and purified, and are not amplified before sequencing. In yet other embodiments, the adaptor-ligated products are amplified, detached from the solid surface, and purified. In certain embodiments, the purified products are amplified. In other embodiments, the purified products are not amplified. The sequencing protocol may include an amplification, e.g. cluster amplification. In different embodiments, the detached adaptor-ligated products are purified before amplification and/or sequencing.
In certain embodiments, the sequencing library is prepared on a solid surface according to the one-step method. In different embodiments, preparing a sequencing library on a solid surface according to the one-step method includes obtaining DNA molecules, e.g. cfDNA, from a sample, and performing the consecutive steps of dA-tailing, adaptor ligation and amplification, wherein the adaptor ligation is performed on a solid surface. The adaptor-ligated products need not be detached before purification.
Fig. 3 depicts the two-step and one-step methods for preparing a sequencing library on a solid surface. Repaired or unrepaired DNA can be used to prepare a sequencing library on a solid surface. In certain embodiments, unrepaired DNA is used. Examples of unrepaired DNA that can be used to prepare a sequencing library on a solid surface include, but are not limited to, cfDNA, DNA reverse-transcribed from RNA using phosphorylated primers, and DNA amplified from a DNA template using phosphorylated primers (i.e. phosphorylated DNA amplicons). Examples of repaired DNA that can be used to prepare a sequencing library on a solid surface include, but are not limited to, cfDNA and fragmented genomic DNA that has been blunt-ended and phosphorylated (for example, phosphorylated cDNA produced by repairing the products of reverse transcription of RNA such as mRNA, sRNA and siRNA). In some illustrative embodiments, unrepaired cfDNA obtained from a maternal sample is used to prepare the sequencing library.
Preparing a sequencing library on a solid surface includes coating the solid surface with the first part of a two-part conjugate, modifying a first adaptor by attaching the second part of the two-part conjugate to it, and immobilizing the adaptor on the solid surface through the binding interaction of the first and second parts of the conjugate. For example, preparing a sequencing library on a solid surface may include attaching a polypeptide, polynucleotide or small molecule to the end of a library adaptor, where that polypeptide, polynucleotide or small molecule can form a binding complex with a polypeptide, polynucleotide or small molecule immobilized on the solid surface. Solid surfaces that can be used to immobilize polypeptides, polynucleotides or small molecules include, but are not limited to, plastic, paper, membranes, filters, chips, pins or slides, silica or polymer beads (e.g. polypropylene, polystyrene, polycarbonate), 2D or 3D molecular scaffolds, or any support for solid-phase synthesis of polypeptides or polynucleotides.
The bonds between polypeptide-polypeptide, polypeptide-polynucleotide, polypeptide-small molecule and polynucleotide-polynucleotide conjugates can be covalent or non-covalent. Preferably, the binding complex is held together by non-covalent bonds. For example, conjugates that can be used to prepare a sequencing library on a solid surface include, but are not limited to, streptavidin-biotin conjugates, antibody-antigen conjugates and ligand-receptor conjugates. Examples of polypeptide-polynucleotide conjugates that can be used to prepare a sequencing library on a solid surface include, but are not limited to, DNA-binding protein-DNA conjugates. Examples of polynucleotide-polynucleotide conjugates that can be used to prepare a sequencing library on a solid surface include, but are not limited to, oligodT-oligoA and oligodT-oligodA. Examples of polypeptide-small molecule and polynucleotide-small molecule conjugates include streptavidin-biotin.
According to the embodiments of the solid-surface methods (one-step and two-step) shown in Fig. 3, the solid surface of the vessel used for preparing the sequencing library (e.g. a polypropylene PCR tube or a 96-well plate) is coated with a polypeptide such as streptavidin. The ends of a first set of adaptors are modified by attaching a small molecule such as biotin, and the biotinylated adaptors are bound to the streptavidin on the solid surface (1). Next, unrepaired or repaired DNA is ligated to the streptavidin-bound biotinylated adaptors and is thereby immobilized on the solid surface (2). A second set of adaptors is ligated to the immobilized DNA (3).
Two-step preparation in solid phase
In one embodiment, the two-step method is performed using unrepaired DNA, e.g. cfDNA, to prepare the sequencing library on a solid surface. The unrepaired DNA is dA-tailed by attaching a single nucleotide base, e.g. dA, to the 3' ends of the strands of the unrepaired DNA. Optionally, multiple nucleotide bases can be attached to the unrepaired DNA. The mixture comprising the dA-tailed DNA is added to the adaptors immobilized on the solid surface, to which the DNA is ligated. The steps of dA-tailing and adaptor-ligating the DNA are consecutive, i.e. no purification of the dA-tailed product is performed (as shown for the two-step method in Fig. 2). As described above, the adaptors can have overhangs complementary to the overhangs on the unrepaired DNA molecules. Subsequently, a second set of adaptors is added to the DNA-biotinylated adaptor complex to provide the adaptor-ligated DNA library. Optionally, the library is prepared using repaired DNA. The repaired DNA can be genomic DNA that has been fragmented and subjected to enzymatic repair of its 3' and 5' ends. In one embodiment, the DNA, e.g. maternal cfDNA, is end-repaired, dA-tailed and ligated to adaptors immobilized on the solid surface in consecutive steps of end repair, dA-tailing and adaptor ligation, as described for the abbreviated method performed in solution.
In some embodiments that use the two-step method, the adaptor-ligated DNA is detached from the solid surface by chemical or physical means (e.g. heat, UV light) (4a in Fig. 2), purified (5 in Fig. 2), and optionally amplified in solution before the sequencing process is started. In other embodiments, the adaptor-ligated DNA is not amplified. When no amplification is performed, the adaptors ligated to the DNA can be constructed to include sequences that hybridize to the oligonucleotides present on the flow cell of the sequencer (Kozarewa et al., Nat Methods 6:291-295 [2009]), which avoids the amplification that would otherwise introduce the sequences needed to hybridize the library DNA to the flow cell. The adaptor-ligated DNA library is subjected to massively parallel sequencing (6 in Fig. 2) as described for adaptor-ligated DNA generated in solution. In some embodiments, the sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing is massively parallel sequencing using sequencing-by-ligation. The sequencing technology may include a solid-phase amplification, e.g. cluster amplification, as described elsewhere herein.
Thus, in different embodiments, a method for preparing a sequencing library on a solid surface from unrepaired DNA for NGS may include obtaining DNA molecules from a sample, and subjecting the unrepaired DNA molecules to the consecutive steps of dA-tailing and adaptor ligation, wherein the adaptor ligation is performed in solid phase. In certain embodiments, the adaptors may include index sequences that allow multiplexed sequencing of multiple samples in a single reaction vessel (e.g. one lane of a flow cell). As described above, the DNA molecules can be cfDNA molecules, DNA molecules transcribed from RNA, amplicons of DNA molecules, and the like.
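The role of index sequences in multiplexed sequencing can be sketched as follows. The sketch assumes that each read arrives as a pair of (index sequence, insert sequence) and that indexes must match exactly; the index sequences, sample names and the `undetermined` bin are hypothetical and are not taken from any particular platform.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def demultiplex(
    reads: Iterable[Tuple[str, str]], index_to_sample: Dict[str, str]
) -> Dict[str, List[str]]:
    """Group insert reads by the sample that owns their index sequence."""
    by_sample: Dict[str, List[str]] = defaultdict(list)
    for index_seq, insert_seq in reads:
        # Reads whose index matches no sample are kept in a separate bin.
        sample = index_to_sample.get(index_seq, "undetermined")
        by_sample[sample].append(insert_seq)
    return dict(by_sample)


# Hypothetical indexes and reads pooled in one lane of a flow cell.
index_to_sample = {"ACGTAC": "sample_1", "GGCATT": "sample_2"}
lane_reads = [
    ("ACGTAC", "TTGACC"),
    ("GGCATT", "AACCGG"),
    ("ACGTAC", "CCATGA"),
    ("NNNNNN", "GATTAC"),
]
reads_by_sample = demultiplex(lane_reads, index_to_sample)
```

Exact matching keeps the sketch short; tolerating sequencing errors in the index would require a mismatch-aware lookup.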
As indicated above, in different embodiments these library preparation methods are incorporated into methods for determining copy number variations (CNV) such as aneuploidies. Thus, in certain embodiments, the method for preparing a sequencing library on a solid surface from unrepaired cfDNA is incorporated into a method for analyzing a maternal sample to determine the presence or absence of a fetal chromosomal aneuploidy. Thus, in one embodiment, a method is provided for determining the presence or absence of one or more fetal chromosomal aneuploidies, the method including: (a) obtaining a maternal sample comprising a mixture of fetal and maternal cell-free DNA; (b) isolating the mixture of fetal and maternal cfDNA from the sample; (c) preparing a sequencing library from the mixture of fetal and maternal cfDNA, wherein preparing the library includes the consecutive steps of dA-tailing and adaptor-ligating the cfDNA, wherein preparing the library does not include end-repairing the cfDNA, and wherein the preparation is performed on a solid surface; (d) massively parallel sequencing at least a portion of the sequencing library to obtain sequence information for the fetal and maternal cfDNA in the sample; (e) storing the sequence information, at least temporarily, in a computer-readable medium; (f) using the stored sequence information to computationally identify a number of sequence tags for each of one or more chromosomes of interest and a number of sequence tags for a normalizing sequence for each of the one or more chromosomes of interest; (g) computationally calculating, using the number of sequence tags for each of the one or more chromosomes of interest and the number of sequence tags for the normalizing sequence for each of the one or more chromosomes of interest, a chromosome dose for each of the one or more chromosomes of interest; and (h) comparing the chromosome dose for each of the one or more chromosomes of interest to a corresponding threshold for each of the one or more chromosomes of interest, thereby determining the presence or absence of fetal chromosomal aneuploidy in the sample, wherein steps (e)-(h) are performed using one or more processors. The sample can be a biological fluid sample, e.g. plasma, serum, urine or saliva. In certain embodiments, the sample is a maternal blood sample, or the plasma or serum fraction thereof. The method is exemplified in Example 4.
One-step preparation in solid phase
In another embodiment, unrepaired DNA is dA-tailed, but the dA-tailed product is not purified before amplification, so that the steps of dA-tailing, adaptor ligation and amplification are performed consecutively. Consecutive dA-tailing, adaptor ligation and amplification, followed by purification before sequencing, is referred to herein as the one-step process. The one-step method can be performed on a solid surface (see, e.g., Fig. 3). The steps of attaching a first set of adaptors to the solid surface (1), ligating unrepaired, dA-tailed DNA to the surface-bound adaptors (2), and ligating a second set of adaptors to the surface-bound DNA (3) can be performed as described above for the two-step method. In the one-step method, however, the surface-bound adaptor-ligated DNA can be amplified while it is attached to the solid surface (4b in Fig. 2). Subsequently, the resulting library of adaptor-ligated DNA generated on the solid surface is detached and purified (5 in Fig. 2), and then subjected to massively parallel sequencing as described for adaptor-ligated DNA generated in solution. In certain embodiments, the sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing is massively parallel sequencing using sequencing-by-ligation.
Thus, in certain embodiments, a method is provided for preparing a sequencing library for NGS sequencing, the method including: obtaining DNA molecules from a sample; and subjecting the DNA molecules to the consecutive steps of dA-tailing, adaptor ligation and amplification, wherein the adaptor ligation is performed on a solid surface. As described for the two-step method, in different embodiments the adaptors may include index sequences that allow multiplexed sequencing of multiple samples in a single reaction vessel (e.g. one lane of a flow cell).
In certain embodiments, the DNA can be repaired. The DNA molecules can be cfDNA molecules, DNA molecules transcribed from RNA, or amplicons of DNA molecules. Adaptor ligation is performed as described above. Excess unligated adaptors can be washed away from the immobilized adaptor-ligated DNA; the reagents needed for amplification are then added to the immobilized adaptor-ligated DNA, which is subjected to repeated rounds of amplification, e.g. PCR amplification, as is known in the art. In other embodiments, the adaptor-ligated DNA is not amplified. When no amplification is performed, the adaptor-ligated DNA can be removed from the solid surface by chemical or physical means (e.g. heat, UV light), and the adaptors ligated to the DNA may include sequences that hybridize to the oligonucleotides present on the flow cell of the sequencer (Kozarewa et al., Nat Methods 6:291-295 [2009]).
In different embodiments, the sample can be a biological fluid sample (e.g. blood, plasma, serum, urine, cerebrospinal fluid, amniotic fluid, saliva). In certain embodiments, a method for analyzing a maternal sample to determine the presence or absence of a fetal chromosomal aneuploidy includes, as one step, this method for preparing a sequencing library on a solid surface from unrepaired cfDNA.
Thus, in one embodiment, a method is provided for determining the presence or absence of one or more fetal chromosomal aneuploidies, the method including: (a) obtaining a maternal sample comprising a mixture of fetal and maternal cell-free DNA; (b) isolating the mixture of fetal and maternal cfDNA from the sample; (c) preparing a sequencing library from the mixture of fetal and maternal cfDNA, wherein preparing the library includes the consecutive steps of dA-tailing, adaptor-ligating and amplifying the cfDNA, and wherein the preparation is performed on a solid surface; (d) massively parallel sequencing at least a portion of the sequencing library to obtain sequence information for the fetal and maternal cfDNA in the sample; (e) storing the sequence information, at least temporarily, in a computer-readable medium; (f) using the stored sequence information to computationally identify a number of sequence tags for each of one or more chromosomes of interest and a number of sequence tags for a normalizing sequence for each of the one or more chromosomes of interest; (g) computationally calculating, using the number of sequence tags for each of the one or more chromosomes of interest and the number of sequence tags for the normalizing sequence for each of the one or more chromosomes of interest, a chromosome dose for each of the one or more chromosomes of interest; and (h) comparing the chromosome dose for each of the one or more chromosomes of interest to a corresponding threshold for each of the one or more chromosomes of interest, thereby determining the presence or absence of fetal chromosomal aneuploidy in the sample, wherein steps (e)-(h) are performed using one or more processors. In certain embodiments, the DNA is end-repaired. In other embodiments, preparing the library does not include end-repairing the cfDNA. The method is exemplified in Examples 5 and 6.
The processes for preparing sequencing libraries described above are applicable to sample analysis methods including, but not limited to, methods for determining copy number variations (CNV), and methods for determining the presence or absence of polymorphisms in any sequence of interest, both in samples comprising a single genome and in samples comprising a mixture of at least two genomes that are known or suspected to differ in one or more sequences of interest.
Amplification of the adaptor-ligated products prepared in solid phase or in solution may be required to introduce into the adaptor-ligated template molecules the oligonucleotide sequences needed for hybridization to the flow cell or other surface present in some NGS platforms. The contents of an amplification reaction are known to one of ordinary skill in the art and include appropriate substrates (e.g. dNTPs), enzymes (e.g. a DNA polymerase) and the buffer components required for the amplification reaction. Optionally, amplification of the adaptor-ligated polynucleotides can be omitted. In general, an amplification reaction requires at least two amplification primers, e.g. primer oligonucleotides, which can be identical or different and which may include an "adaptor-specific portion" capable of annealing, during the annealing step, to a primer-binding sequence in the polynucleotide molecule to be amplified (or in its complement, if the template is regarded as a single strand).
Once formed, the library of templates prepared according to the methods described above can be used for the solid-phase nucleic acid amplification that some NGS platforms require. As used herein, the term "solid-phase amplification" refers to any nucleic acid amplification reaction carried out on, or in association with, a solid support such that all or part of the amplified products are immobilized on the solid support as they are formed. In specific embodiments, the term covers solid-phase polymerase chain reaction (solid-phase PCR) and solid-phase isothermal amplification, which are reactions analogous to standard solution-phase amplification except that one or both of the forward and reverse amplification primers are immobilized on the solid support. Solid-phase PCR also covers systems such as emulsions, in which one primer is anchored to a bead and the other is in free solution, and colony formation in solid-phase gel matrices, in which one primer is anchored to the surface and the other is in free solution.
In different embodiments, after amplification the sequencing library can be analyzed by microfluidic capillary electrophoresis to ensure that the library is free of adaptor dimers and single-stranded DNA. The library of template polynucleotide molecules is particularly suitable for solid-phase sequencing methods. In addition to providing templates for solid-phase sequencing and solid-phase PCR, library templates also provide templates for whole-genome amplification.
Marker nucleic acids for tracking and verifying sample integrity
In different embodiments, sample integrity can be verified and samples can be tracked by sequencing a mixture of the sample genomic nucleic acids (e.g. cfDNA) and accompanying marker nucleic acids that have been introduced into the sample, e.g. before processing.
Marker nucleic acids can be combined with the test sample (e.g. a biological source sample) and subjected to processes that include, for example, one or more of the following steps: fractionating the biological source sample, e.g. obtaining an essentially cell-free plasma fraction from a whole blood sample; purifying nucleic acids from a fractionated biological source sample (e.g. plasma) or from an unfractionated one (e.g. a tissue sample); and sequencing. In certain embodiments, sequencing includes preparing a sequencing library. The sequence or combination of sequences of the marker molecules combined with a source sample is chosen to be unique to that source sample. In some embodiments, the unique marker molecules in a sample all have the same sequence. In other embodiments, the unique marker molecules in a sample are a plurality of sequences, e.g. a combination of two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty or more different sequences.
In one embodiment, the integrity of a sample can be verified using a plurality of marker nucleic acid molecules having the same sequence. Alternatively, the identity of a sample can be verified using a plurality of marker nucleic acid molecules having at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 25, at least 30, at least 35, at least 40, at least 50 or more different sequences. Verifying the integrity of a plurality of biological samples, i.e. two or more biological samples, requires that each of the two or more samples be marked with marker nucleic acids whose sequences are unique to each of the plurality of test samples being marked. For example, a first sample can be marked with a marker nucleic acid having sequence A, and a second sample with a marker nucleic acid having sequence B. Alternatively, a first sample can be marked with marker nucleic acid molecules all having sequence A, and a second sample with a mixture of sequences B and C, where sequences A, B and C are marker molecules having different sequences.
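The A versus B-and-C marking scheme can be sketched as a simple consistency check. The sketch assumes that a marker is detected when its full sequence occurs in at least one read and that a sample passes only when the observed marker set equals the expected set; the short marker sequences, the reads and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def markers_observed(reads: Iterable[str], marker_sequences: Dict[str, str]) -> Set[str]:
    """Return the names of all markers whose sequence occurs in at least one read."""
    read_list = list(reads)
    return {
        name
        for name, seq in marker_sequences.items()
        if any(seq in read for read in read_list)
    }


def sample_is_consistent(
    reads: Iterable[str], expected_markers: Iterable[str], marker_sequences: Dict[str, str]
) -> bool:
    """True only if exactly the expected markers, and no others, are observed."""
    return markers_observed(reads, marker_sequences) == set(expected_markers)


# Hypothetical marker sequences, shortened for readability.
marker_sequences = {"A": "ACGTTGCA", "B": "GGATCCAA", "C": "TTCAGGTC"}
sample_1_reads = ["CCACGTTGCATT", "GATTACAGATTA"]   # marked with A
sample_2_reads = ["AAGGATCCAAGG", "CCTTCAGGTCAA"]   # marked with B and C

sample_1_ok = sample_is_consistent(sample_1_reads, {"A"}, marker_sequences)
sample_2_ok = sample_is_consistent(sample_2_reads, {"B", "C"}, marker_sequences)
swap_detected = not sample_is_consistent(sample_2_reads, {"A"}, marker_sequences)
```

An unexpected marker, or a missing expected one, indicates a possible sample mix-up or cross-contamination under these assumptions.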
Marker nucleic acids can be added to the sample at any stage of sample preparation that occurs before library preparation (if a library is to be prepared) and sequencing. In one embodiment, marker molecules can be combined with an unprocessed source sample. For example, the marker nucleic acid can be provided in the collection tube used to collect a blood sample. Alternatively, the marker nucleic acid can be added to the blood sample after the blood draw. In one embodiment, the marker nucleic acid is added to the vessel used to collect a biological fluid sample, e.g. to the blood collection tube used to collect a blood sample. In another embodiment, the marker nucleic acid is added to a fraction of the biological fluid sample, e.g. to the plasma and/or serum fraction of a blood sample (such as a maternal plasma sample). In yet another embodiment, marker molecules are added to a purified sample, e.g. a sample of nucleic acids purified from a biological sample; for example, the marker nucleic acid is added to a sample of purified maternal and fetal cfDNA. Likewise, marker nucleic acids can be added to a biopsy specimen before the specimen is processed. In some embodiments, marker nucleic acids can be combined with a carrier that delivers the marker molecules into the cells of the biological sample. Cell-delivery carriers include pH-sensitive liposomes and cationic liposomes.
In different embodiments, marker molecules have antigenomic sequences, i.e. sequences that are absent from the genome of the biological source sample. In an exemplary embodiment, the marker molecules used to verify the integrity of a human biological source sample have sequences that are absent from the human genome. In an alternative embodiment, marker molecules have sequences that are absent from the source sample and from any one or more other known genomes. For example, marker molecules used to verify the integrity of a human biological source sample have sequences that are absent from both the human genome and the mouse genome. This alternative allows the integrity of a test sample comprising two or more genomes to be verified. For example, the integrity of a human cell-free DNA sample obtained from a subject affected by a pathogen (e.g. a bacterium) can be verified using marker molecules whose sequences are absent from both the human genome and the genome of the affecting bacterium. Genome sequences of many pathogens (e.g. bacteria, viruses, yeasts, fungi, protozoa) are publicly available on the World Wide Web at ncbi.nlm.nih.gov/genomes. In another embodiment, marker molecules are nucleic acids having sequences that are absent from any known genome. The sequences of marker molecules can be randomly generated by an algorithm.
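Random generation of an antigenomic marker can be sketched as rejection sampling. The sketch assumes that "absent from a genome" means that neither the candidate nor its reverse complement occurs as an exact substring of any supplied genome string, which is far simpler than real read mapping; the toy genomes, the seed and the function names are hypothetical.

```python
import random
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_absent(candidate: str, genomes: Iterable[str]) -> bool:
    """True if neither strand of the candidate occurs in any genome string."""
    rc = reverse_complement(candidate)
    return all(candidate not in g and rc not in g for g in genomes)


def generate_marker(length: int, genomes: Iterable[str], seed: int = 0, max_tries: int = 1000) -> str:
    """Draw random sequences until one is absent from every genome."""
    genome_list = list(genomes)
    rng = random.Random(seed)
    for _ in range(max_tries):
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if is_absent(candidate, genome_list):
            return candidate
    raise RuntimeError("no absent sequence found within max_tries")


# Toy stand-ins for reference genomes, for illustration only.
toy_genomes = ["ACGT" * 50, "TTACGAA"]
marker = generate_marker(30, toy_genomes, seed=1)
```

A production check would search indexed reference genomes and allow for mismatches rather than rely on exact substring tests.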
In different embodiments, marker molecules can be naturally occurring deoxyribonucleic acids (DNA), ribonucleic acids, or artificial nucleic acid analogs (nucleic acid mimics), including peptide nucleic acids (PNA), morpholino nucleic acids, locked nucleic acids, glycol nucleic acids and threose nucleic acids, which are distinguished from naturally occurring DNA or RNA by changes to the backbone of the molecule, or DNA mimics that lack a phosphodiester backbone. The deoxyribonucleic acids can come from naturally occurring genomes or can be generated in the laboratory using enzymes or by solid-phase chemical synthesis. Chemical methods can also be used to generate DNA mimics that are not found in nature. Available DNA derivatives in which the phosphodiester linkage is replaced but the deoxyribose is retained include, but are not limited to, DNA mimics with backbones formed by thioformacetal or carboxamide linkages, which have been shown to be good structural DNA mimics. Other DNA mimics include morpholino derivatives and the peptide nucleic acids (PNA), which contain an N-(2-aminoethyl)glycine-based pseudopeptide backbone (Ann Rev Biophys Biomol Struct 24:167-183 [1995]). PNA is an extremely good structural mimic of DNA (or of ribonucleic acid [RNA]); PNA oligomers are able to form very stable duplex structures with Watson-Crick complementary DNA and RNA (or PNA) oligomers, and they can also bind to targets in duplex DNA by helix invasion (Mol Biotechnol 26:233-248 [2004]). Another good structural mimic/analog of DNA that can be used as a marker molecule is phosphorothioate DNA, in which one of the non-bridging oxygens is replaced by a sulfur. This modification reduces the action of endonucleases and exonucleases, including 5' to 3' and 3' to 5' DNA POL 1 exonuclease, nucleases S1 and P1, RNases, serum nucleases and snake venom phosphodiesterase.
The length of the marker molecules can be distinct from, or similar to, that of the sample nucleic acids, i.e. the length of the marker molecules can be similar to that of the sample genomic molecules, or it can be greater or smaller. The length of a marker molecule is measured by the number of nucleotide or nucleotide analog bases that constitute it. Marker molecules whose lengths differ from those of the sample genomic molecules can be distinguished from source nucleic acids using separation methods known in the art. For example, differences in the length of marker and sample nucleic acid molecules can be determined by electrophoretic separation, e.g. capillary electrophoresis. Size differentiation can be advantageous for quantifying and assessing the quality of the marker and sample nucleic acids. Preferably, the marker nucleic acids are shorter than the genomic nucleic acids, and long enough to exclude them from being mapped to the genome of the sample. For example, a human sequence of 30 bases is needed to map uniquely to the human genome. Thus, in certain embodiments, marker molecules used in sequencing bioassays of human samples should be at least 30bp long.
The choice of marker molecule length is determined primarily by the sequencing technology that is used to verify the integrity of the source sample. The length of the sample genomic nucleic acids being sequenced can also be considered. For example, some sequencing technologies employ clonal amplification of polynucleotides, which can require that the genomic polynucleotides to be clonally amplified have a minimum length. For example, sequencing using the Illumina GAII sequence analyzer includes in vitro clonal amplification by bridge PCR (also known as cluster amplification) of polynucleotides that have a minimum length of 110 bp, to which adaptors are ligated to provide a nucleic acid of at least 200 bp and less than 600 bp that can be clonally amplified and sequenced. In certain embodiments, the length of the adaptor-ligated marker molecule is between about 200 bp and about 600 bp, between about 250 bp and 550 bp, between about 300 bp and 500 bp, or between about 350 bp and 450 bp. In other embodiments, the length of the adaptor-ligated marker molecule is about 200 bp. For example, when sequencing fetal cfDNA present in a maternal sample, the length of the marker molecule can be chosen to be similar to that of fetal cfDNA molecules. Thus, in one embodiment, the length of the marker molecule used in an assay that comprises massively parallel sequencing of cfDNA in a maternal sample to determine the presence or absence of a fetal chromosomal aneuploidy can be about 150 bp, about 160 bp, about 170 bp, about 180 bp, about 190 bp or about 200 bp; preferably, the marker molecule is about 170 bp. Other sequencing approaches, e.g. SOLiD sequencing, Polony Sequencing and 454 sequencing, use emulsion PCR to clonally amplify DNA molecules for sequencing, and each technology dictates the minimum and the maximum length of the molecules that are to be amplified. The length of marker molecules to be sequenced as clonally amplified nucleic acids can be up to about 600 bp. In certain embodiments, the length of marker molecules to be sequenced can be greater than 600 bp.
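By way of illustration only, the length window described above for clonal amplification can be expressed as a simple check. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the length added by ligation (90 bp in total here) is an assumed value chosen so that a 110 bp polynucleotide reaches the 200 bp lower bound; only the bounds of at least 200 bp and less than 600 bp come from the passage above.

```python
def ligated_length_in_window(marker_bp, ligated_extra_bp=90, min_bp=200, max_bp=600):
    """Return True if a marker molecule, after ligation, falls inside the
    clonal-amplification window: at least min_bp and less than max_bp.

    ligated_extra_bp is an assumed total length added by ligation; it is
    not a value given in the text.
    """
    ligated_bp = marker_bp + ligated_extra_bp
    return min_bp <= ligated_bp < max_bp


# A 170 bp marker (similar in length to fetal cfDNA) fits the window.
print(ligated_length_in_window(170))
```

The bounds are parameters so that the same check can be reused for a platform with different minimum and maximum template lengths.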
Single-molecule sequencing technologies, which do not employ clonal amplification of molecules and are capable of sequencing nucleic acids over a very broad range of template lengths, in most situations do not require that the molecules to be sequenced be of any specific length. However, the yield of sequences per unit mass depends on the number of 3' end hydroxyl groups, and thus having relatively short templates for sequencing is more efficient than having long templates. If starting with nucleic acids longer than 1000 nt, it is generally advisable to shear the nucleic acids to an average length of 100 to 200 nt so that more sequence information can be generated from the same mass of nucleic acids. Thus, the length of the marker molecule can range from tens of bases to thousands of bases. The length of marker molecules used for single-molecule sequencing can be up to about 25 bp, up to about 50 bp, up to about 75 bp, up to about 100 bp, up to about 200 bp, up to about 300 bp, up to about 400 bp, up to about 500 bp, up to about 600 bp, up to about 700 bp, up to about 800 bp, up to about 900 bp, up to about 1000 bp, or more.
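The dependence of sequence yield on the number of 3' ends can be made concrete with a short calculation. The sketch below is illustrative only: it assumes single-stranded templates with an average mass of about 330 g/mol per nucleotide (a common approximation, not a value from this disclosure), and the function name is hypothetical.

```python
AVOGADRO = 6.022e23          # molecules per mole
G_PER_MOL_PER_NT = 330.0     # assumed average mass of one nucleotide (single-stranded DNA)


def three_prime_ends_per_ng(length_nt):
    """Approximate number of template molecules (and hence 3' ends)
    in one nanogram of single-stranded DNA of the given length."""
    grams = 1e-9
    moles = grams / (length_nt * G_PER_MOL_PER_NT)
    return moles * AVOGADRO


# Shearing 1500 nt molecules to 150 nt gives ten times as many 3' ends
# from the same mass of nucleic acid.
ratio = three_prime_ends_per_ng(150) / three_prime_ends_per_ng(1500)
print(round(ratio, 6))
```

Because the number of molecules per unit mass is inversely proportional to length, the benefit of shearing does not depend on the exact per-nucleotide mass assumed.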
The length chosen for a marker molecule is also determined by the length of the genomic nucleic acid that is being sequenced. For example, cfDNA circulates in the human bloodstream as genomic fragments of cellular genomic DNA. Fetal cfDNA molecules found in the plasma of pregnant women are generally shorter than maternal cfDNA molecules (Chan et al., Clin Chem 50:88-92 [2004]). Size fractionation of circulating fetal DNA has confirmed that the average length of circulating fetal DNA fragments is <300 bp, while maternal DNA has been estimated to be between about 0.5 and 1 Kb (Li et al., Clin Chem, 50:1002-1011 [2004]). These findings are consistent with those of Fan et al., who determined using NGS that fetal cfDNA is rarely >340 bp (Fan et al., Clin Chem 56:1279-1286 [2010]). DNA isolated from urine with a standard silica-based method consists of two fractions: high molecular weight DNA, which originates from shed cells, and a low molecular weight (150-250 base pair) fraction of transrenal DNA (Tr-DNA) (Botezatu et al., Clin Chem 46:1078-1084, 2000; and Su et al., J Mol Diagn 6:101-107, 2004). Applying a newly developed technique for isolating cell-free nucleic acids from body fluids to the isolation of transrenal nucleic acids has revealed the presence in urine of DNA and RNA fragments much shorter than 150 base pairs (U.S. Patent Application Publication No. 20080139801). In embodiments in which cfDNA is the genomic nucleic acid that is sequenced, the marker molecules that are chosen can be up to about the length of the cfDNA. For example, the length of marker molecules used in maternal cfDNA samples to be sequenced as single nucleic acid molecules or as clonally amplified nucleic acids can be between about 100 bp and 600 bp. In other embodiments, the sample genomic nucleic acids are fragments of larger molecules. For example, a sample genomic nucleic acid that is sequenced is fragmented cellular DNA. In embodiments in which fragmented cellular DNA is sequenced, the length of the marker molecules can be up to the length of the DNA fragments. In some embodiments, the length of the marker molecules is at least the minimum length required for mapping the sequence read uniquely to the appropriate reference genome. In other embodiments, the length of the marker molecule is the minimum length that is required to exclude the marker molecule from being mapped to the sample reference genome.
In addition, marker molecules can be used to verify samples that are not assayed by nucleic acid sequencing, and that can be verified by common biotechniques other than sequencing, e.g. real-time PCR.
Sample controls (e.g., in-process positive controls for sequencing and/or analysis)
In various embodiments, marker sequences introduced into the samples, e.g., as described above, can function as positive controls to verify the accuracy and efficacy of sequencing and of subsequent processing and analysis.
Accordingly, compositions and methods are provided for supplying an in-process positive control (IPC) for sequencing DNA in a sample. In certain embodiments, positive controls are provided for sequencing cfDNA in a sample comprising a mixture of genomes. An IPC can be used to relate baseline shifts in sequence information obtained from different sets of samples, e.g., samples that are sequenced at different times in different sequencing runs. Thus, for example, an IPC can relate the sequence information obtained for a maternal test sample to the sequence information obtained from a set of qualified samples that were sequenced at a different time.
Similarly, in the case of segment analysis, an IPC can relate the sequence information obtained from a subject for one or more particular segments to the sequence obtained from a set of qualified samples (of similar sequences) that were sequenced at a different time. In certain embodiments, an IPC can relate the sequence information obtained from a subject for particular cancer-related loci to the sequence information obtained from a set of qualified samples (e.g., from a known amplification/deletion, and the like).
In addition, IPCs can be used as markers to track samples through the sequencing process. IPCs can also provide a qualitative positive sequence dose value, e.g., NCV, for one or more aneuploidies of chromosomes of interest, e.g., trisomy 21, trisomy 13, trisomy 18, to provide proper interpretation and to ensure the dependability and accuracy of the data. In certain embodiments, IPCs can be created to comprise nucleic acids from male and female genomes to provide doses for chromosomes X and Y in a maternal sample and thereby determine whether the fetus is male.
The type and number of in-process controls depend on the type or nature of the test needed. For example, for a test that requires sequencing DNA from a sample comprising a mixture of genomes to determine whether a chromosomal aneuploidy exists, the in-process control can comprise DNA obtained from a sample known to comprise the same chromosomal aneuploidy that is being tested. In some embodiments, the IPC includes DNA from a sample known to comprise an aneuploidy of a chromosome of interest. For example, the IPC for a test to determine the presence or absence of a fetal trisomy, e.g., trisomy 21, in a maternal sample comprises DNA obtained from an individual with trisomy 21. In some embodiments, the IPC comprises a mixture of DNA obtained from two or more individuals with different aneuploidies. For example, for a test to determine the presence or absence of trisomy 13, trisomy 18, trisomy 21, and monosomy X, the IPC comprises a combination of DNA samples obtained from pregnant women each carrying a fetus with one of the aneuploidies being tested. In addition to complete chromosomal aneuploidies, IPCs can be created to provide positive controls for tests that determine the presence or absence of partial aneuploidies.
An IPC that serves as the control for detecting a single aneuploidy can be created using a mixture of cellular genomic DNA obtained from two subjects, one of whom is the contributor of the aneuploid genome. For example, an IPC that serves as a control for a test to determine a fetal trisomy, e.g., trisomy 21, can be created by combining genomic DNA from a subject carrying the trisomic chromosome with genomic DNA from a female subject known not to carry the trisomic chromosome. Genomic DNA can be extracted from cells of both subjects and sheared to provide fragments of between about 100 bp and 400 bp, between about 150 bp and 350 bp, or between about 200 bp and 300 bp, to simulate the circulating cfDNA fragments in maternal samples. The proportion of fragmented DNA from the subject carrying the aneuploidy, e.g., trisomy 21, is chosen to simulate the proportion of circulating fetal cfDNA found in maternal samples, providing an IPC that comprises a mixture of fragmented DNA of which about 5%, about 10%, about 15%, about 20%, about 25%, or about 30% is DNA from the subject carrying the aneuploidy. The IPC can comprise DNA from different subjects each carrying a different aneuploidy. For example, the IPC can comprise about 80% unaffected female DNA, and the remaining 20% can be DNA from three different subjects carrying, respectively, a trisomic chromosome 21, a trisomic chromosome 13, and a trisomic chromosome 18. The mixture of fragmented DNA is prepared for sequencing. Processing of the mixture of fragmented DNA can comprise preparing a sequencing library, which can be sequenced using any massively parallel method in singleplex or multiplex fashion. Stock solutions of the genomic IPC can be stored and used in multiple diagnostic tests.
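The proportions described above can be illustrated with a short calculation of how much fragmented DNA each subject contributes to an IPC stock. This is a minimal sketch under stated assumptions: the function name and the subject labels are hypothetical, and dividing the aneuploid fraction equally among the aneuploid subjects is an assumption, since the passage above does not state how the 20% is apportioned.

```python
def ipc_composition(total_ng, aneuploid_fraction, aneuploid_subjects):
    """Split total_ng of fragmented DNA between one unaffected subject and
    one or more aneuploid subjects.

    The aneuploid fraction is divided equally among the aneuploid subjects,
    which is an assumption made for illustration.
    """
    if not 0.0 < aneuploid_fraction < 1.0:
        raise ValueError("aneuploid_fraction must be between 0 and 1")
    if not aneuploid_subjects:
        raise ValueError("at least one aneuploid subject is required")
    share_ng = total_ng * aneuploid_fraction / len(aneuploid_subjects)
    mix = {"unaffected": total_ng * (1.0 - aneuploid_fraction)}
    for label in aneuploid_subjects:
        mix[label] = share_ng
    return mix


# 1000 ng stock: about 80% unaffected DNA, 20% shared by three trisomy donors.
mix = ipc_composition(1000.0, 0.20, ["trisomy 21", "trisomy 13", "trisomy 18"])
for label, nanograms in mix.items():
    print(f"{label}: {nanograms:.1f} ng")
```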
Alternatively, the IPC can be created using cfDNA obtained from a mother known to carry a fetus with a known chromosomal aneuploidy. For example, cfDNA can be obtained from a pregnant woman carrying a fetus with trisomy 21. The cfDNA is extracted from the maternal sample, cloned into a bacterial vector, and grown in bacteria to provide an ongoing source of the IPC. The DNA can be extracted from the bacterial vector using restriction enzymes. Alternatively, the cloned cfDNA can be amplified by, e.g., PCR. The IPC DNA can be processed for sequencing in the same runs as the cfDNA from the test samples that are to be analyzed for the presence or absence of chromosomal aneuploidies.
While the creation of IPCs is described above with respect to trisomies, it will be appreciated that IPCs can be created to reflect other partial aneuploidies including, for example, various segment amplifications and/or deletions. Thus, for example, where various cancers are known to be associated with particular amplifications (e.g., breast cancer associated with 20Q13), IPCs can be created that incorporate those known amplifications.
Sequencing methods
As indicated above, the prepared samples (e.g., sequencing libraries) are sequenced as part of the procedure for identifying copy number variations. Any of a number of sequencing technologies can be utilized.
Some sequencing technologies are available commercially, such as the sequencing-by-hybridization platform from Affymetrix Inc. (Sunnyvale, CA); the sequencing-by-synthesis platforms from 454 Life Sciences (Bradford, CT), Illumina/Solexa (Hayward, CA) and Helicos Biosciences (Cambridge, MA); and the sequencing-by-ligation platform from Applied Biosystems (Foster City, CA), as described below. In addition to the single-molecule sequencing performed using the sequencing-by-synthesis of Helicos Biosciences, other single-molecule sequencing technologies include, but are not limited to, the SMRT™ technology of Pacific Biosciences, the ION TORRENT™ technology, and nanopore sequencing as developed, for example, by Oxford Nanopore Technologies.
While the automated Sanger method is considered a 'first generation' technology, Sanger sequencing, including automated Sanger sequencing, can also be employed in the methods described herein. Additional suitable sequencing methods include, but are not limited to, nucleic acid imaging technologies, e.g., atomic force microscopy (AFM) or transmission electron microscopy (TEM). Illustrative sequencing technologies are described in greater detail below.
In one illustrative, but non-limiting, embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in a test sample, e.g., cfDNA in a maternal sample, or cfDNA or cellular DNA in a subject being screened for a cancer, and the like, using the Helicos True Single Molecule Sequencing (tSMS) technology (e.g., as described in Harris T.D. et al., Science 320:106-109 [2008]). In the tSMS technique, a DNA sample is cleaved into strands of approximately 100 to 200 nucleotides, and a polyA sequence is added to the 3' end of each DNA strand. Each strand is labeled by the addition of a fluorescently labeled adenosine nucleotide. The DNA strands are then hybridized to a flow cell, which contains millions of oligo-T capture sites that are immobilized on the flow cell surface. In certain embodiments, the templates can be at a density of about 100 million templates/cm2. The flow cell is then loaded into an instrument, e.g., a HeliScope™ sequencer, and a laser illuminates the surface of the flow cell, revealing the position of each template. A CCD camera can map the position of the templates on the flow cell surface. The template fluorescent label is then cleaved and washed away. The sequencing reaction begins by introducing a DNA polymerase and a fluorescently labeled nucleotide. The oligo-T nucleic acid serves as a primer. The polymerase incorporates the labeled nucleotides into the primer in a template-directed manner. The polymerase and unincorporated nucleotides are removed. The templates that have directed incorporation of the fluorescently labeled nucleotide are discerned by imaging the flow cell surface. After imaging, a cleavage step removes the fluorescent label, and the process is repeated with other fluorescently labeled nucleotides until the desired read length is achieved. Sequence information is collected with each nucleotide addition step. Whole genome sequencing by single-molecule sequencing technologies excludes or typically obviates PCR-based amplification in the preparation of the sequencing libraries, and the methods allow direct measurement of the sample rather than measurement of copies of that sample.
In another illustrative, but non-limiting, embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in the test sample, e.g., cfDNA in a maternal test sample, or cfDNA or cellular DNA in a subject being screened for a cancer, and the like, using 454 sequencing (Roche) (e.g., as described in Margulies, M. et al., Nature 437:376-380 [2005]). 454 sequencing typically involves two steps. In the first step, DNA is sheared into fragments of approximately 300 to 800 base pairs, and the fragments are blunt-ended. Oligonucleotide adaptors are then ligated to the ends of the fragments. The adaptors serve as primers for amplification and sequencing of the fragments. The fragments can be attached to DNA capture beads, e.g., streptavidin-coated beads, using, e.g., Adaptor B, which contains a 5'-biotin tag. The fragments attached to the beads are PCR amplified within droplets of an oil-water emulsion. The result is multiple copies of clonally amplified DNA fragments on each bead. In the second step, the beads are captured in wells (e.g., picoliter-sized wells). Pyrosequencing is performed on each DNA fragment in parallel. Addition of one or more nucleotides generates a light signal that is recorded by a CCD camera in the sequencing instrument. The signal strength is proportional to the number of nucleotides incorporated. Pyrosequencing makes use of pyrophosphate (PPi), which is released upon nucleotide addition. PPi is converted to ATP by ATP sulfurylase in the presence of adenosine 5' phosphosulfate. Luciferase uses ATP to convert luciferin to oxyluciferin, and this reaction generates light that is measured and analyzed.
In another illustrative, but non-limiting, embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in the test sample, e.g., cfDNA in a maternal test sample, or cfDNA or cellular DNA in a subject being screened for a cancer, and the like, using the SOLiD™ technology (Applied Biosystems). In SOLiD™ sequencing-by-ligation, genomic DNA is sheared into fragments, and adaptors are attached to the 5' and 3' ends of the fragments to generate a fragment library. Alternatively, internal adaptors can be introduced by ligating adaptors to the 5' and 3' ends of the fragments, circularizing the fragments, digesting the circularized fragment to generate an internal adaptor, and attaching adaptors to the 5' and 3' ends of the resulting fragments to generate a mate-paired library. Next, clonal bead populations are prepared in microreactors containing beads, primers, template, and PCR components. Following PCR, the templates are denatured and the beads are enriched to separate the beads with extended templates. Templates on the selected beads are subjected to a 3' modification that permits bonding to a glass slide. The sequence can be determined by sequential hybridization and ligation of partially random oligonucleotides with a central determined base (or pair of bases) that is identified by a specific fluorophore. After a color is recorded, the ligated oligonucleotide is cleaved and removed, and the process is then repeated.
In another illustrative, but non-limiting, embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in the test sample, e.g., cfDNA in a maternal test sample, or cfDNA or cellular DNA in a subject being screened for a cancer, and the like, using the single molecule, real-time (SMRT™) sequencing technology of Pacific Biosciences. In SMRT sequencing, the continuous incorporation of dye-labeled nucleotides is imaged during DNA synthesis. Single DNA polymerase molecules are attached to the bottom surface of individual zero-mode waveguide detectors (ZMW detectors) that obtain sequence information while phospholinked nucleotides are being incorporated into the growing primer strand. A ZMW detector comprises a confinement structure that enables observation of the incorporation of a single nucleotide by DNA polymerase against a background of fluorescent nucleotides that rapidly diffuse in and out of the ZMW (e.g., in microseconds). It typically takes several milliseconds to incorporate a nucleotide into a growing strand. During this time, the fluorescent label is excited and produces a fluorescent signal, and the fluorescent tag is cleaved off. Measurement of the corresponding fluorescence of the dye indicates which base was incorporated. The process is repeated to provide a sequence.
In another illustrative, but non-limiting, embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in the test sample, e.g., cfDNA in a maternal test sample, or cfDNA or cellular DNA in a subject being screened for a cancer, and the like, using nanopore sequencing (e.g., as described in Soni GV and Meller A., Clin Chem 53:1996-2001 [2007]). Nanopore sequencing DNA analysis techniques are developed by a number of companies, including, for example, Oxford Nanopore Technologies (Oxford, United Kingdom), Sequenom, NABsys, and the like. Nanopore sequencing is a single-molecule sequencing technology whereby a single molecule of DNA is sequenced directly as it passes through a nanopore. A nanopore is a small hole, typically of the order of 1 nanometer in diameter. Immersion of a nanopore in a conducting fluid and application of a potential (voltage) across it results in a slight electrical current due to conduction of ions through the nanopore. The amount of current that flows is sensitive to the size and shape of the nanopore. As a DNA molecule passes through a nanopore, each nucleotide on the DNA molecule obstructs the nanopore to a different degree, changing the magnitude of the current through the nanopore in different degrees. Thus, this change in the current as the DNA molecule passes through the nanopore provides a read of the DNA sequence.
In another illustrative, but non-limiting, embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in the test sample, e.g., cfDNA in a maternal test sample, or cfDNA or cellular DNA in a subject being screened for a cancer, and the like, using a chemical-sensitive field effect transistor (chemFET) array (e.g., as described in U.S. Patent Application Publication No. 2009/0026082). In one example of this technique, DNA molecules can be placed into reaction chambers, and the template molecules can be hybridized to a sequencing primer bound to a polymerase. Incorporation of one or more triphosphates into a new nucleic acid strand at the 3' end of the sequencing primer can be discerned as a change in current by a chemFET. An array can have multiple chemFET sensors. In another example, single nucleic acids can be attached to beads, the nucleic acids can be amplified on the bead, and the individual beads can be transferred to individual reaction chambers on a chemFET array, with each chamber having a chemFET sensor, and the nucleic acids can be sequenced.
In another embodiment, the present method comprises obtaining sequence information for the nucleic acids in the test sample, e.g., cfDNA in a maternal test sample, using Halcyon Molecular's technology, which uses transmission electron microscopy (TEM). The method, termed Individual Molecule Placement Rapid Nano Transfer (IMPRNT), comprises using single-atom-resolution transmission electron microscope imaging of high-molecular-weight (150 kb or greater) DNA selectively labeled with heavy atom markers, and arranging these molecules on ultra-thin films in ultra-dense (3 nm strand-to-strand) parallel arrays with consistent base-to-base spacing. The electron microscope is used to image the molecules on the films to determine the position of the heavy atom markers and to extract base sequence information from the DNA. The method is further described in PCT patent publication WO 2009/046445. The method allows complete human genomes to be sequenced in less than ten minutes.
In another embodiment, the DNA sequencing technology is Ion Torrent single-molecule sequencing, which pairs semiconductor technology with a simple sequencing chemistry to translate chemically encoded information (A, C, G, T) directly into digital information (0, 1) on a semiconductor chip. In nature, when a nucleotide is incorporated into a strand of DNA by a polymerase, a hydrogen ion is released as a byproduct. Ion Torrent uses a high-density array of micro-machined wells to perform this biochemical process in a massively parallel way. Each well holds a different DNA molecule. Beneath the wells is an ion-sensitive layer, and beneath that an ion sensor. When a nucleotide, for example a C, is added to a DNA template and is then incorporated into a strand of DNA, a hydrogen ion is released. The charge from that ion changes the pH of the solution, which can be detected by Ion Torrent's ion sensor. The sequencer, essentially the world's smallest solid-state pH meter, calls the base, going directly from chemical information to digital information. The Ion Personal Genome Machine (PGM™) sequencer then sequentially floods the chip with one nucleotide after another. If the next nucleotide that floods the chip is not a match, no voltage change is recorded and no base is called. If there are two identical bases on the DNA strand, the voltage doubles, and the chip records two identical bases called. Direct detection allows nucleotide incorporation to be recorded in seconds.
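The relationship described above between the signal in each nucleotide flow and the number of bases called can be sketched as follows. This is an illustrative sketch only and not the instrument's actual base-calling software: the function name, the flow order, the example signal values, and the simple rounding rule are all assumptions introduced here.

```python
def decode_flows(flow_order, signals, unit=1.0):
    """Convert per-flow signal magnitudes into a base sequence.

    Each flow introduces one nucleotide. A signal near zero means no
    incorporation; a signal near one unit means one base; a signal near two
    units means two identical bases in a row, and so on.
    """
    called = []
    for base, signal in zip(flow_order, signals):
        count = max(int(round(signal / unit)), 0)
        called.append(base * count)
    return "".join(called)


# Hypothetical flow order and normalized signals (not measured data).
flows = "TACGTACG"
signals = [1.02, 0.05, 1.90, 0.00, 0.10, 0.97, 0.03, 1.10]
print(decode_flows(flows, signals))
```

In this toy example the third flow (C) gives roughly twice the unit signal, so two C bases are called, mirroring the doubling of voltage described above.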
In another embodiment, the present method comprises obtaining sequence information for the nucleic acids in the test sample, e.g., cfDNA in a maternal test sample, using sequencing by hybridization. Sequencing by hybridization comprises contacting a plurality of polynucleotide sequences with a plurality of polynucleotide probes, wherein each of the plurality of polynucleotide probes can optionally be tethered to a substrate. The substrate might be a flat surface comprising an array of known nucleotide sequences. The pattern of hybridization to the array can be used to determine the polynucleotide sequences present in the sample. In other embodiments, each probe is tethered to a bead, e.g., a magnetic bead or the like. Hybridization to the beads can be determined and used to identify the plurality of polynucleotide sequences within the sample.
In another embodiment, the present method comprises obtaining sequence information for the nucleic acids in the test sample, e.g., cfDNA in a maternal test sample, by massively parallel sequencing of millions of DNA fragments using Illumina's sequencing-by-synthesis and reversible terminator-based sequencing chemistry (e.g., as described in Bentley et al., Nature 6:53-59 [2009]). The template DNA can be genomic DNA, e.g., cfDNA. In some embodiments, genomic DNA from isolated cells is used as the template, and it is fragmented into lengths of several hundred base pairs. In other embodiments, cfDNA is used as the template, and fragmentation is not required because cfDNA exists as short fragments. For example, fetal cfDNA circulates in the bloodstream as fragments approximately 170 base pairs (bp) in length (Fan et al., Clin Chem 56:1279-1286 [2010]), and no fragmentation of the DNA is required prior to sequencing. Illumina's sequencing technology relies on the attachment of fragmented genomic DNA to a planar, optically transparent surface on which oligonucleotide anchors are bound. Template DNA is end-repaired to generate 5'-phosphorylated blunt ends, and the polymerase activity of Klenow fragment is used to add a single A base to the 3' end of the blunt phosphorylated DNA fragments. This addition prepares the DNA fragments for ligation to oligonucleotide adaptors, which have an overhang of a single T base at their 3' end to increase ligation efficiency. The adaptor oligonucleotides are complementary to the flow-cell anchors. Under limiting-dilution conditions, adaptor-modified, single-stranded template DNA is added to the flow cell and immobilized by hybridization to the anchors. Attached DNA fragments are extended and bridge amplified to create an ultra-high-density sequencing flow cell with hundreds of millions of clusters, each containing about 1,000 copies of the same template. In one embodiment, the randomly fragmented genomic DNA, e.g., cfDNA, is amplified using PCR before it is subjected to cluster amplification. Alternatively, an amplification-free genomic library preparation is used, and the randomly fragmented genomic DNA, e.g., cfDNA, is enriched using cluster amplification alone (Kozarewa et al., Nature Methods 6:291-295 [2009]). The templates are sequenced using a robust four-color DNA sequencing-by-synthesis technology that employs reversible terminators with removable fluorescent dyes. High-sensitivity fluorescence detection is achieved using laser excitation and total internal reflection optics. Short sequence reads of about 20 bp to 40 bp, e.g., 36 bp, are aligned against a repeat-masked reference genome, and unique mappings of the short sequence reads to the reference genome are identified using specially developed data analysis pipeline software. Non-repeat-masked reference genomes can also be used. Whether repeat-masked or non-repeat-masked reference genomes are used, only reads that map uniquely to the reference genome are counted. After completion of the first read, the templates can be regenerated in situ to enable a second read from the opposite end of the fragments. Thus, either single-end or paired-end sequencing of the DNA fragments can be used. Partial sequencing of DNA fragments present in the sample is performed, and sequence tags comprising reads of predetermined length, e.g., 36 bp, that are mapped to a known reference genome are counted. In one embodiment, the reference genome sequence is the NCBI36/hg18 sequence, which is available on the world wide web at genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105. Alternatively, the reference genome sequence is GRCh37/hg19, which is available on the world wide web at genome.ucsc.edu/cgi-bin/hgGateway. Other sources of public sequence information include GenBank, dbEST, dbSTS, EMBL (the European Molecular Biology Laboratory), and the DDBJ (the DNA Data Bank of Japan). A number of computer algorithms are available for aligning sequences, including but not limited to BLAST (Altschul et al., 1990), BLITZ (MPsrch) (Sturrock & Collins, 1993), FASTA (Person & Lipman, 1988), BOWTIE (Langmead et al., Genome Biology 10:R25.1-R25.10 [2009]), or ELAND (Illumina, Inc., San Diego, CA, USA). In one embodiment, one end of the clonally expanded copies of the plasma cfDNA molecules is sequenced and processed by bioinformatic alignment analysis for the Illumina Genome Analyzer, which uses the Efficient Large-Scale Alignment of Nucleotide Databases (ELAND) software.
In some embodiments of the methods described herein, the mapped sequence tags comprise sequence reads of about 20 bp, about 25 bp, about 30 bp, about 35 bp, about 40 bp, about 45 bp, about 50 bp, about 55 bp, about 60 bp, about 65 bp, about 70 bp, about 75 bp, about 80 bp, about 85 bp, about 90 bp, about 95 bp, about 100 bp, about 110 bp, about 120 bp, about 130 bp, about 140 bp, about 150 bp, about 200 bp, about 250 bp, about 300 bp, about 350 bp, about 400 bp, about 450 bp, or about 500 bp. It is expected that technological advances will enable single-end reads of greater than 500 bp, enabling reads of greater than about 1000 bp when paired-end reads are generated. In one embodiment, the mapped sequence tags comprise sequence reads that are 36 bp. Mapping of a sequence tag is achieved by comparing the sequence of the tag with the sequence of the reference to determine the chromosomal origin of the sequenced nucleic acid (e.g., cfDNA) molecule; specific genetic sequence information is not needed. A small degree of mismatch (0 to 2 mismatches per sequence tag) may be allowed to account for minor polymorphisms that may exist between the reference genome and the genomes in the mixed sample.
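The idea of mapping a tag to its chromosomal origin while tolerating up to two mismatches can be sketched with a brute-force search over a toy reference. This is a minimal illustration, not the alignment software named in this disclosure: the function names, the toy chromosome names, and the toy sequences are hypothetical, and a practical aligner would use an index rather than scanning every position.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_tag_uniquely(tag, reference, max_mismatches=2):
    """Return (chromosome, position) if the tag aligns at exactly one place
    in the reference with at most max_mismatches substitutions.
    Return None if the tag maps nowhere or to more than one place.
    """
    hits = []
    n = len(tag)
    for chrom, seq in reference.items():
        for pos in range(len(seq) - n + 1):
            if hamming(tag, seq[pos:pos + n]) <= max_mismatches:
                hits.append((chrom, pos))
    return hits[0] if len(hits) == 1 else None


# Toy reference with hypothetical names and sequences.
toy_reference = {"chr1": "ACGTTGCAAGCTTAGG", "chr2": "GGGGGGGGGGGG"}
print(map_tag_uniquely("TGCAAGCT", toy_reference))   # exact match
print(map_tag_uniquely("TGCTAGCT", toy_reference))   # one mismatch tolerated
```

Only the chromosome and position are returned, reflecting the point above that the chromosomal origin of the tag, not its specific genetic content, is what the count relies on.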
A plurality of sequence tags is typically obtained per sample. In some embodiments, at least about 3 × 10^6 sequence tags, at least about 5 × 10^6 sequence tags, at least about 8 × 10^6 sequence tags, at least about 10 × 10^6 sequence tags, at least about 15 × 10^6 sequence tags, at least about 20 × 10^6 sequence tags, at least about 30 × 10^6 sequence tags, at least about 40 × 10^6 sequence tags, or at least about 50 × 10^6 sequence tags comprising reads of between 20 bp and 40 bp, e.g., 36 bp, are obtained per sample by mapping the reads to the reference genome. In one embodiment, all the sequence reads are mapped to all regions of the reference genome. In one embodiment, the tags that have been mapped to all regions, e.g., all chromosomes, of the reference genome are counted, and the CNV, i.e., the over- or under-representation of a sequence of interest, e.g., a chromosome or portion thereof, in the mixed DNA sample is determined. The method does not require differentiation between the two genomes.
The accuracy required for correctly determining whether a CNV (e.g. aneuploidy) is present or absent in a sample is predicated on the variation in the number of sequence tags that map to the reference genome among samples within a sequencing run (inter-chromosomal variability), and on the variation in the number of sequence tags that map to the reference genome in different sequencing runs (inter-sequencing variability). For example, the variations can be particularly pronounced for tags that map to GC-rich or GC-poor reference sequences. Other variations can result from the use of different protocols for the extraction and purification of the nucleic acids, the preparation of the sequencing libraries, and the use of different sequencing platforms. The present method uses sequence doses (chromosome doses or segment doses) based on the knowledge of normalizing sequences (normalizing chromosome sequences or normalizing segment sequences) to intrinsically account for the accrued variability stemming from inter-chromosomal (intra-run), inter-sequencing (inter-run) and platform-dependent variability. Chromosome doses are based on the knowledge of a normalizing chromosome sequence, which can be composed of a single chromosome, or of two or more chromosomes selected from chromosomes 1 to 22, X and Y. Alternatively, normalizing chromosome sequences can be composed of a single chromosome segment, or of two or more segments of one chromosome or of two or more chromosomes. Segment doses are based on the knowledge of a normalizing segment sequence, which can be composed of a single segment of any one chromosome, or of two or more segments of any two or more of chromosomes 1 to 22, X and Y.
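One way to see why a dose expressed relative to a normalizing sequence absorbs run-to-run variation is the simplified sketch below. It only illustrates the easiest case, a uniform change in sequencing depth, and says nothing about GC-dependent or platform-dependent effects. The chromosome names chosen as the normalizing group and the tag counts are arbitrary illustrative values, not values from the document.

```python
from fractions import Fraction


def sequence_dose(tag_counts, sequence_of_interest, normalizing_sequences):
    """Dose = tags on the sequence of interest / tags on the normalizing sequence(s)."""
    denominator = sum(tag_counts[name] for name in normalizing_sequences)
    if denominator == 0:
        raise ValueError("normalizing sequence has no tags")
    return Fraction(tag_counts[sequence_of_interest], denominator)


# Hypothetical counts for one sample sequenced in two runs of different depth.
run_a = {"chr21": 100, "chr9": 300, "chr11": 200}
run_b = {name: 3 * count for name, count in run_a.items()}  # three times deeper

dose_a = sequence_dose(run_a, "chr21", ["chr9", "chr11"])
dose_b = sequence_dose(run_b, "chr21", ["chr9", "chr11"])
# The raw chr21 counts differ threefold, but the two doses are identical.
```

Exact fractions are used here only so that the equality of the two doses is not obscured by floating-point rounding.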
Singleplex sequencing
Fig. 4 illustrates a flow chart of an embodiment of the method whereby marker nucleic acids are combined with the source sample nucleic acids of a single sample to assay for a genetic abnormality while determining the integrity of the biological source sample. In step 410, a biological source sample comprising genomic nucleic acids is obtained. In step 420, marker nucleic acids are combined with the biological source sample to provide a marked sample. A sequencing library of a mixture of clonally amplified source sample genomic nucleic acids and marker nucleic acids is prepared in step 430, and the library is sequenced in a massively parallel fashion in step 440 to provide sequencing information pertaining to the source genomic nucleic acids and the marker nucleic acids of the sample. Massively parallel sequencing methods provide sequencing information as sequence reads, which are mapped to one or more reference genomes to generate sequence tags that can be analyzed. In step 450, all sequencing information is analyzed, and in step 460 the integrity of the source sample is verified based on the sequencing information pertaining to the marker molecules. Verification of source sample integrity is accomplished by determining concordance between the sequencing information obtained for the marker molecules at step 450 and the known sequence of the marker molecules that were added to the original source sample at step 420. The same process can be applied to multiple samples that are sequenced separately, with each sample comprising molecules having sequences unique to that sample, i.e. one sample is marked with a unique marker molecule and is sequenced separately from the other samples in a flow cell or slide of a sequencer.
If the integrity of the sample is verified, the sequencing information pertaining to the genomic nucleic acids of the sample can be analyzed to provide information, for example about the status of the subject from which the source sample was obtained. For example, if the integrity of the sample is verified, the sequencing information pertaining to the genomic nucleic acids is analyzed to determine the presence or absence of a chromosomal abnormality. If the integrity of the sample is not verified, the sequencing information is disregarded.
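The verify-then-analyze logic of steps 450 and 460 can be sketched as follows. This is a simplified illustration: the text only requires concordance between the sequenced marker and the known marker, so the exact-match comparison, the `min_concordance` cutoff of 0.9 and all function names are assumptions introduced here.

```python
def marker_concordance(marker_reads, known_marker_sequence):
    """Fraction of marker reads that exactly match the known marker sequence."""
    if not marker_reads:
        return 0.0
    matches = sum(1 for read in marker_reads if read == known_marker_sequence)
    return matches / len(marker_reads)


def genomic_tags_if_verified(marker_reads, known_marker_sequence, genomic_tags,
                             min_concordance=0.9):
    """Return the genomic tags for analysis only when sample integrity is verified.

    min_concordance is a hypothetical cutoff; returning None stands for
    'sequencing information is disregarded'.
    """
    if marker_concordance(marker_reads, known_marker_sequence) >= min_concordance:
        return genomic_tags
    return None
```

Returning `None` for an unverified sample keeps downstream aneuploidy analysis from ever seeing data whose origin is in doubt.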
The method depicted in Fig. 4 is also applicable to bioassays that comprise singleplex sequencing of single molecules, e.g. tSMS by Helicos, SMRT by Pacific Biosciences, BASE by Oxford Nanopore, and other technologies such as that proposed by IBM, which do not require the preparation of libraries.
Multiplex sequencing
The large number of sequence reads that can be obtained per sequencing run permits the analysis of pooled samples, i.e. multiplexing, which maximizes sequencing capacity and reduces workflow. For example, the massively parallel sequencing of eight libraries performed using the eight-lane flow cell of the Illumina Genome Analyzer can be multiplexed to sequence two or more samples in each lane, such that 16, 24, 32, etc. or more samples can be sequenced in a single run. Parallel sequencing of multiple samples, i.e. multiplex sequencing, requires the incorporation of sample-specific index sequences, also known as barcodes, during the preparation of the sequencing libraries. Sequencing indexes are distinct base sequences of about 5, about 10, about 15, about 20, about 25, or more bases that are added at the 3' end of the genomic and marker nucleic acids. The multiplexing system enables the sequencing of hundreds of biological samples within a single sequencing run. Indexed sequencing libraries for the sequencing of clonally amplified sequences can be prepared by incorporating the index sequence into one of the PCR primers used for cluster amplification. Alternatively, the index sequence can be incorporated into the adaptor, which is ligated to the cfDNA prior to PCR amplification. Indexed libraries for single molecule sequencing can be created by incorporating the index sequence at the 3' end of the marker and genomic molecules, or 5' to the added sequence needed for hybridization to the flow cell anchors (e.g. the polyA tail added for single molecule sequencing using tSMS). Sequencing of the uniquely marked, indexed nucleic acids provides index sequence information that identifies the samples in the pooled sample libraries, and the sequence information of the marker molecules correlates the sequencing information of the genomic nucleic acids with the sample source. In embodiments wherein the multiple samples are sequenced individually, i.e. singleplex sequencing, the marker and genomic nucleic acid molecules of each sample need only be modified to contain the adaptor sequences required by the sequencing platform, and the index sequences are omitted.
Fig. 5 provides a flow chart of an embodiment 500 of the method for verifying the integrity of samples that are subjected to a multistep multiplex sequencing bioassay, i.e. the nucleic acids from individual samples are combined and sequenced as a complex mixture. In step 510, a plurality of biological source samples, each comprising genomic nucleic acids, is obtained. In step 520, unique marker nucleic acids are combined with each of the biological source samples to provide a plurality of uniquely marked samples. In step 530, a sequencing library of sample genomic nucleic acids and marker nucleic acids is prepared for each of the uniquely marked samples. Library preparation for samples that are destined to undergo multiplex sequencing comprises the incorporation of distinct index tags into the sample and marker nucleic acids of each uniquely marked sample, to provide samples whose source nucleic acid sequences can be correlated with the corresponding marker nucleic acid sequences and identified in complex solutions. In embodiments of the method comprising marker molecules that can be enzymatically modified (e.g. DNA), indexing molecules can be incorporated at the 3' end of the sample and marker molecules by ligating sequenceable adaptor sequences comprising the index sequences. In embodiments of the method comprising marker molecules that cannot be enzymatically modified (e.g. DNA analogs that do not have a phosphate backbone), the index sequences are incorporated at the 3' end of the analog marker molecules during synthesis. The sequencing libraries of two or more samples are pooled and loaded onto the flow cell of the sequencer, where they are sequenced in a massively parallel fashion in step 540. In step 550, all sequencing information is analyzed, and in step 560 the integrity of the source samples is verified based on the sequencing information pertaining to the marker molecules. Verification of the integrity of each of the plurality of source samples is accomplished by first grouping the sequence tags associated with identical index sequences, which associates the genomic and marker sequences with one another and distinguishes the sequences belonging to each of the libraries made from the genomic molecules of the plurality of samples. The grouped marker and genomic sequences are then analyzed to verify that the sequence obtained for the marker molecules corresponds to the known unique sequence added to the corresponding source sample. If the integrity of the sample is verified, the sequencing information pertaining to the genomic nucleic acids of the sample can be analyzed to provide genetic information about the subject from which the source sample was obtained. For example, if the integrity of the sample is verified, the sequencing information pertaining to the genomic nucleic acids is analyzed to determine the presence or absence of a chromosomal abnormality. A lack of correspondence between the sequencing information and the known sequence of the marker molecule is indicative of a sample mix-up, and the accompanying sequencing information pertaining to the genomic cfDNA molecules is disregarded.
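The grouping and verification described for steps 550 and 560 can be sketched as below. The sketch assumes each read has already been split into its index sequence, a kind ("marker" or "genomic") and the remaining sequence; that record layout, the all-reads-must-match rule and every name in the code are illustrative assumptions rather than the procedure of the method.

```python
from collections import defaultdict


def group_by_index(reads):
    """Group reads by index sequence.

    reads: iterable of (index_sequence, kind, sequence), kind in {"marker", "genomic"}.
    """
    grouped = defaultdict(lambda: {"marker": [], "genomic": []})
    for index_sequence, kind, sequence in reads:
        grouped[index_sequence][kind].append(sequence)
    return grouped


def verified_genomic_reads(reads, expected_marker_by_index):
    """Keep genomic reads only for indexes whose marker reads match the expected marker."""
    verified = {}
    for index_sequence, group in group_by_index(reads).items():
        expected = expected_marker_by_index.get(index_sequence)
        markers = group["marker"]
        # Simplification: every marker read must equal the expected marker sequence.
        if expected is not None and markers and all(m == expected for m in markers):
            verified[index_sequence] = group["genomic"]
    return verified
```

An index whose marker reads do not match the expected marker is simply absent from the result, which corresponds to treating that library as a possible sample mix-up and disregarding its genomic reads.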
Determination of CNV for prenatal diagnosis
Cell-free fetal DNA and RNA circulating in maternal blood can be used for the early non-invasive prenatal diagnosis (NIPD) of an increasing number of genetic conditions, both for pregnancy management and to aid reproductive decision-making. The presence of cell-free DNA circulating in the bloodstream has been known for over 50 years. More recently, the presence of small amounts of circulating fetal DNA was discovered in the maternal bloodstream during pregnancy (Lo et al., Lancet 350:485-487 [1997]). Thought to originate from dying placental cells, cell-free fetal DNA (cfDNA) has been shown to consist of short fragments typically fewer than 200bp in length (Chan et al., Clin Chem 50:88-92 [2004]), which can be discerned as early as 4 weeks of gestation (Illanes et al., Early Human Dev 83:563-566 [2007]), and which are known to be cleared from the maternal circulation within hours of delivery (Lo et al., Am J Hum Genet 64:218-224 [1999]). In addition to cfDNA, fragments of cell-free fetal RNA (cfRNA) can also be discerned in the maternal bloodstream, originating from genes that are transcribed in the fetus or placenta. The extraction and subsequent analysis of these fetal genetic elements from a maternal blood sample offers novel opportunities for NIPD.
The present method is a polymorphism-independent method for use in NIPD that does not require the fetal cfDNA to be distinguished from the maternal cfDNA in order to determine a fetal aneuploidy. In some embodiments, the aneuploidy is a complete chromosomal trisomy or monosomy, or a partial trisomy or monosomy. Partial aneuploidies are caused by the gain or loss of part of a chromosome, and encompass chromosomal imbalances resulting from unbalanced translocations, unbalanced inversions, deletions and insertions. By far the most common known aneuploidy compatible with life is trisomy 21, i.e. Down syndrome (DS), which is caused by the presence of an extra copy of all or part of chromosome 21. Rarely, DS can be caused by an inherited or sporadic defect whereby an extra copy of all or part of chromosome 21 becomes attached to another chromosome (usually chromosome 14) to form a single aberrant chromosome. DS is associated with intellectual impairment, severe learning difficulties and excess mortality caused by long-term health problems such as heart disease. Other aneuploidies with known clinical significance include Edwards syndrome (trisomy 18) and Patau syndrome (trisomy 13), which are frequently fatal within the first few months of life. Abnormalities associated with the number of sex chromosomes are also known and include monosomy X, e.g. Turner syndrome (XO), and triple X syndrome (XXX) in female births, and Klinefelter syndrome (XXY) and XYY syndrome in male births, all of which are associated with various phenotypes including sterility and reduced intellectual skills. Monosomy X [45,X] is a common cause of early pregnancy loss, accounting for about 7% of spontaneous abortions. Based on the liveborn frequency of 45,X (also called Turner syndrome) of 1-2/10,000, it is estimated that less than 1% of 45,X conceptuses survive to term. About 30% of Turner syndrome patients are mosaic, with both a 45,X cell line and either a 46,XX cell line or a cell line containing a rearranged X chromosome (Hook and Warburton, 1983). The phenotype in a liveborn infant is relatively mild considering the high embryonic lethality, and it has been hypothesized that possibly all liveborn females with Turner syndrome carry a cell line containing two sex chromosomes. Monosomy X can occur in females as 45,X or as 45,X/46XX, and in males as 45,X/46XY. Autosomal monosomies in humans are generally considered to be incompatible with life; however, a considerable number of cytogenetic reports describe full monosomy of chromosome 21 in liveborn children (Vosranova et al., Molecular Cytogen. 1:13 [2008]; Joosten et al., Prenatal Diagn. 17:271-5 [1997]). The method described herein can be used to diagnose these and other chromosomal abnormalities prenatally.
According to some embodiments, the methods disclosed herein can determine the presence or absence of chromosomal trisomies of any one of chromosomes 1 to 22, X and Y. Examples of chromosomal trisomies that can be detected according to the present method include, without limitation, trisomy 21 (T21; Down syndrome), trisomy 18 (T18; Edwards syndrome), trisomy 16 (T16), trisomy 20 (T20), trisomy 22 (T22; Cat Eye syndrome), trisomy 15 (T15; Prader-Willi syndrome), trisomy 13 (T13; Patau syndrome), trisomy 8 (T8; Warkany syndrome), trisomy 9, and the XXY (Klinefelter syndrome), XYY, or XXX trisomies. Complete trisomies of other autosomes are lethal when present in a non-mosaic state, but can be compatible with life when present in a mosaic state. It will be appreciated that various complete trisomies, whether present in a mosaic or non-mosaic state, and partial trisomies can be determined in fetal cfDNA according to the teachings provided herein.
Non-limiting examples of partial trisomies that can be determined by the present method include partial trisomy 1q32-44, trisomy 9p, trisomy 4 mosaicism, trisomy 17p, partial trisomy 4q26-qter, partial 2p trisomy, partial trisomy 1q, and/or partial trisomy 6p/monosomy 6q.
The methods disclosed herein can also be used to determine chromosomal monosomy X, chromosomal monosomy 21, and partial monosomies such as monosomy 13, monosomy 15, monosomy 16, monosomy 21 and monosomy 22, which are known to be involved in pregnancy miscarriage. Partial monosomies of chromosomes typically involved in complete aneuploidy can also be determined by the method described herein. Non-limiting examples of deletion syndromes that can be determined according to the present method include syndromes caused by partial deletions of chromosomes. Examples of partial deletions that can be determined according to the methods described herein include, without limitation, partial deletions of chromosomes 1, 4, 5, 7, 11, 18, 15, 13, 17, 22 and 10, which are described in the following.
1q21.1 deletion syndrome, or 1q21.1 (recurrent) microdeletion, is a rare aberration of chromosome 1. Alongside the deletion syndrome there is also a 1q21.1 duplication syndrome. Whereas in the deletion syndrome a part of the DNA is missing at a particular location, in the duplication syndrome two or three copies of a similar part of the DNA are present at the same location. The literature refers to both the deletion and the duplication as 1q21.1 copy number variations (CNV). The 1q21.1 deletion can be associated with TAR syndrome (thrombocytopenia with absent radius).
Wolf-Hirschhorn syndrome (WHS) (OMIM #194190) is a contiguous gene deletion syndrome associated with a hemizygous deletion of chromosome 4p16.3. Wolf-Hirschhorn syndrome is a congenital malformation syndrome characterized by pre- and postnatal growth deficiency, developmental disability of variable degree, characteristic craniofacial features ('Greek warrior helmet' appearance of the nose, high forehead, prominent glabella, hypertelorism, high-arched eyebrows, protruding eyes, epicanthal folds, short philtrum, distinct mouth with downturned corners, and micrognathia), and a seizure disorder.
Partial deletion of chromosome 5, also known as 5p- or 5p minus and named Cri du Chat syndrome (OMIM #123450), is caused by a deletion of the short arm (p arm) of chromosome 5 (5p15.3-p15.2). Infants with this condition often have a high-pitched cry that sounds like that of a cat. The disorder is characterized by intellectual disability and delayed development, small head size (microcephaly), low birth weight and weak muscle tone (hypotonia) in infancy, distinctive facial features, and possibly heart defects.
Williams-Beuren syndrome, also known as chromosome 7q11.23 deletion syndrome (OMIM 194050), is a contiguous gene deletion syndrome resulting in a multisystem disorder. It is caused by a hemizygous deletion of 1.5Mb to 1.8Mb on chromosome 7q11.23, which contains approximately 28 genes.
Jacobsen syndrome, also known as 11q deletion disorder, is a rare congenital disorder resulting from the deletion of a terminal region of chromosome 11 that includes band 11q24.1. It can cause intellectual disabilities, a distinctive facial appearance, and a variety of physical problems including heart defects and a bleeding disorder.
Partial monosomy of chromosome 18, known as monosomy 18p, is a rare chromosomal disorder in which all or part of the short arm (p) of chromosome 18 is deleted (monosomic). The disorder is typically characterized by short stature, variable degrees of mental retardation, speech delays, malformations of the skull and facial (craniofacial) region, and/or additional physical abnormalities. The associated craniofacial defects may vary greatly in range and severity from case to case.
Conditions caused by changes in the structure or number of copies of chromosome 15 include Angelman syndrome and Prader-Willi syndrome, which involve a loss of gene activity in the same part of chromosome 15, the 15q11-q13 region. It will be appreciated that several translocations and microdeletions can be asymptomatic in the carrier parent, yet can cause a major genetic disease in the offspring. For example, a healthy mother who carries the 15q11-q13 microdeletion can give birth to a child with Angelman syndrome, a severe neurodegenerative disorder. Thus, the methods, apparatus and systems described herein can be used to identify such partial deletions and other deletions in the fetus.
Partial monosomy 13q is a rare chromosomal disorder that results when a piece of the long arm (q) of chromosome 13 is missing (monosomic). Infants born with partial monosomy 13q may exhibit low birth weight, malformations of the head and face (craniofacial region), skeletal abnormalities (especially of the hands and feet), and other physical abnormalities. Mental retardation is characteristic of this condition. The mortality rate during infancy is high among individuals born with this disorder. Almost all cases of partial monosomy 13q occur randomly for no apparent reason (sporadic).
Smith-Magenis syndrome (SMS; OMIM #182290) is caused by a deletion, or loss of genetic material, on one copy of chromosome 17. This well-known syndrome is associated with developmental delay, mental retardation, congenital anomalies (such as heart and kidney defects), and neurobehavioral abnormalities (such as severe sleep disturbances and self-injurious behavior). In the majority of cases (90%), Smith-Magenis syndrome is caused by a 3.7-Mb interstitial deletion in chromosome 17p11.2.
22q11.2 deletion syndrome, also known as DiGeorge syndrome, is a syndrome caused by the deletion of a small piece of chromosome 22. The deletion (22q11.2) occurs near the middle of the chromosome, on the long arm of one of the pair of chromosomes. The features of this syndrome vary widely, even among members of the same family, and affect many parts of the body. Characteristic signs and symptoms may include birth defects such as congenital heart disease, defects in the palate most commonly related to neuromuscular problems with closure (velopharyngeal insufficiency), learning disabilities, mild differences in facial features, and recurrent infections. Microdeletions in chromosomal region 22q11.2 are associated with a 20- to 30-fold increased risk of schizophrenia.
Deletions on the short arm of chromosome 10 have been associated with a DiGeorge syndrome-like phenotype. Partial monosomy of chromosome 10p is rare, but has been observed in a portion of patients showing features of DiGeorge syndrome.
In one embodiment, the methods, apparatus and systems described herein are used to determine partial monosomies including, but not limited to, partial monosomies of chromosomes 1, 4, 5, 7, 11, 18, 15, 13, 17, 22 and 10. For example, the method can be used to determine partial monosomy 1q21.11, partial monosomy 4p16.3, partial monosomy 5p15.3-p15.2, partial monosomy 7q11.23, partial monosomy 11q24.1, partial monosomy 18p, partial monosomy of chromosome 15 (15q11-q13), partial monosomy 13q, partial monosomy 17p11.2, partial monosomy of chromosome 22 (22q11.2), and partial monosomy 10p.
Other partial monosomies that can be determined according to the methods described herein include: unbalanced translocation t(8;11)(p23.2;p15.5); 11q23 microdeletion; 17p11.2 deletion; 22q13.3 deletion; Xp22.3 microdeletion; 10p14 deletion; 20p microdeletion, [del(22)(q11.2q11.23)], 7q11.23 and 7q36 deletions; 1p36 deletion; 2p microdeletion; neurofibromatosis type 1 (17q11.2 microdeletion), Yq deletion; 4p16.3 microdeletion; 1p36.2 microdeletion; 11q14 deletion; 19q13.2 microdeletion; Rubinstein-Taybi syndrome (16p13.3 microdeletion); 7p21 microdeletion; Miller-Dieker syndrome (17p13.3); and 2q37 microdeletion. Partial deletions can be small deletions of part of a chromosome, or they can be microdeletions of a chromosome in which the deletion of a single gene can occur.
Several duplication syndromes caused by the duplication of part of a chromosome arm have been identified (see OMIM [Online Mendelian Inheritance in Man, viewable online at ncbi.nlm.nih.gov/omim]). In one embodiment, the present method can be used to determine the presence or absence of duplications and/or multiplications of segments of any one of chromosomes 1 to 22, X and Y. Non-limiting examples of duplication syndromes that can be determined according to the present method include duplications of part of chromosomes 8, 15, 12 and 17, which are described in the following.
8p23.1 duplication syndrome is a rare genetic disorder caused by a duplication of a region of human chromosome 8. This duplication syndrome has an estimated prevalence of 1 in 64,000 births and is the reciprocal of the 8p23.1 deletion syndrome. The 8p23.1 duplication is associated with a variable phenotype including one or more of speech delay, developmental delay, mild dysmorphism with prominent forehead and arched eyebrows, and congenital heart disease (CHD).
Chromosome 15q duplication syndrome (Dup15q) is a clinically identifiable syndrome that results from duplications of chromosome 15q11-13.1. Babies with Dup15q usually have hypotonia (poor muscle tone) and growth retardation; they may be born with a cleft lip and/or palate, or with malformations of the heart, kidneys or other organs; and they show some degree of cognitive delay or disability (mental retardation), speech and language delays, and sensory processing disorders.
Pallister-Killian syndrome is a result of extra chromosome 12 material. There is usually a mixture of cells (mosaicism), some with the extra chromosome 12 material and some that are normal (46 chromosomes without the extra chromosome 12 material). Babies with this syndrome have many problems, including severe mental retardation, poor muscle tone, "coarse" facial features, and a prominent forehead. They tend to have a very thin upper lip with a thicker lower lip, and a short nose. Other health problems include seizures, poor feeding, stiff joints, cataracts in adulthood, hearing loss, and heart defects. Persons with Pallister-Killian syndrome have a shortened lifespan.
Individuals with the genetic condition designated dup(17)(p11.2p11.2) or dup 17p carry extra genetic information (known as a duplication) on the short arm of chromosome 17. Duplication of chromosome 17p11.2 underlies Potocki-Lupski syndrome (PTLS), a newly recognized genetic condition with only a few dozen cases reported in the medical literature. Patients who have this duplication often have low muscle tone, poor feeding and failure to thrive during infancy, and also present with delayed development of motor and verbal milestones. Many individuals with PTLS have difficulty with articulation and language processing. In addition, patients may have behavioral characteristics similar to those seen in persons with autism or autism spectrum disorders. Individuals with PTLS may have heart defects and sleep apnea. A duplication of a large region in chromosome 17p12 that includes the gene PMP22 is known to cause Charcot-Marie-Tooth disease.
CNV have been associated with stillbirths. However, due to the inherent limitations of conventional cytogenetics, the contribution of CNV to stillbirth is thought to be underrepresented (Harris et al., Prenatal Diagn 31:932-944 [2011]). As shown in the examples and described elsewhere herein, the present method is capable of determining the presence of partial aneuploidies, e.g. deletions and multiplications of chromosome segments, and can be used to identify and determine the presence or absence of CNV that are associated with stillbirths.
Determination of complete fetal chromosomal aneuploidies
In one embodiment, methods are provided for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acid molecules. Preferably, the method determines the presence or absence of any four or more different complete fetal chromosomal aneuploidies. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the maternal test sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of the any one or more chromosomes of interest. The normalizing chromosome sequence can be a single chromosome, or it can be a group of chromosomes selected from chromosomes 1-22, X and Y. The method further (c) uses the number of sequence tags identified for each of the any one or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence to calculate a single chromosome dose for each of the any one or more chromosomes of interest; and (d) compares each single chromosome dose for each of the any one or more chromosomes of interest with a threshold value for each of the any one or more chromosomes of interest, thereby determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in the maternal test sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for that chromosome of interest.
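Steps (c) and (d) in their ratio form can be sketched as follows. The tag counts, the choice of normalizing chromosome and the threshold value in the usage lines are hypothetical numbers chosen for illustration, and the simple "dose above threshold" comparison is an assumption about how the comparison might be applied for a gain of chromosomal material.

```python
def chromosome_dose(tag_counts, chromosome_of_interest, normalizing_chromosomes):
    """Step (c): tags on the chromosome of interest divided by tags on its
    normalizing chromosome sequence (one chromosome or a group)."""
    denominator = sum(tag_counts[c] for c in normalizing_chromosomes)
    if denominator <= 0:
        raise ValueError("normalizing chromosome sequence has no tags")
    return tag_counts[chromosome_of_interest] / denominator


def dose_exceeds_threshold(dose, threshold):
    """Step (d), simplified: report whether the dose lies above the threshold."""
    return dose > threshold


# Hypothetical counts and threshold, for illustration only.
tag_counts = {"chr21": 150, "chr9": 1000}
dose_21 = chromosome_dose(tag_counts, "chr21", ["chr9"])
flagged = dose_exceeds_threshold(dose_21, threshold=0.14)
```

Because each chromosome of interest has its own normalizing chromosome sequence and its own threshold, the same two calls would be repeated once per chromosome of interest.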
In other embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing chromosome for that chromosome of interest. In other embodiments, step (c) comprises calculating a sequence tag ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalizing chromosome sequence to the length of the normalizing chromosome sequence, and calculating a chromosome dose for the chromosome of interest as the ratio of the sequence tag density of the chromosome of interest to the sequence tag density of the normalizing chromosome sequence. The calculation is repeated for each of all the chromosomes of interest. Steps (a)-(d) can be repeated for test samples from different maternal subjects.
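The length-adjusted variant can be sketched as a ratio of tag densities. The sketch takes "relating the number of tags to the length" to mean tags per base pair, which is an interpretation made for illustration; the function names and the numbers in the usage line are likewise hypothetical.

```python
def tag_density(tag_count, sequence_length_bp):
    """Sequence tag density: tags per base pair of the sequence."""
    if sequence_length_bp <= 0:
        raise ValueError("sequence length must be positive")
    return tag_count / sequence_length_bp


def density_based_dose(count_interest, length_interest, count_normalizing, length_normalizing):
    """Chromosome dose as tag density of the chromosome of interest divided by
    tag density of its normalizing chromosome sequence."""
    normalizing_density = tag_density(count_normalizing, length_normalizing)
    if normalizing_density == 0:
        raise ValueError("normalizing sequence has no tags")
    return tag_density(count_interest, length_interest) / normalizing_density


# Hypothetical toy numbers: 100 tags over 1,000 bp versus 400 tags over 2,000 bp.
example_dose = density_based_dose(100, 1000, 400, 2000)
```

Dividing by length puts chromosomes of very different sizes on a common scale before the ratio is taken.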
An example of the embodiment, whereby four or more complete fetal chromosomal aneuploidies are determined in a maternal test sample comprising a mixture of fetal and maternal cell-free DNA molecules, comprises: (a) sequencing at least a portion of the cell-free DNA molecules to obtain sequence information for the fetal and maternal cell-free DNA molecules in the test sample; (b) using the sequence information to identify a number of sequence tags for each of any twenty or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing chromosome for each of the twenty or more chromosomes of interest; (c) using the number of sequence tags identified for each of the twenty or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome to calculate a single chromosome dose for each of the twenty or more chromosomes of interest; and (d) comparing each single chromosome dose for each of the twenty or more chromosomes of interest with a threshold value for each of the twenty or more chromosomes of interest, thereby determining the presence or absence of any twenty or more different complete fetal chromosomal aneuploidies in the test sample.
In another embodiment, the method described above for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample uses a normalizing segment sequence to determine the dose of the chromosome of interest. In this case, the method comprises: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of the one or more chromosomes of interest. The normalizing segment sequence can be a single segment of a chromosome, or it can be a group of segments from one or more different chromosomes. The method further comprises: (c) using the number of sequence tags identified for each of the one or more chromosomes of interest and the number of sequence tags identified for the normalizing segment sequence to calculate a single chromosome dose for each of the one or more chromosomes of interest; and (d) comparing each single chromosome dose for each of the one or more chromosomes of interest with a threshold value for each of the one or more chromosomes of interest, thereby determining the presence or absence of one or more different complete fetal chromosomal aneuploidies in the sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing segment sequence for that chromosome of interest.
In other embodiments, step (c) comprises calculating a sequence tag ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalizing segment sequence to the length of the normalizing segment sequence; a chromosome dose for the chromosome of interest is then calculated as the ratio of the sequence tag density of the chromosome of interest to the sequence tag density of the normalizing segment sequence. The calculation is repeated for each of the sequences of interest. Steps (a)-(d) can be repeated for test samples from different maternal subjects.
A means for comparing chromosome doses across different sample sets is provided by determining a normalized chromosome value (NCV), which relates the chromosome dose in a test sample to the mean of the corresponding chromosome dose in a set of qualified samples. The NCV is calculated as:
$$NCV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, of the j-th chromosome dose in a set of qualified samples, and $x_{ij}$ is the observed j-th chromosome dose for test sample i.
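A minimal Python sketch of the NCV calculation follows, assuming the NCV is the difference between the test sample's chromosome dose and the qualified-sample mean, divided by the qualified-sample standard deviation. The function name and the dose values are hypothetical, and the use of the sample (n-1) standard deviation is an assumption, since the text says only that the mean and standard deviation are estimated from qualified samples.

```python
from statistics import mean, stdev


def normalized_chromosome_value(test_dose, qualified_doses):
    """Return (test_dose - mean) / standard deviation of the qualified doses."""
    if len(qualified_doses) < 2:
        raise ValueError("need at least two qualified samples")
    mu_hat = mean(qualified_doses)
    sigma_hat = stdev(qualified_doses)  # assumption: sample (n-1) estimator
    if sigma_hat == 0:
        raise ValueError("qualified doses have zero variance")
    return (test_dose - mu_hat) / sigma_hat


# Hypothetical doses for one chromosome in three qualified samples.
qualified = [0.24, 0.25, 0.26]
ncv = normalized_chromosome_value(0.29, qualified)
print(round(ncv, 3))
```

Because the NCV is expressed in units of the qualified-sample standard deviation, values from different chromosomes or different sample sets can be compared on a common scale.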
In some embodiments, the presence or absence of at least one complete fetal chromosomal aneuploidy is determined. In other embodiments, the presence or absence of at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, or twenty-four complete fetal chromosomal aneuploidies is determined in a sample, wherein twenty-two of the complete fetal chromosomal aneuploidies correspond to complete chromosomal aneuploidies of any one or more of the autosomes, and the twenty-third and twenty-fourth correspond to complete fetal chromosomal aneuploidies of chromosomes X and Y. Because aneuploidies of the sex chromosomes can include tetrasomies, pentasomies, and other polysomies, the number of different complete chromosomal aneuploidies that can be determined according to the present method may be at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, or at least 30. The number of different complete chromosomal aneuploidies that are determined is therefore related to the number of chromosomes of interest selected for analysis.
In one embodiment, the determination described above of the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample uses a normalizing segment sequence for one chromosome of interest, which is selected from chromosomes 1-22, X, and Y. In other embodiments, two or more chromosomes of interest are any two or more selected from chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. In one embodiment, the one or more chromosomes of interest selected from chromosomes 1-22, X, and Y comprise at least 20 chromosomes selected from chromosomes 1-22, X, and Y, and the presence or absence of at least 20 different complete fetal chromosomal aneuploidies is determined. In other embodiments, the one or more chromosomes of interest selected from chromosomes 1-22, X, and Y are all of chromosomes 1-22, X, and Y, and the presence or absence of complete fetal chromosomal aneuploidies of all of chromosomes 1-22, X, and Y is determined. The different complete fetal chromosomal aneuploidies that can be determined include complete chromosomal trisomies, complete chromosomal monosomies, and complete chromosomal polysomies. Examples of complete fetal chromosomal aneuploidies include, but are not limited to: trisomies of any one or more of the autosomes, e.g., trisomy 2, trisomy 8, trisomy 9, trisomy 20, trisomy 21, trisomy 13, trisomy 16, trisomy 18, and trisomy 22; trisomies of the sex chromosomes, e.g., 47,XXY, 47,XXX, and 47,XYY; tetrasomies of the sex chromosomes, e.g., 48,XXYY, 48,XXXY, 48,XXXX, and 48,XYYY; pentasomies of the sex chromosomes, e.g., 49,XXXYY, 49,XXXXY, 49,XXXXX, and 49,XYYYY; and monosomy X. Other complete fetal chromosomal aneuploidies that can be determined according to the present method are described below.
Determination of partial fetal chromosomal aneuploidies
In another embodiment, a method is provided for determining the presence or absence of any one or more different partial fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acid molecules. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more segments of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of the one or more segments of the one or more chromosomes of interest. The normalizing segment sequence can be a single segment of a chromosome, or it can be a group of segments from one or more different chromosomes. The method further comprises: (c) using the number of sequence tags identified for each of the one or more segments of the one or more chromosomes of interest and the number of sequence tags identified for each normalizing segment sequence to calculate a single segment dose for each of the one or more segments of the one or more chromosomes of interest; and (d) comparing each single segment dose for each of the one or more segments of the one or more chromosomes of interest with a threshold value for each of the one or more chromosome segments of the one or more chromosomes of interest, thereby determining the presence or absence of one or more different partial fetal chromosomal aneuploidies in the sample.
In some embodiments, step (c) comprises calculating a single segment dose for each of the one or more segments of the one or more chromosomes of interest as the ratio of the number of sequence tags identified for that segment to the number of sequence tags identified for the normalizing segment sequence for that segment.
In other embodiments, step (c) comprises calculating a sequence tag ratio for a segment of interest by relating the number of sequence tags obtained for the segment of interest to the length of the segment of interest, and relating the number of tags for the corresponding normalizing segment sequence to the length of the normalizing segment sequence; a segment dose for the segment of interest is then calculated as the ratio of the sequence tag density of the segment of interest to the sequence tag density of the normalizing segment sequence. The calculation is repeated for each of the sequences of interest. Steps (a)-(d) can be repeated for test samples from different maternal subjects.
A means for comparing segment doses across different sample sets is provided by determining a normalized segment value (NSV), which relates the segment dose in a test sample to the mean of the corresponding segment dose in a set of qualified samples. The NSV is calculated as:
$$NSV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, of the j-th segment dose in a set of qualified samples, and $x_{ij}$ is the observed j-th segment dose for test sample i.
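The segment-level calculation might be sketched in Python as follows, covering a segment dose computed against a normalizing segment sequence made up of a group of segments, followed by the NSV. The names and numbers are hypothetical, and pooling a group of normalizing segments by summing their tags and their lengths is an assumption made for illustration; the text does not specify how a group of segments is combined.

```python
from statistics import mean, stdev


def pooled_tag_density(segments):
    """Tags per base over a list of (tag_count, length_bp) segments.

    Assumption: a group of segments is pooled by summing tags and lengths.
    """
    total_tags = sum(tags for tags, _ in segments)
    total_length = sum(length for _, length in segments)
    if total_length <= 0:
        raise ValueError("total segment length must be positive")
    return total_tags / total_length


def segment_dose(segment_of_interest, normalizing_segments):
    """Ratio of the tag density of one segment to that of its normalizing set."""
    return (pooled_tag_density([segment_of_interest])
            / pooled_tag_density(normalizing_segments))


def normalized_segment_value(test_dose, qualified_doses):
    """(test dose - qualified mean) / qualified sample standard deviation."""
    return (test_dose - mean(qualified_doses)) / stdev(qualified_doses)


# Hypothetical segment of interest and a two-segment normalizing sequence.
dose = segment_dose((3_000, 1_000_000), [(4_000, 2_000_000), (2_000, 1_000_000)])
nsv = normalized_segment_value(dose, [0.9, 1.0, 1.1])
print(round(dose, 3), round(nsv, 3))
```

The same NSV function applies whether the normalizing segment sequence is a single segment or a group; only the density in the denominator of the dose changes.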
In some embodiments, the presence or absence of one partial fetal chromosomal aneuploidy is determined. In other embodiments, the presence or absence of two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty, twenty-five, or more partial fetal chromosomal aneuploidies is determined in a sample. In one embodiment, one segment of interest is selected from any one of chromosomes 1-22, X, and Y. In another embodiment, two or more segments of interest are selected from chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. In one embodiment, the one or more segments of interest selected from chromosomes 1-22, X, and Y comprise at least one, five, ten, 15, 20, 25, or more segments selected from chromosomes 1-22, X, and Y, and the presence or absence of at least one, five, ten, 15, 20, or 25 different partial fetal chromosomal aneuploidies is determined. The different partial fetal chromosomal aneuploidies that can be determined include partial duplications, partial multiplications, partial insertions, and partial deletions. Examples of partial fetal chromosomal aneuploidies include partial monosomies and partial trisomies of the autosomes. Partial monosomies of the autosomes include partial monosomy of chromosome 1, partial monosomy of chromosome 4, partial monosomy of chromosome 5, partial monosomy of chromosome 7, partial monosomy of chromosome 11, partial monosomy of chromosome 15, partial monosomy of chromosome 17, partial monosomy of chromosome 18, and partial monosomy of chromosome 22. Other partial fetal chromosomal aneuploidies that can be determined according to the present method are described below.
In any of the above embodiments, the test sample is a maternal sample selected from blood, plasma, serum, urine, and saliva samples. In some embodiments, the maternal test sample is a plasma sample. The nucleic acid molecules of the maternal sample are a mixture of fetal and maternal cell-free DNA molecules. Sequencing of the nucleic acids can be performed using next-generation sequencing (NGS), as described elsewhere in this application. In some embodiments, the sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing is sequencing-by-ligation. In yet other embodiments, the sequencing is single-molecule sequencing. Optionally, an amplification step is performed prior to sequencing.
Determination of CNVs of clinical disorders
In addition to the early determination of birth defects, the methods described herein can be applied to the determination of any abnormality in the representation of genetic sequences within the genome. A number of abnormalities in the representation of genetic sequences within the genome have been associated with various pathologies. Such pathologies include, but are not limited to, cancer, infectious and autoimmune diseases, diseases of the nervous system, metabolic and/or cardiovascular diseases, and the like.
Accordingly, in various embodiments, use of the methods described herein in the diagnosis, and/or monitoring, and/or treatment of such pathologies is contemplated. For example, the methods can be applied to determining the presence or absence of a disease, monitoring the progression of a disease and/or the efficacy of a treatment regimen, determining the presence or absence of nucleic acids of a pathogen (e.g., a virus), determining chromosomal abnormalities associated with graft-versus-host disease (GVHD), and determining the contribution of individuals in forensic analyses.
CNVs in cancer
It has been shown that plasma and serum DNA from cancer patients contains measurable quantities of tumor DNA, which can be recovered and used as a surrogate source of tumor DNA, and that tumors are characterized by aneuploidy, or by inappropriate numbers of gene sequences or even of entire chromosomes. The determination of a difference in the amount of a given sequence (i.e., a sequence of interest) in a sample from an individual can thus be used in the prognosis or diagnosis of a medical condition. In some embodiments, the present method can be used to determine the presence or absence of a chromosomal aneuploidy in a patient suspected of having, or known to have, cancer.
In certain embodiments, the aneuploidy is characteristic of the genome of the subject and results in a generally increased predisposition to cancer. In certain embodiments, the aneuploidy is characteristic of particular cells (e.g., tumor cells, proto-tumor neoplastic cells, etc.) that are neoplastic or have an increased predisposition to neoplasia. Particular aneuploidies are associated with particular cancers or with predispositions to particular cancers, as described below.
Accordingly, various embodiments of the methods described herein provide a determination of copy number variation of a sequence of interest (e.g., a clinically relevant sequence) in a test sample from a subject, where certain variations in copy number provide an indicator of the presence of, and/or a predisposition to, cancer. In certain embodiments, the sample comprises a mixture of nucleic acids derived from two or more types of cells. In one embodiment, the mixture of nucleic acids is derived from normal cells and cancerous cells of a subject suffering from a medical condition (e.g., cancer).
The development of cancer is often accompanied by an alteration in the number of whole chromosomes, i.e., complete chromosomal aneuploidy, and/or an alteration in the number of segments of chromosomes, i.e., partial aneuploidy, caused by a process known as chromosome instability (CIN) (Thoma et al., Swiss Med Weekly 2011:141:w13170). It is believed that many solid tumors, such as breast cancer, progress from initiation to metastasis through the accumulation of several genetic aberrations (Sato et al., Cancer Res., 50:7184-7189 [1990]; Jongsma et al., J Clin Pathol: Mol Path 55:305-309 [2002]). Such genetic aberrations, as they accumulate, may confer proliferative advantages, genetic instability and the attendant ability to evolve drug resistance rapidly, and enhanced angiogenesis, proteolysis, and metastasis. The genetic aberrations may affect either recessive "tumor suppressor genes" or dominantly acting oncogenes. Deletions and recombination leading to loss of heterozygosity (LOH) are believed to play a major role in tumor progression by uncovering mutated tumor suppressor alleles.
cfDNA has been found in the circulation of patients diagnosed with malignancies including, but not limited to, lung cancer (Pathak et al., Clin Chem 52:1833-1842 [2006]), prostate cancer (Schwartzenbach et al., Clin Cancer Res 15:1032-8 [2009]), and breast cancer (Schwartzenbach et al., available online at breast-cancer-research.com/content/11/5/R71 [2009]). Identification of genomic instabilities associated with cancers, as determined in the circulating cfDNA of cancer patients, is a potential diagnostic and prognostic tool. In one embodiment, the methods described herein are used to determine the CNV of one or more sequences of interest in a sample, e.g., a sample comprising a mixture of nucleic acids derived from a subject suspected of having, or known to have, cancer, e.g., carcinoma, sarcoma, lymphoma, leukemia, germ cell tumors, and blastoma. In one embodiment, the sample is a plasma sample derived (processed) from peripheral blood, which may comprise a mixture of cfDNA derived from normal cells and cancerous cells. In another embodiment, the biological sample in which the presence or absence of a CNV is to be determined is derived from cells of other biological tissues that, if a cancer is present, comprise a mixture of cancerous and non-cancerous cells; the other biological tissues include, but are not limited to, biological fluids such as serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspension, vaginal flow, transcervical lavage, brain fluid, ascites, milk, secretions of the respiratory, intestinal, and genitourinary tracts, and leukapheresis samples, or tissue biopsies, swabs, or smears. In other embodiments, the biological sample is a stool (fecal) sample.
The methods described herein are not limited to the analysis of cfDNA. It will be recognized that similar analyses can be performed on cellular DNA samples.
In various embodiments, the sequences of interest comprise nucleic acid sequences that are known or suspected to play a role in the development and/or progression of cancer. Examples of sequences of interest include nucleic acid sequences, e.g., complete chromosomes and/or segments of chromosomes, that are amplified or deleted in cancerous cells, as described below.
Total CNV number and risk of cancer.
Common cancer SNPs, and by analogy common cancer CNVs, may each confer only a minor increase in disease risk. Collectively, however, they may cause a substantially elevated risk of cancer. In this regard, it is noted that germline gains and losses of large DNA segments have been reported as factors predisposing individuals to neuroblastoma, prostate and colorectal cancer, breast cancer, and BRCA1-associated ovarian cancer (see, e.g., Krepischi et al., Breast Cancer Res., 14:R24 [2012]; Diskin et al., Nature 2009, 459:987-991; Liu et al., Cancer Res 2009, 69:2176-2179; Lucito et al., Cancer Biol Ther 2007, 6:1592-1599; Thean et al., Genes Chromosomes Cancer 2010, 49:99-106; Venkatachalam et al., Int J Cancer 2011, 129:1635-1642; and Yoshihara et al., Genes Chromosomes Cancer 2011, 50:167-177). It is noted that CNVs frequently found in the healthy population (common CNVs) are believed to have a role in cancer etiology (see, e.g., Shlien and Malkin (2009) Genome Medicine, 1(6):62). In one study testing the hypothesis that common CNVs are associated with malignancy (Shlien et al., Proc Natl Acad Sci USA 2008, 105:11264-11269), a map was created of every known CNV whose locus coincides with that of a bona fide cancer-related gene (as catalogued by Higgins et al., Nucleic Acids Res 2007, 35:D721-726). These CNVs were termed "cancer CNVs". In an initial analysis (Shlien et al., Proc Natl Acad Sci USA 2008, 105:11264-11269), 770 healthy genomes were evaluated using the Affymetrix 500K array set, which has an average inter-probe distance of 5.8 kb. Because CNVs are generally thought to be depleted in gene regions (Redon et al., Nature 2006, 444:444-454), it was surprising to find 49 cancer genes that were directly encompassed or overlapped by a CNV in more than one person in a large reference population. In the top ten genes, cancer CNVs could be found in four or more people.
It is thus believed that CNV frequency can be used as a measure of risk of cancer (see, e.g., U.S. Patent Publication No. 2010/0261183 A1). The CNV frequency can be determined simply from the constitutive genome of the organism, or it can represent a fraction derived from one or more tumors (neoplastic cells), if such are present.
In certain embodiments, the number of CNVs in a test sample (e.g., a sample comprising constitutional (germline) nucleic acid) or in a mixture of nucleic acids (e.g., germline nucleic acid and nucleic acid derived from neoplastic cells) is determined using the methods described herein for copy number variations. Identification of an increased number of CNVs in the test sample, e.g., in comparison to a reference value, indicates that the subject is at risk of, or predisposed to, cancer. It will be appreciated that the reference value can vary with a given population. It will also be appreciated that the absolute value of the increase in CNV frequency will vary depending on the resolution of the method used to determine CNV frequency and on other parameters. Typically, an increase in CNV frequency to at least about 1.2 times the reference value is determined to be indicative of risk of cancer (see, e.g., U.S. Patent Publication No. 2010/0261183 A1); for example, an increase in CNV frequency to at least, or about, 1.5 times the reference value or greater, such as 2 to 4 times the reference value, is an indicator of an increased risk of cancer (e.g., as compared to a normal healthy reference population).
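The comparison of a CNV frequency against a reference value can be sketched in Python as follows. This is a hedged illustration only: the function names and the example counts are hypothetical, the CNV frequency is assumed here to be a simple count of CNVs per genome, and the default cutoff of 1.2 is taken from the "at least about 1.2 times" figure in the text rather than from any validated clinical threshold.

```python
def cnv_frequency_fold(test_cnv_count, reference_cnv_count):
    """Fold change of the test sample's CNV count over the reference value."""
    if reference_cnv_count <= 0:
        raise ValueError("reference CNV count must be positive")
    return test_cnv_count / reference_cnv_count


def indicates_cancer_risk(fold, min_fold=1.2):
    """True when the fold change reaches the chosen cutoff (about 1.2 in the text)."""
    return fold >= min_fold


# Hypothetical counts: a reference population value of 12 CNVs per genome.
fold_a = cnv_frequency_fold(18, 12)
fold_b = cnv_frequency_fold(13, 12)
print(fold_a, indicates_cancer_risk(fold_a))
print(round(fold_b, 3), indicates_cancer_risk(fold_b))
```

Because the reference value depends on the population and on the resolution of the CNV-calling method, both the reference count and the cutoff are left as parameters.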
A determination of structural variation in the genome of a mammal, in comparison to a reference value, is also believed to be indicative of risk of cancer. In this context, in one embodiment, the term "structural variation" can be defined as the CNV frequency in a mammal multiplied by the average CNV size (in bp) in the mammal. High structural variation scores will thus result from an increased CNV frequency and/or from the occurrence of large genomic nucleic acid deletions or duplications. Accordingly, in certain embodiments, the number of CNVs in a test sample (e.g., a sample comprising constitutional (germline) nucleic acid) is determined using the methods described herein, so as to determine the size and number of copy number variations. In certain embodiments, a total structural variation score within genomic DNA of greater than about 1 megabase, or greater than about 1.1 megabases, or greater than about 1.2 megabases, or greater than about 1.3 megabases, or greater than about 1.4 megabases, or greater than about 1.5 megabases, or greater than about 1.8 megabases, or greater than about 2 megabases of DNA is indicative of risk of cancer.
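The structural variation score defined above (CNV frequency multiplied by average CNV size) can be sketched in Python as follows. The function names and the CNV sizes are hypothetical, and the CNV frequency is assumed to be the number of CNVs called in the genome; under that assumption the score equals the total number of bases covered by the called CNVs.

```python
def structural_variation_score(cnv_sizes_bp):
    """CNV frequency (count of CNVs) times the average CNV size in bp."""
    count = len(cnv_sizes_bp)
    if count == 0:
        return 0.0
    average_size = sum(cnv_sizes_bp) / count
    return count * average_size


def exceeds_threshold(score_bp, threshold_bp=1_000_000):
    """True when the score is above the cutoff (about 1 megabase in the text)."""
    return score_bp > threshold_bp


# Hypothetical CNV calls for one sample, sizes in base pairs.
sizes = [400_000, 350_000, 500_000]
score = structural_variation_score(sizes)
print(round(score), exceeds_threshold(score))
```

The threshold is a parameter because the text lists several alternative cutoffs, from about 1 megabase to about 2 megabases.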
It is believed that these methods provide a measure of the risk of any cancer including, but not limited to, acute and chronic leukemias, lymphomas, numerous solid tumors of mesenchymal or epithelial tissue, brain, breast, liver, stomach, and colon cancer, B cell lymphoma, lung cancer, bronchus cancer, colorectal cancer, prostate cancer, breast cancer, pancreatic cancer, stomach cancer, ovarian cancer, urinary bladder cancer, brain or central nervous system cancer, peripheral nervous system cancer, esophageal cancer, cervical cancer, melanoma, uterine or endometrial cancer, cancer of the oral cavity or pharynx, liver cancer, kidney cancer, biliary tract cancer, small bowel or appendix cancer, salivary gland cancer, thyroid gland cancer, adrenal gland cancer, osteosarcoma, chondrosarcoma, liposarcoma, testicular cancer, malignant fibrous histiocytoma, and other cancers.
Complete chromosomal aneuploidy.
As indicated above, there is a high frequency of aneuploidy in cancer. In certain studies examining the prevalence of somatic copy number alterations (SCNAs) in cancer, it has been found that one-quarter of the genome of a typical cancer cell is affected either by whole-arm SCNAs or by the whole-chromosome SCNAs of aneuploidy (see, e.g., Beroukhim et al., Nature 463:899-905 [2010]). Whole-chromosome alterations are recurrently observed in several cancer types. For example, the gain of chromosome 8 is seen in 10% to 20% of cases of acute myeloid leukemia (AML), as well as in some solid tumors, including Ewing's sarcoma and desmoid tumors (see, e.g., Barnard et al., Leukemia 10:5-12 [1996]; Maurici et al., Cancer Genet. Cytogenet. 100:106-110 [1998]; Qi et al., Cancer Genet. Cytogenet. 92:147-149 [1996]; Barnard, D.R. et al., Blood 100:427-434 [2002]; and the like). An illustrative, but non-limiting, list of chromosome gains and losses in human cancers is shown in Table 1.
Table 1: Illustrative specific, recurrent chromosome gains and losses in human cancer (see, e.g., Gordon et al. (2012), Nature Rev. Genetics, 13:189-203).
In various embodiments, the methods described herein can be used to detect and/or quantify whole-chromosome aneuploidies that are associated with cancer generally and/or with particular cancers. Thus, for example, in certain embodiments, detection and/or quantification of whole-chromosome aneuploidies characterized by the gains or losses shown in Table 1 is contemplated.
Arm-level chromosomal segment copy number variations.
Multiple studies have reported patterns of arm-level copy number variations across large numbers of cancer specimens (Lin et al., Cancer Res 68, 664-673 (2008); George et al., PLoS ONE 2, e255 (2007); Demichelis et al., Genes Chromosomes Cancer 48:366-380 (2009); Beroukhim et al., Nature 463(7283):899-905 [2010]). It has additionally been observed that the frequency of arm-level copy number variations decreases with the length of the chromosome arm. After adjustment for this trend, the majority of chromosome arms exhibit strong evidence of preferential gain or loss, but rarely both, across multiple cancer lineages (see, e.g., Beroukhim et al., Nature 463(7283):899-905 [2010]).
Accordingly, in one embodiment, the methods described herein are used to determine arm-level CNVs (CNVs comprising one chromosomal arm or substantially one chromosomal arm) in a sample. The CNVs can be determined in a test sample comprising constitutional (germline) nucleic acid, and the arm-level CNVs can be identified in those constitutional nucleic acids. In certain embodiments, arm-level CNVs are identified (if present) in a sample comprising a mixture of nucleic acids (e.g., nucleic acids derived from normal cells and nucleic acids derived from neoplastic cells). In certain embodiments, the sample is derived from a subject suspected of having, or known to have, cancer (e.g., carcinoma, sarcoma, lymphoma, leukemia, germ cell tumors, blastoma, and the like). In one embodiment, the sample is a plasma sample derived (processed) from peripheral blood, which may comprise a mixture of cfDNA derived from normal cells and cancerous cells. In another embodiment, the biological sample in which the presence or absence of a CNV is to be determined is derived from cells of other biological tissues that, if a cancer is present, comprise a mixture of cancerous and non-cancerous cells; the other biological tissues include, but are not limited to, biological fluids such as serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspension, vaginal flow, transcervical lavage, brain fluid, ascites, milk, secretions of the respiratory, intestinal, and genitourinary tracts, and leukapheresis samples, or tissue biopsies, swabs, or smears. In other embodiments, the biological sample is a stool (fecal) sample.
In various embodiments, the CNVs identified as indicative of the presence of a cancer or of an increased risk of a cancer include, but are not limited to, the arm-level CNVs listed in Table 2. As illustrated in Table 2, certain CNVs that comprise a substantial arm-level gain are indicative of the presence of a cancer or of an increased risk of certain cancers. Thus, for example, a gain in 1q is indicative of the presence or increased risk of acute lymphoblastic leukemia (ALL), breast cancer, GIST, HCC, lung NSC, medulloblastoma, melanoma, MPD, ovarian cancer, and/or prostate cancer. A gain in 3q is indicative of the presence or increased risk of esophageal squamous cancer, lung SC, and/or MPD. A gain in 7q is indicative of the presence or increased risk of colorectal cancer, glioma, HCC, lung NSC, medulloblastoma, melanoma, prostate cancer, and/or renal cancer. A gain in 7p is indicative of the presence or increased risk of breast cancer, colorectal cancer, esophageal adenocarcinoma, glioma, HCC, lung NSC, medulloblastoma, melanoma, and/or renal cancer. A gain in 20q is indicative of the presence or increased risk of breast cancer, colorectal cancer, dedifferentiated liposarcoma, esophageal adenocarcinoma, esophageal squamous cancer, glioma, HCC, lung NSC, melanoma, ovarian cancer, and/or renal cancer, and so forth.
Similarly, as illustrated in Table 2, certain CNVs that comprise a substantial arm-level loss are indicative of the presence and/or an increased risk of certain cancers. Thus, for example, a loss in 1p is indicative of the presence or increased risk of gastrointestinal stromal tumor. A loss in 4q is indicative of the presence or increased risk of colorectal cancer, esophageal adenocarcinoma, lung SC, melanoma, ovarian cancer, and/or renal cancer. A loss in 17p is indicative of the presence or increased risk of breast cancer, colorectal cancer, esophageal adenocarcinoma, HCC, lung NSC, lung SC, and/or ovarian cancer, and the like.
Table 2: Significant arm-level chromosomal segment copy number alterations in each of 16 cancer subtypes (breast cancer, colorectal cancer, dedifferentiated liposarcoma, esophageal adenocarcinoma, esophageal squamous cancer, GIST (gastrointestinal stromal tumor), glioma, HCC (hepatocellular carcinoma), lung NSC, lung SC, medulloblastoma, melanoma, MPD (myeloproliferative disorder), ovarian cancer, prostate cancer, acute lymphoblastic leukemia (ALL), and renal cancer) (see, e.g., Beroukhim et al., Nature (2010) 463(7283): 899-905).
The examples of associations between arm-level copy number variations and cancers are intended to be illustrative and not limiting. Other arm-level copy number variations and their cancer associations are known to those of skill in the art.
Smaller (e.g., focal) copy number variations.
As indicated above, in certain embodiments, the methods described herein can be used to determine the presence or absence of a chromosomal amplification. In some embodiments, the chromosomal amplification is the gain of one or more entire chromosomes. In other embodiments, the chromosomal amplification is the gain of one or more segments of a chromosome. In yet other embodiments, the chromosomal amplification is the gain of two or more segments of two or more chromosomes. In various embodiments, the chromosomal amplification can involve the gain of one or more oncogenes.
Dominantly acting genes associated with human solid tumors typically exert their effect by overexpression or altered expression. Gene amplification is a common mechanism leading to upregulation of gene expression. Evidence from cytogenetic studies indicates that significant amplification occurs in over 50% of human breast cancers. Most notably, amplification of the proto-oncogene human epidermal growth factor receptor 2 (HER2), located on chromosome 17 (17q21-q22), results in overexpression of HER2 receptors on the cell surface, leading to excessive and dysregulated signaling in breast cancer and other malignancies (Park et al., Clinical Breast Cancer, 8:392-401 [2008]). A variety of oncogenes have been found to be amplified in other human malignancies. Examples of the amplification of cellular oncogenes in human tumors include amplification of: c-myc in the promyelocytic leukemia cell line HL60 and in small-cell lung carcinoma; N-myc in primary neuroblastomas (stages III and IV), neuroblastoma cell lines, retinoblastoma cell lines and primary tumors, and small-cell lung carcinoma cell lines and tumors; L-myc in small-cell lung carcinoma cell lines and tumors; c-myb in acute myeloid leukemia and in colon carcinoma cell lines; c-erbb in epidermoid carcinoma cells and primary gliomas; c-K-ras-2 in primary carcinomas of the lung, colon, bladder, and rectum; and N-ras in a mammary carcinoma cell line (Varmus H., Ann Rev Genetics, 18:553-612 (1984), cited in Watson et al., Molecular Biology of the Gene (4th ed.; Benjamin/Cummings Publishing Co. 1987)).
Duplications of oncogenes are a common cause of many types of cancer, as is the case with P70-S6 kinase 1 amplification and breast cancer. In such cases the genetic duplication occurs in a somatic cell and affects only the genome of the cancer cells themselves, not the entire organism, much less any subsequent offspring. Other examples of oncogenes that are amplified in human cancers include MYC, ERBB2 (EFGR), CCND1 (cyclin D1), FGFR1 and FGFR2 in breast cancer; MYC and ERBB2 in cervical cancer; HRAS, KRAS and MYB in colorectal cancer; MYC, CCND1 and MDM2 in esophageal cancer; CCNE, KRAS and MET in gastric cancer; ERBB1 and CDK4 in glioblastoma; CCND1, ERBB1 and MYC in head and neck cancer; CCND1 in hepatocellular cancer; MYCB in neuroblastoma; MYC, ERBB2 and AKT2 in ovarian cancer; MDM2 and CDK4 in sarcoma; and MYC in small cell lung cancer. In one embodiment, the present method can be used to determine the presence or absence of amplification of an oncogene associated with a cancer. In certain embodiments, the amplified oncogene is associated with breast cancer, cervical cancer, colorectal cancer, esophageal cancer, gastric cancer, glioblastoma, head and neck cancer, hepatocellular cancer, neuroblastoma, ovarian cancer, sarcoma, and small cell lung cancer.
In one embodiment, the present method can be used to determine the presence or absence of a chromosomal deletion. In some embodiments, the chromosomal deletion is the loss of one or more entire chromosomes. In other embodiments, the chromosomal deletion is the loss of one or more segments of a chromosome. In yet other embodiments, the chromosomal deletion is the loss of two or more segments of two or more chromosomes. The chromosomal deletion can involve the loss of one or more tumor suppressor genes.
Chromosomal deletions involving tumor suppressor genes are thought to play an important role in the development and progression of solid tumors. The retinoblastoma tumor suppressor gene (Rb-1), located in chromosome 13q14, is the most extensively characterized tumor suppressor gene. The Rb-1 gene product, a 105 kDa nuclear phosphoprotein, apparently plays an important role in cell cycle regulation (Howe et al., Proc Natl Acad Sci (USA), 87:5883-5887 [1990]). Altered or lost expression of the Rb protein is caused by inactivation of both alleles of the gene, through either a point mutation or a chromosomal deletion. Rb-1 gene alterations have been found not only in retinoblastomas but also in other malignancies such as osteosarcomas, small cell lung cancer (Rygaard et al., Cancer Res, 50:5312-5317 [1990]) and breast cancer. Restriction fragment length polymorphism (RFLP) studies have indicated that such tumor types have frequently lost heterozygosity at 13q, suggesting that one of the Rb-1 gene alleles has been lost due to a gross chromosomal deletion (Bowcock et al., Am J Hum Genet, 46:12 [1990]). Chromosome 1 abnormalities, including duplications, deletions and unbalanced translocations involving chromosome 6 and other partner chromosomes, indicate that regions of chromosome 1, in particular 1q21-1q32 and 1p11-13, might harbor oncogenes or tumor suppressor genes that are pathogenetically relevant to both the chronic and advanced phases of myeloproliferative neoplasms (Caramazza et al., Eur J Hematol, 84:191-200 [2010]). Myeloproliferative neoplasms are also associated with deletions of chromosome 5. Complete loss or interstitial deletions of chromosome 5 are the most common karyotypic abnormality in myelodysplastic syndromes (MDS). Patients with isolated del(5q)/5q- MDS have a more favorable prognosis than those with additional karyotypic defects, who tend to develop myeloproliferative neoplasms (MPNs) and acute myeloid leukemia. The frequency of unbalanced chromosome 5 deletions has led to the idea that 5q harbors one or more tumor suppressor genes that have fundamental roles in the growth control of hematopoietic stem/progenitor cells (HSCs/HPCs). Cytogenetic mapping of commonly deleted regions (CDRs) centered on 5q31 and 5q32 identified candidate tumor suppressor genes, including the ribosomal subunit RPS14, the transcription factor Egr1/Krox20 and the cytoskeletal remodeling protein alpha-catenin (Eisenmann, Oncogene, 28:3429-3441 [2009]). Cytogenetic and allelotyping studies of fresh tumors and tumor cell lines have shown that allelic loss from several distinct regions on chromosome 3p, including 3p25, 3p21-22, 3p21.3, 3p12-13 and 3p14, is the earliest and most frequent genomic abnormality involved in a wide spectrum of major epithelial cancers of the lung, breast, kidney, head and neck, ovary, cervix, colon, pancreas, esophagus, bladder and other organs. Several tumor suppressor genes have been mapped to the chromosome 3p region, and interstitial deletions or promoter hypermethylation are thought to precede the loss of 3p or of the entire chromosome 3 in the development of carcinomas (Angeloni D., Briefings Functional Genomics, 6:19-39 [2007]).
Neonates and children with Down syndrome (DS) often present with congenital transient leukemia and have an increased risk of acute myeloid leukemia and acute lymphoblastic leukemia. Chromosome 21, which harbors about 300 genes, may be involved in numerous structural aberrations, e.g., translocations, deletions, and amplifications, in leukemias, lymphomas, and solid tumors. Moreover, genes located on chromosome 21 have been identified that play an important role in tumorigenesis. Somatic numerical as well as structural chromosome 21 aberrations are associated with leukemias, and specific genes located in 21q, including RUNX1, TMPRSS2, and TFF, play a role in tumorigenesis (Fonatsch C, Gene Chromosomes Cancer, 49:497-508 [2010]).
In view of the foregoing, in various embodiments the methods described herein can be used to determine segment CNVs that are known to comprise one or more oncogenes or tumor suppressor genes, and/or that are known to be associated with a cancer or an increased risk for a cancer. In certain embodiments, the CNVs can be determined in a test sample comprising constitutional (germline) nucleic acid, and the segments can be identified in those constitutional nucleic acids. In certain embodiments, segment CNVs are identified (if present) in a sample comprising a mixture of nucleic acids (e.g., nucleic acids derived from normal cells and nucleic acids derived from neoplastic cells). In certain embodiments, the sample is derived from a subject suspected of having or known to have cancer (e.g., carcinoma, sarcoma, lymphoma, leukemia, germ cell tumor, blastoma, and the like). In one embodiment, the sample is a plasma sample derived (processed) from peripheral blood, which may comprise a mixture of cfDNA derived from normal and cancerous cells. In another embodiment, the biological sample used to determine whether a CNV is present is derived from cells that, if a cancer is present, comprise a mixture of cancerous and non-cancerous cells from other biological tissues, including but not limited to biological fluids such as serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspension, vaginal flow, transcervical lavage, brain fluid, ascites, milk, secretions of the respiratory, intestinal and genitourinary tracts, and leukapheresis samples, or from tissue biopsies, swabs, or smears. In other embodiments, the biological sample is a stool (fecal) sample.
The CNVs used to determine the presence of a cancer and/or an increased risk for a cancer can comprise amplifications or deletions.
In various embodiments, the CNVs identified as indicative of the presence of a cancer or of an increased risk for a cancer include one or more of the amplifications shown in Table 3.
Table 3: Illustrative, but non-limiting, chromosomal segments characterized by amplifications that are associated with cancers. The cancer types listed are those identified in Beroukhim et al., Nature 18: 463: 899-905.
In certain embodiments, in conjunction with the amplifications described above (herein), or separately, the CNVs identified as indicative of the presence of a cancer or of an increased risk for a cancer include one or more of the deletions shown in Table 4.
Table 4: Illustrative, but non-limiting, chromosomal segments characterized by deletions that are associated with cancers. The cancer types listed are those identified in Beroukhim et al., Nature 18: 463: 899-905.
The aneuploidies identified as characteristic of various cancers (e.g., the aneuploidies identified in Tables 3 and 4) may contain genes known to be implicated in cancer etiologies (e.g., tumor suppressors, oncogenes, etc.). These aneuploidies can also be probed to identify relevant but previously unknown genes.
For example, Beroukhim et al., supra, assessed potential cancer-causing genes within the copy number alterations using GRAIL (Gene Relationships Among Implicated Loci), an algorithm that searches for functional relationships among genomic regions. GRAIL scores each gene in a collection of genomic regions for its 'relatedness' to genes in other regions, based on textual similarity between published abstracts for all papers citing the genes, on the notion that some target genes will function in common pathways. These methods permit identification/characterization of genes not previously associated with the particular cancers at issue. Table 5 illustrates target genes known to be within the identified amplified segments and predicted genes, and Table 6 illustrates target genes known to be within the identified deleted segments and predicted genes.
Table 5: Illustrative, but non-limiting, chromosomal segments and genes known or predicted to be present in regions characterized by amplification in various cancers (see, e.g., Beroukhim et al., supra).
Table 6: Illustrative, but non-limiting, chromosomal segments and genes known or predicted to be present in regions characterized by deletion in various cancers (see, e.g., Beroukhim et al., supra).
In various embodiments, it is contemplated to use the methods identified herein to identify CNVs of segments comprising the amplified regions or genes identified in Table 5, and/or to use the methods identified herein to identify CNVs of segments comprising the deleted regions or genes identified in Table 6.
In one embodiment, the methods described herein provide a means to assess the association between gene amplification and the extent of tumor evolution. Correlation between amplification and/or deletion and the stage or grade of a cancer may be prognostically important, because such information may contribute to the definition of a genetically based tumor grade that would better predict the future course of disease for more advanced tumors having the worst prognosis. In addition, information about early amplification and/or deletion events may be useful in associating those events as predictors of subsequent disease progression.
Gene amplifications and deletions identified by the method can be associated with other known parameters such as tumor grade, medical history, Brd/Urd labeling index, hormonal status, lymph node metastasis, tumor size, survival duration, and other tumor properties available from epidemiological and biostatistical studies. For example, tumor DNA to be tested by the method could include atypical hyperplasia, ductal carcinoma in situ, stage I-III cancer, and metastatic lymph nodes, in order to permit identification of associations between amplifications and deletions and stage. The associations made may make effective therapeutic intervention possible. For example, consistently amplified regions may contain an overexpressed gene whose product may be amenable to therapeutic attack (for example, the growth factor receptor tyrosine kinase p185HER2).
In various embodiments, the methods described herein can be used to identify amplification and/or deletion events associated with drug resistance by determining the copy number variation of nucleic acid sequences from primary cancers compared with those of cells that have metastasized to other sites. If gene amplification and/or deletion is a manifestation of karyotypic instability that allows rapid development of drug resistance, more amplification and/or deletion would be expected in primary tumors from chemoresistant patients than in tumors from chemosensitive patients. For example, if amplification of specific genes is responsible for the development of drug resistance, regions surrounding those genes would be expected to be amplified consistently in tumor cells from chemoresistant patients but not in the primary tumors. Discovery of associations between gene amplification and/or deletion and the development of drug resistance may allow the identification of patients who will or will not benefit from adjuvant therapy.
In a manner similar to that described for determining the presence or absence of complete and/or partial fetal chromosomal aneuploidies in a maternal sample, the methods, apparatus, and systems described herein can be used to determine the presence or absence of complete and/or partial chromosomal aneuploidies in any patient sample comprising nucleic acids, e.g., DNA or cfDNA (including patient samples that are not maternal samples). The patient sample can be any biological sample type described elsewhere herein. Preferably, the sample is obtained by a non-invasive procedure. For example, the sample can be a blood sample, or the serum and plasma fractions thereof. Alternatively, the sample can be a urine sample or a fecal sample. In yet other embodiments, the sample is a tissue biopsy sample. In all cases, the sample comprises nucleic acids, e.g., cfDNA or genomic DNA, which are purified and sequenced using any of the NGS sequencing methods described above.
Both complete and partial chromosomal aneuploidies associated with the formation and progression of cancer can be determined according to the present method.
In various embodiments, when the methods described herein are used to determine the presence of a cancer and/or an increased risk for a cancer, the data can be normalized with respect to the chromosome(s) for which the CNV is determined. In certain embodiments, the data can be normalized with respect to the chromosome arm(s) for which the CNV is determined. In certain embodiments, the data can be normalized with respect to the particular segment(s) for which the CNV is determined.
In addition to the role of CNV in cancer, CNVs have been associated with a growing number of common complex diseases, including human immunodeficiency virus (HIV), autoimmune diseases, and a spectrum of neuropsychiatric disorders.
CNVs in infectious and autoimmune disease
To date, a number of studies have reported associations between CNV in genes involved in inflammation and the immune response and HIV, asthma, Crohn's disease and other autoimmune disorders (Fanciulli et al., Clin Genet 77:201-213 [2010]). For example, CNV in CCL3L1 has been implicated in HIV/AIDS susceptibility (CCL3L1, 17q11.2 deletion), rheumatoid arthritis (CCL3L1, 17q11.2 deletion), and Kawasaki disease (CCL3L1, 17q11.2 duplication); CNV in HBD-2 has been reported to predispose to colonic Crohn's disease (HBD-2, 8p23.1 deletion) and psoriasis (HBD-2, 8p23.1 deletion); CNV in FCGR3B has been shown to predispose to glomerulonephritis in systemic lupus erythematosus (FCGR3B, 1q23 deletion, 1q23 duplication) and to anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis (FCGR3B, 1q23 deletion), and to increase the risk of developing rheumatoid arthritis. At least two inflammatory or autoimmune diseases have been shown to be associated with CNV at different gene loci. For example, Crohn's disease is associated not only with low copy number at HBD-2, but also with a common deletion polymorphism upstream of the IRGM gene, which encodes a member of the p47 immunity-related GTPase family. In addition to the association with FCGR3B copy number, SLE susceptibility has also been reported to be significantly increased among subjects with a lower copy number of complement component C4.
Associations between genomic deletions at the GSTM1 (GSTM1, 1q23 deletion) and GSTT1 (GSTT1, 22q11.2 deletion) loci and an increased risk of allergic asthma have been reported in a number of independent studies. In some embodiments, the methods described herein can be used to determine the presence or absence of CNV associated with inflammation and/or autoimmune diseases. For example, the methods can be used to determine the presence of a CNV in a patient suspected of suffering from HIV, asthma, or Crohn's disease. Examples of CNV associated with such diseases include, without limitation, deletions at 17q11.2, 8p23.1, 1q23, and 22q11.2, and duplications at 17q11.2 and 1q23. In some embodiments, the present method can be used to determine the presence of CNV in genes including, but not limited to, CCL3L1, HBD-2, FCGR3B, GSTM, GSTT1, C4, and IRGM.
CNV and diseases of the nervous system
Associations between de novo and inherited CNV and several common neurological and psychiatric diseases have been reported in autism, schizophrenia and epilepsy, and in some cases of neurodegenerative diseases such as Parkinson's disease, amyotrophic lateral sclerosis (ALS) and autosomal dominant Alzheimer's disease (Fanciulli et al., Clin Genet 77:201-213 [2010]). Cytogenetic abnormalities with duplications at 15q11-q13 have been observed in patients with autism and autism spectrum disorders (ASDs). According to the Autism Genome Project Consortium, 154 CNVs, including several recurrent CNVs, are located either on chromosome 15q11-q13 or at new genomic locations, including chromosome 2p16, 1q21, and 17p12 in a region associated with Smith-Magenis syndrome that overlaps with ASD. Recurrent microdeletions or microduplications on chromosome 16p11.2 have highlighted the observation that de novo CNVs are detected at the loci of genes known to regulate synaptic differentiation and glutamatergic neurotransmitter release, such as SHANK3 (22q13.3 deletion), neurexin 1 (NRXN1, 2p16.3 deletion) and the neuroligins (NLGN4, Xp22.33 deletion). Schizophrenia has also been associated with multiple de novo CNVs. Microdeletions and microduplications associated with schizophrenia contain an overrepresentation of genes belonging to neurodevelopmental and glutamatergic pathways, suggesting that multiple CNVs affecting these genes may contribute directly to the pathogenesis of schizophrenia, e.g., ERBB4, 2q34 deletion; SLC1A3, 5p13.3 deletion; RAPGEF4, 2q31.1 deletion; CIT, 12.24 deletion; and multiple genes with de novo CNV. CNVs have also been associated with other neurological disorders, including epilepsy (CHRNA7, 15q13.3 deletion), Parkinson's disease (SNCA, 4q22 duplication) and ALS (SMN1, 5q12.2-q13.3 deletion; and SMN2 deletion). In some embodiments, the methods described herein can be used to determine the presence or absence of a CNV associated with diseases of the nervous system. For example, the methods can be used to determine the presence of a CNV in a patient suspected of suffering from autism, schizophrenia, epilepsy, a neurodegenerative disease such as Parkinson's disease, amyotrophic lateral sclerosis (ALS) or autosomal dominant Alzheimer's disease. The methods can be used to determine CNV of genes associated with diseases of the nervous system, including without limitation any of the autism spectrum disorders (ASD), schizophrenia, and epilepsy, and CNV of genes associated with neurodegenerative disorders such as Parkinson's disease. Examples of CNV associated with such diseases include, without limitation, duplications at 15q11-q13, 2p16, 1q21, 17p12, 16p11.2, and 4q22, and deletions at 22q13.3, 2p16.3, Xp22.33, 2q34, 5p13.3, 2q31.1, 12.24, 15q13.3, and 5q12.2. In some embodiments, the methods can be used to determine the presence of CNV in genes including, but not limited to, SHANK3, NLGN4, NRXN1, ERBB4, SLC1A3, RAPGEF4, CIT, CHRNA7, SNCA, SMN1, and SMN2.
CNV and metabolic or cardiovascular diseases
Associations between CNVs and metabolic and cardiovascular traits, such as familial hypercholesterolemia (FH), atherosclerosis and coronary artery disease, have been reported in a number of studies (Fanciulli et al., Clin Genet 77:201-213 [2010]). For example, germline rearrangements, mainly deletions, have been observed at the LDLR gene (LDLR, 19p13.2 deletion/duplication) in some FH patients who carry no other LDLR mutations. Another example is the LPA gene, which encodes apolipoprotein(a) (apo(a)), whose plasma concentration is associated with the risk of coronary artery disease, myocardial infarction (MI) and stroke. Plasma concentrations of the apo(a)-containing lipoprotein Lp(a) vary more than 1000-fold between individuals, and 90% of this variability is genetically determined at the LPA locus, with plasma concentration and Lp(a) isoform size being proportional to a highly variable number of 'kringle 4' repeat sequences (range 5 to 50). These data indicate that CNV in at least two genes can be associated with cardiovascular risk. The methods described herein can be used in large studies to search specifically for associations of CNV with cardiovascular disorders. In some embodiments, the present method can be used to determine the presence or absence of a CNV associated with metabolic or cardiovascular disease. For example, the present method can be used to determine the presence of a CNV in a patient suspected of suffering from familial hypercholesterolemia. The methods described herein can be used to determine CNV of genes associated with metabolic or cardiovascular disease, e.g., hypercholesterolemia. Examples of CNV associated with such diseases include, without limitation, the 19p13.2 deletion/duplication of the LDLR gene and amplifications in the LPA gene.
Determination of complete chromosomal aneuploidies in patient samples
In one embodiment, methods are provided for determining the presence or absence of any one or more different complete chromosomal aneuploidies in a patient test sample comprising nucleic acid molecules. The steps of the method comprise: (a) obtaining sequence information for the patient nucleic acids in the patient test sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of said any one or more chromosomes of interest. The normalizing chromosome sequence can be a single chromosome, or it can be a group of chromosomes selected from chromosomes 1-22, X, and Y. The method further uses, in step (c), the number of sequence tags identified for each of said any one or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence to calculate a single chromosome dose for each of said any one or more chromosomes of interest; and, in step (d), compares each single chromosome dose for each of said any one or more chromosomes of interest with a threshold value for each of said any one or more chromosomes of interest, thereby determining the presence or absence of any one or more different complete patient chromosomal aneuploidies in the patient test sample.
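Steps (b) through (d) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the method: the function names, the tag counts, the choice of normalizing chromosomes, and the use of a (lower, upper) threshold pair are all hypothetical, and the sketch assumes that sequence tags have already been mapped and counted per chromosome.

```python
from typing import Dict, Iterable, Tuple


def chromosome_dose(tag_counts: Dict[str, int],
                    chrom_of_interest: str,
                    normalizing_chroms: Iterable[str]) -> float:
    """Step (c): tags on the chromosome of interest divided by tags on its
    normalizing chromosome sequence (a single chromosome or a group)."""
    denominator = sum(tag_counts[c] for c in normalizing_chroms)
    if denominator == 0:
        raise ValueError("normalizing chromosome sequence has no tags")
    return tag_counts[chrom_of_interest] / denominator


def classify_dose(dose: float, thresholds: Tuple[float, float]) -> str:
    """Step (d): compare a dose with a hypothetical (lower, upper) threshold pair."""
    lower, upper = thresholds
    if dose > upper:
        return "gain"
    if dose < lower:
        return "loss"
    return "unaffected"


# Hypothetical tag counts from step (b); real counts come from mapped reads.
counts = {"chr9": 400_000, "chr14": 300_000, "chr21": 105_000}
dose_21 = chromosome_dose(counts, "chr21", ["chr9", "chr14"])
call_21 = classify_dose(dose_21, thresholds=(0.14, 0.16))
```

In practice the thresholds would be chosen per chromosome of interest, as the text requires; the single pair used here only illustrates the comparison.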
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for each chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for each chromosome of interest.
In other embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for each chromosome of interest to the number of sequence tags identified for the normalizing chromosome for each chromosome of interest. In other embodiments, step (c) comprises: calculating a sequence tag ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalizing chromosome sequence to the length of the normalizing chromosome sequence; and calculating a chromosome dose for the chromosome of interest as the ratio of the sequence tag density of the chromosome of interest to the sequence tag density of the normalizing chromosome sequence. The calculation is repeated for each of the chromosomes of interest. Steps (a)-(d) can be repeated for test samples from different patients.
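The length-normalized variant of step (c) can be illustrated as follows. This is a sketch only: the function names are hypothetical, and the tag counts and sequence lengths are made-up round numbers rather than real chromosome lengths.

```python
def tag_density(tag_count: int, length_bp: int) -> float:
    """Sequence tag density: number of tags per base pair of the sequence."""
    if length_bp <= 0:
        raise ValueError("sequence length must be positive")
    return tag_count / length_bp


def density_dose(tags_interest: int, length_interest: int,
                 tags_normalizing: int, length_normalizing: int) -> float:
    """Chromosome dose as the ratio of the tag density of the chromosome of
    interest to the tag density of the normalizing chromosome sequence."""
    normalizing_density = tag_density(tags_normalizing, length_normalizing)
    if normalizing_density == 0:
        raise ValueError("normalizing chromosome sequence has no tags")
    return tag_density(tags_interest, length_interest) / normalizing_density


# Hypothetical values: 90,000 tags over 45 Mb versus 300,000 tags over 100 Mb.
dose = density_dose(90_000, 45_000_000, 300_000, 100_000_000)
```

Dividing by length before taking the ratio keeps a long normalizing sequence from dominating the dose simply because it attracts more tags.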
An example of the embodiment, whereby one or more complete chromosomal aneuploidies are determined in a cancer patient test sample comprising cell-free DNA molecules, comprises: (a) sequencing at least a portion of the cell-free DNA molecules to obtain sequence information for the patient cell-free DNA molecules in the test sample; (b) using the sequence information to identify a number of sequence tags for each of any twenty or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome for each of the twenty or more chromosomes of interest; (c) using the number of sequence tags identified for each of the twenty or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome to calculate a single chromosome dose for each of the twenty or more chromosomes of interest; and (d) comparing each single chromosome dose for each of the twenty or more chromosomes of interest with a threshold value for each of the twenty or more chromosomes of interest, thereby determining the presence or absence of any twenty or more different complete chromosomal aneuploidies in the patient test sample.
In another embodiment, the method for determining the presence or absence of any one or more different complete chromosomal aneuploidies in a patient test sample, as described above, uses a normalizing segment sequence to determine the dose of the chromosome of interest. In this instance, the method comprises: (a) obtaining sequence information for the nucleic acids in the sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of said any one or more chromosomes of interest. The normalizing segment sequence can be a single segment of a chromosome, or it can be a group of segments from one or more different chromosomes. The method further uses, in step (c), the number of sequence tags identified for each of said any one or more chromosomes of interest and the number of sequence tags identified for the normalizing segment sequence to calculate a single chromosome dose for each of said any one or more chromosomes of interest; and, in step (d), compares each single chromosome dose for each of said any one or more chromosomes of interest with a threshold value for each of said one or more chromosomes of interest, thereby determining the presence or absence of one or more different complete chromosomal aneuploidies in the patient sample.
In some embodiments, step (c) includes calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing segment sequence for that chromosome of interest.
In other embodiments, step (c) includes: calculating a sequence tag density ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalizing segment sequence to the length of the normalizing segment sequence; and calculating a chromosome dose for the chromosome of interest as the ratio of the sequence tag density of the chromosome of interest to the sequence tag density of the normalizing segment sequence. The calculation is repeated for each of all the chromosomes of interest. Steps (a)-(d) can be repeated for test samples from different patients.
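The two ways of computing a chromosome dose in step (c) can be illustrated with a minimal sketch. The function name `chromosome_dose` and its parameters are hypothetical and chosen only for illustration; the sketch assumes that tag counts have already been obtained from mapped sequence reads.

```python
def chromosome_dose(tags_interest, tags_normalizing,
                    length_interest=None, length_normalizing=None):
    """Illustrative chromosome dose (hypothetical helper, not from the source).

    Without lengths: ratio of sequence tag counts.
    With both lengths: ratio of sequence tag densities (tags per unit length).
    """
    if tags_normalizing <= 0:
        raise ValueError("normalizing sequence must have a positive tag count")
    if length_interest is None and length_normalizing is None:
        # Simple count ratio: tags on chromosome of interest / tags on normalizer
        return tags_interest / tags_normalizing
    if length_interest is None or length_normalizing is None:
        raise ValueError("provide both lengths or neither")
    if length_interest <= 0 or length_normalizing <= 0:
        raise ValueError("lengths must be positive")
    # Density ratio: each tag count is first related to its sequence length
    density_interest = tags_interest / length_interest
    density_normalizing = tags_normalizing / length_normalizing
    return density_interest / density_normalizing


# Usage with made-up numbers
dose_by_count = chromosome_dose(500, 1000)
dose_by_density = chromosome_dose(500, 1000, 100, 400)
```

The density form differs from the count form only by the constant factor given by the two lengths, so either can be used as long as the test sample and the qualified samples are treated the same way.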
Determining a normalized chromosome value (NCV) provides a means for comparing chromosome doses across different sample sets; the NCV relates the chromosome dose in a test sample to the mean of the corresponding chromosome doses in a set of qualified samples. The NCV is calculated as:
$$NCV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, of the j-th chromosome dose in the set of qualified samples, and $x_{ij}$ is the observed j-th chromosome dose for test sample i.
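As a minimal sketch of the NCV calculation, the observed dose of a test sample is centered on the mean of the qualified-sample doses and scaled by their standard deviation. The function name is hypothetical, and the use of the sample (n-1) standard deviation is an assumption, since the text only speaks of an estimated standard deviation.

```python
from statistics import mean, stdev


def normalized_chromosome_value(test_dose, qualified_doses):
    """Illustrative NCV: (x_ij - mean_j) / sd_j over a set of qualified samples.

    Assumption: the standard deviation is estimated with the sample (n-1)
    estimator; the source does not state which estimator is used.
    """
    if len(qualified_doses) < 2:
        raise ValueError("need at least two qualified samples")
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)
    if sigma == 0:
        raise ValueError("qualified doses have zero variance")
    return (test_dose - mu) / sigma


# Usage with made-up doses for one chromosome of interest
qualified = [1.0, 2.0, 3.0]
ncv = normalized_chromosome_value(4.0, qualified)
```

Because the NCV is expressed in units of standard deviations of the qualified set, values from different chromosomes or sample sets can be placed on a common scale before comparison with a threshold.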
In some embodiments, the presence or absence of one complete chromosomal aneuploidy is determined. In other embodiments, the presence or absence of two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty-one, twenty-two, twenty-three, or twenty-four complete chromosomal aneuploidies is determined in a sample, wherein twenty-two of the complete chromosomal aneuploidies correspond to complete chromosomal aneuploidies of any one or more of the autosomes, and the twenty-third and twenty-fourth chromosomal aneuploidies correspond to complete chromosomal aneuploidies of chromosomes X and Y. Because aneuploidies can include trisomies, tetrasomies, pentasomies, and other polysomies, and because the number of complete chromosomal aneuploidies varies between diseases and between stages of the same disease, the number of complete chromosomal aneuploidies determined according to this method is at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, or more chromosomal aneuploidies. Systematic karyotyping of tumors has revealed that the chromosome number in cancer cells is highly variable, ranging from hypodiploidy (considerably fewer than 46 chromosomes) to tetraploidy and hypertetraploidy (up to 200 chromosomes) (Storchova and Kuffer, J Cell Sci, 121:3859-3866 [2008]). In some embodiments, the method includes determining the presence or absence of up to 200 or more chromosomal aneuploidies in a sample from a patient suspected of having or known to have cancer (e.g., colon cancer). These chromosomal aneuploidies include the loss of one or more complete chromosomes (hypodiploidy) and the gain of complete chromosomes, including trisomies, tetrasomies, pentasomies, and other polysomies. As described elsewhere in this application, gains and/or losses of chromosome segments can also be determined. The method is suitable for determining the presence or absence of different aneuploidies in samples from patients suspected of having or known to have a cancer as described elsewhere in this application.
In some embodiments, any one of chromosomes 1-22, X, and Y can be the chromosome of interest in determining, as described above, the presence or absence of any one or more different complete chromosomal aneuploidies in a patient test sample. In other embodiments, two or more chromosomes of interest are selected from any two or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. In one embodiment, the one or more chromosomes of interest selected from chromosomes 1-22, X, and Y include at least twenty chromosomes selected from chromosomes 1-22, X, and Y, and the presence or absence of at least twenty different complete chromosomal aneuploidies is determined. In other embodiments, the one or more chromosomes of interest selected from chromosomes 1-22, X, and Y are all of chromosomes 1-22, X, and Y, and the presence or absence of complete chromosomal aneuploidies of all of chromosomes 1-22, X, and Y is determined. The different complete chromosomal aneuploidies that can be determined include complete chromosomal monosomy of any one or more of chromosomes 1-22, X, and Y; complete chromosomal trisomy of any one or more of chromosomes 1-22, X, and Y; complete chromosomal tetrasomy of any one or more of chromosomes 1-22, X, and Y; complete chromosomal pentasomy of any one or more of chromosomes 1-22, X, and Y; and other complete chromosomal polysomies of any one or more of chromosomes 1-22, X, and Y.
Determination of partial chromosomal aneuploidies in patient samples
In another embodiment, methods are provided for determining the presence or absence of any one or more different partial chromosomal aneuploidies in a patient test sample comprising nucleic acid molecules. The steps of the method include: (a) obtaining sequence information for the patient nucleic acids in the sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more segments of each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of the one or more segments of each of the one or more chromosomes of interest. The normalizing segment sequence can be a single segment of one chromosome, or it can be a group of segments from one or more different chromosomes. The method further includes: (c) using the number of sequence tags identified for each of the one or more segments of each of the one or more chromosomes of interest and the number of sequence tags identified for each normalizing segment sequence to calculate a single segment dose for each of the one or more segments of each of the one or more chromosomes of interest; and (d) comparing each single segment dose for each of the one or more segments of each of the one or more chromosomes of interest to a threshold value for each of those chromosome segments, thereby determining the presence or absence of one or more different partial chromosomal aneuploidies in the sample.
In some embodiments, step (c) includes calculating a single segment dose for each of the one or more segments of each of the one or more chromosomes of interest as the ratio of the number of sequence tags identified for that segment to the number of sequence tags identified for the normalizing segment sequence for that segment.
In other embodiments, step (c) includes: calculating a sequence tag density ratio for a segment of interest by relating the number of sequence tags obtained for the segment of interest to the length of the segment of interest, and relating the number of tags for the corresponding normalizing segment sequence to the length of the normalizing segment sequence; and calculating a segment dose for the segment of interest as the ratio of the sequence tag density of the segment of interest to the sequence tag density of the normalizing segment sequence. The calculation is repeated for each of all the segments of interest. Steps (a)-(d) can be repeated for test samples from different patients.
Determining a normalized segment value (NSV) provides a means for comparing segment doses across different sample sets; the NSV relates the segment dose in a test sample to the mean of the corresponding segment doses in a set of qualified samples. The NSV is calculated as:
$$NSV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, of the j-th segment dose in the set of qualified samples, and $x_{ij}$ is the observed j-th segment dose for test sample i.
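The NSV calculation can be sketched for several segments at once. The names below, including the example segment labels, are hypothetical, and the sample (n-1) standard deviation is again an assumption rather than something stated in the text.

```python
from statistics import mean, stdev


def normalized_segment_values(test_doses, qualified_doses):
    """Illustrative NSV per segment: (x_ij - mean_j) / sd_j.

    test_doses:      {segment_name: observed dose in the test sample}
    qualified_doses: {segment_name: list of doses in the qualified samples}
    Assumption: sample (n-1) standard deviation; segment names are made up.
    """
    nsv = {}
    for segment, dose in test_doses.items():
        reference = qualified_doses[segment]
        if len(reference) < 2:
            raise ValueError(f"need at least two qualified doses for {segment}")
        sigma = stdev(reference)
        if sigma == 0:
            raise ValueError(f"zero variance in qualified doses for {segment}")
        nsv[segment] = (dose - mean(reference)) / sigma
    return nsv


# Usage with made-up segment labels and doses
test = {"segment_A": 4.0, "segment_B": 10.0}
qualified = {"segment_A": [1.0, 2.0, 3.0], "segment_B": [8.0, 10.0, 12.0]}
values = normalized_segment_values(test, qualified)
```

Each segment is normalized against its own qualified-sample distribution, which mirrors the per-segment thresholds described in step (d).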
In some embodiments, the presence or absence of one partial chromosomal aneuploidy is determined. In other embodiments, the presence or absence of two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty, twenty-five, or more partial chromosomal aneuploidies is determined in a sample. In one embodiment, one segment of interest is selected from any one of chromosomes 1-22, X, and Y. In other embodiments, two or more segments of interest are selected from any two or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. In one embodiment, the one or more segments of interest selected from chromosomes 1-22, X, and Y include at least one, five, ten, 15, 20, 25, 50, 75, 100, or more segments selected from chromosomes 1-22, X, and Y, and the presence or absence of at least one, five, ten, 15, 20, 25, 50, 75, 100, or more different partial chromosomal aneuploidies is determined. The different partial chromosomal aneuploidies that can be determined include partial duplications, partial multiplications, partial insertions, and partial deletions.
The sample used to determine the presence or absence of a chromosomal aneuploidy (partial or complete) in a patient can be any of the biological samples described elsewhere in this application. The type of sample used to determine aneuploidy in a patient will depend on the type of disease the patient is known or suspected to have. For example, a stool sample can be selected as the DNA source for determining the presence or absence of aneuploidies associated with colorectal cancer. The method is also applicable to the tissue samples described herein. Preferably, the sample is a biological sample obtained by non-invasive means, such as a plasma sample. As described elsewhere in this application, the nucleic acids in the patient sample can be sequenced using next-generation sequencing (NGS). In some embodiments, the sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing is sequencing-by-ligation. In yet other embodiments, the sequencing is single-molecule sequencing. Optionally, an amplification step is performed before sequencing.
In some embodiments, the presence or absence of aneuploidy is determined in a patient suspected of having a cancer as described elsewhere in this application, such as lung cancer, breast cancer, kidney cancer, head and neck cancer, ovarian cancer, cervical cancer, colon cancer, pancreatic cancer, esophageal cancer, bladder cancer, cancers of other organs, and hematologic cancers. Hematologic cancers include cancers of the bone marrow, blood, and lymphatic system, the lymphatic system including the lymph nodes, lymphatic vessels, tonsils, thymus, spleen, and lymphoid tissue of the digestive tract. Leukemias and myelomas, which begin in the bone marrow, and lymphomas, which begin in the lymphatic system, are the most common types of hematologic cancer.
The determination of the presence or absence of one or more chromosomal aneuploidies in a patient sample can be made, without limitation, to determine a patient's susceptibility to a particular cancer, to determine the presence or absence of a cancer of interest as part of routine screening in patients known or not known to be susceptible to that cancer, to provide a prognosis for the disease, to assess the need for adjuvant therapy, and to determine the progression or regression of the disease.
Genetic counselling
Fetal chromosomal abnormalities are a major cause of miscarriage, congenital anomalies, and perinatal death (Wellesley et al., Europ. J. Human Genet., 20:521-526 [2012]; Nagaoka et al., Nature Rev. Genetics 13:493-504 [2012]). Since the introduction of amniocentesis, followed by the introduction of chorionic villus sampling (CVS), pregnant women have had options for obtaining information about fetal chromosomal status (ACOG Practice Bulletin No. 77: Obstet Gynecol 109:217-227 [2007]). When sufficient tissue is obtained, cytogenetic karyotyping of the fetal cells or chorionic villi obtained from these procedures gives very high diagnostic sensitivity and specificity (about 99%) in most cases (Hahnemann and Vejerslev, Prenat Diagn., 17:801-820 [1997]; NICHD National Registry for Amniocentesis Study, JAMA 236:1471-1476 [1976]). However, these procedures also pose risks to the fetus and the pregnant woman (Odibo et al., Obstet Gynecol 112:813-819 [2008]; Odibo et al., Obstet Gynecol 111:589-595 [2008]).
To mitigate these risks, a series of prenatal screening algorithms has been developed to stratify women by their likelihood of carrying a fetus with the most common trisomies: trisomy 21 (T21, Down syndrome), trisomy 18 (T18, Edwards syndrome), and, to a lesser extent, trisomy 13 (T13, Patau syndrome). Screening typically involves measuring multiple biochemical analytes in maternal serum at different time points, combined with ultrasonographic measurement of fetal nuchal translucency (NT) and with other maternal factors (such as age), to produce a risk score. As these screens have been developed and refined over the years, and depending on when screening is administered (first trimester only, or second trimester, sequential, or fully integrated) and how it is administered (serum only, or serum combined with NT), a menu of options has emerged with different detection rates (65% to 90%) and high screen-positive rates (5%) (ACOG Practice Bulletin No. 77: Obstet Gynecol 109:217-227 [2007]).
For patients, the information or "risk score" that results from this multi-step process can be confusing and provoke anxiety, particularly when comprehensive counselling is lacking. Ultimately, when a woman makes her decision, the results are weighed against the risk of miscarriage from an invasive procedure. A better non-invasive means of obtaining more definitive information about fetal chromosomal status would facilitate decision-making in this context. Such an improved non-invasive means of obtaining more definitive information about fetal chromosomal status is believed to be provided by the methods described herein.
In various embodiments, genetic counselling is contemplated as a part of the use of the assays described herein, particularly in a clinical setting. Conversely, the aneuploidy detection methods described herein can be included as an option offered in the context of prenatal care and associated genetic counselling.
Accordingly, in various embodiments, the methods described herein can be offered as a primary screen (e.g., for women with an a priori pregnancy risk) or as a secondary screen for those women who screen positive on a "conventional" screen. In certain embodiments, it is contemplated that the non-invasive prenatal testing (NIPT) methods described herein additionally include a genetic counselling component, and/or are optionally or explicitly incorporated into genetic counselling and pregnancy "management".
For example, in certain embodiments, the woman has one or more a priori pregnancy risks. Such risks include, but are not limited to, one or more of the following:
1) Maternal age greater than 35 years, although it is noted that about 80% of children born with Down syndrome are born to women under 35 years of age.
2) A previous fetus/child with an autosomal trisomy. Depending on the type of trisomy, whether the previous pregnancy ended in spontaneous miscarriage, the maternal age at the first occurrence, and the maternal age at the subsequent prenatal diagnosis, the recurrence risk is believed to be about 1.6 times to about 8.2 times the maternal age risk.
3) A previous fetus/child with a sex chromosome abnormality. Not all sex chromosome abnormalities are of maternal origin, and not all carry a risk of recurrence. When they recur, the recurrence risk is about 1.6 times to about 1.5 times the maternal age risk.
4) A parent who is a carrier of a chromosomal translocation.
5) A parent who is a carrier of a chromosomal inversion.
6) Parental aneuploidy or mosaicism.
7) Use of certain assisted reproductive technologies.
In such cases, subject to the various considerations described herein, the mother, e.g., in consultation with a physician, genetic counsellor, or the like, can be offered the methods described herein for non-invasively determining the presence or absence of a fetal aneuploidy (e.g., trisomy 21, trisomy 18, trisomy 13, monosomy X, etc.). In this regard, it is noted that the methods described herein are believed to be effective even in the first trimester of pregnancy. Thus, in certain embodiments, use of the NIPT methods described herein is contemplated as early as 8 weeks, and in various embodiments at about 10 weeks or later.
In certain embodiments, the methods described herein can be offered as a secondary screen to those women who screen positive on a "conventional" screen. For example, in certain embodiments, the pregnant woman may present with a structural abnormality, such as a fetal cystic hygroma or an increased nuchal translucency, as detected by ultrasonography. Typically, ultrasound examination for structural defects is performed at 18 to 22 weeks and, particularly when irregularities are observed, can be coupled with fetal echocardiography. It is contemplated that when an abnormality is observed (e.g., the "conventional" screen is positive), the mother, e.g., in consultation with a physician, genetic counsellor, or the like, can be offered the methods described herein for non-invasively determining the presence or absence of a fetal aneuploidy (e.g., trisomy 21, trisomy 18, trisomy 13, monosomy X, etc.).
Accordingly, in various embodiments, genetic counselling is contemplated in which the (NIPT) assays described herein are offered as a part of prenatal care, pregnancy management, and/or the development/design of a birth plan. By offering NIPT as a secondary screen to those women who screen positive on conventional screening (or who have other a priori risks), the number of unnecessary amniocentesis and CVS procedures is expected to decrease. However, because informed consent is an important component of NIPT, the need for genetic counselling increases.
Because a positive NIPT result (using the methods described herein) is more similar to a positive amniocentesis or CVS result, women should be given the opportunity, in genetic counselling before this testing, to decide whether they want this degree of information. Pre-test genetic counselling for NIPT should also include discussion of, and recommendations for, confirming abnormal test results via CVS, amniocentesis, cordocentesis, etc. (depending on gestational age), so that the expected timing of results can be given due consideration for post-test planning. According to the statement of the National Society of Genetic Counselors (NSGC, USA) on the subject (see, e.g., Devers et al., Noninvasive Prenatal Testing/Noninvasive Prenatal Diagnosis: the position of the National Society of Genetic Counselors (by NSGC Public Policy Committee). NSGC Position Statements 2012; Benn et al., Prenat Diagn, 31:519-522 [2011]), NIPT does not currently screen for all chromosomal or genetic conditions, so it may not replace standard risk assessment and prenatal diagnosis. It is contemplated that patients with other factors suggestive of a chromosomal abnormality (e.g., certain abnormal ultrasound findings) should receive genetic counselling in which they are offered the option of conventional confirmatory diagnostic testing, regardless of NIPT results. In genetic counselling, women should also be made aware that, for some patients, NIPT results may not be informative.
Compared with amniocentesis, NIPT using the methods described herein is perhaps more similar to CVS, in that the aneuploidy detected is generally representative of the fetal genome but in some cases may represent confined placental aneuploidy or confined placental mosaicism (CPM). In CVS results today, CPM occurs in about 1% to 2% of cases, and some women undergo amniocentesis after CVS, at a later gestational age, to clearly distinguish placental aneuploidy from fetal aneuploidy. As NIPT is implemented more widely, CPM cases can be expected to produce a certain number of positive NIPT results that may not subsequently be confirmed by invasive procedures (particularly amniocentesis). Again, in various embodiments, it is contemplated that this information is presented to the patient in the context of genetic counselling (e.g., by a physician, genetic counsellor, etc.).
It will be appreciated that, in various embodiments, part of the genetic counselling may be to recommend confirmatory modalities, to advise on the timing associated with different risk levels and with the different confirmatory modalities, and to provide input on the value of the information provided by such confirmatory methods, particularly in the context of timing choices during the pregnancy. In various embodiments, genetic counselling can also establish a plan for monitoring the pregnancy (e.g., follow-up ultrasound examinations, additional physician visits, etc.) and for setting up a series of decision points at appropriate times. In addition, genetic counselling can advise on and help develop a birth plan, which can cover, for example, the place of delivery (e.g., home, hospital, specialized facility, etc.), the personnel involved at the place of delivery, the third-party care available for the infant, and so on.
Although the discussion above has focused on the methods described herein as a part of prenatal diagnosis (and possibly as a second-tier tool), as clinical experience accumulates, and if results from studies comparing them with conventional screening are successful, the NIPT methods described herein may replace existing screening protocols and possibly serve as the primary tool.
It is also contemplated that the methods described herein will find use in multiple-gestation pregnancies.
Typically, it is contemplated that genetic counselling (e.g., as described above) can be provided by a physician (e.g., a primary care physician, obstetrician, etc.) and/or by a genetic counsellor or other qualified medical professional. In certain embodiments, the counselling is provided face to face; it will be appreciated, however, that in some cases the counselling can be provided by remote access (e.g., by text, telephone, mobile phone application, tablet application, the internet, etc.).
It will also be appreciated that, in certain embodiments, genetic counselling or a portion thereof can be delivered by a computer system. For example, a "smart advice" system can be provided that supplies genetic counselling information (e.g., as described above) in response to test results, in response to instructions from a health care provider, and/or in response to queries (e.g., queries from a patient). In certain embodiments, the information will be specific to clinical information provided by the physician, the health care system, and/or the patient. In certain embodiments, the information can be provided iteratively. Thus, for example, a patient can pose "what if" queries and the system can return information such as diagnostic options, risk factors, timing, and the implications of different outcomes.
In certain embodiments, the information can be provided in a transitory manner (e.g., presented on a computer screen). In certain embodiments, the information can be provided in a non-transitory manner. Thus, for example, the information can be printed (e.g., as a menu of options and/or recommendations, optionally with associated timing, etc.) and/or stored on computer-readable media (e.g., magnetic media such as a local hard drive or a server; optical media; flash memory; etc.).
It will be appreciated that such systems are typically configured to provide sufficient security to maintain patient privacy, e.g., according to prevailing standards in the industry.
The foregoing discussion of genetic counselling is intended to be illustrative and not limiting. Genetic counselling is a well-established branch of medicine, and incorporating a counselling component into the assays described herein is within the skill of the practitioner. Moreover, it is recognized that, as the field develops, the nature of genetic counselling and of the associated information and recommendations is likely to change.
Determination of fetal fraction
Methods for determining fetal fraction are disclosed in U.S. Patent Application Publication 2010-0010085 (117.201), U.S. Patent Application Publication 2011-0201507 (120.201), U.S. Patent Application No. 13/365,240 (filed February 2, 2012), and U.S. Patent Application No. 13/445,778 (filed April 12, 2012). A full description of the techniques for determining fetal fraction can be found in these documents.
The methods described herein allow the fetal fraction to be determined in a sample that comprises a mixture of fetal and maternal nucleic acids or, more generally, a mixture of nucleic acids derived from two different genomes. For the purposes of this discussion, maternal and fetal nucleic acids will be described, but it should be understood that any two genomes can be substituted. In some embodiments, the fetal fraction is determined at the same time as the presence or absence of a copy number variation (e.g., an aneuploidy). As described more fully below, the same set of tags from a test sample can be used to determine both the fetal fraction and the copy number variation.
Methods for quantifying fetal fraction rely on differences between the fetal genome and the maternal genome. In some embodiments described herein, the fetal fraction of the sample DNA is determined from multiple DNA sequence reads at sequence sites known to harbor one or more polymorphisms. In some embodiments, the polymorphic sites or target nucleic acid sequences are identified while the sequence tags are aligned to one another and/or to a reference sequence. In certain embodiments, the fetal fraction of the sample DNA is determined by considering copy number information for chromosomes or chromosome segments in which a copy number difference exists between the maternal DNA and the fetal chromosomes. In such embodiments, the fetal fraction of the sample DNA is determined by considering the relative amounts of sample DNA from the mother and the fetus, where the chromosome or segment has been determined or is known to have a copy number variation. In such embodiments, the fetal fraction can be calculated using the copy number variation between the maternal DNA and the fetal chromosomes. For this purpose, the methods and apparatus can calculate a normalized chromosome value (NCV) as described herein, or a similar metric.
Some methods are limited by fetal sex; for example, methods that quantify the fetal fraction from the presence of sequences specific to the Y chromosome, or from the chromosome dose of the X chromosome in a male fetus. In certain embodiments, fetal DNA is quantified using fetal targets that have no maternal counterpart, e.g., Y-chromosome sequences (Fan et al., Proc Natl Acad Sci 105:16266-16271 [2008] and U.S. Patent Application Publication No. 2010/0112590, filed November 6, 2009, Lo et al.) or the RHD1 gene in an RhD-negative mother, or targets that differ from the maternal background by multiple DNA base pairs. Other methods are independent of fetal sex and rely on polymorphic differences between the fetal and maternal genomes.
Allelic imbalances at polymorphisms can be detected and quantified by different techniques. In some embodiments, digital PCR is used to determine the allelic imbalance at a polymorphism, e.g., a SNP on mRNA. Alternatively, capillary gel electrophoresis is used to detect differences in the size of polymorphic regions, e.g., in the case of STRs.
In some embodiments, epigenetic differences, such as differential methylation of promoter regions, can be detected and used, alone or in combination with digital PCR, to determine differences between the fetal genome and the maternal genome and to quantify the fetal fraction (Tong et al., Clin Chem 56:90-98 [2010]). Variations of epigenetic methods are also included, such as methylation-based DNA discrimination (Erich et al., AJOG 204:205.e1-205.e11 [2011]). In some embodiments, the fetal fraction is estimated by sequencing one or more preselected panels of polymorphic sequences, as described elsewhere in this application.
In addition to sequencing panels of preselected polymorphic sequences as described elsewhere in this application, methods for quantifying fetal DNA in maternal plasma include, but are not limited to, real-time qPCR, mass spectrometry, digital PCR (including microfluidic digital PCR), and capillary gel electrophoresis.
The discussion in this section begins by considering fetal fraction as determined from one or more polymorphisms, or other information, from chromosomes or chromosome segments that do not harbor (or have been determined not to harbor) a copy number variation. Fetal fraction determined by such techniques is referred to herein as non-CNV fetal fraction or "NCNFF". A later part of this section describes techniques for calculating fetal fraction from chromosomes or chromosome segments that have been determined to harbor a copy number variation. Fetal fraction determined by such techniques is referred to herein as CNV fetal fraction or "CNFF".
In some embodiments, fetal fraction is evaluated by determining the relative contribution of a polymorphic allele derived from the fetal genome and the contribution of the corresponding polymorphic allele derived from the maternal genome. In some embodiments, fetal fraction is evaluated by determining the relative contribution of a polymorphic allele derived from the fetal genome compared with the total contribution of the corresponding polymorphic allele derived from both the fetal and maternal genomes.
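The two ways of expressing fetal fraction described above differ only in the denominator. A minimal sketch in Python, assuming hypothetical tag counts for the fetal and maternal contributions at a single polymorphic allele; the function names and counts are illustrative and do not come from the disclosure:

```python
def fetal_to_maternal_ratio(fetal_tags: int, maternal_tags: int) -> float:
    """Fetal contribution relative to the maternal contribution."""
    if maternal_tags <= 0:
        raise ValueError("maternal_tags must be positive")
    return fetal_tags / maternal_tags


def fetal_to_total_ratio(fetal_tags: int, maternal_tags: int) -> float:
    """Fetal contribution relative to the combined fetal + maternal contribution."""
    total = fetal_tags + maternal_tags
    if total <= 0:
        raise ValueError("total tag count must be positive")
    return fetal_tags / total


# Hypothetical counts, for illustration only.
r_maternal = fetal_to_maternal_ratio(50, 950)  # 50 / 950
r_total = fetal_to_total_ratio(50, 950)        # 50 / 1000
```

The two values are related by r_total = r_maternal / (1 + r_maternal), so at low fetal fractions they are close but not identical.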
Polymorphisms can be indicative, informative, or both. An indicative polymorphism indicates the presence of fetal cell-free DNA ("cfDNA") in a maternal sample. An informative polymorphism (such as an informative SNP) yields information about the fetus, for example the presence or absence of a disease, a genetic abnormality, or any other biological information such as gestational stage or sex. In this context, informative polymorphisms are those that identify differences between the sequences of the mother and the fetus, and they are used in the methods disclosed herein. In other words, informative polymorphisms are polymorphisms in a nucleic acid sample that have different sequences (that is, different alleles), where those sequences are present in different amounts. In some of the methods herein, the differing amounts of the sequences/alleles are used to determine fetal fraction, particularly NCNFF.
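As an illustration of "different alleles present in different amounts", the following Python sketch flags a site as informative when both alleles are observed and one is clearly in the minority. The function name and both thresholds are assumptions made for this example; the disclosure does not specify them.

```python
def is_informative(count_a: int, count_b: int,
                   min_minor_fraction: float = 0.01,
                   max_minor_fraction: float = 0.25) -> bool:
    """Return True when both alleles are seen in clearly unequal amounts.

    The thresholds are illustrative assumptions, not values from the disclosure.
    """
    total = count_a + count_b
    if total == 0:
        return False
    minor_fraction = min(count_a, count_b) / total
    return min_minor_fraction <= minor_fraction <= max_minor_fraction


# Hypothetical tag counts for three sites.
site_unequal = is_informative(980, 20)   # minor allele at 2% of tags
site_single = is_informative(1000, 0)    # only one allele observed
site_balanced = is_informative(510, 490) # alleles at similar levels
```

In this sketch, a site with roughly equal counts is treated as a maternal heterozygote rather than as informative, which is a simplifying assumption.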
Polymorphic sites include, but are not limited to, single nucleotide polymorphisms (SNPs), tandem SNPs, small-scale multi-base deletions or insertions (IN-DELs, also called deletion insertion polymorphisms or DIPs), multi-nucleotide polymorphisms (MNPs), short tandem repeats (STRs), restriction fragment length polymorphisms (RFLPs), or any polymorphism having any other allelic sequence variation in a chromosome. In some embodiments, each target nucleic acid comprises two tandem SNPs. Tandem SNPs are analyzed as a single unit (for example, as a short haplotype) and are provided herein as sets of two SNPs.
In some embodiments, fetal fraction is determined by statistical and approximation techniques that use polymorphic sites to evaluate the relative contributions of the fetal and maternal genomes. Fetal fraction can also be determined by electrophoresis, in which certain types of polymorphic sites are separated electrophoretically and used to identify the relative contribution of a polymorphic allele derived from the fetal genome and the relative contribution of the corresponding polymorphic allele derived from the maternal genome.
In one embodiment, shown in the process flow chart of Fig. 6, fetal fraction is determined by method 600, which comprises first obtaining, in operation 610, a test sample comprising a mixture of fetal and maternal nucleic acids, enriching the nucleic acid mixture for polymorphic target nucleic acids in operation 620, sequencing the enriched nucleic acid mixture in operation 630, and simultaneously determining fetal fraction and aneuploidy in the sample in operation 640.
Fig. 7 shows a process flow chart for some embodiments. Fetal fraction is determined by: (i) obtaining a maternal plasma sample in operation 710, (ii) purifying the cfDNA in the sample in operation 720, (iii) amplifying polymorphic nucleic acids in operation 730, (iv) sequencing the mixture using massively parallel sequencing in operation 740, and (v) calculating fetal fraction in operation 760. In another embodiment, fetal fraction is determined by: (i) obtaining a maternal plasma sample in operation 710, (ii) purifying the cfDNA in the sample in operation 720, (iii) amplifying polymorphic nucleic acids in operation 730, (iv) separating the nucleic acids by size using electrophoresis in operation 750, and (v) calculating fetal fraction in operation 770.
In one embodiment, shown in the process flow chart of Fig. 8, fetal fraction is determined by: (i) obtaining a sample comprising a mixture of fetal and maternal nucleic acids in operation 810, (ii) amplifying the sample in operation 820, (iii) enriching the sample in operation 830 by combining the amplified sample with an unamplified sample of the original mixture, (iv) purifying the sample in operation 840, and (v) sequencing the sample in operation 850 so that fetal fraction can be determined by different methods, with fetal fraction and the presence or absence of aneuploidy determined simultaneously in operation 860.
In another embodiment, shown in the process flow chart of Fig. 9, fetal fraction is determined by: (i) obtaining a sample comprising a mixture of fetal and maternal nucleic acids in operation 910, (ii) purifying the sample in operation 920, (iii) amplifying a portion of the sample in operation 930, (iv) enriching the sample in operation 940 by combining the amplified sample with the purified but unamplified portion of the original sample of the original mixture, and (v) sequencing the sample in operation 950 to determine fetal fraction, with fetal fraction and the presence or absence of aneuploidy determined simultaneously, by different methods, in operation 960.
In another embodiment, shown in the process flow chart of Figure 10, fetal fraction is determined by: (i) obtaining a sample comprising a mixture of fetal and maternal nucleic acids in operation 1010, (ii) purifying the sample in operation 1020, (iii) amplifying a first portion of the sample in operation 1040, (iv) preparing a sequencing library of the amplified portion of the sample in operation 1050, (v) preparing a second sequencing library of the purified but unamplified portion of the sample in operation 1030, (vi) enriching the mixture in operation 1060 by combining the two sequencing libraries, and (vii) sequencing the mixture in operation 1070, with fetal fraction and the presence or absence of aneuploidy determined simultaneously, in different ways, in operation 1080.
In another embodiment, fetal fraction is determined by: (i) obtaining a sample comprising a mixture of fetal and maternal nucleic acids, (ii) purifying the sample, (iii) amplifying the sample using labeled primers, and (iv) sequencing the sample using electrophoresis, thereby determining fetal fraction in a different manner.
In another embodiment, fetal fraction is determined by: (i) obtaining a sample comprising a mixture of fetal and maternal nucleic acids, (ii) purifying the sample, (iii) optionally enriching the sample by amplifying a portion of the sample, and (iv) sequencing the sample, thereby determining fetal fraction in a different manner.
Purification of the initially obtained sample, an amplified sample, an amplified and enriched sample, or other nucleic acid samples relevant to the methods disclosed herein (such as in operations 720, 840, 920, and 1020) can be accomplished by any conventional technique. To separate cfDNA from cells, fractionation, centrifugation (such as density gradient centrifugation), DNA-specific precipitation, or high-throughput cell sorting and/or separation methods can be used. Optionally, the resulting sample can be fragmented before purification or amplification. If the sample used comprises cfDNA, fragmentation may not be required, because cfDNA is naturally fragmented, with fragment sizes of about 150 bp to 200 bp.
In some of the procedures above, selective amplification and enrichment are used to increase the relative amount of nucleic acid from the regions where polymorphisms are located. Similar results can be obtained by deep sequencing of selected regions of the genome, particularly the regions where polymorphisms are located.
Amplification
After the sample has been obtained and purified, a portion of the purified mixture of fetal and maternal nucleic acids (such as cfDNA) is used to amplify multiple polymorphic target nucleic acids, each comprising a polymorphic site. In some implementations, amplification of the target nucleic acids in the fetal and maternal nucleic acid mixture is accomplished by any method that uses the polymerase chain reaction (PCR) or variations of that method, including but not limited to asymmetric PCR, helicase-dependent amplification, hot-start PCR, qPCR, solid-phase PCR, and touchdown PCR. In some embodiments, the sample can be partially amplified to assist in determining fetal fraction. In some embodiments, no amplification is performed. The disclosed amplification methods and other amplification techniques can be used in operations 730, 820, 930, and 1040.
Amplification of SNPs
A large number of nucleic acid primers are available for amplifying DNA fragments containing SNPs, and their sequences can be obtained, for example, from databases known to those of ordinary skill in the art. Additional primers can also be designed, for example using a method similar to that disclosed in: Vieux, E.F., Kwok, P-Y, and Miller, R.D., BioTechniques (June 2002), vol. 32, supplement: "SNPs: Discovery of Marker Disease", pp. 28-32.
Sequence-specific primers are selected to amplify the target nucleic acids. In one embodiment, target nucleic acids comprising a polymorphic site are amplified as amplicons. In another embodiment, target nucleic acids comprising two or more polymorphic sites (such as two tandem SNPs) are amplified as amplicons. An amplified target nucleic acid amplicon of at least about 100 bp comprises a single SNP or tandem SNPs. Primers for amplifying target sequences comprising tandem SNPs can be designed to encompass both SNP sites.
Amplification of STRs
A number of nucleic acid primers are available for amplifying DNA fragments containing STRs, and such sequences can be obtained from databases known to those skilled in the art.
In some embodiments, a portion of the fetal and maternal nucleic acid mixture is used as a template for amplifying target nucleic acids having at least one STR. A comprehensive catalog of references, facts, and sequence information on STRs, published PCR primers, common multiplex systems, and related population data is compiled in STRBase, which can be accessed via the internet at cstl.nist.gov/strbase. Sequence information from GenBank, at ncbi.nlm.nih.gov/genbank, for commonly used STR loci is also accessible through STRBase.
STR multiplex systems allow multiple non-overlapping loci to be amplified simultaneously in a single reaction, substantially improving throughput. Because STRs are highly polymorphic, most individuals are heterozygous. STRs can be used in electrophoretic analysis, as described further below.
Amplification can also be performed using miniSTRs to generate reduced-size amplicons, so as to distinguish STR alleles that are shorter in length. The methods of the disclosed embodiments encompass determining the fraction of fetal nucleic acid in a maternal sample that has been enriched for target nucleic acids each comprising a miniSTR. The method comprises quantifying at least one fetal and one maternal allele at a polymorphic miniSTR, which can be amplified to generate amplicons whose length is about the size of circulating fetal DNA fragments. Any one pair of miniSTR primers, or a combination of two or more pairs of miniSTR primers, can be used to amplify at least one miniSTR.
Enrichment
The sample being enriched with may include:The blood plasma separate section of blood sample;What is extracted from blood plasma is purified CfDNA sample;The sequencing library sample prepared from the purified mixture of fetus and maternal nucleic acids;Etc..
In certain embodiments, before to genome sequencing, DNA is included for full-length genome unspecific enrichment The sample of molecule mixture, i.e. before sequencing, carry out whole genome amplification.Unspecific enrichment mixtures of nucleic acids refers to pair The genomic DNA fragment of DNA sample carries out the whole genome amplification DNA sample available for by before identification polymorphism is sequenced Improve the level of sample DNA.Unspecific enrichment can be one of two genomes (fetus and parent) present in sample Selective enrichment.
In other embodiments, the cfDNA in sample is through specific enrichment.Specific enrichment refers to genomic samples pin Enrichment to particular sequence (such as polymorphism target sequence), it is complete by the method including specific amplification target nucleic acid sequence Into target nucleic acid sequence includes polymorphic site.
In other embodiments, the mixtures of nucleic acids being present in sample is for the polymorphic of each self-contained polymorphic site Target nucleic acid is enriched with.Such enrichment can be used in operation 620.The mixture of enriches fetal and maternal nucleic acids includes, from Target sequence is expanded in a part for the nucleic acid that initial maternal sample is included, and by part or whole amplified production and initially The remainder combination of maternal sample, such as in operation 830 and 940.
In still another embodiment, the sample being enriched with is prepared by the purified mixture of fetus and maternal nucleic acids Sequencing library sample.The amount of the amplified production for being enriched with initial sample is selected to obtain the sequence for being enough to be used in determining fetus fraction Column information.At least about 3% in the sum of sequence label obtained from sequencing, at least about 5%, at least about 7%, at least about 10%, At least about 15%, at least about 20%, at least about 25%, at least about 30% or more is mapped to determine fetus fraction.
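The proportion of tags used for fetal fraction can be checked with simple arithmetic. The following Python sketch computes the share of all sequence tags that mapped to the polymorphic targets and compares it with a chosen minimum. The function names and counts are hypothetical, and the 3% default is just the lowest of the values listed above.

```python
def target_tag_fraction(target_tags: int, total_tags: int) -> float:
    """Fraction of all sequence tags that mapped to polymorphic target sequences."""
    if total_tags <= 0:
        raise ValueError("total_tags must be positive")
    if not 0 <= target_tags <= total_tags:
        raise ValueError("target_tags must be between 0 and total_tags")
    return target_tags / total_tags


def meets_minimum(target_tags: int, total_tags: int, minimum: float = 0.03) -> bool:
    """Check the mapped-to-target share against a chosen minimum (assumed 3% here)."""
    return target_tag_fraction(target_tags, total_tags) >= minimum


# Hypothetical run: 10 million tags, 500,000 of which map to the targets.
enough = meets_minimum(500_000, 10_000_000)
too_few = meets_minimum(100_000, 10_000_000)
```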
In one embodiment, in Fig. 10, enrichment comprises amplifying, in operation 1040, the target nucleic acids contained in a portion of the original sample of the purified mixture of fetal and maternal nucleic acids (for example, cfDNA that has been purified from a maternal plasma sample). Similarly, in operation 1050, a primary sequencing library is prepared using a portion of the purified cfDNA that has not been amplified. In operation 1060, a portion of the target library is combined with the primary library generated from the unamplified nucleic acid mixture, and the fetal and maternal nucleic acid mixture contained in the two libraries is sequenced in operation 1070. The enriched library can comprise at least about 5%, at least about 10%, at least about 15%, at least about 20%, or at least about 25% of the target library. In operation 1080, the data from the sequencing run are analyzed and, as described for operation 640 of the embodiment depicted in Fig. 6, fetal fraction and the presence or absence of aneuploidy are determined simultaneously.
Sequencing technologies
The enriched mixture of fetal and maternal nucleic acids is sequenced. The sequence information needed to determine fetal fraction can be obtained using any known DNA sequencing method, many of which are described elsewhere in this application. Such sequencing methods include next-generation sequencing (NGS), Sanger sequencing, Helicos True Single Molecule Sequencing (tSMS™), 454 sequencing (Roche), SOLiD technology (Applied Biosystems), Single Molecule Real-Time (SMRT™) sequencing technology (Pacific Biosciences), nanopore sequencing, chemical-sensitive field-effect transistor (chemFET) arrays, the Halcyon Molecular method using transmission electron microscopy (TEM), Ion Torrent single-molecule sequencing, sequencing by hybridization, and the like. In some embodiments, massively parallel sequencing is used. In one embodiment, Illumina sequencing-by-synthesis with reversible-terminator-based sequencing chemistry is used. In certain embodiments, partial sequencing is used.
The sequenced DNA is mapped to a reference genome. The reference genome can be an artificial genome or a human reference sequence genome. Such reference genomes include: an artificial target sequence genome comprising polymorphic target nucleic acid sequences; an artificial SNP reference genome; an artificial STR reference genome; an artificial tandem STR reference genome; the human reference sequence genome NCBI36/hg18, which is available on the internet at genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105; and the human reference sequence genome NCBI36/hg18 together with an artificial target sequence genome that includes the target polymorphic sequences, such as a SNP genome. Some mismatches are allowed in the mapping process.
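To illustrate mapping to an artificial target sequence genome with a mismatch allowance, the following Python sketch slides each read along a small set of allele-specific reference sequences and keeps the best hit within a mismatch budget. It is a toy brute-force matcher, not the aligner used in the disclosure; the reference names, sequences, and the `max_mismatches` parameter are hypothetical.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, references: Dict[str, str],
             max_mismatches: int = 2) -> Optional[str]:
    """Return the best-matching reference name, or None if unmapped or ambiguous."""
    best_name = None
    best_dist = max_mismatches + 1
    ambiguous = False
    for name, seq in references.items():
        for start in range(len(seq) - len(read) + 1):
            d = hamming(read, seq[start:start + len(read)])
            if d > max_mismatches:
                continue  # outside the mismatch allowance
            if d < best_dist:
                best_name, best_dist, ambiguous = name, d, False
            elif d == best_dist and name != best_name:
                ambiguous = True  # equally good hit on another reference
    return None if ambiguous else best_name


# Hypothetical artificial references: one sequence per allele of a single SNP.
refs = {
    "snp1_A": "ACGTACGTAAGGCTA",
    "snp1_G": "ACGTACGTAGGGCTA",
}
hit = map_read("CGTAGGGC", refs, max_mismatches=1)       # read covers the SNP
no_hit = map_read("TTTTTTTT", refs, max_mismatches=1)    # unrelated read
ambiguous_hit = map_read("ACGTACGT", refs, max_mismatches=1)  # misses the SNP
```

Reads that do not span the polymorphic position match both allele references equally well, so this sketch discards them rather than assigning them to an allele.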
In one embodiment, the sequencing information obtained in operation 630 is analyzed and two determinations are made simultaneously: fetal fraction, and the presence or absence of aneuploidy.
As described above, multiple sequence tags are obtained for each sample. In certain embodiments, by mapping reads to the reference genome, at least about 3 × 10^6 sequence tags, at least about 5 × 10^6 sequence tags, at least about 8 × 10^6 sequence tags, at least about 10 × 10^6 sequence tags, at least about 15 × 10^6 sequence tags, at least about 20 × 10^6 sequence tags, at least about 30 × 10^6 sequence tags, at least about 40 × 10^6 sequence tags, or at least about 50 × 10^6 sequence tags comprising reads of between 20 bp and 40 bp are obtained per sample. In one embodiment, all sequence reads are mapped to all regions of the reference genome. In one embodiment, the tags comprising reads mapped to all regions (for example, all chromosomes) of the human reference sequence genome are counted, and fetal aneuploidy, that is, the over- or under-representation of a sequence of interest (such as a chromosome or a portion thereof), is determined in the mixed DNA sample; and the tags comprising reads mapped to the artificial target sequence genome are counted to determine fetal fraction. This method does not require differentiation between the maternal genome and the fetal genome.
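The two tallies described above, tags per chromosome for the aneuploidy analysis and tags per target sequence for fetal fraction, can be kept from a single pass over the mapped tags. A minimal Python sketch, assuming each mapped tag is represented as a hypothetical `(reference, name)` pair; the labels and example tags are invented for illustration:

```python
from collections import Counter
from typing import Iterable, Tuple

HUMAN = "human"    # hypothetical label for the human reference sequence genome
TARGET = "target"  # hypothetical label for the artificial target sequence genome


def count_tags(mapped_tags: Iterable[Tuple[str, str]]) -> Tuple[Counter, Counter]:
    """Split tag counts by reference: per chromosome and per target sequence."""
    chromosome_counts: Counter = Counter()
    target_counts: Counter = Counter()
    for reference, name in mapped_tags:
        if reference == HUMAN:
            chromosome_counts[name] += 1
        elif reference == TARGET:
            target_counts[name] += 1
        else:
            raise ValueError(f"unknown reference label: {reference!r}")
    return chromosome_counts, target_counts


# Hypothetical mapped tags, for illustration only.
tags = [
    (HUMAN, "chr21"), (HUMAN, "chr21"), (HUMAN, "chr1"),
    (TARGET, "snp1_A"), (TARGET, "snp1_A"), (TARGET, "snp1_G"),
]
chromosome_counts, target_counts = count_tags(tags)
```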
In one embodiment, the data from the sequencing run are analyzed to determine simultaneously fetal fraction and the presence or absence of aneuploidy.
Sequencing library
In some embodiments, some or all of the amplified polymorphic sequences are used to prepare a sequencing library for sequencing in the parallel manner described. In one embodiment, the library is prepared for sequencing-by-synthesis using Illumina's reversible-terminator-based sequencing chemistry. The library can be prepared from purified cfDNA and can comprise at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, or at least about 50% amplified product.
Sequencing a library produced by any of the methods depicted in Figure 11 provides sequence tags derived from the amplified target nucleic acids and tags derived from the original, unamplified maternal sample. Fetal fraction is calculated from the number of tags mapped to the artificial reference genome.
Calculating fetal fraction
As explained above, after the relevant DNA has been sequenced, computational methods are used to map or align the sequences to specific genes, chromosomes, alleles, or other structures. A number of computer algorithms exist for aligning sequences, including but not limited to BLAST (Altschul et al., 1990), BLITZ (MPsrch) (Sturrock & Collins, 1993), FASTA (Pearson & Lipman, 1988), BOWTIE (Langmead et al., Genome Biology 10:R25.1-R25.10 [2009]), and ELAND (Illumina, Inc., San Diego, CA, USA). In some embodiments, the sequences of the bins are found in nucleic acid databases known to those skilled in the art, including GenBank, dbEST, dbSTS, EMBL (European Molecular Biology Laboratory), and DDBJ (DNA Data Bank of Japan). BLAST or similar tools can be used to search the identified sequences against the sequence databases, and the search hits can be used to sort the identified sequences into the appropriate bins. Alternatively, a Bloom filter or similar set membership tester can be used to align reads to the reference genome. See U.S. Patent Application No. 61/552,374, filed October 27, 2011, which is incorporated herein by reference in its entirety.
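A Bloom filter answers "is this item possibly in the set?" with no false negatives and a small, tunable false-positive rate. The following Python sketch shows the idea for k-mers of a reference sequence. It is a generic illustration, not the method of the referenced application; the class, the `kmers` helper, the reference string, and the parameter values are all assumptions made for this example.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative parameters)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every length-k substring of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical reference fragment, indexed by its 8-mers.
reference = "ACGTACGTAGGGCTAACCGGTT"
bloom = BloomFilter()
for kmer in kmers(reference, 8):
    bloom.add(kmer)

read_is_candidate = "GTAGGGCT" in bloom  # this 8-mer occurs in the reference
```

A positive answer only marks a read as a candidate; a practical pipeline would still need to confirm the position of the match, because a Bloom filter stores membership, not location.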
As mentioned, according to some embodiments (particularly NCNFF techniques), fetal fraction is determined from the total number of tags that map to a first allele and the total number of tags that map to a second allele at an informative polymorphic site (such as a SNP) contained in the reference genome. Informative polymorphic sites are identified by the difference in the allelic sequences and by the amount of each possible allele. Fetal cfDNA is often present at a concentration that is <10% of the maternal cfDNA. Thus, relative to the major contribution of the maternal alleles, there is a minor contribution of alleles to the mixture of fetal and maternal nucleic acids that can be attributed to the fetus. Alleles derived from the maternal genome are referred to herein as major alleles, and alleles derived from the fetal genome are referred to herein as minor alleles. Alleles represented by similar levels of mapped sequence tags represent maternal alleles. The results of an exemplary multiplex amplification of target nucleic acids comprising SNPs derived from a maternal plasma sample are shown in Figure 12.
As used herein, the terms "chromosomal aneuploidy" and "complete chromosomal aneuploidy" refer to an imbalance of genetic material caused by the loss or gain of a whole chromosome, and include germline aneuploidy and mosaic aneuploidy. The terms "partial aneuploidy" and "partial chromosomal aneuploidy" refer to an imbalance of genetic material caused by the loss or gain of part of a chromosome (for example, partial monosomy and partial trisomy), and encompass imbalances resulting from translocations, deletions, and insertions.
Estimating fetal fraction using allele ratios
For each of the two alleles at a predetermined polymorphic site, the relative abundance of fetal cfDNA in the maternal sample can be determined as a parameter of the total number of unique sequence tags mapped to the target nucleic acid sequence in the reference genome. In one embodiment, the fraction of fetal nucleic acid in the mixture of fetal and maternal nucleic acids is calculated for each informative allele (allele x) as follows:
and the fetal fraction for the sample is calculated as the average of the fetal fractions of all informative alleles. Optionally, the fraction of fetal nucleic acid in the mixture of fetal and maternal nucleic acids is calculated for each informative allele (allele x) as follows:
to compensate for the presence of two fetal alleles, one of which is masked by the maternal background.
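The two equations referred to above are not reproduced in this text. The following Python sketch is therefore an assumption rather than the disclosed formula: it takes the per-allele fetal fraction to be the ratio of minor-allele (fetal) tags to major-allele (maternal) tags, expressed as a percentage, with an optional factor of 2 for the variant that compensates for the masked fetal allele, and averages over the informative sites. All names and counts are hypothetical.

```python
from statistics import mean
from typing import Iterable, Tuple


def allele_fetal_fraction(count_a: int, count_b: int,
                          double_minor: bool = False) -> float:
    """Percent fetal fraction at one informative site.

    Assumes the minor allele is fetal-specific and the major allele is maternal.
    double_minor applies the assumed factor of 2 for the masked fetal allele.
    """
    minor, major = sorted((count_a, count_b))
    if major <= 0:
        raise ValueError("at least one allele must have tags")
    factor = 2 if double_minor else 1
    return 100.0 * factor * minor / major


def sample_fetal_fraction(sites: Iterable[Tuple[int, int]],
                          double_minor: bool = False) -> float:
    """Average the per-site estimates over all informative sites."""
    return mean(allele_fetal_fraction(a, b, double_minor) for a, b in sites)


# Hypothetical tag counts (allele 1, allele 2) at two informative SNPs.
informative_sites = [(900, 45), (30, 1000)]
ff_percent = sample_fetal_fraction(informative_sites)
ff_percent_doubled = sample_fetal_fraction(informative_sites, double_minor=True)
```

Sorting the two counts lets the sketch handle sites where either allele happens to be the fetal-specific one.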
Determining fetal fraction by sequencing predetermined polymorphic sequences
More details on determining fetal fraction by sequencing predetermined polymorphic sequences are provided below.
Referring to Fig. 7, operations 720, 730, 740, and 760 show a process flow for determining the fraction of fetal nucleic acid in a maternal biological sample by massively parallel sequencing of PCR-amplified polymorphic target nucleic acids. In step 710, a maternal sample comprising a mixture of fetal and maternal nucleic acids is obtained from a subject. The sample is a maternal sample obtained from a pregnant female (such as a pregnant woman). Other maternal samples may come from mammals, for example cows, horses, dogs, or cats. If the subject is human, the sample can be obtained in the first or second trimester of pregnancy. Any maternal biological sample can be used as a source of fetal and maternal nucleic acids, whether contained in cells or cell-free. In certain embodiments, it is advantageous to obtain a maternal sample comprising cell-free nucleic acids (cfDNA). Preferably, the maternal biological sample is a biological fluid sample. Preferably, the maternal sample is selected from blood, plasma, serum, urine, and saliva. In some embodiments, the maternal sample is a plasma sample.
In step 720, the mixture of fetal and maternal nucleic acids is further processed from a fraction of the sample, such as plasma, to obtain a sample comprising a purified mixture of fetal and maternal nucleic acids (such as cfDNA). Methods for processing maternal samples are described elsewhere herein.
In step 730, a portion of the purified mixture of fetal and maternal cfDNA is used to amplify multiple polymorphic target nucleic acids, each comprising a polymorphic site. In certain embodiments, the target nucleic acids each comprise a SNP. In other embodiments, the target nucleic acids each comprise a pair of tandem SNPs. In still other embodiments, each target nucleic acid comprises an STR. Polymorphic sites contained in the target nucleic acids include, without limitation, single nucleotide polymorphisms (SNPs), tandem SNPs, small-scale multi-base deletions or insertions (called IN-DELs, also known as deletion insertion polymorphisms or DIPs), multi-nucleotide polymorphisms (MNPs), short tandem repeats (STRs), restriction fragment length polymorphisms (RFLPs), or polymorphisms comprising any other sequence variation in a chromosome. In certain embodiments, the polymorphic sites encompassed by the method are located on autosomes, so that fetal fraction can be determined independently of fetal sex. Polymorphisms associated with chromosomes other than chromosomes 13, 18, 21, and Y can be used in the methods described herein.
Polymorphisms can be indicative, informative, or both. An indicative polymorphism indicates the presence of fetal cell-free DNA in a maternal sample. For example, the more there is of a specific genetic sequence (such as a SNP), the more readily a method can translate its presence into a specific color intensity, color density, or some other property that is detectable and measurable and that indicates the presence, absence, and amount of a specific DNA segment and/or a specific polymorphism (such as a SNP of the embryo). With respect to the present invention, the methods are not carried out using all possible SNPs in a genome, but using polymorphisms that are preselected as likely to identify sequence differences between the mother and the fetus (that is, informative polymorphisms). Informative polymorphic sites are identified by the difference in the sequence of the alleles and by the amount of each of the possible alleles. Any polymorphic site that can be encompassed by the reads generated by the sequencing methods described herein can be used to determine fetal fraction.
A portion of the mixture of fetal and maternal nucleic acids (such as cfDNA) in the sample is used as a template for amplifying target nucleic acids comprising at least one SNP. In certain embodiments, each target nucleic acid comprises a single (that is, one) SNP. Target nucleic acid sequences comprising SNPs can be obtained from publicly accessible databases, which include but are not limited to the human SNP database at Web address wi.mit.edu, the NCBI dbSNP home page at Web address ncbi.nlm.nih.gov, Web address lifesciences.perkinelmer.com, Life Technologies™ (Applied Biosystems, Carlsbad, CA) at Web address appliedbiosystems.com, the Celera human SNP database at Web address celera.com, and the SNP database of the Genome Analysis Group (GAN) at Web address gan.iarc.fr. In one embodiment, the SNPs chosen for enriching fetal and maternal cfDNA are selected from the group of 92 individual identification SNPs (IISNPs) described by Pakstis et al. (Pakstis et al., Hum Genet 127:315-324 [2010]), which have been shown to vary very little in frequency across populations (Fst < 0.06) and to be highly informative around the world, with an average heterozygosity ≥ 0.4. SNPs encompassed by the method of the invention include linked and unlinked SNPs. Other useful SNPs that are applicable or can be applied to the methods described herein are disclosed in U.S. Patent Application Nos. 20080070792, 20090280492, 20080113358, 20080026390, 20080050739, 20080220422, and 20080138809, which are incorporated herein by reference in their entirety. Each target nucleic acid comprises at least one polymorphic site, such as a single SNP, that differs from the polymorphic site present on another target nucleic acid, so as to generate a panel of polymorphic sites, such as SNPs, that contains a sufficient number of polymorphic sites of which at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 25, at least 30, at least 40, or more are informative. For example, a panel of SNPs can be configured to comprise at least one informative SNP. In one embodiment, the SNPs targeted for amplification are selected from rs560681, rs1109037, rs9866013, rs13182883, rs13218440, rs7041158, rs740598, rs10773760, rs4530059, rs7205345, rs8078417, rs576261, rs2567608, rs430046, rs9951171, rs338882, rs10776839, rs9905977, rs1277284, rs258684, rs1347696, rs508485, rs9788670, rs8137254, rs3143, rs2182957, rs3739005, and rs530022. In one embodiment, the panel of SNPs comprises at least 3, at least 5, at least 10, at least 13, at least 15, at least 20, at least 25, at least 30, or more SNPs. In one embodiment, the panel of SNPs comprises rs560681, rs1109037, rs9866013, rs13182883, rs13218440, rs7041158, rs740598, rs10773760, rs4530059, rs7205345, rs8078417, rs576261, and rs2567608. Polymorphic nucleic acids comprising SNPs can be amplified using the exemplary primer pairs provided in Example 24 and disclosed as SEQ ID NOs: 63-118.
In other embodiments, each target nucleic acid comprises two or more SNPs, that is, each target nucleic acid comprises tandem SNPs. Preferably, each target nucleic acid comprises two tandem SNPs. Tandem SNPs are analyzed as a single unit (for example, as a short haplotype) and are provided herein as sets of two SNPs. To identify suitable tandem SNP sequences, the International HapMap Consortium database can be searched (The International HapMap Project, Nature 426:789-796 [2003]). The database is available on the World Wide Web at hapmap.org. In one embodiment, the tandem SNPs targeted for amplification are selected from the following sets of tandem SNP pairs: rs7277033-rs2110153; rs2822654-rs1882882; rs368657-rs376635; rs2822731-rs2822732; rs1475881-rs7275487; rs1735976-rs2827016; rs447340-rs2824097; rs418989-rs13047336; rs987980-rs987981; rs4143392-rs4143391; rs1691324-rs13050434; rs11909758-rs9980111; rs2826842-rs232414; rs1980969-rs1980970; rs9978999-rs9979175; rs1034346-rs12481852; rs7509629-rs2828358; rs4817013-rs7277036; rs9981121-rs2829696; rs455921-rs2898102; rs2898102-rs458848; rs961301-rs2830208; rs2174536-rs458076; rs11088023-rs11088024; rs1011734-rs1011733; rs2831244-rs9789838; rs8132769-rs2831440; rs8134080-rs2831524; rs4817219-rs4817220; rs2250911-rs2250997; rs2831899-rs2831900; rs2831902-rs2831903; rs11088086-rs2251447; rs2832040-rs11088088; rs2832141-rs2246777; rs2832959-rs9980934; rs2833734-rs2833735; rs933121-rs933122; rs2834140-rs12626953; rs2834485-rs3453; rs9974986-rs2834703; rs2776266-rs2835001; rs1984014-rs1984015; rs7281674-rs2835316; rs13047304-rs13047322; rs2835545-rs4816551; rs2835735-rs2835736; rs13047608-rs2835826; rs2836550-rs2212596; rs2836660-rs2836661; rs465612-rs8131220; rs9980072-rs8130031; rs418359-rs2836926; rs7278447-rs7278858; rs385787-rs367001; rs367001-rs386095; rs2837296-rs2837297; and rs2837381-rs4816672.
In one embodiment, a portion of the mixture of fetal and maternal nucleic acids (e.g. cfDNA) in the sample is used as template for amplifying target nucleic acids that comprise at least one STR. In certain embodiments, each target nucleic acid comprises a single (i.e. one) SNP. STR loci are found on almost every chromosome in the genome and can be amplified using a variety of polymerase chain reaction (PCR) primers. Tetranucleotide repeats have been preferred among forensic scientists because of their fidelity in PCR amplification, although some trinucleotide and pentanucleotide repeats are also in use. A comprehensive listing of references, facts and sequence information on STRs, published PCR primers, common multiplex systems and related population data is compiled in STRBase, which can be accessed via the World Wide Web at ibm4.carb.nist.gov:8800/dna/home.htm. Sequence information from GenBank (http://www2.ncbi.nlm.nih.gov/cgi-bin/genbank) for commonly used STR loci is also accessible through STRBase. Commercial kits available for the analysis of STR loci generally provide all of the necessary reaction components and the controls required for amplification. STR multiplex systems allow the simultaneous amplification of multiple non-overlapping loci in a single reaction, which substantially increases throughput. With multicolor fluorescent detection, even overlapping loci can be multiplexed. The polymorphic nature of tandemly repeated DNA sequences that are widespread throughout the human genome has made them important genetic markers for gene mapping studies, linkage analysis and human identity testing.
Because of the high polymorphism of STRs, most individuals will be heterozygous, i.e. most people will possess two alleles (versions), one inherited from each parent, each with a different number of repeats. PCR products comprising the STRs can be separated and detected using manual, semi-automated or automated methods. Semi-automated systems are gel-based and combine electrophoresis, detection and analysis into one unit. On a semi-automated system, gel assembly and sample loading are still manual processes; however, once the samples are loaded onto the gel, electrophoresis, detection and analysis proceed automatically. Data collection occurs in "real time" as the fluorescently labeled fragments migrate past the detector at a fixed point, and the fragments can be viewed as they are collected. As the name implies, capillary electrophoresis is carried out in a microcapillary tube rather than between glass plates. Once the samples, gel polymer and buffer are loaded onto the instrument, the capillary is filled with gel polymer and the sample is loaded automatically. The non-maternally inherited fetal STR sequence will therefore differ in the number of repeats from the maternal sequence. Amplification of these STR sequences can produce one or two major amplification products corresponding to the maternal alleles (and the maternally inherited fetal allele) and one minor product corresponding to the non-maternally inherited fetal allele. This technique was first reported in 2000 (Pertl et al., Human Genetics 106:45-49 [2002]) and was subsequently developed by simultaneously identifying multiple different STR regions using real-time PCR (Liu et al., Acta Obstet Gynecol Scand 86:535-541 [2007]). PCR amplicons of various sizes have been used to discern the respective size distributions of circulating fetal and maternal DNA species, and have shown that the fetal DNA molecules in maternal plasma are generally shorter than the maternal DNA molecules (Chan et al., Clin Chem 50:88-92 [2004]). Size fractionation of circulating fetal DNA has confirmed that the average length of circulating fetal DNA fragments is <300 bp, while maternal DNA has been estimated to be between about 0.5 Kb and 1 Kb (Li et al., Clin Chem 50:1002-1011 [2004]).
The invention provides a method for determining the fraction of fetal nucleic acid in a maternal sample, comprising determining the copy number of at least one fetal and one maternal allele at a polymorphic miniSTR site; the miniSTR can be amplified to generate amplicons whose length is about the size of the circulating fetal DNA fragments, e.g. less than about 250 base pairs. In one embodiment, the fetal fraction can be determined by a method that comprises sequencing at least a portion of amplified polymorphic target nucleic acids, each of which comprises a miniSTR. Fetal and maternal alleles at an informative STR site are discerned by their different lengths, i.e. numbers of repeats, and the fetal fraction can be calculated as the percent ratio of the amount of fetal to maternal alleles at that site. The method can use one informative miniSTR or a combination of any number of informative miniSTRs to determine the fraction of fetal nucleic acid. In one embodiment, the method comprises determining the copy number of at least one fetal and at least one maternal allele at at least one polymorphic miniSTR that is amplified to generate amplicons of less than about 300 bp, less than about 250 bp, less than about 200 bp, less than about 150 bp, less than about 100 bp or less than about 50 bp. In another embodiment, the amplicons generated by amplifying the miniSTRs are less than about 300 bp. In another embodiment, the amplicons generated by amplifying the miniSTRs are less than about 250 bp. In another embodiment, the amplicons generated by amplifying the miniSTRs are less than about 200 bp. Amplification of the informative allele includes using miniSTR primers, which allow the amplification of reduced-size amplicons to discern STR alleles that are less than about 500 bp, less than about 450 bp, less than about 400 bp, less than about 350 bp, less than about 300 base pairs (bp), less than about 250 bp, less than about 200 bp, less than about 150 bp, less than about 100 bp or less than about 50 bp. The reduced-size amplicons generated using the miniSTR primers are known as miniSTRs, and they are identified by the marker name corresponding to the locus to which they have been mapped. In one embodiment, the miniSTR primers include the miniSTR primers that have permitted the maximum reduction in amplicon size for all 13 CODIS STR loci, in addition to D2S1338, Penta D and Penta E, found in commercially available STR kits (Butler et al., J Forensic Sci 48:1054-1064 [2003]); the miniSTR loci that are unlinked to the CODIS markers, as described by Coble and Butler (Coble and Butler, J Forensic Sci 50:43-53 [2005]); and other miniSTRs that have been characterized at NIST. Information regarding the miniSTRs characterized at NIST is accessible via the World Wide Web at cstl.nist.gov/biotech/strbase/newSTRs.htm. Any one pair of miniSTR primers, or a combination of two or more pairs, can be used to amplify at least one miniSTR.
Amplification of the target nucleic acids in the mixture of fetal and maternal nucleic acids (e.g. cfDNA) is accomplished by any method that uses PCR or variations of that method, as described elsewhere in this application. The target sequences are amplified using primer pairs, each capable of amplifying a target nucleic acid sequence comprising a polymorphic site (e.g. a SNP) in a multiplex PCR reaction. Multiplex PCR reactions include combining at least 2, at least 3, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 40 or more sets of primers in the same reaction, to quantify in the same sequencing reaction the amplified target nucleic acids comprising at least 2, at least 3, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 40 or more polymorphic sites. Any panel of primer sets can be configured to amplify at least one informative polymorphic sequence.
Primers are designed to hybridize to a sequence close to the SNP site on the cfDNA, to ensure that the SNP site is included within the length of the read generated by the sequencer. As provided in the Examples, at least one of the two primers in the primer set used to identify any one polymorphic site hybridizes sufficiently close to the polymorphic site that the site is encompassed within the 36 bp read generated by massively parallel sequencing on the Illumina Analyzer GII, and the primers generate amplicons long enough to undergo bridge amplification during cluster formation. Thus, the primers were designed to generate amplicons of at least 110 bp which, when combined with the universal adaptors used for cluster amplification (Illumina Inc., San Diego, CA), result in DNA molecules of at least 200 bp. The SNPs provided in Table 33 were used to amplify 13 target sequences simultaneously in a multiplexed assay. The panel provided in Table 33 is an exemplary SNP panel. Fewer or more SNPs can be used to enrich the fetal and maternal DNA for polymorphic target nucleic acids. Additional SNPs that can be used include the SNPs given in Table 34. The SNP alleles are shown in bold and underlined. Other SNPs that can be used to determine the fetal fraction according to the present method include rs315791, rs3780962, rs1410059, rs279844, rs38882, rs9951171, rs214955, rs6444724, rs2503107, rs1019029, rs1413212, rs1031825, rs891700, rs1005533, rs2831700, rs354439, rs1979255, rs1454361, rs8037429 and rs1490413. These SNPs have been analyzed by TaqMan PCR for determining fetal fraction, and are disclosed in U.S. Patent Application Publication 2010-0010085.
The DNA sequence of the forward or reverse primer in each primer set hybridizes sufficiently close to a polymorphic site that the site is included in the sequence read generated by massively parallel sequencing of the amplified, preselected polymorphic nucleic acids. The length of the sequence read depends on the particular sequencing technology. Massively parallel sequencing methods provide sequence reads that vary in size from tens to hundreds of base pairs. At least one primer in each primer set is designed to identify a polymorphic site present within a sequence read of about 20 bp, about 25 bp, about 30 bp, about 35 bp, about 40 bp, about 45 bp, about 50 bp, about 55 bp, about 60 bp, about 65 bp, about 70 bp, about 75 bp, about 80 bp, about 85 bp, about 90 bp, about 95 bp, about 100 bp, about 110 bp, about 120 bp, about 130 bp, about 140 bp, about 150 bp, about 200 bp, about 250 bp, about 300 bp, about 350 bp, about 400 bp, about 450 bp or about 500 bp. In certain embodiments, at least one primer in each primer set is designed to identify a polymorphic site present within a sequence read of about 25 bp, about 40 bp, about 50 bp or about 100 bp.
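The design constraint above (the polymorphic site must fall inside the read) can be illustrated with a small check. This is a minimal sketch rather than the primer-design procedure of the application: the function name, the 0-based coordinates, and the assumption that the read begins at the first base of the primer and extends toward higher coordinates are all hypothetical.

```python
def snp_covered_by_read(primer_start: int, snp_pos: int, read_length: int = 36) -> bool:
    """Return True if a read of read_length bases starting at primer_start
    (0-based, extending toward higher coordinates) includes snp_pos.

    Hypothetical helper: coordinates and read orientation are assumptions.
    """
    offset = snp_pos - primer_start
    return 0 <= offset < read_length


# A SNP 30 bases downstream of the read start is inside a 36 bp read ...
print(snp_covered_by_read(primer_start=100, snp_pos=130))
# ... but a SNP 40 bases downstream is not, unless the read is longer.
print(snp_covered_by_read(primer_start=100, snp_pos=140))
print(snp_covered_by_read(primer_start=100, snp_pos=140, read_length=50))
```

Longer read lengths relax the constraint, which is why the acceptable primer-to-site distance depends on the sequencing technology.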
Circulating cell-free DNA is about <300 bp. Therefore, the primer sets are designed to hybridize to and amplify polymorphic sequences that average up to about 300 bp in length, where fetal DNA averages about 170 bp. In certain embodiments, the primer sets hybridize to DNA to generate amplicons of up to about 300 bp. In other embodiments, the primer sets hybridize to the DNA sequence to generate amplicons of at least about 100 bp, at least about 150 bp or at least about 200 bp. The primer sets can hybridize to DNA sequences present on the same chromosome or to DNA sequences present on different chromosomes. For example, one or more primer sets can hybridize to sequences present on the same chromosome. Alternatively, two or more primer sets hybridize to sequences present on different chromosomes. In one embodiment, the primer pairs amplify polymorphic sequences present on one or more of chromosomes 1 to 22. In certain embodiments, the primer sets do not hybridize to DNA sequences present on chromosome 13, 18, 21, X or Y.
In step 740 (Fig. 7), a portion or all of the amplified polymorphic sequences is used to prepare a sequencing library for sequencing in the parallel manner described. In one embodiment, the library is prepared for sequencing-by-synthesis using Illumina's reversible terminator-based sequencing chemistry.
In step 740, the sequence information needed to determine the fetal fraction is obtained using any known DNA sequencing method. Preferably, the method described herein employs next generation sequencing (NGS) technology to provide countable sequence tags, as described elsewhere in this application. The sequencing can be massively parallel sequencing-by-synthesis. Preferably, the massively parallel sequencing-by-synthesis uses reversible dye terminators. Alternatively, the massively parallel sequencing can be sequencing-by-ligation or single molecule sequencing.
Partial sequencing of the amplified target polymorphic nucleic acids is performed, and the sequence tags, comprising reads of a predetermined length (e.g. 36 bp) that map to a known reference genome, are counted. Only sequence reads that align uniquely to the reference genome are counted as sequence tags. In one embodiment, the reference genome is an artificial target sequence genome comprising the sequences of the polymorphic target nucleic acids (e.g. SNPs). In one embodiment, the reference genome is an artificial SNP reference genome. In another embodiment, the reference genome is an artificial STR reference genome. In yet another embodiment, the reference genome is an artificial tandem-STR reference genome. Artificial reference genomes can be compiled using the sequences of the target polymorphic nucleic acids. Artificial reference genomes can comprise polymorphic target sequences that each comprise one or more different types of polymorphic sequences. For example, an artificial reference genome can comprise polymorphic sequences comprising SNP alleles and/or STRs. In one embodiment, the reference genome is the human reference genome NCBI36/hg18 sequence, which is available on the World Wide Web at genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105. Other sources of public sequence information include GenBank, dbEST, dbSTS, EMBL (the European Molecular Biology Laboratory) and DDBJ (the DNA Data Bank of Japan). In another embodiment, the reference genome comprises the human reference genome NCBI36/hg18 sequence and an artificial target sequence genome that includes the target polymorphic sequences, e.g. a SNP genome. Mapping of the sequence tags is achieved by comparing the sequence of the mapped tag with the sequence of the reference genome to determine the chromosomal origin of the sequenced nucleic acid (e.g. cfDNA) molecule; specific genetic sequence information is not needed. A number of computer algorithms are available for aligning sequences, including without limitation BLAST (Altschul et al., 1990), BLITZ (MPsrch) (Sturrock & Collins, 1993), FASTA (Pearson & Lipman, 1988), BOWTIE (Langmead et al., Genome Biology 10:R25.1-R25.10 [2009]) and ELAND (Illumina, Inc., San Diego, CA, USA). In one embodiment, one end of the clonally expanded copies of the plasma cfDNA molecules is sequenced and processed by the bioinformatic alignment analysis of the Illumina Genome Analyzer, which uses the Efficient Large-Scale Alignment of Nucleotide Databases (ELAND) software.
In embodiments of the method that comprise determining the presence or absence of an aneuploidy and the fetal fraction using NGS sequencing methods, the analysis of sequencing information for determining aneuploidy may allow a small degree of mismatch (0 to 2 mismatches per sequence tag) to account for minor polymorphisms that may exist between the reference genome and the genomes in the mixed sample. The analysis of sequencing information for determining fetal fraction may allow a small degree of mismatch, depending on the polymorphic sequence. For example, a small degree of mismatch may be allowed if the polymorphic sequence is an STR. When the polymorphic sequence is a SNP, all sequences that match exactly to either of the two alleles at the SNP site are counted first and filtered from the remaining reads; for the remaining reads, a small degree of mismatch may be allowed.
The number of sequence reads aligning to each chromosome can be quantified, for the purpose of determining chromosomal aneuploidies, as described herein, or by alternative analyses that normalize the median number of sequence tags for a chromosome of interest to the median number of tags for each of the other autosomes (Fan et al., Proc Natl Acad Sci 105:16266-16271 [2008]), or that compare the number of unique reads aligning to each chromosome with the total number of reads aligning to all chromosomes to derive a percent genomic representation for each chromosome. A "z score" is generated to represent the difference between the percent genomic representation of the chromosome of interest and the mean percent representation of the same chromosome in a euploid control group, divided by the standard deviation (Chiu et al., Clin Chem 56:459-463 [2010]). In another embodiment, the sequencing information can be determined as described in U.S. Provisional Patent Application 32047-768.101, titled "Normalizing Biological Assays" and filed January 19, 2010, which is incorporated herein by reference in its entirety.
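The percent-representation and z-score calculation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the application's implementation: the function names and all read counts are hypothetical, and the sample standard deviation (`statistics.stdev`) is assumed because the text does not say which estimator is used.

```python
from statistics import mean, stdev


def percent_representation(reads_on_chromosome: int, total_aligned_reads: int) -> float:
    """Percent of all uniquely aligned reads that map to one chromosome."""
    return 100.0 * reads_on_chromosome / total_aligned_reads


def z_score(test_percent: float, euploid_control_percents: list[float]) -> float:
    """(test value - control mean) / control standard deviation.

    Assumption: sample standard deviation of the euploid control group.
    """
    return (test_percent - mean(euploid_control_percents)) / stdev(euploid_control_percents)


# Hypothetical control values for one chromosome of interest (percent of reads).
controls = [1.30, 1.31, 1.29, 1.30, 1.30]
test = percent_representation(reads_on_chromosome=13_500, total_aligned_reads=1_000_000)
print(round(z_score(test, controls), 2))
```

A large positive z score indicates over-representation of the chromosome relative to the control group; any decision threshold would have to come from the application's own examples, not from this sketch.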
The analysis of sequencing information for determining fetal fraction may allow a small degree of mismatch, depending on the polymorphic sequence. For example, a small degree of mismatch may be allowed if the polymorphic sequence is an STR. When the polymorphic sequence is a SNP, all sequences that match exactly to either of the two alleles at the SNP site are counted first and filtered from the remaining reads; for the remaining reads, a small degree of mismatch may be allowed. The present method of determining fetal fraction by sequencing nucleic acids can be used in combination with other methods.
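The two-pass counting described above (exact matches first, then a small mismatch tolerance for the remaining reads) might look as follows. This is a minimal sketch, not the aligner used in the application: the function names, the toy 8-base allele sequences, and the rule of assigning a leftover read only to a single closest allele are assumptions made for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_allele_tags(reads, alleles, max_mismatches=2):
    """Count reads per allele: exact matches first, then near matches.

    `alleles` maps an allele name to its reference sequence. A leftover read
    is assigned only if exactly one allele is closest and within the
    mismatch limit (an assumed tie-breaking rule).
    """
    counts = {name: 0 for name in alleles}
    leftover = []
    # Pass 1: exact matches are counted and removed from further consideration.
    for read in reads:
        exact = [name for name, seq in alleles.items() if read == seq]
        if exact:
            counts[exact[0]] += 1
        else:
            leftover.append(read)
    # Pass 2: allow a small number of mismatches for the remaining reads.
    for read in leftover:
        dist = {n: hamming(read, s) for n, s in alleles.items() if len(s) == len(read)}
        if not dist:
            continue
        best = min(dist.values())
        winners = [n for n, v in dist.items() if v == best]
        if best <= max_mismatches and len(winners) == 1:
            counts[winners[0]] += 1
    return counts


# Toy alleles differing at one position (index 4).
alleles = {"A": "ACGTACGT", "G": "ACGTGCGT"}
reads = ["ACGTACGT", "ACGTGCGT", "TCGTACGT", "ACGTTCGT", "TTTTTTTT"]
print(count_allele_tags(reads, alleles))
```

In this toy input, the read with an error at the polymorphic position itself is equally close to both alleles and is discarded rather than guessed.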
In step 760, the fetal fraction is determined based on the total number of tags that map to the first allele and the total number of tags that map to the second allele at an informative polymorphic site (e.g. a SNP) contained in a reference genome. For example, the reference genome is an artificial target sequence genome that encompasses the polymorphic sequences comprising SNPs rs560681, rs1109037, rs9866013, rs13182883, rs13218440, rs7041158, rs740598, rs10773760, rs4530059, rs7205345, rs8078417, rs576261, rs2567608, rs430046, rs9951171, rs338882, rs10776839, rs9905977, rs1277284, rs258684, rs1347696, rs508485, rs9788670, rs8137254, rs3143, rs2182957, rs3739005 and rs530022. In one embodiment, the artificial reference genome includes the polymorphic target sequences of SEQ ID NOs: 7 to 62 (see Example 24).
In another embodiment, the artificial genome is an artificial target sequence genome that encompasses polymorphic sequences comprising tandem SNPs. In another embodiment, the artificial target genome encompasses polymorphic sequences comprising STRs. The composition of the artificial target sequence genome will vary depending on the polymorphic sequences used for determining the fetal fraction. Accordingly, an artificial target sequence genome is not limited to the SNP, tandem SNP or STR sequences exemplified herein.
An informative polymorphic site (e.g. a SNP) is identified by the difference in the allelic sequences and by the amount of each of the possible alleles. Fetal cfDNA is present at a concentration that is <10% of the maternal cfDNA. Thus, an allele that makes a minor contribution to the mixture of fetal and maternal nucleic acids, relative to the major contribution of the maternal allele, can be assigned to the fetus. Alleles derived from the maternal genome are referred to herein as major alleles, and alleles derived from the fetal genome are referred to herein as minor alleles. Alleles that are represented by similar levels of mapped sequence tags represent maternal alleles. The results of an exemplary multiplex amplification of target nucleic acids comprising SNPs and derived from a maternal plasma sample are shown in Figure 12. Informative SNPs are discerned from the single nucleotide change at a polymorphic site, and fetal alleles are discerned by their relatively minor contribution to the mixture of fetal and maternal nucleic acids in the sample, compared with the major contribution made to the mixture by the maternal nucleic acids. Accordingly, the relative abundance of fetal cfDNA in the maternal sample can be determined as a parameter of the total number of unique sequence tags mapped to the target nucleic acid sequence on a reference genome for each of the two alleles of the predetermined polymorphic site. In one embodiment, the fraction of fetal nucleic acid in the mixture of fetal and maternal nucleic acids is calculated for each informative allele (allele x) as described elsewhere in this application.
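The identification of informative sites from allele tag counts, and a percent ratio of minor to major tags, can be sketched as below. This is a hedged illustration only: the exact formula is given elsewhere in the application and is not reproduced in this section, so the minor/major percent ratio, the `max_ratio` cutoff used to exclude sites where both alleles have similar tag counts, the site names and all counts are assumptions.

```python
def is_informative(count_a: int, count_b: int, max_ratio: float = 0.5) -> bool:
    """A site is treated as informative when a minor allele is present and is
    clearly less abundant than the major allele.

    max_ratio is a hypothetical cutoff: alleles with similar tag counts are
    taken to be maternal (heterozygous mother) and are excluded.
    """
    minor, major = sorted((count_a, count_b))
    return minor > 0 and minor / major < max_ratio


def fetal_fraction_percent(site_counts: dict) -> float:
    """Average, over informative sites, of 100 * minor tags / major tags."""
    fractions = []
    for count_a, count_b in site_counts.values():
        if is_informative(count_a, count_b):
            minor, major = sorted((count_a, count_b))
            fractions.append(100.0 * minor / major)
    if not fractions:
        raise ValueError("no informative sites in this panel")
    return sum(fractions) / len(fractions)


# Hypothetical tag counts (allele 1, allele 2) per site.
panel = {
    "site_1": (950, 50),   # informative: small minor contribution
    "site_2": (500, 480),  # similar levels: both alleles taken as maternal
    "site_3": (1000, 0),   # only one allele observed
    "site_4": (40, 800),   # informative
}
print(round(fetal_fraction_percent(panel), 2))
```

A real analysis would also need a lower bound on the minor allele count to separate true fetal alleles from sequencing errors; that bound is omitted here.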
Estimating fetal fraction using STR sequences and capillary electrophoresis
Individuals have STRs of different lengths because the number of repeats differs. Because of the high polymorphism of STRs, most individuals will be heterozygous, i.e. most people will possess two alleles (versions), one inherited from each parent, each with a different number of repeats. The non-maternally inherited fetal STR sequence will differ in the number of repeats from the maternal sequence. Amplification of these STR sequences can produce one or two major amplification products corresponding to the maternal alleles (and the maternally inherited fetal allele) and one minor product corresponding to the non-maternally inherited fetal allele. When sequencing is performed, the collected reads can be associated with the corresponding alleles and counted to determine the relative fraction using Equation 3.
PCR is performed on the purified sample using fluorescently labeled primers. The PCR products comprising the STRs can be separated and detected using manual, semi-automated or automated electrophoresis. Semi-automated systems are gel-based and combine electrophoresis, detection and analysis into one unit. On a semi-automated system, gel assembly and sample loading are still manual processes; however, once the samples are loaded onto the gel, electrophoresis, detection and analysis proceed automatically. As the name implies, capillary electrophoresis is carried out in a microcapillary tube rather than between glass plates. Once the samples, gel polymer and buffer are loaded onto the instrument, the capillary is filled with gel polymer and the sample is loaded automatically. Data collection occurs in "real time" as the fluorescently labeled fragments migrate past the detector at a fixed point, and the fragments can be viewed as they are collected. The sequences obtained by capillary electrophoresis can be detected by a program that measures the wavelengths of the fluorescent labels. The calculation of fetal fraction is based on averaging over the informative markers. Informative markers are identified by the presence of peaks on the electropherogram that fall within the parameters of the preset bins for the STRs being analyzed.
The fraction of the minor allele for any given informative marker is calculated by dividing the peak height of the minor component by the sum of the peak heights of the major component(s), and is expressed as a percentage for each informative locus as follows:
$$\text{fetal fraction (\%)} = \frac{\text{peak height of minor allele}}{\sum \text{peak height of major allele(s)}} \times 100 \qquad \text{(Equation 3)}$$
The fetal fraction for a sample comprising two or more informative STRs can be calculated as the average of the fetal fractions calculated for the two or more informative markers.
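The peak-height calculation described above (minor peak height divided by the summed major peak heights, as a percentage, then averaged over informative markers) can be sketched as follows. The function names, locus labels and peak heights are hypothetical and serve only to illustrate the arithmetic.

```python
def locus_minor_fraction_percent(minor_peak: float, major_peaks: list[float]) -> float:
    """Minor-allele fraction at one informative STR locus, in percent:
    minor peak height divided by the sum of the major peak heights."""
    return 100.0 * minor_peak / sum(major_peaks)


def sample_fetal_fraction_percent(loci: dict) -> float:
    """Average of the per-locus fractions over all informative loci."""
    per_locus = [locus_minor_fraction_percent(minor, majors) for minor, majors in loci.values()]
    return sum(per_locus) / len(per_locus)


# Hypothetical peak heights (relative fluorescence units) for two informative loci.
informative_loci = {
    "locus_1": (50.0, [1000.0, 1000.0]),  # mother heterozygous: two major peaks
    "locus_2": (120.0, [2000.0]),         # mother homozygous: one major peak
}
print(sample_fetal_fraction_percent(informative_loci))  # 4.25
```

Here the two loci give 2.5% and 6.0%, and the sample estimate is their mean.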
Estimating fetal fraction using a mixture model
In the embodiments disclosed herein, there are up to four different types of data (zygosity cases) that make up the minor allele frequency data for the polymorphisms under consideration.
As shown in Figure 13, cases 1 and 2 are the polymorphism cases in which the mother is homozygous at a given allele. In case 1, both the baby and the mother are homozygous. This case is typically not of particular interest, because the collected data contain only one type of allele at the polymorphic site analyzed. In case 2, where the mother is homozygous and the baby is heterozygous, the fetal fraction f is nominally given by two times the ratio of the minor allele count to the coverage. Coverage is defined as the total number of reads or tags (fetal and maternal) that map to a particular polymorphic site. With A denoting the minor allele count and D the coverage, the equation that approximates the fetal fraction in case 2 is as follows:
$$f \approx \frac{2A}{D} \qquad \text{(Equation 4)}$$
In case 3, where the mother is heterozygous and the baby is homozygous, the fetal fraction is nominally one minus two times the ratio of the minor allele count to the coverage. With A and D defined in the same way, the equation that approximates the fetal fraction in case 3 is as follows:
$$f \approx 1 - \frac{2A}{D} \qquad \text{(Equation 5)}$$
Finally, in case 4, where both the mother and the fetus are heterozygous, the minor allele fraction should always be 0.5, barring error. The fetal fraction cannot be derived for polymorphisms that fall into case 4.
Table 7 summarizes an example of estimating the fetal fraction using Equations 4 and 5 when the number of major allele reads is 300 and the number of minor allele reads is 200. The coverage is then 500.
Table 7: Example of estimating fetal fraction by zygosity case
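The arithmetic of the example can be checked directly from the nominal relations stated above for case 2 (two times the minor-allele-to-coverage ratio) and case 3 (one minus two times that ratio). The function names below are hypothetical; the read counts are the ones given in the text.

```python
def fetal_fraction_case2(minor_reads: int, coverage: int) -> float:
    """Mother homozygous, fetus heterozygous: f = 2 * A / D."""
    return 2.0 * minor_reads / coverage


def fetal_fraction_case3(minor_reads: int, coverage: int) -> float:
    """Mother heterozygous, fetus homozygous: f = 1 - 2 * A / D."""
    return 1.0 - 2.0 * minor_reads / coverage


major_reads, minor_reads = 300, 200
coverage = major_reads + minor_reads  # 500

print(fetal_fraction_case2(minor_reads, coverage))            # 0.8
print(round(fetal_fraction_case3(minor_reads, coverage), 6))  # 0.2
```

The same counts therefore imply very different fetal fractions depending on the zygosity case, which is why the case assignment has to be settled before a per-site estimate is meaningful.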
In certain embodiments, a mixture model can be employed to classify a collection of polymorphisms into two or more of the zygosity cases presented, while concurrently estimating the fetal DNA fraction from the mean allele frequency for each of these cases. Generally, a mixture model assumes that a particular collection of data is made up of a mixture of different types of data, each of which has its own expected distribution (e.g. a normal distribution). The process attempts to find the mean, and possibly other characteristics, of each type of data. In the embodiments disclosed herein, there are up to four different data types (zygosity cases) that make up the minor allele frequency data for the polymorphisms under consideration.
In some embodiments that employ a mixture model, one or more factorial moments, given by Equation 10, are calculated for the positions at which polymorphisms are being considered. For example, a factorial moment F_i (or a collection of factorial moments) is calculated using the multiple SNP positions considered in the DNA sequence. As shown in Equation 10 below, each factorial moment F_i is a summation, over all the polymorphism positions under consideration, of a ratio of the minor allele frequency a_i to the coverage d_i at a given position. As shown in Equation 11 below, these factorial moments are also related to the parameters α_i and p_i associated with each of the four zygosity cases described above. Specifically, they relate to the probability p_i for each case and to the relative amount, given by α_i, of each of the four cases in the collection of polymorphisms under consideration. As explained above, the probability p_i is a function of the fraction of fetal DNA in the cell-free DNA in the mother's blood. As explained more fully below, by calculating a sufficient number of these factorial moments, the method provides a sufficient number of expressions to solve for all the unknowns. The unknowns in this case are the relative amounts of each of the four cases in the population of polymorphisms under consideration, as well as the probabilities (and hence the fetal DNA fraction) associated with each of the four cases. Similar results can be obtained using other versions of mixture models. Some versions make use of only the polymorphisms that fall into cases 1 and 2, with the polymorphisms of cases 3 and 4 filtered out by a thresholding technique.
Thus, the factorial moments can be used as part of a mixture model to identify the probabilities of any combination of the four zygosity cases. And, as mentioned, these probabilities, or at least those for cases 2 and 3, are directly related to the fraction of fetal DNA in the total cell-free DNA in the mother's blood.
It should also be mentioned that the sequencing error, given by e, can be used to reduce the complexity of the system of factorial moment equations that must be solved. In this regard, it should be recognized that a sequencing error can actually have any one of four results (corresponding to each of the four possible bases at any given polymorphism position).
Let the major allele count at genomic position j be B, the first order statistic of the counts (the numbers of reads) at position j. The major allele, b, is the corresponding arg max. Subscripts are used when more than one SNP is being considered. The major allele count is given by:
Let the minor allele count at position j be A, the second order statistic of the counts at position j (i.e. the second highest allele count):
Coverage is defined as the total number of reads (fetal and maternal) that map to a particular polymorphic site. Let the coverage at position j be D:
$$D \equiv D_j = \{d_i\} = A_j + B_j \qquad \text{(Equation 8)}$$
In this embodiment, the minor allele frequency A is a sum of four terms, as shown in Equation 9. The four heterozygosity cases described suggest the following binomial mixture model for the distribution of the minor allele counts a_i at the points (a_i, d_i), where d_i is the coverage:
A={ ai}~α1Data box (p1, di)+α2Data box (p2, di) 3 data box (p of+α3, di)+α4Data box (p4, di)
Wherein
1=α1234
M=4
Equation 9
Each term corresponds to one of the four zygosity cases. Each term is the product of a polymorphism fraction α and a binomial distribution of the minor allele frequency. The α values represent the fractions of the polymorphisms that fall into each of the four cases. Each binomial distribution has an associated probability, p, and coverage, d. The minor allele probability for case 2, for example, is given by f/2, where f is the fetal fraction. Different models for relating the p_i to the fetal fraction and the sequencing error rate are described below. The parameters α_i relate to population-specific parameters, and the ability to let these values "float" with respect to, for example, the ethnicity and progeny of the parents gives these methods additional robustness.
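The binomial mixture described above can be written out as a probability function. This is a minimal sketch under explicit assumptions: sequencing error is ignored, so the per-case minor allele probabilities are taken as 0 for case 1, f/2 for case 2 (as stated in the text), (1 - f)/2 for case 3 (implied by the nominal case 3 relation, one minus two times the minor-allele-to-coverage ratio) and 0.5 for case 4. The function names and example values are hypothetical.

```python
from math import comb


def case_probabilities(f: float) -> list[float]:
    """Minor allele probability for zygosity cases 1 to 4, with no sequencing error."""
    return [0.0, f / 2, (1 - f) / 2, 0.5]


def binomial_pmf(a: int, d: int, p: float) -> float:
    """Probability of a minor-allele reads out of d reads with success probability p."""
    return comb(d, a) * p**a * (1 - p) ** (d - a)


def mixture_pmf(a: int, d: int, alphas: list[float], f: float) -> float:
    """Probability of observing minor allele count a at coverage d under the
    four-component binomial mixture with weights alphas (which must sum to 1)."""
    if abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * binomial_pmf(a, d, p) for w, p in zip(alphas, case_probabilities(f)))


# Hypothetical weights for cases 1 to 4 and a fetal fraction of 10%.
alphas = [0.4, 0.3, 0.2, 0.1]
print(mixture_pmf(a=3, d=50, alphas=alphas, f=0.10))
```

Summing `mixture_pmf` over all possible counts from 0 to d returns 1, which is a quick sanity check that the weights and components form a proper distribution.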
Disclosed embodiment utilizes the factorial moment for the gene frequency data in considering.It is well known that point Cloth average value is first moment.It is the desired value of time gene frequency.Variance is second moment.It is put down from gene frequency The desired value of side calculates.
For different heterozygosis implementations, above equation 9 can solve fetus fraction.In certain embodiments, fetus Fraction is solved by factorial Moment Methods, and wherein hybrid parameter can be represented with square, and these squares can be easily from observed data Estimate.
It can be used for calculating i-th of factorial moment F across the gene frequency data of all polymorphismsi(the first factorial moment F1, Two factorial moment F2Deng), as shown in equation 10.(SNP is only used for the purpose of example.Other kinds of polymorphism can be such as in the application Discuss use elsewhere.) giving n SNP position, then factorial moment is defined as below:
As shown in these equatioies, factorial moment is more than the summation of i items (the individual polymorphism in data set), wherein counting N such polymorphisms be present according to concentration.The items of summation are time equipotential gene count ai, and coverage value diFunction.
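The computation of factorial moments from (a_i, d_i) data can be sketched in Python. The displayed form of Equation 10 is not reproduced in this text, so the normalized falling-factorial form used below is an assumption; it is chosen because its expected value for a binomial count with probability p is p^k, which agrees with the moment relations given later in Equation 14. The function name `factorial_moment` is hypothetical.

```python
from typing import Sequence


def factorial_moment(a: Sequence[int], d: Sequence[int], k: int) -> float:
    """Return an assumed k-th normalized factorial moment F_k.

    a[i] is the minor allele count and d[i] the coverage at polymorphism i.
    Each term is a(a-1)...(a-k+1) / (d(d-1)...(d-k+1)), and the terms are
    averaged over the polymorphisms. Sites with coverage below k are
    skipped because their denominator would be zero.
    """
    total = 0.0
    n = 0
    for ai, di in zip(a, d):
        if di < k:
            continue
        num = 1.0
        den = 1.0
        for r in range(k):
            num *= ai - r
            den *= di - r
        total += num / den
        n += 1
    if n == 0:
        raise ValueError("no polymorphism has enough coverage")
    return total / n
```

For example, `factorial_moment(a, d, 1)` is simply the mean of the per-site minor allele fractions a_i/d_i.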
Usefully, the factorial moments are related to the values of α_i and p_i, as illustrated by Equation 11. The j-th factorial moment can be related to {α_i, p_i} such that
$$F_j = \sum_{i=1}^{m} \alpha_i\, p_i^{\,j} \qquad \text{(Equation 11)}$$
From the probabilities p_i, the fetal fraction f can be determined. For example, the minor allele probability for case 2 is p_2 = f/2, as noted above. Therefore, appropriate logic can solve a system of equations that relates the unknown α and p variables to the factorial moment expressions for the minor allele fractions across the multiple polymorphisms under consideration. Of course, other techniques for solving the mixture model exist within the scope of the disclosed embodiments.
When n > 2*(number of parameters to be estimated), a solution can be identified by solving for {α_i, p_i} in the system of equations derived from the relation of Equation 8 above. Clearly, the problem becomes mathematically much more difficult for higher g, because more {α_i, p_i} must be estimated.
It is typically not possible to discriminate accurately between the data of case 1 and case 2 (or case 3 and case 4) by a simple threshold at lower fetal fractions. The data of cases 1 and 2 can, however, easily be separated from the data of cases 3 and 4 by discriminating at a point defined in terms of A, D, and T, where A is the minor allele count, D is the coverage, and T is a threshold. Using T = 0.5 has been found to perform satisfactorily.
Note that the approach of using the mixture model with Equations 10 and 11 makes use of the data for all polymorphisms but does not separately account for sequencing error. Appropriate methods that separate the data of the first and second cases from the data of the third and fourth cases can account for sequencing error.
In additional examples, the data set provided to the mixture model contains only data for case 1 and case 2 polymorphisms. These are polymorphisms for which the mother is homozygous. A thresholding technique can be used to eliminate the case 3 and case 4 polymorphisms. For example, polymorphisms for which the minor allele frequency is greater than a particular threshold are excluded before the mixture model is applied. Using appropriately filtered data and the factorial moments as reduced in Equations 13 and 14 below, one can calculate the fetal fraction f as shown in Equation 15. Note that Equation 13 is a restatement of Equation 9 for this implementation of the mixture model. Note also that in this particular example, the sequencing error associated with the machine reads is unknown. As a result, the system of equations must also be solved for the error, e.
Figure 14 shows results obtained with this mixture model, comparing the known fetal fraction (X axis) with the estimated fetal fraction (Y axis). If the mixture model predicted the fetal fraction perfectly, the plotted results would follow the dashed line. Nevertheless, the estimated fractions are remarkably good, especially considering that much of the data was excluded before the mixture model was applied.
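The exclusion step described above can be sketched as follows. The text states only that polymorphisms whose minor allele frequency exceeds a particular threshold are excluded; the function name `keep_maternal_homozygous` and the default cutoff of 0.25 are hypothetical choices made for illustration.

```python
def keep_maternal_homozygous(points, max_minor_fraction=0.25):
    """Keep (a, d) points whose minor allele fraction a/d is at most the cutoff.

    points is an iterable of (minor_allele_count, coverage) pairs. Points
    with zero coverage carry no information and are dropped.
    """
    kept = []
    for a, d in points:
        if d > 0 and a / d <= max_minor_fraction:
            kept.append((a, d))
    return kept
```

The retained points would then be passed to the moment calculation for the reduced (case 1 and case 2) model.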
To elaborate further, several other methods are available for estimating the parameters of the model from Equation 7. In some cases, a tractable solution can be found by setting the derivative of the chi-squared statistic to zero. In cases where an easy solution cannot be found by direct differentiation, a Taylor series expansion of the binomial probability distribution function (PDF) or other approximating polynomials can be effective. The well-known minimum chi-squared estimator is efficient. The method of moments solution from Equation 9 can be used as the starting point for an iterative method. The following chi-squared estimator can be used:
where P_i is the number of points with count i. An alternative method from Le Cam ["Asymptotic Theory of Estimation and Testing Hypotheses," Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 1, Berkeley, CA: University of California Press, 1956, pp. 129 to 156] uses Newton-Raphson iteration on the likelihood function.
According to another application, a method for resolving the mixture model is discussed that involves an expectation maximization method operating on a mixture of approximating beta distributions.
Model 1: Cases 1 and 2, sequencing error unknown
Consider a reduced model that accounts only for heterozygosity cases 1 and 2. In this case, the mixture distribution can be written as:
$$A = \{a_i\} \sim \alpha_1\,\mathrm{Bin}(e, d_i) + \alpha_2\,\mathrm{Bin}(f/2, d_i)$$
where
$$1 = \alpha_1 + \alpha_2, \qquad m = 4 \qquad \text{(Equation 13)}$$
and the system of equations:
$$F_1 = \alpha_1 e + (1-\alpha_1)(f/2)$$
$$F_2 = \alpha_1 e^2 + (1-\alpha_1)(f/2)^2$$
$$F_3 = \alpha_1 e^3 + (1-\alpha_1)(f/2)^3 \qquad \text{(Equation 14)}$$
is solved for e (the sequencing error rate), α_1 (the proportion of case 1 points), and f (the fetal fraction), where F_i is as defined in Equation 10 above. The closed-form solution for the fetal fraction is chosen to be the real solution of the following equation:
The solution is between 0 and 1.
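As an illustration of how the three moment relations of Equation 14 can be solved, the following Python sketch uses a standard two-component method-of-moments route. It is not the closed form of Equation 15, which is not reproduced in this text. The function name `solve_model1` is hypothetical, and the code assumes that f/2 is larger than e so that the larger root can be assigned to f/2.

```python
import math


def solve_model1(F1: float, F2: float, F3: float):
    """Solve the three moment relations for (e, alpha1, f).

    Write x = e and y = f/2. Since x and y are the two roots of
    t**2 - s*t + p = 0 (with s = x + y and p = x*y), the moments satisfy
    F2 = s*F1 - p and F3 = s*F2 - p*F1, which is linear in s and p.
    """
    denom = F2 - F1 * F1
    if denom <= 0:
        raise ValueError("moments are not consistent with two distinct components")
    s = (F3 - F1 * F2) / denom
    p = s * F1 - F2
    disc = s * s - 4.0 * p
    if disc <= 0:
        raise ValueError("no pair of distinct real roots")
    root = math.sqrt(disc)
    x = (s - root) / 2.0  # smaller root, taken as the error rate e (assumption)
    y = (s + root) / 2.0  # larger root, taken as f/2 (assumption)
    alpha1 = (y - F1) / (y - x)
    return x, alpha1, 2.0 * y
```

With exact moments the sketch recovers the generating parameters; with moments estimated from noisy data the inputs may be inconsistent, in which case the function raises an error rather than returning a value.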
To gauge the performance of the estimators, simulated data sets of Hardy-Weinberg equilibrium points (a_i, d_i) were constructed with fetal fractions designed to be {1%, 3%, 5%, 10%, 15%, 20%, and 25%} and a constant sequencing error rate of 1%. The 1% error rate is the currently accepted rate for the sequencing machines and protocols in use, and it is consistent with the Illumina Genome Analyzer II data shown in Figure 15. Equation 15 was applied to the data and found to agree in general with the "known" fetal fraction, apart from an upward bias of four points. Interestingly, the sequencing error rate, e, is estimated to be just above 1%.
Model 2: Cases 1 and 2, sequencing error known
In the next mixture model example, thresholding or another filtering technique is again used to remove the data for polymorphisms belonging to cases 3 and 4. In this case, however, the sequencing error is known. This simplifies the resulting expression for the fetal fraction, f, as shown in Equation 16. Figure 16 shows that this version of the mixture model provides improved results compared with the method used with Equation 15. In the equations that follow, let the sequencing machine error rate be e.
A similar method is shown in Equations 17 and 18. This method recognizes that only some sequencing errors add to the minor allele count. Specifically, only one in every four sequencing errors should increase the minor allele count. Figure 17 shows remarkably good agreement between the actual and estimated fetal fractions when this technique is used.
Because the sequencing error rate of the machines used is largely known, the bias and complexity of the calculation can be reduced by eliminating e as a variable to be solved for. Therefore, we obtain the following system of equations for the fetal fraction f:
$$F_1 = \alpha_1 e + (1-\alpha_1)(f/2)$$
$$F_2 = \alpha_1 e^2 + (1-\alpha_1)(f/2)^2 \qquad \text{(Equation 16)}$$
with the solution:
Figure 16 shows that using the machine error rate as a known parameter reduces the upward bias by a point.
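The two relations of Equation 16 can be solved directly once e is known. The Python sketch below is an algebraic illustration derived from those two relations, not the solution expression of the source, which is not reproduced in this text. The function name `solve_model2` is hypothetical. With y = f/2, subtracting e times the first relation from the second gives F_2 - e F_1 = (1-α_1) y (y - e), while F_1 - e = (1-α_1)(y - e), so y is their ratio.

```python
def solve_model2(F1: float, F2: float, e: float):
    """Solve F1 = a*e + (1-a)*y and F2 = a*e**2 + (1-a)*y**2 for (a, f), y = f/2."""
    if F1 == e:
        raise ValueError("F1 equals e, so the fetal component cannot be identified")
    y = (F2 - e * F1) / (F1 - e)
    if y == e:
        raise ValueError("components coincide, so alpha1 is undetermined")
    alpha1 = (y - F1) / (y - e)
    return alpha1, 2.0 * y
```

Because only two moments are needed, this route avoids the third moment, which is typically the noisiest of the three when estimated from data.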
Model 3: Cases 1 and 2, sequencing error known, improved error model
To improve on the bias in the model, we extend the error model of the above equations to account for the fact that not every sequencing error event adds to the minor allele count A = a_i in heterozygosity case 1. In addition, we allow for the fact that sequencing error events may contribute to the counts of heterozygosity case 2. Therefore, we determine the fetal fraction f by solving the following system of factorial moment relations:
$$F_1 = \alpha_1 (e/4) + (1-\alpha_1)(e + f/2)$$
Then the solution of the system is:
Figure 17 shows that, for simulated data using the machine error rate as a known parameter, enhancing the error model for cases 1 and 2 significantly reduces the upward bias, to less than a point for fetal fractions below 0.2.
Classifying affected samples using fetal fraction
In certain embodiments, affected samples are further characterized using a fetal fraction estimate. In some cases, the fetal fraction estimate allows an affected sample to be classified as mosaic, complete aneuploidy, or partial aneuploidy. A computer-implemented method for obtaining this information is described with reference to the flow chart of Figure 18. This and related methods can be carried out to provide the fetal fraction estimate, the CNV determination, and the CNV classification simultaneously. In other words, the same tags can be used to perform any of these three functions.
To use this method, two modes of assessing fetal fraction are employed. One mode produces an NCNFF value, and the other produces a CNFF value. As explained above, the CNFF value is obtained using a technique that relies on a chromosome or chromosome segment determined to possess a copy number variation. It need not rely on polymorphisms to calculate fetal fraction. One example of a non-polymorphism technique for calculating fetal fraction is described in Example 17, which assumes the duplication or deletion of an entire chromosome and uses the following expression:
$$ff_{(i)} = 2 \cdot \left| NCV_{jA} \, CV_{jU} \right| \qquad \text{(Equation 28)}$$
where j represents the identity of the aneuploid chromosome, and CV represents the coefficient of variation obtained from the qualified samples used to determine the mean and standard deviation in the NCV expression.
The NCNFF value is obtained using a technique that relies on chromosomes or chromosome segments that have no copy number variation. In other words, the NCN fetal fraction is determined by a technique that reliably determines fetal fraction under the assumption of normal ploidy for the part of the genome used to calculate the fetal fraction. The CN fetal fraction is determined by a technique that assumes that the sample under consideration has some form of aneuploidy. The CNV of the affected chromosome or chromosome segment is used to calculate the CN fetal fraction. Techniques for its calculation are presented below.
By comparing the NCN fetal fraction estimate with the CN fetal fraction estimate, a method can determine the type of aneuploidy that may be present in the sample. Essentially, if the NCN fetal fraction and CN fetal fraction values match, the ploidy assumption implicit in the technique used to assess the CN fetal fraction is deemed to be true. For example, if the method of calculating the CN fetal fraction assumes that the sample has a complete chromosomal aneuploidy, exhibiting a single extra copy of a chromosome or a single deletion of a chromosome, and the NCN fetal fraction value matches the CN fetal fraction value, the method can conclude that the sample exhibits a complete chromosomal aneuploidy. The basis for making this assumption is described in more detail below.
The NCN fetal fraction can be determined by different techniques. In some embodiments, the NCN fetal fraction is estimated using selected polymorphisms in a reference sequence genome. Examples of these techniques are described above. In other embodiments, the NCN fetal fraction is determined using the relative amount of a chromosome that is known not to be aneuploid or has been determined not to be aneuploid. For example, a chromosome in a sample that is known not to be aneuploid may be chromosome X in a male fetus. Therefore, in other embodiments, the relative amount (e.g., the chromosome dose) of the X chromosome or Y chromosome in a sample comprising DNA from a pregnant woman carrying a son is used to determine the NCN fetal fraction. The genome of the son should not include a second copy of the X chromosome. Knowing this, the relative amount of X chromosome DNA can be used to provide the NCN value of the fetal fraction. In samples comprising DNA of a female child, the chromosome known not to be aneuploid can be a chromosome for which aneuploidy is known to be incompatible with life. Alternatively, for samples comprising DNA from a fetus of either sex, sequence tags can be used to determine chromosome doses (and NCV or NSV) to confirm the normal ploidy of a chromosome that can then be used to determine the NCN fetal fraction.
Turning to flow chart 1800 of Figure 18, the NCN fetal fraction estimate 1802 is compared with the CN fetal fraction estimate 1804. If they match, as indicated at block 1806, the process concludes that the assumption implicit in the technique used to estimate the CN fetal fraction is true. In various embodiments, this assumption is that one of the chromosomes of the fetus has a trisomy or monosomy.
On the other hand, if the comparison indicates that the two fetal fraction values do not match (condition 1808) and, in fact, the CN fetal fraction estimate is less than the NCN fetal fraction, a second phase of the method is performed, as indicated at block 1810.
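The first comparison in the flow chart can be sketched in Python as follows. The text does not state how closely the two estimates must agree to "match", so the relative tolerance `rel_tol` is an assumption, as are the function name `first_stage` and the returned labels. The case in which the CN estimate exceeds the NCN estimate is not addressed in the passage above and is reported here as unresolved.

```python
def first_stage(ncn_ff: float, cn_ff: float, rel_tol: float = 0.1) -> str:
    """Compare the NCN and CN fetal fraction estimates (blocks 1802 to 1810)."""
    if ncn_ff <= 0:
        raise ValueError("NCN fetal fraction must be positive")
    if abs(cn_ff - ncn_ff) <= rel_tol * ncn_ff:
        # Estimates match: the ploidy assumption behind the CN estimate holds.
        return "complete aneuploidy"
    if cn_ff < ncn_ff:
        # CN estimate is lower: continue to the bin-level second phase.
        return "second phase"
    return "unresolved"
```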
In the second phase, the method determines whether the sample contains a partial aneuploidy or a mosaic. In addition, if the sample contains a partial aneuploidy, the method determines where on the aneuploid chromosome the aneuploidy resides. In some embodiments, this is accomplished by first binning the affected chromosome into a plurality of blocks. In one example, each block is about 1 million base pairs in length. Of course, other block lengths can be used, such as about 1 kilobase, about 10 kilobases, about 100 kilobases, etc. The blocks do not overlap and span most or all of the length of the chromosome. The blocks or bins are compared with one another, and this comparison provides insight into the condition. In one approach, for each block or bin, the mapped tags are counted and optionally converted to a bin dose. If any of the bins or blocks is aneuploid, these counts or bin doses will indicate it. As part of the analysis of the individual bins, the information from each bin may be suitably normalized to account for inter-bin variation such as G-C content. The resulting normalized bin value can be referred to as an NBV; the NBV is an example of a value for a chromosome segment that is normalized to the tags mapped to normalizing segments having similar GC content (as in Example 19 below). In some embodiments, a fetal fraction is calculated for each bin and the individual fetal fraction values are compared. The sequential analysis of the bins is depicted in block 1812 of Figure 18. If any bin or block is identified as harboring an aneuploidy (by considering tag density, fetal fraction, or other information), the method determines that the sample contains a partial aneuploidy and additionally localizes the aneuploidy to the bins in which the tag counts deviate sufficiently from the expected value. See block 1814.
If, however, upon analyzing the individual bins of the chromosome under consideration, the method does not identify any chromosomal region exhibiting aneuploidy, the method determines that the sample contains a mosaic. See block 1816.
Use of polymorphisms, e.g., SNPs, on the chromosome of interest of an affected sample and on a chromosome known not to be aneuploid (e.g., chromosome X) to calculate and compare the true fetal fraction in order to determine the presence or absence of complete or partial aneuploidy in a male fetus
As explained above, the fetal fraction (FF) determined using informative polymorphic sequences, e.g., informative SNPs, can be used to distinguish complete chromosomal aneuploidy from partial aneuploidy.
The presence or absence of aneuploidy, whether partial or complete, can be determined from the value of the fetal fraction determined using polymorphic target sequences present on the chromosome of interest, compared with the value of the fetal fraction determined using polymorphic target sequences present on a different chromosome in the sample. In samples in which the fetus is male, the FF on the chromosome of interest can be determined and compared with the FF determined for chromosome X in the same sample. For example, given that a maternal sample is from a mother carrying a male fetus with trisomy 21, polymorphic sequences, e.g., sequences comprising at least one informative SNP, are selected so as to be present on chromosome 21 and on chromosome X; the polymorphic target sequences are amplified and sequenced, and the fetal fraction is determined as described elsewhere in this application.
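The second phase can be illustrated with a minimal Python sketch that bins tag positions and flags bins whose counts deviate from an expected value. Several simplifications are assumptions made here for illustration: the expected value is taken to be the median count across the bins of the same chromosome, the deviation cutoff `rel_dev` is arbitrary, no GC normalization is applied, and the function names are hypothetical. In practice the expected values would more plausibly come from qualified samples.

```python
from statistics import median


def bin_counts(tag_positions, chrom_length, bin_size=1_000_000):
    """Count mapped tags in non-overlapping bins spanning the chromosome."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in tag_positions:
        if 0 <= pos < chrom_length:
            counts[pos // bin_size] += 1
    return counts


def second_phase(counts, rel_dev=0.05):
    """Return ("partial aneuploidy", flagged bin indices) or ("mosaic", [])."""
    expected = median(counts)  # stand-in for an expected count (assumption)
    flagged = [
        i for i, c in enumerate(counts)
        if expected > 0 and abs(c - expected) / expected > rel_dev
    ]
    if flagged:
        return "partial aneuploidy", flagged
    return "mosaic", []
```

The flagged indices give the approximate location of the partial aneuploidy in units of the bin size.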
Given that the fetal fraction is proportional to the amount of fetal chromosome in the sample, the fetal fraction determined using polymorphic sequences present on a trisomic chromosome in a maternal sample will be one and a half times the fetal fraction determined using polymorphic sequences on a chromosome known not to be aneuploid in the male fetus (e.g., chromosome X) in the same maternal sample. For example, in a normal sample, when the fetal fraction is determined using a set of polymorphisms on chromosome 21 (FF21) and using a set of polymorphisms on chromosome X (FFX), where chromosome X is known to be unaffected in the male fetus, FF21 = FFX. However, if the fetus is trisomic for chromosome 21, the fetal fraction for the trisomic chromosome 21 (FF21) will be one and a half times the fetal fraction for chromosome X in the same sample (FF21 = 1.5*FFX). Then, if FF21 < FFX, the analysis logic can conclude that part of chromosome 21 is deleted and/or that mosaicism is present. If FF21 > FFX, the analysis logic can conclude that part of chromosome 21 is increased, e.g., by a partial duplication or multiplication of chromosome 21, or that there is a complete duplication of chromosome 21 that was not accounted for in the technique used to calculate the fetal fraction. The two outcomes can be distinguished because a partial duplication will produce an FF < 1.5*FFX. Alternatively, a partial duplication, a deletion, or the presence of mosaicism can be determined, for example, by increasing the number of polymorphic sequences on chromosome 21 so as to obtain multiple FF values along the length of the chromosome, such that a locally doubled or multiplied FF value indicates that part of the chromosome is increased. Alternatively, as would be the case for a mosaic sample, the FF determined from the polymorphic sequences remains constant over the entire length of the chromosome, indicating that the amount of the complete chromosome is increased overall, but by less than the increase relative to FFX described above. In the case where an entire chromosome is lost, e.g., monosomy X, FF_monosomy = 1/2 FFX. The fetal fraction values obtained from informative polymorphic sequences can be used in combination with sequence doses and their normalized dose values, e.g., NCV and NSV, to confirm the presence of a complete aneuploidy.
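The comparison of FF21 with FFX described above can be sketched as a small decision function. The ratios 1, 1.5, and 0.5 come from the text; the tolerance `tol` used to decide whether an observed ratio is "equal" to one of them is an assumption, as are the function name `compare_ff` and the returned labels.

```python
def compare_ff(ff_interest: float, ff_reference: float, tol: float = 0.1) -> str:
    """Classify a chromosome of interest from the ratio of two fetal fractions.

    ff_interest is the fetal fraction from polymorphisms on the chromosome of
    interest (e.g., FF21); ff_reference is the fetal fraction from a chromosome
    known not to be aneuploid (e.g., FFX in a male fetus).
    """
    if ff_reference <= 0:
        raise ValueError("reference fetal fraction must be positive")
    ratio = ff_interest / ff_reference
    if abs(ratio - 1.0) <= tol:
        return "no aneuploidy indicated"
    if abs(ratio - 1.5) <= tol:
        return "complete trisomy"
    if abs(ratio - 0.5) <= tol:
        return "complete monosomy"
    if ratio < 1.0:
        return "partial deletion and/or mosaicism"
    return "partial duplication and/or mosaicism"
```

Distinguishing a partial duplication from a mosaic would still require FF values at multiple positions along the chromosome, as the text describes.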
Calculating fetal fraction from the chromosome dose of an aneuploid sequence
The NCV for the chromosome of interest is calculated according to the following equation:
$$NCV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j} \qquad \text{(Equation 19)}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, of the j-th chromosome dose in a set of qualified samples, and x_ij is the observed j-th chromosome dose for test sample i.
In general, the chromosome dose for a trisomy increases in proportion to the fetal fraction (ff). Therefore, the chromosome dose in a sample containing a trisomic chromosome increases proportionally with respect to the fetal fraction:
$$R_{jA} = \left(1 + \frac{ff}{2}\right) R_{jU} \qquad \text{(Equation 20)}$$
The chromosome dose for a monosomy decreases in proportion to the fetal fraction (ff). Therefore, the chromosome dose in a sample containing a monosomic chromosome decreases proportionally with respect to the fetal fraction:
$$R_{jA} = \left(1 - \frac{ff}{2}\right) R_{jU} \qquad \text{(Equation 21)}$$
In Equations 20 and 21, R_jA is the chromosome dose for chromosome j in an affected sample (e.g., the maternal sample being tested; x_ij for sample i); ff is the expected fetal fraction; and R_jU is the chromosome dose for chromosome j in an unaffected (qualified) sample U. The factor "2" is included based on the following assumptions: the operator in Equation 20 is a "plus sign," i.e., there is one extra copy of the chromosome of interest; the operator in Equation 21 is a "minus sign," i.e., one complete copy of the chromosome of interest is missing. If a different assumption is made (e.g., that part of the chromosome of interest is duplicated), the factor "2" no longer applies.
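Substituting the chromosome dose R_jA into Equation 19:
$$NCV_{jA} = \frac{R_{jA} - R_{jU}}{\sigma_{jU}}$$
where R_jU is the equivalent of $\hat{\mu}_j$ and σ_jU is the equivalent of $\hat{\sigma}_j$. With R_jA = (1 + ff/2) R_jU for a trisomy, ff is solved as follows:
$$NCV_{jA} = \frac{\left(1 + \frac{ff}{2}\right) R_{jU} - R_{jU}}{\sigma_{jU}}$$
or
$$NCV_{jA} = \frac{ff}{2} \cdot \frac{R_{jU}}{\sigma_{jU}}$$
or
$$ff = 2 \cdot NCV_{jA} \cdot \frac{\sigma_{jU}}{R_{jU}}$$
or, with the coefficient of variation CV_jU = σ_jU / R_jU,
$$ff = 2 \cdot NCV_{jA} \, CV_{jU}$$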
Therefore, the percentage "ff_(i)" for any chromosome assumed to be a trisomic chromosome can be defined as:
$$ff_{(i)} = 2 \cdot NCV_{jA} \, CV_{jU} \qquad \text{(Equation 26)}$$
The percentage "ff_(i)" for any chromosome assumed to be a monosomic chromosome can be defined as:
$$ff_{(i)} = -2 \cdot NCV_{jA} \, CV_{jU} \qquad \text{(Equation 27)}$$
Equation 27 assumes that a complete copy of the chromosome is missing. The NCV_jA corresponding to that chromosome is necessarily negative. Therefore, although Equation 27 contains a negative sign, the calculated fetal fraction is still positive.
Because the fetal fraction cannot be negative, "ff_(i)" for any chromosome can be calculated by the following equation:
$$ff_{(i)} = 2 \cdot \left| NCV_{jA} \, CV_{jU} \right| \qquad \text{(Equation 28)}$$
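The calculation of Equation 28 can be sketched in Python. The NCV is taken to be the usual standardized dose, (dose minus the mean of the qualified doses) divided by their standard deviation; the use of the sample standard deviation from the standard library is an assumption, and the function names `ncv` and `fetal_fraction_from_ncv` are hypothetical.

```python
from statistics import mean, stdev


def ncv(dose: float, qualified_doses) -> float:
    """Normalized chromosome value of a test dose against qualified samples."""
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)  # sample standard deviation (assumption)
    return (dose - mu) / sigma


def fetal_fraction_from_ncv(ncv_value: float, cv: float) -> float:
    """Equation 28: ff = 2 * |NCV * CV|, where CV = sigma / mean."""
    return 2.0 * abs(ncv_value * cv)
```

For qualified doses of 0.98, 1.00, and 1.02 (mean 1.00, standard deviation 0.02, CV 0.02), a test dose of 1.05 gives an NCV of 2.5 and a fetal fraction of about 0.10, which agrees with a dose of (1 + ff/2) times the unaffected dose. A test dose of 0.95 gives the same fetal fraction from a negative NCV.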
Resolving no-calls using fetal fraction
The ability to call a significant difference in the representation of one or more sequences in a mixture of two genomes depends on the sequence contribution of the first genome relative to the contribution of the second genome. For example, noninvasive prenatal diagnosis using cfDNA in maternal samples is challenging because only a small portion of the DNA sample is derived from the fetus. For prenatal diagnostic assays, the background of maternal DNA forms a practical limit on sensitivity, and therefore the fraction of fetal DNA present in the maternal sample is an important parameter. The sensitivity of fetal aneuploidy detection by counting DNA molecules depends on the fetal DNA fraction and the number of molecules counted.
Typically, about 1% of the maternal test samples analyzed for fetal aneuploidy by massively parallel sequencing are "no-call" samples, for which insufficient sequencing information, e.g., the number of fetal sequence tags, precludes confidently determining the presence or absence of one or more fetal aneuploidies in the maternal sample. A "no-call" determination may result when the level of fetal cfDNA is too low, relative to the maternal contribution, for the sequencing information provided by the sample to distinguish an aneuploid sample from the sequencing information determined in qualified samples. To determine whether a "no-call" sample is or is not an aneuploid sample, the fetal fraction is determined empirically and/or obtained from, e.g., NCV values, and is used to confirm or deny the presence of a chromosomal aneuploidy. As described elsewhere herein, ff can be used to characterize the type of aneuploidy present in a test sample. For example, for a "no-call" zone bounded by thresholds at NCV values of 2.5 and 4, a test sample having an NCV close to the threshold of 4 and shown to have a relatively low (e.g., less than 3%) fetal fraction is likely to be an affected sample. Conversely, a test sample having an NCV close to the threshold of 2.5 and shown to have a relatively high (e.g., greater than 40%) fetal fraction is likely to be an unaffected sample. Resolving a "no-call" sample may rely on a single determination of the fetal fraction. Preferably, the fetal fraction is determined according to two or more different methods, or by the same method using NCVs determined from two or more different chromosomes of the sample. Similarly, the fetal fraction can be used to evaluate whether samples with an NCV slightly greater than 4 or slightly less than 2.5 are likely to be false positive or false negative calls, respectively.
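One way to read the reasoning above is to compare the observed NCV with the NCV that a complete aneuploidy would be expected to produce at the independently measured fetal fraction, which by rearranging ff = 2 * NCV * CV is ff / (2 * CV). The Python sketch below is only an illustration of that reading: the decision rule, the `min_ratio` cutoff, the function name `resolve_no_call`, and the labels are all assumptions and are not taken from the source.

```python
def resolve_no_call(ncv_value: float, ff: float, cv: float,
                    low: float = 2.5, high: float = 4.0,
                    min_ratio: float = 0.5) -> str:
    """Suggest a resolution for a sample whose NCV lies in the no-call zone.

    ff is a fetal fraction measured independently of the NCV, and cv is the
    coefficient of variation of the chromosome dose in qualified samples.
    """
    if not (low <= ncv_value <= high):
        raise ValueError("sample is not in the no-call zone")
    if cv <= 0:
        raise ValueError("cv must be positive")
    expected_ncv = ff / (2.0 * cv)  # NCV expected for a complete aneuploidy
    if ncv_value >= min_ratio * expected_ncv:
        return "likely affected"
    return "likely unaffected"
```

Under these assumptions, a sample with an NCV near 4 and a 3% fetal fraction is reported as likely affected, while a sample with an NCV near 2.5 and a 40% fetal fraction is reported as likely unaffected, matching the two examples in the text.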
Apparatus and systems for determining CNV
Analysis of the sequencing data and the diagnosis derived from it are typically performed using various computer-executed algorithms and programs. Therefore, certain embodiments employ processes involving data stored in or transferred through one or more computer systems or other processing systems. Embodiments of the invention also relate to apparatus for performing these operations. The apparatus may be specially constructed for the required purposes, or it may be a general-purpose computer (or a group of computers) selectively activated or reconfigured by a computer program and/or data structure stored in the computer. In some embodiments, a group of processors performs some or all of the recited analytical operations collaboratively (e.g., via a network or cloud computing) and/or in parallel. A processor or group of processors for performing the methods described herein may be of various types, including microcontrollers and microprocessors such as programmable devices (e.g., CPLDs and FPGAs) and non-programmable devices such as gate array ASICs or general-purpose microprocessors.
In addition, certain embodiments relate to tangible and/or non-transitory computer-readable media or computer program products that include program instructions and/or data (including data structures) for performing various computer-implemented operations. Examples of computer-readable media include, but are not limited to, semiconductor memory devices; magnetic media such as disk drives and magnetic tape; optical media such as CDs; magneto-optical media; and hardware devices that are specially configured to store and execute program instructions, such as read-only memory devices (ROM) and random access memory (RAM). The computer-readable media may be directly controlled by an end user, or the media may be indirectly controlled by the end user. Examples of directly controlled media include media located at a user facility and/or media that are not shared with other entities. Examples of indirectly controlled media include media that are indirectly accessible to the user via an external network and/or via a service providing shared resources, such as the "cloud." Examples of program instructions include both machine code, such as that produced by a compiler, and files containing higher-level code that may be executed by the computer using an interpreter.
In various embodiments, the data or information employed in the disclosed methods and apparatus is provided in an electronic format. Such data or information may include reads and tags derived from a nucleic acid sample, counts or densities of such tags that align with particular regions of a reference sequence (e.g., that align to a chromosome or chromosome segment), reference sequences (including reference sequences providing solely or primarily polymorphisms), chromosome and segment doses, calls such as aneuploidy calls, normalized chromosome and segment values, pairs of chromosomes or segments and corresponding normalizing chromosomes or segments, counseling recommendations, diagnoses, and the like. As used herein, data or other information provided in electronic format is available for storage on a machine and transmission between machines. Conventionally, data in electronic format is provided digitally and may be stored as bits and/or bytes in various data structures, lists, databases, etc. The data may be embodied electronically, optically, etc.
In one embodiment, the invention provides a computer program product for generating an output indicating the presence or absence of an aneuploidy, e.g., a fetal aneuploidy, or cancer in a test sample. The computer product may contain instructions for performing any one or more of the above-described methods for determining a chromosomal anomaly. As explained, the computer product may include a non-transitory and/or tangible computer-readable medium having computer-executable or compilable logic (e.g., instructions) recorded thereon for enabling a processor to determine chromosome doses and, in some cases, whether a fetal aneuploidy is present or absent. In one example, the computer product comprises a computer-readable medium having computer-executable or compilable logic (e.g., instructions) recorded thereon for enabling a processor to diagnose a fetal aneuploidy, comprising: a receiving procedure for receiving sequencing data from at least a portion of the nucleic acid molecules from a maternal biological sample, wherein the sequencing data comprises calculated chromosome and/or segment doses; computer-assisted logic for analyzing a fetal aneuploidy from the received data; and an output procedure for generating an output indicating the presence, absence, or kind of the fetal aneuploidy.
The sequence information from the sample under consideration may be mapped to chromosome reference sequences to identify a number of sequence tags for each of any one or more chromosomes of interest and to identify a number of sequence tags for a normalizing segment sequence for each of said any one or more chromosomes of interest. In various embodiments, the reference sequences are stored in a database such as a relational or object database.
It should be understood that it is not practical, or even possible in most cases, for an unaided human being to perform the computational operations of the methods disclosed herein. For example, mapping a single 30 bp read from a sample to any one of the human chromosomes might require years of effort without the assistance of a computational apparatus. Of course, the problem is compounded because reliable aneuploidy calls generally require mapping thousands (e.g., at least about 10,000) or even millions of reads to one or more chromosomes.
The methods disclosed herein can be performed using a computer-readable medium having stored thereon computer-readable instructions for carrying out a method for identifying any CNV, e.g., chromosomal or partial aneuploidies. Thus, in one embodiment, the invention provides a computer-readable medium having stored thereon computer-readable instructions for carrying out a method for identifying complete and partial chromosomal aneuploidies, e.g., fetal aneuploidies. The instructions may include, for example, instructions for: (a) obtaining sequence information for fetal and maternal nucleic acids in a sample and/or at least temporarily storing that information in a computer-readable medium; (b) using the stored sequence information to computationally identify, from the mixture of fetal and maternal nucleic acids, a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for at least one normalizing chromosome sequence for each of the one or more chromosomes of interest; and (c) computationally calculating a single chromosome dose for each chromosome of interest, using the number of sequence tags identified for each of the one or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence. The instructions may be executed using one or more appropriately designed or configured processors. The instructions may additionally include comparing each chromosome dose to an associated threshold value, thereby determining the presence or absence of any four or more partial or complete different fetal chromosomal aneuploidies in the sample. As described above, there are many variations on this process. All such variations can be implemented using processing and storage features as described here.
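Operations (b) and (c) above can be illustrated with a minimal Python sketch. It assumes that read mapping has already been done and that each sequence tag is represented simply by the name of the chromosome it maps to. The function names, the toy tag counts, and the choice of chromosomes 9 and 11 as normalizing chromosomes for chromosome 21 are hypothetical and serve only as illustration.

```python
from collections import Counter
from typing import Iterable, List


def count_tags(mapped_chromosomes: Iterable[str]) -> Counter:
    """Count sequence tags per chromosome (one item per mapped tag)."""
    return Counter(mapped_chromosomes)


def chromosome_dose(tag_counts: Counter, chrom: str, normalizers: List[str]) -> float:
    """Tags on the chromosome of interest divided by the tags on its
    normalizing chromosome(s)."""
    denominator = sum(tag_counts[n] for n in normalizers)
    if denominator == 0:
        raise ValueError("no tags mapped to the normalizing sequence")
    return tag_counts[chrom] / denominator


# Toy data (hypothetical): 3 tags on chr21, 10 each on chr9 and chr11.
tags = ["chr21"] * 3 + ["chr9"] * 10 + ["chr11"] * 10
counts = count_tags(tags)
dose_21 = chromosome_dose(counts, "chr21", ["chr9", "chr11"])  # 3 / 20
```

In practice the counts would come from millions of mapped reads, and the normalizing sequence for each chromosome of interest would be chosen beforehand from qualified samples.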
In some embodiments, the instructions may further include automatically recording information pertinent to the method, such as chromosome doses and the presence or absence of a fetal chromosomal aneuploidy, in a patient medical record for the human subject providing the maternal test sample. The patient medical record may be maintained by, for example, a laboratory, physician's office, hospital, health maintenance organization, insurance company, or personal medical record website. Further, based on the results of the processor-implemented analysis, the method may further involve prescribing, initiating, and/or altering treatment of the human subject from whom the maternal test sample was taken. This may involve performing one or more additional tests or analyses on additional samples taken from the subject.
The disclosed methods can also be performed using a computer processing system that is adapted or configured to perform a method for identifying any CNV, e.g., chromosomal or partial aneuploidies. Thus, in one embodiment, the invention provides a computer processing system that is adapted or configured to perform a method as described herein. In one embodiment, the apparatus comprises a sequencing device adapted or configured for sequencing at least a portion of the nucleic acid molecules in a sample to obtain the type of sequence information described elsewhere herein. The apparatus may also include components for processing the sample. Such components are described elsewhere herein.
Sequence or other data can be input into a computer or stored on a computer-readable medium either directly or indirectly. In one embodiment, a computer system is directly coupled to a sequencing device that reads and/or analyzes sequences of nucleic acids from samples. Sequences or other information from such tools are provided to the computer system via an interface. Alternatively, the sequences processed by the system are provided from a sequence storage source such as a database or other repository. Once available to the processing apparatus, a memory device or mass storage device buffers or stores, at least temporarily, the sequences of the nucleic acids. In addition, the memory device may store tag counts for various chromosomes or genomes, etc. The memory may also store various routines and/or programs for analyzing the sequence or mapped data. Such programs/routines may include programs for performing statistical analyses, etc.
In one example, a user provides a sample to a sequencing apparatus. Data is collected and/or analyzed by the sequencing apparatus, which is connected to a computer. Software on the computer allows for data collection and/or analysis. Data can be stored, displayed (via a monitor or other similar device), and/or sent to another location. The computer may be connected to the internet, which is used to transmit data to a handheld device used by a remote user (e.g., a physician, scientist, or analyst). It is understood that the data can be stored and/or analyzed prior to transmission. In some embodiments, raw data is collected and sent to a remote user or apparatus that will analyze and/or store the data. Transmission can occur via the internet, but can also occur via satellite or other connection. Alternatively, data can be stored on a computer-readable medium, and the medium can be shipped to an end user (e.g., via mail). The remote user can be in the same or a different geographical location, including but not limited to a building, city, state, country, or continent.
In some embodiments, the methods also include collecting data regarding a plurality of polynucleotide sequences (e.g., reads, tags, and/or reference chromosome sequences) and sending the data to a computer or other computational system. For example, the computer can be connected to laboratory equipment, e.g., a sample collection apparatus, a nucleotide amplification apparatus, a nucleotide sequencing apparatus, or a hybridization apparatus. The computer can then collect applicable data gathered by the laboratory device. The data can be stored on a computer at any step, e.g., while collected in real time, prior to sending, during or in conjunction with sending, or following sending. The data can be stored on a computer-readable medium that can be extracted from the computer. The data collected or stored can be transmitted from the computer to a remote location, e.g., via a local network or a wide area network such as the internet. At the remote location, various operations can be performed on the transmitted data as described below.
Among the types of electronically formatted data that may be stored, transmitted, analyzed, and/or manipulated in the systems, apparatus, and methods disclosed herein are the following:
Reads obtained by sequencing nucleic acids in a test sample
Tags obtained by aligning reads to a reference genome or other reference sequence or sequences
The reference genome or sequence
Sequence tag density: counts or numbers of tags for each of two or more regions (typically chromosomes or chromosome segments) of a reference genome or other reference sequences
Identities of normalizing chromosomes or chromosome segments for particular chromosomes or chromosome segments of interest
Doses for chromosomes or chromosome segments (or other regions) obtained from chromosomes or segments of interest and the corresponding normalizing chromosomes or segments
Thresholds for calling chromosome doses as affected, unaffected, or no call
The actual calls of chromosome doses
Diagnoses (clinical conditions associated with the calls)
Recommendations for further tests derived from the calls and/or diagnoses
Treatment and/or monitoring plans derived from the calls and/or diagnoses
These various types of data may be obtained, stored, transmitted, analyzed, and/or manipulated at one or more locations using distinct apparatus. The processing options span a wide spectrum. At one end of the spectrum, all or much of this information is stored and used at the location where the test sample is processed, e.g., a doctor's office or other clinical setting. At the other extreme, the sample is obtained at one location, it is processed and optionally sequenced at a different location, reads are aligned and calls are made at one or more different locations, and diagnoses, recommendations, and/or plans are prepared at still another location (which may be the location where the sample was obtained).
In various embodiments, the reads are generated with the sequencing apparatus and then transmitted to a remote site, where they are processed to produce aneuploidy calls. At this remote location, as an example, the reads are aligned to a reference sequence to produce tags, which are counted and assigned to chromosomes or segments of interest. Also at the remote location, the counts are converted to doses using the associated normalizing chromosomes or segments. Still further, at the remote location, the doses are used to generate aneuploidy calls.
Among the processing operations that may be employed at distinct locations are the following:
Sample collection
Sample processing preliminary to sequencing
Sequencing
Analyzing sequence data and deriving aneuploidy calls
Diagnosis
Reporting a diagnosis and/or a call to a patient or health care provider
Developing a plan for further treatment, testing, and/or monitoring
Perform the plan
Consulting
Any one or more of these operations may be automated as described elsewhere herein. Typically, the sequencing and the analyzing of sequence data and deriving of aneuploidy calls will be performed computationally. The other operations may be performed manually or automatically.
Examples of locations where sample collection may be performed include health practitioners' offices, clinics, patients' homes (where a sample collection tool or kit is provided), and mobile health care vehicles. Examples of locations where sample processing prior to sequencing may be performed include health practitioners' offices, clinics, patients' homes (where a sample processing apparatus or kit is provided), mobile health care vehicles, and facilities of aneuploidy analysis providers. Examples of locations where sequencing may be performed include health practitioners' offices, clinics, patients' homes (where a sample sequencing apparatus and/or kit is provided), mobile health care vehicles, and facilities of aneuploidy analysis providers. The location where the sequencing takes place may be provided with a dedicated network connection for transmitting sequence data (typically reads) in an electronic format. Such a connection may be wired or wireless and may be configured to send the data to a site where the data can be processed and/or aggregated prior to transmission to a processing site. Data aggregators can be maintained by health organizations such as health maintenance organizations (HMOs).
The analyzing and/or deriving operations may be performed at any of the foregoing locations or, alternatively, at a further remote site dedicated to computation and/or the service of analyzing nucleic acid sequence data. Such locations include, for example, clusters such as general-purpose server farms, the facilities of an aneuploidy analysis service business, and the like. In some embodiments, the computational apparatus employed to perform the analysis is leased or rented. The computational resources may be part of an internet-accessible collection of processors, such as the processing resources colloquially known as the cloud. In some cases, the computations are performed by a parallel or massively parallel group of processors that are affiliated or unaffiliated with one another. The processing may be accomplished using distributed processing such as cluster computing, grid computing, and the like. In such embodiments, a cluster or grid of computational resources collectively forms a super virtual computer composed of multiple processors or computers acting together to perform the analysis and/or derivation described herein. These technologies, as well as more conventional supercomputers, may be employed to process sequence data as described herein. Each is a form of parallel computing that relies on processors or computers. In the case of grid computing, these processors (often whole computers) are connected by a network (private, public, or the internet) using a conventional network protocol such as Ethernet. By contrast, a supercomputer has many processors connected by a local high-speed computer bus.
In certain embodiments, the diagnosis (e.g., the fetus has Down syndrome or the patient has a particular type of cancer) is generated at the same location as the analyzing operation. In other embodiments, it is performed at a different location. In some examples, reporting the diagnosis is performed at the location where the sample was taken, although this need not be the case. Examples of locations where the diagnosis can be generated or reported and/or where a plan can be developed include health practitioners' offices, clinics, internet sites accessible by computers, and handheld devices such as cell phones, tablets, and smart phones having a wired or wireless connection to a network. Examples of locations where counseling is performed include health practitioners' offices, clinics, internet sites accessible by computers, handheld devices, etc.
In some embodiments, the sample collection, sample processing, and sequencing operations are performed at a first location and the analyzing and deriving operation is performed at a second location. However, in some cases, the sample is collected at one location (e.g., a health practitioner's office or clinic) and the sample processing and sequencing are performed at a different location, which is optionally the same location where the analyzing and deriving take place.
In various embodiments, a sequence of the above-listed operations may be triggered by a user or entity initiating sample collection, sample processing, and/or sequencing. After one or more of these operations have begun execution, the other operations may naturally follow. For example, the sequencing operation may cause reads to be automatically collected and sent to a processing apparatus, which then conducts, often automatically and possibly without further user intervention, the sequence analysis and derivation of aneuploidy operation. In some implementations, the result of this processing operation is then automatically delivered, possibly after reformatting as a diagnosis, to a system component or entity that processes the information and reports it to a health professional and/or patient. As explained, such information can also be automatically processed to produce a treatment, testing, and/or monitoring plan, possibly along with counseling information. Thus, initiating an early-stage operation can trigger an end-to-end sequence in which the health professional, patient, or other concerned party is provided with a diagnosis, a plan, counseling, and/or other information useful for acting on a physical condition. This is accomplished even though parts of the overall system are physically separated and possibly remote from the location of, e.g., the sample and sequence apparatus.
Figure 19 shows one implementation of a dispersed system for producing a call or diagnosis from a test sample. A sample collection location 01 is used for obtaining a test sample from a patient such as a pregnant female or a putative cancer patient. The sample is then provided to a processing and sequencing location 03, where the test sample may be processed and sequenced as described above. Location 03 includes apparatus for processing the sample as well as apparatus for sequencing the processed sample. The result of the sequencing, as described elsewhere herein, is a collection of reads, which are typically provided in an electronic format and passed to a network such as the internet, indicated by reference number 05 in Figure 19.
The sequence data is provided to a remote location 07, where analysis and call generation are performed. This location may include one or more powerful computational devices such as computers or processors. After the computational resources at location 07 have completed their analysis and generated a call from the sequence information received, the call is relayed back to the network 05. In some implementations, not only a call but also an associated diagnosis is generated at location 07. The call and/or diagnosis are then transmitted across the network and back to the sample collection location 01, as illustrated in Figure 19. As explained, this is simply one of many variations on how the various operations associated with generating a call or diagnosis may be divided among various locations. One common variant involves providing sample collection and processing and sequencing in a single location. Another variation involves providing processing and sequencing at the same location as analysis and call generation.
Figure 20 elaborates on the options for performing various operations at distinct locations. In the most granular sense depicted in Figure 20, each of the following operations is performed at a separate location: sample collection, sample processing, sequencing, read alignment, calling, diagnosis, and reporting and/or plan development.
In one embodiment that aggregates some of these operations, sample processing and sequencing are performed at one location, and read alignment, calling, and diagnosis are performed at a separate location. See the portion of Figure 20 identified by reference character A. In another implementation, identified by character B in Figure 20, sample collection, sample processing, and sequencing are all performed at the same location. In this implementation, read alignment and calling are performed at a second location. Finally, diagnosis and reporting and/or plan development are performed at a third location. In the implementation depicted by character C in Figure 20, sample collection is performed at a first location; sample processing, sequencing, read alignment, calling, and diagnosis are all performed together at a second location; and reporting and/or plan development are performed at a third location. Finally, in the implementation labeled D in Figure 20, sample collection is performed at a first location; sample processing, sequencing, read alignment, and calling are all performed at a second location; and diagnosis and reporting and/or plan management are performed at a third location.
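The fully separated arrangement and the groupings labeled B, C, and D in Figure 20 can be written down as a small lookup table, as in the following Python sketch. The operation names, site numbers, and function name are illustrative assumptions. Option A is left out because the text above does not say where sample collection and reporting take place in that option.

```python
from collections import defaultdict
from typing import Dict, List

# Operations in the order listed for Figure 20 (names are illustrative).
OPERATIONS = [
    "sample collection", "sample processing", "sequencing",
    "read alignment", "calling", "diagnosis", "reporting/planning",
]

# Site index assigned to each operation, in the order of OPERATIONS.
CONFIGURATIONS: Dict[str, List[int]] = {
    "separate": [1, 2, 3, 4, 5, 6, 7],  # every operation at its own site
    "B": [1, 1, 1, 2, 2, 3, 3],
    "C": [1, 2, 2, 2, 2, 2, 3],
    "D": [1, 2, 2, 2, 2, 3, 3],
}


def operations_by_site(config: str) -> Dict[int, List[str]]:
    """Group the operations of one configuration by the site that runs them."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for operation, site in zip(OPERATIONS, CONFIGURATIONS[config]):
        grouped[site].append(operation)
    return dict(grouped)


sites_b = operations_by_site("B")
```

A table of this kind makes it easy to see which data must cross a network boundary in each option: reads in B, the sample itself in C and D.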
In one embodiment, the invention provides a system for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids, the system comprising: a sequencer for receiving a nucleic acid sample and providing fetal and maternal nucleic acid sequence information from the sample; a processor; and a machine-readable storage medium comprising instructions for execution on the processor, the instructions comprising:
(a) code for obtaining sequence information for the fetal and maternal nucleic acids in the sample;
(b) code for using the sequence information to computationally identify a number of sequence tags from the fetal and maternal nucleic acids for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for at least one normalizing chromosome sequence or normalizing chromosome segment sequence for each of said any one or more chromosomes of interest;
(c) code for using the number of sequence tags identified for each of said any one or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence or normalizing chromosome segment sequence to calculate a single chromosome dose for each of the any one or more chromosomes of interest; and
(d) code for comparing each of the single chromosome doses for each of the any one or more chromosomes of interest to a corresponding threshold value for each of the one or more chromosomes of interest, and thereby determining the presence or absence of any one or more complete different fetal chromosomal aneuploidies in the sample.
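Step (d) can be sketched as a comparison of each dose with per-chromosome thresholds. The Python sketch below assumes a two-threshold scheme that yields an affected, unaffected, or no-call result, and it considers only over-representation of a chromosome. The function name and the threshold and dose values are invented for illustration and are not values taught by this disclosure.

```python
from typing import Dict, Tuple


def call_aneuploidy(dose: float, lower: float, upper: float) -> str:
    """Classify one chromosome dose against a (lower, upper) threshold pair.

    Assumed scheme: dose >= upper is 'affected', dose <= lower is
    'unaffected', and anything in between is 'no call'.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if dose >= upper:
        return "affected"
    if dose <= lower:
        return "unaffected"
    return "no call"


# Hypothetical doses and thresholds for two chromosomes of interest.
doses: Dict[str, float] = {"chr21": 0.0152, "chr18": 0.0301}
thresholds: Dict[str, Tuple[float, float]] = {
    "chr21": (0.0148, 0.0150),
    "chr18": (0.0305, 0.0310),
}
calls = {chrom: call_aneuploidy(dose, *thresholds[chrom])
         for chrom, dose in doses.items()}
```

Detecting under-representation (e.g., a monosomy) would need a mirrored pair of thresholds below the expected dose, which this sketch omits.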
In some embodiments, the code for calculating a single chromosome dose for each of the any one or more chromosomes of interest comprises code for calculating a chromosome dose for a selected one of the chromosomes of interest as the ratio of the number of sequence tags identified for the selected chromosome of interest to the number of sequence tags identified for a corresponding at least one normalizing chromosome sequence or normalizing chromosome segment sequence for the selected chromosome of interest.
In some embodiments, the system further comprises code for repeating the calculating of a chromosome dose for each of any remaining chromosome segments of the any one or more segments of any one or more chromosomes of interest.
In some embodiments, the one or more chromosomes of interest selected from chromosomes 1-22, X, and Y comprise at least twenty chromosomes selected from chromosomes 1-22, X, and Y, and the instructions comprise instructions for determining the presence or absence of at least twenty different complete fetal chromosomal aneuploidies.
In some embodiments, the at least one normalizing chromosome sequence is a group of chromosomes selected from chromosomes 1-22, X, and Y. In other embodiments, the at least one normalizing chromosome sequence is a single chromosome selected from chromosomes 1-22, X, and Y.
In another embodiment, the invention provides a system for determining the presence or absence of any one or more different partial fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids, the system comprising: a sequencer for receiving a nucleic acid sample and providing fetal and maternal nucleic acid sequence information from the sample; a processor; and a machine-readable storage medium comprising instructions for execution on the processor, the instructions comprising:
(a) code for obtaining sequence information for the fetal and maternal nucleic acids in the sample;
(b) code for using the sequence information to computationally identify a number of sequence tags from the fetal and maternal nucleic acids for each of any one or more segments of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for at least one normalizing segment sequence for each of said any one or more segments of any one or more chromosomes of interest;
(c) code for using the number of sequence tags identified for each of said any one or more segments of any one or more chromosomes of interest and the number of sequence tags identified for the normalizing segment sequence to calculate a single chromosome segment dose for each of said any one or more segments of any one or more chromosomes of interest; and
(d) code for comparing each of the single chromosome segment doses for each of said any one or more segments of any one or more chromosomes of interest to a corresponding threshold value for each of said any one or more chromosome segments of any one or more chromosomes of interest, and thereby determining the presence or absence of one or more different partial fetal chromosomal aneuploidies in the sample.
In some embodiments, the code for calculating a single chromosome segment dose comprises code for calculating a chromosome segment dose for a selected one of the chromosome segments as the ratio of the number of sequence tags identified for the selected chromosome segment to the number of sequence tags identified for a corresponding normalizing segment sequence for the selected chromosome segment.
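The segment dose described above can be sketched in Python as follows. The sketch assumes that each sequence tag is reduced to a (chromosome, position) pair and that a segment is a half-open coordinate interval on one chromosome. The type aliases, function names, coordinates, and the pairing of a chromosome 22 segment with a chromosome 5 normalizing segment are hypothetical.

```python
from typing import List, Tuple

Tag = Tuple[str, int]            # (chromosome, mapped position)
Segment = Tuple[str, int, int]   # (chromosome, start, end), end exclusive


def tags_in_segment(tags: List[Tag], segment: Segment) -> int:
    """Count the tags whose mapped position falls inside the segment."""
    chrom, start, end = segment
    return sum(1 for c, pos in tags if c == chrom and start <= pos < end)


def segment_dose(tags: List[Tag], of_interest: Segment, normalizing: Segment) -> float:
    """Tags in the segment of interest divided by tags in the normalizing segment."""
    denominator = tags_in_segment(tags, normalizing)
    if denominator == 0:
        raise ValueError("no tags mapped to the normalizing segment")
    return tags_in_segment(tags, of_interest) / denominator


# Toy tags (hypothetical coordinates).
tags = [("chr22", 100), ("chr22", 150), ("chr22", 900),
        ("chr5", 10), ("chr5", 20), ("chr5", 30), ("chr5", 40)]
dose = segment_dose(tags, ("chr22", 0, 500), ("chr5", 0, 100))  # 2 / 4
```

Each resulting segment dose would then be compared with the threshold for that segment, as in step (d).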
In some embodiments, the system further comprises code for repeating the calculating of a chromosome segment dose for each of any remaining chromosome segments of the any one or more segments of any one or more chromosomes of interest.
In some embodiments, the system further comprises (i) code for repeating (a)-(d) for test samples from different maternal subjects, and (ii) code for determining the presence or absence of any one or more different partial fetal chromosomal aneuploidies in each of the samples.
In other embodiments of any of the systems provided herein, the code further comprises code for automatically recording the presence or absence of a fetal chromosomal aneuploidy, as determined in (d), in a patient medical record for the human subject providing the maternal test sample, wherein the recording is performed using the processor.
In some embodiments of any of the systems provided herein, the sequencer is configured to perform next generation sequencing (NGS). In some embodiments, the sequencer is configured to perform massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencer is configured to perform sequencing-by-ligation. In yet other embodiments, the sequencer is configured to perform single molecule sequencing.
Apparatus for determining fetal fraction
An apparatus for medical analysis of a sample can be used to provide information about the fraction that one of two genomes contributes to a mixture of nucleic acids, by analyzing sequence tags obtained from sequencing the sample (e.g., a maternal sample). For example, apparatus are provided that analyze sequence tags obtained from sequencing a maternal sample to determine the fraction of fetal nucleic acids in the mixture of fetal and maternal nucleic acids present in the maternal sample. The medical apparatus provided includes a series of devices for carrying out the steps of the methods for determining fetal fraction described elsewhere in this application.
Figure 65 shows one embodiment of a medical analysis apparatus for determining the fetal fraction in a maternal test sample comprising a mixture of fetal and maternal nucleic acids. The apparatus comprises:
a device (a) for receiving a plurality of sequence reads of the fetal and maternal nucleic acids in the maternal test sample;
a device (b) for aligning the plurality of sequence reads to one or more chromosome reference sequences, thereby providing a plurality of sequence tags corresponding to the sequence reads;
a device (c) for identifying a number of those sequence tags that are from one or more chromosomes of interest or chromosome segments of interest selected from chromosomes 1-22, X, and Y and segments thereof, and for identifying, for each of the one or more chromosomes of interest or chromosome segments of interest, a number of those sequence tags that are from at least one normalizing chromosome sequence or normalizing chromosome segment sequence, in order to determine a chromosome dose or chromosome segment dose, wherein the chromosome of interest or chromosome segment of interest has a copy number variation; and
a device (d) for determining the fetal fraction using the dose of the chromosome of interest or the dose of the chromosome segment of interest.
Preferably, the signal output of device (a) is connected to device (b), the signal output of device (b) is connected to device (c), and the signal output of device (c) is connected to device (d).
In certain embodiments, the copy number variation is determined by comparing the chromosome dose of each chromosome or chromosome segment of the one or more chromosomes of interest or chromosome segments of interest to a corresponding threshold value for that chromosome or chromosome segment.
The copy number variations that a fetus may carry include complete chromosomal duplications, complete chromosomal deletions, partial duplications, partial multiplications, partial insertions, and partial deletions.
In certain embodiments, the chromosome or segment dose determined by device (c) is calculated as the ratio of the number of sequence tags identified for the selected chromosome or segment of interest to the number of sequence tags identified for a corresponding at least one normalizing chromosome sequence or normalizing chromosome segment sequence for the selected chromosome or segment of interest. In certain embodiments, the chromosome or segment dose determined by device (c) is calculated as the ratio of the sequence tag density ratio of the selected chromosome or segment of interest to the sequence tag density ratio of the at least one corresponding normalizing chromosome sequence or normalizing chromosome segment sequence for the selected chromosome or segment of interest.
In certain embodiments, the apparatus further comprises a device (e) for calculating a normalized chromosome value (NCV) or a normalized segment value (NSV), wherein calculating the NCV relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, for the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the i-th chromosome in the test sample, the i-th chromosome being the chromosome of interest; and wherein calculating the NSV relates the chromosome segment dose to the mean of the corresponding chromosome segment dose in a set of qualified samples as:
$$NSV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, for the i-th chromosome segment dose in the set of qualified samples, and $R_{iA}$ is the chromosome segment dose calculated for the i-th chromosome segment in the test sample, the i-th chromosome segment being the chromosome segment of interest. Preferably, the signal output of device (c) is connected to device (e).
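The NCV (and, with segment doses substituted, the NSV) has the form of a z-score: the test dose minus the mean dose in the qualified samples, divided by their standard deviation. A minimal Python sketch under that reading follows. The function name and the toy doses are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text only calls for an estimated standard deviation.

```python
from statistics import mean, stdev
from typing import Sequence


def normalized_chromosome_value(test_dose: float, qualified_doses: Sequence[float]) -> float:
    """Z-score of a test dose against doses from qualified (unaffected) samples."""
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)  # sample standard deviation (assumed estimator)
    if sigma == 0:
        raise ValueError("qualified doses have zero variance")
    return (test_dose - mu) / sigma


# Toy doses (hypothetical): mean 1.00, standard deviation 0.02.
qualified = [0.98, 1.00, 1.02]
ncv = normalized_chromosome_value(1.06, qualified)  # approximately 3.0
```

At least two qualified samples are needed for the standard deviation to be defined; in practice many more would be used so that the estimates are stable.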
In certain embodiments, device (d) of the equipment then determines the fetal fraction according to the following formula:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
where $ff$ is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample (for example, the maternal sample under test), and $CV_{iU}$ is the coefficient of variation of the dose of the chromosome of interest determined in the qualified samples; or the fetal fraction is determined according to the following formula:
$$ff = 2 \times \left| NSV_{iA} \, CV_{iU} \right|$$
where $ff$ is the fetal fraction value, $NSV_{iA}$ is the normalized chromosome segment value for the i-th chromosome segment in an affected sample (for example, the maternal sample under test), and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome segment determined in the qualified samples, the i-th chromosome segment being the chromosome segment of interest. Preferably, the signal output of device (e) is connected to device (d).
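To illustrate the formula ff = 2 × |NCV × CV|, the sketch below computes the fetal fraction directly from a test dose and a set of qualified doses. It assumes the coefficient of variation is the standard deviation divided by the mean of the qualified doses; the function name and numbers are hypothetical.

```python
import statistics


def fetal_fraction_from_dose(test_dose, qualified_doses):
    """Fetal fraction as 2 * |NCV * CV|, both derived from the qualified samples."""
    mean_u = statistics.mean(qualified_doses)
    sd_u = statistics.stdev(qualified_doses)
    ncv = (test_dose - mean_u) / sd_u   # normalized chromosome value
    cv = sd_u / mean_u                  # coefficient of variation (assumed sd / mean)
    return 2.0 * abs(ncv * cv)


# Made-up qualified doses; the test dose is elevated by 5% over their mean.
qualified = [0.10, 0.12, 0.11, 0.09, 0.13]
ff_example = fetal_fraction_from_dose(statistics.mean(qualified) * 1.05, qualified)
```

With these definitions the product NCV × CV reduces to the relative excess of the test dose over the qualified mean, so a 5% excess yields a fetal fraction of about 10%.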
In certain embodiments, the chromosome of interest is an autosome or the X chromosome of a male fetus, and the chromosome segment of interest is selected from an autosome or the X chromosome of a male fetus.
In certain embodiments, the at least one normalizing chromosome sequence or normalizing chromosome segment sequence is a chromosome or segment selected for an associated chromosome or segment of interest by: (i) identifying a plurality of qualified samples for the chromosome or segment of interest; (ii) repeatedly calculating chromosome doses or chromosome segment doses for the selected chromosome or chromosome segment using a plurality of potential normalizing chromosome sequences or normalizing chromosome segment sequences; and (iii) selecting the normalizing chromosome sequence or normalizing chromosome segment sequence, alone or in combination, that gives the smallest variability or the greatest differentiability in the calculated chromosome doses or chromosome segment doses. In certain embodiments, the normalizing chromosome sequence is a single chromosome of any one or more of chromosomes 1 to 22, X and Y; alternatively, the normalizing chromosome sequence is a group of chromosomes of any of chromosomes 1 to 22, X and Y. In certain embodiments, the normalizing segment sequence is a single segment of any one or more of chromosomes 1 to 22, X and Y; alternatively, the normalizing segment sequence is a group of segments of any one or more of chromosomes 1 to 22, X and Y.
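Steps (i) to (iii) can be sketched as a search over candidate normalizing chromosomes. This sketch covers only the "smallest variability" criterion, measured here by the coefficient of variation of the dose across qualified samples, and only single-chromosome candidates; the function name, data layout and counts are assumptions for illustration.

```python
import statistics


def select_normalizing_chromosome(qualified_counts, chrom_of_interest, candidates):
    """Return the candidate whose dose varies least across the qualified samples."""
    best_candidate, best_cv = None, float("inf")
    for candidate in candidates:
        # Step (ii): recompute the dose in every qualified sample for this candidate.
        doses = [s[chrom_of_interest] / s[candidate] for s in qualified_counts]
        cv = statistics.stdev(doses) / statistics.mean(doses)
        # Step (iii): keep the candidate with the smallest variability.
        if cv < best_cv:
            best_candidate, best_cv = candidate, cv
    return best_candidate, best_cv


# Made-up tag counts from three qualified samples (not real data).
qualified_counts = [
    {"chr21": 100, "chr9": 400, "chr1": 900},
    {"chr21": 110, "chr9": 440, "chr1": 800},
    {"chr21": 90, "chr9": 360, "chr1": 1000},
]
chosen, chosen_cv = select_normalizing_chromosome(
    qualified_counts, "chr21", ["chr9", "chr1"]
)
```

Groups of chromosomes could be handled the same way by summing the counts of each candidate group before dividing; the differentiability criterion would additionally require affected samples and is not shown.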
In certain embodiments, the equipment for determining fetal fraction further comprises a device for comparing the fetal fraction determined using the chromosome dose or chromosome segment dose with a fetal fraction determined using information from one or more polymorphisms that exhibit allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being present on a chromosome other than the chromosome of interest.
In certain embodiments, the equipment further comprises a sequencing device (10) configured to sequence the fetal and maternal nucleic acids in a maternal test sample and obtain sequence reads. Preferably, the signal output of sequencing device (10) is connected to device (a).
In certain embodiments, sequencing device (10) is configured to perform sequencing by synthesis. Sequencing by synthesis can be performed using reversible dye terminators. In other embodiments, sequencing device (10) is configured to perform sequencing by ligation. In still other embodiments, sequencing device (10) is configured to perform single-molecule sequencing.
In certain embodiments, sequencing device (10) is located at a site separate from devices (a)-(d), and the signal output of sequencing device (10) is connected to device (a) through a network.
In certain embodiments, the equipment including a sequencing device as described further comprises device (11) for obtaining the maternal test sample from a pregnant mother. Device (11) for obtaining the maternal test sample may be located at a site separate from devices (a)-(d) and (10). In addition to devices (a)-(d) and (10), the equipment may further comprise device (12) for extracting cell-free DNA from the maternal test sample. In certain embodiments, device (12) for extracting cell-free DNA is located at the same site as sequencing device (10), and device (11) for obtaining the maternal test sample is located at a remote site.
In certain embodiments, the equipment for determining fetal fraction also comprises a storage device for at least temporarily storing the sequence reads received by device (a). Preferably, the signal output of device (a) is connected to the storage device, and the signal output of the storage device is connected to device (b).
Additional equipment for determining fetal fraction: classifying copy number variation
An additional medical analysis equipment is also provided for classifying a copy number variation in the fetal genome in a maternal sample comprising fetal and maternal nucleic acids (such as cell-free DNA). The additional equipment comprises devices for determining fetal fraction and a device for comparing fetal fraction values determined by different methods. The additional equipment uses the two calculated fetal fractions to classify the copy number variation in the fetal genome. The maternal sample that can be analyzed by the equipment may be selected from blood, plasma, serum or urine samples. In certain embodiments, the maternal sample is a plasma sample. Figure 66 shows one embodiment of such a medical analysis equipment.
In one embodiment, a medical analysis equipment for classifying a copy number variation in a fetal genome is provided, the equipment comprising:
Device (1), for receiving sequence reads of the fetal and maternal nucleic acids in a test sample;
Device (2), for aligning the sequence reads to one or more chromosome reference sequences, thereby providing sequence tags corresponding to the sequence reads;
Device (3), for identifying the number of sequence tags from one or more chromosomes of interest and determining that a first chromosome of interest in the fetus carries a copy number variation;
Device (4), for calculating a first fetal fraction value by a first method that does not use information from the tags from the first chromosome of interest;
Device (5), for calculating a second fetal fraction value by a second method that uses information from the tags from the first chromosome of interest; and
Device (6), for comparing the first fetal fraction value with the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome of interest.
Preferably, the signal output of device (1) is connected to device (2), the signal output of device (2) is connected to device (3), the signal outputs of devices (2) and (3) are connected to device (4), the signal outputs of devices (2) and (3) are connected to device (5), and the signal outputs of devices (4) and (5) are connected to device (6). The first chromosome of interest can be selected from any one of chromosomes 1 to 22, X and Y.
In certain embodiments, the additional equipment also comprises a storage device for at least temporarily storing the sequence reads received by device (1). Preferably, the signal output of device (1) is connected to the storage device, and the signal output of the storage device is connected to device (2).
In certain embodiments, device (4) for the first method of calculating the first fetal fraction comprises a component that calculates the first fetal fraction value using information from one or more polymorphisms exhibiting allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being present on a chromosome other than the first chromosome of interest; and device (5) for the second method of calculating the second fetal fraction value comprises:
(a) component (5-1), for counting the sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and
(b) component (5-2), for calculating the fetal fraction value from the chromosome dose using the second method. In certain embodiments, the signal outputs of devices (2) and (3) are connected to component (5-1), the signal output of component (5-1) is connected to component (5-2), and the signal output of component (5-2) is connected to device (6).
In certain embodiments, the information used by device (4) for the first method comprises sequence tags obtained by sequencing predetermined polymorphic sequences, each of the polymorphic sequences comprising one or more of the polymorphic sites. The information used by device (4) for the first method may also be obtained without sequencing, for example by non-sequencing methods such as qPCR, digital PCR, mass spectrometry or capillary gel electrophoresis.
In certain embodiments, device (4) for the first method comprises a component that calculates the first fetal fraction value using tags from a chromosome or chromosome segment that has no copy number variation. For example, when the first chromosome of interest is chromosome 21, the fetal fraction determined using sequence tags from chromosome 21 can be compared with the fetal fraction determined from sequence tags from chromosome X in a male fetus. Any chromosome or chromosome segment that is known not to occur in an aneuploid state, or that has been determined by any method described herein (for example, by calculating its NCV or NSV) not to be aneuploid in the test sample, can be used by device (4) to determine the fetal fraction.
In certain embodiments, device (5) for the second method of calculating the fetal fraction value further comprises component (5-3) for calculating a normalized chromosome value (NCV), wherein component (5-3) relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the i-th chromosome in the test sample, the i-th chromosome being the chromosome of interest.
Preferably, the signal output of component (5-1) is connected to component (5-3), and the signal output of component (5-3) is connected to component (5-2).
In certain embodiments, component (5-2), which calculates the fetal fraction value from the chromosome dose by the second method, uses the normalized chromosome value. Component (5-2) of device (5) for the second method evaluates the fetal fraction according to the following formula:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
where $ff$ is the second fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample (for example, the maternal sample under test), and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome determined in the qualified samples, the i-th chromosome being the chromosome of interest.
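The factor of 2 in this formula can be motivated by a short derivation. It is sketched here under two assumptions that this passage does not state explicitly: the fetus carries one extra (or one missing) complete copy of the chromosome of interest, and the coefficient of variation is the standard deviation divided by the mean of the qualified doses.

```latex
\begin{align}
CV_{iU} &= \frac{\sigma_{iU}}{\bar{R}_{iU}} \\
R_{iA} &\approx \bar{R}_{iU}\left(1 \pm \frac{ff}{2}\right)
  && \text{(one fetal copy gained or lost out of two)} \\
NCV_{iA} &= \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}
  \approx \pm\frac{ff}{2}\cdot\frac{\bar{R}_{iU}}{\sigma_{iU}}
  = \pm\frac{ff}{2\,CV_{iU}} \\
ff &\approx 2 \times \left| NCV_{iA}\, CV_{iU} \right|
\end{align}
```

Under this reading, the formula returns the true fetal fraction only if the complete-aneuploidy assumption holds, which is why a value computed this way can usefully be compared with a fetal fraction obtained by an independent method.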
In certain embodiments, device (4) for the first method of calculating the first fetal fraction comprises: (a) component (4-1), for counting the sequence tags from a chromosome other than the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose for that other chromosome; and (b) component (4-2), for calculating the first fetal fraction value from that chromosome dose by the first method; and device (5) for the second method of calculating the second fetal fraction comprises: (a) component (5-1), for counting the sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and (b) component (5-2), for calculating the second fetal fraction value from that chromosome dose by the second method.
Preferably, device (4) for the first method further comprises a component (4-3) and device (5) for the second method further comprises a component (5-3), each of which calculates a normalized chromosome value (NCV). Components (4-3) and (5-3) relate the chromosome doses determined by components (4-1) and (5-1), respectively, to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the dose of the i-th chromosome in the set of qualified samples, and $R_{iA}$ is the dose of the i-th chromosome calculated in the test sample,
wherein, for device (4) for the first method, the i-th chromosome is the chromosome other than the first chromosome of interest, and, for device (5) for the second method, the i-th chromosome is the first chromosome of interest.
Preferably, the signal output of component (4-1) is connected to component (4-3), and the signal output of component (4-3) is connected to component (4-2), wherein component (4-2) calculates the first fetal fraction value from the corresponding chromosome dose by the first method using the corresponding normalized chromosome value; and the signal output of component (5-1) is connected to component (5-3), and the signal output of component (5-3) is connected to component (5-2), wherein component (5-2) calculates the second fetal fraction value from the corresponding chromosome dose by the second method using the corresponding normalized chromosome value.
In certain embodiments, component (4-2) of device (4) for the first method and component (5-2) of device (5) for the second method evaluate the fetal fraction according to the following formula:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
where $ff$ is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample (for example, the maternal sample under test), and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome in the qualified samples;
wherein, for device (4) for the first method, the i-th chromosome is the chromosome other than the first chromosome of interest, and, for device (5) for the second method, the i-th chromosome is the first chromosome of interest. Preferably, when the fetus is male, the chromosome other than the first chromosome of interest is the X chromosome.
In certain embodiments, device (6) for comparing the first fetal fraction value with the second fetal fraction value determines whether the two fetal fraction values are approximately equal. In certain embodiments, device (6) further comprises a component that determines, when the two fetal fraction values are approximately equal, that the ploidy assumption implicit in the second method is true. The ploidy assumption implicit in the second method may be that the first chromosome of interest has a complete chromosomal aneuploidy, for example a monosomy or a trisomy.
In certain embodiments, the additional equipment further comprises a device (7) that analyzes the tag information for the first chromosome of interest to determine whether (i) the first chromosome of interest carries a partial aneuploidy, or (ii) the fetus is a mosaic, wherein device (7) is configured to operate when device (6) for comparing the first fetal fraction value with the second fetal fraction value indicates that the two fetal fraction values are not approximately equal. Preferably, the signal outputs of devices (2), (3) and (6) are connected to device (7).
In certain embodiments, in the additional equipment, device (4) for the first method comprises a component that calculates the first fetal fraction value using information from one or more polymorphisms exhibiting allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being present on a chromosome other than the first chromosome of interest; and device (5) for the second method comprises a component that calculates the second fetal fraction value using information from one or more polymorphisms exhibiting allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being present on the first chromosome of interest. The information used by device (4) for the first method can comprise sequence tags obtained by sequencing predetermined polymorphic sequences, each of the polymorphic sequences comprising one or more of the polymorphic sites. The information used by device (4) for the first method may also be obtained without sequencing, for example by non-sequencing methods such as qPCR, digital PCR, mass spectrometry or capillary gel electrophoresis.
In certain embodiments, device (6) for comparing comprises: a component that determines that the first chromosome of interest is diploid when the ratio of the second fetal fraction value to the first fetal fraction value is approximately 1; a component that determines that the first chromosome of interest is triploid when the ratio of the second fetal fraction value to the first fetal fraction value is approximately 1.5; and a component that determines that the first chromosome of interest is haploid when the ratio of the second fetal fraction value to the first fetal fraction value is approximately 0.5.
More preferably, the additional equipment for classifying copy number variation further comprises a device (7') that analyzes the tag information for the first chromosome of interest to determine whether (i) the first chromosome of interest carries a partial aneuploidy, or (ii) the fetus is a mosaic, wherein device (7') is configured to operate when device (6) for comparing the first fetal fraction value with the second fetal fraction value indicates that the ratio of the second fetal fraction value to the first fetal fraction value is not approximately 1, 1.5 or 0.5. Preferably, the signal outputs of devices (2), (3) and (6) are connected to device (7').
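The ratio-based decision performed by device (6), with the fall-through to device (7'), can be sketched as follows. The text says only "approximately"; the 10% relative tolerance, the function name and the returned labels are assumptions chosen for illustration.

```python
def classify_by_fetal_fraction_ratio(ff_first, ff_second, rel_tol=0.10):
    """Classify the first chromosome of interest from the ratio ff_second / ff_first."""
    if ff_first <= 0:
        raise ValueError("first fetal fraction must be positive")
    ratio = ff_second / ff_first
    # Targets taken from the text; the tolerance window is a hypothetical choice.
    for target, label in ((1.0, "diploid"), (1.5, "triploid"), (0.5, "haploid")):
        if abs(ratio - target) <= rel_tol * target:
            return label
    # No target matched: hand over to the tag-information analysis of device (7').
    return "analyze further"


call_equal = classify_by_fetal_fraction_ratio(0.10, 0.10)
call_high = classify_by_fetal_fraction_ratio(0.10, 0.15)
call_low = classify_by_fetal_fraction_ratio(0.10, 0.05)
call_other = classify_by_fetal_fraction_ratio(0.10, 0.12)
```

With a 10% relative tolerance the three acceptance windows do not overlap, so the order in which the targets are checked does not affect the result.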
In certain embodiments, device (7) or (7') for analyzing the tag information for the first chromosome of interest comprises: (a) component (7-1), for binning the sequence of the first chromosome of interest into a plurality of portions; (b) component (7-2), for determining whether any of the portions contains significantly more or significantly less nucleic acid than one or more other portions; and (c) component (7-3), for determining that the first chromosome of interest carries a partial aneuploidy if any of the portions contains significantly more or significantly less nucleic acid than one or more other portions, or determining that the fetus is a mosaic if none of the portions contains significantly more or significantly less nucleic acid than one or more other portions. Preferably, the signal outputs of devices (2), (3) and (6) are connected to component (7-1), the signal output of component (7-1) is connected to component (7-2), and the signal output of component (7-2) is connected to component (7-3). In certain embodiments, component (7-3) further determines that the portion of the first chromosome of interest containing significantly more or significantly less nucleic acid than one or more other portions carries the partial aneuploidy.
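Components (7-1) to (7-3) can be sketched as binning followed by a per-bin comparison. The text does not define "significantly"; the rule used here (a bin count deviating from the median bin count by more than a relative threshold) is a hypothetical stand-in, as are the function names and the example counts.

```python
import statistics


def bin_tag_counts(tag_positions, chrom_length, n_bins):
    """Component (7-1): count mapped tags in equal-width bins along the chromosome."""
    counts = [0] * n_bins
    bin_size = chrom_length / n_bins
    for pos in tag_positions:
        index = min(int(pos // bin_size), n_bins - 1)  # clamp the last position
        counts[index] += 1
    return counts


def classify_partial_or_mosaic(bin_counts, rel_threshold=0.20):
    """Components (7-2) and (7-3): flag outlying bins, then classify."""
    median = statistics.median(bin_counts)
    # Assumed significance rule: deviation from the median beyond a relative threshold.
    flagged = [i for i, c in enumerate(bin_counts)
               if abs(c - median) > rel_threshold * median]
    if flagged:
        return "partial aneuploidy", flagged
    return "mosaic", []


example_bins = bin_tag_counts([5, 15, 25, 95, 100], chrom_length=100, n_bins=10)
call_partial = classify_partial_or_mosaic([100, 102, 98, 150, 101])
call_mosaic = classify_partial_or_mosaic([100, 102, 98, 99, 101])
```

The flagged bin indices indicate where along the chromosome the partial aneuploidy lies. A practical implementation would presumably normalize bin counts (for example against qualified samples) before comparing them, which this sketch omits.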
In certain embodiments, the first chromosome of interest is selected from the group consisting of chromosomes 1-22, X and Y.
In certain embodiments, device (6) comprises a component for classifying the copy number variation into a category selected from the group consisting of: complete chromosome insertion or duplication, complete chromosome deletion, partial chromosome duplication, partial chromosome deletion, and mosaicism.
In certain embodiments, the additional medical analysis equipment further comprises:
(i) device (8), for determining whether the copy number variation is caused by a partial aneuploidy or by mosaicism; and
(ii) device (9), for determining, if the copy number variation is caused by a partial aneuploidy, the locus of the partial aneuploidy on the first chromosome of interest,
wherein devices (8) and (9) are configured to operate when device (6) for comparing the first fetal fraction value with the second fetal fraction value determines that the first fetal fraction value and the second fetal fraction value are not approximately equal. Preferably, the signal output of device (6) is connected to device (8), and the signal output of device (8) is connected to device (9). In certain embodiments, device (9) for determining the locus of the partial aneuploidy on the first chromosome of interest comprises a component for sorting the sequence tags for the first chromosome of interest into bins or blocks of nucleic acid in the first chromosome of interest, and a component for counting the mapped tags in each bin.
In certain embodiments, the additional equipment further comprises a sequencing device (10) configured to sequence the fetal and maternal nucleic acids in a maternal test sample (for example, a blood, plasma, serum or urine sample) and obtain the sequence reads. Preferably, the fetal and maternal nucleic acids are cell-free DNA (cfDNA). Preferably, the signal output of sequencing device (10) is connected to device (1).
In certain embodiments, sequencing device (10) is configured to perform sequencing by synthesis. Sequencing by synthesis can be performed using reversible dye terminators. Alternatively, sequencing device (10) is configured to perform sequencing by ligation. Alternatively, sequencing device (10) is configured to perform single-molecule sequencing. In certain embodiments, sequencing device (10) is located at a site separate from devices (1)-(6) of the additional equipment for classification. Preferably, the signal output of sequencing device (10) is connected to device (1) through a network.
In certain embodiments, the additional equipment for classification further comprises device (11) for obtaining the maternal test sample from a pregnant mother. Device (11) and devices (1)-(6) may be located at separate sites. In addition, the additional equipment may further comprise device (12) for extracting cell-free DNA from the maternal test sample. Device (12) for extracting cell-free DNA may be located at the same site as sequencing device (10), with device (11) for obtaining the maternal test sample located at a remote site.
In certain embodiments, device (2) aligns at least about 1 million reads.
Kit
In various embodiments, kits are provided for practicing the methods described herein. In certain embodiments, the kits comprise one or more positive internal controls for complete aneuploidies and/or partial aneuploidies. Typically, but not necessarily, the controls comprise internal positive controls (IPCs) comprising nucleic acid sequences of the type to be screened for. For example, a control for a test to determine the presence or absence of a fetal trisomy (e.g., trisomy 21) in a maternal sample can comprise DNA characterized by trisomy 21 (e.g., DNA obtained from an individual with trisomy 21). In certain embodiments, the control comprises a mixture of DNA obtained from two or more individuals with different aneuploidies. For example, for a test to determine the presence or absence of trisomy 13, trisomy 18, trisomy 21 and monosomy X, the control can comprise a combination of DNA samples obtained from pregnant women each carrying a fetus with one of the trisomies being tested. In addition to complete chromosomal aneuploidies, IPCs can be created to provide positive controls for tests to determine the presence or absence of partial aneuploidies.
In certain embodiments, the positive control(s) comprise one or more nucleic acids comprising trisomy 21 (T21) and/or trisomy 18 (T18) and/or trisomy 13 (T13). In certain embodiments, the nucleic acids comprising each of the trisomies present are provided in separate containers. In certain embodiments, the nucleic acids comprising two or more trisomies are provided in a single container. Thus, for example, in certain embodiments, a container can contain T21 and T18, T21 and T13, or T18 and T13. In certain embodiments, a container can contain T18, T21 and T13. In these various embodiments, the trisomies can be provided in equal quantity or concentration. In other embodiments, the trisomies can be provided in particular predetermined ratios. In various embodiments, the controls can be provided as "stock" solutions of known concentration.
In certain embodiments, the control for detecting an aneuploidy comprises a mixture of cellular genomic DNA obtained from two subjects, one of whom is the contributor of the aneuploid genome. For example, as described above, an internal positive control (IPC) created as a control for a test to determine a fetal trisomy (e.g., trisomy 21) can comprise a combination of genomic DNA from a subject carrying the trisomic chromosome with genomic DNA from a female subject known not to carry the trisomic chromosome. In certain embodiments, the genomic DNA is sheared to provide fragments of between about 100-400 bp, between about 150-350 bp, or between about 200-300 bp, to simulate the circulating cfDNA fragments in maternal samples.
In certain embodiments, the proportion of fragmented DNA from the subject carrying the aneuploidy (e.g., trisomy 21) in the control is chosen to simulate the proportion of circulating fetal cfDNA found in maternal samples, thereby providing an IPC comprising a mixture of fragmented DNA in which about 5%, about 10%, about 15%, about 20%, about 25% or about 30% of the DNA is from the subject carrying the aneuploidy. In certain embodiments, the control comprises DNA from different subjects each carrying a different aneuploidy. For example, the IPC can comprise about 80% unaffected female DNA, and the remaining 20% can be DNA from three different subjects each carrying one of a trisomic chromosome 21, a trisomic chromosome 13 and a trisomic chromosome 18.
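As a rough arithmetic illustration of such mixtures, the sketch below estimates how strongly a trisomic chromosome is over-represented in an IPC as a function of the fraction of DNA contributed by the trisomic donor. It assumes fractions are expressed in genome equivalents and, for the three-donor example, that the remaining 20% is split equally; neither assumption is stated in the text, and the function name is hypothetical.

```python
def relative_chromosome_representation(fraction_from_trisomic_donor):
    """Expected copies of the trisomic chromosome relative to an all-disomic mixture."""
    f = fraction_from_trisomic_donor
    if not 0.0 <= f <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    # Unaffected DNA contributes 2 copies per genome; the trisomic donor contributes 3.
    return (2.0 * (1.0 - f) + 3.0 * f) / 2.0


# Single-donor IPC with 20% trisomy 21 DNA: chromosome 21 is raised by about 10%.
single_donor = relative_chromosome_representation(0.20)

# Three-donor IPC (assumed equal split of the 20%): each trisomic chromosome
# is raised by about 3.3%.
per_donor = relative_chromosome_representation(0.20 / 3)
```

The excess equals half the donor fraction, mirroring the relationship between fetal fraction and dose excess for a complete fetal trisomy.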
In certain embodiments, the control(s) comprise cfDNA obtained from a mother known to be carrying a fetus with a known chromosomal aneuploidy. For example, the controls can comprise cfDNA obtained from a pregnant woman carrying a fetus with trisomy 21 and/or trisomy 18 and/or trisomy 13. The cfDNA can be extracted from the maternal sample, cloned into a bacterial vector and grown in bacteria to provide an ongoing source of the IPC. Alternatively, the cloned cfDNA can be amplified by, for example, PCR.
Although the controls present in the kits are described above with respect to trisomies, they need not be so limited. It will be appreciated that the positive controls present in the kit can be created to reflect other partial aneuploidies, including, for example, various segment amplifications and/or deletions. Thus, for example, where various cancers are known to be associated with particular amplifications or deletions of substantially complete chromosome arms, the positive control(s) can comprise a p arm or a q arm of any one or more of chromosomes 1-22, X and Y. In certain embodiments, the control comprises an amplification of one or more arms selected from the group consisting of: 1q, 3q, 4p, 4q, 5p, 5q, 6p, 6q, 7p, 7q, 8p, 8q, 9p, 9q, 10p, 10q, 12p, 12q, 13q, 14q, 16p, 17p, 17q, 18p, 18q, 19p, 19q, 20p, 20q, 21q and/or 22q (see, for example, Table 2).
In certain embodiments, the controls comprise aneuploidies for any region known to be associated with particular amplifications or deletions (e.g., breast cancer associated with amplification at 20q13). Illustrative regions include, but are not limited to, 17q23 (associated with breast cancer), 19q12 (associated with ovarian cancer), 1q21-1q23 (associated with sarcomas and various solid tumors), 8p11-p12 (associated with breast cancer), the ErbB2 amplicon, and so forth. In certain embodiments, the controls comprise an amplification or a deletion of a chromosomal region as shown in any one of Tables 3-6. In certain embodiments, the controls comprise an amplification or a deletion of a chromosomal region comprising a gene as shown in any one of Tables 3-6. In certain embodiments, the controls comprise nucleic acid sequences comprising an amplification of a nucleic acid comprising one or more oncogenes. In certain embodiments, the controls comprise nucleic acid sequences comprising an amplification of a nucleic acid comprising one or more genes selected from the group consisting of: MYC, ERBB2 (EFGR), CCND1 (cyclin D1), FGFR1, FGFR2, HRAS, KRAS, MYB, MDM2, CCNE, KRAS, MET, ERBB1, CDK4, MYCB, ERBB2, AKT2, MDM2 and CDK4.
The above controls are intended to be illustrative and not limiting. Using the teachings provided herein, one of ordinary skill in the art can identify numerous other controls suitable for incorporation into a kit.
In different embodiments, the replacement in addition to these controls or as these controls, these kits include One or more, which provide, to be adapted to follow the trail of and determine the nucleic acid and/or nucleic acid mimics of the label sequence of sample integrity. In some embodiments, these labels include antigene strand sequence.In certain embodiments, the length of these label sequences Degree is in about 30bp is to up to about 600bp length or about 100bp to about 400bp length ranges.In certain embodiments, should The length of (these) label sequence is at least 30bp (or nt).In certain embodiments, the label is connected to aptamer, And the aptamer connection marker molecules length between about 200bp (or nt) and about 600bp (or nt), about Between 250bp (or nt) and 550bp (or nt), between about 300bp (or nt) and 500bp (or nt) or about 350 and 450 it Between.In certain embodiments, the length of the marker molecules of aptamer connection is about 200bp (or nt).In some implementations In scheme, the length of marker molecules can be about 150bp (or nt), about 160bp (or nt), 170bp (or nt), about 180bp (or nt), about 190bp (or nt) or about 200bp (or nt).In certain embodiments, the length of label about 600bp (or Nt in the range of).
In certain embodiments, the kit provides at least two or at least three or at least four or at least five Individual or at least six or at least seven or at least eight or at least nine or at least ten or at least 11 or at least 12 Individual or at least 13 or at least 14 or at least 15 or at least 16 or at least 17 or at least 18 or at least 19 Individual or at least 20 or at least 25 or at least 30 or at least 35 or at least 40 or at least 50 different sequences Row.Separated container/bottle can be stored in by providing the different nucleic acid for being somebody's turn to do (these) label sequence and/or nucleic acid mimics In.Alternately, different marker molecules can be stored in identical container/bottle.
In various embodiments, the markers comprise one or more DNAs, or the markers comprise one or more DNA mimics. Suitable mimics include, but are not limited to, morpholino derivatives, peptide nucleic acids (PNA), and phosphorothioate DNA. In various embodiments, the markers are incorporated into the controls. In certain embodiments, the markers are incorporated into adaptors and/or are provided ligated to adaptors.
In certain embodiments, the kit further comprises one or more sequencing adaptors. Such adaptors include, but are not limited to, indexed sequencing adaptors. In certain embodiments, the adaptors comprise a single-stranded arm that includes an index sequence and one or more PCR priming sites.
In certain embodiments, the kit further comprises a sample collection device for collecting a biological sample. In certain embodiments, the sample collection device comprises a device for collecting blood and, optionally, a container for holding blood. In certain embodiments, the kit comprises a container for holding blood, and the container comprises an anticoagulant and/or a cell fixative and/or one or more antigenomic marker sequences.
In certain embodiments, the kit further comprises DNA extraction reagents (e.g., a separation matrix and/or an elution solution). The kit can also include reagents for preparing a sequencing library. Such reagents include, but are not limited to, a solution for end-repairing DNA, and/or a solution for dA-tailing DNA, and/or a solution for adaptor-ligating DNA.
In certain embodiments, the kit further comprises a composition comprising one or more primer sets for amplifying at least one preselected polymorphic nucleic acid in a maternal sample, wherein each preselected polymorphic nucleic acid comprises at least one polymorphic site, and wherein the forward or reverse primer in each primer set hybridizes to a DNA sequence sufficiently close to the polymorphic site for that site to be included in the sequence reads generated by massively parallel sequencing of the amplified preselected polymorphic nucleic acid. Sequencing of the amplified preselected polymorphic sequences can be used to determine the fetal fraction in the maternal sample, as described elsewhere in this application. The preselected polymorphic nucleic acids can comprise SNPs or STRs. In certain embodiments, at least one primer in each primer set is designed to identify a polymorphic site present within a sequence read of about 25 bp, about 40 bp, about 50 bp, or about 100 bp. In certain embodiments, the primer sets hybridize to DNA sequences to produce amplicons of at least about 100 bp, at least about 150 bp, or at least about 200 bp. The primer sets can hybridize to DNA sequences present on the same chromosome or to DNA sequences present on different chromosomes. In certain embodiments, the primer sets do not hybridize to DNA sequences present on chromosomes 13, 18, 21, X, or Y.
Embodiments of kits provided for carrying out these methods, and used in combination with the various devices described herein, are illustrated in Figures 67 and 68. In one embodiment, the kit is provided for determining fetal fraction. As shown in Figure 67, the kit comprises a kit body (1); clamping slots arranged in the kit body for holding bottles; a bottle (2) comprising an internal positive control; a bottle (3) comprising marker nucleic acids suitable for tracking and determining sample integrity; and a bottle (4) comprising a buffer solution.
The kit can comprise a plurality of additional bottles, wherein each of the plurality of bottles comprises a different internal positive control or a different marker nucleic acid.
In certain embodiments, bottle (2) comprises two or more internal positive controls. The internal positive control comprises a trisomy selected from the group consisting of trisomy 21, trisomy 18, trisomy 13, trisomy 16, trisomy 9, trisomy 8, trisomy 22, XXX, XXY, and XYY. In certain embodiments, the internal positive control comprises a trisomy selected from the group consisting of trisomy 21 (T21), trisomy 18 (T18), and trisomy 13 (T13). In other embodiments, the internal positive controls loaded in bottle (2) comprise trisomy 21 (T21), trisomy 18 (T18), and trisomy 13 (T13). Alternatively, the positive control included in the kit can comprise an amplification or a deletion of a portion of one or more of chromosomes 1 to 22, X, and Y. In certain embodiments, the positive control comprises an amplification or a deletion of a short arm or a long arm of any one or more of chromosomes 1 to 22, X, and Y. In certain embodiments, bottle (2) comprises an amplification or a deletion of one or more arms selected from the group consisting of 1q, 3q, 4p, 4q, 5p, 5q, 6p, 6q, 7p, 7q, 8p, 8q, 9p, 9q, 10p, 10q, 12p, 12q, 13q, 14q, 16p, 17p, 17q, 18p, 18q, 19p, 19q, 20p, 20q, 21q, and 22q. In other embodiments, bottle (2) comprises an amplification of a region selected from the group consisting of 20q13, 19q12, 1q21-1q23, 8p11-p12, and ErbB2. Alternatively, the positive control loaded in bottle (2) comprises an amplification of a region or a gene shown in Table 3, Table 4, Table 5, or Table 6. In certain embodiments, the positive control loaded in bottle (2) comprises an amplification of a region or a gene selected from the group consisting of MYC, ERBB2 (EFGR), CCND1 (Cyclin D1), FGFR1, FGFR2, HRAS, KRAS, MYB, MDM2, CCNE, MET, ERBB1, CDK4, MYCB, and AKT2.
In various embodiments of the kit, the marker nucleic acids included (also referred to as marker molecules (MM)) are antigenomic marker sequences. The marker sequences can range in length from about 30 bp to about 600 bp. In other embodiments, the marker sequences range in length from about 100 bp to about 400 bp. In certain embodiments, the kit comprises at least 2, or at least 3, or at least 4, or at least 5, or at least 6, or at least 7, or at least 8, or at least 9, or at least 10, or at least 11, or at least 12, or at least 13, or at least 14, or at least 15, or at least 16, or at least 17, or at least 18, or at least 19, or at least 20, or at least 25, or at least 30, or at least 35, or at least 40, or at least 50 bottles for different marker sequences.
In certain embodiments, the markers included in the kit comprise one or more DNAs. In other embodiments, the markers comprise one or more mimics selected from the group consisting of a morpholino derivative, a peptide nucleic acid (PNA), and a phosphorothioate DNA.
In certain embodiments, the markers are incorporated into the controls. In other embodiments, the markers are incorporated into adaptors. In certain embodiments, bottle (3) of the kit can further be loaded with one or more sequencing adaptors. The adaptors include indexed sequencing adaptors. The adaptors may further comprise a single-stranded arm that includes an index sequence and one or more PCR priming sites.
Figure 68 shows a schematic of a kit that may further comprise a sample collection device for collecting a biological sample. The sample collection device comprises a device (5) for collecting blood and a container (6) for holding blood. In certain embodiments, the device for collecting blood and the container for holding blood comprise an anticoagulant and a cell fixative.
In certain embodiments, the kit may further comprise a bottle (7) loaded with DNA extraction reagents. The DNA extraction reagent(s) can comprise a separation matrix and/or an elution solution.
In certain embodiments, the kit further comprises a bottle (8) loaded with reagents for preparing a sequencing library. The reagents for preparing a sequencing library can comprise a solution for end-repairing DNA, a solution for dA-tailing DNA, and a solution for adaptor-ligating DNA.
In other embodiments, the kit further comprises a bottle (9) comprising a composition of primers for amplifying predetermined target nucleic acids.
In certain embodiments, the kit further comprises instructional materials teaching the use of the reagents to determine the fetal fraction in a biological sample. The instructional materials teach the use of the materials to detect a trisomy or a monosomy. In certain embodiments, the instructional materials teach the use of the materials to detect a cancer or a predisposition to a cancer.
In addition, the kits optionally include labeling and/or instructional materials providing directions (i.e., protocols) for the use of the reagents and/or devices provided in the kit. For example, the instructional materials can teach the use of the reagents to prepare samples and/or to determine copy number variation in a biological sample. In certain embodiments, the instructional materials teach the use of the materials to detect a trisomy. In certain embodiments, the instructional materials teach the use of the materials to detect a cancer or a predisposition to a cancer.
Although the instructional materials in the various kits typically comprise written or printed materials, they are not limited to such. Any medium capable of storing such instructions and communicating them to an end user is contemplated herein. Such media include, but are not limited to, electronic storage media (e.g., magnetic discs, tapes, cartridges, chips), optical media (e.g., CD-ROM), and the like. Such media may include addresses of internet sites that provide the instructional materials.
The various methods, apparatus, systems, and uses are described in further detail in the following examples, which are in no way intended to limit the scope of the invention as claimed. The accompanying figures are to be considered an integral part of the specification and description of the invention. The following examples are offered to illustrate, but not to limit, the claimed invention.
Experimental
Example 1
Sample processing and cfDNA extraction
Peripheral blood samples were collected from pregnant women in the first or second trimester of pregnancy who were deemed at risk for fetal aneuploidy. Informed consent was obtained from each participant prior to the blood draw. Blood was collected before amniocentesis or chorionic villus sampling. Karyotype analysis was performed using the chorionic villus or amniocentesis samples to confirm the fetal karyotype.
Peripheral blood drawn from each subject was collected in ACD tubes. One tube of blood sample (approximately 6 to 9 ml/tube) was transferred into one 15-ml low-speed centrifuge tube. The blood was centrifuged at 2640 rpm, 4°C, for 10 minutes using a Beckman Allegra 6R centrifuge and rotor model GA 3.8.
For cell-free plasma extraction, the upper plasma layer was transferred to a 15-ml high-speed centrifuge tube and centrifuged at 16000 × g, 4°C, for 10 minutes using a Beckman Coulter Avanti J-E centrifuge and JA-14 rotor. The two centrifugation steps were performed within 72 hours after blood collection. Cell-free plasma comprising cfDNA was stored at -80°C and thawed only once before amplification of plasma cfDNA or purification of cfDNA.
Purified cell-free DNA (cfDNA) was extracted from the cell-free plasma using the QIAamp Blood DNA Mini kit (Qiagen) essentially according to the manufacturer's instructions. One milliliter of buffer AL and 100 μl of protease solution were added to 1 ml of plasma. The mixture was incubated for 15 minutes at 56°C. One milliliter of 100% ethanol was added to the plasma digest. The resulting mixture was transferred to QIAamp mini columns assembled with the VacValves and VacConnectors provided in the QIAvac 24 Plus column assembly (Qiagen). Vacuum was applied to the samples, and the cfDNA retained on the column filters was washed under vacuum with 750 μl of buffer AW1, followed by a second wash with 750 μl of buffer AW24. The column was centrifuged at 14,000 RPM for 5 minutes to remove any residual buffer from the filter. The cfDNA was eluted with buffer AE by centrifugation at 14,000 RPM, and the concentration was determined using the Qubit™ Quantitation Platform (Invitrogen).
Example 2
Preparation and sequencing of primary and enriched sequencing libraries
A. Preparation of sequencing libraries: abbreviated protocol (ABB)
All sequencing libraries, i.e., primary and enriched libraries, were prepared from approximately 2 ng of purified cfDNA extracted from maternal plasma. Library preparation was performed as follows using reagents of the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, Massachusetts). Because cell-free plasma DNA is fragmented in nature, no further fragmentation of the plasma DNA samples by nebulization or sonication was performed. The overhangs of approximately 2 ng of purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends according to the NEBNext End Repair Module by incubating the cfDNA in a 1.5 ml microcentrifuge tube for 15 minutes at 20°C with 5 μl of 10X phosphorylation buffer, 2 μl of deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA Polymerase I, 1 μl of T4 DNA Polymerase, and 1 μl of T4 Polynucleotide Kinase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1. The enzymes were then heat inactivated by incubating the reaction mixture at 75°C for 5 minutes. The mixture was cooled to 4°C, and dA tailing of the blunt-ended DNA was accomplished using 10 μl of the dA-tailing master mix containing the Klenow fragment (3' to 5' exo minus) (NEBNext™ DNA Sample Prep DNA Reagent Set 1) and incubating for 15 minutes at 37°C. Subsequently, the Klenow fragment was heat inactivated by incubating the reaction mixture at 75°C for 5 minutes. Following inactivation of the Klenow fragment, Illumina adaptors (Non-Index Y-Adaptors) were ligated to the dA-tailed DNA using 1 μl of a 1:5 dilution of the Illumina Genomic Adaptor Oligo Mix (Part No. 1000521; Illumina Inc., Hayward, California) and 4 μl of the T4 DNA ligase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, by incubating the reaction mixture for 15 minutes at 25°C.
The mixture was cooled to 4°C, and the adaptor-ligated cfDNA was purified from unligated adaptors, adaptor dimers, and other reagents using the magnetic beads provided in the Agencourt AMPure XP PCR purification system (Part No. A63881; Beckman Coulter Genomics, Danvers, Massachusetts). Eighteen cycles of PCR were performed to selectively enrich the adaptor-ligated cfDNA (25 μl) using Phusion High-Fidelity Master Mix (25 μl; Finnzymes, Woburn, Massachusetts) and Illumina PCR primers (0.5 μM each) complementary to the adaptors (Part Nos. 1000537 and 1000537). The adaptor-ligated DNA was subjected to PCR (98°C for 30 seconds; 18 cycles of 98°C for 10 seconds, 65°C for 30 seconds, and 72°C for 30 seconds; final extension at 72°C for 5 minutes; hold at 4°C) using Illumina Genomic PCR Primers (Part Nos. 100537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, Massachusetts) according to the manufacturer's instructions available at www.beckmangenomics.com/products/AMPureXPProtocol_000387v001.pdf. The purified amplified product was eluted in 40 μl of Qiagen EB Buffer, and the concentration and size distribution of the amplified libraries were analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, California).
B. Preparation of sequencing libraries: full-length protocol
The full-length protocol described here is essentially the standard protocol provided by Illumina, and differs from the Illumina protocol only in the purification of the amplified library. The Illumina protocol instructs that the amplified library be gel purified, while the protocol described herein uses magnetic beads for the same purification step. Approximately 2 ng of purified cfDNA extracted from maternal plasma was used to prepare a primary sequencing library using the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, Massachusetts) essentially according to the manufacturer's instructions. All steps, except for the final purification of the adaptor-ligated products (which was performed using Agencourt magnetic beads and reagents instead of the purification column), were performed according to the protocol accompanying the NEBNext™ reagents for sample preparation of a genomic DNA library that is sequenced using the Illumina GAII. The NEBNext™ protocol essentially follows that provided by Illumina, which is available at grcf.jhml.edu/hts/protocols/11257047_ChIP_Sample_Prep.pdf.
The overhangs of approximately 2 ng of purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends according to the NEBNext End Repair Module by incubating the 40 μl of cfDNA in a 200 μl microcentrifuge tube in a thermal cycler for 30 minutes at 20°C with 5 μl of 10X phosphorylation buffer, 2 μl of deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA Polymerase I, 1 μl of T4 DNA Polymerase, and 1 μl of T4 Polynucleotide Kinase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1. The sample was cooled to 4°C and purified using a QIAQuick column provided in the QIAQuick PCR Purification Kit (Qiagen Inc., Valencia, California) as follows. The 50 μl reaction was transferred to a 1.5 ml microcentrifuge tube, and 250 μl of Qiagen Buffer PB was added. The resulting 300 μl was transferred to a QIAquick column, which was centrifuged at 13,000 RPM for 1 minute in a microcentrifuge. The column was washed with 750 μl of Qiagen Buffer PE and centrifuged again. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 39 μl of Qiagen Buffer EB by centrifugation.
dA tailing of 34 μl of the blunt-ended DNA was accomplished using 16 μl of the dA-tailing master mix containing the Klenow fragment (3' to 5' exo minus) (NEBNext™ DNA Sample Prep DNA Reagent Set 1) and incubating for 30 minutes at 37°C according to the manufacturer's dA-Tailing Module. The sample was cooled to 4°C and purified using a column provided in the MinElute PCR Purification Kit (Qiagen Inc., Valencia, California) as follows. The 50 μl reaction was transferred to a 1.5 ml microcentrifuge tube, and 250 μl of Qiagen Buffer PB was added. The 300 μl was transferred to the MinElute column, which was centrifuged at 13,000 RPM for 1 minute in a microcentrifuge. The column was washed with 750 μl of Qiagen Buffer PE and centrifuged again. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 15 μl of Qiagen Buffer EB by centrifugation.
Ten microliters of the DNA eluate were incubated with 1 μl of a 1:5 dilution of the Illumina Genomic Adaptor Oligo Mix (Part No. 1000521), 15 μl of 2X Quick Ligation Reaction Buffer, and 4 μl of Quick T4 DNA Ligase for 15 minutes at 25°C according to the Quick Ligation Module. The sample was cooled to 4°C and purified using a MinElute column as follows. One hundred and fifty microliters of Qiagen Buffer PE were added to the 30 μl reaction, and the entire volume was transferred to a MinElute column, which was centrifuged at 13,000 RPM for 1 minute in a microcentrifuge. The column was washed with 750 μl of Qiagen Buffer PE and centrifuged again. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 28 μl of Qiagen Buffer EB by centrifugation.
Twenty-three microliters of the adaptor-ligated DNA eluate were subjected to 18 cycles of PCR (98°C for 30 seconds; 18 cycles of 98°C for 10 seconds, 65°C for 30 seconds, and 72°C for 30 seconds; final extension at 72°C for 5 minutes; hold at 4°C) using Illumina Genomic PCR Primers (Part Nos. 100537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, Massachusetts) according to the manufacturer's instructions available at www.beckmangenomics.com/products/AMPureXPProtocol_000387v001.pdf. The Agencourt AMPure XP PCR purification system removes unincorporated dNTPs, primers, primer dimers, salts, and other contaminants, and recovers amplicons greater than 100 bp. The purified amplified product was eluted from the Agencourt beads in 40 μl of Qiagen EB Buffer, and the size distribution of the libraries was analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, California).
C. Analysis of sequencing libraries prepared according to the abbreviated (a) and full-length (b) protocols
The electropherograms generated by the Bioanalyzer are shown in Figures 21A and 21B. Figure 21A shows the electropherogram of library DNA prepared from cfDNA purified from plasma sample M24228 using the full-length protocol described in (b), and Figure 21B shows the electropherogram of library DNA prepared from cfDNA purified from plasma sample M24228 using the abbreviated protocol described in (a). In both figures, peaks 1 and 4 represent the 15 bp lower marker and the 1,500 bp upper marker, respectively; the numbers above the peaks indicate the migration times of the library fragments; and the horizontal lines indicate the threshold set for integration. The electropherogram in Figure 21A shows a minor peak of fragments of 187 bp and a major peak of fragments of 263 bp, while the electropherogram in Figure 21B shows only one peak at 265 bp. Integration of the peak areas gave a calculated DNA concentration of 0.40 ng/μl for the 187 bp peak in Figure 21A, 7.34 ng/μl for the 263 bp peak in Figure 21A, and 14.72 ng/μl for the 265 bp peak in Figure 21B. The Illumina adaptors ligated to the cfDNA are known to be 92 bp, which, when subtracted from 265 bp, indicates that the peak size of the cfDNA is 173 bp. The minor peak at 187 bp may represent fragments of two primers ligated end to end. The linear two-primer fragments are eliminated from the final library product when the abbreviated protocol is used. The abbreviated protocol also eliminates other fragments smaller than 187 bp. In this example, the concentration of purified adaptor-ligated cfDNA obtained with the abbreviated protocol is double that of the adaptor-ligated cfDNA produced using the full-length protocol. It has been noted that the concentration of adaptor-ligated cfDNA fragments obtained with the abbreviated protocol was always greater than that obtained using the full-length protocol (data not shown).
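The arithmetic behind this comparison can be checked with a short sketch. The peak sizes, adaptor length, and concentrations are the values reported above; the variable names are illustrative only and are not part of the described method.

```python
# Values reported for the electropherograms in Figures 21A and 21B.
ADAPTOR_BP = 92          # total adaptor length ligated to the cfDNA (bp)
peak_21b_bp = 265        # single library peak in Figure 21B (bp)
conc_21a_main = 7.34     # ng/ul, 263 bp peak in Figure 21A
conc_21b_main = 14.72    # ng/ul, 265 bp peak in Figure 21B

# cfDNA insert size = library fragment size minus adaptor length.
cfdna_insert_bp = peak_21b_bp - ADAPTOR_BP

# Fold difference in adaptor-ligated cfDNA concentration between the libraries.
fold_difference = conc_21b_main / conc_21a_main

print(f"cfDNA insert size: {cfdna_insert_bp} bp")
print(f"concentration ratio (21B / 21A): {fold_difference:.2f}")
```

The ratio comes out at roughly two, which matches the statement that one library contains about twice the adaptor-ligated cfDNA of the other.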
Therefore, one advantage of preparing the sequencing library using the abbreviated protocol is that the library obtained consistently comprises only one major peak in the 262-267 bp range, while the quality of the library prepared using the full-length protocol varies, as reflected by the number and mobility of peaks other than the one representing the cfDNA. Non-cfDNA products occupy space on the flow cell and diminish the quality of the cluster amplification and subsequent imaging of the sequencing reactions, which underlie the overall assignment of the aneuploidy status. The abbreviated protocol was shown not to affect the sequencing of the library.
Another advantage of preparing the sequencing library using the abbreviated protocol is that the three enzymatic steps of blunt-ending, dA tailing, and adaptor ligation take less than one hour to complete, which supports the validation and implementation of a rapid aneuploidy diagnostic service.
Another advantage is that the three enzymatic steps of blunt-ending, dA tailing, and adaptor ligation are performed in the same reaction tube, thus avoiding multiple sample transfers that could lead to loss of material and, more importantly, to possible sample mix-up and sample contamination.
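The "less than one hour" figure can be reproduced by summing the incubation times given for the abbreviated protocol in section A. The sketch below counts incubations only; cooling and handling time between steps is not stated in the text and is left out, and the step labels are illustrative.

```python
# (step, temperature in deg C, minutes) as given for the abbreviated protocol.
ABB_STEPS = [
    ("end repair and phosphorylation", 20, 15),
    ("heat inactivation of repair enzymes", 75, 5),
    ("dA tailing", 37, 15),
    ("heat inactivation of Klenow fragment", 75, 5),
    ("adaptor ligation", 25, 15),
]

# Total incubation time across the enzymatic steps and their inactivations.
total_incubation_min = sum(minutes for _, _, minutes in ABB_STEPS)

print(f"total incubation time: {total_incubation_min} min")
```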
Example 3
Preparation of sequencing libraries from unrepaired cfDNA: adaptor ligation in solution
To determine whether the abbreviated protocol could be shortened further to expedite sample analysis, sequencing libraries were made from unrepaired cfDNA and sequenced using the Illumina Genome Analyzer II as previously described.
cfDNA was prepared from peripheral blood samples as described herein. The blunt-ending and phosphorylation of the 5' ends required by the published protocol for the Illumina platform were not performed, so as to provide unrepaired cfDNA samples.
It was determined that omitting DNA repair, or DNA repair and phosphorylation, did not affect the quality or the yield of the sequencing library (data not shown).
2-step method in solution for non-indexed unrepaired DNA
In a first set of experiments, the unrepaired cfDNA was subjected to simultaneous dA tailing and adaptor ligation by combining Klenow Exo- and T4 DNA ligase in the same reaction mixture, as follows. Thirty microliters of cfDNA at a concentration of between 20 and 150 pg/μl were dA-tailed (5 μl of 10X NEB Buffer 2, 2 μl of 10 nM dNTP, 1 μl of 10 nM ATP, and 1 μl of 5000 U/ml Klenow Exo-) and ligated to Illumina Y-adaptors (1 μl of a 1:15 dilution of a 3 μM stock) using 1 μl of 400,000 U/ml T4 DNA ligase in a reaction volume of 50 μl. The non-indexed Y-adaptors were from Illumina. The combined reaction was incubated at 25°C for 30 minutes. The enzymes were heat inactivated at 75°C for 5 minutes, and the reaction product was stored at 10°C.
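As a bookkeeping check on this reaction mix, the listed component volumes and the cfDNA input mass can be tallied as below. The component volumes and the 20-150 pg/μl range are those given above; the text does not say what makes up the remaining volume, so treating it as water or diluent is an assumption, and all names are illustrative.

```python
# Component volumes (microliters) listed for the combined dA-tailing/ligation reaction.
components_ul = {
    "cfDNA": 30,
    "10X NEB buffer": 5,
    "dNTP": 2,
    "ATP": 1,
    "Klenow Exo-": 1,
    "T4 DNA ligase": 1,
    "Y-adaptor dilution": 1,
}
FINAL_VOLUME_UL = 50

listed_ul = sum(components_ul.values())
# Assumption: the unlisted remainder is water or another diluent.
remainder_ul = FINAL_VOLUME_UL - listed_ul

# cfDNA input mass (ng) for the stated concentration range of 20-150 pg/ul.
min_input_ng = components_ul["cfDNA"] * 20 / 1000
max_input_ng = components_ul["cfDNA"] * 150 / 1000

print(f"listed components: {listed_ul} ul, remainder: {remainder_ul} ul")
print(f"cfDNA input: {min_input_ng} to {max_input_ng} ng")
```

The upper end of the input range is consistent in scale with the approximately 2 ng inputs used in Example 2.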
The adaptor-ligated product was purified using SPRI beads (Agencourt AMPure XP PCR purification system, Beckman Coulter Genomics) and subjected to 18 cycles of PCR. The PCR-amplified library was purified using SPRI and sequenced using the Illumina Genome Analyzer IIx or HiSeq according to the manufacturer's instructions to obtain single-end reads of 36 bp. A large number of 36 bp reads was obtained, covering approximately 10% of the genome. Upon completion of sequencing of the sample, the Illumina "Sequencer Control Software/Real-time Analysis" transferred base call files in binary format to a network-attached storage device for data analysis. Sequence data was analyzed by means of software designed to run on a Linux server, which converts the binary-format base calls into human-readable text files using Illumina "BCLConverter" and then calls the open source "Bowtie" program to align the sequences to the reference human genome derived from the hg18 genome provided by the National Center for Biotechnology Information (NCBI36/hg18, available on the world wide web at http://genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105).
The software reads, from the Bowtie output (bowtieout.txt files), the sequence data generated by the above procedure that uniquely aligned to the genome. Sequence alignments with at most 2 base mismatches were allowed and were included in alignment counts only if they aligned uniquely to the genome. Sequence alignments with identical start and end coordinates (duplicates) were excluded. Approximately 5 million to 25 million 36 bp tags with 2 or fewer mismatches were mapped uniquely to the human genome. All mapped tags were counted and included in the calculation of chromosome doses in both test and qualifying samples. The regions extending from base 0 to base 2×10^6, from base 10×10^6 to base 13×10^6, and from base 23×10^6 to the end of chromosome Y were specifically excluded from the analysis because tags derived from fetuses of either sex map to these regions of the Y chromosome.
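The counting rules in this paragraph can be sketched as a small filter. This is only an illustration under stated assumptions: the record layout `(chrom, start, end, mismatches, n_hits)` is hypothetical and is not the Bowtie output format, keeping the first of several alignments with identical coordinates is an assumption, and the interval test treats a tag as excluded if it overlaps an excluded chromosome Y region.

```python
from collections import Counter

# Chromosome Y intervals excluded from analysis (bases), as stated in the text.
CHRY_EXCLUDED = [
    (0, 2_000_000),
    (10_000_000, 13_000_000),
    (23_000_000, float("inf")),  # 23 Mb to the end of chromosome Y
]

def count_tags(alignments, max_mismatches=2):
    """Count uniquely aligned, de-duplicated tags per chromosome.

    `alignments` is an iterable of hypothetical records:
    (chrom, start, end, mismatches, n_hits).
    """
    seen = set()
    counts = Counter()
    for chrom, start, end, mismatches, n_hits in alignments:
        # Keep only unique alignments with at most `max_mismatches` mismatches.
        if n_hits != 1 or mismatches > max_mismatches:
            continue
        # Skip alignments whose start and end coordinates were already seen.
        key = (chrom, start, end)
        if key in seen:
            continue
        seen.add(key)
        # Drop tags that overlap an excluded chromosome Y interval.
        if chrom == "chrY" and any(start < hi and end > lo for lo, hi in CHRY_EXCLUDED):
            continue
        counts[chrom] += 1
    return counts

# Invented records, for illustration only.
example = [
    ("chr21", 100, 136, 0, 1),
    ("chr21", 100, 136, 1, 1),              # same coordinates: skipped
    ("chr21", 500, 536, 3, 1),              # more than 2 mismatches: skipped
    ("chr1", 900, 936, 2, 2),               # not uniquely aligned: skipped
    ("chrY", 1_500_000, 1_500_036, 0, 1),   # inside excluded region: skipped
    ("chrY", 5_000_000, 5_000_036, 0, 1),
]
tag_counts = count_tags(example)
print(dict(tag_counts))
```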
Figure 22A shows the mean (n=16) of the percent of the total number of sequence tags that mapped to each human chromosome (% chromosome N) when the sequencing library was prepared according to the abbreviated protocol (ABB; ◇) and when the sequencing library was prepared according to the repair-free 2-step method (INSOL). These data show that, compared with the percent of tags mapped to the corresponding chromosomes when using the abbreviated method, preparing the sequencing library using the repair-free 2-step method resulted in a greater percent of tags mapped to chromosomes with lower GC content and a smaller percent of tags mapped to chromosomes with higher GC content. Figure 22B plots the percent of sequence tags as a function of chromosome size, and shows that the repair-free method decreases the sequencing bias. The regression coefficients for the mapped tags obtained from sequencing libraries prepared according to the abbreviated protocol (ABB; Δ) and the repair-free protocol (2-step in solution) were R² = 0.9332 and R² = 0.9806, respectively.
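A regression coefficient of this kind can be computed as the squared Pearson correlation between chromosome size and the percent of tags mapped. The sketch below is one common way to obtain R² for a simple linear fit; the text does not state which software produced the reported values. The sizes are taken from Table 8, while the tag percentages are invented (made exactly proportional to size) purely to exercise the function.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a simple linear fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Chromosome sizes in Mbp from Table 8 (chr1, chr13, chr21).
sizes_mbp = [247, 114, 47]
# Invented tag percentages, exactly proportional to size (illustration only).
total = sum(sizes_mbp)
percent_tags = [100.0 * s / total for s in sizes_mbp]

r2_ideal = r_squared(sizes_mbp, percent_tags)
print(f"R^2 for perfectly size-proportional mapping: {r2_ideal:.4f}")
```

A library with no sequencing bias would be expected to approach this ideal, which is why a higher R² is read here as reduced bias.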
Table 8. Percent GC content per chromosome
Chromosome Size (Mbp) GC (%) Chromosome Size (Mbp) GC (%)
Chr1 247 41.37 Chr13 114 38.24
Chr2 243 39.44 Chr14 106 40.85
Chr3 199 38.74 Chr15 100 41.80
Chr4 191 38.60 Chr16 89 44.64
Chr5 181 39.35 Chr17 79 45.01
Chr6 171 39.94 Chr18 76 39.66
Chr7 159 39.78 Chr19 63 48.21
Chr8 146 40.30 Chr20 62 42.05
Chr9 140 40.17 Chr21 47 40.68
Chr10 135 40.43 Chr22 50 47.64
Chr11 134 41.37 ChrX 155 39.26
Chr12 132 40.59 ChrY 58 37.74
Shortening method is mapped to independent chromosome with being also regarded as without the comparison for repairing 2 footworks when using without restorative procedure The ratio of label percentage of the label percentage with being mapped to independent chromosome when using the method for shortening is with the GC of each chromosome Percentage composition and change.G/C content percentage relative to chromosome size is based on chromosome sequence and G/C content subregion Public information calculates (Constantine Buddhist nun (Constantini) et al., genome research (Genome Res) 16:536-541 [2006]) and provide in table 8.It the results are provided in Figure 22 C, the figure shows for the chromosome with high GC content Ratio significantly reduces, and for the ratio increase of the chromosome with low G/C content.These data clearly show that, no restorative procedure The possessed normalization effect for being used to overcome GC to offset.
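The per-chromosome ratio described here can be sketched as follows. The GC percentages are taken from Table 8; the tag counts are invented solely to show the calculation and do not reproduce the data behind Figure 22C, and the function and variable names are illustrative.

```python
def percent_per_chromosome(counts):
    """Convert per-chromosome tag counts to percent of all counted tags."""
    total = sum(counts.values())
    return {chrom: 100.0 * n / total for chrom, n in counts.items()}

# GC content (%) from Table 8 for a low-GC and a high-GC chromosome.
GC_PERCENT = {"chr4": 38.60, "chr19": 48.21}

# Invented tag counts for the two library preparation methods.
abb_counts = {"chr4": 700, "chr19": 300}      # abbreviated protocol
insol_counts = {"chr4": 750, "chr19": 250}    # repair-free 2-step method

abb_pct = percent_per_chromosome(abb_counts)
insol_pct = percent_per_chromosome(insol_counts)

# Ratio of repair-free to abbreviated tag percentages, per chromosome.
ratio = {chrom: insol_pct[chrom] / abb_pct[chrom] for chrom in GC_PERCENT}

for chrom in sorted(ratio, key=GC_PERCENT.get):
    print(chrom, GC_PERCENT[chrom], round(ratio[chrom], 3))
```

With these invented counts the ratio is above 1 for the low-GC chromosome and below 1 for the high-GC chromosome, which is the direction of the trend described for Figure 22C.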
These data show that no restorative procedure have modified GC skews to a certain extent, it is known that the GC is offset and DNA amplification Sequencing it is related.
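The per-chromosome ratio behind Figure 22C can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `tag_ratio_by_gc` and the tag percentages are hypothetical, and only the GC values are taken from Table 8.

```python
def tag_ratio_by_gc(pct_repair_free, pct_abbreviated, gc_percent):
    """Return (chromosome, GC %, ratio) tuples sorted by increasing GC content.

    ratio = % tags mapped with the repair-free method
            / % tags mapped with the abbreviated method
    """
    rows = [
        (chrom, gc_percent[chrom], pct_repair_free[chrom] / pct_abbreviated[chrom])
        for chrom in gc_percent
    ]
    return sorted(rows, key=lambda row: row[1])


# GC content (%) taken from Table 8.
gc_percent = {"Chr4": 38.60, "Chr1": 41.37, "Chr19": 48.21}

# Hypothetical tag percentages, invented purely for illustration.
pct_repair_free = {"Chr4": 6.6, "Chr1": 8.0, "Chr19": 1.8}
pct_abbreviated = {"Chr4": 6.0, "Chr1": 8.0, "Chr19": 2.4}

rows = tag_ratio_by_gc(pct_repair_free, pct_abbreviated, gc_percent)
for chrom, gc, ratio in rows:
    print(f"{chrom}: GC {gc:.2f}%  ratio {ratio:.2f}")
```

With inputs shaped like these, a ratio above 1 at the low-GC end and below 1 at the high-GC end corresponds to the trend the text describes.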
To determine whether the repair-free method affects the proportion of fetal versus maternal cfDNA that is sequenced, the percentage of tags mapped to chromosomes X and Y was determined. Figures 23A and 23B show bar graphs of the mean and standard deviation of the percentage of tags mapped to chromosome X (Figure 23A; % ChrX) and chromosome Y (Figure 23B; % ChrY), obtained by sequencing 10 cfDNA samples purified from the plasma of 10 pregnant women. Figure 23A shows that a greater number of tags mapped to the X chromosome when the repair-free method was used than when the abbreviated method was used. Figure 23B shows that the percentage of tags mapped to the Y chromosome with the repair-free method did not differ from that obtained with the abbreviated method.
These data show that the repair-free method does not introduce any bias for or against sequencing fetal versus maternal DNA; that is, the proportion of fetal sequences that are sequenced is unchanged when the repair-free method is used.
In summary, these data show that the repair-free method does not adversely affect the quality of the sequencing library or the information obtained by sequencing the library. Omitting the DNA repair step required by published protocols reduces reagent cost and speeds up preparation of the sequencing library.
Indexed 2-STEP in-solution method for unrepaired DNA
In a second set of experiments, unrepaired cfDNA was dA-tailed, followed by heat inactivation of Klenow Exo- and adaptor ligation. When ligation was performed with non-indexed Illumina adaptors (which carry a single-stranded arm of 21 bases), omitting the heat inactivation of Klenow Exo- did not affect the yield or quality of the sequencing library.
To determine whether the repair-free method could be applied to multiplexed sequencing, libraries were generated with or without Klenow inactivation using homemade indexed Y-adaptors comprising a 6-base index sequence. Unlike the non-indexed adaptors, the indexed adaptors comprise a single-stranded arm of 43 bases, which includes the index sequence and the PCR priming site.
Twelve different indexed adaptors identical to the Illumina TruSeq adaptors were made starting from oligonucleotides obtained from Integrated DNA Technologies (Coralville, Iowa). The oligonucleotide sequences were taken from the published Illumina TruSeq indexed adaptor sequences. The oligonucleotides were dissolved in annealing buffer (10 mM Tris, 1 mM EDTA, 50 mM NaCl, pH 7.5) to a final concentration of 300 µM. Equimolar amounts of the oligonucleotides for the two arms of any given indexed adaptor, typically 10 µl of each (300 µM each), were mixed and allowed to anneal (95 °C for 6 minutes, followed by slow, controlled cooling from 95 °C to 10 °C). The final 150 µM adaptors were diluted to 7.5 µM in 10 mM Tris, 1 mM EDTA (pH 8) and stored at -20 °C until use.
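The concentrations quoted above follow from simple mixing arithmetic, worked through below. The sketch assumes complete annealing of the two strands; the helper names and the 100 µl final volume are hypothetical and are not taken from the protocol.

```python
def duplex_concentration_um(conc_a_um, vol_a_ul, conc_b_um, vol_b_ul):
    """Concentration (µM) of annealed duplex after mixing two oligonucleotides.

    Assumes complete annealing, so the scarcer strand limits the duplex amount.
    µM x µl gives picomoles; dividing by the total volume gives µM again.
    """
    limiting_pmol = min(conc_a_um * vol_a_ul, conc_b_um * vol_b_ul)
    return limiting_pmol / (vol_a_ul + vol_b_ul)


def dilution_volumes_ul(stock_um, target_um, final_volume_ul):
    """Return (stock volume, diluent volume) needed to reach target_um."""
    stock_ul = final_volume_ul * target_um / stock_um
    return stock_ul, final_volume_ul - stock_ul


# 10 µl of each arm at 300 µM gives 150 µM duplex adaptor.
duplex_um = duplex_concentration_um(300, 10, 300, 10)

# Diluting 150 µM to 7.5 µM is a 20-fold dilution.
fold = duplex_um / 7.5

# Hypothetical 100 µl working stock: 5 µl adaptor plus 95 µl Tris/EDTA.
stock_ul, diluent_ul = dilution_volumes_ul(duplex_um, 7.5, 100.0)
print(duplex_um, fold, stock_ul, diluent_ul)
```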
The data show that, when indexed adaptors are used, library preparation by the 2-STEP method is not feasible if active Klenow Exo-, ligase and indexed adaptor are present together in the same reaction. However, if Klenow Exo- is first heat-inactivated at 75 °C for 5 minutes, and ligase and indexed adaptor are added afterwards, the 2-STEP method works well. It is possible that, when indexed adaptor and active Klenow Exo- are present together, the strand displacement activity of Klenow Exo- leads to digestion of the longer single-stranded DNA arm of the indexed adaptor, thereby eliminating the PCR primer site. Electropherograms of sequencing libraries made with the same cfDNA and enzymes, without or with the heat inactivation step in the Klenow Exo- reaction before addition of the ligase and the indexed adaptor, showed that the 2-STEP method including Klenow Exo- heat inactivation yields a library with the expected profile, with the major peak at 290 bp (data not shown). Therefore, because the repair-free method was to be applied to multiplexed sequencing, all experiments using indexed Y-adaptors were modified to include heat inactivation of Klenow Exo-.
Example 4
Preparation of sequencing libraries from unrepaired cfDNA: 1-STEP solid surface method with adaptor ligation on a solid surface (SS) for non-indexed DNA
To determine whether the repair-free library process could be simplified further, the repair-free sequencing library preparation method described in Example 3 was configured to be performed on a solid surface. The libraries prepared were sequenced as described in Example 3.
cfDNA was prepared from peripheral blood samples as described in Example 1. Polypropylene tubes were coated with streptavidin (SA), washed, and a first set of biotinylated indexed adaptors was bound to the streptavidin-coated tubes, as follows. The tubes of an 8-well PCR tube strip (USA Scientific, Ocala, Florida) were coated with 0.5 nanomoles of streptavidin (Thermo Scientific, Rockford, Illinois) in 50 µl PBS by incubating the SA overnight at 4 °C. The tubes were washed four times with 200 µl of 1X TE each time. Biotinylated Index 1 adaptor, at 7.5, 3.75, 1.8 and 0.9 picomoles each in 50 µl TE, was added in duplicate to the SA-coated tubes and incubated for 25 minutes at room temperature. Unbound adaptor was removed, and the tubes were washed four times with 200 µl TE. The biotinylated Index 1 adaptor was made as described in Example 3, using biotinylated universal adaptor oligonucleotide purchased from IDT.
1-STEP SS method using cfDNA from non-pregnant subjects
In a second strip of PCR tubes, control sample (NTC: no template control) or 30 µl of purified cfDNA obtained from non-pregnant women at about 120 pg/µl, i.e. about 32 femtomoles, was incubated with 5 units of Klenow Exo- for 15 minutes at 37 °C in NEB buffer #2 containing 20 nanomoles dNTP and 10 nanomoles ATP, in a 50 µl reaction volume. The Klenow enzyme was then inactivated by incubating the reaction mixture at 75 °C for 5 minutes. The Klenow-DNA mixture was transferred to the corresponding tube containing the SA-bound biotinylated adaptor, and the cfDNA was ligated to the immobilized adaptor by incubating the mixture with 400 units of T4 DNA ligase in 10 µl of 1X T4 DNA ligase buffer for 15 minutes at 25 °C. Subsequently, 7.5 picomoles of non-biotinylated Index 1 adaptor was ligated to the solid-phase-bound cfDNA by incubation with 200 units of T4 DNA ligase in 10 µl of buffer for 15 minutes at 25 °C. The reaction mixture was removed, and the tubes were washed five times with 200 µl TE buffer. The adaptor-ligated cfDNA was amplified by PCR using 50 µl of Phusion PCR mix [New England Biolabs] containing P5 and P7 primers (IDT; 1 µM each), cycled as follows: [30 seconds at 98 °C; (10 seconds at 98 °C; 10 seconds at 50 °C; 10 seconds at 60 °C; 10 seconds at 72 °C) x 18 cycles; 5 minutes at 72 °C; hold at 10 °C]. The resulting library product was subjected to SPRI cleanup [Beckman Coulter Genomics], and library quality was assessed from the profile obtained on a High Sensitivity Bioanalyzer chip [Agilent Technologies, Santa Clara, California]. These profiles showed that solid-phase sequencing library preparation from unrepaired cfDNA provides high-yield, high-quality sequencing libraries (data not shown).
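The input quantity quoted above (30 µl at about 120 pg/µl, i.e. about 32 femtomoles) can be cross-checked with a short mass-to-moles conversion. The sketch below assumes the commonly used approximation of 650 g/mol per base pair of double-stranded DNA; that constant and the function name are assumptions for illustration and are not stated in the text.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of dsDNA (assumed approximation)


def dsdna_femtomoles(mass_pg, length_bp):
    """Convert a mass of dsDNA (pg) of a given mean length (bp) to femtomoles."""
    grams = mass_pg * 1e-12
    moles = grams / (length_bp * AVG_BP_MASS)
    return moles * 1e15


# 30 µl at about 120 pg/µl is about 3600 pg (3.6 ng) of cfDNA.
mass_pg = 30 * 120

# Mean fragment length implied by "about 32 femtomoles" under the assumption above.
implied_length_bp = (mass_pg * 1e-12) / (32e-15 * AVG_BP_MASS)

print(f"input mass: {mass_pg} pg")
print(f"implied mean fragment length: {implied_length_bp:.0f} bp")
```

Under this approximation the two figures in the text are mutually consistent for fragments a little over 170 bp long.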
1-STEP SS method using cfDNA from pregnant subjects
The solid surface (SS) method was tested using cfDNA samples obtained from pregnant women.
cfDNA was prepared from 8 peripheral blood samples obtained from pregnant women as described in Example 1, and sequencing libraries were prepared from the purified cfDNA as described above. The libraries were sequenced, and the sequence information was analyzed.
Figure 24 shows, for each of 5 samples, the ratio of the number of non-excluded sites (NE sites) on the reference genome (hg18) to the total number of tags mapped to those non-excluded sites. cfDNA was prepared from these samples and used to construct sequencing libraries according to the abbreviated protocol (ABB) described in Example 2 (filled bars), the repair-free in-solution protocol described in Example 18 (2-STEP; empty bars), and the repair-free solid surface protocol described in this example (1-STEP; gray bars).
The data in Figure 24 show that the representation of PCR-amplified sequences prepared according to the three protocols is comparable, indicating that the solid surface method does not bias the sequence diversity represented in the library.
Figure 25A shows that the number of sequence tags uniquely mapped to each chromosome when sequencing libraries prepared by the repair-free solid surface method is comparable to the number obtained with the repair-free 2-STEP in-solution method described above. The data show that both repair-free methods reduce the GC bias of the sequencing data.
Figure 25B shows the relationship between the number of mapped tags and the size of the chromosome to which the tags mapped. The regression coefficients for mapped tags obtained from sequencing libraries prepared according to the abbreviated protocol (ABB), the repair-free in-solution protocol (2-STEP) and the repair-free solid surface protocol (1-STEP) were R² = 0.9332, R² = 0.9802 and R² = 0.9807, respectively.
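A regression coefficient of this kind can be computed as the squared Pearson correlation between chromosome size and tag percentage. The sketch below is one plausible way to do so and is not necessarily the calculation used for Figure 25B; the chromosome sizes come from Table 8, while the tag percentages are hypothetical.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a simple linear fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Sizes (Mbp) of Chr1, Chr3, Chr8, Chr15 and Chr21 from Table 8.
sizes_mbp = [247, 199, 146, 100, 47]

# Hypothetical percentages of mapped tags, for illustration only.
pct_tags = [8.1, 6.7, 4.9, 3.0, 1.3]

r2 = r_squared(sizes_mbp, pct_tags)
print(f"R^2 = {r2:.4f}")
```

A value close to 1 indicates that tag counts scale almost linearly with chromosome size, which is how the text interprets the higher coefficients of the repair-free methods.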
Figure 25C shows, as a function of the percent GC content of each chromosome, the ratio of the percentage of mapped sequence tags per chromosome obtained from sequencing libraries prepared by the repair-free 2-STEP protocol to the tags per chromosome obtained from sequencing libraries prepared by the abbreviated protocol (ABB) (◇), and the corresponding ratio for sequencing libraries prepared by the repair-free 1-STEP protocol relative to those prepared by the abbreviated protocol (ABB). In summary, the data in Figures 25B and 25C show that the 1-STEP and 2-STEP methods both show a similar GC normalizing effect, because both omit the DNA repair step of the library process.
To determine whether the repair-free methods affect the proportion of fetal versus maternal cfDNA that is sequenced, the percentage of tags mapped to chromosomes X and Y was determined. Figures 26A and 26B compare the mean and standard deviation of the percentage of tags mapped to chromosome X (Figure 26A) and chromosome Y (Figure 26B), obtained by sequencing 5 cfDNA samples purified from the plasma of 5 pregnant women using the ABB, 2-STEP and 1-STEP methods. Figure 26A shows that a greater number of tags mapped to the X chromosome when the repair-free methods (2-STEP and 1-STEP) were used than when the abbreviated method was used (filled bars). Figure 26B shows that the percentage of tags mapped to the Y chromosome with the repair-free 2-STEP and 1-STEP methods did not differ from that obtained with the abbreviated method.
These data show that the repair-free solid surface 1-STEP method does not introduce any bias for or against sequencing fetal versus maternal DNA; that is, the proportion of fetal sequences that are sequenced is unchanged when the repair-free solid surface method is used.
In summary, the data show that generating sequencing libraries on a solid surface is an easy and feasible option for preparing samples for sequencing.
Example 5
High-throughput compatibility of the repair-free solid surface 1-STEP library preparation method
To determine whether the repair-free 1-STEP library preparation method for sequencing by NGS technologies could be applied to high-throughput sample processing, 96 cfDNA libraries were prepared from 96 peripheral blood samples in a 96-well PCR plate coated with SA-bound indexed adaptors. The libraries prepared were sequenced as described in Example 5.
A first PCR plate was coated with SA as described in Example 4, and biotinylated indexed adaptors were bound to it. Each row of wells of the 96-well plate was coated with a unique biotinylated indexed adaptor. Using a second 96-well PCR plate, 37 different cfDNAs, each in 30 µl, were dA-tailed with 10 µl of Klenow master mix each for 15 minutes at 37 °C, followed by inactivation of the Klenow enzyme at 75 °C for 5 minutes. Several cfDNAs were used in multiple wells, so that a total of 94 wells contained cfDNA; 2 wells served as no-template controls. The dA-tailed cfDNA mixtures were transferred to the first PCR plate and ligated to the bound biotinylated adaptors in the presence of 10 µl of Quick Ligase master mix 1 at 25 °C using a PCT-225 Tetrad gradient thermal cycler (BioRad, Hercules, California). Then 10 µl of ligation master mix 2, customized for each indexed adaptor, was added, and ligation was carried out for 15 minutes at 5 °C. Unbound DNA was removed, and the bound DNA-biotinylated adaptor complexes were washed five times with TE buffer. 50 µl of PCR master mix was added to each well, and the adaptor-ligated DNA was amplified and subjected to SPRI cleanup as described in Example 4. The libraries were diluted and analyzed using HiSens BA chips.
The correlation between the amount of purified cfDNA used to prepare the sequencing library and the amount of library product obtained was determined for 61 clinical samples prepared using the ABB method (Figure 27A) and for 35 research samples prepared using the repair-free SS 1-STEP method (Figure 27B). These data show that the correlation was significantly greater for libraries prepared using the repair-free SS 1-STEP method (R² = 0.5826; Figure 27A) than for libraries prepared using the abbreviated method described in Example 2 (R² = 0.1534; Figure 27B). Note: the cfDNA samples in this comparison were not the same, because the clinical samples were not available for research and development. Nevertheless, these results indicate that the repair-free SS 1-STEP method consistently gives a greater correlation between cfDNA input and library output than the ABB method. Next, the three methods, i.e. ABB, repair-free 2-STEP and repair-free SS 1-STEP, were compared using identical serially diluted amounts of purified cfDNA for all three methods. As shown in Figure 28, the best correlation was obtained when the libraries were prepared by the SS 1-STEP method (R² = 0.9457; Δ), followed by the 2-STEP method (R² = 0.7666) and the ABB method, which had a markedly lower correlation (R² = 0.0386; ◇). These data show that, compared with methods that end-modify the cfDNA [DNA repair and phosphorylation], with or without purification of the repaired and dA-tailed products, the repair-free methods, whether in solution or on a solid surface, provide consistent and predictable yields.
Preparing libraries by the solid surface method described in this example takes a fraction of the time needed to prepare sequencing libraries by the abbreviated method. For example, 10 to 14 samples can be prepared manually in about 4 hours using the ABB method, whereas 96 or 192 libraries can be prepared manually in 4 and 5 hours, respectively, using the SS 1-STEP method. Furthermore, the SS 1-STEP method can easily be automated to prepare libraries in multiples of 96 for multiplexed sequencing using NGS technologies. The SS method is therefore suitable for commercial, automated, high-throughput sample analysis.
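The throughput figures above can be put on a common scale of libraries per hour. The following arithmetic sketch uses only the numbers quoted in the text; the function name is hypothetical.

```python
def libraries_per_hour(n_libraries, hours):
    """Manual preparation rate in libraries per hour."""
    return n_libraries / hours


abb_low = libraries_per_hour(10, 4)    # ABB, lower bound: 2.5 per hour
abb_high = libraries_per_hour(14, 4)   # ABB, upper bound: 3.5 per hour
ss_96 = libraries_per_hour(96, 4)      # SS 1-STEP, one plate: 24 per hour
ss_192 = libraries_per_hour(192, 5)    # SS 1-STEP, two plates: 38.4 per hour

# Range of the speed-up, from the least to the most favorable comparison.
min_fold = ss_96 / abb_high
max_fold = ss_192 / abb_low
print(f"SS 1-STEP is roughly {min_fold:.1f}x to {max_fold:.1f}x faster per library")
```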
Analysis of the DNA libraries showed that solid-phase sequencing library preparation from unrepaired cfDNA provides high-yield, high-quality sequencing libraries, which can be configured for automated processes to further expedite sample analyses that require massively parallel sequencing using NGS technologies. The solid surface method is applicable to repaired DNA.
Example 6
Multiplexed sequencing of libraries prepared by the 1-STEP SS method
Library samples prepared by the SS 1-STEP method in a 96-well plate (Example 20) were sequenced in multiplexed format, with six differently indexed samples per lane of an Illumina HySeq sequencer flow cell. Libraries prepared as described in Example 2 were sequenced. The data in Figure 29 compare indexing efficiency, as assessed by multiplexed sequencing, between the 2-STEP (filled bars) and SS 1-STEP (empty bars) methods. These data show that preparing libraries on a solid surface does not impair indexing efficiency. Figures 30A and 30B show the percentage of the total number of sequence tags mapped to each human chromosome when sequencing libraries were prepared according to the 1-STEP solid surface method (% ChrN; Figure 30A), and the percentage of sequence tags as a function of chromosome size (Figure 30B; R² = 0.9807). Figures 30A and 30B show that the GC bias of the SS 1-STEP method is the same as that of the 2-STEP method, because both processes use DNA-repair-free sample preparation enzymology.
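Multiplexed sequencing relies on assigning each read to a sample by its 6-base index. The sketch below shows the idea with exact index matching. The index-to-sample table, the read list and the "assigned fraction" summary are all illustrative assumptions; the text does not define how indexing efficiency was calculated.

```python
from collections import Counter


def demultiplex(index_reads, sample_by_index):
    """Count reads per sample using exact matching of the index read."""
    counts = Counter()
    for index_seq in index_reads:
        counts[sample_by_index.get(index_seq, "unassigned")] += 1
    return counts


# Illustrative index-to-sample table (placeholders, not the adaptors used here).
sample_by_index = {"ATCACG": "sample_1", "CGATGT": "sample_2"}

# Hypothetical index reads from one lane.
index_reads = ["ATCACG", "CGATGT", "ATCACG", "NNNNNN"]

counts = demultiplex(index_reads, sample_by_index)
assigned_fraction = 1 - counts["unassigned"] / len(index_reads)
print(dict(counts), assigned_fraction)
```

Production demultiplexers usually tolerate a mismatch in the index read; exact matching is used here only to keep the sketch short.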
Figure 31 shows the percentage of sequence tags mapped to the Y chromosome relative to the percentage of tags mapped to the X chromosome, obtained by sequencing 42 libraries that were prepared with indexed adaptors using the SS 1-STEP method and sequenced in multiplexed format using Illumina sequencing-by-synthesis with reversible terminator technology. The data clearly distinguish samples obtained from pregnant women carrying male fetuses from samples obtained from pregnant women carrying female fetuses.
Example 7
Sample processing and DNA extraction
Peripheral blood samples were collected from pregnant women who were in their first or second trimester of pregnancy and were deemed at risk for fetal aneuploidy. Informed consent was obtained from each participant before the blood draw. Blood was collected before amniocentesis or chorionic villus sampling. Karyotype analysis was performed using the chorionic villus or amniocentesis samples to determine the fetal karyotype.
Peripheral blood drawn from each subject was collected in ACD tubes. One tube of blood sample (approximately 6 to 9 ml/tube) was transferred into one 15 ml low-speed centrifuge tube. The blood was centrifuged at 2640 rpm, 4 °C for 10 minutes using a Beckman Allegra 6R centrifuge and a GA 3.8 rotor.
For cell-free plasma extraction, the upper plasma layer was transferred to a 15 ml high-speed centrifuge tube and centrifuged at 16000 x g, 4 °C for 10 minutes using a Beckman Coulter Avanti J-E centrifuge and a JA-14 rotor. The two centrifugation steps were performed within 72 hours after blood collection. Cell-free plasma was stored at -80 °C and thawed only once before DNA extraction.
Cell-free DNA was extracted from cell-free plasma using the QIAamp DNA Blood Mini kit (Qiagen) according to the manufacturer's instructions. Five milliliters of buffer AL and 500 µl of Qiagen protease were added to 4.5 ml to 5 ml of cell-free plasma. The volume was adjusted to 10 ml with phosphate-buffered saline (PBS), and the mixture was incubated at 56 °C for 12 minutes. Multiple columns were used to separate the precipitated cfDNA from the solution by centrifugation at 8,000 RPM in a Beckman microcentrifuge. The columns were washed with AW1 and AW2 buffers, and the cfDNA was eluted with 55 µl of nuclease-free water. Approximately 3.5 to 7 ng of cfDNA was extracted from the plasma samples.
All sequencing libraries were prepared from approximately 2 ng of purified cfDNA extracted from maternal plasma. Library preparation was performed using reagents of the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Item Number E6000L; New England Biolabs, Ipswich, Massachusetts) as follows. Because cell-free plasma DNA is fragmented in nature, the plasma DNA samples were not fragmented further by nebulization or sonication. The overhangs of the approximately 2 ng of purified cfDNA fragments contained in 40 µl were converted into phosphorylated blunt ends according to the End Repair Module by incubating the cfDNA in a 1.5 ml microcentrifuge tube with 5 µl of 10X phosphorylation buffer, 2 µl of deoxynucleotide solution mix (10 mM each dNTP), 1 µl of a 1:5 dilution of DNA polymerase I, 1 µl of T4 DNA polymerase and 1 µl of T4 polynucleotide kinase, all provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, for 15 minutes at 20 °C. The enzymes were then heat-inactivated by incubating the reaction mixture at 75 °C for 5 minutes. The mixture was cooled to 4 °C, and dA-tailing of the blunt-ended DNA was accomplished using 10 µl of dA-tailing master mix containing Klenow fragment (3' to 5' exo-) (NEBNext™ DNA Sample Prep DNA Reagent Set 1) and incubating for 15 minutes at 37 °C. Subsequently, the Klenow fragment was heat-inactivated by incubating the reaction mixture at 75 °C for 5 minutes. After inactivation of the Klenow fragment, 1 µl of a 1:5 dilution of Illumina Genomic Adaptor Oligo Mix (Item Number 1000521; Illumina Inc., Hayward, CA) was used to ligate the Illumina adaptors (Non-Index Y-Adaptors) to the dA-tailed DNA using 4 µl of the T4 DNA ligase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, by incubating the mixture for 15 minutes at 25 °C. The mixture was cooled to 4 °C, and the adaptor-ligated cfDNA was purified from unligated adaptors, adaptor dimers and other reagents using the magnetic beads provided in the Agencourt AMPure XP PCR purification system (Item Number A63881; Beckman Coulter Genomics, Danvers, MA). Eighteen cycles of PCR were performed to selectively enrich adaptor-ligated cfDNA using High-Fidelity Master Mix (Finnzymes, Woburn, MA) and Illumina's PCR primers complementary to the adaptors (Part No. 1000537 and 1000537). The adaptor-ligated DNA was subjected to PCR (98 °C for 30 seconds; 18 cycles of 98 °C for 10 seconds, 65 °C for 30 seconds and 72 °C for 30 seconds; final extension at 72 °C for 5 minutes; hold at 4 °C) using Illumina Genomic PCR primers (Item Numbers 100537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, MA) according to the manufacturer's instructions (available at www.beckmangenomics.com/products/AMPureXPProtocol_000387v001.pdf). The purified amplified product was eluted in 40 µl of Qiagen EB buffer, and the concentration and size distribution of the amplified libraries were analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA).
The amplified DNA was sequenced using the Illumina Genome Analyzer II to obtain single-end reads of 36 bp. Only about 30 bp of random sequence information is needed to identify a sequence as belonging to a specific human chromosome. Longer sequences can uniquely identify more particular targets. In the present case, a large number of 36 bp reads were obtained, covering approximately 10% of the genome. Upon completion of sequencing of a sample, the Illumina "Sequencer Control Software" transferred the image and base call files to a Unix server running the Illumina "Genome Analyzer Pipeline" software version 1.51. The Illumina "Gerald" program was run to align sequences to the reference human genome, which is derived from the hg18 genome provided by the National Center for Biotechnology Information (NCBI36/hg18, available on the world wide web at http://genome.ucsc.edu/cgi-bin/hgGatewayOrg=Human&db=hg18&hgsid=166260105). The sequence data generated by the above procedure that aligned uniquely to the genome were read from the Gerald output (export.txt files) by a program (c2c.pl) running on a computer running the Linux operating system. Sequence alignments with base mismatches were allowed, and were included in alignment counts only if they aligned uniquely to the genome. Sequence alignments with identical start and end coordinates (duplicates) were excluded.
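The counting rules just described (unique alignment, a tolerated number of mismatches, removal of duplicates with identical start and end coordinates) can be sketched as follows. This is not the c2c.pl program. The `Alignment` record and its fields are hypothetical, the default of two mismatches follows the figure reported in the next paragraph, and keeping the first read of each duplicate set is an assumption.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    chrom: str
    start: int
    end: int
    mismatches: int
    n_hits: int  # number of genomic locations the read aligns to


def filter_tags(alignments, max_mismatches=2):
    """Keep uniquely aligned, low-mismatch reads and drop coordinate duplicates."""
    seen = set()
    kept = []
    for aln in alignments:
        if aln.n_hits != 1 or aln.mismatches > max_mismatches:
            continue
        key = (aln.chrom, aln.start, aln.end)
        if key in seen:  # same start and end coordinates: treated as a duplicate
            continue
        seen.add(key)
        kept.append(aln)
    return kept


# Hypothetical alignments, for illustration only.
alignments = [
    Alignment("chr21", 100, 135, 0, 1),  # kept
    Alignment("chr21", 100, 135, 1, 1),  # duplicate coordinates, dropped
    Alignment("chr1", 500, 535, 3, 1),   # too many mismatches, dropped
    Alignment("chrX", 900, 935, 2, 4),   # not unique, dropped
    Alignment("chr1", 700, 735, 2, 1),   # kept
]

tags = filter_tags(alignments)
tags_per_chrom = Counter(t.chrom for t in tags)
print(dict(tags_per_chrom))
```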
Between about 5 million and 15 million 36 bp tags with 2 or fewer mismatches mapped uniquely to the human genome. All mapped tags were counted and included in the calculation of chromosome doses in both test and qualified samples. Regions of chromosome Y extending from base 0 to base 2 x 10⁶, from base 10 x 10⁶ to base 13 x 10⁶, and from base 23 x 10⁶ to the end of the chromosome were specifically excluded from the analysis, because tags derived from both male and female fetuses map to these regions of the Y chromosome.
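The Y-chromosome exclusion can be expressed as a simple interval check. In the sketch below, the interval list follows the text, while the names, the inclusive treatment of boundaries and the example positions are assumptions.

```python
# Excluded chromosome Y regions (start, end) in bases; None means "to the end".
Y_EXCLUDED_REGIONS = [
    (0, 2_000_000),
    (10_000_000, 13_000_000),
    (23_000_000, None),
]


def is_excluded_on_y(position):
    """Return True if a chromosome Y position lies in an excluded region."""
    for start, end in Y_EXCLUDED_REGIONS:
        if position >= start and (end is None or position <= end):
            return True
    return False


# Hypothetical tag start positions on chromosome Y.
positions = [1_000_000, 5_000_000, 12_000_000, 20_000_000, 30_000_000]
counted = [p for p in positions if not is_excluded_on_y(p)]
print(counted)
```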
It was noted that the total number of sequence tags mapped to individual chromosomes varied somewhat across samples sequenced in the same run (inter-chromosomal variation), but that substantially greater variation occurred between different sequencing runs (inter-sequencing run variation).
Example 8
Dose and variance for chromosomes 13, 18, 21, X and Y
To examine the extent of inter-chromosomal and inter-sequencing run variation in the number of mapped sequence tags for all chromosomes, plasma cfDNA obtained from the peripheral blood of 48 volunteer pregnant subjects was extracted and sequenced as described in Example 7, and analyzed as follows.
The total number of sequence tags mapped to each chromosome (sequence tag density) was determined. Alternatively, the number of mapped sequence tags may be normalized to the length of the chromosome to generate a sequence tag density ratio. Normalization to chromosome length is not a required step, and can be performed solely to reduce the number of digits in a number to simplify it for human interpretation. The chromosome lengths that can be used to normalize the sequence tag counts can be the lengths provided on the world wide web at genome.ucsc.edu/goldenPath/stats.html#hg18.
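The optional length normalization can be illustrated as follows. This is a minimal sketch: the chromosome sizes are the rounded values from Table 8 rather than the UCSC lengths mentioned above, and the tag counts and function name are hypothetical.

```python
# Chromosome sizes in Mbp, rounded values from Table 8.
CHROM_SIZE_MBP = {"Chr1": 247, "Chr21": 47}


def tag_density_ratio(tag_counts, sizes_mbp):
    """Mapped tags per Mbp for each chromosome."""
    return {chrom: tag_counts[chrom] / sizes_mbp[chrom] for chrom in tag_counts}


# Hypothetical tag counts for one sample.
tag_counts = {"Chr1": 741_000, "Chr21": 94_000}

density = tag_density_ratio(tag_counts, CHROM_SIZE_MBP)
print(density)
```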
The sequence tag density obtained for each chromosome was related to the sequence tag density of each of the remaining chromosomes to derive a qualified chromosome dose, which was calculated as the ratio of the sequence tag density for the chromosome of interest (e.g. chromosome 21) to the sequence tag density of each of the remaining chromosomes (i.e. chromosomes 1-20, 22 and X). Table 9 provides an example of the calculated qualified chromosome dose for chromosomes of interest 13, 18, 21, X and Y, determined in one of the qualified samples. Chromosome doses were determined for all chromosomes in all samples, and the average doses for chromosomes of interest 13, 18, 21, X and Y in the qualified samples are provided in Tables 10 and 11 and depicted in Figures 32 to 36. Figures 32 to 36 also depict the chromosome doses for the test samples. The chromosome doses for each chromosome of interest in the qualified samples provide a measure of the variation in the total number of mapped sequence tags for that chromosome of interest relative to each of the remaining chromosomes. Thus, qualified chromosome doses can identify the chromosome or group of chromosomes, i.e. the normalizing chromosome, whose variation among samples is closest to the variation of the chromosome of interest, and which would serve as the ideal sequence for normalizing values for further statistical evaluation. Figures 37 and 38 depict the calculated average chromosome doses determined in a population of qualified samples for chromosomes 13, 18 and 21, and for chromosomes X and Y.
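The dose calculation and the search for a low-variability normalizing chromosome can be sketched as below. This is an illustration under stated assumptions rather than the procedure used to produce Tables 9 to 11: the function names and the three-sample data set are hypothetical, and the sample standard deviation is assumed for the coefficient of variation.

```python
from statistics import mean, stdev


def chromosome_doses(tag_density, chrom_of_interest):
    """Dose of the chromosome of interest relative to each other chromosome."""
    numerator = tag_density[chrom_of_interest]
    return {
        chrom: numerator / density
        for chrom, density in tag_density.items()
        if chrom != chrom_of_interest
    }


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def lowest_cv_normalizer(qualified_samples, chrom_of_interest):
    """Return the candidate normalizer with the lowest CV, plus all CVs."""
    doses_by_normalizer = {}
    for tag_density in qualified_samples:
        for chrom, dose in chromosome_doses(tag_density, chrom_of_interest).items():
            doses_by_normalizer.setdefault(chrom, []).append(dose)
    cvs = {c: coefficient_of_variation(d) for c, d in doses_by_normalizer.items()}
    return min(cvs, key=cvs.get), cvs


# Hypothetical tag densities for three qualified samples.
qualified_samples = [
    {"Chr21": 100, "Chr14": 200, "Chr15": 250},
    {"Chr21": 110, "Chr14": 220, "Chr15": 260},
    {"Chr21": 90, "Chr14": 180, "Chr15": 240},
]

best, cvs = lowest_cv_normalizer(qualified_samples, "Chr21")
print(best, cvs)
```

Selecting by lowest coefficient of variation is only one criterion; as the following paragraph notes, differentiability may favor a different normalizing chromosome.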
In some instances, the best normalizing chromosome may not have the least variation, but may have a distribution of qualified doses that best distinguishes a test sample or samples from the qualified samples; i.e., the best normalizing chromosome may not have the lowest variation, but may have the greatest differentiability. Thus, differentiability accounts for the variation in chromosome dose and the distribution of the doses in the qualified samples.
Tables 10 and 11 provide the coefficient of variation as the measure of variability, and t-test values as the measure of differentiability for chromosomes 18, 21, X and Y, wherein the smaller the t-test value, the greater the differentiability. The differentiability for chromosome 13 was determined as the ratio of the difference between the mean chromosome dose in the qualified samples and the dose for chromosome 13 in the only T13 test sample, to the standard deviation of the mean of the qualified dose.
The qualified chromosome doses also serve as the basis for determining threshold values when identifying aneuploidies in test samples, as described below.
Table 9. Qualified chromosome dose for chromosomes 13, 18, 21, X and Y (n=1; sample #11342, 46XY)
Table 10. Qualified chromosome dose, variance and differentiability for chromosomes 21, 18 and 13
Table 11. Qualified chromosome dose, variance and differentiability for chromosomes 13, X and Y
Examples of diagnoses of T21, T13, T18 and a case of Turner syndrome, obtained using the normalizing chromosomes, chromosome doses and differentiability for each of the chromosomes of interest, are described in Example 9.
Example 9
Diagnosis of fetal aneuploidy using normalizing chromosomes
To apply the use of chromosome doses to the assessment of aneuploidy in a biological test sample, maternal blood test samples were obtained from pregnant volunteers, and cfDNA was prepared, sequenced and analyzed as described in Examples 1 and 2.
Trisomy 21
Table 12 provides the calculated dose for chromosome 21 in an exemplary test sample (#11403). The calculated threshold for the positive diagnosis of T21 was set at >2 standard deviations from the mean of the qualified (normal) samples. A diagnosis of T21 was given based on the chromosome dose in the test sample being greater than the set threshold. Chromosomes 14 and 15 were used as normalizing chromosomes in separate calculations to show that either a chromosome having the lowest variability (e.g. chromosome 14) or a chromosome having the greatest differentiability (e.g. chromosome 15) can be used to identify the aneuploidy. Thirteen T21 samples were identified using the calculated chromosome doses, and the aneuploidy samples were confirmed to be T21 by karyotype.
Table 12. Chromosome dose for a T21 aneuploidy (sample #11403, 47XY+21)
Trisomy 18
Table 13 provides the calculated dose for chromosome 18 in a test sample (#11390). The calculated threshold for a positive diagnosis of T18 was set at >2 standard deviations from the mean of the qualified (normal) samples. A diagnosis of T18 was given based on the chromosome dose in the test sample being greater than the set threshold. Chromosome 8 was used as the normalizing chromosome. In this instance, chromosome 8 had both the lowest variability and the greatest resolvability. 18 T18 samples were identified using the chromosome doses, and were confirmed to be T18 by karyotype.
These data show that a single normalizing chromosome can have both the lowest variability and the greatest resolvability.
Table 13. Chromosome dose for a T18 aneuploidy (sample #11390, 47 XY +18)
Trisomy 13
Table 14 provides the calculated dose for chromosome 13 in a test sample (#51236). The calculated threshold for a positive diagnosis of T13 was set at >2 standard deviations from the mean of the qualified samples. A diagnosis of T13 was given based on the chromosome dose in the test sample being greater than the set threshold. The chromosome dose for chromosome 13 was calculated using either chromosome 5 or the group of chromosomes 3, 4, 5 and 6 as the normalizing chromosome. One T13 sample was identified.
Table 14. Chromosome dose for a T13 aneuploidy (sample #51236, 47 XY +13)
The sequence tag density for chromosomes 3 to 6 is the average tag count for chromosomes 3 to 6.
The data show that the combination of chromosomes 3, 4, 5 and 6 provides a variability lower than that of chromosome 5, and a resolvability greater than that of any of the other chromosomes.
Therefore, a group of chromosomes can be used as the normalizing chromosome to determine chromosome doses and identify aneuploidies.
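The threshold rule used for the trisomies in this example (a test dose more than 2 standard deviations above the mean of the qualified doses) can be sketched as follows. This is a minimal illustration, not the method as implemented: the function name, the use of the sample standard deviation, and the numeric doses are all assumptions and are not taken from Tables 12 to 14.

```python
from statistics import mean, stdev

def exceeds_positive_threshold(test_dose, qualified_doses, n_sd=2.0):
    """Return True if test_dose lies more than n_sd standard deviations
    above the mean of the qualified (normal) chromosome doses.

    Assumption: the sample standard deviation (n - 1 denominator) is used;
    the text does not say which estimator was applied.
    """
    threshold = mean(qualified_doses) + n_sd * stdev(qualified_doses)
    return test_dose > threshold

# Hypothetical qualified doses for one chromosome of interest.
qualified = [0.200, 0.202, 0.198, 0.201, 0.199]

print(exceeds_positive_threshold(0.215, qualified))  # dose well above the qualified range
print(exceeds_positive_threshold(0.201, qualified))  # dose inside the qualified range
```

The same comparison applies whether the dose was computed against a single normalizing chromosome or against a group of chromosomes, since only the resulting doses enter the threshold.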
Turner syndrome (monosomy X)
Table 15 provides the calculated doses for chromosomes X and Y in a test sample (#51238). The calculated threshold for a positive diagnosis of Turner syndrome (monosomy X) was set for the X chromosome at <-2 standard deviations from the mean of the qualified (normal) samples, and for the absence of the Y chromosome at <-2 standard deviations from the mean of the qualified (normal) samples.
Table 15. Chromosome doses for a Turner (XO) aneuploidy (sample #51238, 45 X)
A sample having an X chromosome dose less than the set threshold was identified as having less than one X chromosome. The same sample was determined to have a Y chromosome dose less than the set threshold, indicating that the sample did not have a Y chromosome. Thus, the combination of the X and Y doses was used to identify the Turner syndrome (monosomy X) sample.
Thus, the method provided enables the determination of CNV of chromosomes. In particular, the method enables the determination of over-representation and under-representation chromosomal aneuploidies by massively parallel sequencing of maternal plasma cfDNA and identification of normalizing chromosomes for the statistical analysis of the sequencing data. The sensitivity and reliability of the method allow accurate determination of aneuploidy in the first and second trimesters.
Example 10
The determination of part aneuploidy
The use of sequence doses was applied to assessing partial aneuploidy in a biological test sample of cfDNA that was prepared from blood plasma and sequenced as described in Example 7. The sample was confirmed by karyotyping to have been obtained from a subject with a partial deletion of chromosome 11.
Analysis of the sequencing data for the partial aneuploidy (partial deletion of chromosome 11, i.e. q21-q23) was performed as described for the chromosomal aneuploidies in the previous examples. Mapping of the sequence tags to chromosome 11 in a test sample revealed a noticeable loss of tag counts between base pairs 81000082 and 103000103 in the q arm of the chromosome, relative to the tag counts obtained for the corresponding sequence on chromosome 11 in the qualified samples (data not shown). Sequence tags mapped to the sequence of interest on chromosome 11 (81000082-103000103 bp) in each of the qualified samples, and sequence tags mapped to all 20-megabase segments in the entire genome in the qualified samples (i.e. qualified sequence tag densities), were used to determine qualified sequence doses as ratios of tag densities in all qualified samples. The average sequence dose, standard deviation and coefficient of variation were calculated for all 20-megabase segments in the entire genome, and the 20-megabase sequence having the least variability was identified as the normalizing sequence on chromosome 5 (13000014-33000033 bp) (see Table 16), which was used to calculate the dose for the sequence of interest in the test sample (see Table 17). Table 17 provides the sequence dose for the sequence of interest on chromosome 11 (81000082-103000103 bp) in the test sample, calculated as the ratio of the sequence tags mapped to the sequence of interest to the sequence tags mapped to the identified normalizing sequence. Figure 40 shows the sequence doses for the sequence of interest in the 7 qualified samples (O) and the sequence dose for the corresponding sequence in the test sample (◇). The mean is shown by the solid line, and the calculated threshold for a positive diagnosis of partial aneuploidy, set at 5 standard deviations from the mean, is shown by the dashed line. A diagnosis of partial aneuploidy was given based on the sequence dose in the test sample being less than the set threshold. The test sample was confirmed by karyotyping to have the deletion q21-q23 on chromosome 11.
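The selection of a normalizing sequence described in this example (compute the coefficient of variation of the qualified sequence doses for each candidate 20-megabase segment, then keep the least variable segment) can be sketched as below, together with the 5-standard-deviation test for under-representation. The segment labels, the dose values and both function names are hypothetical, and the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (sample SD assumed)."""
    return stdev(values) / mean(values)

def pick_normalizing_segment(doses_by_segment):
    """doses_by_segment maps a candidate segment label to the list of
    qualified sequence doses obtained with that segment as denominator.
    Returns the label whose doses have the lowest coefficient of variation."""
    return min(doses_by_segment,
               key=lambda seg: coefficient_of_variation(doses_by_segment[seg]))

def below_negative_threshold(test_dose, qualified_doses, n_sd=5.0):
    """True if test_dose is more than n_sd standard deviations below the
    mean of the qualified doses (the deletion call used in this example)."""
    return test_dose < mean(qualified_doses) - n_sd * stdev(qualified_doses)

# Hypothetical qualified doses for two candidate normalizing segments.
candidates = {
    "segment_A": [1.00, 1.01, 0.99, 1.00],
    "segment_B": [1.00, 1.10, 0.90, 1.05],
}
best = pick_normalizing_segment(candidates)
print(best)
print(below_negative_threshold(0.50, candidates[best]))
```

Scanning every 20-megabase segment of the genome, as the example describes, amounts to building one entry of `candidates` per segment before calling `pick_normalizing_segment`.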
Therefore, in addition to identifying chromosomal aneuploidies, the method of the invention can be used to identify partial aneuploidies.
Table 16. Qualified normalizing sequence, dose and variance for the sequence Chr11:81000082-103000103 (qualified samples, n=7)
Table 17. Sequence dose for the sequence of interest (81000082-103000103) on chromosome 11 (test sample 11206)
Example 11
Demonstration of the detection of aneuploidy
The sequencing data obtained for the samples described in Examples 2 and 3 and shown in Figures 32 to 36 were further analyzed to demonstrate the sensitivity of the method in successfully identifying aneuploidies in maternal samples. Normalized chromosome doses for chromosomes 21, 18, 13, X and Y were analyzed as a distribution relative to the standard deviation of the mean (Y-axis), and are shown in Figures 41A-41E. The normalizing chromosome used is shown as the denominator (X-axis).
Figure 41(A) shows the distribution of chromosome doses relative to the standard deviation from the mean for the chromosome 21 dose in the unaffected samples (o) and the trisomy 21 samples (T21; Δ) when chromosome 14 is used as the normalizing chromosome for chromosome 21. Figure 41(B) shows the distribution of chromosome doses relative to the standard deviation from the mean for the chromosome 18 dose in the unaffected samples (o) and the trisomy 18 samples (T18; Δ) when chromosome 8 is used as the normalizing chromosome for chromosome 18. Figure 41(C) shows the distribution of chromosome doses relative to the standard deviation from the mean for the chromosome 13 dose in the unaffected samples (o) and the trisomy 13 samples (T13; Δ), using the average sequence tag density of the group of chromosomes 3, 4, 5 and 6 as the normalizing chromosome to determine the chromosome dose for chromosome 13. Figure 41(D) shows the distribution of chromosome doses relative to the standard deviation from the mean for the chromosome X dose in the unaffected female samples (o), the unaffected male samples (Δ) and the monosomy X samples (XO; +) when chromosome 4 is used as the normalizing chromosome for chromosome X. Figure 41(E) shows the distribution of chromosome doses relative to the standard deviation from the mean for the chromosome Y dose in the unaffected male samples (o), the unaffected female samples (Δ) and the monosomy X samples (+), when the average sequence tag density of the group of chromosomes 1 to 22 and X is used as the normalizing chromosome to determine the chromosome dose for chromosome Y.
The data show that trisomy 21, trisomy 18 and trisomy 13 were clearly distinguishable from the unaffected (normal) samples. The monosomy X samples were readily identified as having a chromosome X dose clearly lower than that of the unaffected female samples (Figure 41(D)) and a chromosome Y dose clearly lower than that of the unaffected male samples (Figure 41(E)).
Therefore, the method provided is sensitive and specific for determining the presence or absence of chromosomal aneuploidies in a maternal blood sample.
Example 12
Determination of fetal chromosomal aneuploidy using massively parallel DNA sequencing of cell-free fetal DNA from maternal blood: test set 1 independent of training set 1
This study was conducted by qualified site clinical research personnel at 13 US clinic locations between April 2009 and October 2010, under a human subject protocol approved by the Institutional Review Board (IRB) of each institution. Written consent was obtained from each subject prior to study participation. The protocol was designed to provide blood samples and clinical data to support the development of noninvasive prenatal genetic diagnostic methods. Pregnant women aged 18 years or older were eligible for participation. For patients undergoing clinically indicated chorionic villus sampling (CVS) or amniocentesis, blood was collected prior to the procedure, and the results of the fetal karyotype were also collected. Peripheral blood samples (two tubes, or approximately 20 mL in total) were drawn from all subjects into acid citrate dextrose (ACD) tubes (Becton Dickinson). All samples were de-identified and assigned an anonymous patient ID number. Blood samples were shipped overnight to the laboratory in temperature-controlled shipping containers provided for the study. The time elapsed between blood draw and sample receipt was recorded as part of sample accessioning.
Site research coordinators entered clinical data relevant to the patient's current pregnancy and history into the study case report form (CRF) using the anonymous patient ID number. Cytogenetic analysis of the fetal karyotype from the invasive prenatal procedure samples was performed at each local laboratory, and the results were also recorded in the study CRF. All data obtained on the CRFs were entered into a clinical database at the laboratory. Cell-free plasma was obtained from individual blood tubes using a two-step centrifugation process within 24 to 48 hours of venipuncture. Plasma from a single blood tube was sufficient for sequencing analysis. Cell-free DNA was extracted from the cell-free plasma using the QIAamp DNA Blood Mini kit (Qiagen) according to the manufacturer's instructions. Since these cell-free DNA fragments are known to be approximately 170 base pairs (bp) in length (Fan et al., Clin Chem 56:1279-1286 [2010]), no fragmentation of the DNA was required prior to sequencing.
For the training set samples, cfDNA was sent to Prognosys Biosciences, Inc. (La Jolla, CA) for sequencing library preparation (cfDNA blunt-ended and ligated to universal adaptors) and sequencing using standard manufacturer protocols on the Illumina Genome Analyzer IIx instrument (http://www.illumina.com/). Single-end reads of 36 base pairs were obtained. Upon completion of the sequencing, all base call files were collected and analyzed. For the test set samples, sequencing libraries were prepared and sequenced on the Illumina Genome Analyzer IIx instrument. Preparation of the sequencing library was performed as follows. The full-length protocol described is essentially the standard protocol provided by Illumina, and differs from the Illumina protocol only in the purification of the amplified library. The Illumina protocol instructs that the amplified library be purified using gel electrophoresis, while the protocol described herein uses magnetic beads for the same purification step. A primary sequencing library was prepared from approximately 2 ng of purified cfDNA extracted from maternal plasma, essentially according to the manufacturer's instructions, using the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, MA). All steps, except for the final purification of the adaptor-ligated products, which was performed using Agencourt magnetic beads and reagents instead of the purification column, were performed according to the protocol accompanying the NEBNext™ Reagents for Sample Preparation for a genomic DNA library that is sequenced using a GAII. The NEBNext™ protocol essentially follows that provided by Illumina, which is available at grcf.jhml.edu/hts/protocols/11257047_ChIP_Sample_Prep.pdf.
The overhangs of approximately 2 ng of purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends according to the End Repair Module by incubating the cfDNA in a 1.5 ml microfuge tube with 5 μl of 10X phosphorylation buffer, 2 μl of deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA Polymerase I, 1 μl of T4 DNA Polymerase and 1 μl of T4 Polynucleotide Kinase, provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, for 15 minutes at 20 °C. The sample was cooled to 4 °C and purified using a QIAQuick column provided in the QIAQuick PCR Purification Kit (QIAGEN Inc., Valencia, CA). The 50 μl reaction was transferred to a 1.5 ml microfuge tube, and 250 μl of Qiagen Buffer PB were added. The resulting 300 μl were transferred to a QIAQuick column, which was centrifuged at 13,000 RPM for 1 minute in a microfuge. The column was washed with 750 μl of Qiagen Buffer PE and re-centrifuged. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 39 μl of Qiagen Buffer EB by centrifugation.
dA tailing of 34 μl of the blunt-ended DNA was accomplished using 16 μl of the dA-tailing master mix containing the Klenow fragment (3' to 5' exo minus) (NEBNext™ DNA Sample Prep DNA Reagent Set 1), and incubating for 30 minutes at 37 °C according to the manufacturer's dA-Tailing Module. The sample was cooled to 4 °C and purified using a column provided in the MinElute PCR Purification Kit (QIAGEN Inc., Valencia, CA). The 50 μl reaction was transferred to a 1.5 ml microfuge tube, and 250 μl of Qiagen Buffer PB were added. The 300 μl were transferred to a MinElute column, which was centrifuged at 13,000 RPM for 1 minute in a microfuge. The column was washed with 750 μl of Qiagen Buffer PE and re-centrifuged. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 15 μl of Qiagen Buffer EB by centrifugation.
Ten microliters of the DNA eluate were incubated with 1 μl of a 1:5 dilution of the Illumina Genomic Adapter Oligo Mix (Part No. 1000521), 15 μl of 2X Quick Ligation Reaction Buffer and 4 μl of Quick T4 DNA Ligase, for 15 minutes at 25 °C according to the Quick Ligation Module. The sample was cooled to 4 °C and purified using a MinElute column as follows. One hundred and fifty microliters of Qiagen Buffer PE were added to the 30 μl reaction, and the entire volume was transferred to a MinElute column, which was centrifuged at 13,000 RPM for 1 minute in a microfuge. The column was washed with 750 μl of Qiagen Buffer PE and re-centrifuged. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 28 μl of Qiagen Buffer EB by centrifugation.
Twenty-three microliters of the adaptor-ligated DNA eluate were subjected to 18 cycles of PCR (98 °C for 30 seconds; 18 cycles of 98 °C for 10 seconds, 65 °C for 30 seconds and 72 °C for 30 seconds; final extension at 72 °C for 5 minutes, and hold at 4 °C) using Illumina Genomic PCR Primers (Part Nos. 100537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, MA) according to the manufacturer's instructions, available at www.beckmangenomics.com/products/AMPureXPProtocol_000387v001.pdf. The Agencourt AMPure XP PCR purification system removes unincorporated dNTPs, primers, primer dimers, salts and other contaminants, and recovers amplicons greater than 100 bp. The purified amplified product was eluted from the Agencourt beads in 40 μl of Qiagen EB Buffer, and the size distribution of the libraries was analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent technologies Inc., Santa Clara, CA). For both the training and test sample sets, single-end reads of 36 base pairs were sequenced.
Data analysis and sample classification
Sequence reads of 36 bases in length were aligned to the human genome assembly hg18 obtained from the UCSC database (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/). Alignments were performed using the Bowtie short read aligner (version 0.12.5), allowing up to two base mismatches during alignment (Langmead et al., Genome Biol 10:R25 [2009]). Only reads that mapped unambiguously to a single genomic location were included. The genomic sites to which reads mapped were counted and included in the calculation of chromosome doses (see below). Regions of the Y chromosome to which sequence tags from male and female fetuses map without any discrimination were excluded from the analysis (specifically, from base 0 to base 2 x 10^6; from base 10 x 10^6 to base 13 x 10^6; and from base 23 x 10^6 to the end of chromosome Y).
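The exclusion of the non-discriminating regions of chromosome Y can be illustrated with a short filter over mapped tag positions. This is only a sketch: the names are hypothetical, the positions are made up, and treating the region boundaries as inclusive is an assumption, since the text does not say how boundary bases were handled.

```python
# Excluded chromosome Y regions as (start, end) in bases; None means
# "to the end of the chromosome". Coordinates are those given in the text.
Y_EXCLUDED_REGIONS = [
    (0, 2_000_000),
    (10_000_000, 13_000_000),
    (23_000_000, None),
]

def is_excluded_y(position):
    """True if a chromosome Y position falls in an excluded region.
    Assumption: region boundaries are inclusive."""
    for start, end in Y_EXCLUDED_REGIONS:
        if position >= start and (end is None or position <= end):
            return True
    return False

def count_y_tags(positions):
    """Count uniquely mapped chromosome Y tags outside the excluded regions."""
    return sum(1 for p in positions if not is_excluded_y(p))

# Hypothetical mapped positions on chromosome Y.
tags = [1_000_000, 5_000_000, 11_000_000, 15_000_000, 25_000_000]
print(count_y_tags(tags))  # only the tags at 5 Mb and 15 Mb are counted
```

The resulting count is what would enter the chromosome Y dose as the numerator.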
Intra-run and inter-run sequencing variation in the chromosomal distribution of sequence reads can obscure the effect of fetal aneuploidy on the distribution of mapped sequence sites. To correct for such variation, a chromosome dose was calculated as the count of mapped sites for a given chromosome of interest normalized to the counts observed on a predetermined normalizing chromosome sequence. As described previously, a normalizing chromosome sequence can be composed of a single chromosome or of a group of chromosomes. The normalizing chromosome sequence was first identified in a subset of samples in the training set of unaffected (i.e. qualified) samples having diploid karyotypes for the chromosomes of interest 21, 18, 13 and X, by considering each autosome as a potential denominator in a ratio of counts with the chromosome of interest. Denominator chromosomes (i.e. normalizing chromosome sequences) were selected so as to minimize the variation of the chromosome doses within and between sequencing runs. Each chromosome of interest was determined to have a distinct normalizing chromosome sequence (denominator) (Table 10). No individual chromosome was identified as a normalizing chromosome sequence for chromosome 13, because no single chromosome was found to reduce the variation in the chromosome 13 dose across samples; that is, the spread of the NCV values for chromosome 13 was not reduced sufficiently to allow correct identification of T13 aneuploidy. Chromosomes 2 to 6 were chosen arbitrarily and tested as a group for their ability to mimic the behavior of chromosome 13. The group of chromosomes 2 to 6 was found to substantially reduce the variation in the dose for chromosome 13 in the training set samples, and was therefore chosen as the normalizing chromosome sequence for chromosome 13. As described above, the variation in the chromosome dose for chromosome Y is greater than 30 regardless of which individual chromosome is used as the normalizing chromosome sequence in determining the chromosome Y dose. The group of chromosomes 2 to 6 was found to substantially reduce the variation in the dose for chromosome Y in the training set samples, and was therefore chosen as the normalizing chromosome sequence for chromosome Y.
The chromosome doses for each chromosome of interest in the qualified samples provide a measure of the variation in the total number of sequence tags mapped to each chromosome of interest relative to the total number of sequence tags mapped to each of the remaining chromosomes. Thus, qualified chromosome doses can identify the chromosome or group of chromosomes, i.e. the normalizing chromosome sequence, whose variation among samples is closest to the variation of the chromosome of interest, and which would serve as the ideal sequence for normalizing values for further statistical evaluation.
The chromosome doses for all samples in the training set (i.e. qualified and affected) also serve as the basis for determining threshold values when identifying aneuploidies in test samples, as described below.
Table 18. Normalizing chromosome sequences for determining chromosome doses
For each chromosome of interest in each sample of the test set, a normalizing value was determined and used to determine the presence or absence of aneuploidy. The normalizing value was calculated as a chromosome dose, which can be further computed to provide a normalized chromosome value (NCV).
Chromosome dosage
For the test set, a chromosome dose was calculated for each chromosome of interest 21, 18, 13, X and Y for each sample. As provided in Table 18 above, the chromosome dose for chromosome 21 was calculated as the ratio of the number of tags in the test sample mapped to chromosome 21 to the number of tags in the test sample mapped to chromosome 9; the chromosome dose for chromosome 18 was calculated as the ratio of the number of tags in the test sample mapped to chromosome 18 to the number of tags in the test sample mapped to chromosome 8; the chromosome dose for chromosome 13 was calculated as the ratio of the number of tags in the test sample mapped to chromosome 13 to the number of tags in the test sample mapped to chromosomes 2 to 6; the chromosome dose for chromosome X was calculated as the ratio of the number of tags in the test sample mapped to chromosome X to the number of tags in the test sample mapped to chromosome 6; and the chromosome dose for chromosome Y was calculated as the ratio of the number of tags in the test sample mapped to chromosome Y to the number of tags in the test sample mapped to chromosomes 2 to 6.
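The five ratios just described can be written as one lookup from each chromosome of interest to its normalizing chromosome sequence. The sketch below assumes that, for the group of chromosomes 2 to 6, the denominator is the total number of tags mapped to the group; the dictionary and function names and the tag counts are hypothetical.

```python
# Normalizing chromosome sequence for each chromosome of interest,
# as stated in the text for the test set.
NORMALIZERS = {
    "21": ["9"],
    "18": ["8"],
    "13": ["2", "3", "4", "5", "6"],
    "X": ["6"],
    "Y": ["2", "3", "4", "5", "6"],
}

def chromosome_doses(tag_counts):
    """Return {chromosome of interest: dose} for one sample.

    tag_counts maps a chromosome name to its number of uniquely mapped tags.
    Assumption: a group denominator is the sum of the tags of its members.
    """
    return {
        chrom: tag_counts[chrom] / sum(tag_counts[d] for d in denominators)
        for chrom, denominators in NORMALIZERS.items()
    }

# Hypothetical tag counts for one test sample (arbitrary units).
counts = {"2": 100, "3": 80, "4": 70, "5": 75, "6": 75,
          "8": 60, "9": 50, "13": 40, "18": 30, "21": 15, "X": 60, "Y": 4}
doses = chromosome_doses(counts)
for chrom, dose in doses.items():
    print(chrom, round(dose, 4))
```

Keeping the mapping in a single table makes it straightforward to substitute a different normalizing chromosome sequence without touching the calculation.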
Normalized chromosome value
Using the chromosome dose for each chromosome of interest in each test sample, together with the corresponding chromosome doses determined in the qualified samples of the training set, a normalized chromosome value (NCV) was calculated using the following equation:
$$NCV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated training set mean and standard deviation, respectively, for the j-th chromosome dose, and $x_{ij}$ is the observed j-th chromosome dose for test sample i. When the chromosome doses are normally distributed, the NCV is equivalent to a statistical z-score for the doses. No significant departure from linearity was observed in a quantile-quantile plot of the NCVs from the unaffected samples. In addition, standard tests of normality for the NCVs failed to reject the null hypothesis of normality.
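As a worked illustration of the NCV, the test sample dose is centered on the training set mean and scaled by the training set standard deviation, which is the usual z-score form. The function name and the numbers are hypothetical, and the use of the sample standard deviation as the estimator is an assumption.

```python
from statistics import mean, stdev

def normalized_chromosome_value(test_dose, training_doses):
    """NCV = (observed dose - training mean) / training standard deviation.

    Assumption: the sample standard deviation (n - 1 denominator) is used
    as the estimate of the training set standard deviation.
    """
    mu_hat = mean(training_doses)
    sigma_hat = stdev(training_doses)
    return (test_dose - mu_hat) / sigma_hat

# Hypothetical chromosome 21 doses from unaffected training samples.
training = [0.313, 0.315, 0.314, 0.316, 0.312]
print(normalized_chromosome_value(0.314, training))  # a dose at the training mean gives an NCV near 0
print(normalized_chromosome_value(0.330, training))  # an elevated dose gives a large positive NCV
```

Because the NCV is expressed in standard deviations of the unaffected doses, the same numeric cut-offs can be applied to every chromosome of interest.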
For the test set, an NCV was calculated for each chromosome of interest 21, 18, 13, X and Y for each sample. To ensure a safe and effective classification scheme, conservative boundaries were chosen for aneuploidy classification. For classifying the aneuploidy state of the autosomes, an NCV > 4.0 was required to classify a chromosome as affected (i.e. aneuploid for that chromosome), and an NCV < 2.5 to classify a chromosome as unaffected. Samples with autosomes having an NCV between 2.5 and 4.0 were classified as "no call".
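The three-way autosome rule can be expressed as a small function. The sketch assumes that the affected boundary is the upper edge of the no-call zone (4.0) and that values falling exactly on 2.5 or 4.0 are treated as no call; the text does not address the boundary values, and the function name is hypothetical.

```python
def classify_autosome(ncv, unaffected_below=2.5, affected_above=4.0):
    """Classify an autosome from its NCV.

    Assumptions: NCV > 4.0 is affected, NCV < 2.5 is unaffected, and
    everything in between (boundaries included) is a no call.
    """
    if ncv > affected_above:
        return "affected"
    if ncv < unaffected_below:
        return "unaffected"
    return "no call"

for value in (6.0, 3.0, 1.0):
    print(value, classify_autosome(value))
```

Leaving a no-call band between the two boundaries trades a small number of unclassified samples for fewer misclassifications near the threshold.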
For the test set, classification of the sex chromosomes was performed by sequential application of the NCVs for both X and Y, as follows:
If NCV Y > -2.0 standard deviations from the mean of the male samples, then the sample was classified as male (XY).
If NCV Y < -2.0 standard deviations from the mean of the male samples, and NCV X > -2.0 standard deviations from the mean of the female samples, then the sample was classified as female (XX).
If NCV Y < -2.0 standard deviations from the mean of the male samples, and NCV X < -3.0 standard deviations from the mean of the female samples, then the sample was classified as monosomy X, i.e. Turner syndrome.
If the NCVs did not meet any of the above criteria, the sample was classified as a "no call" for sex.
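The sequential sex chromosome rules can be sketched as a short decision function. This is an illustration under a stated assumption: the first quantity is the chromosome Y NCV measured against the male training samples, and the second quantity in the female and monosomy X rules is taken to be the chromosome X NCV measured against the female training samples. The function and parameter names are hypothetical.

```python
def classify_sex(ncv_y_vs_male, ncv_x_vs_female):
    """Apply the sex chromosome rules in order and return a label.

    ncv_y_vs_male: chromosome Y NCV relative to male training samples.
    ncv_x_vs_female: chromosome X NCV relative to female training samples
    (assumed interpretation of the second NCV in the rules).
    """
    if ncv_y_vs_male > -2.0:
        return "male (XY)"
    if ncv_y_vs_male < -2.0 and ncv_x_vs_female > -2.0:
        return "female (XX)"
    if ncv_y_vs_male < -2.0 and ncv_x_vs_female < -3.0:
        return "monosomy X"
    return "no call"

print(classify_sex(0.5, 0.0))     # Y dose typical of male samples
print(classify_sex(-10.0, 0.3))   # no Y signal, X dose typical of female samples
print(classify_sex(-10.0, -5.0))  # no Y signal, X dose well below female samples
print(classify_sex(-10.0, -2.5))  # falls between the female and monosomy X rules
```

The gap between -2.0 and -3.0 on the X axis plays the same role as the no-call band used for the autosomes.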
As a result
Study demographics
A total of 1,014 patients were enrolled between April 2009 and July 2010. Patient demographics, invasive procedure type and karyotype results are summarized in Table 19. The mean age of the study population was 35.6 years (range 17 to 47 years), and gestational age ranged from 6 weeks 1 day to 38 weeks 1 day (mean 15 weeks 4 days). The overall incidence of abnormal fetal karyotype was 6.8%, with a T21 incidence of 2.5%. Of the 946 subjects with a singleton pregnancy and a karyotype, 906 (96%) presented with at least one clinically recognized risk factor for fetal aneuploidy prior to the prenatal procedure. Even after removing those subjects with advanced maternal age as their sole indication, the data demonstrate a very high false positive rate for current screening modalities. Ultrasound findings of increased nuchal translucency, cystic hygroma or other structural congenital abnormality were the most predictive of abnormal karyotype in this cohort.
Table 19. Patient demographics
* Includes the results of fetuses from multiple gestations; ** as assessed and reported by clinicians
Abbreviations: AMA = advanced maternal age; NT = nuchal translucency
The distribution of the diverse ethnic backgrounds represented in this study population is also shown in Table 19. Overall, 63% of the patients in this study were Caucasian, 17% were Hispanic, 6% were Asian, 5% were multi-ethnic and 4% were African American. It was noted that ethnic diversity varied significantly from site to site. For example, one site enrolled 60% Hispanic and 26% Caucasian subjects, whereas three clinics located in the same state enrolled no Hispanic subjects. As expected, no discernible differences were observed in the results for the different ethnicities.
Training dataset 1
The training set study selected 71 samples from the initial sequential accumulation of 435 samples collected between April 2009 and December 2009. All subjects with affected fetuses (abnormal karyotype) in this first series of subjects were included for sequencing, as well as a random selection and number of unaffected subjects with adequate sample and data. The clinical characteristics of the training set patients were consistent with the overall study demographics shown in Table 19. The gestational age of the samples in the training set ranged from 10 weeks 0 days to 23 weeks 1 day. Thirty-eight patients underwent CVS, 32 underwent amniocentesis, and 1 patient did not have the type of invasive procedure specified (unaffected karyotype 46, XY). 70% of the patients were Caucasian, 8.5% were Hispanic, 8.5% were Asian and 8.5% were multi-ethnic. Six sequenced samples were removed from this set for training purposes: 4 samples were from subjects with twin gestations (discussed in detail below), 1 sample with T18 was contaminated during preparation, and 1 sample had a fetal karyotype of 69, XXX, leaving 65 samples for the training set.
The number of unique sequence sites (i.e. tags identified with unique sites in the genome) varied from 2.2M in the early phases of the training set study to 13.7M in the later phases, owing to improvements in sequencing technology over time. To monitor for any potential shifts in chromosome dose over this 6-fold range in unique sites, different unaffected samples were run at the beginning and end of the study. For the first 15 unaffected samples run, the mean number of unique sites was 3.8M, and the mean chromosome doses for chromosome 21 and chromosome 18 were 0.314 and 0.528, respectively. For the last 15 unaffected samples run, the mean number of unique sites was 10.7M, and the mean chromosome doses for chromosome 21 and chromosome 18 were 0.316 and 0.529, respectively. There was no statistical difference in the chromosome doses for chromosome 21 and chromosome 18 over the time of the training set study.
Training set NCVs for chromosomes 21, 18 and 13 are shown in Figure 42. The results shown in Figure 42 are consistent with an assumption of normality, in that approximately 99% of the diploid NCVs would fall within ±2.5 standard deviations of the mean. Of the 65 samples in this set, the 8 samples with clinical karyotypes indicating T21 had NCVs ranging from 6 to 20. The four samples with clinical karyotypes indicating fetal T18 had NCVs ranging from 3.3 to 12, and the two samples with clinical karyotypes indicating fetal trisomy 13 (T13) had NCVs of 2.6 and 4. The spread of the NCVs in affected samples is due to their dependence on the percentage of fetal cfDNA in the individual samples.
As with the autosomes, the means and standard deviations for the sex chromosomes were established in the training set. The sex chromosome thresholds allowed 100% identification of male and female fetuses in the training set.
Test data set 1
After the chromosome dose means and standard deviations were established from the training set, a test set of 48 samples was selected from a total of 575 samples collected between January 2010 and June 2010. One sample from a twin pregnancy was removed from the final analysis, leaving 47 samples in the test set. The personnel who prepared the samples for sequencing and operated the equipment were blinded to the clinical karyotype information. The gestational age range was similar to that seen in the training set (Table 19). 58% of the invasive procedures were CVS, which is higher than in the overall procedural demographics but similar to the training set. 50% of the subjects were Caucasian, 27% were Hispanic, 10.4% were Asian and 6.3% were African American.
In the test set, the number of unique sequence tags ranged from about 13M to 26M. For unaffected samples, the chromosome doses for chromosome 21 and chromosome 18 were 0.313 and 0.527, respectively. The test set NCVs for chromosomes 21, 18 and 13 are shown in Figure 43, and the classifications are given in Table 20.
Table 20. Test set classification data
* MX is monosomy of the X chromosome with no evidence of a Y chromosome
In the test set, 13/13 subjects with karyotypes indicating fetal T21 were correctly identified, with NCVs ranging from 5 to 14. Eight of eight subjects with karyotypes indicating fetal T18 were correctly identified, with NCVs ranging from 8.5 to 22. The single sample in this test set with a karyotype classified as T13 was classified as a no-call, with an NCV of about 3.
For the test data set, all male samples were correctly identified, including a sample with the complex karyotype 46,XY + marker chromosome (not identifiable by cytogenetics) (Table 11). Of the 20 female samples, 19 were correctly identified and one was classified as a no-call. Of the three samples in the test set with a 45,X karyotype, two were correctly identified as monosomy X and one was classified as a no-call (Table 20).
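The three outcomes reported here (affected, unaffected, no-call) can be illustrated with a simple threshold rule. This is a sketch only: the boundaries 2.5 and 4.0 are assumed from the ±2.5 standard deviation statement for the training data and the NCV threshold of 4.0 mentioned in the comparison with the Chiu et al. method, and the handling of values exactly on a boundary is hypothetical.

```python
def classify_ncv(ncv: float, unaffected_max: float = 2.5, affected_min: float = 4.0) -> str:
    """Three-way call for one chromosome of interest.
    Threshold values are assumptions for illustration, not confirmed by this section."""
    if ncv > affected_min:
        return "affected"
    if ncv < unaffected_max:
        return "unaffected"
    return "no call"  # NCV falls in the zone between the two boundaries

# Values in the range of those reported for the test set.
calls = [classify_ncv(v) for v in (5.0, 14.0, 3.0, 0.4)]
print(calls)
```

Under these assumed boundaries, an NCV of about 3, like that of the single T13 test sample, lands in the no-call zone.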
Twins
Four of the samples initially selected for the training set and one in the test set came from twin pregnancies. The thresholds used here may be confounded by the different amounts of cfDNA expected in a twin pregnancy. In the training set, the karyotype from one of the twin samples was monochorionic 47,XY+21. A second twin sample was dizygotic, and amniocentesis was performed on each fetus individually. In this twin pregnancy, one fetus had a 47,XY+21 karyotype and the other had a normal 46,XX karyotype. In both cases, the cell-free classification based on the method discussed above classified the sample as T21. The other two twin pregnancies in the training set were correctly classified as unaffected for T21 (all twins showed diploid karyotypes for chromosome 21). For the twin pregnancy in the test set, a karyotype was established only for twin B (46,XX), and the algorithm correctly classified it as unaffected for T21.
Conclusion
The data show that massively parallel sequencing can be used to determine multiple abnormal fetal karyotypes from the blood of pregnant women. These data show that 100% correct classification of samples with trisomy 21 and trisomy 18 can be achieved using independent test set data. Even among fetuses with abnormal karyotypes, no sample was misclassified by the algorithm of this method. Importantly, the algorithm performed equally well in determining the presence or absence of T21 in two sets of twin pregnancies. In addition, this study examined many consecutive samples from multiple centers, which not only represent the range of abnormal karyotypes that may be seen in a commercial clinical setting but also show the importance of accurately classifying pregnancies unaffected by the common trisomies, given the unacceptably high false-positive rates of current prenatal screening. The data provide valuable insight into the great potential of this method for future use. Analysis of a subset of unique genomic loci showed an increase in variance consistent with Poisson counting statistics.
The data build on the findings of Fan and Quake, who demonstrated that the sensitivity of noninvasive determination of fetal aneuploidy from maternal plasma using massively parallel sequencing is limited only by counting statistics (Fan and Quake, PLoS One 5, e10439 [2010]). Because sequencing information is collected across the whole genome, this method can determine any aneuploidy or other copy number variation, including insertions and deletions. The karyotype from one of the samples had a small deletion in chromosome 11 between q21 and q23; when the sequencing data were analyzed in 500 kb bins, a decrease of about 10% in the relative number of tags was observed in a 25 Mb region starting at q21. In addition, three of the samples in the training set had complex sex karyotypes owing to mosaicism in the cytogenetic analysis. These karyotypes were: i) 47,XXX[9]/45,X[6], ii) 45,X[3]/46,XY[17], and iii) 47,XXX[13]/45,X[7]. Sample ii, which showed some XY-containing cells, was correctly classified as XY. Samples i (from a CVS procedure) and iii (from amniocentesis), which both showed a mixture of XXX and X cells by cytogenetic analysis (consistent with mosaic Turner syndrome), were classified as no-call and monosomy X, respectively.
When testing the algorithm, another interesting data point was observed: for chromosome 21 of one sample from the test set (Figure 43), the NCV was between -5 and -6. Although the sample was diploid for chromosome 21 by cytogenetics, the karyotype showed mosaicism with partial triploidy for chromosome 9: 47,XX+9[9]/46,XX[6]. Because chromosome 9 is used in the denominator to determine the chromosome dose for chromosome 21 (Table 18), this lowered the overall NCV value. The results provided in Example 13 below confirm the ability to determine fetal trisomy 9 in this sample using normalizing chromosomes.
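The direction of this effect follows directly from the ratio form of the chromosome dose. A minimal sketch with hypothetical tag counts (the numbers are invented for illustration and are not taken from the study):

```python
def chromosome_dose(tags_interest: float, tags_normalizing: float) -> float:
    """Chromosome dose: tags on the chromosome of interest divided by tags on the normalizing sequence."""
    return tags_interest / tags_normalizing

# Hypothetical counts: chromosome 21 in the numerator, chromosome 9 in the denominator.
euploid_dose = chromosome_dose(100_000, 320_000)

# A mosaic trisomy 9 adds chromosome 9 tags (here an assumed +3%) while chromosome 21 is unchanged,
# so the chromosome 21 dose drops and its NCV becomes negative.
mosaic_t9_dose = chromosome_dose(100_000, 320_000 * 1.03)

print(euploid_dose, mosaic_t9_dose)
```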
The conclusion of Fan et al. about the sensitivity of these methods is correct only if the algorithm used can account for any random or systematic bias introduced by the sequencing method. If the sequencing data are not properly normalized, the resulting analysis will be inferior to counting statistics. Chiu et al. noted in their recent paper that their measurements of chromosomes 18 and 13 obtained by massively parallel sequencing were imprecise, and concluded that more research is needed before the method can be applied to the determination of T18 and T13 (Chiu et al., BMJ 342:c7401 [2011]). The method used in the Chiu et al. paper simply takes the number of sequence tags on the chromosome of interest (chromosome 21 in their case) and normalizes it by the total number of tags in the sequencing run. The challenge with this approach is that the distribution of tags across the chromosomes can differ from sequencing run to sequencing run, which increases the overall variation of the aneuploidy measurement. To compare the results of the Chiu algorithm with the chromosome doses used in this example, the test data for chromosomes 21 and 18 were reanalyzed using the method recommended by Chiu et al., as shown in Figure 44. Overall, a compression of the NCV range was observed for each of chromosomes 21 and 18, together with a decrease in the determination rate: using an NCV threshold of 4.0 for aneuploidy classification, 10/13 T21 samples and 5/8 T18 samples from our test set were correctly identified.
Ehrich et al. also focused only on T21 and used the same algorithm as Chiu et al. (Ehrich et al., Am J Obstet Gynecol 204:205e1-e11 [2011]). In addition, after observing a shift of their test set z-score metric relative to the external reference data (i.e. the training set), they retrained on the test set to establish the classification boundaries. Although this approach is feasible in principle, in practice it would be challenging to determine how many samples are required for training and how often retraining is needed to ensure that the classifications remain correct. One way to mitigate this problem is to include controls in every sequencing run that measure the baseline and calibrate for quantitative behavior.
The data obtained using this method show that massively parallel sequencing can determine multiple fetal chromosomal abnormalities from the plasma of pregnant women when the algorithm used to normalize the chromosome counting data is optimized. This quantification method not only minimizes random and systematic variation between sequencing runs but also allows aneuploidies across the whole genome to be classified, most notably T21 and T18. A larger sample collection is required to test the algorithm for T13 determination. For this purpose, a prospective, blinded, multi-site clinical study is being conducted to further demonstrate the diagnostic accuracy of this method.
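The difference between the two normalizations can be sketched with a toy example. The counts below are hypothetical, and the example assumes the idealized case in which a run-specific bias shifts the chromosome of interest and its normalizing chromosome by the same factor; real data would only approximate this.

```python
import math

def fraction_of_total(counts: dict, chrom: str) -> float:
    """Total-count normalization: tags on one chromosome divided by all tags in the run."""
    return counts[chrom] / sum(counts.values())

def dose(counts: dict, chrom: str, normalizers: list) -> float:
    """Ratio normalization: tags on one chromosome divided by tags on the normalizing chromosome(s)."""
    return counts[chrom] / sum(counts[c] for c in normalizers)

# Two hypothetical runs of the same euploid sample.
run_a = {"chr21": 100, "chr9": 300, "other": 9600}
# In run_b an assumed run-specific bias inflates chr21 and chr9 together by 10%.
run_b = {"chr21": 110, "chr9": 330, "other": 9600}

print(fraction_of_total(run_a, "chr21"), fraction_of_total(run_b, "chr21"))
print(dose(run_a, "chr21", ["chr9"]), dose(run_b, "chr21", ["chr9"]))
```

In this toy case the fraction of total tags shifts between runs while the ratio to a co-varying chromosome stays the same, which is the kind of run-to-run variation the passage describes.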
Example 13
Determination of the presence or absence of at least five different chromosomal aneuploidies in all chromosomes of single test samples
To demonstrate the ability of this method to determine the presence or absence of any chromosomal aneuploidy in each of a set of maternal test samples (test set 1; Example 12), systematically determined normalizing chromosome sequences were identified in the unaffected training set samples (training set 1; Example 12) and were used to calculate chromosome doses for all chromosomes of each test sample. The presence or absence of any one or more different complete fetal chromosomal aneuploidies in each test and training set sample was determined from sequencing information obtained from a single sequencing run performed on each individual sample.
Using the chromosome densities, i.e. the number of sequence tags identified for each chromosome in the samples of each set described in Example 12, a systematically determined normalizing chromosome sequence, consisting of a single chromosome or a group of chromosomes, was determined for each of chromosomes 1-22, X and Y by calculating a single chromosome dose. The systematically determined normalizing chromosome sequence for each of chromosomes 1-22, X and Y was identified by systematically calculating the chromosome dose for each chromosome using every possible combination of chromosomes as the denominator. For example, with chromosome 21 as the chromosome of interest, chromosome doses were calculated as the ratio of (i) the number of sequence tags obtained for chromosome 21 (the chromosome of interest) to (ii) the number of sequence tags obtained for each remaining chromosome individually and the sum of the numbers of tags obtained for every possible combination of the remaining chromosomes (excluding chromosome 21), i.e.: 1, 2, 3, 4, 5, etc. up to 20, 22, X and Y; 1+2, 1+3, 1+4, 1+5, etc. up to 1+20, 1+22, 1+X and 1+Y; 1+2+3, 1+2+4, 1+2+5, etc. up to 1+2+20, 1+2+22, 1+2+X and 1+2+Y; 1+3+4, 1+3+5, 1+3+6, etc. up to 1+3+20, 1+3+22, 1+3+X and 1+3+Y; 1+2+3+4, 1+2+3+5, 1+2+3+6, etc. up to 1+2+3+20, 1+2+3+22, 1+2+3+X and 1+2+3+Y; and so on, so that every possible combination of chromosomes 1-20, 22, X and Y was used as a normalizing chromosome sequence (denominator) to determine all possible chromosome doses for each chromosome of interest in each of the qualified (unaffected) samples in the training set. Chromosome doses were determined in the same way for chromosome 21 in all training set samples, and the systematically determined normalizing chromosome sequence for chromosome 21 was identified as the single chromosome or group of chromosomes that yielded the chromosome dose for chromosome 21 with the smallest variability across all training samples. The same analysis was repeated to determine the single chromosome or group of chromosomes serving as the systematically determined normalizing chromosome sequence for each of the remaining chromosomes (including chromosomes 13, 18, X and Y), i.e. every possible combination of chromosomes was used to determine the normalizing sequence (a single chromosome or a group of chromosomes) for each of the other chromosomes of interest 1-12, 14-17, 19-20, 22, X and Y in all training samples. Thus, every chromosome was treated as a chromosome of interest, and a systematically determined normalizing sequence was determined for each chromosome in each unaffected sample in the training set. Table 21 lists the single chromosome or group of chromosomes identified as the systematically determined normalizing sequence for each chromosome of interest 1-22, X and Y. As highlighted in Table 21, for some chromosomes of interest the systematically determined normalizing chromosome sequence is a single chromosome (e.g. when chromosome 4 is the chromosome of interest), whereas for others it is a group of chromosomes (e.g. when chromosome 21 is the chromosome of interest).
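The exhaustive search described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the study: variability is measured here as the coefficient of variation of the dose across unaffected samples, the function name and the optional `max_group_size` cap are hypothetical, and the toy tag counts are invented so that one group gives a constant dose.

```python
from itertools import combinations
from statistics import mean, stdev

def best_normalizer(samples, interest, max_group_size=None):
    """Try every single chromosome and every group of chromosomes (excluding the
    chromosome of interest) as the denominator, and return the group whose dose
    has the lowest coefficient of variation across the unaffected samples."""
    candidates = [c for c in samples[0] if c != interest]
    max_k = max_group_size or len(candidates)
    best_group, best_cv = None, None
    for k in range(1, max_k + 1):
        for group in combinations(candidates, k):
            doses = [s[interest] / sum(s[c] for c in group) for s in samples]
            cv = stdev(doses) / mean(doses)
            if best_cv is None or cv < best_cv:
                best_group, best_cv = group, cv
    return best_group, best_cv

# Toy unaffected samples (hypothetical tag counts, four chromosomes only).
training = [
    {"chr21": 10, "chr1": 30, "chr2": 70, "chr3": 50},
    {"chr21": 12, "chr1": 50, "chr2": 70, "chr3": 55},
    {"chr21": 11, "chr1": 40, "chr2": 70, "chr3": 70},
]
group, cv = best_normalizer(training, "chr21")
print(group, cv)
```

With 23 candidate chromosomes there are 2^23 - 1 possible groups per chromosome of interest, so a pure-Python loop like this would be slow on real data; the optional size cap is one way to bound the search in a sketch.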
Table 21. Systematically determined normalizing chromosome sequences for all chromosomes
Table 22 provides the mean, standard deviation (SD) and coefficient of variation (CV) for the systematically determined normalizing chromosome sequence determined for each of the chromosomes.
Table 22. Mean, standard deviation (SD) and coefficient of variation (CV) for the systematically determined normalizing chromosome sequences
Chromosome of interest Mean SD CV
1 0.36637 0.00266 0.72%
2 0.31580 0.00068 0.22%
3 0.21983 0.00055 0.18%
4 0.98191 0.02509 2.56%
5 0.30109 0.00076 0.25%
6 0.21621 0.00059 0.27%
7 0.21214 0.00044 0.21%
8 0.25562 0.00068 0.27%
9 0.12726 0.00034 0.27%
10 0.24471 0.00098 0.40%
11 0.26907 0.00098 0.36%
12 0.12358 0.00029 0.23%
13a 0.26023 0.00122 0.47%
14 0.09286 0.00028 0.30%
15 0.21568 0.00147 0.68%
16 0.25181 0.00134 0.53%
17 0.46000 0.00248 0.54%
18a 0.10100 0.00038 0.38%
19 1.43709 0.02899 2.02%
20 0.19967 0.00123 0.62%
21a 0.07851 0.00053 0.67%
22 0.69613 0.01391 2.00%
Xb 0.46865 0.00279 0.68%
Yb 0.00028 0.00004 14.97%
a Trisomies excluded
b Female fetuses
The variation in chromosome doses across all training samples (as reflected by the CV values) confirms the usefulness of systematically determined normalizing chromosome sequences for providing a large signal-to-noise ratio and dynamic range, allowing aneuploidies to be determined with high sensitivity and high specificity, as shown below.
To demonstrate the sensitivity and specificity of the method, chromosome doses for all chromosomes of interest 1-22, X and Y were determined in each sample in the training set and in each of the samples in the test set described in Example 11, using the corresponding systematically determined normalizing chromosome sequences provided in Table 21 above.
Using the systematically determined normalizing chromosome sequence for each chromosome of interest, the presence or absence of any fetal aneuploidy was determined in each training set sample and in each test sample, i.e. it was determined whether each sample contained a complete fetal chromosomal aneuploidy of chromosome 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X or Y. Sequence information, i.e. the number of sequence tags, was obtained for all chromosomes in each training set sample and each test sample, and a single chromosome dose was calculated as described above for each chromosome in each training and test sample using the number of sequence tags obtained for the corresponding systematically determined normalizing chromosome sequence (Table 21). The number of sequence tags obtained for the systematically determined normalizing chromosome sequence in each training sample was used to determine the chromosome dose of each chromosome in that training sample, and the number of sequence tags obtained for the systematically determined normalizing chromosome sequence in each test sample was used to determine the chromosome dose of each chromosome in that test sample. To ensure safe and effective classification of aneuploidies, the same conservative boundaries were chosen as described in Example 12.
Training set results
Figure 45 provides a plot of the chromosome doses for chromosomes 21, 18 and 13 in the training set samples, determined using the systematically determined normalizing chromosome sequences. When the systematically determined normalizing chromosome sequence (i.e. the group of chromosomes 4+14+16+20+22) was used, the 8 samples with clinical karyotypes indicating T21 had NCVs between 5.4 and 21.5. When the systematically determined normalizing chromosome sequence (i.e. the group of chromosomes 2+3+5+7) was used, the 4 samples with clinical karyotypes indicating T18 had NCVs between 3.3 and 15.3. The T21 samples of the training set are shown as the last 8 samples of the chromosome 21 data (O); the T18 samples of the training set are shown as the last 4 samples of the chromosome 18 data (Δ); and the T13 samples of the training set are shown as the last 2 samples of the chromosome 13 data ().
These data show that different complete fetal chromosomal aneuploidies can be determined and correctly classified with high confidence using normalizing chromosome sequences. Because all samples with affected karyotypes had NCVs greater than 3, there is only about a 0.1% probability that these samples are part of the unaffected distribution.
Similar to the autosomes, all female and male fetuses in the training set were correctly identified when the systematically determined normalizing chromosome sequence (i.e. the group of chromosomes 4+8) was used for chromosome X and the systematically determined normalizing chromosome sequence (i.e. the group of chromosomes 4+6) was used for chromosome Y. In addition, all 5 monosomy X samples were identified. Figure 46A shows a plot of the NCV determined for the X chromosome (X-axis) against the NCV determined for the Y chromosome (Y-axis) for each sample in the training set. All samples with a monosomy X karyotype had NCV values below -4.83. Those monosomy X samples with karyotypes consistent with a 45,X karyotype (complete or mosaic) had Y NCV values close to zero, as expected. The female samples clustered near NCV = 0 for both X and Y.
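The "about 0.1%" figure corresponds to the upper tail of a normal distribution beyond 3 standard deviations. A short check, assuming unaffected NCVs follow a standard normal distribution (the function name is illustrative):

```python
import math

def upper_tail(z: float) -> float:
    """Probability that a standard normal variable exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Chance that an unaffected sample would show an NCV above 3 under the normality assumption.
p_above_3 = upper_tail(3.0)
print(f"{p_above_3:.5f}")
```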
Test set results
Figure 47 provides a plot of the chromosome doses for chromosomes 21, 18 and 13 in the test samples, determined using the relevant systematically determined normalizing chromosome sequences. When the systematically determined normalizing chromosome sequence (i.e. the group of chromosomes 4+14+16+20+22) was used, 13 of the 13 samples with clinical karyotypes indicating T21 were correctly identified, with NCVs between 7.2 and 16.3. When the systematically determined normalizing chromosome sequence (i.e. the group of chromosomes 2+3+5+7) was used, all 8 samples with clinical karyotypes indicating T18 were identified, with NCVs between 12.7 and 30.7. The T21 samples of the test set are shown as the last 13 samples of the chromosome 21 data (O); the T18 samples of the test set are shown as the last 8 samples of the chromosome 18 data (Δ); and the T13 sample of the test set is shown as the last sample of the chromosome 13 data ().
These data show that different complete fetal chromosomal aneuploidies can be determined and correctly classified with high confidence using systematically determined normalizing chromosome sequences. As in the training set, all samples with affected karyotypes had NCVs greater than 7, indicating an extremely small probability that these samples are part of the unaffected distribution (Figure 47).
Similar to the autosomes, all female and male fetuses in the test set were correctly identified when the systematically determined normalizing chromosome sequence (i.e. the group of chromosomes 4+8) was used for chromosome X and the systematically determined normalizing chromosome sequence (i.e. the group of chromosomes 4+6) was used for chromosome Y. In addition, all 3 monosomy X samples were identified. Figure 46B shows a plot of the NCV determined for the X chromosome (X-axis) against the NCV determined for the Y chromosome (Y-axis) for each sample in the test set.
As described above, this method allows the presence or absence of a complete or partial chromosomal aneuploidy of each of chromosomes 1-22, X and Y to be determined in each sample. In addition to determining the complete chromosomal aneuploidies T13, T18, T21 and monosomy X, the method also determined the presence of trisomy 9 in one of the test samples. When the systematically determined normalizing chromosome sequence (i.e. the group of chromosomes 3+4+8+10+17+19+20+22) was used for chromosome 9 as the chromosome of interest, one sample with an NCV of 14.4 was identified (Figure 48). This sample corresponds to the test sample in Example 12 that was suspected of being aneuploid for chromosome 9 on the basis of an aberrantly low dose for chromosome 21 (in Example 12, chromosome 9 was used as the normalizing chromosome sequence).
The data show that 100% of the samples with clinical karyotypes indicating T21, T13, T18, T9 and monosomy X were correctly identified. Figure 49 shows a plot of the NCVs for each of chromosomes 1-22 in each of the 47 test samples. The median NCV was normalized to zero. The data show that the method of the invention (including the use of systematically determined normalizing chromosome sequences) determined the presence of all 5 types of chromosomal aneuploidy present in this test set with 100% sensitivity and 100% specificity, and clearly indicate that the method can identify any chromosomal aneuploidy of any of chromosomes 1-22, X and Y in any sample.
Example 14
Determination of the presence or absence of a partial fetal chromosomal aneuploidy: determination of cat eye syndrome
DiGeorge syndrome (22q11.2 deletion syndrome), a disorder caused by a defect in chromosome 22, results in poor development of several body systems. Medical problems commonly associated with DiGeorge syndrome include heart defects, poor immune system function, cleft palate, parathyroid disorders and behavioral disorders. The number and severity of the problems associated with DiGeorge syndrome vary greatly. Almost everyone with DiGeorge syndrome needs treatment from specialists in several fields.
To determine the presence or absence of a partial deletion of fetal chromosome 22, a blood sample is obtained from the mother by venipuncture, and cfDNA is prepared as described in the examples above. The purified cfDNA is ligated to adaptors and subjected to cluster amplification using the Illumina cBot cluster station. Massively parallel sequencing is performed using reversible dye terminators to generate millions of 36 bp reads. The sequence reads are aligned to the human hg19 reference genome, and the reads that map uniquely to the reference genome are counted as tags.
A set of qualified samples, all known to be diploid for chromosome 22 (i.e. chromosome 22 and any portion of it are known to be present only in the diploid state), is first sequenced and analyzed to obtain sequence tag counts for each of 1000 segments of 3 megabases (Mb) (excluding the 22q11.2 region). Given that the human genome comprises about 3 billion bases (3 Gb), 1000 segments of 3 Mb each make up approximately the remainder of the genome. Each of the 1000 segments can serve, individually or as part of a group of segment sequences, as the normalizing segment sequence for the segment of interest, i.e. the 3 Mb region of 22q11.2. The number of sequence tags mapped to each individual 3 Mb segment is used individually to calculate a segment dose for the 3 Mb region of 22q11.2. In addition, all possible combinations of two or more segments are used to determine segment doses for the segment of interest in all qualified samples. The single 3 Mb segment or combination of two or more 3 Mb segments that yields the segment dose with the lowest variability across samples is selected as the normalizing segment sequence.
The number of sequence tags mapped to the segment of interest in each qualified sample is used to determine the segment dose in each qualified sample. The mean and standard deviation of the segment doses across all qualified samples are calculated and used to set the thresholds against which the segment dose determined in a test sample can be compared. Preferably, normalized segment values (NSVs) are calculated for all segments of interest in all qualified samples, and the thresholds are set using these values.
Then, the number of tags mapped to the corresponding normalizing segment sequence in the test sample is used to determine the dose of the segment of interest in the test sample. A normalized segment value (NSV) is calculated for the segment in the test sample as described earlier, and the NSV of the segment of interest in the test sample is compared with the thresholds determined using the qualified samples to determine the presence or absence of a 22q11.2 deletion in the test sample.
A test NSV < -3 indicates a loss in the segment of interest, i.e. the presence of a partial deletion of chromosome 22 (22q11.2) in the test sample.
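The comparison against the threshold of -3 can be sketched as below. The normalized segment value is assumed here to be a z-score of the test segment dose against the qualified samples, by analogy with the NCV; the dose values and function names are hypothetical.

```python
from statistics import mean, stdev

def normalized_segment_value(test_dose: float, qualified_doses: list) -> float:
    """Assumed z-score form: (test dose - mean of qualified doses) / SD of qualified doses."""
    return (test_dose - mean(qualified_doses)) / stdev(qualified_doses)

def has_segment_loss(value: float, threshold: float = -3.0) -> bool:
    """A value below the threshold is read as a loss in the segment of interest."""
    return value < threshold

# Hypothetical segment doses for the 22q11.2 region in qualified (diploid) samples.
qualified = [0.100, 0.101, 0.099, 0.100, 0.102, 0.098]

deleted = normalized_segment_value(0.094, qualified)    # markedly reduced dose
normal = normalized_segment_value(0.0995, qualified)    # dose close to the qualified mean
print(round(deleted, 2), has_segment_loss(deleted), has_segment_loss(normal))
```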
Example 15
Stool DNA testing to predict outcome in stage II colorectal cancer patients
About 30% of all stage II colorectal cancer patients will relapse and die of their disease. Stage II colorectal cancer patients with disease recurrence show significantly more losses on chromosomes 4, 5, 15q, 17q and 18q. In particular, loss of 4q22.1-4q35.2 in stage II colorectal cancer patients has been shown to be associated with worse outcome. Determining the presence or absence of these genomic alterations can help select patients for adjuvant therapy (Brosens et al., Analytical Cellular Pathology/Cellular Oncology 33:95–104 [2010]).
To determine the presence or absence of one or more chromosomal deletions in the 4q22.1 to 4q35.2 region in stage II colorectal cancer patients, stool and/or plasma samples are obtained from the patient or patients. Stool DNA is prepared according to the method described by Chen et al., J Natl Cancer Inst 97:1124-1132 [2005], and plasma DNA is prepared according to the method described in the examples above. The DNA is sequenced according to the NGS methods described herein, and the sequence information from the patient sample or samples is used to calculate segment doses for one or more segments spanning the 4q22.1 to 4q35.2 region. The segment doses are determined using the normalizing segment sequences previously determined in a set of qualified stool and/or plasma samples, respectively. The segment doses in the test sample (patient sample) are calculated, and the presence or absence of one or more partial chromosomal deletions in the 4q22.1 to 4q35.2 region is determined by comparing each segment of interest with the thresholds set from the NSVs in the set of qualified samples.
Example 16
Genome-wide fetal aneuploidy detection by sequencing maternal plasma DNA: diagnostic accuracy in a prospective, blinded, multicenter study
The method for determining the presence or absence of aneuploidy in maternal test samples was used in a prospective study, and its diagnostic accuracy was shown as described below. The prospective study further demonstrates the efficacy of the method for detecting fetal aneuploidies across the genome. The blinded study simulated an actual population of pregnant women in which the fetal karyotype was unknown, and all samples with any abnormal karyotype were selected for sequencing. The classifications made according to the method were compared with the fetal karyotypes obtained from invasive procedures to determine the diagnostic performance of the method for multiple chromosomal aneuploidies.
Summary of this example
In the prospective blinded study, blood samples were collected at 60 U.S. sites from 2,882 women undergoing prenatal diagnostic procedures (clinicaltrials.gov NCT01122524).
An independent biostatistician selected all singleton pregnancies with any abnormal karyotype and a balanced number of randomly selected pregnancies with euploid karyotypes. Chromosome classification was performed on each sample according to the method of the invention and compared with the fetal karyotype.
In an analysis cohort of 532 samples, 89/89 cases of trisomy 21 (sensitivity 100%, (95% CI 95.9-100)), 35/36 cases of trisomy 18 (sensitivity 97.2%, (95% CI 85.5-99.9)), 11/14 cases of trisomy 13 (sensitivity 78.6%, (95% CI 49.2-99.9)), 232/233 females (sensitivity 99.6%, (95% CI 97.6->99.9)), 184/184 males (sensitivity 100%, (95% CI 98.0-100)) and 15/16 cases of monosomy X (sensitivity 93.8%, (95% CI 69.8-99.8)) were classified. Among unaffected subjects, there were no false positives for autosomal aneuploidies (100% specificity, (95% CI >98.5-100)). In addition, fetuses with mosaicism for trisomy 21 (3/3), trisomy 18 (1/1) and monosomy X (2/7), three translocation trisomies, two other autosomal trisomies (20 and 16) and other sex chromosome aneuploidies (XXX, XXY and XYY) were correctly classified.
These results further demonstrate the efficacy of this method for detecting fetal aneuploidies across the genome using maternal plasma DNA. The high sensitivity and specificity for the detection of trisomies 21, 18 and 13 and monosomy X suggest that the method can be incorporated into existing aneuploidy screening algorithms to reduce unnecessary invasive procedures.
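The sensitivities above are simple proportions, and intervals of this kind can be reproduced with an exact binomial calculation. The sketch below assumes an exact (Clopper-Pearson) 95% interval; the text does not name the interval method, so this is an assumption, although the 89/89 and 35/36 bounds it yields are consistent with the reported values. Function names are illustrative.

```python
from math import comb

def binom_sum(n: int, p: float, ks) -> float:
    """Sum of binomial probabilities P(X = k) over the given k values."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in ks)

def bisect(f, target: float, increasing: bool) -> float:
    """Solve f(p) = target on (0, 1) for a monotone f by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x: int, n: int, alpha: float = 0.05):
    """Exact two-sided confidence interval for a proportion x/n (assumed method)."""
    lower = 0.0 if x == 0 else bisect(lambda p: binom_sum(n, p, range(x, n + 1)), alpha / 2, True)
    upper = 1.0 if x == n else bisect(lambda p: binom_sum(n, p, range(0, x + 1)), alpha / 2, False)
    return lower, upper

sens_t18 = 35 / 36
ci_t21 = exact_ci(89, 89)   # trisomy 21: 89 of 89 detected
ci_t18 = exact_ci(35, 36)   # trisomy 18: 35 of 36 detected
print(round(sens_t18, 3), ci_t21, ci_t18)
```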
Material and method
Carry out MELISSA (maternal blood is the source of diagnosing fetal aneuploidy exactly) research as it is perspective more in Heart observational study, with blind nido case:Check analysis.The invasive antenatal program of experience is enlisted to determine 18 years old of fetal karyotype With the pregnant woman (Clinicaltrials.gov NCT01122524) of more than 18 years old.Qualified criterion includes gestation at 0 day and 22 8 weeks Pregnant woman between 0 day week, it meets at least one in following additional criteria:Age >=38 year old;Positive examination test result (blood Clear assay value and/or nuchal translucency (NT) measured value);Increase related ultrasonic tags in the presence of to fetus aneuploidy risk Thing;Or previously nourish aneuploid fetus.Written consent book is obtained from all women for agreeing to participate in.
According to the Institutional Review Board (IRB) of each mechanism batch at 60 medical centres being geographically spread out in 25 states Accurate scheme is registered.Engage two clinical research tissues (CRO) (elder brother Thailand (Quintiles), De Han, the North Carolina state; With An Pusen (Emphusion), San Francisco, California) come keep research be it is blind and provide clinical data management, Data monitoring, biometrics and data analysis service.
Before any invasive program, peripheral veins blood sample (17mL) is collected in two acid citrate dextroses (ACD) in pipe (must Supreme Being), remove and identify and be marked with unique numbering of studying.Position researcher will study numbering, number According to this and the blood drawing time is input in safe electronic medical recordses account (eCRF).Whole blood sample is in the container of controlled temperature From multiple website shipped overnights to laboratory (Wei Ruinatai health company (Verinata Health, Inc.), California State).After receiving and carrying out sample survey, according to previously described method (referring to example 13) prepare cell-free plasma and Untill freezer storage is at -80 DEG C when sequencing in 2 to 4 aliquots.Recording laboratory carries out the day of sample reception If phase and time sample are that to receive, touch up all through the night be cool and comprising at least 7mL blood, then determine that it is adapted to point Analysis.Weekly by sampling report qualified when receiving to CRO and for the selection (see below and Figure 50) of stochastical sampling list. Clinical data derived from the current gestation of women and fetal karyotype is input in eCRF by website researcher and carried out by CRO Checking.
Sample size determination was based on the precision of the estimates for a targeted range of performance characteristics (sensitivity and specificity) of the index test. Specifically, the numbers of affected cases (T21, T18, T13, male, female, or monosomy X) and unaffected controls (non-T21, non-T18, non-T13, non-male, non-female, or non-monosomy X) were determined so that sensitivity and specificity, respectively, could be estimated within a prespecified small margin of error based on the normal approximation (N = (1.96 × √(p(1-p)) / margin of error)², where p = the estimate of sensitivity or specificity). Assuming a true sensitivity of 95% or greater, a sample size between 73 and 114 ensured that the precision of the sensitivity estimate would be such that the lower bound of the 95% confidence interval (CI) would be 90% or greater (margin of error ≤5%). For smaller sample sizes, a larger estimated margin of error for the 95% CI of sensitivity was projected (from 6% to 13.5%). To estimate specificity with greater precision, a larger number of unaffected controls was planned at the sampling stage (a ratio of about 4:1 to cases). This ensured a precision of the specificity estimate of at least 3%. Accordingly, as sensitivity and/or specificity increase, the precision of the confidence interval also increases.
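By way of illustration only, the normal-approximation sample size formula quoted above can be evaluated as in the following minimal sketch. The function name `sample_size` and its parameter names are hypothetical and are not part of the study protocol.

```python
import math

def sample_size(p, margin, z=1.96):
    """Normal-approximation sample size: N = (z * sqrt(p * (1 - p)) / margin) ** 2.

    p      -- anticipated sensitivity or specificity (a proportion)
    margin -- desired margin of error (a proportion)
    z      -- normal quantile for the confidence level (1.96 for 95%)
    """
    return (z * math.sqrt(p * (1.0 - p)) / margin) ** 2

# Anticipated sensitivity of 95% with margins of error of 5% and 4%.
print(round(sample_size(0.95, 0.05)))
print(round(sample_size(0.95, 0.04)))
```

Under these assumptions the sketch gives roughly 73 samples for a 5% margin of error and roughly 114 for a 4% margin, which is in line with the range stated above.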
Based on the sample size determination, the CRO designed a random sampling scheme to generate lists of selected samples to be sequenced (a minimum of 110 cases affected by T21, T18, or T13 and 400 cases unaffected for trisomy, allowing up to half of the latter to have a karyotype other than 46,XX or 46,XY). Subjects with a singleton pregnancy and an eligible blood sample were eligible for selection. Subjects with ineligible samples, no recorded karyotype, or a multiple gestation were excluded (Figure 50). Lists were generated periodically throughout the study and sent to the Verinata Health laboratory.
Each eligible blood sample was analyzed for six independent categories: the aneuploidy status of chromosomes 21, 18, and 13, and the sex status of male, female, and monosomy X. While still blinded, one of three classifications (affected, unaffected, or unclassified) was generated prospectively for each of the six independent categories for each plasma DNA sample. Under this scheme, the same sample could be classified as affected in one analysis (e.g., aneuploid for chromosome 21) and unaffected in another analysis (e.g., euploid for chromosome 18).
Conventional metaphase cytogenetic analysis of cells obtained by chorionic villus sampling (CVS) or amniocentesis was used as the reference standard in this study. Fetal karyotyping was performed in the diagnostic laboratories routinely used by the participating sites. If a patient underwent both CVS and amniocentesis after enrollment, the karyotype from the amniocentesis was used for the study analysis. If a metaphase karyotype was not available, fluorescence in situ hybridization (FISH) results targeting chromosomes 21, 18, 13, X, and Y were permitted (Table 24). All abnormal karyotype reports (i.e., other than 46,XX and 46,XY) were reviewed by a board-certified cytogeneticist and classified as affected or unaffected with respect to chromosomes 21, 18, and 13 and the sex status XX, XY, and monosomy X.
Prespecified protocol conventions provided that the following abnormal karyotypes would be assigned a status of 'censored' for karyotype by the cytogeneticist: triploidy, tetraploidy, complex karyotypes other than trisomy (e.g., mosaicism) that involved chromosomes 21, 18, or 13, mosaics with mixed sex chromosomes, sex chromosome aneuploidy, or karyotypes that could not be fully interpreted from the source document (e.g., marker chromosomes of unknown origin). Because the cytogenetic diagnosis was not known to the sequencing laboratory, all cytogenetically censored samples were independently analyzed and assigned a classification determined using sequencing information according to the method of the invention (sequencing classification), but were not included in the statistical analysis. Censored status pertained only to the relevant one or more of the six analyses (e.g., a mosaic T18 would be censored from the chromosome 18 analysis, but considered 'unaffected' in the other analyses, such as chromosomes 21, 13, X, and Y) (Table 25). Other abnormal and rare complex karyotypes that could not be fully anticipated at the time of protocol design were not censored from analysis (Table 26).
Contained data are only limitted to privileged user (research website, CRO and the clinical people of signing in eCRF and clinical data storehouse Member).Any employee of Wei Ruinatai health can not be accessed untill when making known.
After the random sample list was received from the CRO, total cell-free DNA (a mixture of maternal and fetal DNA) was extracted from the thawed selected plasma samples as described in Example 13. Sequencing libraries were prepared using the Illumina TruSeq kit v2.5. Sequencing was carried out (6-plex, i.e., 6 samples/lane) on Illumina HiSeq 2000 instruments in the Verinata Health laboratory. Single-end reads of 36 base pairs were obtained. The reads were mapped across the genome, and the sequence tags on each chromosome of interest were counted and used to classify the samples for the independent categories as described above.
The clinical protocol required evidence of the presence of fetal DNA in order to report a classification result. A classification of male or aneuploid was considered sufficient evidence of fetal DNA. In addition, each sample was also tested for the presence of fetal DNA using two allele-specific methods. In the first method, the AmpflSTR Minifiler kit (Life Technologies, San Diego, California) was used to interrogate the presence of a fetal component in the cell-free DNA. Electrophoresis of short tandem repeat (STR) amplicons was carried out on the ABI 3130 Genetic Analyzer following the manufacturer's protocol. All nine STR loci in this kit were analyzed by comparing the intensity of each peak, reported as a percentage of the sum of the intensities of all peaks, and the presence of minor peaks was used to provide evidence of fetal DNA. In cases in which no minor STR could be identified, an aliquot of the sample was examined with a panel of 15 single nucleotide polymorphisms (SNPs) with an average heterozygosity ≥0.4, selected from the panel of Kidd et al. (Kidd et al., Forensic Sci Int 164(1):20-32 [2006]). Allele-specific methods that can be used to detect and/or quantify fetal DNA in maternal samples are described in U.S. Patent Publications 20120010085, 20110224087, and 20110201507, which are incorporated herein by reference.
Normalized chromosome values (NCVs) were determined by calculating all possible permutations of denominators for all autosomes and sex chromosomes as described in Example 13; however, because the sequencing in this study was carried out on a different instrument than our previous work, with multiple samples/lane, new normalizing chromosome denominators had to be determined. The normalizing chromosome denominators in the current study were determined based on a training set of 110 independent (i.e., not from MELISSA eligible samples) unaffected samples (i.e., qualified samples) that were sequenced prior to analysis of the study samples. The new normalizing chromosome denominators were determined by calculating all possible permutations of denominators for all autosomes and sex chromosomes, so as to minimize the variation in the unaffected training set for all chromosomes across the genome (Table 23).
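The systematic search for a normalizing chromosome denominator can be pictured with the following minimal sketch. It assumes that a chromosome dose is the tag count of the chromosome of interest divided by the summed tag counts of the candidate denominator chromosomes, and that "variation" is measured as the coefficient of variation (CV) of that dose across the unaffected training samples. The function names, the toy tag counts, and the exhaustive search over subsets are illustrative assumptions and are not the procedure of Example 13 itself.

```python
from itertools import combinations
from statistics import mean, stdev

def dose_cv(counts, target, denom):
    """CV across samples of dose = count(target) / sum(counts of denominator chromosomes)."""
    doses = [s[target] / sum(s[c] for c in denom) for s in counts]
    return stdev(doses) / mean(doses)

def best_denominator(counts, target, max_size=None):
    """Try every subset of the other chromosomes and keep the one with the lowest CV."""
    candidates = sorted(c for c in counts[0] if c != target)
    max_size = max_size or len(candidates)
    best = None
    for k in range(1, max_size + 1):
        for denom in combinations(candidates, k):
            cv = dose_cv(counts, target, denom)
            if best is None or cv < best[1]:
                best = (denom, cv)
    return best

# Toy tag counts for three hypothetical unaffected training samples.
training = [
    {"chr21": 100, "chr4": 1000, "chr16": 510, "chr22": 300},
    {"chr21": 120, "chr4": 1200, "chr16": 560, "chr22": 390},
    {"chr21": 90, "chr4": 900, "chr16": 480, "chr22": 250},
]
print(best_denominator(training, "chr21"))
```

In the toy data, the chromosome 21 count tracks the chromosome 4 count exactly, so the search selects chromosome 4 alone. With real data the number of subsets grows quickly with the number of chromosomes, which is why the sketch exposes a hypothetical `max_size` cap.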
The NCV rules that were applied to provide the autosome classification of each test sample were those described in Example 12: for classification of aneuploidies of autosomes, an NCV > 4.0 was required to classify the chromosome as affected (i.e., aneuploid for that chromosome), and an NCV < 2.5 was required to classify the chromosome as unaffected. Samples with autosomes that had an NCV between 2.5 and 4.0 were called "unclassified".
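The autosome rule stated above can be expressed as a short sketch. The function name is hypothetical, and the treatment of values exactly equal to 2.5 or 4.0 as "unclassified" is an assumption, since the text does not address the boundaries.

```python
def classify_autosome(ncv, affected_above=4.0, unaffected_below=2.5):
    """Three-way call for one autosome from its normalized chromosome value (NCV)."""
    if ncv > affected_above:
        return "affected"
    if ncv < unaffected_below:
        return "unaffected"
    # Values in the zone between the two thresholds (boundaries included here by assumption).
    return "unclassified"

for value in (0.3, 3.6, 12.1):
    print(value, classify_autosome(value))
```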
Sex chromosome classification in the present test was performed by sequential application of the NCVs for both X and Y, as follows:
1. If NCV X < -4.0 and NCV Y < 2.5, then the sample is classified as monosomy X.
2. If NCV X > -2.5 and NCV X < 2.5 and NCV Y < 2.5, then the sample is classified as female (XX).
3. If NCV X > 4.0 and NCV Y < 2.5, then the sample is classified as XXX.
4. If NCV X > -2.5 and NCV X < 2.5 and NCV Y > 33, then the sample is classified as XXY.
5. If NCV X < -4.0 and NCV Y > 4.0, then the sample is classified as male (XY).
6. If condition 5 is met, but NCV Y is approximately 2 times the value expected for the measured NCV X, then the sample is classified as XYY.
7. If the chromosome X and Y NCVs do not meet any of the above criteria, then the sample is classified as unclassified for sex.
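The ordered rules above can be sketched as a single function. The thresholds are taken from the list; everything else is an assumption made for illustration: the function name, the way rule 6 is given an externally supplied expected NCV Y (the text does not say how that expectation is obtained), and the hypothetical tolerance `xyy_tol` used to interpret "approximately 2 times".

```python
def classify_sex(ncv_x, ncv_y, expected_ncv_y=None, xyy_ratio=2.0, xyy_tol=0.25):
    """Apply rules 1 to 7 in order; returns a sex classification string."""
    x_low = ncv_x < -4.0
    x_mid = -2.5 < ncv_x < 2.5
    y_low = ncv_y < 2.5
    if x_low and y_low:                      # rule 1
        return "monosomy X"
    if x_mid and y_low:                      # rule 2
        return "female (XX)"
    if ncv_x > 4.0 and y_low:                # rule 3
        return "XXX"
    if x_mid and ncv_y > 33:                 # rule 4
        return "XXY"
    if x_low and ncv_y > 4.0:                # rule 5, refined by rule 6
        if expected_ncv_y is not None:
            ratio = ncv_y / expected_ncv_y
            # "Approximately 2 times" is read as within a hypothetical relative tolerance.
            if abs(ratio - xyy_ratio) <= xyy_tol * xyy_ratio:
                return "XYY"
        return "male (XY)"
    return "unclassified"                    # rule 7

print(classify_sex(-8.0, 60.0))
print(classify_sex(-8.0, 120.0, expected_ncv_y=60.0))
print(classify_sex(3.0, 1.0))
```

Because the rules are applied in order, a sample that falls in a gap between thresholds (for example, an NCV X of 3.0) reaches rule 7 and is left unclassified rather than being forced into a category.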
Because the laboratory was blinded to the clinical information, the sequencing results were not adjusted for any of the following demographic variables: maternal body mass index, smoking status, presence of diabetes, type of conception (spontaneous or assisted), prior pregnancies, prior aneuploidy, or gestational age. Neither maternal nor paternal samples were used for classification, and the classification according to the present method did not depend on the measurement of specific loci or alleles.
The sequencing results were returned to an independent contract biostatistician prior to unblinding and analysis. Personnel at the study sites, the CROs (including the biostatistician who generated the random sampling lists), and the contract cytogeneticist were blinded to the sequencing results.
Table 23. Systematically determined normalizing chromosome sequences for all chromosomes
The statistical methods were documented in a detailed statistical analysis plan for the study. For each of the six analysis categories, point estimates of sensitivity and specificity were computed together with exact 95% confidence intervals using the Clopper-Pearson method. For all statistical estimates performed, samples with no fetal DNA detected, samples with 'censored' complex karyotypes (as defined by protocol convention), and samples 'unclassified' by the sequencing test were removed.
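For readers unfamiliar with the exact interval named above, the following minimal sketch computes a Clopper-Pearson interval by bisection on the binomial distribution, using only the Python standard library. The function names are hypothetical, and the counts in the usage lines are illustrative values and are not taken from the study tables.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))

def _solve_decreasing(f, target):
    """Bisection for f(p) = target on [0, 1], where f decreases as p increases."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for x successes out of n."""
    # Lower bound solves P(X >= x | p) = alpha / 2; upper bound solves P(X <= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else _solve_decreasing(lambda p: binom_cdf(x - 1, n, p), 1.0 - alpha / 2.0)
    upper = 1.0 if x == n else _solve_decreasing(lambda p: binom_cdf(x, n, p), alpha / 2.0)
    return lower, upper

# Illustrative counts only: every one of 89 affected samples detected.
print(clopper_pearson(89, 89))
```

When every sample is classified correctly, the upper bound is 100% and the lower bound reduces to (alpha/2)^(1/n), which is why a point estimate of 100% is still reported with a lower confidence bound below 100%.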
Results
Between June 2010 and August 2011, 2,882 pregnant women were enrolled in the study. The characteristics of the eligible subjects and of the selected cohort are given in Table 24. Subjects who were enrolled and provided blood, but who were later found during data monitoring to have exceeded the inclusion criterion, with an actual gestational age beyond 22 weeks, 0 days at enrollment, were allowed to remain in the study (n=22). Three of these samples were in the selected set. Figure 50 shows the flow of samples between enrollment and analysis. There were 2,625 samples eligible for selection.
Table 24. Patient demographics
*GA at the time of the invasive procedure.
**The penetrance of ultrasound abnormalities was higher in fetuses with an abnormal karyotype.
Abbreviations: BMI - body mass index; IUGR - intrauterine growth restriction
According to the random sampling plan, all eligible subjects with an abnormal karyotype were selected for analysis, together with a set of subjects carrying euploid fetuses (Figure 50B), so that the total sequenced study population had a ratio of approximately 4:1 of unaffected to affected subjects for trisomy 21. By this process, 534 subjects were selected. Two samples were subsequently removed from analysis because of sample tracking issues, in which the full chain of custody between the sample tube and data acquisition did not pass quality audit (Figure 50). This resulted in 532 subjects for analysis, contributed by 53 of the 60 study sites. The demographics of the selected cohort were similar to those of the overall cohort.
Test performance
Figures 51A-51C show the flow diagrams for the aneuploidy analyses of chromosomes 21, 18, and 13, and Figures 51D-51F show the sex analysis flows. Table 27 shows the sensitivity, specificity, and confidence intervals for each of the six analyses, and Figures 52, 53, and 54 show the graphical distribution of samples according to NCV following sequencing. In all 6 analysis categories, 16 samples (3.0%) were removed because no fetal DNA was detected. After unblinding, there were no distinguishing clinical features for these samples. The number of censored karyotypes in each category depended on the condition being analyzed (fully detailed in Figure 52).
The sensitivity and specificity of the method for detecting T21 in the analysis population (n=493) were 100% (95% CI = 95.9, 100.0) and 100% (95% CI = 99.1, 100.0), respectively (Table 27 and Figure 51A). This included correct classification of one complex T21 karyotype, 47,XX,inv(7)(p22q32),+21, and of two translocation T21 cases arising from Robertsonian translocations, one of which was also mosaic for monosomy X (45,X,+21,der(14;21)q10;q10)[4]/46,XY,+21,der(14;21)q10;q10)[17] and 46,XY,+21,der(21;21)q10;q10).
The sensitivity and specificity for detecting T18 in the analysis population (n=496) were 97.2% (85.5, 99.9) and 100% (99.2, 100.0) (Table 27 and Figure 51B). Although censored (per protocol) from the primary analysis, the four samples with mosaic karyotypes for T21 and T18 were all correctly classified by the method of the invention as 'affected' for aneuploidy (Table 25). Because they were correctly detected, they are indicated on the left side of Figures 51A and 51B. All remaining censored samples were correctly classified as unaffected for trisomies 21, 18, and 13 (Table 25). The sensitivity and specificity for detecting T13 in the analysis population were 78.6% (49.2, 99.9) and 100% (99.2, 100.0) (Figure 51C). One detected T13 case had arisen from a Robertsonian translocation (46,XY,+13,der(13;13)q10;q10). There were seven unclassified samples in the chromosome 21 analysis (1.4%), five in the chromosome 18 analysis (1.0%), and two in the chromosome 13 analysis (0.4%) (Figures 51A-51C). In all categories, there was an overlap of three samples that had both a censored karyotype (69,XXX) and no fetal DNA detected. One sample that was unclassified in the chromosome 21 analysis was correctly identified as T13 in the chromosome 13 analysis, and one sample that was unclassified in the chromosome 18 analysis was correctly identified as T21 in the chromosome 21 analysis.
Table 25. Censored karyotypes
*Subject excluded from all analysis categories because of a marker chromosome in one cell line.
**Subject with karyotype 48,XXY,+18 was unclassified in the chromosome 18 analysis, and the sex chromosome aneuploidy was not detected.
Table 26. Abnormal and complex karyotypes that were not censored
*After unblinding, an increased normalized chromosome value (NCV) of 3.6 was noted from the sequencing tags in chromosome 6.
The sex chromosome analysis population used to determine the performance of the method (female, male, or monosomy X) was 433. Our refined algorithm for classifying sex status, which allows accurate determination of sex chromosome aneuploidies, resulted in a higher number of unclassified results. The sensitivity and specificity for detecting diploid female status (XX) were 99.6% (95% CI = 97.6, >99.9) and 99.5% (95% CI = 97.2, >99.9), respectively; the sensitivity and specificity for detecting male (XY) were both 100% (95% CI = 98.0, 100.0); and the sensitivity and specificity for detecting monosomy X (45,X) were 93.8% (95% CI = 69.8, 99.8) and 99.8% (95% CI = 98.7, >99.9) (Figure 33D-F). Although censored from the analysis (per protocol), the mosaic monosomy X karyotypes received the following sequencing classifications (Table 25): 2/7 were classified as monosomy X, 3/7 with a Y chromosome component were classified as XY, and 2/7 with an XX chromosome component were classified as female. Two samples that were classified as monosomy X according to the invention had karyotypes of 47,XXX and 46,XX. For karyotypes 47,XXX, 47,XXY, and 47,XYY, eight of ten sex chromosome aneuploidies were correctly classified (Table 25). If the sex chromosome classifications had been limited to monosomy X, XY, and XX, most of the unclassified samples would have been correctly classified as male, but the XXY and XYY aneuploidies would not have been identified.
In addition to accurately classifying trisomies 21, 18, and 13 and sex, the sequencing results also correctly classified aneuploidies of chromosomes 16 and 20 in two samples (47,XX,+16 and 47,XX,+20) (Table 26). Interestingly, one sample with a clinically complex alteration of the long arm of chromosome 6 (6q) and two duplications (one of which was 37.5 megabases in size) showed an increased NCV from the sequencing tags in chromosome 6 (NCV = 3.6). In another sample, an aneuploidy of chromosome 2 was detected according to the method of the invention but was not observed in the fetal karyotype at amniocentesis (46,XX). Other complex karyotype variants shown in Tables 25 and 26 include samples from fetuses with chromosome inversions, deletions, translocations, triploidy, and other abnormalities that were not detected here, but that could potentially be classified using the method of the invention at higher sequencing density and/or with further algorithm optimization. In these cases, the method of the invention correctly classified the samples as unaffected for trisomy 21, 18, or 13 and correctly classified their sex.
In this study, 38/532 of the analyzed samples were from women who had undergone assisted reproduction. Of these, 17/38 samples had chromosomal abnormalities; no false positives or false negatives were detected in this subgroup.
Table 27. Sensitivity and specificity of the method
Discussion
This prospective study to determine whole chromosome fetal aneuploidy from maternal plasma was designed to emulate the real-world scenario of sample collection, processing, and analysis. Whole blood samples were obtained at the enrollment sites, did not require immediate processing, and were shipped overnight to the sequencing laboratory. In contrast to a prior prospective study that involved only chromosome 21 (Palomaki et al., Genetics in Medicine 2011:1), in this study all eligible samples with any abnormal karyotype were sequenced and analyzed. The sequencing laboratory did not know in advance which fetal chromosomes might be affected, nor the ratio of aneuploid to euploid samples. The study design recruited a high-risk study population of pregnant women to assure a statistically significant prevalence of aneuploidy, and Tables 25 and 26 indicate the complexity of the karyotypes that were analyzed. The results demonstrate that: i) fetal aneuploidies (including those resulting from translocation trisomy, mosaicism, and complex variations) can be detected with high sensitivity and specificity; and ii) an aneuploidy in one chromosome does not affect the ability of the method of the invention to correctly identify the euploid status of other chromosomes. The algorithms utilized in previous studies appear to be unable to effectively determine other aneuploidies that would inevitably be present in a general clinical population (Erich et al., Am J Obstet Gynecol 2011 Mar; 204(3):205e1-11; Chiu et al., BMJ 2011;342:c7401).
With regard to mosaicism, the analysis of sequencing information in this study correctly classified samples with mosaic karyotypes for chromosomes 21 and 18 in 4/4 affected samples. These results demonstrate the sensitivity of the analysis for detecting specific characteristics of cell-free DNA in a complex mixture. In one case, the sequencing data for chromosome 2 indicated a whole or partial chromosome aneuploidy, while the amniocentesis karyotype result for chromosome 2 was diploid. In two other examples, one sample with a 47,XXX karyotype and another with a 46,XX karyotype, the method of the invention classified the samples as monosomy X. It is possible that these are mosaic cases, or that the pregnant woman herself is mosaic. (It is important to remember that the sequencing is performed on total DNA, which is a combination of maternal and fetal DNA.) While cytogenetic analysis of amniocytes or villi obtained by invasive procedures is currently the reference standard for aneuploidy classification, a karyotype performed on a limited number of cells cannot exclude low-level mosaicism. The current clinical study design did not include long-term infant follow-up or access to placental tissue at delivery, so we cannot determine whether these were true or false positive results. We speculate that the specificity of the sequencing process, coupled with algorithms optimized according to the method of the invention for detection across the genome, may ultimately provide more sensitive identification of fetal DNA abnormalities than standard karyotyping, particularly in cases of mosaicism.
The International Society for Prenatal Diagnosis has issued a rapid response statement commenting on the commercial availability of massively parallel sequencing (MPS) for the prenatal detection of Down syndrome (Benn et al., Prenat Diagn 2012 doi:10.1002/pd.2919). They state that, before routine MPS-based population screening for fetal Down syndrome is introduced, evidence is needed that the test performs in some subgroups, such as women who conceive by in vitro fertilization. The results reported here indicate that the present method is accurate in this group of pregnant women, many of whom are at higher risk of aneuploidy.
Although these results demonstrate the excellent performance of the present method, with an optimized algorithm, for aneuploidy detection across the genome in singleton pregnancies of women at increased risk of aneuploidy, more experience is needed to build confidence in the diagnostic performance of the method, particularly in low-risk populations, where prevalence is lower, and in multiple gestation. In the early stages of clinical implementation, classification of chromosomes 21, 18, and 13 using sequencing information according to the present method should be used after a positive first or second trimester screening result. This will reduce the unnecessary invasive procedures caused by false positive screening results, with a concomitant reduction in procedure-related adverse events. Invasive procedures could be limited to confirmation of a positive result obtained by sequencing. However, there are clinical scenarios (e.g., advanced maternal age and infertility) in which pregnant women will want to avoid an invasive procedure; they may request this test as an alternative to the primary screen and/or to an invasive procedure. All patients should receive thorough pretest counseling to ensure that they understand the limitations of the test and the implications of the results. As experience accumulates with more samples, it is possible that this test will replace current screening protocols and become a primary screen, and ultimately a non-invasive diagnostic test for fetal aneuploidy.
Example 17
Determination of fetal fraction from NCV to discriminate the presence of complete or partial fetal chromosomal aneuploidy in a sample under analysis
Assuming that the chromosome dose of the relevant fetal chromosome increases in proportion to the increasing fetal fraction in the maternal sample, one would expect that, for a complete chromosome of interest, the ff value based on the NCV value will determine the presence or absence of a complete fetal chromosomal aneuploidy. To demonstrate that the ff determined from NCV can be used to distinguish a complete chromosomal aneuploidy from the presence of a partial chromosomal aneuploidy or the contribution of a mosaic sample, genomic DNA from mothers and their children was used to create artificial samples that simulate the mixture of fetal and maternal cfDNA found in the circulation of pregnant women. The NCV-based value of fetal fraction is one form of the putative fetal fraction described above.
The DNA of the mothers and children was purchased from the Coriell Institute for Medical Research (Camden, New Jersey). The DNA identifiers and sample karyotypes are provided in Table 27.
Table 27. Example 17
Samples comprising complete chromosomal or partial chromosomal aneuploidies were analyzed as follows.
In all cases, the genomic DNA from the mother and the genomic DNA from the child were sheared by sonication to a peak size of 200 bp. Artificial samples comprising maternal DNA spiked with 0%, 5%, or 10% w/w child DNA were processed to prepare sequencing libraries, which were sequenced in a massively parallel fashion using sequencing-by-synthesis as described in Example 12. Each artificial DNA sample was sequenced four times on the sequencer using separate flow cells, providing 4 sets of sequence information for each sample comprising 0%, 5%, and 10% child DNA. The 36 bp reads were aligned to the human reference genome hg19, and the uniquely mapped tags were counted. About 125 × 10^6 sequence tags were obtained for each of the 4 flow cell lanes used for each sample.
Normalizing chromosomes (single chromosomes or groups of chromosomes) were identified in a set of qualified samples comprising 20 male and 20 female gDNA libraries, as described elsewhere herein. The normalizing chromosome for chromosome 21 was identified as chromosome 4 + chromosome 16 + chromosome 22; the normalizing chromosome for chromosome 7 was identified as chromosome 4 + chromosome 6 + chromosome 8 + chromosome 12 + chromosome 19 + chromosome 20; the normalizing chromosome for chromosome 15 was identified as chromosome 9 + chromosome 12 + chromosome 14 + chromosome 19 + chromosome 20; the normalizing chromosome for chromosome 22 was identified as chromosome 19; and the normalizing chromosome for chromosome X was identified as chromosome 4 + chromosome 6 + chromosome 7 + chromosome 8. The sequence tags obtained by sequencing the artificial samples were counted for the chromosomes of interest and for the corresponding normalizing chromosomes (single chromosomes or groups of chromosomes), and the counts were used to calculate chromosome doses and NCVs.
In this example, ff was determined using the NCV for chromosome 21 and the NCV for chromosome X in sample mixture (1), where NCV21A is the NCV value determined for chromosome 21 in test sample (1), which comprises trisomy 21, and CV21U is the coefficient of variation of the doses of chromosome 21 determined in the qualified samples (comprising diploid chromosomes 21); and where NCVXA is the NCV value determined for chromosome X in test sample (1), which comprises trisomy 21, and CVXU is the coefficient of variation of the doses of chromosome X determined in the qualified samples (comprising unaffected female child chromosomes).
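The relationship between NCV, the coefficient of variation in the qualified samples, and fetal fraction can be sketched under two explicit assumptions that are not stated in this passage: that an NCV is the test dose minus the mean dose of the qualified samples, divided by their standard deviation, and that a complete fetal trisomy (or a single X in a male fetus) shifts the dose by a factor of 1 + ff/2 (or 1 - ff/2). Under those assumptions ff = 2 × |NCV × CV|. The function names and the toy doses are hypothetical.

```python
from statistics import mean, stdev

def ncv(test_dose, qualified_doses):
    """Assumed definition: (test dose - mean of qualified doses) / SD of qualified doses."""
    return (test_dose - mean(qualified_doses)) / stdev(qualified_doses)

def ff_from_ncv(ncv_value, cv_unaffected):
    """Fetal fraction implied by an NCV, assuming the dose shifts by a factor of (1 +/- ff/2)."""
    return 2.0 * abs(ncv_value * cv_unaffected)

# Toy chromosome doses in hypothetical qualified (unaffected) samples.
qualified = [0.098, 0.100, 0.102]
cv_u = stdev(qualified) / mean(qualified)

# A simulated trisomic sample with 10% fetal fraction: the dose rises by ff/2 = 5%.
trisomy_dose = mean(qualified) * 1.05
print(ff_from_ncv(ncv(trisomy_dose, qualified), cv_u))

# Chromosome X in a male fetus at 10% fetal fraction: the dose falls by 5%.
x_dose = mean(qualified) * 0.95
print(ff_from_ncv(ncv(x_dose, qualified), cv_u))
```

Both simulated cases recover a fetal fraction of about 0.10, which is the behaviour one would expect if the two estimates are to agree 1:1 for a complete aneuploidy.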
Figure 56 shows a plot of the percent "ff" determined using the dose of chromosome 21 (ff21) as a function of the percent "ff" determined using the dose of chromosome X (ffX) in synthetic maternal sample (1), which comprises DNA from a child with trisomy 21.
The data show that the chromosome doses and the NCVs derived from them increase in proportion to the increasing ff, and that there is a 1:1 relationship between the percent ff determined using the dose of the trisomic chromosome (i.e., chromosome 21) and the percent ff determined using the dose of a chromosome known to be present as a single chromosome (i.e., chromosome X).
Figure 57 shows a plot of the percent "ff" determined using the dose of chromosome 7 (ff7) as a function of the percent "ff" determined using the dose of chromosome X (ffX) in synthetic maternal sample (2), which comprises DNA from a euploid mother and her child, who carries a partial deletion in chromosome 7.
As shown in for sample (1) and (2), data show chromosome dosage and from its NCV as ff increases And proportionally increase.However, in the case where aneuploidy is the chromosomal aneuploidy of part, part aneuploid is used Chromosome dosage (the ff of chromosome7) determine percentage ff not with using chromosome x dosage (ffX) determine percentage ff It is corresponding.Therefore, 1 shown by complete trisomic sample is deviateed:1 relation shows part aneuploidy be present.
Figure 58 shows a plot of the percent "ff" determined using the dose of chromosome 15 (ff15) as a function of the percent "ff" determined using the dose of chromosome X (ffX) in synthetic maternal sample (3), which comprises DNA from a euploid mother and her child, who is 25% mosaic with a partial duplication of chromosome 15.
As shown for samples (1) and (2), the doses and the NCVs derived from them increase in proportion to the increasing ff. Like sample (2), sample (3) comprises a partial chromosomal aneuploidy, and the percent ff determined using the dose of the partially aneuploid chromosome (ff15) does not correspond to the percent ff determined using the dose of chromosome X (ffX). The lack of correspondence between the two ff values indicates the presence of a partial aneuploidy rather than a complete chromosomal aneuploidy.
Figure 59 shows a plot of the percent "ff" determined using the dose of chromosome 22 (ff22), and the NCVs derived from it, in artificial samples (4) comprising 0% child DNA (i); 10% DNA from the unaffected twin son (ii), who is known not to have a partial chromosomal aneuploidy of chromosome 22; and 10% DNA from the affected twin son (iii), who is known to have a partial chromosomal aneuploidy of chromosome 22. The data show that, for the sample comprising DNA from the unaffected twin, the "ff" determined from the four NCVs calculated from the doses of chromosome 22 was close to zero, indicating the absence of an aneuploidy of chromosome 22 in the unaffected child; when calculated from the doses of chromosome X, the "ff" of the unaffected twin sample was confirmed to be about 10%. The data also show that, for the sample comprising DNA from the affected twin, the "ff" determined from the four NCVs calculated from the doses of chromosome 22 (ff22) was about 3%, indicating the presence of an aneuploidy in chromosome 22; when calculated from the doses of chromosome X (ffX), the "ff" of the affected twin sample was confirmed to be about 10%. The lack of correspondence between ff22 and ffX indicates that the aneuploidy of chromosome 22 in the affected twin is a partial chromosomal aneuploidy.
Thus, the data show that, in maternal samples comprising cfDNA from a male fetus, chromosome doses and the NCV values derived from them can be used to distinguish the presence of complete trisomies from partial aneuploidies and/or from complete or partial aneuploidies present in mosaic samples. A partial aneuploidy can be a gain or a loss of a portion of a chromosome. Optionally, the partial aneuploidy and/or mosaicism can be resolved by using chromosome doses and estimated fetal fraction as described in Example 12.
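The comparison described in this example can be sketched as follows. The decision thresholds are hypothetical: the text reports only that a 1:1 correspondence between the two fetal fraction estimates accompanies a complete aneuploidy and that a lack of correspondence accompanies a partial or mosaic aneuploidy, and it gives no numerical cut-offs. The function name, `tol`, and `floor` are assumptions made for illustration.

```python
def interpret_ff_pair(ff_chrom, ff_x, tol=0.25, floor=0.01):
    """Compare ff from the chromosome of interest with ff from chromosome X (male fetus).

    tol   -- hypothetical allowed relative deviation from a 1:1 ratio
    floor -- hypothetical ff below which no aneuploidy is indicated
    """
    if ff_x <= 0:
        raise ValueError("ff from chromosome X must be positive")
    if ff_chrom < floor:
        return "no aneuploidy indicated"
    if abs(ff_chrom / ff_x - 1.0) <= tol:
        return "consistent with complete aneuploidy"
    return "consistent with partial aneuploidy or mosaicism"

print(interpret_ff_pair(0.10, 0.10))  # 1:1 relationship, as for sample (1)
print(interpret_ff_pair(0.03, 0.10))  # about 3% versus about 10%, as for the affected twin
print(interpret_ff_pair(0.00, 0.10))  # close to zero, as for the unaffected twin
```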
Above-mentioned fetus fraction method can be also used for determining one or more fetuses in multifetation have an aneuploidy can Can property.For example, in the case of a fraternal twin, find according to NCVXThe fetus fraction that value determines is 8.3%, and by NCV21The fraction that value measures is 5.0%.Being indicated above in a pair of male fetus only one has T21 aneuploidy, and The result is confirmed by results of karyotype.In another has the twinborn example of parent, according to the fetus of X chromosome determination Fraction is 7.3%, and the fetus fraction determined by chromosome 18 is 8.9%.In this example, two double born of the same parents are determined according to caryogram Tire is all T18 male.
Example 18
Determination of fetal fraction from NCV to identify the presence of complete fetal chromosomal aneuploidies in clinical samples
To demonstrate that the ff determined from NCV (CNff) can be used to distinguish the presence of complete chromosomal aneuploidies from that of partial chromosomal aneuploidies in clinical samples, chromosomes of interest 21, 13 and 18 were quantified in clinical samples using cfDNA obtained from the blood of pregnant women. The presence of the trisomies was verified by karyotype.
cfDNA was obtained from the following samples: 46 maternal samples from women each pregnant with a male fetus with trisomy 21 (T21); 13 maternal samples from women each pregnant with a fetus with trisomy 18 (T18); and 3 maternal samples from women each pregnant with a male fetus with trisomy 13 (T13). These clinical samples were from the clinical study described in Example 16. cfDNA was isolated and sequencing libraries were prepared as described in Example 16, except that the new Illumina v3 chemistry was used.
Sequencing libraries prepared from cfDNA of qualified samples known to be unaffected for chromosomes 21, 18 and 13 were also sequenced using the new Illumina v3 chemistry. The sequence reads obtained for the qualified samples were mapped to the human reference genome hg19, and the reads that mapped uniquely to all chromosome sequences of hg19 (non-repeat-masked) were counted and used to systematically determine which chromosome or group of chromosomes would serve as the normalizing chromosome for each of chromosomes of interest 21, 18 and 13 in the test samples.
Table 28 below shows the normalizing chromosomes (denominator chromosomes) identified for determining the chromosome doses (ratios) for chromosomes 1-22, X and Y in each test sample.
Table 28. Example 18: Systematically identified normalizing chromosomes for the T21, T18 and T13 test samples
Once the normalizing chromosomes had been identified in the qualified samples, the test samples were sequenced, and the sequence tags that mapped to each of chromosomes 21, 18 and 13 and to the corresponding normalizing chromosomes in the test samples were counted and used to calculate chromosome doses (ratios). NCV values were then calculated as previously described according to the following equation:
For each test sample, the fetal fraction for chromosome X and for the chromosome of interest was determined according to the following equation, which is described elsewhere in this specification:
$$ff = 2 \times \left| NCV_{iA} \times CV_{iU} \right| \qquad \text{(Equation 28)}$$
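The calculation described above can be sketched as follows. This is a minimal illustration rather than the method as claimed: it assumes that the NCV is the z-score of the test dose against the mean and standard deviation of the qualified-sample doses, and that CV is the coefficient of variation of those same doses. The function names and the dose values are hypothetical.

```python
from statistics import mean, stdev


def chromosome_dose(tags_interest, tags_normalizing):
    """Dose (ratio) of tags on the chromosome of interest to tags on its normalizing chromosome(s)."""
    return tags_interest / tags_normalizing


def ncv(test_dose, qualified_doses):
    """Assumed NCV definition: z-score of the test dose against the qualified (unaffected) doses."""
    return (test_dose - mean(qualified_doses)) / stdev(qualified_doses)


def coefficient_of_variation(qualified_doses):
    """Coefficient of variation of the chromosome dose across the qualified samples."""
    return stdev(qualified_doses) / mean(qualified_doses)


def fetal_fraction(ncv_affected, cv_unaffected):
    """Equation 28: ff = 2 * |NCV_iA * CV_iU|, returned as a fraction (0.10 means 10%)."""
    return 2 * abs(ncv_affected * cv_unaffected)


# Hypothetical doses, for illustration only (not data from this specification).
qualified = [0.98, 1.00, 1.02]
test = 1.05
ff = fetal_fraction(ncv(test, qualified), coefficient_of_variation(qualified))
```

Under these assumptions the product NCV × CV reduces to the relative excess of the test dose over the qualified mean, so a 5% dose excess yields a fetal fraction of about 10%.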
Figure 60 shows a plot of CNffX versus CNff21 determined in samples comprising a fetal trisomy 21 (T21). As expected for complete chromosomal aneuploidies, CNffX matches the CNff determined using the NCV of chromosome 21 (CNff21).
Similarly, in the T18 test samples, CNffX matches the CNff determined using the NCV of chromosome 18 (CNff18) (Figure 61), and in the T13 test samples, CNffX matches the CNff determined using the NCV of chromosome 13 (CNff13) (Figure 62).
Figure 60 also shows the fetal fractions obtained for samples in which a female fetus was affected by T21. As expected, the CNff21 in these "female" samples cannot be verified by comparison with chromosome X. To verify the CNff21 of female samples, the CNff of a chromosome known not to be subject to fetal aneuploidy (e.g., chromosome 1) can be determined. Alternatively, the CNff21 of a "female" sample can be verified by comparing it with the NCNff, determined, for example, by counting tags of polymorphic sequences as described elsewhere herein.
Thus, the number of sequence tags, and the resulting NCV values that identify copy number variations of complete chromosomes, can be used to determine the corresponding fetal fraction in aneuploid (affected) samples. Correspondence between the CNff of the chromosome of interest and the CNff of a chromosome known not to be aneuploid can be used to confirm the presence of a complete trisomy.
Example 19
Determination of fetal fraction from NCV to identify the presence of partial fetal chromosomal aneuploidies in clinical samples
To demonstrate that the ff determined from NCV (CNff) can be used to identify the presence of, and to localize, partial chromosomal aneuploidies in clinical samples, cfDNA from a clinical sample that had been identified as having an aneuploidy of chromosome 17 was sequenced and analyzed as described in Example 18.
NCV values for each chromosome in the test sample were calculated using the sequence tags that mapped to chromosome 17 in the test sample and to the normalizing chromosomes identified in the set of qualified samples (chromosome 16 + chromosome 20 + chromosome 22) (Table 28 above).
Figure 63 shows a plot of the NCV values for chromosomes 1-22 and X in the test sample. As shown in the plot, the NCV value for chromosome 17 was determined to be NCV > 4, which is the threshold chosen for identifying aneuploid chromosomes. The plot also shows the NCV value for chromosome X, which, as expected, was negative.
The CNff values for chromosome 17 and chromosome X were calculated according to the following equation:
$$ff_{(i)} = 2 \times NCV_{jA} \times CV_{jU} \qquad \text{(Equation 25)}$$
which gave CNff17 = 3.9% and CNffX = 13.5%.
The difference between the two CNff values indicates the presence of a partial aneuploidy or, possibly, of a mosaic.
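The comparison of the two CNff values can be sketched as a simple concordance check. The specification does not state a numerical cut-off for "correspondence", so the 25% relative tolerance used here, like the function name, is an assumption made only for illustration.

```python
def classify_by_cnff(cnff_interest, cnff_reference, tolerance=0.25):
    """Compare the CNff of the chromosome of interest with a reference CNff (e.g., CNffX).

    tolerance is a hypothetical relative cut-off; the specification gives no value.
    """
    if cnff_reference <= 0:
        raise ValueError("reference fetal fraction must be positive")
    relative_gap = abs(cnff_interest - cnff_reference) / cnff_reference
    if relative_gap <= tolerance:
        return "complete"
    return "partial_or_mosaic"


# Values reported in Example 19: CNff17 = 3.9%, CNffX = 13.5%.
result = classify_by_cnff(3.9, 13.5)
```

With the Example 19 values the relative gap is about 0.71, so the sample is flagged as a candidate partial aneuploidy or mosaic, which is the case that the bin-level analysis is then used to resolve.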
To distinguish a partial aneuploidy from a possible mosaic, tags were counted in each consecutive 100 kbp block (bin) along chromosome 17, and a normalized bin value (NBV) was calculated for each bin. The tag count in each individual bin was normalized by determining the ratio of the tags in that bin to the sum of the tags in the 20 bins of equal size whose GC content was closest to that of the bin being analyzed. In this case, therefore, the normalization is related to GC content. Optionally, bin normalization can also take into account the variability of bin doses, as determined in qualified samples in the manner described for chromosome doses (ratios). In this example, the GCC Z score is equal to the NBV value, determined as follows:
where Mj and MADj are, respectively, the estimated median and the median-adjusted deviation of the j-th chromosome dose in the set of qualified samples, and xij is the observed j-th chromosome dose for test sample i.
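The bin normalization and score described above can be sketched as follows. The equation itself did not survive in this text, so the form (x_ij − M_j) / MAD_j is an assumption inferred from the definitions of M_j, MAD_j and x_ij; any scaling constant applied to the MAD is omitted. Excluding the analyzed bin from its own 20 neighbours is likewise an assumption, and all names are hypothetical.

```python
from statistics import median


def gc_matched_ratio(tag_counts, gc_content, index, n_neighbors=20):
    """Ratio of one bin's tags to the summed tags of the n bins with the closest GC content.

    The analyzed bin is excluded from its own neighbour set (an assumption).
    """
    others = [i for i in range(len(tag_counts)) if i != index]
    nearest = sorted(others, key=lambda i: abs(gc_content[i] - gc_content[index]))[:n_neighbors]
    return tag_counts[index] / sum(tag_counts[i] for i in nearest)


def robust_z(x, qualified_values):
    """Assumed score: (x - median) / median absolute deviation of the qualified values."""
    m = median(qualified_values)
    mad = median([abs(v - m) for v in qualified_values])
    return (x - m) / mad


# Hypothetical toy input: four bins, two nearest-GC neighbours instead of twenty.
ratio = gc_matched_ratio([10, 20, 30, 40], [0.40, 0.41, 0.50, 0.60], index=0, n_neighbors=2)
score = robust_z(7, [1, 2, 3, 4, 5])
```

Using the median and the MAD rather than the mean and standard deviation keeps the score from being dominated by a few outlying bins in the qualified set.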
The GC-normalized GCC Z scores, which represent the normalized bin value (NBV) for each 100 kbp bin along the length of chromosome 17, are plotted on the Y-axis of Figure 64. The plot in Figure 64 clearly shows a copy number increase corresponding to approximately the last 200,000 bp of chromosome 17. This finding is consistent with the karyotype provided for the sample, which indicated a duplication at the q terminus (qter) of chromosome 17.
Thus, CNff can be used to identify and localize partial aneuploidies of a chromosome.
_____________
Example 20
Verification of sample integrity in multiplexed bioassays of maternal cfDNA
Marker molecules having sequences known not to be contained in any known genome were synthesized and used to verify the integrity of whole blood and plasma maternal source samples, which were processed to extract the mixture of fetal and maternal cfDNA in the maternal sample and then sequenced.
Current and previous experimental data have shown that the average length of cfDNA is about 170 bp. Using BLAST searches against all genome entries, 170 bp antigenomic sequences that are not present in any known genome were identified. Six marker molecules (MM1-MM6) were synthesized based on the identified antigenomic sequences (SEQ ID NOs: 1-6; Table 29) and used to verify sample integrity as follows.
Table 29
Marker molecules
Peripheral blood was collected from a pregnant woman into 4 blood collection tubes (Cell-Free DNA™ BCT; Streck, Inc., Omaha, NE) and shipped overnight to the laboratory for analysis. Two whole blood source samples were spiked with marker molecules as follows. One blood source sample was spiked with 720 pg of marker molecule 1 (MM1), and the second blood source sample was spiked with 720 pg of marker molecule 2 (MM2). All 4 tubes were centrifuged at 1600 g for 10 minutes at 4 °C. The plasma supernatant was removed from each of the four tubes, placed into 5 mL high-speed centrifuge tubes, and centrifuged at 16000 g for 10 minutes at 4 °C. The plasma fractions of the spiked whole blood were aliquoted into separate tubes and stored at -80 °C. The plasma fractions from the two remaining (unspiked) blood tubes were then divided into 1.1 mL aliquots. Plasma source samples were prepared as follows. 100 pg of MM1 was added to one plasma aliquot, 100 pg of MM2 was added to plasma aliquot 2, and so on, to obtain 6 marked plasma source samples, each comprising a different marker molecule (MM1-MM6), which were stored at -80 °C.
One tube of each marked plasma source sample and one tube of each marked blood source sample were thawed, and DNA was extracted using the Qiagen Blood Mini Kit according to the method described in Example 1. Libraries were prepared from 30 μl of each sample DNA using the TruSeq™ DNA Sample Preparation Kit (San Diego, CA), which includes indexes 1-6. The sequencing libraries were prepared such that the sample comprising MM1 was indexed with index 1, the sample comprising MM2 was indexed with index 2, and so on. The sequencing libraries were quantified using the Agilent Bioanalyzer DNA 1000 kit (Agilent Technologies, Santa Clara, CA) and diluted to 4 nM with Qiagen buffer EB. The indexed and marked samples were pooled and further diluted to 2 nM, and then sequenced in four lanes of an Illumina HiSeq flow cell using the Illumina TruSeq SBS Kit v3, according to Table 30.
Table 30
Layout of the multiplexed sequencing flow cell
Sequence reads were aligned to the human reference genome hg19 and to a synthetic reference genome comprising the antigenomic marker molecule sequences. Sequence reads that mapped uniquely (i.e., only once) to the hg19 reference genome or to the synthetic reference genome of marker molecule sequences were counted (Table 31).
Table 31
Correspondence of MM sequences with source sample cfDNA sequences
* I = index
* L = lane
The data show that, for each sample, the sequences identified for the MM that had been added to a source sample corresponded only to the cfDNA sequences of the source sample to which that MM had been added. For example, the data for sample 1 show that the sequences of the reads that mapped to MM1 corresponded to the sequences of the cfDNA obtained from the source sample (plasma sample 1) to which MM1 had been added. In addition, the absence of a different sequence (e.g., MM2) among the reads obtained from sequencing the cfDNA of source sample 1 shows that source sample 1 was not cross-contaminated by another sample (e.g., source sample 2).
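The integrity check described above can be sketched as a comparison between the marker expected in a sample and the markers actually observed among its uniquely mapped reads. The function name, the zero-read threshold and the read counts below are hypothetical; they are not values from Table 31.

```python
def check_sample_integrity(expected_marker, marker_read_counts):
    """Verify that only the expected marker molecule is detected in a sample.

    marker_read_counts maps a marker name (e.g., "MM1") to the number of reads
    that mapped uniquely to that marker sequence.
    """
    found = marker_read_counts.get(expected_marker, 0) > 0
    unexpected = sorted(
        name
        for name, count in marker_read_counts.items()
        if name != expected_marker and count > 0
    )
    return {
        "expected_marker_found": found,
        "unexpected_markers": unexpected,
        "intact": found and not unexpected,
    }


# Hypothetical read counts, for illustration only.
clean = check_sample_integrity("MM1", {"MM1": 1200, "MM2": 0})
contaminated = check_sample_integrity("MM1", {"MM1": 1200, "MM2": 15})
```

In practice a small non-zero threshold might be preferred over a strict zero, to tolerate index misassignment or mapping noise; that choice is left open here.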
Example 21
Internal positive control
A positive control was developed for use in massively parallel sequencing of maternal cfDNA, to provide qualitatively positive chromosome doses and NCV values for trisomy 13, trisomy 18 and trisomy 21.
Fragmented genomic DNA from three male patients known to have a trisomy of Chr13, Chr18 and Chr21, respectively, was spiked into a background of fragmented female DNA. The fragmented genomic DNA was size-selected by PAGE to comprise fragments ranging from about 150 bp to about 250 bp in length, thereby mimicking the size of fetal cfDNA. The size-selected DNA of the T13, T18 and T21 controls was purified and end-repaired, and its concentration was measured using a Nanodrop (Wilmington, DE). The prepared DNA was verified on a Bioanalyzer High Sensitivity DNA chip (Agilent, Santa Clara, CA). The trisomy 13, trisomy 18 and trisomy 21 DNAs were obtained from the Coriell Institute for Medical Research (Camden, NJ). Female genomic DNA was obtained from The Biochain Institute (Hayward, CA). A small amount of trisomy DNA was spiked into a predominantly female DNA background to mimic a "male fetal" DNA fraction in a female "maternal" DNA background. The composition of this DNA mixture was optimized such that, when used in a sequencing assay to determine copy number variations, the mixture always qualitatively reports positive for trisomy 13, trisomy 18 and trisomy 21, with NCV values for chromosomes 13, 18 and 21 greater than 4.
Maternal cfDNA was extracted from plasma samples obtained from pregnant women, and sequencing libraries of the maternal sample cfDNA and of the T13, T18 and T21 control DNA were prepared for multiplexed sequencing, which was performed on the Illumina platform. Four positive controls and 56 samples were sequenced in each flow cell of the sequencer. As described elsewhere in this application, 36 bp reads were obtained, tags for the chromosomes were identified, and NCV values were calculated.
Figures 69A, B and C show the NCV values of the maternal test samples (◇) and of the internal positive controls (). Samples with NCV values greater than 4 were determined to have a copy number variation for chromosomes of interest 13 (A), 18 (B) and 21 (C), respectively. The figures show that the NCV of the positive controls correlates with the NCV of the maternal test samples identified as having a copy number variation, i.e., an additional copy of chromosome 13, 18 or 21.
Internal positive controls can be designed to mimic both complete and partial chromosomal variations, and can be used in prenatal diagnostic assays and in related assays, such as the determination of fetal fraction by massively parallel sequencing, as described throughout this specification.
Example 22
Determination of fetal fraction using massively parallel sequencing: sample processing and cfDNA extraction
Peripheral blood samples were collected from pregnant women who were in the first or second trimester of pregnancy and were considered to be at risk of fetal aneuploidy. Informed consent was obtained from each participant before the blood draw. Blood was collected before amniocentesis or chorionic villus sampling. Karyotype analysis was performed on the chorionic villus or amniocentesis samples to determine the fetal karyotype.
Peripheral blood drawn from each subject was collected in ACD tubes. One tube of blood sample (about 6 to 9 mL per tube) was transferred into one 15 mL low-speed centrifuge tube. The blood was centrifuged at 2640 rpm for 10 minutes at 4 °C using a Beckman Allegra 6 R centrifuge and a GA 3.8 rotor.
For cell-free plasma extraction, the upper plasma layer was transferred to a 15 mL high-speed centrifuge tube and centrifuged at 16000 × g for 10 minutes at 4 °C using a Beckman Coulter Avanti J-E centrifuge and a JA-14 rotor. Both centrifugation steps were performed within 72 hours of blood collection. The cell-free plasma comprising cfDNA was stored at -80 °C and thawed only once before amplification of the plasma cfDNA or purification of the cfDNA.
Purified cell-free DNA (cfDNA) was extracted from cell-free plasma using the QIAamp Blood DNA Mini Kit (Qiagen), essentially according to the manufacturer's instructions. One milliliter of buffer AL and 100 μl of protease solution were added to 1 mL of plasma. The mixture was incubated at 56 °C for 15 minutes. One milliliter of 100% ethanol was added to the plasma digest. The resulting mixture was transferred to QIAamp mini columns assembled with the VacValves and VacConnectors provided in the QIAvac 24 Plus column assembly (Qiagen). Vacuum was applied to the samples, and the cfDNA retained on the column filters was washed under vacuum with 750 μl of buffer AW1, followed by a second wash with 750 μl of buffer AW24. The columns were centrifuged at 14,000 RPM for 5 minutes to remove any residual buffer from the filters. The cfDNA was eluted with buffer AE by centrifugation at 14,000 RPM, and the concentration was determined using the Qubit™ Quantitation Platform (Invitrogen).
Example 23
Determination of fetal fraction using massively parallel sequencing: preparation of sequencing libraries, sequencing, and analysis of sequencing data
A. Preparation of sequencing libraries
All sequencing libraries, i.e., target, primary and enriched libraries, were prepared from approximately 2 ng of purified cfDNA extracted from maternal plasma. Library preparation was performed using reagents of the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, MA) as follows. Because cell-free plasma DNA is fragmented in nature, no further fragmentation of the plasma DNA samples by nebulization or sonication was performed. The overhangs of approximately 2 ng of purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends according to the End Repair Module by incubating the cfDNA in a 1.5 mL microcentrifuge tube for 15 minutes at 20 °C with 5 μl of 10X phosphorylation buffer, 2 μl of deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA Polymerase I, 1 μl of T4 DNA Polymerase and 1 μl of T4 Polynucleotide Kinase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1. The enzymes were then heat-inactivated by incubating the reaction mixture at 75 °C for 5 minutes. The mixture was cooled to 4 °C, and dA-tailing of the blunt-ended DNA was accomplished using 10 μl of the dA-tailing master mix containing the Klenow fragment (3' to 5' exo-) (NEBNext™ DNA Sample Prep DNA Reagent Set 1) and incubating for 15 minutes at 37 °C. Subsequently, the Klenow fragment was heat-inactivated by incubating the reaction mixture at 75 °C for 5 minutes. Following inactivation of the Klenow fragment, 1 μl of a 1:5 dilution of Illumina Genomic Adaptor Oligo Mix (Part No. 1000521; Illumina Inc., Hayward, CA) was used to ligate the Illumina adaptors (non-index Y-adaptors) to the dA-tailed DNA using 4 μl of the T4 DNA ligase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, by incubating the reaction mixture for 15 minutes at 25 °C.
The mixture was cooled to 4 °C, and the adaptor-ligated cfDNA was purified from unligated adaptors, adaptor dimers and other reagents using the magnetic beads provided in the Agencourt AMPure XP PCR purification system (Part No. A63881; Beckman Coulter Genomics, Danvers, MA). Eighteen cycles of PCR were performed to selectively enrich adaptor-ligated cfDNA using a high-fidelity master mix (Finnzymes, Woburn, MA) and Illumina's PCR primers complementary to the adaptors (Part Nos. 1000537 and 1000537). The adaptor-ligated DNA was subjected to PCR (98 °C for 30 seconds; 18 cycles of 98 °C for 10 seconds, 65 °C for 30 seconds, and 72 °C for 30 seconds; final extension at 72 °C for 5 minutes, and hold at 4 °C) using Illumina Genomic PCR Primers (Part Nos. 100537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, MA) according to the manufacturer's instructions available at www.beckmangenomics.com/products/AMPureXPProtocol_000387v001.pdf. The purified amplified product was eluted in 40 μl of Qiagen EB buffer, and the concentration and size distribution of the amplified libraries were analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA).
B. Sequencing
Library DNA was sequenced using the Genome Analyzer II (Illumina Inc., San Diego, CA, USA) according to standard manufacturer protocols. Copies of the protocol for whole genome sequencing using Illumina/Solexa technology can be found in the BioTechniques® Protocol Guide 2007, published December 2006, p. 29, and on the World Wide Web at biotechniques.com/default.aspPage=protocol&subsection=article_display&id=112378.
The DNA library was diluted to 1 nM and denatured. Library DNA (5 pM) was subjected to cluster amplification according to the procedure described in Illumina's Cluster Station User Guide and Cluster Station Operations Guide, available on the World Wide Web at illumina.com/systems/genome analyzer/cluster_station.ilmn. The amplified DNA was sequenced using Illumina's Genome Analyzer II to obtain single-end reads of 36 bp. Only about 30 bp of random sequence information is needed to identify a sequence as belonging to a specific human chromosome. Longer sequences can uniquely identify more specific targets. In the present case, a large number of 36 bp reads were obtained, covering approximately 10% of the genome.
C. Analysis of sequencing data to determine fetal fraction
Upon completion of sequencing of the sample, the Illumina "Sequence Control Software" transferred the image and base call files to a Unix server running the Illumina "Genome Analyzer Pipeline" software version 1.51. The 36 bp reads were aligned to an artificial reference genome (e.g., a SNP genome) using the BOWTIE program. The artificial reference genome was identified as the grouping of the polymorphic DNA sequences that encompass the alleles comprised in the polymorphic target sequences. For example, the artificial reference genome is a SNP genome comprising SEQ ID NOs: 7-62. Only reads that mapped uniquely to the artificial genome were used for the analysis of fetal fraction. Reads that matched the SNP genome perfectly were counted as tags and filtered. Of the remaining reads, only reads having one or two mismatches were counted as tags and included in the analysis. Tags mapped to each of the polymorphic alleles were counted, and the fetal fraction was determined as the ratio of the number of tags mapped to the minor allele (i.e., the fetal allele) to the number of tags mapped to the major allele (i.e., the maternal allele).
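The final step above, the per-SNP ratio of minor-allele to major-allele tags, can be sketched as follows. This is a minimal illustration: the function name and the tag counts are hypothetical, the allele with fewer tags is simply assumed to be the fetal (minor) allele, and read alignment and mismatch filtering are taken to have happened upstream.

```python
def snp_fetal_fraction_percent(allele_tag_counts):
    """Percent fetal fraction at one SNP: 100 * minor-allele tags / major-allele tags.

    allele_tag_counts maps each of the two alleles of the SNP to its tag count.
    The allele with fewer tags is treated as the fetal (minor) allele.
    """
    if len(allele_tag_counts) != 2:
        raise ValueError("expected tag counts for exactly two alleles")
    minor, major = sorted(allele_tag_counts.values())
    if major == 0:
        raise ValueError("no tags mapped to the major allele")
    return 100.0 * minor / major


# Hypothetical tag counts for one SNP, for illustration only.
ff_percent = snp_fetal_fraction_percent({"A": 1000, "G": 40})
```

A SNP at which the minor-allele count is zero returns 0.0 here; such a SNP carries no information about the fetal contribution and would normally be left out of the estimate.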
Example 24
Selection of autosomal SNPs for determining fetal fraction
A set of 28 autosomal SNPs was selected from a list of 92 SNPs (Pakstis et al., Human Genetics 127:315-324 [2010]) and from Applied Biosystems by Life Technologies™ (Carlsbad, CA) at the web address appliedbiosystems.com. Primers were designed to hybridize to sequences close to the SNP sites on the cfDNA, to ensure that the SNP site would be included in the 36 bp reads generated by massively parallel sequencing on the Illumina Analyzer GII, and to generate amplicons of sufficient length to undergo bridge amplification during cluster formation. Thus, primers were designed to generate amplicons of at least 110 bp which, when combined with the universal adaptors (Illumina Inc., San Diego, CA) used for cluster amplification, result in DNA molecules of at least 200 bp. Primer sequences were identified, and primer sets (i.e., forward and reverse primers) were synthesized by Integrated DNA Technologies (San Diego, CA) and stored as 1 μM solutions to be used for amplifying polymorphic target sequences as described in Examples 25 to 27. Table 33 provides the RefSNP (rs) accession ID numbers, the primers used for amplifying the target cfDNA sequences, and the sequences of the amplicons comprising the possible SNP alleles that would be generated using the primers. The SNPs given in Table 33 were used for the simultaneous amplification of 13 target sequences in a multiplexed assay.
The panel provided in Table 33 is an exemplary SNP panel. Fewer or more SNPs can be employed to enrich the fetal and maternal DNA for polymorphic target nucleic acids. Additional SNPs that can be used include the SNPs given in Table 34. The SNP alleles are shown in bold and are underlined. Other additional SNPs that can be used to determine fetal fraction according to the present method include rs315791, rs3780962, rs1410059, rs279844, rs38882, rs9951171, rs214955, rs6444724, rs2503107, rs1019029, rs1413212, rs1031825, rs891700, rs1005533, rs2831700, rs354439, rs1979255, rs1454361, rs8037429 and rs1490413, which have been analyzed for determining fetal fraction by TaqMan PCR and are disclosed in U.S. Provisional Applications 61/296,358 and 61/360,837.
Table 33
SNP panel for determining fetal fraction
Table 34
Additional SNPs for determining fetal fraction
Example 25
Determination of fetal fraction by massively parallel sequencing of a target library
To determine the fraction of fetal cfDNA in a maternal sample, target polymorphic nucleic acid sequences, each comprising a SNP, were amplified and used for preparing a target library for sequencing in a massively parallel fashion.
cfDNA was extracted as described above. A target sequencing library was prepared as follows. The cfDNA contained in 5 μl of purified cfDNA was amplified in a reaction volume of 50 μl containing 7.5 μl of a 1 μM primer mix (Table 1), 10 μl of NEB 5X master mix and 27 μl of water. Thermal cycling was performed with the Gene Amp9700 (Applied Biosystems) using the following cycling conditions: incubation at 95 °C for 1 minute, followed by 20 to 30 cycles of 95 °C for 20 seconds, 68 °C for 1 minute, and 68 °C for 30 seconds, followed by a final incubation at 68 °C for 5 minutes. A final hold at 4 °C was maintained until the samples were removed for combining with the unamplified portion of the purified cfDNA sample. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Part No. A63881; Beckman Coulter Genomics, Danvers, MA). A final hold at 4 °C was maintained until the samples were removed for preparing the target library. The amplified product was analyzed with a 2100 Bioanalyzer (Agilent Technologies, Sunnyvale, CA), and the concentration of the amplified product was determined. A sequencing library of the amplified target nucleic acids was prepared as described in Example 23, and was sequenced in a massively parallel fashion using sequencing-by-synthesis with reversible dye terminators according to the Illumina protocol (BioTechniques® Protocol Guide 2007, published December 2006, p. 29, and on the World Wide Web at biotechniques.com/default.aspPage=protocol&subsection=article_display&id=112378). Tags that mapped to a reference genome consisting of 26 sequences comprising a SNP (13 pairs, each pair representing two alleles), i.e., SEQ ID NOs: 7-32, were analyzed and counted as described.
Table 35 provides the tag counts obtained from sequencing the target library, and the fetal fraction calculated from the sequencing data.
Table 35
Determination of fetal fraction by massively parallel sequencing of a library of polymorphic nucleic acids
The results show that polymorphic nucleic acid sequences, each comprising at least one SNP, can be amplified from cfDNA derived from a maternal plasma sample to construct a library that can be sequenced in a massively parallel fashion to determine the fraction of fetal nucleic acids in the maternal sample.
Example 26
Determination of fetal fraction following enrichment of fetal and maternal nucleic acids in a cfDNA sequencing library sample
To enrich the fetal and maternal cfDNA contained in a primary sequencing library constructed using purified fetal and maternal cfDNA, a portion of a purified cfDNA sample was used for amplifying polymorphic target nucleic acid sequences and for preparing a sequencing library of amplified polymorphic target nucleic acids, which was used to enrich the fetal and maternal nucleic acid sequences comprised in the primary library.
The method corresponds to the workflow diagrammed in Figure 10. A target sequencing library was prepared from a portion of the purified cfDNA as described in Example 23. A primary sequencing library was prepared using the remaining portion of the purified cfDNA as described in Example 23. Enrichment of the primary library for the amplified polymorphic nucleic acids comprised in the target library was obtained by diluting the primary and the target sequencing libraries to 10 nM, and combining the target library with the primary library at a ratio of 1:9 to provide an enriched sequencing library. Sequencing of the enriched library and analysis of the sequencing data were performed as described in Example 23.
Table 36 provides the number of sequence tags that mapped to the SNP genome for the informative SNPs identified from sequencing an enriched library derived from plasma samples of pregnant women each carrying a T21, a T13, a T18 or a monosomy X fetus, respectively. Fetal fraction was calculated as follows:
$$\%\ \text{fetal fraction}_{\text{allele}_x} = \frac{\sum \text{fetal sequence tags for allele}_x}{\sum \text{maternal sequence tags for allele}_x} \times 100$$
Table 36 also provides the number of sequence tags that mapped to the human reference genome. The tags mapped to the human reference genome were used to determine the presence or absence of aneuploidy using the same plasma sample that was used for determining the corresponding fetal fraction. Methods for determining aneuploidy using sequence tag counts are described in U.S. Provisional Applications 61/407,017 and 61/455,849778, which are herein incorporated by reference in their entirety.
Table 36. Determination of fetal fraction by massively parallel sequencing of an enriched library of polymorphic nucleic acids
Example 27
Fetus fraction is determined by large-scale parallel sequencing:
The enrichment of fetus and maternal nucleic acids in purified cfDNA samples for polymorphic nucleic acid.
For the fetus being enriched with included in the cfDNA gone out from Maternal plasma sample extraction purification of samples and parent CfDNA, polymorphic target nucleic acid sequence is expanded using a purified cfDNA part, each polymorphic target nucleic acid sequence Row are comprising one the SNP selected from the SNP groups provided in table 33.
This method corresponds to the workflow illustrated in Figure 9. Cell-free plasma was obtained from a maternal blood sample, and cfDNA was purified from the plasma sample, as described in Example 22. The final concentration was determined to be 92.8 pg/µl. The cfDNA contained in 5 µl of purified cfDNA was amplified in a 50 µl reaction volume containing 7.5 µl of a 1 µM primer mix (Table 1), 10 µl of NEB 5X Master Mix, and 27 µl of water. Thermal cycling was performed with the Gene Amp9700 (Applied Biosystems) using the following cycling conditions: incubation at 95°C for 1 minute, followed by 30 cycles of 95°C for 20 seconds, 68°C for 1 minute, and 68°C for 30 seconds, followed by a final incubation at 68°C for 5 minutes. A final hold at 4°C was added until the samples were removed for combining with the unamplified portion of the purified cfDNA sample. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Part No. A63881; Beckman Coulter Genomics, Danvers, Massachusetts), and the concentration was quantified using the Nanodrop 2000 (Thermo Scientific, Wilmington, Delaware). The purified amplification product was diluted 1:10 in water, and 0.9 µl (371 pg) was added to 40 µl of the purified cfDNA sample to obtain a 10% spike. The enriched fetal and maternal cfDNA present in the purified cfDNA sample was used to prepare a sequencing library, and was sequenced as described in Example 22.
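The 10% spike quoted above follows from the stated cfDNA concentration and sample volume. The short Python check below reproduces the arithmetic; the function and variable names are illustrative, and the amplicon concentration it derives is inferred from the quoted figures rather than reported in the source.

```python
def spike_mass_pg(cfdna_conc_pg_per_ul, cfdna_volume_ul, spike_fraction):
    """Mass of amplified product needed to equal spike_fraction of the
    cfDNA mass already present in the sample."""
    return cfdna_conc_pg_per_ul * cfdna_volume_ul * spike_fraction


cfdna_mass_pg = 92.8 * 40.0                 # cfDNA in the 40 µl sample
spike_pg = spike_mass_pg(92.8, 40.0, 0.10)  # 10% of that mass

# Inferred (not stated in the text): concentration of the 1:10 diluted
# amplicon, given that 0.9 µl carries the spike mass.
diluted_amplicon_pg_per_ul = spike_pg / 0.9

print(round(cfdna_mass_pg), round(spike_pg), round(diluted_amplicon_pg_per_ul))
```

The 40 µl sample holds about 3712 pg of cfDNA, so a 10% spike is about 371 pg, which matches the figure given in the text.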
Table 37 provides the tag counts obtained for each of chromosomes 21, 18, 13, X, and Y, i.e. the sequence tag density, and the tag counts obtained for the informative polymorphic sequences contained in the SNP reference genome, i.e. the SNP tag density. The data show that sequencing information can be obtained from sequencing a single library constructed from a purified maternal cfDNA sample that has been enriched for sequences comprising SNPs, so that the presence or absence of aneuploidy and the fetal fraction can be determined simultaneously. The presence or absence of aneuploidy was determined using the number of tags mapped to the chromosomes, as described in U.S. Provisional Applications 61/407,017 and 61/455,849. In the example given, the data show that the fraction of fetal DNA in plasma sample AFR105 could be quantified from the sequencing results of five informative SNPs and was determined to be 3.84%. Sequence tag densities are provided for chromosomes 21, 13, 18, X, and Y.
This example shows that the enrichment protocol provides the tag counts required for determining aneuploidy and fetal fraction from a single sequencing process.
Table 37. Determination of fetal fraction by massively parallel sequencing: enrichment of fetal and maternal nucleic acids in a purified cfDNA sample for polymorphic nucleic acids
Example 28
Determination of fetal fraction by capillary electrophoresis of polymorphic sequences comprising STRs
To determine the fetal fraction in maternal samples comprising fetal and maternal cfDNA, peripheral blood samples were collected from volunteer pregnant women carrying either male or female fetuses. Peripheral blood samples were obtained and processed to provide purified cfDNA, as described in Example 22.
Ten microliters of cfDNA sample were analyzed using the AmpFlSTR® MiniFiler™ PCR amplification kit (Applied Biosystems, Foster City, California) according to the manufacturer's instructions. Briefly, the cfDNA contained in 10 µl was amplified in a 25 µl reaction volume containing 5 µl of fluorescently labeled primers (AmpFlSTR® MiniFiler™ Primer Set) and the AmpFlSTR® MiniFiler™ Master Mix, which includes AmpliTaq Gold® DNA polymerase and associated buffer, salt (1.5 mM MgCl2), and 200 µM deoxynucleotide triphosphates (dNTPs: dATP, dCTP, dGTP, and dTTP). The fluorescently labeled primers are forward primers labeled with 6FAM™, VIC™, NED™, and PET™ dyes. Thermal cycling was performed with the Gene Amp9700 (Applied Biosystems) using the following cycling conditions: incubation at 95°C for 10 minutes, followed by 30 cycles of 94°C for 20 seconds, 59°C for 2 minutes, and 72°C for 1 minute, followed by a final incubation at 60°C for 45 minutes. A final hold at 4°C was added until the samples were removed for analysis. The amplified product was prepared by diluting 1 µl of amplified product in 8.7 µl of Hi-Di™ formamide (Applied Biosystems) and 0.3 µl of GeneScan™-500 LIZ internal size standard (Applied Biosystems), and was analyzed with an ABI PRISM 3130xl Genetic Analyzer (Applied Biosystems) using Data Collection HID_G5_POP4 (Applied Biosystems) and a 36-cm capillary array. All genotyping was performed with GeneMapper_ID v3.2 software (Applied Biosystems) using the manufacturer-provided allelic ladders, bins, and panels.
All genotyping measurements were performed on the Applied Biosystems 3130xl Genetic Analyzer, using a ±0.5-nt "window" around the size obtained for each allele to allow for detection and correct assignment of alleles. Any sample allele whose size fell outside the ±0.5-nt window was designated OL, i.e. "Off Ladder". OL alleles are alleles of a size that is not represented in the AmpFlSTR® MiniFiler™ allelic ladder, or alleles that correspond to an allelic ladder allele but whose size falls just outside the allele window because of measurement error. The minimum peak height threshold of >50 RFU was set on the basis of validation experiments performed to avoid typing when stochastic effects are likely to interfere with accurate interpretation of mixtures. The calculation of fetal fraction is based on averaging all informative markers. Informative markers are identified by the presence of peaks on the electropherogram that fall within the parameters of the preset bins for the STRs analyzed.
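The ±0.5-nt window rule can be illustrated with a minimal Python sketch. The ladder sizes and allele labels below are hypothetical, and the function is not part of GeneMapper or any vendor software; it only shows how a fragment size is either assigned to a ladder allele or designated OL.

```python
def call_allele(observed_size_nt, ladder_sizes_nt, window_nt=0.5):
    """Return the ladder allele whose size lies within +/- window_nt of
    the observed fragment size, or "OL" (off ladder) if none does."""
    for allele, ladder_size in ladder_sizes_nt.items():
        if abs(observed_size_nt - ladder_size) <= window_nt:
            return allele
    return "OL"


# Hypothetical allelic ladder for one STR locus (sizes in nucleotides).
example_ladder = {"10": 120.0, "11": 124.0, "12": 128.0}

print(call_allele(124.3, example_ladder))  # within 0.5 nt of allele "11"
print(call_allele(126.0, example_ladder))  # matches no ladder allele: "OL"
```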
Fetal fraction was calculated using the average peak heights of the major and minor alleles at each STR locus, as determined from triplicate injections. The rules applied to the calculation are:
1. off-ladder (OL) allele data are not included in the calculation; and
2. only peak heights above 50 RFU (relative fluorescence units) are included in the calculation; and
3. if only one bin is present, the marker is deemed non-informative; and
4. if a second bin is called, but the peaks of the first and second bins are within 50% to 70% of each other in peak height (RFU), the minor fraction is not measured and the marker is deemed non-informative.
For any informedness label provided secondary allele fraction by by the peak height of accessory constituent Divided by the peak height of key component and calculate, and percentage is expressed as, first against each information gene seat meter It is
Fetus fraction=(peak height of the main allele of peak height/∑ of ∑ time allele) X 100,
The fetal fraction for a sample comprising two or more informative STRs is calculated as the average of the fetal fractions calculated for the two or more informative markers.
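The calculation rules and the averaging step can be sketched as follows. This is a hedged illustration, not the procedure used to generate Table 38: the `Peak` type, the caller-supplied major/minor assignment, and the reading of rule 4 as "minor-to-major ratio between 0.50 and 0.70" are all assumptions, and the peak heights in any usage are expected to be means over the triplicate injections.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional

MIN_PEAK_RFU = 50.0             # rule 2: only peaks above 50 RFU are used
AMBIGUOUS_RATIO = (0.50, 0.70)  # rule 4 as interpreted here (assumption)


@dataclass
class Peak:
    allele: str        # ladder allele label, or "OL" for off-ladder
    height_rfu: float  # mean peak height over the triplicate injections
    is_minor: bool     # True if assigned to the minor component


def locus_fetal_fraction(peaks: List[Peak]) -> Optional[float]:
    """Percent fetal fraction at one STR locus, or None if the marker
    is non-informative under rules 1 to 4."""
    # Rules 1 and 2: drop off-ladder alleles and peaks at or below 50 RFU.
    usable = [p for p in peaks
              if p.allele != "OL" and p.height_rfu > MIN_PEAK_RFU]
    major = [p.height_rfu for p in usable if not p.is_minor]
    minor = [p.height_rfu for p in usable if p.is_minor]
    # Rule 3: a single remaining bin cannot be informative.
    if not major or not minor:
        return None
    # Rule 4: peaks too similar in height to measure a minor fraction.
    ratio = max(minor) / max(major)
    if AMBIGUOUS_RATIO[0] <= ratio <= AMBIGUOUS_RATIO[1]:
        return None
    return 100.0 * sum(minor) / sum(major)


def sample_fetal_fraction(loci: List[List[Peak]]) -> Optional[float]:
    """Average of the per-locus fractions over all informative markers."""
    values = [f for f in (locus_fetal_fraction(p) for p in loci)
              if f is not None]
    return mean(values) if values else None
```

Returning `None` for non-informative markers keeps them out of the average, so a sample with no informative marker yields no fetal fraction rather than a misleading zero.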
Table 38 provides the data obtained from analyzing the cfDNA of a subject pregnant with a male fetus.
Table 38. Fetal fraction determined by analysis of STRs in the cfDNA of a pregnant subject
The results show that cfDNA can be used to determine the presence or absence of fetal DNA, as indicated by the detection of a minor component at one or more STR alleles; to determine the percent fetal fraction; and to determine fetal sex, as indicated by the presence or absence of the Amelogenin allele.
Sequence Listing
<110> Verinata Health, Inc.
<120> Detection and classification of copy number variation
<130> IP-0722-CN
<140> CN 201210441134.8
<141> 2012-11-07
<150> 13/555,037
<151> 2012-07-20
<150> 13/482,964
<151> 2012-05-29
<150> 13/445,778
<151> 2012-04-12
<160> 118
<170> PatentIn version 3.5
<210> 1
<211> 170
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic polynucleotide
<400> 1
gcacatcccg ctccgggtga ctattaaaga cgaccctcga tcatagcact cgatcagatt 60
gtgacgtatg atctgtagga catacttctt ggccactaac cagacggtgc gagatatttc 120
gaattgcgcc tacctatctg gaacgactaa tgtcaattct tcgaatgaca 170
<210> 2
<211> 170
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic polynucleotide
<400> 2
cgccaatcgc gctctatgct taacgcacgt cctgtctctt tatagagata ccgtgggtga 60
cggcgtgacc gggagccttg aggagagcat aaagcgtaac cggattatcc cgaatggtat 120
atgacggtcc ctcgcatacc ggaccgggca ttactcagca gcggttctgc 170
<210> 3
<211> 170
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic polynucleotide
<400> 3
ccccaatagt gcggtgatct aacacctgac atcgggccga aagaggaatt aagagccgac 60
cggctagact gcccatgtgc caaatcaggg gtcgaggagg ttgtgtggcg acatcctatt 120
ggttccacct ggcggaatcg ggcaaagcca ccatcactgg actgagaacc 170
<210> 4
<211> 170
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic polynucleotide
<400> 4
agtccagtaa ttgcgaggaa ccacttactc ggtacaccgc tcctggctgg ggttggcaga 60
ccagtcatgt tgctgaggac cgacgacccc ggaccattta actctcagac gtaccgacag 120
caactttgcc gaattctctc cagcaatcga gagcgggaag gcataagtgc 170
<210> 5
<211> 170
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic polynucleotide
<400> 5
agaaccatct ccggcgcaag tctacgcgag ttggccttag ctcataccta cggatgtgga 60
ggataagtcc ttagctcgta ccatcgtaac ctagtggcgt catgcgccta cgtgagaagg 120
attctttact gagcgcagag ttgtccgtct actgccacgg gccataacgc 170
<210> 6
<211> 170
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic polynucleotide
<400> 6
cctaaggcct acttcaatat cgtgatgcac ccgaatgact aaaggggtat atggagtatg 60
tccatggcgt cattgagccc gcttaggatc tactgtaatc cgagggatac atgcctcacg 120
cgagtctttc ctaccgctac tagacattat ggtgcgcgcc ttgagtacgt 170
<210> 7
<211> 111
<212> DNA
<213>Homo sapiens
<400> 7
cacatgcaca gccagcaacc ctgtcagcag gagttcccac cagtttcttt ctgagaacat 60
ctgttcaggt ttctctccat ctctatttac tcaggtcaca ggaccttggg g 111
<210> 8
<211> 111
<212> DNA
<213>Homo sapiens
<400> 8
cacatgcaca gccagcaacc ctgtcagcag gagttcccac cagtttcttt ctgagaacat 60
ctgttcaggt ttctctccat ctctgtttac tcaggtcaca ggaccttggg g 111
<210> 9
<211> 126
<212> DNA
<213>Homo sapiens
<400> 9
tgaggaagtg aggctcagag ggtaagaaac tttgtcacag agctggtggt gagggtggag 60
attttacact ccctgcctcc cacaccagtt tctccagagt ggaaagactt tcatctcgca 120
ctggca 126
<210> 10
<211> 126
<212> DNA
<213>Homo sapiens
<400> 10
tgaggaagtg aggctcagag ggtaagaaac tttgtcacag agctggtggt gagggtggag 60
attttacact ccctgcctcc cacaccagtt tctccggagt ggaaagactt tcatctcgca 120
ctggca 126
<210> 11
<211> 121
<212> DNA
<213>Homo sapiens
<400> 11
gtgccttcag aacctttgag atctgattct atttttaaag cttcttagaa gagagattgc 60
aaagtgggtt gtttctctag ccagacaggg caggcaaata ggggtggctg gtgggatggg 120
a 121
<210> 12
<211> 121
<212> DNA
<213>Homo sapiens
<400> 12
gtgccttcag aacctttgag atctgattct atttttaaag cttcttagaa gagagattgc 60
aaagtgggtt gtttctctag ccagacaggg caggtaaata ggggtggctg gtgggatggg 120
a 121
<210> 13
<211> 111
<212> DNA
<213>Homo sapiens
<400> 13
aggtgtgtct ctcttttgtg aggggagggg tcccttctgg cctagtagag ggcctggcct 60
gcagtgagca ttcaaatcct caaggaacag ggtggggagg tgggacaaag g 111
<210> 14
<211> 111
<212> DNA
<213>Homo sapiens
<400> 14
aggtgtgtct ctcttttgtg aggggagggg tcccttctgg cctagtagag ggcctggcct 60
gcagtgagca ttcaaatcct cgaggaacag ggtggggagg tgggacaaag g 111
<210> 15
<211> 139
<212> DNA
<213>Homo sapiens
<400> 15
cctcgcctac tgtgctgttt ctaaccatca tgcttttccc tgaatctctt gagtcttttt 60
ctgctgtgga ctgaaacttg atcctgagat tcacctctag tccctctgag cagcctcctg 120
gaatactcag ctgggatgg 139
<210> 16
<211> 139
<212> DNA
<213>Homo sapiens
<400> 16
cctcgcctac tgtgctgttt ctaaccatca tgcttttccc tgaatctctt gagtcttttt 60
ctgctgtgga ctgaaacttg atcctgagat tcacctctag tccctctggg cagcctcctg 120
gaatactcag ctgggatgg 139
<210> 17
<211> 117
<212> DNA
<213>Homo sapiens
<400> 17
aattgcaatg gtgagaggtt gatggtaaaa tcaaacggaa cttgttattt tgtcattctg 60
atggactgga actgaggatt ttcaatttcc tctccaaccc aagacacttc tcactgg 117
<210> 18
<211> 117
<212> DNA
<213>Homo sapiens
<400> 18
aattgcaatg gtgagaggtt gatggtaaaa tcaaacggaa cttgttattt tgtcattctg 60
atggactgga actgaggatt ttcaatttcc tttccaaccc aagacacttc tcactgg 117
<210> 19
<211> 114
<212> DNA
<213>Homo sapiens
<400> 19
gaaatgcctt ctcaggtaat ggaaggttat ccaaatattt ttcgtaagta tttcaaatag 60
caatggctcg tctatggtta gtctcacagc cacattctca gaactgctca aacc 114
<210> 20
<211> 114
<212> DNA
<213>Homo sapiens
<400> 20
gaaatgcctt ctcaggtaat ggaaggttat ccaaatattt ttcgtaagta tttcaaatag 60
caatggctcg tctatggtta gtctcgcagc cacattctca gaactgctca aacc 114
<210> 21
<211> 128
<212> DNA
<213>Homo sapiens
<400> 21
acccaaaaca ctggaggggc ctcttctcat tttcggtaga ctgcaagtgt tagccgtcgg 60
gaccagcttc tgtctggaag ttcgtcaaat tgcagttaag tccaagtatg ccacatagca 120
gataaggg 128
<210> 22
<211> 128
<212> DNA
<213>Homo sapiens
<400> 22
acccaaaaca ctggaggggc ctcttctcat tttcggtaga ctgcaagtgt tagccgtcgg 60
gaccagcttc tgtctggaag ttcgtcaaat tgcagttagg tccaagtatg ccacatagca 120
gataaggg 128
<210> 23
<211> 110
<212> DNA
<213>Homo sapiens
<400> 23
gcaccagaat ttaaacaacg ctgacaataa atatgcagtc gatgatgact tcccagagct 60
ccagaagcaa ctccagcaca cagagaggcg ctgatgtgcc tgtcaggtgc 110
<210> 24
<211> 110
<212> DNA
<213>Homo sapiens
<400> 24
gcaccagaat ttaaacaacg ctgacaataa atatgcagtc gatgatgact tcccagagct 60
ccagaagcaa ctccagcaca cggagaggcg ctgatgtgcc tgtcaggtgc 110
<210> 25
<211> 116
<212> DNA
<213>Homo sapiens
<400> 25
tgactgtata ccccaggtgc acccttgggt catctctatc atagaactta tctcacagag 60
tataagagct gatttctgtg tctgcctctc acactagact tccacatcct tagtgc 116
<210> 26
<211> 116
<212> DNA
<213>Homo sapiens
<400> 26
tgactgtata ccccaggtgc acccttgggt catctctatc atagaactta tctcacagag 60
tataagagct gatttctgtg tctgcctgtc acactagact tccacatcct tagtgc 116
<210> 27
<211> 110
<212> DNA
<213>Homo sapiens
<400> 27
tgtacgtggt caccagggga cgcctggcgc tgcgagggag gccccgagcc tcgtgccccc 60
gtgaagcttc agctcccctc cccggctgtc cttgaggctc ttctcacact 110
<210> 28
<211> 110
<212> DNA
<213>Homo sapiens
<400> 28
tgtacgtggt caccagggga cgcctggcgc tgcgagggag gccccgagcc tcgtgccccc 60
gtgaagcttc agctcccctc cctggctgtc cttgaggctc ttctcacact 110
<210> 29
<211> 114
<212> DNA
<213>Homo sapiens
<400> 29
cagtggaccc tgctgcacct ttcctcccct cccatcaacc tcttttgtgc ctccccctcc 60
gtgtaccacc ttctctgtca ccaaccctgg cctcacaact ctctcctttg ccac 114
<210> 30
<211> 114
<212> DNA
<213>Homo sapiens
<400> 30
cagtggaccc tgctgcacct ttcctcccct cccatcaacc tcttttgtgc ctccccctcc 60
gtgtaccacc ttctctgtca ccacccctgg cctcacaact ctctcctttg ccac 114
<210> 31
<211> 110
<212> DNA
<213>Homo sapiens
<400> 31
cagtggcata gtagtccagg ggctcctcct cagcacctcc agcaccttcc aggaggcagc 60
agcgcaggca gagaacccgc tggaagaatc ggcggaagtt gtcggagagg 110
<210> 32
<211> 110
<212> DNA
<213>Homo sapiens
<400> 32
cagtggcata gtagtccagg ggctcctcct cagcacctcc agcaccttcc aggaggcagc 60
agcgcaggca gagaacccgc tggaaggatc ggcggaagtt gtcggagagg 110
<210> 33
<211> 129
<212> DNA
<213>Homo sapiens
<400> 33
aggtctgggg gccgctgaat gccaagctgg gaatcttaaa tgttaaggaa caaggtcata 60
caatgaatgg tgtgatgtaa aagcttggga ggtgatttct gagggtaggt gctgggttta 120
atgggagga 129
<210> 34
<211> 129
<212> DNA
<213>Homo sapiens
<400> 34
aggtctgggg gccgctgaat gccaagctgg gaatcttaaa tgttaaggaa caaggtcata 60
caatgaatgg tgtgatgtaa aagcttggga ggtgattttt gagggtaggt gctgggttta 120
atgggagga 129
<210> 35
<211> 107
<212> DNA
<213>Homo sapiens
<400> 35
acggttctgt cctgtagggg agaaaagtcc tcgttgttcc tctgggatgc aacatgagag 60
agcagcacac tgaggcttta tggattgccc tgccacaagt gaacagg 107
<210> 36
<211> 107
<212> DNA
<213>Homo sapiens
<400> 36
acggttctgt cctgtagggg agaaaagtcc tcgttgttcc tctgggatgc aacatgagag 60
agcagcacac tgaggcttta tgggttgccc tgccacaagt gaacagg 107
<210> 37
<211> 127
<212> DNA
<213>Homo sapiens
<400> 37
gcgcagtcag atgggcgtgc tggcgtctgt cttctctctc tcctgctctc tggcttcatt 60
tttctctcct tctgtctcac cttctttcgt gtgcctgtgc acacacacgt ttgggacaag 120
ggctgga 127
<210> 38
<211> 127
<212> DNA
<213>Homo sapiens
<400> 38
gcgcagtcag atgggcgtgc tggcgtctgt cttctctctc tcctgctctc tggcttcatt 60
tttctctcct tctgtctcac cttctttcgt gtgcctgtgc atacacacgt ttgggacaag 120
ggctgga 127
<210> 39
<211> 130
<212> DNA
<213>Homo sapiens
<400> 39
gccggacctg cgaaatccca aaatgccaaa cattcccgcc tcacatgatc ccagagagag 60
gggacccagt gttcccagct tgcagctgag gagcccgagg ttgccgtcag atcagagccc 120
cagttgcccg 130
<210> 40
<211> 130
<212> DNA
<213>Homo sapiens
<400> 40
gccggacctg cgaaatccca aaatgccaaa cattcccgcc tcacatgatc ccagagagag 60
gggacccagt gttcccagct tgcagctgag gagcccgagt ttgccgtcag atcagagccc 120
cagttgcccg 130
<210> 41
<211> 121
<212> DNA
<213>Homo sapiens
<400> 41
agcagcctcc ctcgactagc tcacactacg ataaggaaaa ttcatgagct ggtgtccaag 60
gagggctggg tgactcgtgg ctcagtcagc atcaagattc ctttcgtctt tcccctctgc 120
c 121
<210> 42
<211> 121
<212> DNA
<213>Homo sapiens
<400> 42
agcagcctcc ctcgactagc tcacactacg ataaggaaaa ttcatgagct ggtgtccaag 60
gagggctggg tgactcgtgg ctcagtcagc gtcaagattc ctttcgtctt tcccctctgc 120
c 121
<210> 43
<211> 138
<212> DNA
<213>Homo sapiens
<400> 43
tggcattgcc tgtaatatac atagccatgg ttttttatag gcaatttaag atgaatagct 60
tctaaactat agataagttt cattacccca ggaagctgaa ctatagctac tttacccaaa 120
atcattagaa tggtgctt 138
<210> 44
<211> 138
<212> DNA
<213>Homo sapiens
<400> 44
tggcattgcc tgtaatatac atagccatgg ttttttatag gcaatttaag atgaatagct 60
tctaaactat agataagttt cattacccca ggaagctgaa ctatagctac tttccccaaa 120
atcattagaa tggtgctt 138
<210> 45
<211> 136
<212> DNA
<213>Homo sapiens
<400> 45
atgaagcctt ccaccaactg cctgtatgac tcatctgggg acttctgctc tatactcaaa 60
gtggcttagt cactgccaat gtatttccat atgagggacg atgattacta aggaaatata 120
gaaacaacaa ctgatc 136
<210> 46
<211> 136
<212> DNA
<213>Homo sapiens
<400> 46
atgaagcctt ccaccaactg cctgtatgac tcatctgggg acttctgctc tatactcaaa 60
gtggcttagt cactgccaat gtatttccat atgagggacg gtgattacta aggaaatata 120
gaaacaacaa ctgatc 136
<210> 47
<211> 118
<212> DNA
<213>Homo sapiens
<400> 47
acaacagaat caggtgattg gagaaaagat cacaggccta ggcacccaag gcttgaagga 60
tgaaagaatg aaagatggac ggaacaaaat taggacctta attctttgtt cagttcag 118
<210> 48
<211> 118
<212> DNA
<213>Homo sapiens
<400> 48
acaacagaat caggtgattg gagaaaagat cacaggccta ggcacccaag gcttgaagga 60
tgaaagaatg aaagatggac ggaagaaaat taggacctta attctttgtt cagttcag 118
<210> 49
<211> 150
<212> DNA
<213>Homo sapiens
<400> 49
ttggggtaaa ttttcattgt catatgtgga atttaaatat accatcatct acaaagaatt 60
ccacagagtt aaatatctta agttaaacac ttaaaataag tgtttgcgtg atattttgat 120
gacagataaa cagagtctaa ttcccacccc 150
<210> 50
<211> 150
<212> DNA
<213>Homo sapiens
<400> 50
ttggggtaaa ttttcattgt catatgtgga atttaaatat accatcatct acaaagaatt 60
ccacagagtt aaatatctta agttaaacac ttaaaataag tgtttgcgtg atattttgat 120
gatagataaa cagagtctaa ttcccacccc 150
<210> 51
<211> 145
<212> DNA
<213>Homo sapiens
<400> 51
tgcaattcaa atcaggaagt atgaccaaaa gacagagatc ttttttggat gatccctagc 60
ctagcaatgc ctggcagcca tgcaggtgca atgtcaacct taaataatgt attgcaaact 120
cagagctgac aaacctcgat gttgc 145
<210> 52
<211> 145
<212> DNA
<213>Homo sapiens
<400> 52
tgcaattcaa atcaggaagt atgaccaaaa gacagagatc ttttttggat gatccctagc 60
ctagcaatgc ctggcagcca tgcaggtgca atgtcaacct taaataatgt attgcaaatt 120
cagagctgac aaacctcgat gttgc 145
<210> 53
<211> 124
<212> DNA
<213>Homo sapiens
<400> 53
ctgtgctctg cgaatagctg cagaagtaac ttggggaccc aaaataaagc agaatgctaa 60
tgtcaagtcc tgagaaccaa gccctgggac tctggtgcca tttcggattc tccatgagca 120
tggt 124
<210> 54
<211> 124
<212> DNA
<213>Homo sapiens
<400> 54
ctgtgctctg cgaatagctg cagaagtaac ttggggaccc aaaataaagc agaatgctaa 60
tgtcaagtcc tgagaaccaa gccctgggac tctggtgcca ttttggattc tccatgagca 120
tggt 124
<210> 55
<211> 118
<212> DNA
<213>Homo sapiens
<400> 55
tttttccagc caactcaagg ccaaaaaaaa tttcttaata tagttattat gcgaggggag 60
gggaagcaaa ggagcacagg tagtccacag aataagacac aagaaacctc aagctgtg 118
<210> 56
<211> 118
<212> DNA
<213>Homo sapiens
<400> 56
tttttccagc caactcaagg ccaaaaaaaa tttcttaata tagttattat gcgaggggag 60
gggaagcaaa ggagcacagg tagtccacag aataggacac aagaaacctc aagctgtg 118
<210> 57
<211> 110
<212> DNA
<213>Homo sapiens
<400> 57
tcttctcgtc ccctaagcaa acaacatccg cttgcttctg tctgtgtaac cacagtgaat 60
gggtgtgcac gcttgatggg cctctgagcc cctgttgcac aaaccagaaa 110
<210> 58
<211> 110
<212> DNA
<213>Homo sapiens
<400> 58
tcttctcgtc ccctaagcaa acaacatccg cttgcttctg tctgtgtaac cacagtgaat 60
gggtgtgcac gcttggtggg cctctgagcc cctgttgcac aaaccagaaa 110
<210> 59
<211> 110
<212> DNA
<213>Homo sapiens
<400> 59
cacatggggg cattaagaat cgcccaggga ggaggaggga gaacgcgtgc ttttcacatt 60
tgcatttgaa ttttcgagtt cccaggatgt gtttttgtgc tcatcgatgt 110
<210> 60
<211> 110
<212> DNA
<213>Homo sapiens
<400> 60
cacatggggg cattaagaat cgcccaggga ggaggaggga gaacgcgtgc ttttcacatt 60
tgcatttgaa tttttgagtt cccaggatgt gtttttgtgc tcatcgatgt 110
<210> 61
<211> 128
<212> DNA
<213>Homo sapiens
<400> 61
gggctctgag gtgtgtgaaa taaaaacaaa tgtccatgtc tgtcctttta tggcattttg 60
ggactttaca tttcaaacat ttcagacatg tatcacaaca cgaaggaata acagttccag 120
ggatatct 128
<210> 62
<211> 128
<212> DNA
<213>Homo sapiens
<400> 62
gggctctgag gtgtgtgaaa taaaaacaaa tgtccatgtc tgtcctttta tggcattttg 60
ggactttaca tttcaaacat ttcagacatg tatcacaaca cgagggaata acagttccag 120
ggatatct 128
<210> 63
<211> 21
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 63
cacatgcaca gccagcaacc c 21
<210> 64
<211> 23
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 64
ccccaaggtc ctgtgacctg agt 23
<210> 65
<211> 23
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 65
tgaggaagtg aggctcagag ggt 23
<210> 66
<211> 24
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 66
tgccagtgcg agatgaaagt cttt 24
<210> 67
<211> 27
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 67
gtgccttcag aacctttgag atctgat 27
<210> 68
<211> 20
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 68
tcccatccca ccagccaccc 20
<210> 69
<211> 25
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 69
aggtgtgtct ctcttttgtg agggg 25
<210> 70
<211> 21
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 70
cctttgtccc acctccccac c 21
<210> 71
<211> 26
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 71
cctcgcctac tgtgctgttt ctaacc 26
<210> 72
<211> 25
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 72
ccatcccagc tgagtattcc aggag 25
<210> 73
<211> 26
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 73
aattgcaatg gtgagaggtt gatggt 26
<210> 74
<211> 24
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 74
ccagtgagaa gtgtcttggg ttgg 24
<210> 75
<211> 27
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 75
gaaatgcctt ctcaggtaat ggaaggt 27
<210> 76
<211> 27
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 76
ggtttgagca gttctgagaa tgtggct 27
<210> 77
<211> 22
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 77
acccaaaaca ctggaggggc ct 22
<210> 78
<211> 27
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 78
cccttatctg ctatgtggca tacttgg 27
<210> 79
<211> 27
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 79
gcaccagaat ttaaacaacg ctgacaa 27
<210> 80
<211> 22
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 80
gcacctgaca ggcacatcag cg 22
<210> 81
<211> 24
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 81
tgactgtata ccccaggtgc accc 24
<210> 82
<211> 27
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 82
gcactaagga tgtggaagtc tagtgtg 27
<210> 83
<211> 22
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 83
tgtacgtggt caccagggga cg 22
<210> 84
<211> 26
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 84
agtgtgagaa gagcctcaag gacagc 26
<210> 85
<211> 21
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 85
cagtggaccc tgctgcacct t 21
<210> 86
<211> 24
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 86
gtggcaaagg agagagttgt gagg 24
<210> 87
<211> 24
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 87
cagtggcata gtagtccagg ggct 24
<210> 88
<211> 21
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 88
cctctccgac aacttccgcc g 21
<210> 89
<211> 20
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 89
aggtctgggg gccgctgaat 20
<210> 90
<211> 23
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 90
tcctcccatt aaacccagca cct 23
<210> 91
<211> 23
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 91
acggttctgt cctgtagggg aga 23
<210> 92
<211> 22
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 92
cctgttcact tgtggcaggg ca 22
<210> 93
<211> 20
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 93
gcgcagtcag atgggcgtgc 20
<210> 94
<211> 23
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 94
tccagccctt gtcccaaacg tgt 23
<210> 95
<211> 21
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 95
gccggacctg cgaaatccca a 21
<210> 96
<211> 21
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 96
cgggcaactg gggctctgat c 21
<210> 97
<211> 21
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 97
agcagcctcc ctcgactagc t 21
<210> 98
<211> 23
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 98
ggcagagggg aaagacgaaa gga 23
<210> 99
<211> 24
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 99
tggcattgcc tgtaatatac atag 24
<210> 100
<211> 23
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 100
aagcaccatt ctaatgattt tgg 23
<210> 101
<211> 20
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 101
atgaagcctt ccaccaactg 20
<210> 102
<211> 27
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 102
gatcagttgt tgtttctata tttcctt 27
<210> 103
<211> 22
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 103
acaacagaat caggtgattg ga 22
<210> 104
<211> 25
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 104
ctgaactgaa caaagaatta aggtc 25
<210> 105
<211> 22
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 105
ttggggtaaa ttttcattgt ca 22
<210> 106
<211> 20
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 106
ggggtgggaa ttagactctg 20
<210> 107
<211> 23
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 107
tgcaattcaa atcaggaagt atg 23
<210> 108
<211> 20
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 108
gcaacatcga ggtttgtcag 20
<210> 109
<211> 20
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 109
ctgtgctctg cgaatagctg 20
<210> 110
<211> 20
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 110
accatgctca tggagaatcc 20
<210> 111
<211> 20
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 111
tttttccagc caactcaagg 20
<210> 112
<211> 21
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 112
cacagcttga ggtttcttgt g 21
<210> 113
<211> 20
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 113
tcttctcgtc ccctaagcaa 20
<210> 114
<211> 20
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 114
tttctggttt gtgcaacagg 20
<210> 115
<211> 20
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 115
cacatggggg cattaagaat 20
<210> 116
<211> 22
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 116
acatcgatga gcacaaaaac ac 22
<210> 117
<211> 20
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 117
gggctctgag gtgtgtgaaa 20
<210> 118
<211> 24
<212> DNA
<213>Artificial sequence
<220>
<223> Description of Artificial Sequence: Synthetic primer
<400> 118
agatatccct ggaactgtta ttcc 24
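
The listing appears to pair each human amplicon with a second sequence differing at a single base (for example SEQ ID NOs: 7 and 8), and the synthetic primers appear to match the amplicon ends (for example SEQ ID NOs: 63 and 64 for SEQ ID NO: 7). That reading is an inference from the sequences themselves, not a statement in the text. The Python sketch below checks it for this one pair; the helper names are illustrative.

```python
def clean(seq):
    """Strip the block spacing used in the listing and lower-case."""
    return "".join(seq.split()).lower()


def reverse_complement(seq):
    """Reverse complement of a lower-case DNA string."""
    return seq.translate(str.maketrans("acgt", "tgca"))[::-1]


def mismatch_positions(a, b):
    """1-based positions at which two equal-length sequences differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


SEQ7 = clean("cacatgcaca gccagcaacc ctgtcagcag gagttcccac cagtttcttt "
             "ctgagaacat ctgttcaggt ttctctccat ctctatttac tcaggtcaca "
             "ggaccttggg g")
SEQ8 = clean("cacatgcaca gccagcaacc ctgtcagcag gagttcccac cagtttcttt "
             "ctgagaacat ctgttcaggt ttctctccat ctctgtttac tcaggtcaca "
             "ggaccttggg g")
SEQ63 = clean("cacatgcaca gccagcaacc c")    # candidate forward primer
SEQ64 = clean("ccccaaggtc ctgtgacctg agt")  # candidate reverse primer

print(len(SEQ7), mismatch_positions(SEQ7, SEQ8))
print(SEQ7.startswith(SEQ63), SEQ7.endswith(reverse_complement(SEQ64)))
```

For this pair the two amplicons differ only at position 85 (a versus g), SEQ ID NO: 63 matches the 5' end of the amplicon, and the reverse complement of SEQ ID NO: 64 matches its 3' end.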

Claims (10)

1. A method for preparing a sequencing library from a mixture of fetal and maternal cfDNA in a maternal sample, the method comprising the steps of:
(a) providing cfDNA from the maternal sample;
(b) subjecting the cfDNA to the consecutive steps of end repair, dA-tailing, and adaptor ligation, wherein the consecutive steps do not include purifying the end-repaired product prior to the dA-tailing, and do not include purifying the dA-tailed product prior to the adaptor ligation.
2. The method of claim 1, wherein step (b) comprises heat-inactivating the enzyme between the consecutive steps of end repair and dA-tailing.
3. The method of claim 1, wherein step (b) comprises heat-inactivating the enzyme between the consecutive steps of dA-tailing and adaptor ligation.
4. The method of claim 1, wherein the maternal sample is a biological fluid selected from blood, plasma, serum, urine, and saliva.
5. The method of claim 1, further comprising sequencing at least a portion of the nucleic acid molecules from the sequencing library, thereby obtaining sequence information for a plurality of fetal and maternal nucleic acid molecules of the maternal sample.
6. The method of claim 1, wherein the sequencing is massively parallel sequencing by synthesis using reversible dye terminators.
7. The method of claim 1, wherein the sequencing comprises amplification.
8. The method of claim 1, wherein the consecutive steps of end repair, dA-tailing, and adaptor ligation of the cfDNA are performed in solution.
9. The method of claim 1, wherein the consecutive steps of end repair and dA-tailing of the cfDNA are performed in solution, and the cfDNA is subsequently bound to a solid surface.
10. The method of claim 1, wherein the consecutive steps of end repair, dA-tailing, and adaptor ligation of the cfDNA are performed in the absence of polyethylene glycol.
CN201710644858.5A 2012-04-12 2012-11-07 Copy the detection and classification of number variation Pending CN107435070A (en)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US13/445,778 2012-04-12
US13/445,778 US9447453B2 (en) 2011-04-12 2012-04-12 Resolving genome fractions using polymorphism counts
US13/482,964 2012-05-29
US13/482,964 US20120270739A1 (en) 2010-01-19 2012-05-29 Method for sample analysis of aneuploidies in maternal samples
US13/555,037 2012-07-20
US13/555,037 US9260745B2 (en) 2010-01-19 2012-07-20 Detecting and classifying copy number variation
CN201210441134.8A CN103374518B (en) 2012-04-12 2012-11-07 Detection and classification of copy number variation

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201210441134.8A Division CN103374518B (en) 2012-04-12 2012-11-07 Detection and classification of copy number variation

Publications (1)

Publication Number Publication Date
CN107435070A (en) 2017-12-05

Family

ID=49460351

Family Applications (4)

Application Number Title Priority Date Filing Date
CN201710644858.5A Pending CN107435070A (en) 2012-04-12 2012-11-07 Detection and classification of copy number variation
CN201810154581.2A Active CN108485940B (en) 2012-04-12 2012-11-07 Detection and classification of copy number variation
CN201210441134.8A Active CN103374518B (en) 2012-04-12 2012-11-07 Detection and classification of copy number variation
CN201220583608.8U Expired - Lifetime CN204440396U (en) 2012-04-12 2012-11-07 Kit for determining fetal fraction

Country Status (1)

Country Link
CN (4) CN107435070A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427864A (en) * 2018-02-14 2018-08-21 南京世和基因生物技术有限公司 Detection method, device and computer-readable medium for copy number variation
CN109628579A (en) * 2019-01-13 2019-04-16 清华大学 Detection method for determining whether chromosome number in biological sample is abnormal
CN110317877A (en) * 2019-08-02 2019-10-11 苏州宏元生物科技有限公司 Application of a group of chromosomal instability variations in preparing a reagent or kit for diagnosing bladder transitional cell carcinoma and assessing prognosis
CN110656159A (en) * 2018-06-28 2020-01-07 深圳华大生命科学研究院 Method for detecting copy number variation
CN110880356A (en) * 2018-09-05 2020-03-13 南京格致基因生物科技有限公司 Method and apparatus for screening, diagnosing or risk stratification for ovarian cancer
CN112614548A (en) * 2020-12-25 2021-04-06 北京吉因加医学检验实验室有限公司 Method for calculating sample input amount for library construction and library construction method thereof

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113528645A (en) * 2014-03-14 2021-10-22 凯尔迪克斯公司 Methods for monitoring immunosuppressive therapy in a transplant recipient
EP3149199B1 (en) * 2014-05-30 2020-03-25 Verinata Health, Inc. Detecting, optionally fetal, sub-chromosomal aneuploidies and copy number variations
CN104152553B (en) * 2014-07-21 2016-11-23 上海交通大学 Kit for assisting diagnosis of whether a fetus under test has Down syndrome
US11072814B2 (en) 2014-12-12 2021-07-27 Verinata Health, Inc. Using cell-free DNA fragment size to determine copy number variations
US10395759B2 (en) * 2015-05-18 2019-08-27 Regeneron Pharmaceuticals, Inc. Methods and systems for copy number variant detection
WO2016201507A1 (en) * 2015-06-15 2016-12-22 Murdoch Childrens Research Institute Method of measuring chimerism
GB2555551A (en) * 2015-07-07 2018-05-02 Farsight Genome Systems Inc Methods and systems for sequencing-based variant detection
AU2016321204B2 (en) 2015-09-08 2022-12-01 Cold Spring Harbor Laboratory Genetic copy number determination using high throughput multiplex sequencing of smashed nucleotides
AU2016321333A1 (en) * 2015-09-09 2018-04-26 Psomagen, Inc. Method and system for microbiome-derived diagnostics and therapeutics for conditions associated with cerebro-craniofacial health
IL293187B2 (en) * 2015-09-22 2024-03-01 Univ Hong Kong Chinese Accurate quantification of fetal dna fraction by shallow-depth sequencing of maternal plasma dna
EP3390668A4 (en) * 2015-12-17 2020-04-01 Guardant Health, Inc. Methods to determine tumor gene copy number by analysis of cell-free dna
US10095831B2 (en) 2016-02-03 2018-10-09 Verinata Health, Inc. Using cell-free DNA fragment size to determine copy number variations
ES2967443T3 (en) * 2016-07-06 2024-04-30 Guardant Health Inc Cell-Free Nucleic Acid Fragmentome Profiling Procedures
RU2674700C2 (en) * 2016-12-30 2018-12-12 Общество с ограниченной ответственностью "Научно-производственная фирма ДНК-Технология" (ООО "НПФ ДНК-Технология") Method of determining the source of aneuploid cells on the blood of a pregnant woman
SG11201911538YA (en) * 2017-06-20 2020-01-30 Illumina Inc Methods and systems for decomposition and quantification of dna mixtures from multiple contributors of known or unknown genotypes
CA3067418C (en) 2017-06-20 2022-08-16 Illumina, Inc. Methods for accurate computational decomposition of dna mixtures from contributors of unknown genotypes
US20210265006A1 (en) * 2018-07-24 2021-08-26 Affymetrix, Inc. Array based method and kit for determining copy number and genotype in pseudogenes
KR20220013349A (en) * 2019-06-03 2022-02-04 일루미나, 인코포레이티드 Limit-of-detection-based quality control metrics
CN110373477B (en) * 2019-07-23 2021-05-07 华中农业大学 Molecular marker cloned from CNV fragment and related to porcine ear shape character
CN110452985A (en) * 2019-08-02 2019-11-15 苏州宏元生物科技有限公司 Application of a group of chromosomal instability variations in preparing a reagent or kit for diagnosing liver cancer and assessing prognosis
CN112342627A (en) * 2019-08-09 2021-02-09 深圳市真迈生物科技有限公司 Preparation method and sequencing method of nucleic acid library
CN111105844B (en) * 2019-11-22 2023-06-06 广州金域医学检验集团股份有限公司 Somatic cell mutation classification method, apparatus, device, and readable storage medium
CN111394474B (en) * 2020-03-24 2022-08-16 西北农林科技大学 Method for detecting copy number variation of GAL3ST1 gene of cattle and application thereof
CN111476497B (en) * 2020-04-15 2023-06-16 浙江天泓波控电子科技有限公司 Distribution feed network method for miniaturized platform
CN111948394B (en) * 2020-08-10 2023-07-28 山西医科大学 Application of TSTA3 and LAMP2 as targets in esophageal squamous carcinoma cell metastasis detection and drug screening
CN112322722B (en) * 2020-11-13 2021-11-12 上海宝藤生物医药科技股份有限公司 Primer probe composition and kit for detecting 16p11.2 microdeletion and application thereof
CN113462768B (en) * 2021-07-29 2023-05-30 中国医学科学院整形外科医院 Primer and kit for detecting copy number of ECR region of small ear deformity patient by ddPCR
CN113684277B (en) * 2021-09-06 2022-05-17 南方医科大学南方医院 Method for predicting ovarian cancer homologous recombination defect based on biomarker of genome copy number variation and application
CN114093417B (en) * 2021-11-23 2022-10-04 深圳吉因加信息科技有限公司 Method and device for identifying chromosomal arm heterozygosity loss
CN114507904B (en) * 2022-04-19 2022-07-12 北京迅识科技有限公司 Method for preparing second-generation sequencing library

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001090415A2 (en) * 2000-05-20 2001-11-29 The Regents Of The University Of Michigan Method of producing a dna library using positional amplification
WO2002002772A2 (en) * 2000-06-30 2002-01-10 Incyte Genomics, Inc. Human extracellular matrix (ecm)-related tumor marker
WO2011090557A1 (en) * 2010-01-19 2011-07-28 Verinata Health, Inc. Method for determining copy number variations
CA2825984A1 (en) * 2010-02-25 2011-09-01 Advanced Liquid Logic, Inc. Method of making nucleic acid libraries
CN102212614A (en) * 2003-01-29 2011-10-12 454生命科学公司 Methods of amplifying and sequencing nucleic acids
CN102409043A (en) * 2010-09-21 2012-04-11 深圳华大基因科技有限公司 Method for constructing high-flux and low-cost Fosmid library, label and label joint used in method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
PT2562268T (en) * 2008-09-20 2017-03-29 Univ Leland Stanford Junior Noninvasive diagnosis of fetal aneuploidy by sequencing
CN102127818A (en) * 2010-12-15 2011-07-20 张康 Method for creating fetus DNA library by utilizing peripheral blood of pregnant woman

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001090415A2 (en) * 2000-05-20 2001-11-29 The Regents Of The University Of Michigan Method of producing a dna library using positional amplification
WO2002002772A2 (en) * 2000-06-30 2002-01-10 Incyte Genomics, Inc. Human extracellular matrix (ecm)-related tumor marker
CN102212614A (en) * 2003-01-29 2011-10-12 454生命科学公司 Methods of amplifying and sequencing nucleic acids
WO2011090557A1 (en) * 2010-01-19 2011-07-28 Verinata Health, Inc. Method for determining copy number variations
CA2825984A1 (en) * 2010-02-25 2011-09-01 Advanced Liquid Logic, Inc. Method of making nucleic acid libraries
CN102409043A (en) * 2010-09-21 2012-04-11 深圳华大基因科技有限公司 Method for constructing high-flux and low-cost Fosmid library, label and label joint used in method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FAN H.C. et al.: "Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood", 《PNAS》 *
THORSTENSON Y.R. et al.: "An Automated Hydrodynamic Process for Controlled, Unbiased DNA Shearing", 《GENOME METHODS》 *
VOELKERDING K.V. et al.: "Next-Generation Sequencing: From Basic Research to Diagnostics", 《CLINICAL CHEMISTRY》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427864A (en) * 2018-02-14 2018-08-21 南京世和基因生物技术有限公司 Detection method, device and computer-readable medium for copy number variation
CN108427864B (en) * 2018-02-14 2019-01-29 南京世和基因生物技术有限公司 Detection method, device and computer-readable medium for copy number variation
CN110656159A (en) * 2018-06-28 2020-01-07 深圳华大生命科学研究院 Method for detecting copy number variation
CN110656159B (en) * 2018-06-28 2024-01-09 深圳华大生命科学研究院 Copy number variation detection method
CN110880356A (en) * 2018-09-05 2020-03-13 南京格致基因生物科技有限公司 Method and apparatus for screening, diagnosing or risk stratification for ovarian cancer
CN109628579A (en) * 2019-01-13 2019-04-16 清华大学 Detection method for determining whether chromosome number in biological sample is abnormal
CN109628579B (en) * 2019-01-13 2022-11-15 清华大学 Detection method for determining whether chromosome number in biological sample is abnormal
CN110317877A (en) * 2019-08-02 2019-10-11 苏州宏元生物科技有限公司 Application of a group of chromosomal instability variations in preparing a reagent or kit for diagnosing bladder transitional cell carcinoma and assessing prognosis
CN112614548A (en) * 2020-12-25 2021-04-06 北京吉因加医学检验实验室有限公司 Method for calculating sample input amount for library construction and library construction method thereof

Also Published As

Publication number Publication date
CN204440396U (en) 2015-07-01
CN103374518A (en) 2013-10-30
CN103374518B (en) 2018-03-27
CN108485940A (en) 2018-09-04
CN108485940B (en) 2022-01-28

Similar Documents

Publication Publication Date Title
CN204440396U (en) Kit for determining fetal fraction
US11875899B2 (en) Analyzing copy number variation in the detection of cancer
US11697846B2 (en) Detecting and classifying copy number variation
US20200219588A1 (en) Detecting and classifying copy number variation
US9411937B2 (en) Detecting and classifying copy number variation
EP2877594B1 (en) Detecting and classifying copy number variation in a fetal genome
CN105830077B (en) Method for improving the sensitivity of detection in determining copy number variation
US9323888B2 (en) Detecting and classifying copy number variation
CN103003447B (en) Method for determining the presence or absence of different aneuploidies in a sample
CN108884491A (en) Using cell-free DNA fragment size to determine copy number variations
CN107750277A (en) Determining copy number variations using cell-free DNA fragment size
CN103384725A (en) Fetal genetic variation detection
AU2019200163B2 (en) Detecting and classifying copy number variation
AU2019200162B2 (en) Detecting and classifying copy number variation
Colombi et al. Report on a patient with extremely fragile skin, dermatosparaxis, joint hypermobility, short stature, skeletal deformities, and lipomas: a new syndrome?
Artuso et al. Implementation of an NGS-based workflow for BRCA1 and BRCA2 mutation screening

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171205