CN109971852A

CN109971852A - Detecting mutations and ploidy in chromosome segments

Info

Publication number: CN109971852A
Application number: CN201910135027.4A
Authority: CN
Inventors: J. Babiarz; T. P. Constantin; L. A. Eubank; G. Gemelos; M. M. Hill; H. E. Kirkizlar; M. Rabinowitz; O. Sakarya; S. Sigurjonsson; B. Zimmermann
Original assignee: Gene Security Network Inc
Current assignee: Natera Inc
Priority date: 2014-04-21
Filing date: 2015-04-21
Publication date: 2019-07-05
Also published as: RU2016141308A; US20220282335A1; US20220154290A1; AU2022202083A1; EP3134541B1; US11414709B2; JP2022028950A; JP2020072737A; US20170107576A1; AU2021209221A1; US20230042405A1; JP2020072738A; US20180320239A1; CA2945962A1; AU2022202083B2; RU2717641C2; CN106460070B; CN113774132A; US20220213561A1; JP7362805B2

Abstract

The present invention provides methods, systems, and computer-readable media for detecting the ploidy of a chromosome segment or a whole chromosome, for detecting single nucleotide variants, and for detecting both chromosome segment ploidy and single nucleotide variants. In some aspects, the present invention provides methods, systems, and computer-readable media for detecting cancer or a fetal chromosomal abnormality.

Description

Detecting mutations and ploidy in chromosome segments

This application is a divisional of Chinese invention patent application No. 201580033190.X, entitled "Detecting mutations and ploidy in chromosome segments", which has a filing date of April 21, 2015 and entered the national phase in China on December 20, 2016.

Cross reference to related applications

This application claims priority to U.S. Provisional Patent Application No. 61/982,245, filed April 21, 2014; No. 61/987,407, filed May 1, 2014; No. 62/066,514, filed October 21, 2014; No. 62/146,188, filed April 10, 2015; No. 62/147,377, filed April 14, 2015; and No. 62/148,173, filed April 15, 2015. All of these applications are incorporated herein by reference in their entirety.

Technical field

The invention relates generally to methods and systems for detecting the ploidy of chromosome segments, and to methods and systems for detecting single nucleotide variants.

Technical background

Copy number variation (CNV) is considered a major source of structural variation in the genome. It includes duplications and deletions of sequences that typically range in length from 1,000 base pairs (1 kb) to 20 megabases (Mb). Deletions and duplications of chromosome segments or whole chromosomes are associated with a variety of conditions, such as susceptibility or resistance to disease.

Copy number variations are typically divided into two major classes based on the length of the affected sequence. The first class comprises copy number polymorphisms (CNPs), which are common in the general population, occurring with an overall frequency greater than 1%. CNPs are typically small (most are less than 10 kilobases in length), and they are often enriched for genes encoding proteins important in detoxification and immunity. A subset of these CNPs is highly variable in copy number. As a result, different human chromosomes can carry a wide range of copy numbers (e.g., 2, 3, 4, 5, etc.) for a particular set of genes. CNPs in immune response genes have recently been associated with susceptibility to complex genetic diseases, including psoriasis, Crohn's disease, and glomerulonephritis.

The second class of CNVs comprises relatively rare variants that are much longer than CNPs, ranging in size from hundreds of thousands of base pairs to more than 1 million base pairs. In some cases, these CNVs may have arisen during the production of the sperm or egg that gave rise to a particular individual, or they may have been passed down through several generations of a family. These large, rare structural variants have been observed disproportionately in subjects with intellectual disability, developmental delay, schizophrenia, and autism. Their presence in such subjects has led to speculation that large, rare CNVs may be more important in neurocognitive disease than other forms of inherited mutation, including single nucleotide substitutions.

Gene copy number can be altered in cancer cells. For example, duplication of Chr1p is common in breast cancer, and EGFR copy number is higher than normal in non-small cell lung cancer. Cancer is one of the leading causes of death; therefore, early diagnosis and treatment of cancer is important, because it can improve patient outcomes (for example, by increasing the likelihood and duration of remission). Early diagnosis can also allow patients to undergo fewer or less drastic treatments. Many current treatments that destroy cancer cells also affect normal cells, resulting in a variety of possible side effects, such as nausea, vomiting, low blood cell counts, increased risk of infection, hair loss, and ulcers of the mucous membranes. Early detection of cancer is therefore desirable, because it can reduce the amount and/or number of treatments (such as chemotherapeutic agents or radiation) needed to eliminate the cancer.

Copy number variation has also been associated with severe mental and physical handicaps and with idiopathic learning disability. Non-invasive prenatal testing (NIPT) using cell-free DNA (cfDNA) can be used to detect abnormalities such as fetal trisomies 13, 18, and 21, triploidy, and sex chromosome aneuploidies. Subchromosomal microdeletions, which can also cause severe mental and physical handicaps, are more difficult to detect because of their smaller size. Eight microdeletion syndromes have a combined incidence of more than 1 in 1,000, making them nearly as common as fetal autosomal trisomies.

In addition, a higher copy number of CCL3L1 has been associated with lower susceptibility to HIV infection, and a low copy number of FCGR3B (the CD16 cell surface immunoglobulin receptor) can increase susceptibility to systemic lupus erythematosus and similar inflammatory autoimmune disorders.

There is therefore a need for improved methods of detecting deletions and duplications of chromosome segments or whole chromosomes. Preferably, such methods can be used to diagnose disease, or an increased risk of disease, more accurately, for example by detecting copy number variation in cancer or in a gestating fetus.

Summary of the invention

In illustrative embodiments, the present invention provides a method for determining the ploidy of a chromosome segment in a sample from an individual. The method includes the following steps:

A. receiving allele frequency data comprising the amount of each allele present in the sample at each locus in a set of polymorphic loci on the chromosome segment;

B. generating phased allelic information for the set of polymorphic loci by estimating the phase of the allele frequency data;

C. generating, using the allele frequency data, individual probabilities of allele frequencies for the polymorphic loci for different ploidy states;

D. generating joint probabilities for the set of polymorphic loci using the individual probabilities and the phased allelic information; and

E. selecting, based on the joint probabilities, a best-fit model indicative of chromosomal ploidy, thereby determining the ploidy of the chromosome segment.
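To make steps A to E concrete, here is a minimal Python sketch; it is not the patented implementation. It assumes phasing (step B) has already oriented each heterozygous locus so that the read count for the allele on one homolog, called haplotype A here, is known. It also models read counts as binomial and forms the joint probability by treating loci as independent. The hypothesis names, the 20% affected-cell fraction `c`, and the read counts are illustrative assumptions.

```python
import math

def binom_logpmf(k, n, p):
    """Log binomial probability of k haplotype-A reads out of n at expected fraction p."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def select_ploidy(phased_counts, hypotheses):
    """phased_counts: list of (hap_a_reads, total_reads), one per heterozygous locus.
    hypotheses: dict mapping a ploidy-state name to the expected haplotype-A fraction."""
    joint = {}
    for name, p in hypotheses.items():
        # Step C: individual log-probabilities, one per locus.
        individual = [binom_logpmf(a, n, p) for a, n in phased_counts]
        # Step D: joint log-probability, assuming the loci are independent.
        joint[name] = sum(individual)
    # Step E: the best-fit model is the one with the largest joint probability.
    best = max(joint, key=joint.get)
    return best, joint

c = 0.2  # hypothetical fraction of cells carrying the ploidy change
hypotheses = {
    "disomy": 0.5,
    "deletion_of_hap_B": 1.0 / (2.0 - c),           # A : B = 1 : (1 - c)
    "duplication_of_hap_A": (1.0 + c) / (2.0 + c),  # A : B = (1 + c) : 1
}
counts = [(560, 1000), (548, 1000), (571, 1000), (555, 1000)]
best, scores = select_ploidy(counts, hypotheses)
print(best)  # deletion_of_hap_B
```

With these counts the pooled haplotype-A fraction is 0.5585, so the deletion model (expected 0.556) fits better than the duplication model (0.545) or disomy (0.5).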

In an illustrative embodiment of the method for determining ploidy, the data are generated from nucleic acid sequence data, in particular high-throughput nucleic acid sequence data. In certain illustrative embodiments of the method for determining ploidy, the allele frequency data are corrected for errors before they are used to generate the individual probabilities. In specific illustrative embodiments, the corrected errors include allele amplification efficiency bias. In other illustrative embodiments, the corrected errors include ambient contamination and genotype contamination. In some embodiments, the corrected errors include allele amplification efficiency bias, ambient contamination, and genotype contamination.
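As one hedged illustration of correcting allele amplification efficiency bias, assume (this model is not stated in the source) that at a given locus allele A amplifies with a relative efficiency e compared with allele B. A true allele-A fraction f is then observed as f_obs = e·f / (e·f + 1 − f), and the true fraction can be recovered by inverting that relation. In practice e would have to be estimated per locus, for example from training data. Ambient and genotype contamination are not modeled in this sketch.

```python
def biased_fraction(true_fraction, relative_efficiency):
    """Forward model: fraction observed when allele A amplifies e times as well as B."""
    f, e = true_fraction, relative_efficiency
    return e * f / (e * f + 1.0 - f)

def correct_amplification_bias(observed_fraction, relative_efficiency):
    """Invert f_obs = e*f / (e*f + 1 - f) to recover the true allele-A fraction f.

    relative_efficiency (e) is a hypothetical per-locus factor: the efficiency of
    allele A divided by that of allele B (e = 1 means no bias)."""
    f_obs, e = observed_fraction, relative_efficiency
    return f_obs / (e * (1.0 - f_obs) + f_obs)

# A balanced heterozygous locus (f = 0.5) where allele A amplifies twice as well.
observed = biased_fraction(0.5, 2.0)
print(round(observed, 4))                                    # 0.6667
print(round(correct_amplification_bias(observed, 2.0), 4))   # 0.5
```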

In certain embodiments of the method for determining ploidy, the individual probabilities are generated using a set of models of both different ploidy states and allelic imbalance fractions for the set of polymorphic loci. In these and other embodiments, the joint probabilities are generated by taking into account the linkage between polymorphic loci on the chromosome segment.
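The models combining ploidy states with allelic imbalance fractions can be made concrete as expected allele fractions. This is a minimal sketch under the assumption that the sample mixes normal disomic cells with a fraction c of cells whose copies of the two homologs (haplotypes A and B) are set by the ploidy state. The state names, the grid of c values, and the definition of imbalance as deviation from 0.5 are hypothetical.

```python
# Hypothetical ploidy states: (copies of haplotype A, copies of haplotype B)
# in the affected cells; normal cells are always (1, 1).
PLOIDY_STATES = {
    "disomy": (1, 1),
    "deletion_B": (1, 0),
    "duplication_A": (2, 1),
    "copy_neutral_LOH_A": (2, 0),
}

def expected_hap_a_fraction(state, c):
    """Expected haplotype-A read fraction when a fraction c of cells are in
    `state` and the remaining 1 - c are normal disomic cells."""
    a, b = PLOIDY_STATES[state]
    amount_a = (1.0 - c) + c * a
    amount_b = (1.0 - c) + c * b
    return amount_a / (amount_a + amount_b)

def model_grid(fractions):
    """Map each (state, c) model to its expected fraction; the allelic imbalance
    of a model is taken here (an assumption) as its deviation from 0.5."""
    return {(s, c): expected_hap_a_fraction(s, c)
            for s in PLOIDY_STATES for c in fractions}

grid = model_grid([0.01, 0.05, 0.10])
for (state, c), p in sorted(grid.items()):
    print(f"{state:20s} c={c:.2f} expected={p:.4f} imbalance={p - 0.5:+.4f}")
```

For a single-copy deletion the imbalance is c / (2(2 − c)), roughly c/4 for small c; at c = 0.05 that is about 0.0128.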

Accordingly, in an illustrative embodiment that combines these embodiments, provided herein is a method for detecting ploidy in a sample from an individual, comprising the following steps:

A. receiving nucleic acid sequence data for alleles at a set of polymorphic loci on a chromosome segment in the individual;

B. detecting allele frequencies at the set of loci using the nucleic acid sequence data;

C. correcting the detected allele frequencies for allele amplification efficiency bias to generate corrected allele frequencies for the set of polymorphic loci;

D. generating phased allelic information for the set of polymorphic loci by estimating the phase of the nucleic acid sequence data;

E. generating individual probabilities of allele frequencies for the polymorphic loci for different ploidy states by comparing the corrected allele frequencies to a set of models of different ploidy states and allelic imbalance fractions of the set of polymorphic loci;

F. generating joint probabilities for the set of polymorphic loci by combining the individual probabilities, taking into account the linkage between polymorphic loci on the chromosome segment; and

G. selecting, based on the joint probabilities, the best-fit model indicative of chromosomal aneuploidy.
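Steps F and G call for combining individual probabilities while accounting for linkage. One common way to do this, used here purely as an assumed illustration rather than the method of the source, is a two-state hidden Markov model. The hidden state at each locus records whether the estimated phase is correct or flipped, a phase switch between adjacent loci occurs with a probability that grows with their distance, and the forward algorithm sums over all switch patterns to give one joint likelihood per hypothesis. The switch model (Haldane's map function with an assumed 1 cM/Mb rate) and the data are hypothetical.

```python
import math

def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def loglik(k, n, p):
    # Binomial log-likelihood without the n-choose-k term, which is identical
    # for every hypothesis and cancels when hypotheses are compared.
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def switch_prob(distance_bp, morgans_per_bp=1e-8):
    """Hypothetical phase-switch probability from Haldane's map function."""
    r = 0.5 * (1.0 - math.exp(-2.0 * morgans_per_bp * distance_bp))
    return min(max(r, 1e-12), 0.5)

def linked_joint_loglik(loci, p):
    """loci: (position_bp, hap_a_reads, total_reads), sorted by position.
    Hidden state 0: estimated phase correct (expect p); 1: flipped (expect 1 - p)."""
    alpha = None
    prev_pos = None
    for pos, k, n in loci:
        e0, e1 = loglik(k, n, p), loglik(k, n, 1.0 - p)
        if alpha is None:
            alpha = (math.log(0.5) + e0, math.log(0.5) + e1)
        else:
            r = switch_prob(pos - prev_pos)
            stay, flip = math.log(1.0 - r), math.log(r)
            alpha = (e0 + log_add(alpha[0] + stay, alpha[1] + flip),
                     e1 + log_add(alpha[0] + flip, alpha[1] + stay))
        prev_pos = pos
    return log_add(*alpha)  # forward algorithm: sum over both final states

def best_hypothesis(loci, hypotheses):
    scores = {name: linked_joint_loglik(loci, p) for name, p in hypotheses.items()}
    return max(scores, key=scores.get), scores

loci = [(0, 600, 1000), (10_000, 590, 1000), (20_000, 610, 1000)]
best, scores = best_hypothesis(loci, {"balanced": 0.5, "imbalanced": 0.6})
print(best)  # imbalanced
```

Closely spaced loci rarely switch phase, so consistent evidence across them reinforces a single orientation; distant loci are allowed to flip almost freely.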

In another aspect, provided herein is a system for detecting ploidy in a sample from an individual, the system comprising:

A. an input processor configured to receive allele frequency data for a set of polymorphic loci on a chromosome segment, the data comprising the amount of each allele present in the sample at each locus;

B. a modeler configured to:

i. generate phased allelic information for the set of polymorphic loci by estimating the phase of the allele frequency data;

ii. generate, using the allele frequency data, individual probabilities of allele frequencies for the polymorphic loci for different ploidy states; and

iii. generate joint probabilities for the set of polymorphic loci using the individual probabilities and the phased allelic information; and

C. a hypothesis manager configured to select, based on the joint probabilities, a best-fit model indicative of chromosomal ploidy, thereby determining the ploidy of the chromosome segment.
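The division of labor among the system components can be sketched as plain Python classes. The class and method names are hypothetical and only mirror the input processor, modeler, and hypothesis manager described above. Phasing is assumed to be supplied as a per-locus flag, and the modeler uses a binomial likelihood with independent loci; both are simplifications made to keep the sketch short.

```python
import math
from dataclasses import dataclass

@dataclass
class LocusCounts:
    hap_a_reads: int   # reads for the allele phased to haplotype A
    total_reads: int

class InputProcessor:
    """Hypothetical: turns per-locus (ref, alt, alt_on_hap_a) rows into phased counts."""
    def parse(self, rows):
        loci = []
        for ref, alt, alt_on_hap_a in rows:
            hap_a = alt if alt_on_hap_a else ref
            loci.append(LocusCounts(hap_a, ref + alt))
        return loci

class Modeler:
    """Hypothetical: binomial individual probabilities, summed over independent loci."""
    def __init__(self, hypotheses):
        self.hypotheses = hypotheses  # name -> expected haplotype-A fraction

    def joint_log_probabilities(self, loci):
        return {
            name: sum(loc.hap_a_reads * math.log(p)
                      + (loc.total_reads - loc.hap_a_reads) * math.log(1.0 - p)
                      for loc in loci)
            for name, p in self.hypotheses.items()
        }

class HypothesisManager:
    """Hypothetical: picks the hypothesis with the largest joint log-probability."""
    def select(self, joint):
        return max(joint, key=joint.get)

rows = [(45, 55, True), (58, 42, False), (44, 56, True)]
loci = InputProcessor().parse(rows)
joint = Modeler({"disomy": 0.5, "gain_of_hap_A": 0.55}).joint_log_probabilities(loci)
print(HypothesisManager().select(joint))  # gain_of_hap_A
```

Keeping the three roles separate makes it straightforward to slot in an error correction unit between the input processor and the modeler.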

In certain embodiments of the system, the allele frequency data are generated by a nucleic acid sequencing system. In certain embodiments, the system further comprises an error correction unit configured to correct errors in the allele frequency data, and the corrected allele frequency data are then used by the modeler to generate the individual probabilities. In some embodiments, the error correction unit corrects for allele amplification efficiency bias. In certain embodiments, the modeler generates the individual probabilities using a set of models of both different ploidy states and allelic imbalance fractions for the set of polymorphic loci. In some illustrative embodiments, the modeler generates the joint probabilities by taking into account the linkage between polymorphic loci on the chromosome segment.

In illustrative embodiments, provided herein is a system for detecting chromosomal ploidy in a sample from an individual, comprising:

A. an input processor configured to receive nucleic acid sequence data for alleles at a set of polymorphic loci on a chromosome segment in the individual, and to detect allele frequencies at the set of loci using the nucleic acid sequence data;

B. an error correction unit configured to correct errors in the detected allele frequencies and generate corrected allele frequencies for the set of polymorphic loci;

C. a modeler configured to:

i. generate phased allelic information for the set of polymorphic loci by estimating the phase of the nucleic acid sequence data;

ii. generate individual probabilities of allele frequencies for the polymorphic loci for different ploidy states by comparing the phased allelic information to a set of models of different ploidy states and allelic imbalance fractions of the set of polymorphic loci; and

iii. generate joint probabilities for the set of polymorphic loci by combining the individual probabilities, taking into account the relative distance between polymorphic loci on the chromosome segment; and

D. a hypothesis manager configured to select, based on the joint probabilities, the best-fit model indicative of chromosomal aneuploidy.

In certain aspects, the invention provides a method for determining whether circulating tumor nucleic acids are present in a sample from an individual, comprising:

A. analyzing the sample to determine the ploidy at a set of polymorphic loci on a chromosome segment in the individual; and

B. determining, based on the ploidy determination, the level of allelic imbalance present at the polymorphic loci, wherein an allelic imbalance equal to or greater than 0.4%, 0.45%, or 0.5% indicates the presence of circulating tumor nucleic acids in the sample.

In certain embodiments, the method for determining whether circulating tumor nucleic acids are present further comprises detecting a single nucleotide variant at a single nucleotide variant site in a set of single nucleotide variant sites, wherein detecting an allelic imbalance equal to or greater than 0.45%, detecting the single nucleotide variant, or both, indicates the presence of circulating tumor nucleic acids in the sample.
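The decision rule in these two paragraphs can be sketched in a few lines. This assumes the ploidy analysis of step A has already produced read counts phased to the haplotype expected to be over-represented. It also uses a hypothetical definition of allelic imbalance (the pooled haplotype-A read fraction minus 0.5) and applies the threshold to the point estimate without modeling sampling noise.

```python
def allelic_imbalance(phased_counts):
    """Hypothetical definition: |pooled haplotype-A read fraction - 0.5| over
    phased heterozygous loci. phased_counts: list of (hap_a_reads, total_reads)."""
    hap_a = sum(k for k, _ in phased_counts)
    total = sum(n for _, n in phased_counts)
    return abs(hap_a / total - 0.5)

def ctdna_present(phased_counts, snv_detected, ai_threshold=0.0045):
    """Flag circulating tumor nucleic acid when allelic imbalance reaches the
    threshold (0.45%, one of the values named in the text) or an SNV was called."""
    return allelic_imbalance(phased_counts) >= ai_threshold or snv_detected

low_ai = [(5030, 10000), (5040, 10000)]    # imbalance 0.35%
high_ai = [(5060, 10000), (5050, 10000)]   # imbalance 0.55%
print(ctdna_present(low_ai, snv_detected=False))   # False
print(ctdna_present(low_ai, snv_detected=True))    # True
print(ctdna_present(high_ai, snv_detected=False))  # True
```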

In certain embodiments, the analyzing step of the method for determining whether circulating tumor nucleic acids are present comprises analyzing a set of chromosome segments known to exhibit aneuploidy in cancer. In certain embodiments, the analyzing step comprises analyzing between 1,000 and 5,000 polymorphic loci, or between 100 and 1,000 polymorphic loci, for ploidy.

In certain aspects, provided herein is a method for detecting single nucleotide variants in a sample. Accordingly, provided herein is a method for determining whether a single nucleotide variant is present at a set of genomic positions in a sample from an individual, the method comprising:

A. for each genomic position, generating, using a training data set, an estimate of efficiency and a per-cycle error rate for an amplicon spanning that genomic position;

B. receiving observed nucleotide identity information for each genomic position in the sample;

C. determining a set of probabilities of single nucleotide variant percentages resulting from one or more real mutations at each genomic position, by comparing the observed nucleotide identity information at each genomic position to a model of different variant percentages, using the estimated amplification efficiency and the per-cycle error rate estimated independently for each genomic position; and

D. determining, from the set of probabilities for each genomic position, the most likely real variant percentage and a confidence.
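Steps A to D can be pictured with a simple PCR error model. The model here is an assumption for illustration, not the source's: each template is copied with efficiency p per cycle, and each copy acquires a particular substitution with probability e. Under that model the background error fraction after k cycles follows the recursion f ← f + p·e·(1 − f)/(1 + p). Observed variant reads are then modeled as binomial with mean m + (1 − m)·f for a true mutant fraction m, a grid of m values is scored, and the most likely m is reported with a log-likelihood ratio against m = 0 as a crude confidence. The efficiency, error rate, cycle count, and grid are hypothetical.

```python
import math

def background_fraction(efficiency, error_per_cycle, cycles):
    """Expected fraction of molecules carrying a PCR-introduced substitution after
    `cycles` rounds, via f <- f + p*e*(1 - f)/(1 + p) (hypothetical model)."""
    f = 0.0
    for _ in range(cycles):
        f += efficiency * error_per_cycle * (1.0 - f) / (1.0 + efficiency)
    return f

def loglik(variant_reads, depth, q):
    q = min(max(q, 1e-12), 1.0 - 1e-12)  # keep log() finite
    return variant_reads * math.log(q) + (depth - variant_reads) * math.log(1.0 - q)

def call_snv(variant_reads, depth, efficiency, error_per_cycle, cycles,
             step=1e-4, max_fraction=0.05):
    """Return (most likely true variant fraction, log-likelihood ratio vs 0)."""
    bg = background_fraction(efficiency, error_per_cycle, cycles)
    grid = [i * step for i in range(int(round(max_fraction / step)) + 1)]
    # Observed variant fraction = true mutant fraction plus errors on the rest.
    scores = [loglik(variant_reads, depth, m + (1.0 - m) * bg) for m in grid]
    best = max(range(len(grid)), key=scores.__getitem__)
    return grid[best], scores[best] - scores[0]

fraction, llr = call_snv(50, 10_000, efficiency=0.9, error_per_cycle=1e-5, cycles=15)
print(f"{fraction:.4f}", round(llr, 1))
```

Because the background term grows with cycle count and efficiency, the same 0.5% observed variant fraction is partly explained away as PCR error, which is why the estimate lands slightly below 0.005.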

In illustrative embodiments of the method for determining whether a single nucleotide variant is present, the estimates of efficiency and per-cycle error rate are generated for a set of amplicons spanning the genomic position. For example, 2, 3, 4, 5, 10, 15, 20, 25, 50, 100, or more amplicons spanning the genomic position can be included. In certain embodiments of this method, the limit of detection for one or more single nucleotide variants is 0.015%, 0.017%, or 0.02%.

In illustrative embodiments of the method for determining whether a single nucleotide variant is present, the observed nucleotide identity information comprises the total number of reads observed at each genomic position and the number of variant allele reads observed at each genomic position.

In illustrative embodiments of the method for determining whether a single nucleotide variant is present, the sample is a plasma sample and the single nucleotide variant is present in circulating tumor DNA of the sample.

In another embodiment, provided herein is a method for detecting one or more single nucleotide variants in a test sample from an individual. The method according to this embodiment comprises the following steps:

A. determining, based on data generated in a sequencing run, an average variant allele frequency at each single nucleotide variant site of a set of single nucleotide variant sites across a plurality of reference samples from normal individuals; selecting the single nucleotide variant sites whose average variant allele frequency in the normal samples is below a threshold value; and determining a background error for each selected single nucleotide variant site after removing outliers at that site;

B. determining, based on data generated in a sequencing run of the test sample, an observed depth-weighted mean and variance for the selected single nucleotide variant sites in the test sample; and

C. identifying, using a computer, one or more single nucleotide variant sites at which the depth-weighted mean is statistically significant compared with the background error for that site, thereby detecting the one or more single nucleotide variants.
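A compact sketch of steps A to C follows. Several choices here are assumptions made for illustration, not details given in the source: dropping values more than three median absolute deviations from the median as outliers, the 1% site-selection threshold, a binomial variance for the test-sample mean, a one-sided normal test, and the significance level. The reference panel is far smaller than a real one, and the site names are placeholders.

```python
import math
import statistics

def background_error(reference_vafs, max_mean_vaf=0.01, mad_k=3.0):
    """reference_vafs: dict site -> list of VAFs, one per normal reference sample.
    Returns site -> (background mean, background SD) for sites that pass selection."""
    background = {}
    for site, vafs in reference_vafs.items():
        if statistics.mean(vafs) >= max_mean_vaf:
            continue  # high VAF in normals (e.g. a common polymorphism): not selected
        med = statistics.median(vafs)
        mad = statistics.median([abs(v - med) for v in vafs]) or 1e-12
        kept = [v for v in vafs if abs(v - med) <= mad_k * mad]  # drop outliers
        if len(kept) >= 2:
            background[site] = (statistics.mean(kept), statistics.stdev(kept))
    return background

def score_site(observations, bg_mean, bg_sd, alpha=1e-3):
    """observations: (variant_reads, depth) pairs for one site in the test sample,
    e.g. from several amplicons. Returns (weighted mean, z, p-value, called)."""
    alt = sum(a for a, _ in observations)
    depth = sum(d for _, d in observations)
    mean = alt / depth                     # depth-weighted mean VAF
    var = mean * (1.0 - mean) / depth      # binomial variance of that mean
    z = (mean - bg_mean) / math.sqrt(max(bg_sd ** 2 + var, 1e-24))
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # one-sided normal tail
    return mean, z, p_value, p_value < alpha

refs = {
    "site1": [0.0010, 0.0012, 0.0008, 0.0011, 0.0009, 0.0200],  # last value is an outlier
    "site2": [0.50, 0.48, 0.52],                                # germline SNP in normals
}
bg = background_error(refs)
print(sorted(bg))                                             # ['site1']
print(score_site([(30, 5000), (28, 5000)], *bg["site1"])[3])  # True
```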

In certain embodiments of this method for detecting one or more SNVs, the sample is a plasma sample, the reference samples are plasma samples, and the one or more detected single nucleotide variants are present in circulating tumor DNA of the sample. In certain embodiments of this method, the plurality of reference samples includes at least 25 samples. In certain embodiments of this method, outliers are removed from the data generated in the high-throughput sequencing run before the observed depth-weighted mean and the observed variance are calculated. In certain embodiments of this method, the read depth at each single nucleotide variant site in the test sample is at least 100 reads.

In certain embodiments of this method for detecting one or more SNVs, the sequencing run comprises a multiplex amplification reaction performed under limiting-primer reaction conditions. In certain embodiments of this method, the limit of detection is 0.015%, 0.017%, or 0.02%.

In one aspect, the present invention features a method for determining whether the copy number of a first homologous chromosome segment is over-represented, as compared with a second homologous chromosome segment, in the genome of one or more cells from an individual. In some embodiments, the method includes: obtaining phased genetic data for the first homologous chromosome segment, comprising the identity of the allele present at each locus in a set of polymorphic loci on the first homologous chromosome segment; obtaining phased genetic data for the second homologous chromosome segment, comprising the identity of the allele present at each locus in the set of polymorphic loci on the second homologous chromosome segment; and obtaining measured genetic allelic data comprising, for each allele at each locus in the set of polymorphic loci, the amount of that allele present in a sample of DNA or RNA from one or more cells of the individual. In some embodiments, the method includes enumerating a set of one or more hypotheses specifying the degree of over-representation of the first homologous chromosome segment in the genome of the one or more cells of the individual; calculating (e.g., on a computer) the likelihood of each hypothesis based on the obtained genetic data of the sample and the obtained phased genetic data; and selecting the hypothesis with the greatest likelihood, thereby determining the degree of over-representation of the copy number of the first homologous chromosome segment in the genome of the one or more cells of the individual. In some embodiments, the phased data include inferred phased data obtained using population-based haplotype frequencies, and measured phased data (e.g., phased data obtained by measuring a sample containing DNA or RNA from the individual or from a relative of the individual).

In one aspect, the present invention features a method for determining whether the copy number of a first homologous chromosome segment is over-represented, as compared with a second homologous chromosome segment, in the genome of one or more cells from an individual. In some embodiments, the method includes: obtaining phased genetic data for the first homologous chromosome segment, comprising the identity of the allele present at each locus in a set of polymorphic loci on the first homologous chromosome segment; obtaining phased genetic data for the second homologous chromosome segment, comprising the identity of the allele present at each locus in the set of polymorphic loci on the second homologous chromosome segment; and obtaining measured genetic allelic data comprising, for each allele at each locus in the set of polymorphic loci, the amount of that allele present in a sample of DNA or RNA from one or more cells of the individual. In some embodiments, the method includes enumerating a set of one or more hypotheses specifying the degree of over-representation of the first homologous chromosome segment; calculating, for each hypothesis, expected genetic data for a plurality of loci in the sample from the obtained phased genetic data; calculating (e.g., on a computer) the data fit between the obtained genetic data of the sample and the expected genetic data for the sample; ranking the one or more hypotheses according to the data fit; and selecting the highest-ranked hypothesis, thereby determining the degree of over-representation of the copy number of the first homologous chromosome segment in the genome of the one or more cells of the individual.
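The "calculate expected data, measure fit, rank" loop can be sketched as follows. The hypotheses are hypothetical copy ratios of homolog 1 to homolog 2 (1:1, 3:2, 2:1), the expected data are the resulting fractions of reads carrying the homolog-1 allele at phased heterozygous loci, and the fit is a depth-weighted sum of squared residuals. That fit metric is chosen only for illustration; the source does not specify one.

```python
def rank_hypotheses(phased_counts, ratios):
    """phased_counts: list of (homolog1_reads, total_reads) at phased heterozygous loci.
    ratios: dict name -> (copies of homolog 1, copies of homolog 2)."""
    fits = {}
    for name, (c1, c2) in ratios.items():
        expected = c1 / (c1 + c2)  # expected homolog-1 read fraction
        # Depth-weighted squared residual between observed and expected fraction.
        fits[name] = sum(n * (k / n - expected) ** 2 for k, n in phased_counts)
    return sorted(fits, key=fits.get)  # best fit (smallest residual) first

ranking = rank_hypotheses(
    [(612, 1000), (590, 1000), (605, 1000)],
    {"1:1": (1, 1), "3:2": (3, 2), "2:1": (2, 1)},
)
print(ranking[0])  # 3:2
```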

In one aspect, the present invention features a method for determining whether the copy number of a first homologous chromosome segment is over-represented, as compared with a second homologous chromosome segment, in the genome of one or more cells from an individual. In some embodiments, the method includes: obtaining phased genetic data for the first homologous chromosome segment, comprising the identity of the allele present at each locus in a set of polymorphic loci on the first homologous chromosome segment; obtaining phased genetic data for the second homologous chromosome segment, comprising the identity of the allele present at each locus in the set of polymorphic loci on the second homologous chromosome segment; and obtaining measured genetic allelic data comprising, for each allele at each locus in the set of polymorphic loci, the amount of that allele present in a sample of DNA or RNA from one or more target cells and one or more non-target cells from the individual. In some embodiments, the method includes enumerating a set of one or more hypotheses specifying the degree of over-representation of the first homologous chromosome segment; calculating (e.g., on a computer), for each hypothesis, expected genetic data for a plurality of loci in the sample from the obtained phased genetic data, for one or more possible ratios of DNA or RNA from the one or more target cells to the total DNA or RNA in the sample; calculating (e.g., on a computer), for each possible ratio of DNA or RNA and for each hypothesis, the data fit between the obtained genetic data of the sample and the expected genetic data of the sample for that possible ratio and that hypothesis; ranking the one or more hypotheses according to the data fit; and selecting the highest-ranked hypothesis, thereby determining the degree of over-representation of the copy number of the first homologous chromosome segment in the genome of the one or more cells from the individual.

In one aspect, the present invention features a method for determining whether the copy number of a first homologous chromosome segment is over-represented, as compared with a second homologous chromosome segment, in the genome of one or more cells from an individual. In some embodiments, the method includes: obtaining phased genetic data for the first homologous chromosome segment, comprising the identity of the allele present at each locus in a set of polymorphic loci on the first homologous chromosome segment; obtaining phased genetic data for the second homologous chromosome segment, comprising the identity of the allele present at each locus in the set of polymorphic loci on the second homologous chromosome segment; and obtaining measured genetic allelic data comprising, for each allele at each locus in the set of polymorphic loci, the amount of that allele present in a sample of DNA or RNA from one or more target cells and one or more non-target cells from the individual. In some embodiments, the method includes enumerating a set of one or more hypotheses specifying the degree of over-representation of the first homologous chromosome segment; calculating (e.g., on a computer), for each hypothesis, expected genetic data for a plurality of loci in the sample from the obtained phased genetic data, for one or more possible ratios of DNA or RNA from the one or more target cells to the total DNA or RNA in the sample; calculating (e.g., on a computer), for each locus in the plurality of loci and for each possible ratio of DNA or RNA, the likelihood that each hypothesis is correct by comparing the obtained genetic data of the sample at that locus to the expected genetic data at that locus for that possible ratio and that hypothesis; determining a joint likelihood for each hypothesis by combining the likelihoods of that hypothesis across the loci and the possible ratios; and selecting the hypothesis with the greatest joint likelihood, thereby determining the degree of over-representation of the copy number of the first homologous chromosome segment. In some embodiments, all loci are considered together to calculate the likelihood of a particular hypothesis, and the hypothesis with the greatest likelihood is selected.
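A minimal sketch of combining per-locus likelihoods into a joint likelihood over hypotheses and possible target-cell fractions follows. Several choices here are assumptions: target cells carry h1 and h2 copies of homologs 1 and 2 while non-target cells are (1, 1), reads are binomial, and the sketch maximizes over a grid of fractions rather than summing over them. The hypothesis names and data are hypothetical. Several (hypothesis, fraction) pairs can predict similar allele ratios, so a real implementation would need further constraints; this example only shows the mechanics.

```python
import math

def hap1_fraction(h1, h2, c):
    """Expected homolog-1 read fraction when a fraction c of cells are target cells
    with h1 and h2 copies of homologs 1 and 2, and the rest are normal (1, 1)."""
    a = (1.0 - c) + c * h1
    b = (1.0 - c) + c * h2
    return a / (a + b)

def site_loglik(k, n, p):
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def select_hypothesis(counts, hypotheses, fractions):
    """Joint log-likelihood over all loci for every (hypothesis, fraction) pair;
    return the best (hypothesis name, target fraction, joint log-likelihood)."""
    best = None
    for name, (h1, h2) in hypotheses.items():
        for c in fractions:
            p = hap1_fraction(h1, h2, c)
            joint = sum(site_loglik(k, n, p) for k, n in counts)  # combine loci
            if best is None or joint > best[2]:
                best = (name, c, joint)
    return best

counts = [(588, 1000), (590, 1000), (585, 1000), (592, 1000)]
hypotheses = {"no_overrepresentation": (1, 1), "homolog1_gain": (2, 1)}
fractions = [i / 100 for i in range(1, 100)]  # candidate target-cell fractions
name, c, _ = select_hypothesis(counts, hypotheses, fractions)
print(name, c)  # homolog1_gain 0.43
```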

In one aspect, the present invention features a method for determining the copy number of a chromosome segment of interest in the genome of a fetus. In some embodiments, the method includes obtaining phased genetic data for at least one biological parent of the fetus, wherein the phased genetic data comprise the identity of the allele present at each locus in a set of polymorphic loci on a first homologous chromosome segment and a second homologous chromosome segment, the pair of homologous chromosome segments comprising the chromosome segment of interest. In some embodiments, the method includes obtaining genetic data at the set of polymorphic loci on the chromosome segment of interest in a mixed sample of DNA or RNA comprising fetal DNA or RNA and maternal DNA or RNA from the mother of the fetus, by measuring the amount of each allele at each locus. In some embodiments, the method includes enumerating a set of one or more hypotheses specifying the copy number of the chromosome segment of interest present in the fetal genome. In some embodiments, the method includes enumerating a set of one or more hypotheses specifying, for one or both parents of the fetus, the copy number of the first homologous chromosome segment (or a portion thereof) from that parent in the fetal genome, the copy number of the second homologous chromosome segment (or a portion thereof) from that parent in the fetal genome, and the total copy number of the chromosome segment of interest present in the fetal genome. In some embodiments, the method includes calculating (e.g., on a computer), for each hypothesis, expected genetic data for a plurality of loci in the mixed sample from the obtained phased genetic data of the parent(s); calculating (e.g., on a computer) the data fit between the obtained genetic data of the mixed sample and the expected genetic data of the mixed sample; ranking the one or more hypotheses according to the data fit; and selecting the highest-ranked hypothesis, thereby determining the copy number of the chromosome segment of interest in the fetal genome.

In one aspect, the invention features a method for determining the copy number of a chromosome segment of interest in the genome of a fetus. In some embodiments, the method includes obtaining phased genetic data for at least one biological parent of the fetus, wherein the phased genetic data comprise the identity of the allele present at each locus in a set of polymorphic loci on a first homologous chromosome segment and on a second homologous chromosome segment in the parent. In some embodiments, the method includes obtaining genetic data at the set of polymorphic loci on the chromosome or chromosome segment by measuring the quantity of each allele at each locus in a mixed sample of DNA or RNA comprising fetal DNA or RNA and maternal DNA or RNA from the mother of the fetus. In some embodiments, the method includes enumerating a set of one or more hypotheses specifying the number of copies of the chromosome or chromosome segment of interest present in the genome of the fetus. In some embodiments, the method includes creating (such as creating on a computer), for each hypothesis, a probability distribution of the expected quantity of each allele at each of the plurality of loci in the mixed sample from (i) the obtained phased genetic data of the parent and, optionally, (ii) the probability of one or more crossovers that may have occurred during the formation of a gamete that contributed a copy of the chromosome or chromosome segment of interest to the fetus; calculating (such as calculating on a computer), for each hypothesis, a fit between (1) the obtained genetic data of the mixed sample and (2) the probability distribution of the expected quantity of each allele at each of the plurality of loci in the mixed sample; ranking one or more of the hypotheses according to the data fit; and selecting the highest-ranked hypothesis, thereby determining the copy number of the chromosome segment of interest in the genome of the fetus.
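The fit-rank-select loop described above can be illustrated with a minimal Python sketch. It assumes a simple binomial read-count model and assumes that, for each hypothesis, the expected fraction of reads carrying allele A at each locus has already been computed; the function names `binomial_log_likelihood` and `rank_hypotheses` and the hypothesis labels are hypothetical and are not taken from the source.

```python
import math


def binomial_log_likelihood(a_reads, total_reads, expected_fraction):
    """Log-probability of a_reads allele-A reads among total_reads, assuming
    binomial sampling with success probability expected_fraction."""
    p = min(max(expected_fraction, 1e-12), 1 - 1e-12)  # avoid log(0)
    log_choose = (math.lgamma(total_reads + 1) - math.lgamma(a_reads + 1)
                  - math.lgamma(total_reads - a_reads + 1))
    return log_choose + a_reads * math.log(p) + (total_reads - a_reads) * math.log(1 - p)


def rank_hypotheses(observed, expected_by_hypothesis):
    """observed: list of (allele-A reads, total reads), one tuple per locus.
    expected_by_hypothesis: label -> list of expected allele-A fractions, one per
    locus, in the same order as observed.
    Returns (label, log-likelihood) pairs sorted from best to worst data fit."""
    fits = {}
    for label, expected in expected_by_hypothesis.items():
        fits[label] = sum(binomial_log_likelihood(a, n, p)
                          for (a, n), p in zip(observed, expected))
    return sorted(fits.items(), key=lambda item: item[1], reverse=True)


# Toy data: the observed read fractions match the "disomy" expectations exactly.
observed = [(50, 100), (30, 100)]
expected = {"disomy": [0.5, 0.3], "trisomy": [0.6, 0.4]}
best_label, best_fit = rank_hypotheses(observed, expected)[0]
print(best_label)  # disomy
```

The log-likelihood serves here as the data fit; other fit measures (for example, the beta-binomial model mentioned later) could be substituted without changing the ranking logic.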

In some embodiments, the method includes obtaining phased genetic data for the mother of the fetus. In some embodiments, the method includes enumerating a set of one or more hypotheses specifying the number of copies in the genome of the fetus of the first maternal homologous chromosome segment (or a portion thereof), the number of copies in the genome of the fetus of the second maternal homologous chromosome segment (or a portion thereof), and the total number of copies of the chromosome segment of interest present in the genome of the fetus. In some embodiments, the method includes calculating, for each hypothesis, the expected genetic data at the plurality of loci in the mixed sample from the obtained maternal phased genetic data.

In some embodiments, the expected genetic data for each hypothesis include the identity and quantity of one or more alleles at each of the plurality of loci in the maternal DNA or RNA and in the fetal DNA or RNA of the mixed sample. In some embodiments, calculating (such as calculating on a computer) the expected genetic data includes measuring the fraction of fetal DNA or RNA and the fraction of maternal DNA or RNA in the mixed sample. In some embodiments, the method includes calculating, for each of the plurality of loci, the expected quantity of one or more alleles at that locus in the maternal DNA or RNA of the mixed sample, using the identity of the allele present at that locus in the obtained maternal phased genetic data and the fraction of maternal DNA or RNA in the mixed sample. In some embodiments, the method includes calculating (such as calculating on a computer), for each of the plurality of loci under each hypothesis, the expected quantity of one or more alleles at that locus in the fetal DNA or RNA of the mixed sample inherited from the mother, using (i) the identity of the allele present at that locus on the first or second maternal homologous chromosome segment that the hypothesis specifies the fetus inherited, (ii) the number of copies of the first or second maternal homologous chromosome segment that the hypothesis specifies the fetus inherited, and (iii) the fraction of fetal DNA or RNA in the mixed sample.
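As a hedged illustration of the maternal calculation, the sketch below assumes that each maternal haploid copy contributes (1 - f)/2 and each fetal haploid copy contributes f/2 of the DNA at a locus, where f is the fetal fraction. This normalization and the function name `expected_allele_quantities` are assumptions made for the example, not details given in the source.

```python
def expected_allele_quantities(maternal_haplotypes, inherited_copies, fetal_fraction, allele="A"):
    """Expected quantity of `allele` at one locus, as a fraction of total DNA.

    maternal_haplotypes: (allele on maternal homolog 1, allele on maternal homolog 2).
    inherited_copies: (copies of homolog 1, copies of homolog 2) that the hypothesis
        says the fetus inherited from the mother.
    Returns (quantity in maternal DNA, quantity in maternally inherited fetal DNA).
    """
    f = fetal_fraction
    # The mother is disomic: each homolog carries half of the maternal DNA.
    in_maternal = sum((1 - f) / 2 for a in maternal_haplotypes if a == allele)
    # Each fetal copy inherited from the mother is assumed to contribute f / 2.
    in_fetal = sum(n * f / 2 for a, n in zip(maternal_haplotypes, inherited_copies) if a == allele)
    return in_maternal, in_fetal


# Mother is A|B; hypothesis: the fetus inherited one copy of maternal homolog 1 (allele A).
print(expected_allele_quantities(("A", "B"), (1, 0), 0.10))  # approximately (0.45, 0.05)
```

A hypothesis such as maternal nondisjunction would simply pass (1, 1) or (2, 0) as `inherited_copies`.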

In some embodiments, the expected genetic data for each hypothesis include the identity and quantity of one or more alleles at each of the plurality of loci in the maternal DNA or RNA and in the fetal DNA or RNA of the mixed sample. In some embodiments, calculating the expected genetic data includes measuring the fraction of fetal DNA or RNA and the fraction of maternal DNA or RNA in the mixed sample. In some embodiments, the method includes calculating (such as calculating on a computer), for each of the plurality of loci, the expected quantity of one or more alleles at that locus in the maternal DNA or RNA of the mixed sample, using the identity of the allele present at that locus in the obtained maternal phased genetic data and the fraction of maternal DNA or RNA in the mixed sample. In some embodiments, the method includes calculating (such as calculating on a computer), for each of the plurality of loci under each hypothesis, the expected quantity of one or more alleles at that locus in the fetal DNA or RNA of the mixed sample inherited from the mother and from the father, using (i) the identity of the allele present at that locus on the first or second maternal homologous chromosome segment that the hypothesis specifies the fetus inherited, (ii) the number of copies of the first or second maternal homologous chromosome segment that the hypothesis specifies the fetus inherited, (iii) the identity of the allele present at that locus on the first or second paternal homologous chromosome segment that the hypothesis specifies the fetus inherited, (iv) the number of copies of the first or second paternal homologous chromosome segment that the hypothesis specifies the fetus inherited, and (v) the fraction of fetal DNA or RNA in the mixed sample. In some embodiments, population frequencies are used to predict the identity of the alleles on the first or second paternal homologous chromosome segment. In some embodiments, each possible allele at each locus on the first or second paternal homologous chromosome segment is considered equally probable.
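When the paternal allele is unknown, the expected allele fraction becomes a small distribution rather than a single value. The sketch below keeps the same assumed normalization as before (each maternal copy (1 - f)/2, each fetal copy f/2) and additionally assumes that all paternally inherited copies carry the same allele; `paternal_a_frequency` can be 0.5 (equally probable alleles) or a population allele frequency. All names are hypothetical.

```python
def expected_fraction_distribution(maternal_haplotypes, maternal_copies, paternal_copies,
                                   fetal_fraction, paternal_a_frequency=0.5):
    """Possible expected fractions of allele-A reads at one locus, with probabilities.

    maternal_haplotypes: phased maternal alleles, e.g. ("A", "B").
    maternal_copies: copies of maternal homolog 1 and 2 inherited by the fetus (hypothesis).
    paternal_copies: copies of paternal material inherited by the fetus (hypothesis);
        assumed here to all carry the same, unknown allele.
    paternal_a_frequency: probability that the paternal allele is A.
    Returns a list of (expected allele-A fraction, probability) pairs.
    """
    f = fetal_fraction
    maternal_a = sum((1 - f) / 2 for a in maternal_haplotypes if a == "A")
    fetal_maternal_a = sum(n * f / 2 for a, n in zip(maternal_haplotypes, maternal_copies) if a == "A")
    # Total DNA at this locus: disomic mother plus every fetal copy.
    total = (1 - f) + (sum(maternal_copies) + paternal_copies) * f / 2
    outcomes = []
    for paternal_allele, prob in (("A", paternal_a_frequency), ("B", 1 - paternal_a_frequency)):
        paternal_a = paternal_copies * f / 2 if paternal_allele == "A" else 0.0
        outcomes.append(((maternal_a + fetal_maternal_a + paternal_a) / total, prob))
    return outcomes


# Mother A|A, fetus inherited maternal homolog 1 once plus one paternal copy, f = 0.2.
print(expected_fraction_distribution(("A", "A"), (1, 0), 1, 0.2))
```

Here the two outcomes are an expected A-fraction of about 1.0 (paternal allele A) or 0.9 (paternal allele B), each with probability 0.5.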

In some embodiments, the method includes obtaining phased genetic data for the mother and the father of the fetus. In some embodiments, the method includes enumerating a set of one or more hypotheses specifying the number of copies in the genome of the fetus of the first maternal homologous chromosome segment (or a portion thereof), of the second maternal homologous chromosome segment (or a portion thereof), of the first paternal homologous chromosome segment (or a portion thereof), and of the second paternal homologous chromosome segment (or a portion thereof), as well as the total number of copies of the chromosome segment of interest present in the genome of the fetus. In some embodiments, the method includes calculating (such as calculating on a computer), for each hypothesis, the expected genetic data at the plurality of loci in the mixed sample from the obtained maternal phased genetic data and the obtained paternal phased genetic data.
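A hypothesis set of this kind can be enumerated mechanically. The sketch below represents each hypothesis as a tuple of copy counts (maternal homolog 1, maternal homolog 2, paternal homolog 1, paternal homolog 2) and assumes, for illustration only, that each parent's gamete contributes between 0 and `max_copies_per_parent` copies; the function name and limits are hypothetical.

```python
from itertools import product


def enumerate_hypotheses(total_copy_numbers=(1, 2, 3), max_copies_per_parent=2):
    """Return hypotheses as (maternal h1, maternal h2, paternal h1, paternal h2) copy counts.

    Each parent's gamete is assumed to contribute between 0 and max_copies_per_parent
    copies of the segment; only hypotheses whose total copy number is in
    total_copy_numbers are kept.
    """
    hypotheses = []
    for m1, m2, p1, p2 in product(range(max_copies_per_parent + 1), repeat=4):
        if m1 + m2 > max_copies_per_parent or p1 + p2 > max_copies_per_parent:
            continue
        if m1 + m2 + p1 + p2 in total_copy_numbers:
            hypotheses.append((m1, m2, p1, p2))
    return hypotheses


disomy = enumerate_hypotheses(total_copy_numbers=(2,))
print(len(disomy))  # 10
```

The ten disomy hypotheses include the four biparental combinations as well as uniparental cases such as (2, 0, 0, 0) and (1, 1, 0, 0).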

In some embodiments, the expected genetic data for each hypothesis include the identity and quantity of one or more alleles at each of the plurality of loci in the maternal DNA or RNA and in the fetal DNA or RNA of the mixed sample. In some embodiments, calculating the expected genetic data includes measuring the fraction of fetal DNA or RNA and the fraction of maternal DNA or RNA in the mixed sample. In some embodiments, the method includes calculating (such as calculating on a computer), for each of the plurality of loci, the expected quantity of one or more alleles at that locus in the maternal DNA or RNA of the mixed sample, using the identity of the allele present at that locus in the obtained maternal phased genetic data and the fraction of maternal DNA or RNA in the mixed sample. In some embodiments, the method includes calculating (such as calculating on a computer), for each of the plurality of loci under each hypothesis, the expected quantity of one or more alleles at that locus in the fetal DNA or RNA of the mixed sample, using (i) the identity of the allele present at that locus on the first or second maternal homologous chromosome segment that the hypothesis specifies the fetus inherited, (ii) the number of copies of the first or second maternal homologous chromosome segment that the hypothesis specifies the fetus inherited, (iii) the identity of the allele present at that locus on the first or second paternal homologous chromosome segment that the hypothesis specifies the fetus inherited, (iv) the number of copies of the first or second paternal homologous chromosome segment that the hypothesis specifies the fetus inherited, and (v) the fraction of fetal DNA or RNA in the mixed sample.

In some embodiments, the method includes calculating (such as calculating on a computer), for each hypothesis, a probability distribution of the expected genetic data at the plurality of loci in the mixed sample from the phased genetic data obtained for the parent. In some embodiments, the method includes increasing the probability, in the probability distribution, of a particular allele at a first locus in the mixed sample if that allele is present on the first homologous chromosome segment of the parent and the allele at a nearby locus on the parent's first homologous chromosome segment is observed in the obtained genetic data of the mixed sample; or decreasing the probability of that allele at the first locus if that allele is present on the parent's first homologous chromosome segment and the allele at a nearby locus on the parent's first homologous chromosome segment is not observed in the obtained genetic data of the mixed sample. In some embodiments, the method includes increasing the probability, in the probability distribution, of a particular allele at a second locus in the mixed sample if that allele is present on the second homologous chromosome segment of the parent and the allele at a nearby locus on the parent's second homologous chromosome segment is observed in the obtained genetic data of the mixed sample; or decreasing the probability of that allele at the second locus if that allele is present on the parent's second homologous chromosome segment and the allele at a nearby locus on the parent's second homologous chromosome segment is not observed in the obtained genetic data of the mixed sample.

In some embodiments, the method includes obtaining phased genetic data for the mother and the father of the fetus. In some embodiments, the method includes enumerating a set of one or more hypotheses specifying the number of copies in the genome of the fetus of the first maternal homologous chromosome segment (or a portion thereof), of the second maternal homologous chromosome segment (or a portion thereof), of the first paternal homologous chromosome segment (or a portion thereof), and of the second paternal homologous chromosome segment (or a portion thereof), as well as the total number of copies of the chromosome segment of interest in the genome of the fetus. In some embodiments, the method includes calculating (such as calculating on a computer), for each hypothesis, a probability distribution of the expected genetic data at the plurality of loci in the mixed sample from the phased genetic data obtained for the mother and the father. In some embodiments, the method includes increasing the probability, in the probability distribution, of a particular allele at a first locus in the mixed sample if that allele is present on the first homologous segment of the mother or father and the allele at a nearby locus on that parent's first homologous chromosome segment is observed in the obtained genetic data of the mixed sample; or decreasing the probability of that allele at the first locus if that allele is present on the first homologous segment of the mother or father and the allele at a nearby locus on that parent's first homologous chromosome segment is not observed in the obtained genetic data of the mixed sample. In some embodiments, the method includes increasing the probability, in the probability distribution, of a particular allele at a second locus in the mixed sample if that allele is present on the second homologous segment of the mother or father and the allele at a nearby locus on that parent's second homologous chromosome segment is observed in the obtained genetic data of the mixed sample; or decreasing the probability of that allele at the second locus if that allele is present on the second homologous segment of the mother or father and the allele at a nearby locus on that parent's second homologous chromosome segment is not observed in the obtained genetic data of the mixed sample.
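One way to picture the increase/decrease rule is as an odds update on which parental homolog the fetus inherited in a region, which then propagates to the allele probability at the locus of interest. The sketch below is a hypothetical illustration: the constant per-observation `likelihood_ratio` is an assumption, and the source does not specify how large each adjustment should be.

```python
def update_inheritance_probability(prior_homolog1, nearby_observations, likelihood_ratio=2.0):
    """Update P(fetus inherited the parent's homolog 1 in this region).

    nearby_observations: one boolean per nearby linked locus; True if the allele
        specific to the parent's homolog 1 at that locus was observed in the mixed sample.
    likelihood_ratio: hypothetical evidence weight per observation (assumed constant).
    """
    odds = prior_homolog1 / (1 - prior_homolog1)
    for observed in nearby_observations:
        # Seeing the linked allele raises the odds; not seeing it lowers them.
        odds = odds * likelihood_ratio if observed else odds / likelihood_ratio
    return odds / (1 + odds)


def allele_probability(p_homolog1, allele, homolog1_allele, homolog2_allele):
    """Probability that the allele transmitted at the first locus is `allele`."""
    p = 0.0
    if allele == homolog1_allele:
        p += p_homolog1
    if allele == homolog2_allele:
        p += 1 - p_homolog1
    return p


p_h1 = update_inheritance_probability(0.5, [True, True])
print(allele_probability(p_h1, "A", "A", "B"))  # 0.8
```

Two nearby observations of homolog-1 alleles raise the probability of the homolog-1 allele "A" at the first locus from 0.5 to 0.8; a missing observation would lower it by the same factor.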

In some embodiments, the first locus and the locus near the first locus segregate together. In some embodiments, the second locus and the locus near the second locus segregate together. In some embodiments, no crossover is expected between the first locus and the locus near the first locus. In some embodiments, no crossover is expected between the second locus and the locus near the second locus. In some embodiments, the distance between the first locus and the locus near the first locus is less than 5 Mb, 1 Mb, 100 kb, 10 kb, 1 kb, 0.1 kb, or 0.01 kb. In some embodiments, the distance between the second locus and the locus near the second locus is less than 5 Mb, 1 Mb, 100 kb, 10 kb, 1 kb, 0.1 kb, or 0.01 kb.
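Selecting the nearby loci within one of the listed distance thresholds is a range query on sorted positions. A minimal sketch using Python's standard `bisect` module follows; the function name and the 1 Mb default are illustrative choices.

```python
import bisect


def loci_near(target_position, sorted_positions, max_distance_bp=1_000_000):
    """Return positions strictly within max_distance_bp of target_position,
    excluding the target itself. sorted_positions must be ascending and on the
    same chromosome as the target."""
    lo = bisect.bisect_right(sorted_positions, target_position - max_distance_bp)
    hi = bisect.bisect_left(sorted_positions, target_position + max_distance_bp)
    return [p for p in sorted_positions[lo:hi] if p != target_position]


positions = [1_000, 10_000, 10_500, 11_000, 2_000_000]
print(loci_near(10_000, positions, max_distance_bp=1_000))  # [10500]
```

Because the list is sorted, each query costs O(log n) plus the size of the result.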

In some embodiments, one or more crossovers occurred during the formation of the gamete that contributed a copy of the chromosome segment of interest to the fetus, and the crossover produced a chromosome segment of interest in the genome of the fetus that comprises a portion of the first homologous segment and a portion of the second homologous segment of the parent. In some embodiments, the set of one or more hypotheses includes a hypothesis specifying the number of copies in the genome of the fetus of a chromosome segment of interest that comprises a portion of the first homologous segment and a portion of the second homologous segment of the parent.

In some embodiments, the expected genetic data of the mixed sample for each hypothesis comprise the expected quantity of one or more alleles at each of the plurality of loci in the mixed sample.

In one aspect, the invention features a method for using phased genetic data to determine whether the number of copies of a first homologous chromosome segment is overrepresented relative to a second homologous chromosome segment in the genome of an individual (for example, in the genome of one or more cells, or in the cfDNA or cfRNA, of an individual suspected of having cancer, or of a fetus or embryo). In some embodiments, the method includes, simultaneously or sequentially in any order, (i) obtaining phased genetic data for the first homologous chromosome segment, comprising, for each locus in a set of polymorphic loci on the first homologous chromosome segment, the identity of the allele present at that locus on the first homologous chromosome segment; (ii) obtaining phased genetic data for the second homologous chromosome segment, comprising, for each locus in the set of polymorphic loci on the second homologous chromosome segment, the identity of the allele present at that locus on the second homologous chromosome segment; and (iii) obtaining measured genetic allelic data comprising the quantity of each allele at each locus in the set of polymorphic loci, either in a sample of DNA or RNA from one or more cells of the individual or in a mixed sample of cell-free DNA or RNA from two or more genetically different cells of the individual. In some embodiments, the method includes calculating allele ratios for one or more loci in the set of polymorphic loci that are heterozygous in at least one cell from which the sample was derived. In some embodiments, the calculated allele ratio for a particular locus is the measured quantity of one of the alleles divided by the total measured quantity of all the alleles at that locus. In some embodiments, the method includes determining whether there is an overrepresentation of the number of copies of the first homologous chromosome segment by comparing one or more calculated allele ratios at a locus to the allele ratio expected for that locus if the first and second homologous chromosome segments are present in equal proportions. In some embodiments, the expected ratio is 0.5 for a biallelic marker.
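A minimal sketch of the allele-ratio calculation follows, using the phased identity of the allele on the first homologous segment at each heterozygous locus. The data layout (dicts with `h1`, `h2`, `counts`) is a hypothetical choice, and comparing the mean ratio to 0.5 is only a naive illustration; the likelihood-based comparisons described below are more rigorous.

```python
def homolog1_allele_ratios(loci):
    """Allele ratio of the homolog-1 allele at each informative (heterozygous) locus.

    Each locus is a dict with 'h1' and 'h2' (phased alleles on the first and second
    homologous segments) and 'counts' (allele -> measured quantity, e.g. read counts).
    """
    ratios = []
    for locus in loci:
        if locus["h1"] == locus["h2"]:
            continue  # homozygous loci carry no information about homolog imbalance
        total = sum(locus["counts"].values())
        if total > 0:
            ratios.append(locus["counts"].get(locus["h1"], 0) / total)
    return ratios


loci = [
    {"h1": "A", "h2": "B", "counts": {"A": 60, "B": 40}},
    {"h1": "G", "h2": "T", "counts": {"G": 58, "T": 42}},
    {"h1": "C", "h2": "C", "counts": {"C": 100}},
]
ratios = homolog1_allele_ratios(loci)
print(ratios)  # [0.6, 0.58]
mean_ratio = sum(ratios) / len(ratios)
print(mean_ratio > 0.5)  # True: homolog 1 appears overrepresented versus the 0.5 expectation
```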

In some embodiments for prenatal testing, the method includes, simultaneously or sequentially in any order, (i) obtaining phased genetic data for the first homologous chromosome segment in the genome of the fetus (such as a fetus gestating in the mother), comprising, for each locus in a set of polymorphic loci on the first homologous chromosome segment, the identity of the allele present at that locus on the first homologous chromosome segment; (ii) obtaining phased genetic data for the second homologous chromosome segment in the genome of the fetus, comprising, for each locus in the set of polymorphic loci on the second homologous chromosome segment, the identity of the allele present at that locus on the second homologous chromosome segment; and (iii) obtaining measured genetic allelic data comprising the quantity of each allele at each locus in the set of polymorphic loci in a mixed sample of DNA or RNA from the mother of the fetus that comprises fetal DNA or RNA and maternal DNA or RNA (such as a mixed sample of cell-free DNA or RNA derived from a maternal blood sample, comprising fetal cell-free DNA or RNA and maternal cell-free DNA or RNA). In some embodiments, the method includes calculating allele ratios for one or more loci that are heterozygous in the fetus and/or heterozygous in the mother. In some embodiments, the calculated allele ratio for a particular locus is the measured quantity of one of the alleles divided by the total measured quantity of all the alleles at that locus. In some embodiments, the method includes determining whether there is an overrepresentation of the number of copies of the first homologous chromosome segment by comparing one or more calculated allele ratios at a locus to the allele ratio expected for that locus if the first and second homologous chromosome segments are present in equal proportions.

In some embodiments, a calculated allele ratio indicates an overrepresentation of the number of copies of the first homologous chromosome segment if either (i) the ratio of the measured quantity of the allele present at that locus on the first homologous chromosome to the total measured quantity of all the alleles at the locus is greater than the expected allele ratio for that locus, or (ii) the ratio of the measured quantity of the allele present at that locus on the second homologous chromosome to the total measured quantity of all the alleles at the locus is less than the expected allele ratio for that locus. In some embodiments, a calculated allele ratio indicates no overrepresentation of the number of copies of the first homologous chromosome segment if either (i) the ratio of the measured quantity of the allele present at that locus on the first homologous chromosome to the total measured quantity of all the alleles at the locus is less than the expected allele ratio for that locus, or (ii) the ratio of the measured quantity of the allele present at that locus on the second homologous chromosome to the total measured quantity of all the alleles at the locus is greater than or equal to the expected allele ratio for that locus.
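The per-locus decision rule can be written directly as a predicate. This sketch is a literal encoding of criteria (i) and (ii) with a default expected ratio of 0.5; the function name is hypothetical. At a biallelic locus with no other alleles measured, the two criteria coincide.

```python
def homolog1_overrepresented_at_locus(h1_quantity, h2_quantity, total_quantity, expected_ratio=0.5):
    """True if the homolog-1 allele ratio exceeds the expected ratio, or the
    homolog-2 allele ratio falls below it."""
    ratio_h1 = h1_quantity / total_quantity
    ratio_h2 = h2_quantity / total_quantity
    return ratio_h1 > expected_ratio or ratio_h2 < expected_ratio


print(homolog1_overrepresented_at_locus(60, 40, 100))  # True
print(homolog1_overrepresented_at_locus(50, 50, 100))  # False
```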

In some embodiments, determining whether there is an overrepresentation of the number of copies of the first homologous chromosome segment includes enumerating a set of one or more hypotheses specifying the degree of overrepresentation of the first homologous chromosome segment. In some embodiments, the predicted allele ratio for loci that are heterozygous in at least one cell (such as loci heterozygous in the fetus and/or in the mother) is estimated for each hypothesis given the degree of overrepresentation specified by that hypothesis. In some embodiments, the likelihood that a hypothesis is correct is calculated by comparing the calculated allele ratios to the predicted allele ratios, and the hypothesis with the greatest likelihood is selected. In some embodiments, an expected distribution of a test statistic is calculated using the predicted allele ratios for each hypothesis. In some embodiments, the likelihood that a hypothesis is correct is calculated by comparing a test statistic calculated using the calculated allele ratios to the expected distribution of the test statistic calculated using the predicted allele ratios, and the hypothesis with the greatest likelihood is selected. In some embodiments, the predicted allele ratio for loci that are heterozygous in at least one cell (such as loci heterozygous in the fetus and/or in the mother) is estimated from the phased genetic data for the first homologous chromosome segment, the phased genetic data for the second homologous chromosome segment, and the degree of overrepresentation specified by the hypothesis. In some embodiments, the likelihood that a hypothesis is correct is calculated by comparing the calculated allele ratios to the predicted allele ratios, and the hypothesis with the greatest likelihood is selected.
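A minimal maximum-likelihood sketch of this hypothesis test, assuming a sample in which every cell carries the same imbalance: a hypothesis of c1 copies of homolog 1 and c2 copies of homolog 2 predicts a homolog-1 allele ratio of c1 / (c1 + c2), and read counts are modeled as binomial. The hypothesis set and function names are hypothetical.

```python
import math


def log_binomial(k, n, p):
    """Binomial log-probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def most_likely_overrepresentation(h1_reads, hypotheses=((1, 1), (2, 1), (3, 1))):
    """h1_reads: list of (reads of the homolog-1 allele, total reads) at heterozygous loci.
    hypotheses: (copies of homolog 1, copies of homolog 2), i.e. the degree of
        overrepresentation; (1, 1) means no overrepresentation.
    Returns the hypothesis with the greatest likelihood."""
    best, best_ll = None, -math.inf
    for c1, c2 in hypotheses:
        predicted_ratio = c1 / (c1 + c2)  # predicted homolog-1 allele ratio
        ll = sum(log_binomial(k, n, predicted_ratio) for k, n in h1_reads)
        if ll > best_ll:
            best, best_ll = (c1, c2), ll
    return best


print(most_likely_overrepresentation([(67, 100), (66, 100), (68, 100)]))  # (2, 1)
```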

In some embodiments, the ratio of DNA (or RNA) from one or more target cells to the total DNA (or RNA) in the sample is calculated. One exemplary ratio is the ratio of fetal DNA (or RNA) to total DNA (or RNA) in the sample. In some embodiments, the ratio of fetal DNA to total DNA in the sample is determined by measuring the quantity of an allele at one or more loci at which the fetus has the allele and the mother does not. In some embodiments, the ratio of fetal DNA to total DNA in the sample is determined by measuring methylation differences between one or more maternal and fetal alleles. In some embodiments, a set of one or more hypotheses specifying the degree of overrepresentation of the first homologous chromosome segment is enumerated. In some embodiments, the predicted allele ratio for loci that are heterozygous in at least one cell (such as loci heterozygous in the fetus and/or in the mother) is estimated for each hypothesis from the calculated ratio of DNA or RNA and the degree of overrepresentation specified by that hypothesis. In some embodiments, the likelihood that a hypothesis is correct is calculated by comparing the calculated allele ratios to the predicted allele ratios, and the hypothesis with the greatest likelihood is selected. In some embodiments, an expected distribution of a test statistic is calculated for each hypothesis using the predicted allele ratios and the calculated ratio of DNA or RNA. In some embodiments, the likelihood that a hypothesis is correct is determined by comparing a test statistic calculated using the calculated allele ratios and the calculated ratio of DNA or RNA to the expected distribution of the test statistic calculated using the predicted allele ratios and the calculated ratio of DNA or RNA, and the hypothesis with the greatest likelihood is selected.
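The sketch below shows two pieces of this calculation under stated assumptions. First, at loci where the mother is homozygous and a disomic fetus carries one allele the mother lacks, that allele makes up f/2 of the DNA, so f is twice its read fraction. Second, a predicted homolog-1 allele ratio for a mixture in which non-target cells are balanced and target cells carry c1:c2 copies, assuming both cell types share the same phased heterozygous genotype (as for tumor versus normal cells of one individual). Function names are hypothetical.

```python
def estimate_target_fraction(fetal_specific_reads):
    """fetal_specific_reads: (reads of an allele the fetus carries but the mother lacks,
    total reads) at loci where the mother is homozygous and the fetus heterozygous.
    A disomic fetus contributes one such copy, i.e. f/2 of the DNA, so f = 2 * fraction."""
    specific = sum(k for k, _ in fetal_specific_reads)
    total = sum(n for _, n in fetal_specific_reads)
    return 2 * specific / total


def predicted_h1_ratio(target_fraction, c1, c2):
    """Predicted homolog-1 allele ratio when non-target cells are balanced (1:1) and
    target cells carry c1 copies of homolog 1 and c2 copies of homolog 2."""
    f = target_fraction
    h1 = (1 - f) * 1 + f * c1
    total = (1 - f) * 2 + f * (c1 + c2)
    return h1 / total


f = estimate_target_fraction([(5, 100), (5, 100)])
print(round(f, 3))  # 0.1
print(round(predicted_h1_ratio(f, 2, 1), 4))  # 0.5238
```

The small shift from 0.5 to about 0.524 at a 10% target fraction illustrates why the calculated DNA ratio must be taken into account when predicting allele ratios.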

In some embodiments, the method includes enumerating a set of one or more hypotheses specifying the degree of overrepresentation of the first homologous chromosome segment. In some embodiments, the method includes estimating, for each hypothesis, either (i) the predicted allele ratio for loci that are heterozygous in at least one cell (such as loci heterozygous in the fetus and/or in the mother) given the degree of overrepresentation specified by that hypothesis, or (ii) for one or more possible ratios of DNA or RNA (such as the ratio of fetal DNA or RNA to total DNA or RNA in the sample), the expected distribution of a test statistic calculated using the predicted allele ratios and the possible ratio of DNA or RNA from the one or more target cells (such as fetal cells) to total DNA or RNA in the sample. In some embodiments, a data fit is calculated by comparing either (i) the calculated allele ratios to the predicted allele ratios, or (ii) a test statistic calculated using the calculated allele ratios and the possible ratio of DNA or RNA to the expected distribution of the test statistic calculated using the predicted allele ratios and the possible ratio of DNA or RNA. In some embodiments, one or more of the hypotheses are ranked according to the data fit, and the highest-ranked hypothesis is selected. In some embodiments, a technique or algorithm, such as a search algorithm, is used for one or more of the following steps: calculating the data fit, ranking the hypotheses, or selecting the highest-ranked hypothesis. In some embodiments, the data fit is a fit to a beta-binomial distribution or a fit to a binomial distribution. In some embodiments, the technique or algorithm is selected from the group consisting of maximum likelihood estimation, maximum a posteriori estimation, Bayesian estimation, dynamic estimation (such as dynamic Bayesian estimation), and expectation-maximization estimation. In some embodiments, the method includes applying the technique or algorithm to the obtained genetic data and the expected genetic data.
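A beta-binomial data fit can be sketched with the standard library alone. Below, the beta distribution is parameterized by its mean (the predicted allele ratio) and a concentration alpha + beta that controls overdispersion; the default concentration of 200 and the function names are illustrative assumptions, and hypotheses are ranked by maximum likelihood.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k, n, mean, concentration):
    """Beta-binomial log-probability of k allele reads out of n.

    The beta prior has the given mean (the predicted allele ratio) and
    concentration alpha + beta; smaller concentrations model more overdispersion.
    """
    alpha = mean * concentration
    beta = (1 - mean) * concentration
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def rank_by_beta_binomial(observed, predicted_ratios, concentration=200.0):
    """Maximum-likelihood ranking of hypotheses (label -> predicted allele ratio)."""
    scores = {
        label: sum(beta_binomial_logpmf(k, n, ratio, concentration) for k, n in observed)
        for label, ratio in predicted_ratios.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


observed = [(51, 100), (48, 100), (50, 100)]
print(rank_by_beta_binomial(observed, {"balanced": 0.5, "one extra copy": 2 / 3}))
# ['balanced', 'one extra copy']
```

SciPy users could obtain the same probabilities from `scipy.stats.betabinom`; the pure-Python version keeps the example dependency-free.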

In some embodiments, the method includes creating a partition of possible ratios (such as ratios of fetal DNA or RNA to total DNA or RNA in the sample) ranging from a lower limit to an upper limit for the ratio of DNA or RNA from one or more target cells to total DNA or RNA in the sample. In some embodiments, a set of one or more hypotheses specifying the degree of overrepresentation of the first homologous chromosome segment is enumerated. In some embodiments, the method includes estimating, for each possible ratio of DNA or RNA in the partition and for each hypothesis, either (i) the predicted allele ratio for loci that are heterozygous in at least one cell (such as loci heterozygous in the fetus and/or in the mother) given the possible ratio of DNA or RNA and the degree of overrepresentation specified by the hypothesis, or (ii) the expected distribution of a test statistic calculated using the predicted allele ratios and the possible ratio of DNA or RNA. In some embodiments, the method includes calculating, for each possible ratio of DNA or RNA in the partition and for each hypothesis, the likelihood that the hypothesis is correct by comparing either (i) the calculated allele ratios to the predicted allele ratios, or (ii) a test statistic calculated using the calculated allele ratios and the possible ratio of DNA or RNA to the expected distribution of the test statistic calculated using the predicted allele ratios and the possible ratio of DNA or RNA. In some embodiments, the combined probability for each hypothesis is determined by combining the probabilities of that hypothesis across the possible ratios in the partition, and the hypothesis with the greatest combined probability is selected. In some embodiments, the combined probability for each hypothesis is determined by weighting the probability of the hypothesis for each particular possible ratio by the likelihood that that possible ratio is the correct ratio.
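The partition-and-combine step can be sketched as a weighted sum of likelihoods over a grid of possible target-DNA fractions, computed in log space for numerical stability. In this hypothetical example, the mother is homozygous and the fetus carries either one or two copies of a paternal allele; under the f/2-per-fetal-copy convention, the predicted paternal-allele ratios are f/2 and f/(1 + f/2). The grid bounds, uniform weights, and names are assumptions for illustration.

```python
import math


def log_binomial(k, n, p):
    """Binomial log-probability, with p clamped away from 0 and 1."""
    p = min(max(p, 1e-12), 1 - 1e-12)
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def combined_probabilities(observed, predicted_ratio_fns, fraction_grid, fraction_weights=None):
    """observed: list of (allele reads, total reads).
    predicted_ratio_fns: hypothesis label -> function mapping a possible fraction to the
        predicted allele ratio under that hypothesis.
    fraction_grid: partition of possible target-DNA fractions (lower to upper limit).
    fraction_weights: weight of each possible fraction (uniform if omitted).
    Returns normalized combined probabilities, one per hypothesis."""
    if fraction_weights is None:
        fraction_weights = [1.0 / len(fraction_grid)] * len(fraction_grid)
    log_joint = {}
    for label, ratio_fn in predicted_ratio_fns.items():
        terms = [math.log(w) + sum(log_binomial(k, n, ratio_fn(f)) for k, n in observed)
                 for f, w in zip(fraction_grid, fraction_weights)]
        log_joint[label] = log_sum_exp(terms)  # weighted sum over the partition
    norm = log_sum_exp(list(log_joint.values()))
    return {label: math.exp(value - norm) for label, value in log_joint.items()}


hypotheses = {
    "one paternal copy": lambda f: f / 2,
    "two paternal copies": lambda f: f / (1 + f / 2),
}
grid = [0.18, 0.20, 0.22]
probs = combined_probabilities([(10, 100)] * 30, hypotheses, grid)
print(max(probs, key=probs.get))  # one paternal copy
```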

In one aspect, the invention features a method for determining the copy number of a chromosome or chromosome segment in the genome of one or more cells from an individual using phased or unphased genetic data. In some embodiments, the method includes obtaining genetic data at a set of polymorphic loci on the chromosome or chromosome segment in a sample by measuring the quantity of each allele at each locus. In some embodiments, the sample is a sample of DNA or RNA from one or more cells of the individual, or a mixed sample of cell-free DNA from the individual comprising cell-free DNA from two or more genetically different cells. In some embodiments, allele ratios are calculated for loci that are heterozygous in at least one cell from which the sample was derived. In some embodiments, the calculated allele ratio for a particular locus is the measured quantity of each allele divided by the total measured quantity of all the alleles at that locus. In some embodiments, the calculated allele ratio for a particular locus is the measured quantity of one allele at that locus (such as the allele on the first homologous chromosome segment) divided by the measured quantity of one or more other alleles (such as the allele on the second homologous chromosome segment). In some embodiments, a set of one or more hypotheses specifying the copy number of the chromosome or chromosome segment in the genome of the one or more cells is enumerated. In some embodiments, the most likely hypothesis based on a test statistic is selected, thereby determining the copy number of the chromosome or chromosome segment in the genome of the one or more cells.

In one aspect, the invention features a method for determining, using phased or unphased genetic data, the copy number of a chromosome or chromosome segment in the genome of a fetus (such as a fetus gestating in the mother). In some embodiments, the method includes obtaining genetic data at a set of polymorphic loci on the chromosome or chromosome segment in a sample by measuring the quantity of each allele at each locus. In some embodiments, the sample is a mixed sample comprising fetal DNA or RNA and maternal DNA or RNA from the mother of the fetus (such as a mixed sample of cell-free DNA or RNA from a maternal serum sample containing fetal cell-free DNA or RNA and maternal cell-free DNA or RNA). In some embodiments, allele ratios are calculated for loci that are heterozygous in the fetus and/or heterozygous in the mother. In some embodiments, the calculated allele ratio for a particular locus is the measured quantity of one allele at that locus divided by the total measured quantity of all the alleles at that locus. In some embodiments, the calculated allele ratio for a particular locus is the measured quantity of one allele at that locus (such as the allele on the first homologous chromosome segment) divided by the measured quantity of one or more other alleles (such as the allele on the second homologous chromosome segment). In some embodiments, a set of one or more hypotheses specifying the copy number of the chromosome or chromosome segment in the genome of the fetus is enumerated. In some embodiments, the most likely hypothesis based on a test statistic is selected, thereby determining the copy number of the chromosome or chromosome segment in the genome of the fetus.
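The two allele-ratio definitions above differ only in the denominator. The sketch below computes both; the function names are hypothetical. For a sample in which all cells carry c1 copies of homolog 1 and c2 copies of homolog 2, the second form has an expected value of c1 / c2 (1 when balanced).

```python
def ratio_over_total(counts, allele):
    """Measured quantity of one allele divided by the total quantity of all alleles."""
    return counts[allele] / sum(counts.values())


def ratio_between_homologs(counts, h1_allele, other_alleles):
    """Measured quantity of the homolog-1 allele divided by that of the other allele(s)."""
    return counts[h1_allele] / sum(counts[a] for a in other_alleles)


counts = {"A": 60, "B": 40}
print(ratio_over_total(counts, "A"))               # 0.6
print(ratio_between_homologs(counts, "A", ["B"]))  # 1.5
```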

In some embodiments, a hypothesis is selected if the probability of the test statistic under the distribution of the test statistic for that hypothesis is above an upper threshold; one or more hypotheses are rejected if the probability of the test statistic under the distribution of the test statistic for that hypothesis is below a lower threshold; and a hypothesis is neither selected nor rejected if the probability of the test statistic under the distribution for that hypothesis lies between the lower and upper thresholds, or if the probability is not determined with sufficiently high confidence. In some embodiments, the overrepresentation of the number of copies of the first homologous chromosome segment is due to a duplication of the first homologous chromosome segment or a deletion of the second homologous chromosome segment. In some embodiments, the total measured quantity of all the alleles at one or more loci is compared with a reference quantity to determine whether the overrepresentation of the number of copies of the first homologous chromosome segment is due to a duplication of the first homologous chromosome segment or a deletion of the second homologous chromosome segment. In some embodiments, the magnitude of the difference between the calculated allele ratio and the expected allele ratio at one or more loci is used to determine whether the overrepresentation of the number of copies of the first homologous chromosome segment is due to a duplication of the first homologous chromosome segment or a deletion of the second homologous chromosome segment. In some embodiments, the first and second homologous chromosome segments are determined to be present in equal proportions if there is neither an overrepresentation of the number of copies of the first homologous chromosome segment nor an overrepresentation of the number of copies of the second homologous chromosome segment (such as in the genome of a cell, or in the cfDNA or cfRNA, of an individual, fetus, or embryo).
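The three-way decision and the duplication-versus-deletion comparison can be sketched as follows. The thresholds (0.05 and 0.95) and the 10% tolerance on the total-quantity comparison are hypothetical values chosen for illustration; the source does not specify them.

```python
def call_hypothesis(probability, lower=0.05, upper=0.95):
    """Three-way decision on a hypothesis given the probability of the observed test statistic."""
    if probability is None:
        return "no call"  # probability not determined with sufficient confidence
    if probability > upper:
        return "selected"
    if probability < lower:
        return "rejected"
    return "no call"  # between thresholds: neither selected nor rejected


def imbalance_mechanism(total_quantity, reference_quantity, tolerance=0.1):
    """Hypothetical heuristic: more total signal than the reference suggests a duplication
    of homolog 1; less suggests a deletion of homolog 2."""
    relative = total_quantity / reference_quantity
    if relative > 1 + tolerance:
        return "duplication of homolog 1"
    if relative < 1 - tolerance:
        return "deletion of homolog 2"
    return "undetermined"


print(call_hypothesis(0.99), call_hypothesis(0.5))  # selected no call
print(imbalance_mechanism(1300, 1000))  # duplication of homolog 1
```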

In some embodiments, the fraction of DNA from one or more target cells out of the total DNA in the sample is determined. The determination uses the total or relative amount of one or more alleles at one or more loci where the genotype of the target cells differs from the genotype of the non-target cells, and where both target and non-target cells are expected to be disomic. In some embodiments, this fraction is used to determine whether the overrepresentation of the number of copies of the first homologous chromosome segment is due to a duplication of the first homologous chromosome segment or a deletion of the second homologous chromosome segment. In some embodiments, the fraction is used to determine the number of additional copies of a duplicated chromosome segment or chromosome. In some embodiments, the phased genetic data include probabilistic data. In some embodiments, phased genetic data are needed for the first and/or second homologous chromosome segment in the fetal genome. Obtaining these data includes obtaining phased genetic data for the first and/or second homologous chromosome segment in the genome of one or both biological parents of the fetus. It also includes inferring which homologous chromosome segments the fetus inherited from one or both biological parents. In some embodiments, that inference uses the probability of one or more crossovers (such as 1, 2, 3, or 4 crossovers) that may have occurred during formation of the gamete that contributed a copy of the first or second homologous chromosome segment to the fetus. In some embodiments, the phased genetic data of the mother and/or father of the fetus are obtained using a technique selected from the group consisting of digital PCR, population-based haplotype inference using haplotype frequencies, haplotyping using haploid cells (such as sperm or eggs), haplotyping using genetic data from one or more first-degree relatives, and combinations thereof. In some embodiments, phased genetic data for an individual are obtained by phasing all or part of the region corresponding to the deletion or duplication in a sample from the individual. In some embodiments, phased genetic data for the fetus are obtained by phasing all or part of the region corresponding to the deletion or duplication in a sample from the fetus or from the fetus's mother. In some embodiments, obtaining phased genetic data for the first and second homologous chromosome segments includes determining the identity of the alleles present on one chromosome segment and inferring the identity of the alleles present on the other chromosome segment. In some embodiments, alleles in the unphased genetic data that are not present on the first homologous chromosome segment are assigned to the second homologous chromosome segment. For example, suppose the individual's genotype is (AB, AB) and the individual's phased data indicate that the first haplotype is (A, A). The other haplotype can then be inferred to be (B, B). In some embodiments, if only one allele is measured at a locus, that allele is determined to be part of both the first and second homologous chromosome segments (for example, if the genotype at a locus is AA, both haplotypes carry the A allele). In some embodiments, obtaining phased genetic data for an individual includes determining whether one or more possible chromosome crossovers have occurred. This can be done, for example, by determining the sequence of a recombination hotspot and of the regions flanking either side of the hotspot. In some embodiments, any of the primer libraries of the invention are used to detect recombination events, to determine which haplotype blocks are present in the individual's genome.
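The paragraph says the target-cell fraction is estimated from allele amounts at loci where the target and non-target genotypes differ. It also gives a worked example of inferring a second haplotype from a known one. Both ideas can be sketched as below, as a hedged illustration.

The fraction estimator handles only one informative configuration, chosen here for illustration: the non-target cells are AA, the target cells are AB, and both are disomic. In that case B reads come only from the target cells, at half their allele dosage, so the expected B-read fraction is f/2. The patent may use a different estimator. The sketch assumes biallelic loci, error-free counts, and informative loci that have already been identified. All function names are hypothetical.

```python
def target_fraction(informative_loci):
    """Estimate the target-cell DNA fraction f from pooled read counts.

    informative_loci: list of (a_reads, b_reads) at loci assumed to be
    homozygous AA in non-target cells and heterozygous AB in target cells.
    Expected B-read fraction is f / 2, so f = 2 * B / (A + B).
    """
    a_total = sum(a for a, _ in informative_loci)
    b_total = sum(b for _, b in informative_loci)
    return 2.0 * b_total / (a_total + b_total)


def infer_other_haplotype(genotypes, known_haplotype):
    """Infer the second haplotype from unphased genotypes and one known haplotype.

    genotypes: per-locus two-allele strings, e.g. ["AB", "AA"].
    known_haplotype: allele on the first homolog at each locus, e.g. ["A", "A"].
    """
    other = []
    for genotype, allele in zip(genotypes, known_haplotype):
        remaining = list(genotype)
        remaining.remove(allele)  # raises ValueError if allele not in genotype
        other.append(remaining[0])  # for "AA", the second homolog is also "A"
    return other
```

With the example from the text, `infer_other_haplotype(["AB", "AB"], ["A", "A"])` yields `["B", "B"]`. A homozygous locus such as "AA" returns "A" for both homologs, matching the single-allele rule above.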

In some embodiments, the method includes one or more of the following modeling approaches:
- using a joint distribution model (such as a joint distribution model that accounts for linkage between loci);
- performing a linkage analysis;
- using a binomial distribution model;
- using a beta-binomial model;
- using the probability of chromosome crossovers during the meiosis that produced the gamete that formed the embryo that grew into the fetus.

The crossover probabilities can be used, for example, as the probability that a crossover occurs at different locations along the chromosome, to model the dependence between polymorphic alleles on the chromosome or chromosome segment of interest.
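The paragraph names binomial and beta-binomial models without giving formulas. Below is a minimal sketch of both. Each scores how well an observed B-allele read count k out of n fits an expected allele ratio p.

Parameterizing the beta-binomial by its mean p and a concentration (alpha + beta) is an assumption made here for readability. Large concentrations recover the binomial; small ones allow extra-binomial spread (overdispersion). The function names are hypothetical. The sketch treats each locus independently, so it does not capture the linkage or crossover modeling the paragraph also mentions.

```python
import math


def log_binom_coef(n, k):
    """log(n choose k) via log-gamma, numerically stable for large read depths."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def binomial_loglik(k, n, p):
    """Log-probability of k B-allele reads out of n when each read is B with probability p."""
    return log_binom_coef(n, k) + k * math.log(p) + (n - k) * math.log(1 - p)


def beta_binomial_loglik(k, n, p, concentration):
    """Log-probability of k out of n under a beta-binomial with mean p.

    concentration = alpha + beta is an assumed overdispersion parameter:
    alpha = p * concentration, beta = (1 - p) * concentration.
    """
    a = p * concentration
    b = (1 - p) * concentration
    return log_binom_coef(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b)
```

To compare copy-number hypotheses, each hypothesis's expected allele ratio could be plugged in as `p`. The per-locus log-likelihoods would then be summed, and the hypothesis with the highest total taken as best supported, which is a simplification that ignores linkage between loci.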

In some embodiments, the one or more calculated allele ratios for the cfDNA or cfRNA indicate the corresponding allele ratios of the DNA or RNA in the cells from which the cfDNA or cfRNA originated. In some embodiments, the one or more calculated allele ratios for the cfDNA or cfRNA indicate the corresponding allele ratios in the individual's genome. In some embodiments, an allele ratio is calculated, or compared with an expected allele ratio, only if the measured genetic data for the sample indicate that more than one distinct allele is present at the locus (such as in the cfDNA or cfRNA sample). In some embodiments, an allele ratio is calculated, or compared with an expected allele ratio, only if the locus is heterozygous in at least one of the cells from which the sample was derived (such as a locus that is heterozygous in the fetus and/or in the mother). In some embodiments, an allele ratio is calculated, or compared with an expected allele ratio, only if the locus is heterozygous in the fetus. In some embodiments, an allele ratio is calculated, or compared with an expected allele ratio, for homozygous loci. For example, the allele frequencies at loci predicted to be homozygous in the particular individual being tested (or in both the fetus and the pregnant mother) can be analyzed to determine the noise or error level of the system.
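Below is a minimal sketch of two rules from this paragraph: calculating allele ratios only at loci where more than one allele was observed, and using loci expected to be homozygous to estimate the system's error level. The `(a_reads, b_reads)` input format, the `min_minor_reads` filter, and the function names are assumptions for illustration.

```python
def allele_ratio(a_reads, b_reads):
    """Fraction of reads carrying allele B at a biallelic locus."""
    return b_reads / (a_reads + b_reads)


def ratios_at_informative_loci(loci, min_minor_reads=1):
    """Allele ratios only at loci where more than one allele was observed.

    loci: dict of locus name -> (a_reads, b_reads).
    min_minor_reads: reads the rarer allele needs before it counts as observed;
        raising it above 1 guards against isolated sequencing errors.
    """
    return {
        name: allele_ratio(a, b)
        for name, (a, b) in loci.items()
        if min(a, b) >= min_minor_reads
    }


def error_rate_from_homozygous(loci, homozygous_names):
    """Pooled minority-allele read fraction at loci expected to be homozygous."""
    minority = sum(min(loci[name]) for name in homozygous_names)
    total = sum(sum(loci[name]) for name in homozygous_names)
    return minority / total
```

With the default filter, a locus with one stray B read passes as "more than one allele observed". If that locus is known to be homozygous, the stray read is exactly the noise that `error_rate_from_homozygous` measures.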

In some embodiments, at least 10; 50; 100; 200; 300; 500; 750; 1,000; 2,000; 3,000; or 4,000 or more loci (such as SNPs) are analyzed for the chromosome or chromosome segment of interest. In some embodiments, the average number of loci (such as SNPs) per megabase in the chromosome or chromosome segment of interest is at least 1; 10; 25; 50; 100; 150; 200; 300; 500; 750; or 1,000 or more loci per megabase. In some embodiments, this average is between 1 and 500 loci per megabase, such as 1 to 50, 50 to 100, 100 to 200, 200 to 400, 200 to 300, or 300 to 400 loci per megabase. In some embodiments, loci in multiple portions of a potential deletion or duplication are analyzed. This increases the sensitivity and/or specificity of CNV detection compared with analyzing only one locus or only a few loci close to each other. In some embodiments, only the two most common alleles at each locus are measured or used to determine the calculated allele ratio. In some embodiments, the loci are amplified using a polymerase (such as a DNA polymerase, RNA polymerase, or reverse transcriptase) with low 5'→3' exonuclease activity and/or low strand-displacement activity.

In some embodiments, the measured genetic allele data are obtained by one of the following:
(i) sequencing the DNA or RNA in the sample;
(ii) amplifying the DNA or RNA in the sample and then sequencing the amplified DNA or RNA;
(iii) amplifying the DNA or RNA in the sample, ligating the PCR products, and then sequencing the ligated products.

In some embodiments, the measured genetic allele data are obtained as follows. The DNA or RNA of the sample is divided into multiple fractions, and a different barcode is added to the DNA or RNA in each fraction (for example, all the DNA or RNA in a given fraction carries the same barcode). The barcoded DNA or RNA is optionally amplified, the fractions are combined, and the barcoded DNA or RNA in the combined fractions is sequenced. In some embodiments, the alleles of polymorphic loci (such as SNPs) are identified using one or more of the following methods:
- sequencing (such as nanopore sequencing or Halcyon Molecular sequencing);
- SNP arrays;
- real-time PCR;
- TaqMan;
- the Nanostring nCounter Analysis System;
- Illumina GoldenGate genotyping assays that use allele-discriminating DNA polymerase and ligase;
- ligation-mediated PCR;
- Illumina GoldenGate genotyping assays that use linked inverted probes (LIPs; also called pre-circularized probes, pre-circularizing probes, circularizing probes, padlock probes, or molecular inversion probes (MIPs)).

In some embodiments, two or more (such as 3 or 4) target amplicons are ligated together and the ligated product is then sequenced. In some embodiments, the measurements for different alleles at the same locus are adjusted for differences between the alleles in metabolism, apoptosis, histones, inactivation, and/or amplification (such as differences in amplification efficiency between different alleles of the same locus). In some embodiments, this adjustment is performed before calculating allele ratios from the obtained genetic data, or before comparing the measured genetic data with the expected genetic data.
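Two of the quantitative steps above can be sketched as follows: keeping only the two most common alleles at a locus, and expressing locus density per megabase. The input format (a sequence of per-read base calls at one locus) and the function names are assumptions for illustration. A real pipeline would work from aligned reads and base qualities.

```python
from collections import Counter


def top_two_allele_ratio(base_calls):
    """Allele ratio at one locus using only its two most common alleles.

    base_calls: iterable of per-read base calls at the locus (assumed format).
    Rarer alleles, often sequencing errors, are ignored.
    Returns (major, minor, minor_fraction), or None if fewer than two
    distinct alleles were observed.
    """
    top_two = Counter(base_calls).most_common(2)
    if len(top_two) < 2:
        return None
    (major, n_major), (minor, n_minor) = top_two
    return major, minor, n_minor / (n_major + n_minor)


def loci_per_megabase(n_loci, segment_length_bp):
    """Average number of analyzed loci per megabase of a chromosome segment."""
    return n_loci / (segment_length_bp / 1_000_000)
```

For example, 60 A calls, 38 G calls, and 2 T calls give a minor-allele fraction of 38/98 for G. The two T calls are discarded rather than diluting the ratio.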

In some embodiments, the method also includes determining the presence or absence of one or more risk factors for a disease or disorder. In some embodiments, the method also includes determining the presence or absence of one or more polymorphisms or mutations associated with a disease or disorder, or with an increased risk of a disease or disorder. In some embodiments, the method also includes determining the total level of cfDNA, cf mDNA, cf nDNA, cfRNA, miRNA, or other components. In some embodiments, the method includes measuring the level of one or more cfDNA, cf mDNA, cf nDNA, cfRNA, and/or miRNA molecules of interest. Examples are molecules containing a polymorphism or mutation associated with a disease or disorder, or with an increased risk of a disease or disorder. In some embodiments, the fraction of tumor DNA out of total DNA is determined (such as the fraction of tumor cfDNA out of total cfDNA, or the fraction of tumor cfDNA carrying a particular mutation out of total cfDNA). In some embodiments, the tumor fraction is used to determine the stage of the cancer, because a higher tumor fraction may be associated with a more advanced stage of cancer. In some embodiments, the method also includes determining the total level of DNA or RNA. In some embodiments, the method includes measuring the methylation level of one or more DNA or RNA molecules of interest. Examples are molecules containing a polymorphism or mutation associated with a disease or disorder, or with an increased risk of a disease or disorder. In some embodiments, the method includes determining the presence or absence of a change in DNA integrity. In some embodiments, the method also includes determining the total level of mRNA splicing. In some embodiments, the method includes determining the level of mRNA splicing, or detecting alternative mRNA splicing, for one or more RNA molecules of interest. Examples are molecules containing a polymorphism or mutation associated with a disease or disorder, or with an increased risk of a disease or disorder.

In some embodiments, the invention features a method for detecting a cancerous phenotype in an individual, where the cancerous phenotype is defined by the presence of at least one mutation from a set of mutations. In some embodiments, the method includes obtaining DNA or RNA measurements for a DNA or RNA sample from one or more cells of the individual, where one or more of these cells are suspected of having the cancerous phenotype. The DNA or RNA measurements are then analyzed to determine, for each mutation in the set, the likelihood that at least one of the cells has that mutation. In some embodiments, the method includes determining that the individual has the cancerous phenotype if either of the following holds:
(i) for at least one mutation, the likelihood that at least one cell contains that mutation is greater than a threshold;
(ii) for at least one mutation, the likelihood that at least one cell contains that mutation is less than the threshold, but across multiple mutations, the combined likelihood that at least one cell has at least one of the mutations is greater than the threshold.

In some embodiments, one or more of the cells have a subset or all of the mutations in the set. In some embodiments, the sample includes cell-free DNA or RNA. In some embodiments, the DNA or RNA measurements include measurements (such as the amount of each allele at each locus) at a set of polymorphic loci on one or more chromosomes or chromosome segments of interest.
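Part (ii) of the decision rule needs a combined likelihood across mutations, which the text does not define. One simple illustrative choice, assuming the mutations are independent, is 1 − ∏(1 − p_i). This is the probability that at least one of the mutations is present. The sketch below applies both parts of the rule under that assumption; the function name and threshold are hypothetical.

```python
import math


def has_cancer_phenotype(mutation_likelihoods, threshold):
    """Apply the two-part decision rule to per-mutation likelihoods.

    mutation_likelihoods: dict of mutation name -> likelihood that at least one
        cell carries that mutation.
    threshold: hypothetical decision threshold.
    The combined likelihood assumes independence between mutations, an
    illustrative simplification rather than the patent's stated model.
    """
    # (i) a single mutation is convincing on its own
    if any(p > threshold for p in mutation_likelihoods.values()):
        return True
    # (ii) no single mutation passes, but their combination does
    combined = 1.0 - math.prod(1.0 - p for p in mutation_likelihoods.values())
    return combined > threshold
```

For instance, two mutations each at likelihood 0.8 fail a 0.95 threshold individually. Under independence, however, their combined likelihood is 0.96, so part (ii) calls the phenotype present.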

In one aspect, the invention features methods for selecting a therapy for treating, stabilizing, or preventing a disease or disorder in a mammal. In some embodiments, the method includes determining, using any of the methods described herein, whether the number of copies of the first homologous chromosome segment is overrepresented compared with the second homologous chromosome segment. In some embodiments, a therapy for the mammal is selected, such as a therapy for a disease or disorder associated with the overrepresentation of the first homologous chromosome segment.

In one aspect, the invention features methods for preventing, delaying, stabilizing, or treating a disease or disorder in a mammal. In some embodiments, the method includes determining, using any of the methods described herein, whether the number of copies of the first homologous chromosome segment is overrepresented compared with the second homologous chromosome segment. In some embodiments, a therapy for treating the mammal is selected and then administered to the mammal. An example is a therapy for a disease or disorder associated with the overrepresentation of the first homologous chromosome segment.

In some embodiments, treating, stabilizing, or preventing a disease or disorder includes preventing or delaying an initial or subsequent occurrence of the disease or disorder. It can also include increasing the disease-free survival time between the disappearance of the condition and its recurrence, stabilizing or reducing an adverse symptom associated with the condition, or inhibiting or stabilizing the progression of the condition. In some embodiments, at least 20, 40, 60, 80, 90, or 95% of the treated subjects have a complete remission, in which all evidence of the disease disappears. In some embodiments, the length of time a subject survives after being diagnosed with the disease and treated is at least 20, 40, 60, 80, 100, 200, or even 500% greater than (i) the average survival time of an untreated subject or (ii) the average survival time of a subject treated with another therapy.

In some embodiments, treating, stabilizing, or preventing cancer includes reducing or stabilizing the size of a tumor (for example, a benign or malignant tumor), or slowing or preventing an increase in tumor size. It can also include reducing or stabilizing the number of tumor cells, increasing the disease-free survival time between the disappearance of a tumor and its reappearance, preventing an initial or subsequent occurrence of a tumor, or reducing or stabilizing an adverse symptom associated with a tumor. In one embodiment, the number of cancer cells surviving the treatment is at least 10, 20, 40, 60, 80, or 100% lower than the initial number of cancer cells, as measured using any standard assay. In some embodiments, the decrease in the number of cancer cells obtained with a therapy of the invention is at least 2, 5, 10, 20, or 50-fold greater than the decrease in the number of non-cancerous cells. In some embodiments, the number of cancer cells present after administration of a therapy is at least 2, 5, 10, 20, or 50-fold lower than the number present after administration of a control (such as saline or buffer). In some embodiments, the methods of the invention result in a 10, 20, 40, 60, 80, or 100% decrease in tumor size, as measured using standard methods. In some embodiments, at least 10, 20, 40, 60, 80, 90, or 95% of the treated subjects have a complete remission, in which no cancer cells are detectable. In some embodiments, the cancer does not recur, or recurs only after at least 2, 5, 10, 15, or 20 years. In some embodiments, the length of time a subject survives after being diagnosed with cancer and treated with a therapy of the invention is at least 10, 20, 40, 60, 80, 100, 200, or even 500% greater than (i) the average survival time of an untreated subject or (ii) the average survival time of a subject treated with another therapy.
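The percentage and fold criteria in this paragraph are simple ratios. Below is a minimal sketch of how they could be computed from before/after counts or survival times. The function names and example inputs are hypothetical. "Fold lower" is taken here as the ratio of the control value to the treated value.

```python
def percent_reduction(initial, final):
    """Percent decrease from an initial to a final value (e.g. cancer cell counts)."""
    return 100.0 * (initial - final) / initial


def fold_lower(reference, treated):
    """How many times lower `treated` is than `reference` (e.g. vs. a saline control)."""
    return reference / treated


def percent_longer_survival(survival_time, mean_comparison_time):
    """Percent by which a survival time exceeds a comparison group's mean survival time."""
    return 100.0 * (survival_time - mean_comparison_time) / mean_comparison_time
```

For instance, going from 1,000,000 to 200,000 cancer cells is an 80% reduction. The same counts, read as control versus treated, are a 5-fold difference.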

In one aspect, the invention features methods for stratifying subjects in a clinical trial of a therapy for treating, stabilizing, or preventing a disease or disorder in a mammal. In some embodiments, the method includes determining whether the number of copies of the first homologous chromosome segment is overrepresented compared with the second homologous chromosome segment. This determination uses any of the methods described herein and is made before, during, or after the clinical trial. In some embodiments, subjects are placed into a subgroup of the clinical trial based on the presence or absence of this overrepresentation in the subject's genome.

In some embodiments, the disease or disorder is selected from the group consisting of cancer, mental impairment, learning disabilities (such as congenital learning disabilities), intellectual disability, developmental delay, autism, neurodegenerative diseases or disorders, schizophrenia, physical defects, autoimmune diseases or disorders, systemic lupus erythematosus, psoriasis, Crohn's disease, glomerulonephritis, HIV infection, AIDS, and combinations thereof.

In some embodiments, the disease or disorder is selected from the group consisting of the following syndromes and combinations thereof:
- DiGeorge syndrome, DiGeorge 2 syndrome, and DiGeorge/VCFS syndrome;
- Prader-Willi syndrome, Angelman syndrome, and Beckwith-Wiedemann syndrome;
- the 1p36, 2q37, 3q29, 9q34, and 17q21.31 deletion syndromes;
- cri-du-chat syndrome, Jacobsen syndrome, Miller-Dieker syndrome, Phelan-McDermid syndrome, Smith-Magenis syndrome, WAGR syndrome, and Wolf-Hirschhorn syndrome;
- Williams syndrome and Williams-Beuren syndrome;
- Down syndrome, Edwards syndrome, and Patau syndrome;
- Klinefelter syndrome, Turner syndrome, 47,XXX syndrome, and 47,XYY syndrome;
- Sotos syndrome.

In some embodiments, the method determines the presence or absence of one or more of the following chromosomal abnormalities: nullisomy, monosomy, uniparental disomy, trisomy, matched trisomy, unmatched trisomy, maternal trisomy, paternal trisomy, triploidy, mosaicism, tetrasomy, matched tetrasomy, unmatched tetrasomy, other aneuploidies, unbalanced translocations, balanced translocations, insertions, deletions, recombinations, and combinations thereof. In some embodiments, a chromosomal abnormality is any deviation of the copy number of a particular chromosome or chromosome segment from the most common copy number of that chromosome or segment. For example, in human somatic cells, any deviation from 2 copies can be considered a chromosomal abnormality. In some embodiments, the method determines the presence or absence of euploidy.

In some embodiments, the copy number hypotheses include one or more copy number hypotheses for a singleton pregnancy. In some embodiments, the copy number hypotheses include one or more copy number hypotheses for a multiple pregnancy, such as a twin pregnancy (for example, identical or fraternal twins, or a naturally vanishing twin). In some embodiments, the copy number hypotheses for a multiple pregnancy include one or more of the following:
- all fetuses in the multiple pregnancy are euploid;
- all fetuses in the multiple pregnancy are aneuploid (such as any aneuploidy disclosed herein);
- one or more fetuses are euploid and one or more fetuses are aneuploid.

In some embodiments, the copy number hypotheses include identical twins (also called monozygotic twins) or fraternal twins (also called dizygotic twins). In some embodiments, the copy number hypotheses include a molar pregnancy, such as a complete or partial molar pregnancy.

In some embodiments, the chromosome segment of interest is an entire chromosome. In some embodiments, the chromosome or chromosome segment is selected from the group consisting of chromosome 13, chromosome 18, chromosome 21, the X chromosome, the Y chromosome, segments thereof, and combinations thereof. In some embodiments, the first and second homologous chromosome segments are a pair of homologous chromosome segments that include the chromosome segment of interest. In some embodiments, the first and second homologous chromosome segments are a pair of homologous chromosomes of interest. In some embodiments, a confidence is calculated for the CNV determination or for the diagnosis of the disease or disorder.

In some embodiments, the deletion is a deletion of at least 0.01 kb, 0.1 kb, 1 kb, 10 kb, 100 kb, 1 Mb, 2 Mb, 3 Mb, 5 Mb, 10 Mb, 15 Mb, 20 Mb, 30 Mb, or 40 Mb. In some embodiments, the deletion is between 1 kb and 40 Mb, such as 1 kb to 100 kb, 100 kb to 1 Mb, 1 to 5 Mb, 5 to 10 Mb, 10 to 15 Mb, 15 to 20 Mb, 20 to 25 Mb, 25 to 30 Mb, or 30 to 40 Mb. In some embodiments, one copy of the chromosome segment is deleted and one copy is present. In some embodiments, both copies of the chromosome segment are deleted. In some embodiments, an entire chromosome is deleted.

In some embodiments, the duplication is a duplication of at least 0.01 kb, 0.1 kb, 1 kb, 10 kb, 100 kb, 1 Mb, 2 Mb, 3 Mb, 5 Mb, 10 Mb, 15 Mb, 20 Mb, 30 Mb, or 40 Mb. In some embodiments, the duplication is between 1 kb and 40 Mb, such as 1 kb to 100 kb, 100 kb to 1 Mb, 1 to 5 Mb, 5 to 10 Mb, 10 to 15 Mb, 15 to 20 Mb, 20 to 25 Mb, 25 to 30 Mb, or 30 to 40 Mb. In some embodiments, the chromosome segment is duplicated once. In some embodiments, the chromosome segment is duplicated more than once, such as 2, 3, 4, or 5 times. In some embodiments, an entire chromosome is duplicated. In some embodiments, a region in the first homologous segment is deleted, and the same or a different region in the second homologous segment is duplicated. In some embodiments, at least 50, 60, 70, 80, 90, 95, 96, 98, 99, or 100% of the SNVs tested are transversions rather than transitions.
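Transitions exchange one purine for another (A↔G) or one pyrimidine for another (C↔T); every other single-base substitution is a transversion. Below is a short sketch for checking what fraction of a tested SNV panel are transversions, for comparison against a percentage criterion like the one above. The `(ref, alt)` tuple input and the function names are assumptions for illustration.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_transition(ref, alt):
    """True for a transition (A<->G or C<->T), False for a transversion."""
    ref, alt = ref.upper(), alt.upper()
    pair = {ref, alt}
    if ref == alt or not pair <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    return pair <= PURINES or pair <= PYRIMIDINES


def transversion_fraction(snvs):
    """Fraction of (ref, alt) single-nucleotide variants that are transversions."""
    snvs = list(snvs)
    transversions = sum(1 for ref, alt in snvs if not is_transition(ref, alt))
    return transversions / len(snvs)
```

A panel containing A>C, G>T, A>G, and C>A has three transversions out of four SNVs (75%). It would therefore satisfy a 70% criterion but not an 80% one.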

In some embodiments, the sample includes DNA and/or RNA from (i) one or more target cells or (ii) one or more non-target cells. In some embodiments, the sample is a mixed sample containing DNA and/or RNA from one or more target cells and one or more non-target cells. In some embodiments, the target cells are cells that contain a CNV, such as a deletion or duplication of interest. The non-target cells are cells that do not contain the copy number variation of interest.

In each of the following embodiments, the method includes determining whether the number of copies of the first homologous chromosome segment is overrepresented in the genome of the cells indicated:
- The one or more target cells are cancer cells and the one or more non-target cells are non-cancerous cells; the determination is made in the genome of the one or more cancer cells.
- The one or more target cells are genetically identical cancer cells and the one or more non-target cells are non-cancerous cells; the determination is made in the genome of the cancer cells.
- The one or more target cells are genetically non-identical cancer cells and the one or more non-target cells are non-cancerous cells; the determination is made in the genome of one or more of the genetically non-identical cancer cells.
- The sample includes cell-free DNA from a mixture of one or more cancer cells and one or more non-cancerous cells; the determination is made in the genome of the one or more cancer cells.
- The one or more target cells are genetically identical fetal cells and the one or more non-target cells are maternal cells; the determination is made in the genome of the fetal cells.
- The one or more target cells are genetically non-identical fetal cells and the one or more non-target cells are maternal cells; the determination is made in the genome of one or more of the genetically non-identical fetal cells.

Because the cells of most individuals contain an essentially identical set of nuclear DNA, in some embodiments the term "target cell" can be used interchangeably with the term "target individual". Cancer cells have a genotype that differs from that of the host individual, so the cancer itself can be considered an individual. In addition, many cancers are heterogeneous, meaning that different cells within a tumor are genetically different from other cells in the same tumor. In this case, the genetically distinct regions can be considered different individuals. Alternatively, the cancer can be considered a single individual consisting of a mixture of cells with different genomes. Typically the non-target cells are euploid, although this is not necessarily the case.

In some embodiments, the sample is obtained from a maternal whole blood sample or a fraction thereof, cells isolated from a maternal blood sample, an amniocentesis sample, a fetal sample, a placental tissue sample, a chorionic villus sample, a placental membrane sample, a cervical mucus sample, or another sample from the fetus. In some embodiments, the sample includes cell-free DNA obtained from a maternal blood sample or a maternal blood fraction. In some embodiments, the sample includes nuclear DNA obtained from a mixture of fetal cells and maternal cells. In some embodiments, the sample is obtained from a fraction of maternal blood that contains nucleated cells enriched for fetal cells. In some embodiments, the sample is divided into multiple fractions (such as 2, 3, 4, 5, or more fractions), and each fraction is analyzed using a method of the invention. If every fraction produces the same result (such as the presence or absence of one or more CNVs of interest), confidence in the result increases. If different fractions produce different results, the sample can be reanalyzed, or another sample can be collected from the same subject and analyzed.
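The aliquot-concordance logic described above can be sketched as follows. It assumes each fraction yields a single categorical call (for example, True for CNV present) and only reports agreement or disagreement. The text does not say how much confidence increases when fractions agree, so the sketch does not quantify it. The function name is hypothetical.

```python
def combine_aliquot_calls(calls):
    """Summarize per-fraction calls (e.g. True = CNV present) for one sample.

    Returns ("concordant", call) when every fraction agrees, otherwise
    ("discordant", None), signalling reanalysis or collection of a new sample.
    """
    calls = list(calls)
    if not calls:
        raise ValueError("no fraction results")
    if len(set(calls)) == 1:
        return "concordant", calls[0]
    return "discordant", None
```

A discordant result maps onto the two follow-up options in the text: reanalyzing the same sample, or collecting and analyzing a new sample from the same subject.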

Exemplary subjects include mammals, such as humans and mammals of veterinary interest. In some embodiments, the mammal is a primate (such as a human, monkey, gorilla, ape, or lemur), a bovine, an equine, a porcine, a canine, or a feline.

In some embodiments, any of the methods includes generating a report (such as a written or electronic report) disclosing the result of a method of the invention (such as the presence or absence of a deletion or duplication).

In some embodiments, any of the methods includes taking a clinical action based on the result of a method of the invention (such as the presence or absence of a deletion or duplication). In some embodiments, an embryo or fetus is determined, based on the result of a method of the invention, to have one or more polymorphisms or mutations of interest (such as a CNV). In that case, the clinical action includes one of the following:
- performing additional testing (such as testing to confirm the presence of the polymorphism or mutation);
- not implanting the embryo for in vitro fertilization (IVF);
- implanting a different embryo for IVF;
- terminating the pregnancy;
- preparing for a child with special needs;
- performing an intervention intended to reduce the severity of the phenotypic presentation of the genetic disorder.

In some embodiments, the clinical action is selected from the group consisting of the following and combinations thereof:
- performing an ultrasound;
- amniocentesis on the fetus, or on a subsequent fetus that inherits genetic material from the mother and/or father;
- chorionic villus biopsy on the fetus, or on a subsequent fetus that inherits genetic material from the mother and/or father;
- in vitro fertilization and preimplantation genetic diagnosis of one or more embryos that inherited genetic material from the mother and/or father;
- karyotyping of the mother or of the father;
- a fetal echocardiogram (such as an echocardiogram of a fetus with trisomy 21, 18, or 13, monosomy X, or a microdeletion).

In some embodiments, the clinical action is selected from the group consisting of the following and combinations thereof:
- administering growth hormone to a child born with monosomy X (such as starting at about 9 months);
- administering calcium to a child born with a 22q deletion (such as DiGeorge syndrome);
- administering an androgen such as testosterone to a child born with 47,XXY (such as monthly injections of 25 mg testosterone enanthate for 3 months to an infant or toddler);
- performing cancer testing on a woman with a complete or partial molar pregnancy (such as a triploid fetus);
- administering a cancer treatment, such as chemotherapy, to a woman with a complete or partial molar pregnancy (such as a triploid fetus);
- screening a fetus determined to be male (such as by a method of the invention) for one or more X-linked disorders, such as Duchenne muscular dystrophy (DMD), adrenoleukodystrophy, or hemophilia;
- performing amniocentesis on a male fetus at risk for an X-linked disorder;
- administering dexamethasone to a woman carrying a female fetus (such as a fetus determined to be female by a method of the invention) at risk for congenital adrenal hyperplasia;
- performing amniocentesis on a female fetus at risk for congenital adrenal hyperplasia;
- administering inactivated vaccines (rather than live vaccines), or withholding certain vaccines, for a child born with immune deficiency due to a 22q11.2 deletion;
- providing occupational and/or physical therapy;
- providing early educational intervention;
- delivering the baby at a tertiary care center with a NICU and/or with a licensed pediatric specialist available at delivery;
- performing behavioral interventions for a child (such as a child with XXX, XXY, or XYY).

In some embodiments, an ultrasound or other screening test is performed on a woman determined to have a multiple pregnancy (such as twins) to determine whether two or more of the fetuses are monochorionic. Monozygotic twins result from the ovulation and fertilization of a single oocyte followed by division of the fertilized egg; their placentation may be dichorionic or monochorionic. Dizygotic twins result from the ovulation and fertilization of two oocytes and usually have dichorionic placentas. Monochorionic twins are at risk of twin-to-twin transfusion syndrome. This syndrome can cause an unequal distribution of blood between the fetuses, leading to differences in their growth and development and sometimes to fetal death. Therefore, twins determined to be monozygotic by a method of the invention should be tested (such as by ultrasound) to determine whether they are monochorionic. If they are, these twins can be monitored for signs of twin-to-twin transfusion syndrome (such as by ultrasound every two weeks starting at 16 weeks).

In some embodiments, an embryo or fetus is determined, based on the result of a method of the invention, not to contain one or more polymorphisms or mutations of interest (such as a CNV). In that case, the clinical action includes implanting the embryo for IVF or continuing the pregnancy. In some embodiments, the clinical action is additional testing to confirm the absence of the polymorphism or mutation, selected from the group consisting of ultrasound, amniocentesis, chorionic villus biopsy, and combinations thereof.

In some embodiments, an individual is determined, based on the result of a method of the invention, to have one or more polymorphisms or mutations. Examples are a polymorphism or mutation associated with a disease or disorder such as cancer, or with an increased risk of a disease or disorder such as cancer. In that case, the clinical action includes performing additional testing for the disease or disorder. Alternatively, it includes administering one or more therapies (such as a cancer treatment, a treatment for the specific type of cancer or mutation diagnosed, or any treatment disclosed herein). In some embodiments, the clinical action is additional testing to confirm the presence or absence of the polymorphism or mutation, selected from the group consisting of biopsy, surgery, medical imaging (such as a mammogram or ultrasound), and combinations thereof.

In some embodiments, the additional testing includes performing the same or a different method (such as any of the methods described herein) to confirm the presence or absence of the polymorphism or mutation (such as a CNV). An example is testing a second aliquot of the same test sample, or a different sample from the same individual (such as the same pregnant woman, fetus, embryo, or individual at increased risk for cancer).

In some embodiments, additional testing is performed for an individual in the following situations:
- The likelihood of a polymorphism or mutation (such as a CNV) is above a threshold (for example, additional testing to confirm the presence of a possible polymorphism or mutation).
- The confidence or z-score for a polymorphism or mutation (such as a CNV) is above a threshold (for example, additional testing to confirm the presence of a possible polymorphism or mutation).
- The confidence or z-score for a polymorphism or mutation (such as a CNV) is between a minimum and a maximum threshold (for example, additional testing to increase confidence that the initial result is correct).
- The confidence in the determination of the presence or absence of the polymorphism or mutation (such as a CNV) is below a threshold (for example, a "no call" result, because the presence or absence of the CNV could not be determined with sufficient confidence).

An exemplary z-score calculation is described in Chiu et al., BMJ 2011;342:c7401 (which is incorporated herein by reference in its entirety). That calculation uses chromosome 21 as an example; chromosome 21 can be replaced with any other chromosome or chromosome segment in the test sample.

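Test Z value = (percentage of chromosome 21 in the test case − mean percentage of chromosome 21 in the reference controls) / (standard deviation of the percentage of chromosome 21 in the reference controls), that is:

$$Z_{\text{test}} = \frac{P_{21,\text{test}} - \bar{P}_{21,\text{ref}}}{\sigma_{21,\text{ref}}}$$

where $P_{21,\text{test}}$ is the percentage of chromosome 21 in the test case, $\bar{P}_{21,\text{ref}}$ is the average percentage of chromosome 21 in the reference controls, and $\sigma_{21,\text{ref}}$ is the standard deviation of that percentage in the reference controls.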
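The Z value above can be sketched in Python as follows. This is a minimal illustration, not the implementation of Chiu et al.: the function name `chromosome_z_score` and the example percentages are hypothetical, and the sketch assumes that the reference standard deviation is the sample standard deviation, which the text does not specify.

```python
import statistics


def chromosome_z_score(test_percentage: float, reference_percentages: list[float]) -> float:
    """Z value of a test case's chromosome percentage against reference controls.

    test_percentage: percentage of chromosome 21 (or any other chromosome or
        segment) in the test case.
    reference_percentages: the same percentage measured in each reference control.
    """
    if len(reference_percentages) < 2:
        raise ValueError("at least two reference controls are needed to estimate an SD")
    mean_ref = statistics.mean(reference_percentages)
    sd_ref = statistics.stdev(reference_percentages)  # sample SD (an assumption)
    return (test_percentage - mean_ref) / sd_ref


# Hypothetical reference controls and test case (illustrative values only).
controls = [1.30, 1.32, 1.28, 1.31, 1.29]
print(round(chromosome_z_score(1.36, controls), 2))  # about 3.79
```

A large positive Z value suggests over-representation of the chromosome relative to the controls (for example, trisomy 21); the threshold used to make a call is not specified here.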

In some embodiments, additional testing is performed for an individual whose initial sample did not meet quality control guidelines or had a fetal fraction or tumor fraction below a threshold value. In some embodiments, the method includes selecting an individual for additional testing based on the result of a method of the invention, or on the likelihood, confidence, or z-score of that result, and performing additional testing on the individual (such as on the same or a different sample). In some embodiments, a subject diagnosed with a disease or disorder (such as cancer) is retested at multiple time points, using a method of the invention or a known test for the disease or disorder, to monitor the progression, remission, or recurrence of the disease or disorder.

In one aspect, the invention features a report (such as a written or electronic report) of the results of a method of the invention (such as the presence or absence of a deletion or duplication).

In various embodiments, the primer extension reaction or polymerase chain reaction includes the addition of one or more nucleotides by a polymerase. In some embodiments, the primers are in solution. In some embodiments, the primers are not in solution but are immobilized on a solid support. In some embodiments, the primers are not part of a microarray. In various embodiments, the primer extension reaction or polymerase chain reaction does not include ligation-mediated PCR. In various embodiments, the primer extension reaction or polymerase chain reaction does not include joining two primers with a ligase. In various embodiments, the primers do not include ligation inversion probes (LIPs), also known as precircle probes, pre-circularized probes, circularizing probes, padlock probes, or molecular inversion probes (MIPs).

It is contemplated that the aspects and embodiments of the invention described herein include combinations of any two or more aspects or embodiments of the invention.

Definitions

Single nucleotide polymorphism (SNP) refers to a single nucleotide that may differ between the genomes of two members of the same species. Use of the term should not imply any limit on the frequency with which each variant occurs.

Sequence refers to a DNA sequence or a genetic sequence. It may refer to the primary, physical structure of a DNA molecule or strand in an individual. It may refer to the sequence of nucleotides found in that DNA molecule, or in the strand complementary to that DNA molecule. It may also refer to the information contained in the DNA molecule as its representation in silico.

Locus refers to a particular region of interest on the DNA of an individual, which may refer to a SNP, the site of a possible insertion or deletion, or the site of some other relevant genetic variation. Disease-linked SNPs may also be referred to as disease-linked loci.

Polymorphic allele, also "polymorphic locus", refers to an allele or locus at which the genotype varies between individuals of the same species. Some examples of polymorphic alleles include single nucleotide polymorphisms, short tandem repeats, deletions, duplications, and inversions.

Polymorphic site refers to the specific nucleotides, found in a polymorphic region, that vary between individuals.

Mutation refers to an alteration in a naturally occurring or reference nucleic acid sequence, such as an insertion, deletion, duplication, translocation, substitution, frameshift mutation, silent mutation, nonsense mutation, missense mutation, point mutation, transition mutation, transversion mutation, reverse mutation, or microsatellite alteration. In some embodiments, the amino acid sequence encoded by the nucleic acid sequence has at least one amino acid alteration from a naturally occurring sequence.

Allele refers to the genes that occupy a particular locus.

Genetic data, also "genotypic data", refers to data describing aspects of the genome of one or more individuals. It may refer to one locus or a set of loci, partial or entire sequences, partial or entire chromosomes, or the entire genome. It may refer to the identity of one or more nucleotides; it may refer to a set of sequential nucleotides, or to nucleotides from different locations in the genome, or a combination thereof. Genetic data is typically in silico; however, it is also possible to consider the physical nucleotides in a sequence as chemically encoded genetic data. Genetic data may be said to be "on", "of", "at", or "from" the individual. Genotypic data may refer to output measurements from a genotyping platform, where those measurements are made on genetic material.

Genetic material, also "genetic sample", refers to physical matter, such as tissue or blood, from one or more individuals comprising DNA or RNA.

Confidence refers to the statistical likelihood that the called SNP, allele, set of alleles, or determined number of copies of a chromosome or chromosome segment correctly represents the real genetic state of the individual, or that the presence or absence of a disease is correctly diagnosed.

Ploidy calling, also "chromosome copy number calling" or "copy number calling" (CNC), may refer to the act of determining the quantity and/or chromosomal identity of one or more chromosomes or chromosome segments present in a cell.

Aneuploidy refers to the state in which the wrong number of chromosomes is present in a cell (for example, the wrong number of full chromosomes or the wrong number of chromosome segments, such as a deletion or duplication of a chromosome segment). In the case of a human somatic cell, it may refer to the case where the cell does not contain 22 pairs of autosomes and one pair of sex chromosomes. In the case of a human gamete, it may refer to the case where the cell does not contain exactly one of each of the 23 chromosomes. In the case of a single chromosome type, it may refer to the case where more or fewer than two homologous but non-identical chromosome copies are present, or where two chromosome copies that originate from the same parent are present. In some embodiments, the deletion of a chromosome segment is a microdeletion.

Ploidy state refers to the quantity and/or chromosomal identity of one or more chromosomes or chromosome segments in a cell.

Chromosome may refer to a single chromosome copy, meaning a single molecule of DNA, of which there are 46 in a normal somatic cell; an example is "the maternally derived chromosome 18". Chromosome may also refer to a chromosome type, of which there are 23 in a normal human somatic cell; an example is "chromosome 18".

Chromosomal identity may refer to the referent chromosome number, i.e., the chromosome type. Normal humans have 22 types of numbered autosomes and two types of sex chromosomes. It may also refer to the parental origin of the chromosome. It may also refer to a specific chromosome inherited from a parent. It may also refer to other identifying features of a chromosome.

Allelic data refers to a set of genotypic data concerning a set of one or more alleles. It may refer to phased, haplotypic data. It may refer to SNP identities, and it may refer to the sequence data of the DNA, including insertions, deletions, repeats, and mutations. It may include the parental origin of each allele.

Allelic state refers to the actual state of the genes in a set of one or more alleles. It may refer to the actual state of the genes described by the allelic data.

Allele count refers to the number of sequences that map to a particular locus and, if that locus is polymorphic, to the number of sequences that map to each of its alleles. If each allele is counted in a binary fashion, the allele count is a whole number. If the alleles are counted probabilistically, the allele count can be a fractional number.

Allele count probability refers to the number of sequences that are likely to map to a particular locus, or to each of a set of alleles at a polymorphic locus, combined with the probability of the mapping. Note that allele counts are equivalent to allele count probabilities if the mapping probability of each counted sequence is binary (zero or one). In some embodiments, the allele count probabilities may be binary. In some embodiments, the allele count probabilities may be set equal to the DNA measurements.
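The difference between binary and probabilistic allele counting can be illustrated with a short Python sketch. The input format (one `(allele, mapping_probability)` pair per read), the function name, and the 0.5 cutoff used for binary counting are assumptions made for illustration only.

```python
def allele_counts(reads, probabilistic=True, threshold=0.5):
    """Count reads supporting each allele at one polymorphic locus.

    reads: iterable of (allele, mapping_probability) pairs, one per sequence read.
    probabilistic=True: each read adds its mapping probability, so counts can be
        fractional (allele count probabilities).
    probabilistic=False: each read adds 1 if its mapping probability is at least
        `threshold` (an assumed cutoff) and 0 otherwise, so counts are whole numbers.
    """
    counts = {}
    for allele, p in reads:
        weight = p if probabilistic else (1 if p >= threshold else 0)
        counts[allele] = counts.get(allele, 0) + weight
    return counts


reads = [("A", 1.0), ("A", 0.9), ("B", 0.6), ("B", 0.3)]
print(allele_counts(reads, probabilistic=False))  # {'A': 2, 'B': 1}
print(allele_counts(reads))                        # A about 1.9, B about 0.9
```

When every mapping probability is exactly 0 or 1, both modes return the same counts, consistent with the note that allele counts then equal allele count probabilities.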

Allelic distribution, or "allele count distribution", refers to the relative amount of each allele that is present at each locus in a set of loci. An allelic distribution can refer to an individual, to a sample, or to a set of measurements made on a sample. In the context of digital allele measurements such as sequencing, the allelic distribution refers to the number, or probable number, of reads that map to each allele at each locus in a set of polymorphic loci. In the context of analog allele measurements such as SNP arrays, the allelic distribution refers to allele intensities and/or allele ratios. The allele measurements may be treated probabilistically, that is, the likelihood that a given allele is present in a given sequence read is a fraction between 0 and 1, or they may be treated in a binary fashion, that is, any given read is considered to be exactly zero or one copies of a particular allele.

Allelic distribution pattern refers to a set of different allelic distributions for different contexts (for example, different parental contexts). Certain allelic distribution patterns may be indicative of certain ploidy states.

Allelic bias refers to the degree to which the measured ratio of alleles at a heterozygous locus differs from the ratio that was present in the original DNA or RNA sample. The degree of allelic bias at a particular locus is equal to the allelic ratio observed (as measured) at that locus divided by the ratio of alleles in the original DNA or RNA sample at that locus. Allelic bias may be due to amplification bias, purification bias, or some other phenomenon that affects different alleles differently.
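A minimal sketch of the allelic bias calculation, assuming the measured ratio is expressed as the count of allele A divided by the count of allele B (the text does not fix which allele is the numerator) and that the original ratio at a heterozygous locus of a single diploid genome is 1:1. The function name is hypothetical.

```python
def allelic_bias(count_a: int, count_b: int, original_ratio: float = 1.0) -> float:
    """Degree of allelic bias at one heterozygous locus.

    observed ratio = count_a / count_b (assumed orientation);
    original_ratio = A:B ratio in the original DNA or RNA sample
    (1.0 assumed for a heterozygous locus in one diploid genome).
    """
    if count_b == 0:
        raise ValueError("allele B was not observed; the ratio is undefined")
    observed_ratio = count_a / count_b
    return observed_ratio / original_ratio


print(allelic_bias(60, 40))  # 1.5
print(allelic_bias(50, 50))  # 1.0
```

A value of 1.0 indicates no bias; 60 reads of allele A and 40 of allele B at a truly 1:1 heterozygous locus give a bias of 1.5, i.e., allele A is over-represented by 50%.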

Allelic imbalance, for SNVs, refers to the proportion of abnormal DNA, which is typically measured using the mutant allele frequency (number of mutant alleles at a locus / total number of alleles at that locus). Because the difference between the amounts of two homologs in a tumor is analogous, we measure the proportion of abnormal DNA for a CNV by the average allelic imbalance (AAI), defined as $\mathrm{AAI} = \frac{|H_1 - H_2|}{H_1 + H_2}$, where $H_i$ is the average copy number of homolog $i$ in the sample and $H_i/(H_1 + H_2)$ is the fractional abundance, or homolog ratio, of homolog $i$. The maximum homolog ratio is the homolog ratio of the more abundant homolog.
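The quantities defined above can be computed as in the following sketch. The function names are hypothetical, and the example copy numbers are illustrative rather than taken from the source.

```python
def mutant_allele_frequency(mutant_alleles: int, total_alleles: int) -> float:
    """MAF at one SNV locus: mutant allele count / total allele count at the locus."""
    return mutant_alleles / total_alleles


def average_allelic_imbalance(h1: float, h2: float) -> float:
    """AAI = |H1 - H2| / (H1 + H2), where Hi is the average copy number of homolog i."""
    return abs(h1 - h2) / (h1 + h2)


def maximum_homolog_ratio(h1: float, h2: float) -> float:
    """Homolog ratio Hi / (H1 + H2) of the more abundant homolog."""
    return max(h1, h2) / (h1 + h2)


print(mutant_allele_frequency(25, 500))     # 0.05
print(average_allelic_imbalance(1.1, 0.9))  # about 0.1
print(maximum_homolog_ratio(1.1, 0.9))      # about 0.55
```

Algebraically, AAI = 2 × (maximum homolog ratio) − 1, so either quantity can be derived from the other; a balanced sample (H1 = H2) has an AAI of 0.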

Dropout rate refers to the percentage of SNPs that are not read, estimated using all SNPs.

Single allele dropout (ADO) rate refers to the percentage of SNPs at which only one allele is present, estimated using only heterozygous SNPs.
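The two dropout rates can be estimated as in this sketch, which assumes that the true genotype of each SNP is known (for example, from a high-quality reference sample of the same individual). The data layout and function names are hypothetical.

```python
def dropout_rate(detected: list[set[str]]) -> float:
    """Fraction of SNPs with no allele read at all, estimated over all SNPs."""
    return sum(1 for alleles in detected if not alleles) / len(detected)


def allele_dropout_rate(detected: list[set[str]], true_genotypes: list[str]) -> float:
    """Fraction of truly heterozygous SNPs at which only one allele was read."""
    het = [d for d, g in zip(detected, true_genotypes) if len(set(g)) == 2]
    return sum(1 for d in het if len(d) == 1) / len(het)


detected = [{"A", "G"}, {"A"}, set(), {"C"}, {"T", "C"}]  # alleles actually read
truth = ["AG", "AG", "CT", "CC", "CT"]                     # known true genotypes
print(dropout_rate(detected))                # 0.2
print(allele_dropout_rate(detected, truth))  # 0.25
```

The SNP with no reads counts toward the dropout rate but not toward the ADO rate, because neither of its alleles was detected; that case corresponds to the locus drop out (LDO) defined later in this section.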

Primer, also "PCR probe", refers to a single nucleic acid molecule (such as a DNA molecule or DNA oligomer) or a collection of nucleic acid molecules (such as DNA molecules or DNA oligomers), where the molecules are identical or nearly identical, and where the primer contains a region designed to hybridize to a target locus (for example, a targeted polymorphic locus or a non-polymorphic locus) or to a universal priming sequence, and contains a priming sequence designed to allow PCR amplification. A primer may also contain a molecular barcode. A primer may contain a random region that differs for each individual molecule.

Library of primers refers to a population of two or more primers. In various embodiments, the library includes at least 100, 200, 500, 750, 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different primers. In various embodiments, the library includes at least 100, 200, 500, 750, 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different primer pairs, wherein each pair of primers includes a forward test primer and a reverse test primer, and wherein each pair of test primers hybridizes to a target locus. In some embodiments, the library of primers includes at least 100, 200, 500, 750, 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different individual primers that each hybridize to a different target locus, wherein the individual primers are not part of primer pairs. In some embodiments, the library has both (i) primer pairs and (ii) individual primers that are not part of a primer pair (such as universal primers).

Different primers refers to non-identical primers.

Different libraries refers to non-identical libraries.

Different target loci refers to non-identical target loci.

Different amplicons refers to non-identical amplicons.

Hybrid capture probe refers to any nucleic acid sequence, possibly modified, that is generated by various methods such as PCR or direct synthesis and that is intended to be complementary to one strand of a specific target DNA sequence in a sample. Exogenous hybrid capture probes may be added to a prepared sample and hybridized through a denaturation-reannealing process to form duplexes of exogenous and endogenous fragments. These duplexes may then be physically separated from the sample by various means.

Sequence read refers to data representing a sequence of nucleotide bases measured, for example, by clonal sequencing. Clonal sequencing may produce sequence data representing single original DNA molecules, or clones or clusters of one original DNA molecule. A sequence read may also have an associated quality score at each base position of the sequence, indicating the probability that the nucleotide has been called correctly.
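The text does not say how per-base quality scores are encoded. A widely used convention, assumed here, is the Phred scale, Q = −10·log10(P_error), stored in FASTQ files as ASCII characters offset by 33. The following sketch (hypothetical function names) converts such a quality string into the probability that each base was called correctly.

```python
def decode_phred(quality: str, offset: int = 33) -> list[int]:
    """Convert an ASCII-encoded quality string (Phred+33 assumed) to integer Q values."""
    return [ord(ch) - offset for ch in quality]


def prob_correct(q: int) -> float:
    """Probability that a base with Phred quality q was called correctly."""
    return 1.0 - 10 ** (-q / 10)


qs = decode_phred("I5+")
print(qs)                             # [40, 20, 10]
print([prob_correct(q) for q in qs])  # approximately [0.9999, 0.99, 0.9]
```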

Mapping a sequence read is the process of determining the location of origin of the read in the genome sequence of a particular organism. The location of origin is determined from the similarity between the nucleotide sequence of the read and the genome sequence.
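Mapping by sequence similarity can be illustrated with a deliberately naive sketch that slides a read along a reference and keeps the ungapped, forward-strand placement with the fewest mismatches. Real aligners use indexed, gapped, and strand-aware algorithms; this toy function (a hypothetical name) is for illustration only.

```python
def map_read(read: str, genome: str) -> tuple[int, int]:
    """Return (0-based start, mismatch count) of the best ungapped placement.

    Ties are resolved in favour of the leftmost position; (-1, len(read) + 1)
    is returned if the read is longer than the genome.
    """
    best_pos, best_mismatches = -1, len(read) + 1
    for start in range(len(genome) - len(read) + 1):
        window = genome[start:start + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches < best_mismatches:
            best_pos, best_mismatches = start, mismatches
    return best_pos, best_mismatches


genome = "ACGTACGGTCAGT"
print(map_read("CGGTC", genome))  # (5, 0): exact match starting at position 5
print(map_read("GGACA", genome))  # (6, 1): best placement has one mismatch
```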

Matched copy error, also "matching chromosome aneuploidy" (MCA), refers to a state of aneuploidy in which a cell contains two identical or nearly identical chromosomes. This type of aneuploidy may arise during the formation of gametes in meiosis, and may be referred to as a meiotic non-disjunction error. This type of error may also arise in mitosis. Matching trisomy may refer to the case where three copies of a given chromosome are present in an individual and two of the copies are identical.

Unmatched copy error, also "unique chromosome aneuploidy" (UCA), refers to a state of aneuploidy in which a cell contains two chromosomes that come from the same parent and that may be homologous but not identical. This type of aneuploidy may arise during meiosis, and may be referred to as a meiotic error. Unmatched trisomy may refer to the case where three copies of a given chromosome are present in an individual and two of the copies come from the same parent and are homologous but not identical. Note that unmatched trisomy may refer to the case where two homologous chromosomes from one parent are present, and where some segments of those chromosomes are identical while other segments are merely homologous.

Homologous chromosomes refers to chromosome copies that contain the same set of genes and that normally pair up during meiosis.

Identical chromosomes refers to chromosome copies that contain the same set of genes and that, for each gene, carry the same set of alleles, identical or nearly identical.

Allele drop out (ADO) refers to the situation where at least one of the base pairs in a set of base pairs from homologous chromosomes at a given allele is not detected.

Locus drop out (LDO) refers to the situation where both base pairs in a set of base pairs from homologous chromosomes at a given allele are not detected.

Homozygous refers to having similar alleles at corresponding chromosomal loci.

Heterozygous refers to having dissimilar alleles at corresponding chromosomal loci.

Heterozygosity rate refers to the proportion of individuals in a population that have heterozygous alleles at a given locus. The heterozygosity rate may also refer to the expected or measured ratio of alleles at a given locus in an individual, or in a DNA or RNA sample.
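Both senses of the heterozygosity rate can be sketched as follows. The two-character genotype encoding (e.g. "AG") and the function names are assumptions for illustration.

```python
def heterozygosity_rate(genotypes: list[str]) -> float:
    """Population sense: fraction of individuals heterozygous at one locus."""
    heterozygous = sum(1 for g in genotypes if g[0] != g[1])
    return heterozygous / len(genotypes)


def measured_allele_fraction(count_a: int, count_b: int) -> float:
    """Sample sense: measured fraction of allele A at one locus in one sample."""
    return count_a / (count_a + count_b)


print(heterozygosity_rate(["AA", "AG", "GG", "AG"]))  # 0.5
print(measured_allele_fraction(48, 52))              # 0.48
```

In the sample sense, a heterozygous locus in a single diploid genome is expected to give a fraction near 0.5, and a homozygous locus a fraction near 0 or 1, before measurement noise and allelic bias are considered.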

Chromosomal region refers to a segment of a chromosome, or a full chromosome.

Segment of a chromosome refers to a section of a chromosome that can range in size from one base pair to the entire chromosome.

Chromosome refers to either a full chromosome or a segment or section of a chromosome.

Copies refers to the number of copies of a chromosome segment. It may refer to identical copies, or to non-identical, homologous copies of a chromosome segment, wherein the different copies of the chromosome segment contain a substantially similar set of loci and one or more of the alleles differ. Note that in some cases of aneuploidy, such as the M2 copy error, some copies of a given chromosome segment may be identical while other copies of the same chromosome segment are not.

Haplotype refers to a combination of alleles at multiple loci that are typically inherited together on the same chromosome. Depending on the number of recombination events that have occurred between a given set of loci, a haplotype may refer to as few as two loci or to an entire chromosome. Haplotype can also refer to a set of statistically associated SNPs on a single chromatid.

Haplotypic data, also "phased data" or "ordered genetic data", refers to data from a single chromosome in a diploid or polyploid genome, i.e., either the segregated maternal or the segregated paternal copy of a chromosome in a diploid genome.

Phasing refers to the act of determining the haplotypic genetic data of an individual given unordered diploid (or polyploid) genetic data. It may refer to the act of determining, for a set of alleles found on one chromosome, which of the two genes at each allele is associated with each of the two homologous chromosomes in an individual.
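One simple way to phase a child's genotypes, shown here only as an illustration and not as the method of this document, uses the parents' genotypes and Mendelian inheritance: at each locus, the child's allele that can only have come from the mother is assigned to the maternal haplotype. Loci where both assignments are consistent (for example, when all three individuals are heterozygous) remain unphased. The function names and two-character genotype encoding are hypothetical, and genotyping errors and de novo mutations are ignored.

```python
def phase_locus(child: str, mother: str, father: str):
    """Return (maternal_allele, paternal_allele), or None if ambiguous or inconsistent."""
    a, b = child[0], child[1]
    # Keep each assignment whose maternal allele the mother carries and whose
    # paternal allele the father carries.
    options = {(m, p) for m, p in ((a, b), (b, a)) if m in mother and p in father}
    return options.pop() if len(options) == 1 else None


def phase_trio(child_gts, mother_gts, father_gts):
    """Build the child's maternal and paternal haplotypes; '?' marks unphased loci."""
    maternal, paternal = [], []
    for c, m, f in zip(child_gts, mother_gts, father_gts):
        result = phase_locus(c, m, f)
        maternal.append(result[0] if result else "?")
        paternal.append(result[1] if result else "?")
    return "".join(maternal), "".join(paternal)


print(phase_trio(["AG", "CT", "AG"], ["AA", "CC", "AG"], ["GG", "CT", "AG"]))
# ('AC?', 'GT?')
```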

Phased data refers to genetic data for which one or more haplotypes have been determined.

Hypothesis refers to a set of possible states, such as the possibility that the copy number of a first homologous chromosome or chromosome segment is over-represented relative to a second homologous chromosome or chromosome segment, the possibility of a deletion, the possibility of a duplication, a set of possible ploidy states at a given set of one or more chromosomes or chromosome segments, a set of possible allelic states at a given set of one or more loci, a set of possible paternity relationships, or a set of possible DNA, RNA, or fetal fractions, or amounts of genetic material, at a given set of one or more chromosomes or chromosome segments or at a set of loci. A genetic state may optionally be associated with a probability indicating the relative likelihood of each element of the hypothesis being true compared with the other elements of the hypothesis, or the likelihood that the hypothesis as a whole is correct. The set of possibilities may comprise one or more elements.

Copy number hypothesis, also "ploidy state hypothesis", refers to a hypothesis concerning the number of copies of a chromosome or chromosome segment in an individual. It may also refer to a hypothesis concerning the identity of each of the chromosomes, including the parental origin of each chromosome and which of the parent's two chromosomes are present in the individual. It may also refer to a hypothesis concerning which chromosomes or chromosome segments, if any, from a related individual correspond genetically to a given chromosome of the individual.
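To illustrate how copy number hypotheses can be weighed against data, the sketch below scores three hypotheses about the relative copy number of two homologs (1:1, 2:1, 1:2) using phased heterozygous loci. It assumes a simple binomial model in which each read at a locus comes from homolog 1 with probability H1/(H1 + H2), a pure sample with no mixture of fetal and maternal or tumor and normal DNA, no sequencing error, and a uniform prior over hypotheses. This is a hypothetical simplification, not the statistical model of this document.

```python
import math


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log-probability of k successes in n Bernoulli(p) trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


# Expected fraction of reads from homolog 1 under each copy number hypothesis.
HYPOTHESES = {"1:1": 1 / 2, "2:1": 2 / 3, "1:2": 1 / 3}


def hypothesis_posteriors(loci, hypotheses=HYPOTHESES):
    """loci: list of (reads from homolog 1, total reads) at phased heterozygous loci.

    Returns the posterior probability of each hypothesis under a uniform prior.
    """
    log_lik = {h: sum(log_binom_pmf(k, n, p) for k, n in loci)
               for h, p in hypotheses.items()}
    top = max(log_lik.values())  # subtract the maximum for numerical stability
    weights = {h: math.exp(v - top) for h, v in log_lik.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


post = hypothesis_posteriors([(66, 100), (70, 100), (64, 100)])
print(max(post, key=post.get))  # '2:1'
```

The hypothesis with the highest posterior is the most likely copy number state under these assumptions; the posterior itself plays the role of the confidence defined earlier in this section.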

Related individual refers to any individual who is genetically related to, and therefore shares haplotype blocks with, the target individual. In one context, the related individual may be a genetic parent of the target individual, or any genetic material derived from a parent, such as a sperm, a polar body, an embryo, a fetus, or a child. It may also refer to a sibling, parent, or grandparent.

Sibling refers to any individual whose genetic parents are the same as those of the individual in question. In some embodiments, it may refer to a born child, an embryo, or a fetus, or to one or more cells originating from a born child, an embryo, or a fetus. A sibling may also refer to a haploid individual that originates from one of the parents, such as a sperm, a polar body, or any other set of haplotypic genetic matter. An individual may be considered to be a sibling of itself.

Child may refer to an embryo, a blastomere, or a fetus. Note that in the disclosed embodiments of the invention, the concepts described apply equally well to individuals who are a born child, a fetus, an embryo, or a set of cells therefrom. Use of the term child may simply connote that the individual referred to as the child is the genetic offspring of the parents.

Fetal refers to "of the fetus" or "of the region of the placenta that is genetically similar to the fetus". In a pregnant woman, some portion of the placenta is genetically similar to the fetus, and the free-floating fetal DNA found in maternal blood may have originated from the portion of the placenta whose genotype matches that of the fetus. Note that the genetic information in half of the chromosomes of a fetus is inherited from the mother of the fetus. In some embodiments, DNA from these maternally inherited chromosomes that came from a fetal cell is considered to be "of fetal origin", not "of maternal origin".

DNA of fetal origin refers to DNA that was originally part of a cell whose genotype was essentially equivalent to that of the fetus.

DNA of maternal origin refers to DNA that was originally part of a cell whose genotype was essentially equivalent to that of the mother.

Parent refers to the genetic mother or father of an individual. An individual typically has two parents, a mother and a father, though this may not necessarily be the case, for example in genetic or chromosomal chimerism. A parent may be considered to be an individual.

Parental context refers to the genetic state of a given SNP on each of the two relevant chromosomes of one or both of the two parents of the target.
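Grouping loci by parental context can be sketched as follows. The "mother|father" label format (e.g. "AA|AB"), the coding of the two SNP alleles as A and B, and the function names are assumptions for illustration.

```python
from collections import Counter


def parental_context(mother_gt: str, father_gt: str) -> str:
    """Label a SNP by the parents' genotypes, written mother|father, e.g. 'AA|AB'.

    Genotypes are two-character strings over the SNP's two alleles, coded A and B;
    sorting makes 'BA' and 'AB' the same unordered genotype.
    """
    return "".join(sorted(mother_gt)) + "|" + "".join(sorted(father_gt))


def count_contexts(mother_gts, father_gts) -> Counter:
    """Number of SNPs in each parental context."""
    return Counter(parental_context(m, f) for m, f in zip(mother_gts, father_gts))


print(parental_context("BA", "AA"))  # AB|AA
print(count_contexts(["AA", "AB", "AA"], ["AB", "BB", "BA"]))
```

Contexts matter because they constrain the child's genotype: in the AA|BB context, for example, a child who inherits one copy of the chromosome from each parent must be AB at that SNP.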

Maternal plasma refers to the plasma portion of the blood of a pregnant female.

Clinical decision refers to any decision to take or not take an action that has an outcome affecting the health or survival of an individual. A clinical decision may refer to a decision to conduct further testing, a decision to terminate or maintain a pregnancy, a decision to take action to mitigate an undesirable phenotype, or a decision to take action to prepare for such a phenotype.

Diagnostic box refers to a machine, or a combination of machines, designed to carry out one or more aspects of the methods disclosed herein. In one embodiment, the diagnostic box may be placed at a point of patient care. In one embodiment, the diagnostic box may perform targeted amplification followed by sequencing. In one embodiment, the diagnostic box may function alone or with the help of a technician.

Informatics-based method refers to a method that relies heavily on statistics to make sense of a large amount of data. In the context of prenatal diagnosis, it refers to a method designed to determine the ploidy state at one or more chromosomes or chromosome segments, the allelic state at one or more alleles, or paternity, by statistically inferring the most likely state, rather than by directly measuring the state physically, given a large amount of genetic data (for example, genetic data from a molecular array or from sequencing). In one embodiment of the invention, the informatics-based technique may be one disclosed in this patent. In one embodiment of the invention, it may be PARENTAL SUPPORT™.

Primary genetic data refers to the analog intensity signals output by a genotyping platform. In the context of SNP arrays, primary genetic data refers to the intensity signals before any genotype calling has been done. In the context of sequencing, primary genetic data refers to the analog measurements, analogous to a chromatogram, that come off the sequencer before the identity of any base pair has been determined and before the sequence has been mapped to the genome.

Secondary genetic data refers to processed genetic data output by a genotyping platform. In the context of a SNP array, secondary genetic data refers to the allele calls made by software associated with the SNP array reader, in which the software has called whether a given allele is present or absent in the sample. In the context of sequencing, secondary genetic data refers to the base-pair identities of the sequences, once determined, and possibly also to where in the genome the sequences have been mapped.

Preferential enrichment of DNA that corresponds to a locus, or preferential enrichment of DNA at a locus, refers to any method that results in the percentage of DNA molecules in a post-enrichment DNA mixture that correspond to the locus being higher than the percentage of DNA molecules in the pre-enrichment DNA mixture that correspond to the locus. The method may involve selective amplification of DNA molecules that correspond to the locus. The method may involve removing DNA molecules that do not correspond to the locus. The method may involve a combination of methods. The degree of enrichment is defined as the percentage of DNA molecules in the post-enrichment mixture that correspond to the locus divided by the percentage of DNA molecules in the pre-enrichment mixture that correspond to the locus. Preferential enrichment may be carried out at a plurality of loci. In some embodiments of the invention, the degree of enrichment is greater than 20. In some embodiments of the invention, the degree of enrichment is greater than 200. In some embodiments of the invention, the degree of enrichment is greater than 2,000. When preferential enrichment is carried out at a plurality of loci, the degree of enrichment may refer to the average degree of enrichment of all of the loci in the set of loci.
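The degree of enrichment defined above is a ratio of two fractions and can be computed as in this sketch. The function names and molecule counts are hypothetical, and "average" is assumed to mean the arithmetic mean.

```python
def degree_of_enrichment(target_pre: int, total_pre: int,
                         target_post: int, total_post: int) -> float:
    """(fraction of molecules at the locus after enrichment) / (fraction before)."""
    pre_fraction = target_pre / total_pre
    post_fraction = target_post / total_post
    return post_fraction / pre_fraction


def average_enrichment(per_locus: list[float]) -> float:
    """Average degree of enrichment over a set of loci (arithmetic mean assumed)."""
    return sum(per_locus) / len(per_locus)


# Hypothetical counts: 10 of 1,000,000 molecules before, 5,000 of 100,000 after.
print(degree_of_enrichment(10, 1_000_000, 5_000, 100_000))  # about 5000
```

Because both terms are fractions (or percentages) of a mixture, the units cancel; the example exceeds the 2,000 threshold mentioned in the definition.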

Amplification refers to a method that increases the number of copies of a DNA or RNA molecule.

Selective amplification may refer to a method that increases the number of copies of a particular DNA (or RNA) molecule, or of DNA (or RNA) molecules that correspond to a particular region of DNA (or RNA). It may also refer to a method that increases the number of copies of a particular targeted DNA (or RNA) molecule, or targeted region of DNA (or RNA), more than it increases the number of copies of non-targeted molecules or regions. Selective amplification may be a method of preferential enrichment.

Universal priming sequence refers to a DNA (or RNA) sequence that may be appended to a population of target DNA (or RNA) molecules, for example by ligation, PCR, or ligation-mediated PCR. Once it has been added to the population of target molecules, primers specific to the universal priming sequence can be used to amplify the target population with a single pair of amplification primers. Universal priming sequences are typically not related to the target sequences.

Universal adapters, or "ligation adapters" or "library tags", are DNA molecules containing a universal priming sequence that can be covalently linked to the 5' and 3' ends of a population of target double-stranded DNA molecules. Adding the adapters provides universal priming sequences at the 5' and 3' ends of the target population, from which PCR amplification can take place, amplifying all molecules of the target population with a single pair of amplification primers.

Targeting refers to a method used to selectively amplify, or otherwise preferentially enrich, those DNA (or RNA) molecules in a mixture of DNA (or RNA) that correspond to a set of loci.

Joint distribution model refers to a model that defines the probability of events defined in terms of multiple random variables, given a plurality of random variables defined on the same probability space, where the probabilities of the variables are linked. In some embodiments, the degenerate case in which the probabilities of the variables are not linked may be used.
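The distinction between a joint distribution with linked variables and the degenerate, unlinked case can be shown with two discrete random variables. In the unlinked (independent) case, the joint probability of each pair of values is simply the product of the marginal probabilities. The helper names are hypothetical and the probabilities are illustrative.

```python
from itertools import product


def marginals(joint: dict) -> tuple[dict, dict]:
    """Marginal distributions of X and Y from a joint table {(x, y): P(x, y)}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py


def independent_joint(px: dict, py: dict) -> dict:
    """Degenerate (unlinked) joint distribution: P(x, y) = P(x) * P(y)."""
    return {(x, y): px[x] * py[y] for x, y in product(px, py)}


def is_linked(joint: dict, tol: float = 1e-12) -> bool:
    """True if the joint table differs from the product of its marginals."""
    px, py = marginals(joint)
    return any(abs(joint.get(k, 0.0) - v) > tol
               for k, v in independent_joint(px, py).items())


linked = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(is_linked(linked))                                                  # True
print(is_linked(independent_joint({0: 0.5, 1: 0.5}, {0: 0.2, 1: 0.8})))  # False
```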

Cancer-related gene refers to a gene associated with an altered risk of cancer or an altered prognosis of cancer. Exemplary cancer-related genes that promote cancer include oncogenes; genes that enhance cell proliferation, invasion, or metastasis; genes that inhibit apoptosis; and pro-angiogenesis genes. Cancer-related genes that inhibit cancer include, but are not limited to, tumor suppressor genes; genes that inhibit cell proliferation, invasion, or metastasis; genes that promote apoptosis; and anti-angiogenesis genes.

Estrogen-related cancer refers to a cancer that is modulated by estrogen. Examples of estrogen-related cancers include, without limitation, breast cancer and ovarian cancer. HER2 is overexpressed in many estrogen-related cancers (U.S. Patent No. 6,165,464, which is hereby incorporated by reference in its entirety).

Androgen-related cancer refers to a cancer that is modulated by androgen. An example of an androgen-related cancer is prostate cancer.

Greater than normal expression level refers to expression of an mRNA or protein at a level higher than the average expression level of the corresponding molecule in a control (such as subjects without a disease or disorder, for example cancer). In various embodiments, the expression level is at least 40, 50, 75, 90, 100, 200, 500, or even 1000% higher than the expression level of the control.

Less than normal expression level refers to expression of an mRNA or protein at a level lower than the average expression level of the corresponding molecule in a control (such as subjects without a disease or disorder, for example cancer). In various embodiments, the expression level is at least 20, 40, 50, 75, 90, 95, or 100% lower than the expression level of the control. In some embodiments, the expression of the mRNA or protein is undetectable.
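Both definitions compare a measured level with the average control level as a percentage. A minimal sketch, with hypothetical function names, illustrative levels, and assumed default thresholds of 50% in each direction (one of the values listed in both definitions):

```python
import statistics


def percent_difference_from_control(level: float, control_levels: list[float]) -> float:
    """Percent by which a level differs from the average control level."""
    control_mean = statistics.mean(control_levels)
    return 100.0 * (level - control_mean) / control_mean


def classify_expression(level, control_levels, up_pct=50.0, down_pct=50.0):
    """Compare a level with the control average using assumed percent thresholds."""
    diff = percent_difference_from_control(level, control_levels)
    if diff >= up_pct:
        return "greater than normal"
    if diff <= -down_pct:
        return "less than normal"
    return "not different at these thresholds"


controls = [10.0, 12.0, 8.0]                # hypothetical control levels, mean 10.0
print(classify_expression(25.0, controls))  # greater than normal (+150%)
print(classify_expression(4.0, controls))   # less than normal (-60%)
```

A level 100% lower than the control corresponds to undetectable expression.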

Modulating expression or activity refers to increasing or decreasing the expression or activity of a protein or nucleic acid sequence relative to a reference condition. In some embodiments, the modulation is an increase or decrease of at least 10, 20, 40, 50, 75, 90, 100, 200, 500, or even 1000%. In various embodiments, the treatment modulates transcription, translation, mRNA or protein stability, or the binding of the mRNA or protein to other molecules in vivo. In some embodiments, the level of mRNA is determined by standard Northern blot analysis and the level of protein is determined by standard Western blot analysis, such as the analyses described herein or those described in, e.g., Ausubel et al. (Current Protocols in Molecular Biology, John Wiley & Sons, New York, July 11, 2013, which is hereby incorporated by reference in its entirety). In one embodiment, the level of protein is determined by measuring the level of enzymatic activity using standard methods. In another preferred embodiment, the level of mRNA, protein, or enzymatic activity is equal to or less than 20-, 10-, 5-, or 2-fold above the corresponding level in a reference cell that does not express a functional form of the protein, such as a cell homozygous for a nonsense mutation. In yet another embodiment, the level of mRNA, protein, or enzymatic activity is equal to or less than 20-, 10-, 5-, or 2-fold above the corresponding basal level in a reference cell, such as a non-cancerous cell, a cell that has not been exposed to conditions that induce abnormal cell proliferation or inhibit apoptosis, or a cell from a patient who does not have the disease or disorder of interest.

"A dosage sufficient to modulate the expression or activity of an mRNA or protein" refers to an amount of a treatment that, when administered to a subject, increases or decreases the expression or activity of the mRNA or protein. In some embodiments, for a compound that decreases expression or activity, the modulation is a decrease in expression or activity in the treated subject of at least 10%, 30%, 40%, 50%, 75%, or 90%, compared with the same subject before administration of the inhibitor or with an untreated reference subject. In addition, in one embodiment, for a compound that increases expression or activity, the amount of mRNA or protein expression or activity in the treated subject increases by at least 1.5, 2, 3, 5, 10, or 20 fold compared with the same subject before administration of the compound or with an untreated reference subject.
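The percentage and fold thresholds in these definitions are simple ratios against a reference level. The following is a minimal Python sketch of that arithmetic; the function names and default thresholds (10% decrease, 1.5-fold increase, the lowest values listed above) are hypothetical and chosen only for illustration.

```python
# Hypothetical helpers illustrating the threshold arithmetic in the definitions
# above; names and default thresholds are illustrative assumptions.

def percent_decrease(treated: float, reference: float) -> float:
    """Percent reduction of a treated level relative to a reference level."""
    if reference <= 0:
        raise ValueError("reference level must be positive")
    return 100.0 * (reference - treated) / reference


def fold_change(treated: float, reference: float) -> float:
    """Ratio of a treated level to a reference level."""
    if reference <= 0:
        raise ValueError("reference level must be positive")
    return treated / reference


def meets_decrease_threshold(treated: float, reference: float,
                             threshold_pct: float = 10.0) -> bool:
    """True if the level dropped by at least threshold_pct percent."""
    return percent_decrease(treated, reference) >= threshold_pct


def meets_increase_threshold(treated: float, reference: float,
                             threshold_fold: float = 1.5) -> bool:
    """True if the level rose by at least threshold_fold fold."""
    return fold_change(treated, reference) >= threshold_fold


# A level of 50 against a reference of 200 is a 75% decrease;
# 300 against 100 is a 3-fold increase.
print(percent_decrease(50, 200), fold_change(300, 100))  # prints 75.0 3.0
```

The reference may be either the same subject before treatment or an untreated reference subject; the arithmetic is the same in both cases.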

In some embodiments, a compound directly or indirectly modulates the expression or activity of an mRNA or protein. For example, a compound may indirectly modulate the expression or activity of the mRNA or protein of interest by modulating the expression or activity of a molecule (such as a nucleic acid, protein, signaling molecule, growth factor, cytokine, or chemokine) that directly or indirectly affects the expression or activity of that mRNA or protein. In certain embodiments, the compound inhibits cell division or induces apoptosis. Compounds that may be included in such treatments include, for example, unpurified or purified proteins, antibodies, synthetic organic molecules, naturally occurring organic molecules, nucleic acid molecules, and components thereof. The compounds in a combination therapy can be administered simultaneously or sequentially. Exemplary compounds include signal transduction inhibitors.

"Purified" refers to separated from the components that originally accompany it. Typically, a factor is substantially pure when it is at least 50%, by weight, free from the proteins, antibodies, and naturally occurring organic molecules with which it is originally associated. In some embodiments, the factor is at least 75%, 90%, or 99% pure by weight. A substantially pure factor can be obtained by chemical synthesis, by separation from its natural source, or by production in a recombinant host cell that does not naturally produce the factor. Proteins and small molecules can be purified by one of ordinary skill in the art using standard techniques, such as those described in Ausubel et al. (Current Protocols in Molecular Biology, John Wiley & Sons, New York, July 11, 2013, which is hereby incorporated by reference herein in its entirety). In some embodiments, the factor is at least 2, 5, or 10 times as pure as the starting material, as measured using polyacrylamide gel electrophoresis, column chromatography, optical density, HPLC analysis, or Western blot analysis (Ausubel et al., supra). Exemplary purification methods include immunoprecipitation, column chromatography (for example, immunoaffinity chromatography), magnetic bead immunoaffinity purification, and panning with a plate-bound antibody.

Other features and advantages of the invention will be apparent from the following detailed description and from the claims.

Brief description of the drawings

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the Office upon request and payment of the necessary fee.

The presently disclosed embodiments will be further explained with reference to the attached drawings, wherein like structures are referred to by like numerals throughout the several views. The drawings are not necessarily to scale; emphasis is instead generally placed on illustrating the principles of the presently disclosed embodiments.

Figures 1A-1D show the distribution of the test statistic S divided by T (the number of SNPs) ("S/T") for various copy number hypotheses at a depth of read (DOR) of 500 and a tumor fraction of 1%, for increasing numbers of single nucleotide polymorphisms (SNPs).

Figures 2A-2D show the distribution of S/T for various copy number hypotheses at a DOR of 500 and a tumor fraction of 2%, for increasing numbers of SNPs.

Figures 3A-3D show the distribution of S/T for various copy number hypotheses at a DOR of 500 and a tumor fraction of 3%, for increasing numbers of SNPs.

Figures 4A-4D show the distribution of S/T for various copy number hypotheses at a DOR of 500 and a tumor fraction of 4%, for increasing numbers of SNPs.

Figures 5A-5D show the distribution of S/T for various copy number hypotheses at a DOR of 500 and a tumor fraction of 5%, for increasing numbers of SNPs.

Figures 6A-6D show the distribution of S/T for various copy number hypotheses at a DOR of 500 and a tumor fraction of 6%, for increasing numbers of SNPs.

Figures 7A-7D show the distribution of S/T for various copy number hypotheses at a DOR of 1000 and a tumor fraction of 0.5%, for increasing numbers of SNPs.

Figures 8A-8D show the distribution of S/T for various copy number hypotheses at a DOR of 1000 and a tumor fraction of 1%, for increasing numbers of SNPs.

Figures 9A-9D show the distribution of S/T for various copy number hypotheses at a DOR of 1000 and a tumor fraction of 2%, for increasing numbers of SNPs.

Figures 10A-10D show the distribution of S/T for various copy number hypotheses at a DOR of 1000 and a tumor fraction of 3%, for increasing numbers of SNPs.

Figures 11A-11D show the distribution of S/T for various copy number hypotheses at a DOR of 1000 and a tumor fraction of 4%, for increasing numbers of SNPs.

Figures 12A-12D show the distribution of S/T for various copy number hypotheses at a DOR of 3000 and a tumor fraction of 0.5%, for increasing numbers of SNPs.

Figures 13A-13D show the distribution of S/T for various copy number hypotheses at a DOR of 3000 and a tumor fraction of 1%, for increasing numbers of SNPs.

Figure 14 is a table indicating the sensitivity and specificity of detecting 6 microdeletion syndromes.

Figures 15A-15C are plots of euploid samples. The x-axis represents the linear position of individual polymorphic loci along the chromosome, and the y-axis represents the number of A allele reads as a fraction of total (A+B) allele reads. Maternal and fetal genotypes are indicated to the right of the plots. The plots are color-coded according to maternal genotype: red indicates maternal genotype AA, blue indicates maternal genotype BB, and green indicates maternal genotype AB. Figure 15A is a plot for a fetal cfDNA fraction of 0% when both chromosomes are present. This plot is from a woman who is not pregnant and therefore represents the pattern when the genotype is entirely maternal; the allele clusters are accordingly centered around 1 (AA alleles), 0.5 (AB alleles), and 0 (BB alleles). Figure 15B is a plot for a fetal fraction of 12% when both chromosomes are present. The contribution of fetal alleles to the fraction of A allele reads shifts the positions of some allele spots up or down along the y-axis. Figure 15C is a plot for a fetal fraction of 26% when both chromosomes are present. The pattern, comprising two red and two blue peripheral bands and three central green bands, is readily apparent.
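The band positions in Figures 15A-15C follow from mixture arithmetic. The following is a minimal Python sketch, assuming an idealized model in which every allele copy contributes reads in proportion to its abundance (no amplification bias or sequencing error); it computes the expected fraction of A reads for each maternal/fetal genotype combination. The function names are hypothetical.

```python
def expected_a_fraction(maternal: str, fetal: str, fetal_fraction: float) -> float:
    """Expected fraction of A reads at one SNP in a maternal/fetal DNA mixture.

    Genotypes are strings of allele copies, e.g. "AB". Idealized model: each
    copy contributes reads in proportion to its abundance (no bias, no error).
    """
    if not 0.0 <= fetal_fraction <= 1.0:
        raise ValueError("fetal_fraction must be between 0 and 1")
    maternal_weight = 1.0 - fetal_fraction
    a_copies = maternal_weight * maternal.count("A") + fetal_fraction * fetal.count("A")
    all_copies = maternal_weight * len(maternal) + fetal_fraction * len(fetal)
    return a_copies / all_copies


def band_positions(fetal_fraction: float) -> dict:
    """Expected y-axis band positions, grouped by maternal genotype (plot color)."""
    bands = {}
    for maternal in ("AA", "AB", "BB"):
        positions = set()
        for maternal_allele in set(maternal):  # allele transmitted by the mother
            for paternal_allele in "AB":       # allele transmitted by the father
                fetal = maternal_allele + paternal_allele
                positions.add(round(expected_a_fraction(maternal, fetal, fetal_fraction), 4))
        bands[maternal] = sorted(positions)
    return bands


print(band_positions(0.26))
```

At a fetal fraction of 0.26 this model places the red (maternal AA) bands at 0.87 and 1.0, the green (maternal AB) bands at 0.37, 0.5, and 0.63, and the blue (maternal BB) bands at 0.0 and 0.13, matching the two red, three green, and two blue bands described for Figure 15C. At a fetal fraction of 0 the bands collapse to 1, 0.5, and 0, as in Figure 15A.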

Figures 16A and 16B are plots of 22q11.2 deletion syndrome. Figure 16A shows a mother who is a carrier of a 22q11.2 deletion (indicated by the absence of green AB SNPs). Figure 16B shows a paternally inherited 22q11.2 deletion in the fetus (indicated by the presence of one red and one blue peripheral band). The x-axis represents the linear position of the SNPs, and the y-axis represents the fraction of A allele reads out of total reads. Each point represents a single SNP locus.

Figure 17 is a plot of a maternally inherited Cri-du-Chat deletion syndrome (indicated by the presence of two central green bands rather than three). The x-axis represents the linear position of the SNPs, and the y-axis represents the fraction of A allele reads out of total reads. Each point represents a single SNP locus.

Figure 18 is a plot of a paternally inherited Wolf-Hirschhorn deletion syndrome (indicated by the presence of one red and one blue peripheral band). The x-axis represents the linear position of the SNPs, and the y-axis represents the fraction of A allele reads out of total reads. Each point represents a single SNP locus.

Figures 19A-19D are plots of an X chromosome spike-in experiment simulating an additional copy of a chromosome or chromosome segment. The plots show different amounts of paternal DNA mixed with the daughter's DNA: 16% paternal DNA (Figure 19A), 10% paternal DNA (Figure 19B), 1% paternal DNA (Figure 19C), and 0.1% paternal DNA (Figure 19D). The x-axis represents the linear position of the SNPs on the X chromosome, and the y-axis represents the fraction of M allele reads out of total reads (M+R). Each point represents a single SNP locus with allele M or R.

Figures 20A and 20B are plots of false negative rates with haplotype data (Figure 20A) and without haplotype data (Figure 20B).

Figures 21A and 21B are plots of false positive rates for p=1% with haplotype data (Figure 21A) and without haplotype data (Figure 21B).

Figures 22A and 22B are plots of false positive rates for p=15% with haplotype data (Figure 22A) and without haplotype data (Figure 22B).

Figures 23A and 23B are plots of false negative rates for p=2% with haplotype data (Figure 23A) and without haplotype data (Figure 23B).

Figures 24A and 24B are plots of false positive rates for p=2.5% with haplotype data (Figure 24A) and without haplotype data (Figure 24B).

Figures 25A and 25B are plots of false positive rates for p=3% with haplotype data (Figure 25A) and without haplotype data (Figure 25B).

Figure 26 is a table of false positive rates from the first simulation.

Figure 27 is a table of false negative rates from the first simulation.

Figure 28A is a plot of the reference count (the count of one allele, such as the "A" allele) divided by the total count, for loci in a normal (non-cancerous) cell line.

Figure 28B is a plot of the reference count divided by the total count for a cancer cell line with a deletion. Figure 28C is a plot of the reference count divided by the total count for a mixture of DNA from the normal cell line and the cancer cell line.

Figure 29 is a plot of the reference count divided by the total count for a plasma sample from a patient with stage IIa primary breast cancer, with an estimated tumor fraction of 4.33% (that is, 4.33% of the DNA is from tumor cells). The green portion of the plot indicates a region in which no CNV is present. The blue and red portions indicate a region in which a CNV is present, where the measured allele ratios are clearly separated from the expected allele ratio of 0.5. Blue indicates one haplotype, and red indicates the other. Approximately 636 heterozygous SNPs were analyzed in the CNV region.
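How far the blue and red bands sit from 0.5 depends on the tumor fraction. The following is a minimal Python sketch, assuming a simple model in which normal cells carry one copy of each haplotype, tumor cells carry a stated number of copies of each, and reads are drawn in proportion to copies. The function name and the default one-copy loss of haplotype 2 are illustrative assumptions, not the patented method.

```python
def expected_haplotype1_fraction(tumor_fraction: float,
                                 tumor_copies_h1: int = 1,
                                 tumor_copies_h2: int = 0) -> float:
    """Expected share of reads carrying the haplotype-1 allele at a heterozygous SNP.

    Assumes normal cells carry one copy of each haplotype and tumor cells carry
    the given copy numbers; the default (1, 0) models loss of haplotype 2.
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    normal_fraction = 1.0 - tumor_fraction
    h1 = normal_fraction + tumor_fraction * tumor_copies_h1  # haplotype-1 copies
    h2 = normal_fraction + tumor_fraction * tumor_copies_h2  # haplotype-2 copies
    return h1 / (h1 + h2)


for c in (0.0433, 0.0058):
    p = expected_haplotype1_fraction(c)
    print(f"tumor fraction {c:.2%}: bands at {p:.4f} and {1 - p:.4f}")
```

For a single-copy deletion this reduces to 1/(2 - c). Under this model the bands sit about 0.011 from 0.5 at a tumor fraction of 4.33%, but only about 0.0015 from 0.5 at 0.58%, which is consistent with the lack of visible separation described for Figure 30.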

Figure 30 is a plot of the reference count divided by the total count for a plasma sample from a patient with stage IIb primary breast cancer, with an estimated tumor fraction of 0.58%. The green portion of the plot indicates a region in which no CNV is present. The blue and red portions indicate a region in which a CNV is present, but the measured allele ratios are not clearly separated from the expected allele ratio of 0.5. For this analysis, 86 heterozygous SNPs were analyzed in the CNV region.

Figures 31A and 31B show maximum likelihood estimates of the tumor fraction. The maximum likelihood estimate is indicated by the peak of each plot: 4.33% for Figure 31A and 0.58% for Figure 31B.

Figure 32A is a plot comparing the log odds ratio across a range of possible tumor fractions for a high tumor fraction sample (4.33%) and a low tumor fraction sample (0.58%). If the log odds ratio is less than 0, the euploid hypothesis is more likely. If the log odds ratio is greater than 0, the presence of a CNV is more likely.
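A curve like those in Figures 31 and 32 can be sketched as a likelihood scan over candidate tumor fractions. The following minimal Python sketch assumes phased heterozygous SNPs, a single-copy loss of haplotype 2 in the tumor (so the haplotype-1 share is 1/(2 - c)), and binomial read sampling. It reports a log-likelihood ratio against the no-CNV hypothesis (p = 0.5), which equals the log odds only under equal prior odds. The function names, the grid, and the example counts are hypothetical.

```python
import math


def binomial_log_likelihood(hap1_counts, depths, p):
    """Binomial log-likelihood without the combinatorial term, which is the
    same under every hypothesis and cancels in likelihood ratios."""
    return sum(k * math.log(p) + (n - k) * math.log(1.0 - p)
               for k, n in zip(hap1_counts, depths))


def scan_tumor_fraction(hap1_counts, depths, grid=None):
    """Log-likelihood ratio of 'haplotype 2 deleted in tumor' vs 'no CNV' at each
    candidate tumor fraction; returns (MLE fraction, its LLR, full curve)."""
    if grid is None:
        grid = [i / 10000 for i in range(1, 2001)]  # 0.01% to 20% in 0.01% steps
    null_ll = binomial_log_likelihood(hap1_counts, depths, 0.5)
    curve = {}
    for c in grid:
        p_deleted = 1.0 / (2.0 - c)  # haplotype-1 share if the tumor keeps only haplotype 1
        curve[c] = binomial_log_likelihood(hap1_counts, depths, p_deleted) - null_ll
    best = max(curve, key=curve.get)  # peak of the curve = maximum likelihood estimate
    return best, curve[best], curve


# Hypothetical data: 100 phased heterozygous SNPs, 1000 reads each,
# with the haplotype-1 allele seen in 511 reads per SNP.
best_c, best_llr, _ = scan_tumor_fraction([511] * 100, [1000] * 100)
print(f"MLE tumor fraction {best_c:.2%}, log-likelihood ratio {best_llr:.1f}")
```

A positive ratio at the peak favors the deletion hypothesis; with perfectly balanced counts every candidate fraction yields a negative ratio, favoring the euploid hypothesis. A fuller analysis would also scan the mirror hypothesis (loss of haplotype 1) and duplications.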

Figure 32B is a plot of the probability of a deletion divided by the probability of no deletion across a range of possible tumor fractions for the low tumor fraction sample (0.58%).

Figure 33 is a plot of the log odds ratio across a range of possible tumor fractions for the low tumor fraction sample (0.58%). Figure 33 is an enlarged version of the low tumor fraction sample plot in Figure 32.

Figure 34 shows the limits of detection for single nucleotide mutations in tumor biopsies using the three different methods described in Example 6.

Figure 35 shows the limits of detection for single nucleotide mutations in plasma samples using the three different methods described in Example 6.

Figures 36A and 36B are plots of analyses of genomic DNA (Figure 36A) or DNA from a single cell (Figure 36B) using a library of approximately 28,000 primers designed to detect CNVs. The presence of two central bands rather than one indicates the presence of a CNV. The x-axis represents the linear position of the SNPs, and the y-axis represents the fraction of A allele reads out of total reads.

Figures 37A and 37B are plots of analyses of genomic DNA (Figure 37A) or DNA from a single cell (Figure 37B) using a library of approximately 3,000 primers designed to detect CNVs. The presence of two central bands rather than one indicates the presence of a CNV. The x-axis represents the linear position of the SNPs, and the y-axis represents the fraction of A allele reads out of total reads.

Figure 38 shows the uniformity of the depth of read (DOR) across these approximately 3,000 loci.

Figure 39 is a table comparing error call metrics for genomic DNA and DNA from single cells.

Figure 40 is a plot of error rates for transition and transversion mutations.

Figures 41a-d are plots of the sensitivity of the CoNVERGe assay as measured with PlasmArts. (a) Correlation between the AAI calculated by CoNVERGe and the actual input fraction, in PlasmArt samples of DNA from a cell line with a 22q11.2 deletion and a matched normal cell line. (b) Correlation between the calculated AAI and the actual tumor DNA input, in PlasmArt samples of DNA from HCC2218 breast cancer cells with chromosome 2p and 2q CNVs and matched normal HCC2218BL cells (tumor DNA fraction of 0-9.09%). (c) Correlation between the calculated AAI and the actual tumor DNA input, in PlasmArt samples of DNA from HCC1954 breast cancer cells with chromosome 1p and 1q CNVs and matched normal HCC1954BL cells (tumor DNA fraction of 0-5.66%). (d) Allele frequency plot for the HCC1954 cells used in (c). In (a), (b), and (c), data points and error bars represent the mean and standard deviation (SD), respectively, of 3-8 replicates.

Figure 42 provides details of an exemplary PlasmArt standard, including a plot of the fragment size distribution in the lower part.

Figure 43 provides results from dilution curves of PlasmArt synthetic ctDNA standards used to validate the microdeletion and cancer assays. Figure 43A: the right panel shows a maximum likelihood odds ratio plot used to estimate the tumor DNA fraction. Figure 43B is a plot for detecting transversion events. Figure 43C is a plot for detecting transition events.

Figure 44 is a plot showing CNVs in various chromosomal regions for different samples at different %ctDNA.

Figure 45 is a plot showing CNVs in various chromosomal regions for each of several ovarian cancer samples at different %ctDNA levels.

Figure 46 is a table showing the percentage of breast cancer or lung cancer patients with SNVs, or with SNVs and/or CNVs combined, in ctDNA.

Figure 47 is a chart of the percentage of breast cancer samples at different stages that have tumor-specific SNVs and/or CNVs in plasma, with the associated data table on the right.

Figure 48 is a chart of the percentage of samples of different breast cancer subtypes that have tumor-specific SNVs and/or CNVs in plasma, with the associated data table on the right.

Figure 49 is a chart of the percentage of lung cancer samples at different stages that have tumor-specific SNVs and/or CNVs in plasma, with the associated data table on the right.

Figure 50 is a chart of the percentage of samples of different breast cancer subtypes that have tumor-specific SNVs and/or CNVs in plasma, with the associated data table on the right.

Figure 51A shows the histological findings and history of a primary lung tumor analyzed for clonal and subclonal tumor heterogeneity. Figure 51B is a table of VAF identity in biopsies of a lung tumor, as measured by genome sequencing and AmpliSeq.

Figure 52 illustrates the use of ctDNA from plasma to identify clonal and subclonal SNV mutations, overcoming tumor heterogeneity.

Figure 53 is a table comparing AmpliSeq and mmPCR-NGS VAF calls for SNV detection in the primary tumor, including SNV mutations that were missed by AmpliSeq but identified in plasma ctDNA.

Figure 54A is a plot of %VAF in primary lung tumors. Figure 54B is a linear regression plot of AmpliSeq VAF versus Natera VAF.

Figure 55 is a plot of library 1/4 of an 84-plex SNV PCR primer reaction when primer concentration is limiting.

Figure 56 is a plot of library 2/4 of an 84-plex SNV PCR primer reaction when primer concentration is limiting.

Figure 57 is a plot of library 3/4 of an 84-plex SNV PCR primer reaction when primer concentration is limiting.

Figure 58 is a plot of library 4/4 of an 84-plex SNV PCR primer reaction when primer concentration is limiting.

Figure 59 is a plot of limit of detection (LOD) versus depth of read (DOR) for detecting SNV transition and transversion mutations in an 84-plex PCR reaction with 15 PCR cycles.

Figure 60 is a plot of LOD versus DOR for detecting SNV transition and transversion mutations in an 84-plex PCR reaction with 20 PCR cycles.

Figure 61 is a plot of LOD versus DOR for detecting SNV transition and transversion mutations in an 84-plex PCR reaction with 25 PCR cycles.

Figure 62 is a plot illustrating comparable sensitivity between tumor genomic DNA and single-cell genomic DNA. The upper portion shows results using tumor cell genomic DNA. The lower portion shows results using single-cell genomic DNA.

Figure 63a illustrates a workflow for analyzing CNVs in various cancer sample types using a massively multiplexed PCR (mmPCR) assay targeting SNPs. Figures 63b-f compare the CoNVERGe assay with microarray assays in breast cancer cell lines and matched normal cell lines.

Figure 64 provides a comparison of fresh frozen (FF) and FFPE (formalin-fixed, paraffin-embedded) breast cancer samples with controls. Panels a-h compare the CoNVERGe assay with microarray assays in breast cancer cell lines and matched buffy coat gDNA control samples.

Figure 65 shows allele frequency plots reflecting chromosome copy number, illustrating detection of CNVs in single cells using the CoNVERGe assay. Figures 65a-c are from analyses of three single-cell replicates of a breast cancer. Figure 65d is an analysis of a B-lymphocyte cell line that lacks CNVs in the target region.

Figure 66 shows allele frequency plots reflecting chromosome copy number, illustrating detection of CNVs in real plasma samples using the CoNVERGe assay. Figure 66a shows a stage II primary breast cancer plasma cfDNA sample and its matched tumor biopsy gDNA. Figure 66b shows an advanced ovarian cancer plasma cfDNA sample and its matched tumor biopsy gDNA. Figure 66c is a chart illustrating tumor heterogeneity, as determined by CNV detection, in five advanced ovarian cancer plasma samples and matched tissue samples.

Figure 67 illustrates chromosomal locations and mutational changes in breast cancer.

Figure 68 illustrates the major (Figure 68A) and minor (Figure 68B) allele frequencies of SNPs for a 3,168-plex mmPCR reaction.

Figure 69 shows an example system X00 for implementing embodiments of the invention.

Figure 70 illustrates an example computer system for implementing embodiments of the invention.

While the above drawings set forth presently disclosed embodiments, other embodiments are also contemplated, as noted in the discussion. This disclosure presents illustrative embodiments by way of representation and not limitation. Numerous other modifications and embodiments can be devised by those skilled in the art that fall within the scope and spirit of the principles of the presently disclosed embodiments.

Detailed description of the invention

In one aspect, the invention relates generally, at least in part, to improved methods for determining the presence or absence of copy number variations (CNVs), such as deletions or duplications of chromosome segments or entire chromosomes. The methods are particularly useful for detecting small deletions or duplications, which are difficult to detect with high specificity and sensitivity using existing methods because little data is available from the relevant chromosome segment. The methods include improved analytical methods, improved bioassay methods, and combinations of improved analytical and bioassay methods. The methods of the invention can also be used to detect deletions or duplications present in only a small percentage of the cells or nucleic acid molecules tested. This allows deletions or duplications to be detected before a disease occurs (such as in a precancerous condition) or early in the disease, for example before a large number of diseased cells (such as cancer cells) carrying the deletion or duplication have accumulated. More accurate detection of deletions or duplications associated with a disease or disorder improves methods for diagnosing, preventing, delaying, stabilizing, or treating the disease or disorder. Several deletions or duplications are known to be associated with cancer or with severe mental or physical disorders.

In another aspect, the invention relates generally, at least in part, to improved methods for detecting single nucleotide variations (SNVs). These include improved analytical methods, improved bioassay methods, and improved methods that combine improved analytical and bioassay methods. In certain illustrative embodiments, the methods are used to detect, diagnose, monitor, or stage cancer, for example in samples, such as circulating free DNA samples, in which an SNV is present at a very low concentration, such as less than 10%, 5%, 4%, 3%, 2.5%, 2%, 1%, 0.5%, 0.25%, or 0.1% relative to the total number of normal copies at the SNV locus. That is, in certain illustrative embodiments, these methods are especially suited for samples in which a mutation or variant is present at a relatively low percentage relative to the normal allele at a polymorphic locus. Finally, methods are provided herein that combine the improved methods for detecting copy number variations with the improved methods for detecting single nucleotide variations.

Successful treatment of a disease such as cancer often depends on early diagnosis, correct staging, selection of an effective treatment regimen, and close monitoring to prevent or detect recurrence. For cancer diagnosis, histological evaluation of tumor material obtained by tissue biopsy is generally considered the most reliable method. However, the invasive nature of biopsy-based sampling makes it unsuitable for large-scale screening and regular follow-up. Methods that can be performed non-invasively, at relatively low cost, and with a fast turnaround time are therefore advantageous. The methods of the invention can use targeted sequencing, which requires fewer reads than shotgun sequencing (for example, a few million reads rather than 40 million reads), thereby reducing cost. Multiplex PCR and next-generation sequencing can be used to increase throughput and reduce cost.

In some embodiments, the methods are used to detect deletions, duplications, or single nucleotide variations in an individual. A sample from the individual containing cells or nucleic acids suspected of having a deletion, duplication, or single nucleotide variation can be analyzed. In some embodiments, the sample is from a tissue or organ suspected of having a deletion, duplication, or single nucleotide variation, such as cells or a mass suspected of being cancerous. The methods of the invention can be used to detect a deletion, duplication, or single nucleotide variation present in only one cell or a small number of cells in a mixture of cells with and without the deletion, duplication, or single nucleotide variation. In some embodiments, cfDNA or cfRNA from a blood sample from the individual is analyzed. In some embodiments, the cfDNA or cfRNA is secreted by cells, such as cancer cells. In some embodiments, the cfDNA or cfRNA is released by cells undergoing necrosis or apoptosis, such as cancer cells. The methods of the invention can be used to detect deletions, duplications, or single nucleotide variations present in only a small percentage of the cfDNA or cfRNA. In some embodiments, one or more cells from an embryo are tested.

In some embodiments, the methods are used for non-invasive or invasive prenatal testing of a fetus. These methods can be used to determine the presence or absence of deletions or duplications of a chromosome segment or an entire chromosome, such as deletions or duplications known to be associated with severe mental or physical disorders, learning disabilities, or cancer. In some embodiments for non-invasive prenatal testing (NIPT), cells, cfDNA, or cfRNA from a blood sample from the pregnant mother are tested. The methods allow a deletion or duplication to be detected in cells, cfDNA, or cfRNA from the fetus despite the presence of a large amount of cells, cfDNA, or cfRNA from the mother. In some embodiments for invasive prenatal testing, DNA or RNA in a fetal sample (such as a CVS or amniocentesis sample) is tested. The methods can be used to detect deletions or duplications in fetal DNA or RNA even if the sample is contaminated with DNA or RNA from the pregnant mother.

In addition to determining the presence or absence of a copy number variation, one or more other factors can be analyzed if desired. These factors can be used to improve the accuracy of a diagnosis (such as determining the presence or absence of cancer or an increased risk of cancer, classifying a cancer, or staging a cancer) or a prognosis. These factors can also be used to select a particular therapy or treatment regimen that is likely to be effective in the subject. Exemplary factors include the presence or absence of a polymorphism or mutation; altered (increased or decreased) levels of total or specific cfDNA, cfRNA, or microRNA (miRNA); an altered (increased or decreased) tumor fraction; altered (increased or decreased) methylation levels; altered (increased or decreased) DNA integrity; and altered or alternative mRNA splicing.

The following sections describe methods for detecting deletions or duplications using phased data (such as inferred or measured phased data) or unphased data; samples that can be tested; methods for sample preparation, amplification, and quantitation; methods for phasing genetic data; polymorphisms, mutations, nucleic acid alterations, mRNA alternative splicing, and changes in nucleic acid levels that can be detected; databases of results from the methods, other risk factors, and screening methods; cancers that can be diagnosed or treated; cancer treatments; cancer models for testing treatments; and methods for formulating and administering treatments.

Exemplary methods for determining ploidy using phased data

Some methods of the invention are based in part on the discovery that using phased data to detect CNVs reduces false negative and false positive rates compared with using unphased data (Figures 20A-27). This improvement is greatest for samples in which the CNV is present at low levels. Thus, phased data improve the accuracy of CNV detection compared with unphased data, such as methods that calculate the allele ratio at one or more loci, or aggregate allele ratios into a summary value (such as an average) over a chromosome or chromosome segment, without considering whether the allele ratios at different loci indicate that the same haplotype or different haplotypes appear to be present in abnormal amounts. Using phased data allows a more accurate determination of whether a difference between measured and expected allele ratios is due to noise or to the presence of a CNV. For example, if the differences between measured and expected allele ratios at most or all loci in a region indicate that the same haplotype is overrepresented, a CNV is more likely to be present. Using the linkage between alleles in a haplotype allows a determination of whether the measured genetic data are consistent with the same haplotype being overrepresented (rather than with random noise). Conversely, if the differences between measured and expected allele ratios are due only to noise (such as experimental error), then in some embodiments the first haplotype appears overrepresented about half the time and the second haplotype appears overrepresented the other half of the time.
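The difference between phased and unphased aggregation can be illustrated with a toy example. The following minimal Python sketch assumes heterozygous SNPs with known phase (which homolog carries the A allele) and compares a naive average of A-allele deviations from 0.5 with an average of haplotype-1 deviations obtained by orienting each locus by its phase. The function names and the read counts are hypothetical.

```python
def unphased_mean_deviation(a_counts, depths):
    """Mean of (A-allele fraction - 0.5), ignoring which homolog carries A."""
    deviations = [a / n - 0.5 for a, n in zip(a_counts, depths)]
    return sum(deviations) / len(deviations)


def phased_mean_deviation(a_counts, depths, a_on_hap1):
    """Mean of (haplotype-1 fraction - 0.5), using phase to orient each locus."""
    deviations = []
    for a, n, on_hap1 in zip(a_counts, depths, a_on_hap1):
        hap1_reads = a if on_hap1 else n - a  # reads carrying the haplotype-1 allele
        deviations.append(hap1_reads / n - 0.5)
    return sum(deviations) / len(deviations)


# Hypothetical region: haplotype 1 is overrepresented (52% of reads) at four
# heterozygous SNPs, but the A allele sits on haplotype 1 only at SNPs 1 and 3.
a_counts = [520, 480, 520, 480]
depths = [1000, 1000, 1000, 1000]
a_on_hap1 = [True, False, True, False]
print(unphased_mean_deviation(a_counts, depths))           # close to 0: signal cancels
print(phased_mean_deviation(a_counts, depths, a_on_hap1))  # close to 0.02
```

Without phase, the excess of haplotype 1 shows up as A-allele deviations of opposite sign that cancel when averaged; with phase, every locus points the same way. Under pure noise the oriented deviations would instead scatter around 0 with roughly half on each side.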

Accuracy can be improved by considering linkage between SNPs and the likelihood of crossovers during meiosis (which produces the gametes that form the embryo that develops into the fetus). Expected allele measurement distributions created for one or more hypotheses using linkage correspond better to reality than distributions created without linkage. For example, suppose that there are two nearby SNPs, 1 and 2, and that the mother is A at SNP 1 and A at SNP 2 on one homolog, and B at SNP 1 and B at SNP 2 on the second homolog. If the father is A at both SNPs on both homologs and B is measured for the fetus at SNP 1, this indicates that the fetus inherited the second homolog, and it is therefore much more likely that the fetus is also B at SNP 2. A model that considers linkage predicts this; a model that does not consider linkage cannot. Alternatively, if the mother is AB at SNP 1 and AB at nearby SNP 2, two hypotheses corresponding to a maternal trisomy at that location can be used: one involving a matched copy error (nondisjunction in meiosis II or in mitosis early in fetal development), and one involving an unmatched copy error (nondisjunction in meiosis I). In the case of a matched copy error trisomy, if the fetus inherited AA from the mother at SNP 1, the fetus is much more likely to have inherited AA or BB from the mother at SNP 2 than AB. In the case of an unmatched copy error, the fetus inherits AB from the mother at both SNPs. Allele distribution hypotheses formulated by a CNV calling method that considers linkage can make these predictions and will therefore correspond to the actual allele measurements to a considerably greater extent than those of a CNV calling method that does not consider linkage.
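The first linkage example can be made quantitative with a two-locus calculation. The following minimal Python sketch assumes that the maternal haplotypes are known, that the allele observed at SNP 1 is of maternal origin (true in the example because the father is AA), and that a crossover between the two SNPs occurs with probability `r`. The function name and `r` are hypothetical parameters; in practice such a value might come from a genetic map.

```python
def prob_maternal_allele_at_snp2(hap1, hap2, observed_snp1, query_snp2, r):
    """P(fetus inherited query_snp2 from the mother at SNP 2 | observed_snp1 at SNP 1).

    hap1, hap2: maternal alleles at (SNP 1, SNP 2) on each homolog.
    r: assumed probability of a crossover between the two SNPs.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("r must be between 0 and 0.5")
    haps = (hap1, hap2)
    consistent = [i for i, h in enumerate(haps) if h[0] == observed_snp1]
    if not consistent:
        raise ValueError("observed allele is not carried by either maternal homolog")
    prob = 0.0
    for i in consistent:  # homologs that could have supplied the SNP 1 allele
        same, other = haps[i], haps[1 - i]
        p_no_crossover = (1 - r) * (same[1] == query_snp2)
        p_crossover = r * (other[1] == query_snp2)
        prob += (p_no_crossover + p_crossover) / len(consistent)
    return prob


# Mother: A-A on one homolog, B-B on the other; fetus shows the maternal B at SNP 1.
print(prob_maternal_allele_at_snp2(("A", "A"), ("B", "B"), "B", "B", r=0.01))
print(prob_maternal_allele_at_snp2(("A", "A"), ("B", "B"), "B", "B", r=0.5))
```

With tight linkage (r = 0.01) the model predicts B at SNP 2 with probability 0.99. Setting r = 0.5 mimics a model that ignores linkage, which predicts only 0.5.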

In some embodiments, phased genetic data are used to determine whether the copy number of a first homologous chromosome segment is overrepresented compared with a second homologous chromosome segment in the genome of an individual (for example, in the genome of one or more cells, or in cfDNA or cfRNA). Exemplary overrepresentations include a duplication of the first homologous chromosome segment or a deletion of the second homologous chromosome segment. In some embodiments, there is no overrepresentation because the first and second homologous chromosome segments are present in equal proportions (for example, one copy of each segment in a diploid sample). In some embodiments, calculated allele ratios in a nucleic acid sample are compared with expected allele ratios to determine whether an overrepresentation is present, as discussed further below. In this specification, the phrase "first homologous chromosome segment compared with the second homologous chromosome segment" refers to the first homolog of a chromosome segment and the second homolog of the same chromosome segment.

In some embodiments, the method includes obtaining phased genetic data for the first homologous chromosome segment, comprising the identity of the allele present at each locus in a set of polymorphic loci on the first homologous chromosome segment; obtaining phased genetic data for the second homologous chromosome segment, comprising the identity of the allele present at each locus in the set of polymorphic loci on the second homologous chromosome segment; and obtaining measured genetic allele data comprising, for each allele at each locus in the set of polymorphic loci, the amount of that allele present in a DNA or RNA sample from one or more target cells and one or more non-target cells of the individual. In some embodiments, the method includes enumerating a set of one or more hypotheses specifying the degree of overrepresentation of the first homologous chromosome segment; calculating, for each hypothesis, the expected genetic data for a plurality of loci in the sample from the obtained phased genetic data, for one or more possible ratios of DNA or RNA from the one or more target cells to the total DNA or RNA in the sample; calculating (for example, on a computer), for each possible ratio of DNA or RNA and for each hypothesis, the data fit between the obtained genetic data of the sample and the expected genetic data of the sample for that possible ratio and that hypothesis; ranking the one or more hypotheses according to the data fit; and selecting the highest-ranked hypothesis, thereby determining the degree of overrepresentation of the copy number of the first homologous chromosome segment in the genome of one or more cells of the individual.
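The enumerate, fit, rank, and select steps can be sketched with a binomial likelihood as the data-fit measure. This is an illustration under assumptions, not the claimed implementation: phase is taken as known, so each heterozygous locus contributes a count of haplotype-1 reads out of all reads; the predicted haplotype-1 fractions are supplied per hypothesis for a single possible target-DNA ratio; and all names and numbers are hypothetical.

```python
from math import comb, log


def binomial_loglik(counts, p):
    """Sum of binomial log-probabilities for (hap1_reads, total_reads) pairs."""
    p = min(max(p, 1e-9), 1 - 1e-9)  # guard against log(0)
    return sum(log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p) for k, n in counts)


def rank_hypotheses(counts, predicted):
    """Rank hypotheses by data fit (log-likelihood), best first.

    predicted: hypothesis name -> expected haplotype-1 fraction at heterozygous
    loci, already computed for one possible target-DNA ratio.
    """
    scores = {name: binomial_loglik(counts, p) for name, p in predicted.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative expected fractions for an assumed 20% target-cell fraction.
predicted = {"no overrepresentation": 0.5, "hap1 +1 copy": 0.545, "hap2 deleted": 0.556}
counts = [(60, 100), (55, 100), (58, 100)]
best_name, best_score = rank_hypotheses(counts, predicted)[0]
print(best_name)  # hap2 deleted
```

Repeating the ranking for each possible target-DNA ratio gives the per-ratio data fits described above.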

In one aspect, the invention features a method for determining the copy number of a chromosome or chromosome segment of a fetus. In some embodiments, the method includes obtaining phased genetic data for at least one biological parent of the fetus, where the phased genetic data comprise the identity of the allele present at each locus in a set of polymorphic loci on the parent's first homologous chromosome segment and second homologous chromosome segment. In some embodiments, the method includes obtaining genetic data at the set of polymorphic loci on the chromosome or chromosome segment in a mixed sample of DNA or RNA, the mixed sample comprising fetal DNA or RNA and maternal DNA or RNA from the mother of the fetus, by measuring the amount of each allele at each locus. In some embodiments, the method includes enumerating a set of one or more hypotheses specifying the copy number of the chromosome or chromosome segment of interest present in the fetal genome. In some embodiments, the method includes creating (for example, on a computer), for each hypothesis, a probability distribution of the expected amount of each allele at each of a plurality of loci in the mixed sample from (i) the phased genetic data obtained from the parent, or (ii) the probabilities of one or more crossovers that may have occurred during the formation of the gamete that contributed a copy of the chromosome or chromosome segment of interest to the fetus; calculating (for example, on a computer), for each hypothesis, the data fit between (1) the obtained genetic data of the mixed sample and (2) the probability distribution of the expected amount of each allele at each of the plurality of loci in the mixed sample under that hypothesis; ranking the one or more hypotheses according to the data fit; and selecting the highest-ranked hypothesis, thereby determining the copy number of the chromosome segment of interest in the fetal genome.
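A minimal sketch of the fetal copy-number steps, under stated assumptions: both parents' haplotypes are known and coded 0/1 (1 = allele B); the fetal fraction `ff` is given; each hypothesis lists which parental homologs the fetus carries; maternal and fetal DNA contribute in proportion to copy number; crossovers within the segment are ignored (the text also allows crossover probabilities to be modeled); and a binomial likelihood serves as the data fit. Only a few representative hypotheses are listed, and all names are hypothetical.

```python
from math import log


def expected_b_fraction(mat_haps, pat_haps, inherited, site, ff):
    """Expected B-allele fraction at one site of a maternal-plasma mixture.

    mat_haps, pat_haps: two 0/1 allele lists per parent (1 = B), one per homolog.
    inherited: (parent, homolog) pairs carried by the fetus, e.g. [("M", 0), ("P", 1)].
    ff: assumed fetal fraction (fetal share of cfDNA in a disomic region).
    """
    haps = {"M": mat_haps, "P": pat_haps}
    mat_b = mat_haps[0][site] + mat_haps[1][site]
    fet_b = sum(haps[parent][h][site] for parent, h in inherited)
    return ((1 - ff) * mat_b + ff * fet_b) / ((1 - ff) * 2 + ff * len(inherited))


def rank_fetal_hypotheses(counts, mat_haps, pat_haps, ff, hypotheses):
    """Rank hypotheses by binomial log-likelihood of (b_reads, total_reads) per site."""
    scores = {}
    for name, inherited in hypotheses.items():
        ll = 0.0
        for site, (b, n) in enumerate(counts):
            p = expected_b_fraction(mat_haps, pat_haps, inherited, site, ff)
            p = min(max(p, 1e-9), 1 - 1e-9)
            ll += b * log(p) + (n - b) * log(1 - p)  # binomial coefficient omitted
        scores[name] = ll
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


hypotheses = {
    "disomy (M0, P0)": [("M", 0), ("P", 0)],
    "disomy (M1, P0)": [("M", 1), ("P", 0)],
    "maternal trisomy, unmatched copies": [("M", 0), ("M", 1), ("P", 0)],
    "maternal trisomy, matched copies": [("M", 0), ("M", 0), ("P", 0)],
    "monosomy (paternal copy only)": [("P", 0)],
}
mat_haps = [[0, 1, 0, 1], [1, 0, 1, 0]]
pat_haps = [[0, 0, 1, 1], [0, 0, 1, 1]]
counts = [(450, 1000), (500, 1000), (500, 1000), (550, 1000)]
ranking = rank_fetal_hypotheses(counts, mat_haps, pat_haps, 0.1, hypotheses)
print(ranking[0][0])  # disomy (M0, P0)
```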

In some embodiments, the method includes obtaining phased genetic data using any of the methods described herein or any known method. In some embodiments, the method includes, simultaneously or sequentially in any order, (i) obtaining phased genetic data for the first homologous chromosome segment, comprising the identity of the allele present at each locus in the set of polymorphic loci on the first homologous chromosome segment; (ii) obtaining phased genetic data for the second homologous chromosome segment, comprising the identity of the allele present at each locus in the set of polymorphic loci on the second homologous chromosome segment; and (iii) obtaining measured genetic allele data comprising the amount of each allele at each locus in the set of polymorphic loci in a DNA sample from one or more cells of the individual.

In some embodiments, the method includes calculating allele ratios for one or more loci in the set of polymorphic loci that are heterozygous in at least one cell from which the sample was derived (for example, loci that are heterozygous in the fetus and/or in the mother). In some embodiments, the calculated allele ratio for a particular locus is the measured amount of one allele divided by the total measured amount of all alleles at the locus. In some embodiments, the calculated allele ratio for a particular locus is the measured amount of one allele (such as the allele on the first homologous chromosome segment) divided by the measured amount of one or more other alleles (such as the allele on the second homologous chromosome segment). Calculated allele ratios can be computed using any method described herein or any standard method (for example, any mathematical transformation of the calculated allele ratios described herein).
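The two allele-ratio definitions can be sketched as follows, assuming read counts per allele at one locus are held in a dictionary; the function names are hypothetical.

```python
def ratio_to_total(counts, allele):
    """Measured amount of one allele divided by the total of all alleles at the locus."""
    return counts[allele] / sum(counts.values())


def ratio_to_others(counts, allele):
    """Measured amount of one allele divided by the amount of the other allele(s).

    Raises ZeroDivisionError if no other allele was observed at the locus.
    """
    others = sum(v for a, v in counts.items() if a != allele)
    return counts[allele] / others


locus = {"A": 60, "G": 40}  # read counts at one heterozygous locus
print(ratio_to_total(locus, "A"))   # 0.6
print(ratio_to_others(locus, "A"))  # 1.5
```

For a biallelic locus the two forms carry the same information: if `q` is the ratio to the other allele, the ratio to the total is `q / (1 + q)` (here 1.5 / 2.5 = 0.6).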

In some embodiments, the method includes determining whether the copy number of the first homologous chromosome segment is overrepresented by comparing one or more calculated allele ratios for a locus with the allele ratio expected for that locus if the first and second homologous chromosome segments were present in equal proportions. In some embodiments, the expected allele ratio assumes that the possible alleles at a locus are equally likely to be present. In some embodiments in which the calculated allele ratio for a particular locus is the measured amount of one allele divided by the total measured amount of all alleles at the locus, the corresponding expected allele ratio is 0.5 for a biallelic locus or 1/3 for a triallelic locus. In some embodiments, the expected allele ratio assumes that the possible alleles at a locus can have different likelihoods, for example based on the frequency of each allele in the particular population to which the subject belongs, such as a population defined by the subject's ancestry. Such allele frequencies are publicly available (see, for example, the HapMap Project; the Perlegen Human Haplotype Project; the website ncbi.nlm.nih.gov/projects/SNP/; and Sherry ST, Ward MH, Kholodov M et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11, each of which is incorporated by reference in its entirety). In some embodiments, the expected allele ratio is the allele ratio expected for the particular individual being tested under a specific hypothesis specifying the degree of overrepresentation of the first homologous chromosome segment. For example, the expected allele ratio for a particular individual can be determined from phased or unphased genetic data of the individual (for example, from a sample of the individual that is unlikely to have a deletion or duplication, such as a non-cancerous sample) or from data of one or more relatives of the individual. In some embodiments for prenatal testing, the expected allele ratio is the allele ratio expected for a mixed sample that comprises DNA or RNA from the pregnant mother and the fetus (such as a maternal plasma or serum sample comprising maternal cfDNA and fetal cfDNA) under a specific hypothesis specifying the degree of overrepresentation of the first homologous chromosome segment. For example, the expected allele ratio for the mixed sample can be determined from the genetic data of the mother and the predicted genetic data of the fetus (such as predictions of the alleles the fetus may have inherited from the mother and/or the father). In some embodiments, phased or unphased genetic data from a sample containing only maternal DNA or RNA (such as the buffy coat of a maternal blood sample) are used to determine the alleles of the maternal DNA or RNA in the mixed sample, as well as the alleles that the fetus may have inherited from the mother (and that may therefore be present in the fetal DNA or RNA of the mixed sample). In some embodiments, phased or unphased genetic data from a sample containing only paternal DNA or RNA are used to determine the alleles that the fetus may have inherited from the father (and that may therefore be present in the fetal DNA or RNA of the mixed sample). Expected allele ratios can be calculated using any method described herein or any standard method (for example, any mathematical transformation of the expected allele ratios described herein) (see US Publication No. 2012/0270212, filed November 18, 2011, which is incorporated herein by reference in its entirety).
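For a mixed maternal-plasma sample, an expected allele ratio can be derived from the maternal genotype and the alleles the fetus may inherit from each parent. This sketch assumes a disomic fetus, unphased genotypes given as allele tuples, equal (1/2) transmission probabilities for each parental allele, and a known fetal fraction `ff`; the function name is hypothetical.

```python
from collections import defaultdict
from itertools import product


def expected_mixture_ratios(maternal, paternal, ff, allele="A"):
    """Distribution of the expected fraction of `allele` in a mixed sample.

    maternal, paternal: genotypes as 2-tuples of alleles, e.g. ("A", "B").
    Each of the four maternal/paternal allele combinations is equally likely.
    Returns {expected_fraction: probability}.
    """
    dist = defaultdict(float)
    m_count = maternal.count(allele)
    for m_allele, p_allele in product(maternal, paternal):
        f_count = (m_allele == allele) + (p_allele == allele)
        ratio = ((1 - ff) * m_count + ff * f_count) / 2  # mother and fetus both diploid
        dist[round(ratio, 6)] += 0.25
    return dict(dist)


print(expected_mixture_ratios(("A", "B"), ("B", "B"), ff=0.1))  # {0.5: 0.5, 0.45: 0.5}
```

Here the expected A fraction is 0.5 if the fetus inherited maternal A and 0.45 if it inherited maternal B, each with probability 1/2.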

In some embodiments, a calculated allele ratio indicates an overrepresentation of the copy number of the first homologous chromosome segment if (i) the allele ratio (the measured amount of the allele present at the locus on the first homologous chromosome segment, divided by the total measured amount of all alleles at the locus) is greater than the expected allele ratio for that locus, or (ii) the allele ratio (the measured amount of the allele present at the locus on the second homologous chromosome segment, divided by the total measured amount of all alleles at the locus) is less than the expected allele ratio for that locus. In some embodiments, a calculated allele ratio is only considered to indicate an overrepresentation if it is significantly greater or less than the expected ratio for that locus. In some embodiments, a calculated allele ratio indicates no overrepresentation of the copy number of the first homologous chromosome segment if (i) the allele ratio (the measured amount of the allele present at the locus on the first homologous chromosome segment, divided by the total measured amount of all alleles at the locus) is less than or equal to the expected allele ratio for that locus, or (ii) the allele ratio (the measured amount of the allele present at the locus on the second homologous chromosome segment, divided by the total measured amount of all alleles at the locus) is greater than or equal to the expected allele ratio for that locus. In some embodiments, calculated ratios equal to the corresponding expected ratios are ignored (because they indicate no overrepresentation).
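The per-locus rule can be sketched as a small classifier. The `margin` parameter is a hypothetical stand-in for "significantly greater"; with `margin=0` any excess counts. Ratios that exceed the expectation by less than the margin are reported as inconclusive, which is an assumption of this sketch rather than something the text specifies.

```python
def site_indication(hap1_fraction, expected, margin=0.0):
    """Classify one locus from its first-homolog allele ratio.

    hap1_fraction: measured first-homolog allele / all alleles at the locus.
    expected: expected ratio for the locus (e.g. 0.5 for a biallelic site).
    margin: hypothetical significance margin above the expected ratio.
    """
    diff = hap1_fraction - expected
    if diff == 0:
        return "ignored"        # equal to expectation: carries no signal
    if diff > margin:
        return "over"           # indicates overrepresentation
    if diff < 0:
        return "not over"       # indicates no overrepresentation
    return "inconclusive"       # above expectation but within the margin


print(site_indication(0.55, 0.5))               # over
print(site_indication(0.52, 0.5, margin=0.05))  # inconclusive
```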

In various embodiments, one or more of the following methods are used to compare one or more calculated allele ratios with the corresponding expected allele ratios. In some embodiments, a method determines whether the calculated allele ratio for a particular locus is above or below the expected allele ratio, regardless of the magnitude of the difference. In some embodiments, a method determines the magnitude of the difference between the calculated allele ratio and the expected allele ratio for a particular locus, regardless of whether the calculated ratio is above or below the expected ratio. In some embodiments, a method determines both whether the calculated allele ratio for a particular locus is above or below the expected allele ratio and the magnitude of the difference. In some embodiments, a method determines whether the average or weighted average of the calculated allele ratios is above or below the average or weighted average of the expected allele ratios, regardless of the magnitude of the difference. In some embodiments, a method determines the magnitude of the difference between the average or weighted average of the calculated allele ratios and the average or weighted average of the expected allele ratios, regardless of whether the former is above or below the latter. In some embodiments, a method determines both whether the average or weighted average of the calculated allele ratios is above or below the average or weighted average of the expected allele ratios and the magnitude of the difference. In some embodiments, a method determines the average or weighted average of the magnitudes of the differences between the calculated allele ratios and the expected allele ratios.
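The comparison methods listed above reduce to a handful of direction and magnitude summaries, sketched below. The optional `weights` (for example, read depth per locus) are an assumption of this sketch, and the function name is hypothetical.

```python
def compare_ratios(calculated, expected, weights=None):
    """Direction and magnitude summaries comparing calculated with expected ratios."""
    if weights is None:
        weights = [1.0] * len(calculated)
    total_w = sum(weights)
    diffs = [c - e for c, e in zip(calculated, expected)]
    mean_calc = sum(w * c for w, c in zip(weights, calculated)) / total_w
    mean_exp = sum(w * e for w, e in zip(weights, expected)) / total_w
    mean_diff = mean_calc - mean_exp
    return {
        "site_direction": [(d > 0) - (d < 0) for d in diffs],  # +1 above, -1 below, 0 equal
        "site_magnitude": [abs(d) for d in diffs],
        "mean_direction": (mean_diff > 0) - (mean_diff < 0),
        "mean_magnitude": abs(mean_diff),
        "mean_of_magnitudes": sum(w * abs(d) for w, d in zip(weights, diffs)) / total_w,
    }


summary = compare_ratios([0.6, 0.55, 0.45], [0.5, 0.5, 0.5])
print(summary["site_direction"], summary["mean_direction"])  # [1, 1, -1] 1
```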

In some embodiments, the magnitude of the difference between the calculated allele ratio and the expected allele ratio for one or more loci is used to determine whether an overrepresentation of the copy number of the first homologous chromosome segment in the genome of one or more cells is due to a duplication of the first homologous chromosome segment or to a deletion of the second homologous chromosome segment.

In some embodiments, an overrepresentation of the copy number of the first homologous chromosome segment is determined to be present if one or more of the following conditions are met. In some embodiments, the number of calculated allele ratios indicating an overrepresentation of the copy number of the first homologous chromosome segment is above a threshold. In some embodiments, the number of calculated allele ratios indicating no overrepresentation of the copy number of the first homologous chromosome segment is below a threshold. In some embodiments, the magnitude of the difference between the calculated allele ratios that indicate an overrepresentation of the copy number of the first homologous chromosome segment and the corresponding expected allele ratios is above a threshold. In some embodiments, summed over all calculated allele ratios that indicate an overrepresentation, the magnitude of the difference between the calculated allele ratios and the corresponding expected allele ratios is above a threshold. In some embodiments, the magnitude of the difference between the calculated allele ratios that indicate no overrepresentation of the copy number of the first homologous chromosome segment and the corresponding expected allele ratios is below a threshold. In some embodiments, the average or weighted average of the calculated allele ratios (the measured amount of the allele present on the first homologous chromosome divided by the total measured amount of all alleles at the locus) is greater than the average or weighted average of the expected allele ratios by at least a threshold. In some embodiments, the average or weighted average of the calculated allele ratios (the measured amount of the allele present on the second homologous chromosome divided by the total measured amount of all alleles at the locus) is less than the average or weighted average of the expected allele ratios by at least a threshold. In some embodiments, the data fit between the calculated allele ratios and the allele ratios predicted for an overrepresentation of the copy number of the first homologous chromosome segment is below a threshold (indicating a good data fit). In some embodiments, the data fit between the calculated allele ratios and the allele ratios predicted for no overrepresentation of the copy number of the first homologous chromosome segment is above a threshold (indicating a poor data fit).
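Two of the example criteria (a count of loci indicating overrepresentation, and the excess of the mean first-homolog ratio over the mean expected ratio) can be sketched as follows. The threshold values are hypothetical inputs, and calling "present" when any criterion fires mirrors the "one or more of the following" wording.

```python
def overrepresentation_present(calculated, expected, min_over_sites, min_mean_excess):
    """Apply two example criteria; thresholds are hypothetical inputs.

    calculated: first-homolog allele / all alleles at each locus.
    Returns (call, per-criterion results); the call is True if any criterion fires.
    """
    over_sites = sum(1 for c, e in zip(calculated, expected) if c > e)
    mean_excess = (sum(calculated) - sum(expected)) / len(calculated)
    criteria = {
        "over_sites_above_threshold": over_sites > min_over_sites,
        "mean_excess_at_least_threshold": mean_excess >= min_mean_excess,
    }
    return any(criteria.values()), criteria


call, detail = overrepresentation_present([0.56, 0.58, 0.55, 0.57], [0.5] * 4, 3, 0.03)
print(call, detail)
```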

In some embodiments, an overrepresentation of the copy number of the first homologous chromosome segment is determined to be absent if one or more of the following conditions are met. In some embodiments, the number of calculated allele ratios indicating an overrepresentation of the copy number of the first homologous chromosome segment is below a threshold. In some embodiments, the number of calculated allele ratios indicating no overrepresentation of the copy number of the first homologous chromosome segment is above a threshold. In some embodiments, the magnitude of the difference between the calculated allele ratios that indicate an overrepresentation of the copy number of the first homologous chromosome segment and the corresponding expected allele ratios is below a threshold. In some embodiments, the magnitude of the difference between the calculated allele ratios that indicate no overrepresentation of the copy number of the first homologous chromosome segment and the corresponding expected allele ratios is above a threshold. In some embodiments, the average or weighted average of the calculated allele ratios (the measured amount of the allele present on the first homologous chromosome divided by the total measured amount of all alleles at the locus), minus the average or weighted average of the expected allele ratios, is below a threshold. In some embodiments, the average or weighted average of the expected allele ratios, minus the average or weighted average of the calculated allele ratios (the measured amount of the allele present on the second homologous chromosome divided by the total measured amount of all alleles at the locus), is below a threshold. In some embodiments, the data fit between the calculated allele ratios and the allele ratios predicted for an overrepresentation of the copy number of the first homologous chromosome segment is above a threshold. In some embodiments, the data fit between the calculated allele ratios and the allele ratios predicted for no overrepresentation of the copy number of the first homologous chromosome segment is below a threshold. In some embodiments, the threshold is determined empirically by testing samples known to have the CNV of interest and/or samples known to lack the CNV.
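One way to determine a threshold empirically is to choose it from the scores of samples known to lack the CNV so that a target false positive rate is met, then check the sensitivity on samples known to carry it. This is a sketch under assumptions: the score is any of the statistics above (for example, the mean excess ratio), calls are made when a score is strictly above the threshold, and the function names and data are hypothetical.

```python
def calibrate_threshold(negative_scores, max_false_positive_rate):
    """Threshold from scores of samples known to lack the CNV.

    Calls are made when a score is strictly above the threshold, so at most
    max_false_positive_rate of the known negatives would be called positive.
    """
    ordered = sorted(negative_scores)
    allowed = int(max_false_positive_rate * len(ordered))  # negatives allowed above
    allowed = min(allowed, len(ordered) - 1)
    return ordered[len(ordered) - allowed - 1]


def sensitivity(positive_scores, threshold):
    """Fraction of samples known to carry the CNV whose score exceeds the threshold."""
    return sum(s > threshold for s in positive_scores) / len(positive_scores)


negatives = [i / 100 for i in range(1, 11)]  # hypothetical scores of CNV-free samples
t = calibrate_threshold(negatives, 0.1)
print(t, sensitivity([0.05, 0.12, 0.2], t))
```

Lowering the permitted false positive rate raises the threshold and, in general, lowers sensitivity.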

In some embodiments, determining whether the copy number of the first homologous chromosome segment is overrepresented includes enumerating a set of one or more hypotheses specifying the degree of overrepresentation of the first homologous chromosome segment.

One exemplary hypothesis is that there is no overrepresentation because the first and second homologous chromosome segments are present in equal proportions (for example, one copy of each segment in a diploid sample). Other exemplary hypotheses include duplication of the first homologous chromosome segment one or more times (for example, 1, 2, 3, 4, 5 or more additional copies of the first homologous chromosome segment compared with the number of copies of the second homologous chromosome segment). Another exemplary hypothesis is a deletion of the second homologous chromosome segment. Yet another exemplary hypothesis is a deletion of both the first and second homologous chromosome segments. In some embodiments, for each hypothesis, predicted allele ratios are estimated for the loci that are heterozygous in at least one cell (for example, loci that are heterozygous in the fetus and/or in the mother), given the degree of overrepresentation specified by that hypothesis. In some embodiments, the likelihood that each hypothesis is correct is calculated by comparing the calculated allele ratios with the predicted allele ratios, and the hypothesis with the greatest likelihood is selected.

In some embodiments, an expected distribution of a test statistic is calculated for each hypothesis using the predicted allele ratios. In some embodiments, the likelihood that each hypothesis is correct is calculated by comparing the test statistic (calculated using the calculated allele ratios) with the expected distribution of the test statistic (calculated using the predicted allele ratios), and the hypothesis with the greatest likelihood is selected.
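A minimal sketch of the test-statistic approach, assuming the statistic is the pooled first-homolog allele fraction across heterozygous loci and approximating its expected distribution under each hypothesis as Normal(p, p(1-p)/N), where `p` is the predicted fraction and `N` the total read count. This approximation ignores overdispersion, and the statistic and function names are illustrative choices rather than ones fixed by the text.

```python
from math import log, pi


def normal_logpdf(x, mean, var):
    """Log density of Normal(mean, var) at x."""
    return -0.5 * (log(2 * pi * var) + (x - mean) ** 2 / var)


def select_by_test_statistic(counts, predicted):
    """Compare a pooled test statistic with its expected distribution per hypothesis.

    counts: (hap1_reads, total_reads) per heterozygous locus.
    predicted: hypothesis name -> predicted first-homolog fraction.
    """
    n_total = sum(n for _, n in counts)
    t = sum(k for k, _ in counts) / n_total
    scores = {}
    for name, p in predicted.items():
        var = max(p * (1 - p), 1e-12) / n_total  # binomial variance of a pooled fraction
        scores[name] = normal_logpdf(t, p, var)
    return t, max(scores, key=scores.get), scores


t, best, _ = select_by_test_statistic(
    [(55, 100), (56, 100), (54, 100)],
    {"no overrepresentation": 0.5, "hap1 +1 copy": 0.545},
)
print(t, best)
```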

In some embodiments, predicted allele ratios for the loci that are heterozygous in at least one cell (for example, loci that are heterozygous in the fetus and/or in the mother) are estimated given the phased genetic data of the first homologous chromosome segment, the phased genetic data of the second homologous chromosome segment, and the degree of overrepresentation specified by that hypothesis. In some embodiments, the likelihood that each hypothesis is correct is calculated by comparing the calculated allele ratios with the predicted allele ratios, and the hypothesis with the greatest likelihood is selected.

Using mixed samples

It should be understood that in many embodiments the sample is a mixed sample containing DNA or RNA from one or more target cells and one or more non-target cells. In some embodiments, the target cells are cells that have a CNV, such as a deletion or duplication of interest, and the non-target cells are cells that do not have the copy number variation of interest (for example, a mixture of cells with the deletion or duplication of interest and cells without any deletion or duplication). In some embodiments, the target cells are cells associated with a disease or disorder, or with an increased risk of a disease or disorder (such as cancer cells), and the non-target cells are cells not associated with the disease or disorder or with an increased risk of it (such as non-cancerous cells). In some embodiments, the target cells all have the same CNV. In some embodiments, two or more target cells have different CNVs. In some embodiments, one or more target cells have a CNV, polymorphism, or mutation (associated with a disease or disorder, or with an increased risk of a disease or disorder) that is not found in at least one other target cell. In some such embodiments, the fraction of cells in the sample that are associated with the disease or disorder, or with an increased risk of it, is assumed to be greater than or equal to the fraction of the most common of these CNVs, polymorphisms, or mutations in the sample. For example, if 6% of the cells have a K-ras mutation and 8% of the cells have a BRAF mutation, at least 8% of the cells are assumed to be cancerous.

In some embodiments, the ratio of DNA (or RNA) from the one or more target cells to the total DNA (or RNA) in the sample is calculated. In some embodiments, a set of one or more hypotheses specifying the degree of overrepresentation of the first homologous chromosome segment is enumerated. In some embodiments, for each hypothesis, predicted allele ratios for the loci that are heterozygous in at least one cell (for example, loci that are heterozygous in the fetus and/or in the mother) are estimated given the calculated ratio of DNA or RNA and the degree of overrepresentation specified by the hypothesis. In some embodiments, the likelihood that each hypothesis is correct is calculated by comparing the calculated allele ratios with the predicted allele ratios, and the hypothesis with the greatest likelihood is selected.
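How a predicted allele ratio depends on the target-DNA ratio and on the hypothesis can be sketched as below. The sketch assumes that the ratio `f` can be treated as the fraction of cells that are target cells, that DNA yield per cell is proportional to copy number, and that non-target cells carry one copy of each homolog; the function and hypothesis labels are hypothetical.

```python
def predicted_hap1_fraction(f, hap1_copies, hap2_copies):
    """Predicted first-homolog allele fraction at a heterozygous locus.

    f: target-cell fraction (assumption: DNA yield proportional to copy number).
    hap1_copies, hap2_copies: copies of each homolog in target cells under the
        hypothesis; non-target cells are assumed to carry one copy of each.
    Returns None when no copies remain anywhere (no signal at the locus).
    """
    hap1 = (1 - f) * 1 + f * hap1_copies
    hap2 = (1 - f) * 1 + f * hap2_copies
    total = hap1 + hap2
    if total == 0:
        return None
    return hap1 / total


HYPOTHESES = {  # hypothetical labels -> (first-homolog copies, second-homolog copies)
    "no overrepresentation": (1, 1),
    "first homolog duplicated once": (2, 1),
    "second homolog deleted": (1, 0),
    "both homologs deleted": (0, 0),
}

for name, (c1, c2) in HYPOTHESES.items():
    print(name, predicted_hap1_fraction(0.2, c1, c2))
```

Under these assumptions a deletion of both homologs leaves the allele ratio at 0.5 in a mixed sample, so that hypothesis would have to be distinguished by measured amounts (depth) rather than by allele ratios alone.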

In some embodiments, for each hypothesis, an expected distribution of a test statistic is calculated using the predicted allele ratios and the calculated ratio of DNA or RNA. In some embodiments, the likelihood that each hypothesis is correct is determined by comparing the test statistic (calculated using the calculated allele ratios and the calculated ratio of DNA or RNA) with the expected distribution of the test statistic (calculated using the predicted allele ratios and the calculated ratio of DNA or RNA), and the hypothesis with the greatest likelihood is selected.

In some embodiments, the method includes enumerating a set of one or more hypotheses specifying the degree of overrepresentation of the first homologous chromosome segment. In some embodiments, the method includes estimating, for each hypothesis, either (i) predicted allele ratios for the loci that are heterozygous in at least one cell (for example, loci that are heterozygous in the fetus and/or in the mother), given the degree of overrepresentation specified by that hypothesis, or (ii) for one or more possible ratios of DNA or RNA, the expected distribution of a test statistic calculated using the predicted allele ratios and the possible ratio of DNA or RNA from the one or more target cells to the total DNA or RNA in the sample. In some embodiments, a data fit is calculated by comparing (i) the calculated allele ratios with the predicted allele ratios, or (ii) the test statistic (calculated using the calculated allele ratios and the possible ratio of DNA or RNA) with the expected distribution of the test statistic (calculated using the predicted allele ratios and the possible ratio of DNA or RNA). In some embodiments, the one or more hypotheses are ranked according to the data fit, and the highest-ranked hypothesis is selected. In some embodiments, a technique or algorithm, such as a search algorithm, is used for one or more of the following steps: calculating the data fit, ranking the hypotheses, or selecting the highest-ranked hypothesis. In some embodiments, the data fit is a fit to a beta-binomial distribution or a fit to a binomial distribution. In some embodiments, the technique or algorithm is selected from the group consisting of maximum likelihood estimation, maximum a posteriori estimation, Bayesian estimation, dynamic estimation (such as dynamic Bayesian estimation), and expectation-maximization estimation. In some embodiments, the method includes applying the technique or algorithm to the obtained genetic data and the expected genetic data.
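The binomial and beta-binomial data fits can be sketched with `math.lgamma`. The beta-binomial here is parameterized by its mean `p` and an overdispersion parameter `rho` in (0, 1), with alpha + beta = (1 - rho) / rho; this parameterization, the value of `rho`, and the function names are assumptions of the sketch. The beta-binomial fit tolerates more locus-to-locus variation than the binomial, which can matter when measurements are noisier than counting statistics alone.

```python
from math import exp, lgamma, log


def log_choose(n, k):
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def binomial_logpmf(k, n, p):
    return log_choose(n, k) + k * log(p) + (n - k) * log(1 - p)


def beta_binomial_logpmf(k, n, p, rho):
    """Beta-binomial with mean p and overdispersion rho = 1 / (alpha + beta + 1)."""
    alpha = p * (1 - rho) / rho
    beta = (1 - p) * (1 - rho) / rho
    return log_choose(n, k) + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def data_fit(counts, p, rho=None):
    """Total log-likelihood of (hap1_reads, total_reads) pairs; beta-binomial if rho is given."""
    if rho is None:
        return sum(binomial_logpmf(k, n, p) for k, n in counts)
    return sum(beta_binomial_logpmf(k, n, p, rho) for k, n in counts)


# Both distributions sum to 1 over k = 0..n (sanity check).
print(sum(exp(beta_binomial_logpmf(k, 20, 0.5, 0.1)) for k in range(21)))
```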

In some embodiments, the method includes creating a partition of possible ratios, ranging from a lower limit to an upper limit, for the ratio of DNA or RNA from the one or more target cells to the total DNA or RNA in the sample. In some embodiments, a set of one or more hypotheses specifying the degree of overrepresentation of the first homologous chromosome segment is enumerated. In some embodiments, the method includes estimating, for each possible ratio of DNA or RNA in the partition and for each hypothesis, either (i) the predicted allele ratios for the loci that are heterozygous in at least one cell (for example, loci that are heterozygous in the fetus and/or in the mother), given the possible ratio of DNA or RNA and the degree of overrepresentation specified by that hypothesis, or (ii) the expected distribution of a test statistic calculated using the predicted allele ratios and the possible ratio of DNA or RNA. In some embodiments, the method includes calculating, for each possible ratio of DNA or RNA in the partition and for each hypothesis, the likelihood that the hypothesis is correct by comparing (i) the calculated allele ratios with the predicted allele ratios, or (ii) the test statistic (calculated using the calculated allele ratios and the possible ratio of DNA or RNA) with the expected distribution of the test statistic (calculated using the predicted allele ratios and the possible ratio of DNA or RNA). In some embodiments, a combined probability for each hypothesis is determined by combining the probabilities of that hypothesis across the possible ratios in the partition, and the hypothesis with the greatest combined probability is selected. In some embodiments, the combined probability of each hypothesis is determined by weighting the probability of the hypothesis for a specific possible ratio by the likelihood that this possible ratio is the correct ratio.
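Combining each hypothesis over a partition of possible target-DNA ratios can be sketched as a weighted sum of likelihoods (computed in log space). The sketch assumes a binomial data fit, equal prior weight for each hypothesis, uniform weights over the grid unless others are supplied, and the copy-number model in which non-target cells carry one copy of each homolog; names and data are hypothetical.

```python
from math import exp, log


def predicted_hap1_fraction(f, hap1_copies, hap2_copies):
    hap1 = (1 - f) + f * hap1_copies
    hap2 = (1 - f) + f * hap2_copies
    return hap1 / (hap1 + hap2)


def loglik(counts, p):
    p = min(max(p, 1e-9), 1 - 1e-9)
    return sum(k * log(p) + (n - k) * log(1 - p) for k, n in counts)


def logsumexp(values):
    top = max(values)
    return top + log(sum(exp(v - top) for v in values))


def combined_hypothesis_probs(counts, hypotheses, grid, weights=None):
    """Posterior probability of each hypothesis, combined over a grid of target fractions.

    hypotheses: name -> (first-homolog copies, second-homolog copies) in target cells.
    grid: possible target fractions from a lower to an upper limit.
    weights: probability that each grid value is the correct ratio (uniform if None).
    """
    if weights is None:
        weights = [1 / len(grid)] * len(grid)
    per_hyp = {
        name: logsumexp([log(w) + loglik(counts, predicted_hap1_fraction(f, c1, c2))
                         for f, w in zip(grid, weights)])
        for name, (c1, c2) in hypotheses.items()
    }
    norm = logsumexp(list(per_hyp.values()))
    return {name: exp(v - norm) for name, v in per_hyp.items()}


grid = [i / 20 for i in range(1, 11)]  # 0.05 .. 0.50
hypotheses = {"no overrepresentation": (1, 1), "hap1 duplicated": (2, 1), "hap2 deleted": (1, 0)}
probs = combined_hypothesis_probs([(575, 1000)] * 4, hypotheses, grid)
print(probs)
```

With these data both the duplication and the deletion hypotheses fit well at different target fractions, and only the balanced hypothesis is clearly excluded.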

In some embodiments, a technique selected from the group consisting of maximum likelihood estimation, maximum a posteriori estimation, Bayesian estimation, dynamic estimation (such as dynamic Bayesian estimation), and expectation-maximization estimation is used to estimate the ratio of DNA or RNA from the one or more target cells to the total DNA or RNA in the sample. In some embodiments, the ratio of DNA or RNA from the one or more target cells to the total DNA or RNA in the sample is assumed to be the same for two or more (or all) of the CNVs of interest. In some embodiments, the ratio of DNA or RNA from the one or more target cells to the total DNA or RNA in the sample is calculated separately for each CNV of interest.
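Maximum likelihood estimation, the first technique listed, can be sketched as a grid search for the target fraction `f` under one assumed hypothesis. The binomial likelihood, the copy-number model (non-target cells carry one copy of each homolog), the grid resolution, and the function names are all assumptions; a maximum a posteriori variant would add a log prior over `f` to each score.

```python
from math import log


def predicted_hap1_fraction(f, hap1_copies, hap2_copies):
    hap1 = (1 - f) + f * hap1_copies
    hap2 = (1 - f) + f * hap2_copies
    return hap1 / (hap1 + hap2)


def loglik(counts, p):
    p = min(max(p, 1e-9), 1 - 1e-9)
    return sum(k * log(p) + (n - k) * log(1 - p) for k, n in counts)


def estimate_target_fraction(counts, hap1_copies, hap2_copies, steps=1000):
    """Maximum-likelihood estimate of the target fraction f by grid search on (0, 1)."""
    best_f, best_ll = None, float("-inf")
    for i in range(1, steps):
        f = i / steps
        ll = loglik(counts, predicted_hap1_fraction(f, hap1_copies, hap2_copies))
        if ll > best_ll:
            best_f, best_ll = f, ll
    return best_f


# Hypothesis: second homolog deleted in target cells -> (1, 0) copies.
print(estimate_target_fraction([(575, 1000)] * 4, 1, 0))
```

For this particular hypothesis a closed form also exists: the predicted fraction is 1 / (2 - f), so a pooled fraction of 0.575 gives f = 2 - 1/0.575, roughly 0.261, which the grid search reproduces to its resolution.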

Exemplary methods using incomplete phased data

It should be understood that incomplete phased data are used in many embodiments. For example, it may not be known with 100% certainty which allele is present at one or more loci on the first and/or second homologous chromosome segment. In some embodiments, a prior on the possible haplotypes of the individual (for example, based on population-based haplotype frequencies) is used to calculate the probability of each hypothesis. In some embodiments, the prior on possible haplotypes is adjusted based on phased data for the individual, either by using another method to phase the genetic data or by using phased data from other subjects (such as prior subjects) to refine the population data used to learn the prior.

In some embodiments, the phased genetic data comprise probabilistic data for two or more possible sets of phased genetic data, where each possible set of phased data comprises the possible identity of the allele present at each locus in the set of polymorphic loci on the first homologous chromosome segment and the possible identity of the allele present at each locus in the set of polymorphic loci on the second homologous chromosome segment. In some embodiments, the probability of at least one hypothesis is determined for each possible set of phased genetic data. In some embodiments, a combined probability for the hypothesis is determined by combining the probabilities of the hypothesis across the possible sets of phased genetic data, and the hypothesis with the greatest combined probability is selected.
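Combining a hypothesis over several possible phasings can be sketched as P(data | hypothesis) = sum over phasings of P(phasing) x P(data | hypothesis, phasing), computed in log space. The sketch assumes biallelic loci coded "A"/"B", a binomial likelihood with the binomial coefficient omitted (it is the same for every hypothesis), equal hypothesis priors, and hypothetical phasing probabilities and names.

```python
from math import exp, log


def log_lik_given_phase(counts, hap1_alleles, p1):
    """Log-likelihood of allele-A reads given one possible phasing.

    counts: (allele_A_reads, total_reads) per locus.
    hap1_alleles: allele ("A" or "B") assumed on the first homolog at each locus.
    p1: first-homolog fraction predicted by the hypothesis (0 < p1 < 1).
    """
    ll = 0.0
    for (a, n), allele in zip(counts, hap1_alleles):
        p_a = p1 if allele == "A" else 1 - p1  # expected fraction of allele A
        ll += a * log(p_a) + (n - a) * log(1 - p_a)
    return ll


def log_combined(counts, phasings, p1):
    """log of sum over phasings of P(phasing) * P(data | hypothesis, phasing)."""
    terms = [log(w) + log_lik_given_phase(counts, h, p1) for w, h in phasings]
    top = max(terms)
    return top + log(sum(exp(t - top) for t in terms))


def select_hypothesis(counts, phasings, hypotheses):
    scores = {name: log_combined(counts, phasings, p1) for name, p1 in hypotheses.items()}
    return max(scores, key=scores.get), scores


counts = [(60, 100), (40, 100), (61, 100)]
phasings = [(0.8, "ABA"), (0.2, "AAA")]  # two candidate phasings with probabilities
best, _ = select_hypothesis(counts, phasings, {"balanced": 0.5, "hap1 overrepresented": 0.6})
print(best)  # hap1 overrepresented
```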

Any method disclosed herein or any known method can be used to generate incomplete phased data (for example, using population-based haplotype frequencies to infer the most likely phase) for use in the disclosed methods. In some embodiments, phased data are obtained by probabilistically combining haplotypes of smaller segments. For example, possible haplotypes can be determined from the possible combinations of a haplotype from a first region with another haplotype from another region of the same chromosome. The probability that particular haplotypes from different regions lie on the same chromosome, forming a larger haplotype block, can be determined using, for example, population-based haplotype frequencies and/or known recombination rates between the different regions.

In some embodiments, a single-hypothesis rejection test is applied to the null hypothesis of disomy. In some embodiments, the probability of the disomy hypothesis is calculated, and the disomy hypothesis is rejected if this probability is below a given threshold (for example, less than 1 in 1,000). If the null hypothesis is rejected, this may be due to errors in the incomplete phase data or to the presence of a CNV. In some embodiments, more accurate phase data are then obtained (for example, actual phase data from any molecular phasing method disclosed herein, rather than phase data inferred bioinformatically). In some embodiments, the probability of the disomy hypothesis is recalculated using the more accurate phase data to determine whether it should still be rejected. Rejection of this hypothesis indicates that a duplication or deletion of the chromosome segment is present. If desired, the false-positive rate can be changed by adjusting the threshold.
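
A minimal sketch of the rejection test, assuming "the probability of the disomy hypothesis" is read as a Bayesian posterior computed from per-hypothesis log-likelihoods and priors (a p-value would be another reading). The hypothesis labels, priors, and log-likelihood values are hypothetical; the default threshold is the 1-in-1,000 example from the text.

```python
import math
from typing import Dict


def posterior_probabilities(log_likelihoods: Dict[str, float],
                            priors: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule in log space; log-sum-exp avoids underflow."""
    log_post = {h: math.log(priors[h]) + ll for h, ll in log_likelihoods.items()}
    peak = max(log_post.values())
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in log_post.values()))
    return {h: math.exp(v - log_norm) for h, v in log_post.items()}


def reject_disomy(log_likelihoods: Dict[str, float],
                  priors: Dict[str, float],
                  threshold: float = 1e-3,
                  null: str = "disomy") -> bool:
    """Reject the null (disomy) hypothesis if its probability is below threshold."""
    return posterior_probabilities(log_likelihoods, priors)[null] < threshold


equal = {"disomy": 1 / 3, "homolog1_loss": 1 / 3, "homolog2_loss": 1 / 3}
print(reject_disomy({"disomy": -120.0, "homolog1_loss": -110.0, "homolog2_loss": -125.0}, equal))  # True
```

Lowering `threshold` makes rejection harder, which is the false-positive-rate adjustment the paragraph describes.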

Further exemplary embodiments for determining ploidy using phased data

In an illustrative embodiment, provided herein is a method for determining the ploidy of a chromosome segment in a sample from an individual. The method includes the following steps:

A. receiving allele frequency data comprising the amount of each allele present in the sample at each locus in a set of polymorphic loci on the chromosome segment;

B. generating phased allelic information for the set of polymorphic loci by estimating the phase of the allele frequency data;

C. generating individual probabilities of the allele frequencies at the polymorphic loci for different ploidy states using the allele frequency data;

D. generating joint probabilities for the set of polymorphic loci using the individual probabilities and the phased allelic information; and

E. selecting, based on the joint probabilities, a best-fit model indicative of ploidy, thereby determining the ploidy of the chromosome segment.

As disclosed herein, allele frequency data (also referred to herein as measured genetic allele data) can be generated by methods known in the art, for example using qPCR or microarrays. In an illustrative embodiment, the data are generated using nucleic acid sequence data, especially high-throughput nucleic acid sequence data.

In certain illustrative examples, the allele frequency data are corrected for errors before being used to generate the individual probabilities. In specific illustrative embodiments, the corrected error includes allele amplification efficiency bias. In other embodiments, the corrected errors include ambient contamination and genotype contamination. In some embodiments, the corrected errors include allele amplification bias, ambient contamination, and genotype contamination.

In certain embodiments, the individual probabilities are generated using a set of models having different ploidy states and allelic imbalance fractions for the set of polymorphic loci. In these and other embodiments, the joint probabilities are generated by taking into account linkage between the polymorphic loci on the chromosome segment.

Accordingly, in one illustrative embodiment (which combines several of these embodiments), provided herein is a method for detecting ploidy in a sample from an individual, comprising the following steps:

A. receiving nucleic acid sequence data for alleles at a set of polymorphic loci on a chromosome segment of the individual;

B. detecting allele frequencies at the set of loci using the nucleic acid sequence data;

C. correcting the detected allele frequencies for allele amplification efficiency bias to generate corrected allele frequencies for the set of polymorphic loci;

D. generating phased allelic information for the set of polymorphic loci by estimating the phase of the nucleic acid sequence data;

E. generating individual probabilities of allele frequencies at the polymorphic loci for different ploidy states by comparing the corrected allele frequencies with a set of models having different ploidy states and allelic imbalance fractions at the set of polymorphic loci;

F. generating joint probabilities for the set of polymorphic loci by combining the individual probabilities while taking into account linkage between the polymorphic loci on the chromosome segment; and

G. selecting, based on the joint probabilities, the best-fit model indicative of chromosomal aneuploidy.
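
Step F combines the individual probabilities "taking into account linkage" without specifying how. One way to sketch it, assuming linkage is modeled as a possible phase switch between adjacent SNPs, is a two-state forward algorithm. State 0 means the supplied phase is correct; state 1 means it is flipped. The switch probability uses the Haldane map function with a rough 1 cM/Mb (1e8 bp per Morgan) conversion; that conversion, the binomial read model, and all function names are illustrative assumptions rather than the method of the filing.

```python
import math
from typing import Sequence


def binom_logpmf(k: int, n: int, p: float) -> float:
    """log P(k of n reads carry the homolog-1 allele | expected fraction p)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def switch_probability(distance_bp: float, bp_per_morgan: float = 1e8) -> float:
    """Haldane map function; 1e8 bp per Morgan (~1 cM/Mb) is a rough assumption."""
    p = 0.5 * (1.0 - math.exp(-2.0 * distance_bp / bp_per_morgan))
    return min(max(p, 1e-12), 0.5)  # clamp so log(p) stays finite


def _logaddexp(x: float, y: float) -> float:
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))


def linked_log_likelihood(positions: Sequence[int], h1_reads: Sequence[int],
                          depths: Sequence[int], f_h1: float) -> float:
    """Joint log-likelihood over phased heterozygous SNPs for one model.

    Hidden state 0: supplied phase correct (expected fraction f_h1).
    Hidden state 1: phase flipped (expected fraction 1 - f_h1).
    The state may switch between adjacent SNPs with distance-dependent probability.
    """
    def emit(i: int, state: int) -> float:
        p = f_h1 if state == 0 else 1.0 - f_h1
        return binom_logpmf(h1_reads[i], depths[i], p)

    alpha = [math.log(0.5) + emit(0, 0), math.log(0.5) + emit(0, 1)]
    for i in range(1, len(positions)):
        r = switch_probability(positions[i] - positions[i - 1])
        stay, switch = math.log(1.0 - r), math.log(r)
        alpha = [
            _logaddexp(alpha[0] + stay, alpha[1] + switch) + emit(i, 0),
            _logaddexp(alpha[0] + switch, alpha[1] + stay) + emit(i, 1),
        ]
    return _logaddexp(alpha[0], alpha[1])
```

When `f_h1` is 0.5 (no imbalance) the two states are indistinguishable and the result reduces to the plain sum of per-SNP log-likelihoods; closely spaced SNPs rarely switch state, so a consistent imbalance across them is rewarded.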

As disclosed herein, the individual probabilities can be generated using a set of models, or hypotheses, having different ploidy states and average allelic imbalance fractions for the set of polymorphic loci. For example, in one particularly illustrative example, the individual probabilities are generated by modeling the ploidy states of a first homolog and a second homolog of the chromosome segment. The modeled ploidy states include the following:

(1) no cells have a deletion or amplification of the first homolog or the second homolog of the chromosome segment;

(2) at least some cells have a deletion of the first homolog or an amplification of the second homolog of the chromosome segment; and

(3) at least some cells have a deletion of the second homolog or an amplification of the first homolog of the chromosome segment.

It should be appreciated that the above models can also be referred to as hypotheses used to constrain the models. Accordingly, the three hypotheses set out above can be used.

The modeled average allelic imbalance fractions can cover any range of average allelic imbalance that includes the actual average allelic imbalance of the chromosome segment. For example, in certain illustrative embodiments, the modeled range of average allelic imbalance can lie between a lower limit of 0, 0.1, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.75, 1, 2, 2.5, 3, 4, or 5% and an upper limit of 1, 2, 2.5, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 95, or 99%. The range can be modeled at intervals of any size, depending on the computing power available and the time allowed for the analysis. For example, intervals of 0.01, 0.05, 0.02, or 0.1 can be modeled.

In certain illustrative embodiments, the sample has an average allelic imbalance for the chromosome segment of between 0.4% and 5%. In certain embodiments, the average allelic imbalance is low; in these embodiments it is typically less than 10%. In certain illustrative embodiments, the allelic imbalance lies between a lower limit of 0.25, 0.3, 0.4, 0.5, 0.6, 0.75, 1, 2, 2.5, 3, 4, or 5% and an upper limit of 1, 2, 2.5, 3, 4, or 5%. In other illustrative embodiments, the average allelic imbalance lies between a lower limit of 0.4, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9, or 1.0% and an upper limit of 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 3.0, 4.0, or 5.0%. For example, in one illustrative example, the average allelic imbalance of the sample is between 0.45% and 2.5%. In another example, the average allelic imbalance is detected with a sensitivity of 0.45, 0.5, 0.6, 0.8, 0.9, or 1.0. Exemplary samples with low allelic imbalance for use in the methods of the invention include plasma samples from individuals with cancer who have circulating tumor DNA, and plasma samples from pregnant women with circulating fetal DNA.

It should be appreciated that, for an SNV, the fraction of abnormal DNA is measured using the mutant allele frequency (the number of mutant alleles at a locus divided by the total number of alleles at that locus). Because, for a CNV, the difference between the amounts of the two homologs in a tumor is analogous to this, we measure the fraction of abnormal DNA for a CNV by the average allelic imbalance (AAI), defined as |H1 - H2|/(H1 + H2), where Hi is the average copy number of homolog i in the sample and Hi/(H1 + H2) is the fractional abundance, or homolog ratio, of homolog i. The maximum homolog ratio is the homolog ratio of the more abundant homolog.
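
The definitions above translate directly into a few helper functions. This is a minimal sketch; the function names are hypothetical. It also checks an identity that follows algebraically from the definitions: the maximum homolog ratio equals (1 + AAI)/2.

```python
def mutant_allele_frequency(mutant_reads: int, total_reads: int) -> float:
    """SNV measure: mutant alleles / all alleles observed at the locus."""
    return mutant_reads / total_reads


def average_allelic_imbalance(h1: float, h2: float) -> float:
    """CNV measure: AAI = |H1 - H2| / (H1 + H2)."""
    return abs(h1 - h2) / (h1 + h2)


def homolog_ratio(h_i: float, h1: float, h2: float) -> float:
    """Fractional abundance of homolog i: Hi / (H1 + H2)."""
    return h_i / (h1 + h2)


def max_homolog_ratio(h1: float, h2: float) -> float:
    """Homolog ratio of the more abundant homolog; equals (1 + AAI) / 2."""
    return homolog_ratio(max(h1, h2), h1, h2)
```

For example, average copy numbers of 1.1 and 0.9 give an AAI of 0.1 and a maximum homolog ratio of 0.55.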

The assay drop-out rate is the percentage of SNPs with no reads, estimated using all SNPs. The single allele drop-out (ADO) rate is the percentage of SNPs with only one allele present, estimated using only heterozygous SNPs. Genotype confidence can be determined by fitting a binomial distribution to the number of reads at each SNP that are B-allele reads, and using the ploidy state of the region in which the SNP lies to estimate the probability of each genotype.
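
A minimal sketch of the two drop-out rates and the binomial genotype confidence described above. The assumptions are labeled in the comments: SNPs with no reads are excluded from the ADO denominator, each genotype with j B-allele copies out of the region's ploidy predicts a B-read fraction of j/ploidy (clamped by an assumed 1% error rate), and genotypes have a uniform prior. Function names are hypothetical.

```python
import math
from typing import List, Tuple


def assay_dropout_rate(depths: List[int]) -> float:
    """Fraction of all SNPs with no reads."""
    return sum(1 for d in depths if d == 0) / len(depths)


def allele_dropout_rate(het_counts: List[Tuple[int, int]]) -> float:
    """Fraction of heterozygous SNPs showing only one allele.

    Assumption: SNPs with no reads at all are excluded (they are assay drop-outs).
    """
    observed = [(a, b) for a, b in het_counts if a + b > 0]
    return sum(1 for a, b in observed if a == 0 or b == 0) / len(observed)


def genotype_posteriors(b_reads: int, depth: int, ploidy: int = 2,
                        error: float = 0.01) -> List[float]:
    """Posterior over genotypes with j = 0..ploidy copies of the B allele.

    Each genotype predicts a B-read fraction j / ploidy, clamped by an assumed
    error rate; binomial likelihood with a uniform genotype prior. The binomial
    coefficient is shared by all genotypes and cancels.
    """
    logliks = []
    for j in range(ploidy + 1):
        p = min(max(j / ploidy, error), 1.0 - error)
        logliks.append(b_reads * math.log(p) + (depth - b_reads) * math.log(1.0 - p))
    peak = max(logliks)
    weights = [math.exp(ll - peak) for ll in logliks]
    total = sum(weights)
    return [w / total for w in weights]
```

Passing the region's ploidy (for example 3 in a trisomic region) changes which B-allele fractions count as expected for each genotype.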

For tumor tissue samples, chromosomal aneuploidy (exemplified in this section by CNVs) can be delineated by transitions between allele frequency distributions. In plasma samples, CNVs can be identified by a maximum likelihood algorithm that searches for plasma CNVs in regions where the tumor sample from the same individual also has CNVs. The algorithm can model expected allele frequencies across all allelic imbalance ratios at 0.025% intervals for three sets of hypotheses: (1) all cells are normal (no allelic imbalance), (2) some or all cells have a homolog 1 deletion or a homolog 2 amplification, or (3) some or all cells have a homolog 2 deletion or a homolog 1 amplification. The likelihood of each hypothesis can be determined at each SNP using a Bayesian classifier based on a beta-binomial model of the expected and observed allele frequencies at all heterozygous SNPs. The joint likelihood across multiple SNPs can then be calculated, in certain illustrative embodiments taking into account linkage between the SNP loci, as exemplified herein. The maximum likelihood hypothesis can then be selected.
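
A minimal sketch of this maximum-likelihood search. It makes several assumptions:

- Reads follow a beta-binomial model whose concentration parameter is fixed at an illustrative value of 500.
- The AAI grid runs in 0.025% steps up to 25%.
- Priors are equal, so the Bayesian classifier reduces to picking the largest likelihood.
- SNPs are treated as independent, with no linkage term.

From the AAI definition, the homolog-1 allele of a phased heterozygous SNP is expected at (1 - AAI)/2 of reads under hypothesis (2) and at (1 + AAI)/2 under hypothesis (3). Hypothesis labels and function names are hypothetical.

```python
import math
from typing import List, Tuple

HYPOTHESES = ("normal", "h1_loss_or_h2_gain", "h2_loss_or_h1_gain")


def log_beta(a: float, b: float) -> float:
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k: int, n: int, mean: float, concentration: float) -> float:
    """log P(k of n reads | beta-binomial with the given mean and concentration)."""
    a, b = mean * concentration, (1.0 - mean) * concentration
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def expected_h1_fraction(hypothesis: str, aai: float) -> float:
    """Expected fraction of reads carrying the homolog-1 allele at a phased het SNP."""
    if hypothesis == "normal":
        return 0.5
    if hypothesis == "h1_loss_or_h2_gain":
        return (1.0 - aai) / 2.0
    if hypothesis == "h2_loss_or_h1_gain":
        return (1.0 + aai) / 2.0
    raise ValueError(hypothesis)


def best_fit(snps: List[Tuple[int, int]], step: float = 0.00025,
             max_aai: float = 0.25, concentration: float = 500.0) -> Tuple[str, float, float]:
    """Return (hypothesis, AAI, joint log-likelihood) with the maximum likelihood.

    snps holds (homolog-1 allele reads, total reads) for heterozygous SNPs,
    treated here as independent (no linkage term).
    """
    if not 0.0 < max_aai < 1.0:
        raise ValueError("max_aai must lie in (0, 1)")

    def joint(f: float) -> float:
        return sum(beta_binomial_logpmf(k, n, f, concentration) for k, n in snps)

    best = ("normal", 0.0, joint(0.5))
    for hypothesis in HYPOTHESES[1:]:
        for i in range(1, int(round(max_aai / step)) + 1):
            aai = i * step  # 0.025% grid
            ll = joint(expected_h1_fraction(hypothesis, aai))
            if ll > best[2]:
                best = (hypothesis, aai, ll)
    return best
```

With 20 SNPs each showing 450 of 1,000 reads on homolog 1, the search returns hypothesis (2) with an AAI close to 10%; balanced reads return "normal".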

Consider a chromosomal region that has an average of N copies in the tumor, and let c denote the fraction of plasma DNA (a mixture of DNA from normal and tumor cells) that is derived from tumor cells, as measured in a disomic region. AAI is calculated as follows:
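
A sketch of how the AAI definition given earlier can be expressed in terms of N and c. It assumes that all tumor copy gain or loss affects one homolog while the other keeps a single copy. Under that assumption the altered homolog is present at (1 - c) + c(N - 1) and the unaltered one at 1, which gives AAI = c|N - 2| / (2 + c(N - 2)). This is a derivation under the stated assumption, not necessarily the expression used in the filing. The function name is hypothetical.

```python
def expected_aai(n_copies: float, tumor_fraction: float) -> float:
    """Expected plasma AAI for a region with an average of N copies in the tumor.

    Assumes all tumor copy gain or loss is on one homolog, the other homolog
    keeps one copy, and tumor_fraction (c) is the tumor-derived share of plasma
    DNA measured in a disomic region.
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must lie in [0, 1]")
    if n_copies < 1:
        raise ValueError("model assumes at least one copy of the unaltered homolog")
    c = tumor_fraction
    h_altered = (1 - c) * 1 + c * (n_copies - 1)  # normal cells + tumor cells
    h_unaltered = (1 - c) * 1 + c * 1             # always 1
    return abs(h_altered - h_unaltered) / (h_altered + h_unaltered)
```

For example, a single-copy deletion (N = 1) at c = 0.5 gives an AAI of 1/3, while a disomic region (N = 2) gives 0 for any c.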

In certain illustrative examples, the allele frequency data are corrected for errors before being used to generate the individual probabilities. Different types of error and/or bias correction are disclosed herein. In specific illustrative embodiments, the corrected error is allele amplification efficiency bias. In other embodiments, the corrected errors include ambient contamination and genotype contamination. In some embodiments, the corrected errors include allele amplification bias, ambient contamination, and genotype contamination.

It should be appreciated that allele amplification efficiency bias can be determined for an allele as part of an experiment or laboratory assay that includes the test sample, or it can be determined at a different time using a set of samples that include the allele whose efficiency is being calculated. Ambient contamination and genotype contamination are typically determined in the same run in which the test sample is analyzed.
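
A minimal sketch of one way to estimate and apply an allele amplification efficiency correction. It assumes the relative efficiency of allele A over allele B at a SNP is estimated from a reference set of samples known to be heterozygous there, so that the true A:B ratio is taken as 1:1. The correction divides the A reads by that ratio. Function names and example counts are hypothetical.

```python
from typing import List, Tuple


def efficiency_ratio(reference_counts: List[Tuple[int, int]]) -> float:
    """Estimate eff_A / eff_B for one SNP from (A reads, B reads) in reference
    samples assumed heterozygous, i.e. with a true 1:1 allele ratio."""
    total_a = sum(a for a, _ in reference_counts)
    total_b = sum(b for _, b in reference_counts)
    return total_a / total_b


def corrected_a_fraction(a_reads: int, b_reads: int, ratio: float) -> float:
    """Divide A reads by the relative efficiency, then recompute the A fraction."""
    a_adjusted = a_reads / ratio
    return a_adjusted / (a_adjusted + b_reads)


ratio = efficiency_ratio([(60, 40), (120, 80)])  # A amplifies 1.5x as well as B
```

A test sample showing 60 A and 40 B reads at this SNP is then corrected back to an A fraction of 0.5.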

In certain embodiments, ambient contamination and genotype contamination are determined for alleles that are homozygous in the sample. It should be appreciated that, for any given sample from an individual, some loci will be heterozygous and others homozygous, even if a locus was selected for analysis because it has relatively high heterozygosity in the population. Advantageously, in some embodiments, although the ploidy of a chromosome segment is determined using the individual's heterozygous loci, the homozygous loci can be used to calculate ambient and genotype contamination.
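
A minimal sketch of using homozygous loci for a contamination estimate. It assumes a locus is called homozygous when its minor-allele fraction is below an illustrative 10% cutoff. At such loci every read of the other allele is attributed to background, which pools ambient contamination, genotype contamination, and sequencing error into a single rate. Separating these components would need information not modeled here. Function names are hypothetical.

```python
from typing import List, Tuple


def split_by_genotype(counts: List[Tuple[int, int]], het_min: float = 0.1):
    """Split (A reads, B reads) loci into homozygous and heterozygous indices.

    Assumption: a locus is homozygous if its minor-allele fraction < het_min.
    """
    hom, het = [], []
    for i, (a, b) in enumerate(counts):
        minor = min(a, b) / (a + b)
        (hom if minor < het_min else het).append(i)
    return hom, het


def background_rate(counts: List[Tuple[int, int]], hom_idx: List[int]) -> float:
    """Pooled fraction of non-genotype reads at homozygous loci."""
    other = sum(min(counts[i]) for i in hom_idx)
    total = sum(sum(counts[i]) for i in hom_idx)
    return other / total


counts = [(990, 10), (500, 480), (3, 997)]
hom, het = split_by_genotype(counts)
```

Here loci 0 and 2 are treated as homozygous and give a pooled background rate of 13/2000.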

In certain illustrative examples, the selecting is performed by analyzing the magnitude of the difference between the phased allelic information and the estimated allele frequencies generated for the models.

In illustrative examples, the individual probabilities of the allele frequencies are generated based on a beta-binomial model of the expected and observed allele frequencies at the set of polymorphic loci. In illustrative examples, the individual probabilities are generated using a Bayesian classifier.

In certain illustrative embodiments, the nucleic acid sequence data are generated by performing high-throughput DNA sequencing of multiple copies of a series of amplicons generated using a multiplex amplification reaction, wherein each amplicon in the series spans at least one polymorphic locus in the set of polymorphic loci and each polymorphic locus in the set is amplified. In certain embodiments, the multiplex amplification reaction is performed under limiting primer conditions for at least 1/2 of the reactions. In some embodiments, limiting primer concentrations are used in 1/10, 1/5, 1/4, 1/3, 1/2, or all of the reactions of the multiplex reaction.

Factors to consider in achieving limiting primer conditions in an amplification reaction such as PCR are provided herein.

In certain embodiments, the methods provided herein detect ploidy for multiple chromosome segments across multiple chromosomes. Accordingly, in these embodiments ploidy is determined for a set of chromosome segments in the sample. These embodiments require higher-plex multiplex amplification reactions; the multiplex amplification reaction can be, for example, a 2,500-plex to 50,000-plex reaction. In certain embodiments, the multiplex reaction has a plexity within a range having a lower limit of 100, 200, 250, 500, 1000, 2500, 5000, 10,000, 20,000, 25,000, or 50,000 and an upper limit of 200, 250, 500, 1000, 2500, 5000, 10,000, 20,000, 25,000, 50,000, or 100,000.

In illustrative embodiments, the set of polymorphic loci is a set of loci known to exhibit high heterozygosity. However, it is expected that for any given individual some of these loci will be homozygous. In certain illustrative embodiments, the methods of the invention use nucleic acid sequence information for both the homozygous and the heterozygous loci of an individual. The individual's homozygous loci are used, for example, for error correction, whereas the heterozygous loci are used to determine the allelic imbalance of the sample. In certain embodiments, at least 10% of the polymorphic loci are heterozygous in the individual.

As disclosed herein, preference is given to analyzing target SNP loci that are known to be heterozygous in the population. Accordingly, in certain embodiments, the polymorphic loci are selected such that at least 10, 20, 25, 50, 75, 80, 90, 95, 99, or 100% of them are known to be heterozygous in the population.
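
A minimal sketch of selecting target SNPs by expected population heterozygosity. It assumes Hardy-Weinberg equilibrium, so a biallelic SNP with allele frequency p is heterozygous in a fraction 2p(1 - p) of individuals, with an illustrative cutoff of 0.4. The SNP identifiers and function names are hypothetical.

```python
from typing import Dict, List


def expected_heterozygosity(p: float) -> float:
    """Hardy-Weinberg heterozygosity 2p(1 - p) for a biallelic SNP."""
    return 2.0 * p * (1.0 - p)


def select_high_heterozygosity(allele_freqs: Dict[str, float],
                               min_het: float = 0.4) -> List[str]:
    """Keep SNPs whose expected population heterozygosity is at least min_het."""
    return sorted(s for s, p in allele_freqs.items() if expected_heterozygosity(p) >= min_het)


chosen = select_high_heterozygosity({"snp1": 0.5, "snp2": 0.05, "snp3": 0.3})
```

The cutoff trades panel size against the fraction of loci that will be informative (heterozygous) in a typical individual; the maximum possible value is 0.5 at p = 0.5.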

As disclosed herein, in certain embodiments the sample is a plasma sample from a pregnant woman.

In some examples, the method further comprises performing the method on a control sample having a known average allelic imbalance ratio. The control can have an average allelic imbalance ratio, for a particular allelic state indicative of aneuploidy of the chromosome segment, of between 0.4% and 10%, to mimic the average allelic imbalance expected for an allele present at low concentration in a sample, such as in circulating cell-free DNA from a fetus or from a tumor.

In some embodiments, a PlasmArt control, as disclosed herein, is used as the control. Accordingly, in certain aspects the control is a sample generated by a method that includes fragmenting a nucleic acid sample known to exhibit chromosomal aneuploidy into fragments that mimic the size of the DNA fragments circulating in the blood plasma of individuals. In certain aspects, the control used is one in which the chromosome segment is not aneuploid.

In illustrative embodiments, data from one or more controls can be analyzed in the method together with the test sample. The controls can include, for example, a different sample from an individual not suspected of having a chromosomal aneuploidy, or a sample suspected of containing a CNV or chromosomal aneuploidy. For example, when the test sample is a plasma sample suspected of containing circulating cell-free tumor DNA, the method can also be performed on a control sample from the subject's tumor together with the plasma sample. As disclosed herein, control samples can be produced by fragmenting a DNA sample known to exhibit chromosomal aneuploidy. Such fragmentation can produce a DNA sample that mimics the DNA composition of apoptotic cells, especially when the sample is from an individual with cancer. Data from control samples increase confidence in the detection of chromosomal aneuploidy.

In certain embodiments of the method for determining ploidy, the sample is a plasma sample from an individual suspected of having cancer. In these embodiments, the method further comprises determining, based on the selecting, whether a copy number alteration is present in tumor cells of the individual. For these embodiments, the sample can be a plasma sample from the individual, and the method may further comprise determining, based on the selecting, whether cancer is present in the individual.

These embodiments of the method for determining the ploidy of a chromosome segment may further comprise detecting a single nucleotide variant at a single nucleotide variant site in a set of single nucleotide variant sites, wherein detection of a chromosomal aneuploidy, a single nucleotide variant, or both indicates the presence of circulating tumor nucleic acids in the sample.

These embodiments may further comprise receiving haplotype information for the chromosome segment from a tumor of the individual and using the haplotype information to generate the set of models having different ploidy states and allelic imbalance ratios at the set of polymorphic loci.

As disclosed herein, certain embodiments of the method for determining ploidy may further comprise removing outliers from the initial or corrected allele frequency data before comparing the initial or corrected allele frequencies with the set of models. For example, in certain embodiments, loci whose allele frequencies are at least 2 or 3 standard deviations above or below the mean of the other loci on the chromosome segment are removed from the data before modeling.
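
A minimal sketch of this outlier filter. It reads "the mean of the other loci" literally: each locus is compared with the mean and standard deviation of all remaining loci on the segment (leave-one-out), and it is dropped if it lies at least k standard deviations away. The function name is hypothetical; k = 2 or 3 follows the text.

```python
from statistics import mean, stdev
from typing import List


def remove_outliers(freqs: List[float], k: float = 3.0) -> List[float]:
    """Drop loci at least k SDs from the mean of the other loci on the segment."""
    if len(freqs) < 3:
        return list(freqs)  # need at least two other loci for a standard deviation
    kept = []
    for i, f in enumerate(freqs):
        others = freqs[:i] + freqs[i + 1:]
        deviation = abs(f - mean(others))
        if deviation == 0 or deviation < k * stdev(others):
            kept.append(f)
    return kept


data = [0.49, 0.50, 0.51, 0.50, 0.49, 0.51, 0.50, 0.95]
filtered = remove_outliers(data, k=2.0)
```

Leaving the tested locus out keeps a single extreme value from inflating the spread it is judged against; in the example only the 0.95 locus is removed.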

As noted herein, it should be understood that for many of the embodiments provided herein, including those for determining the ploidy of a chromosome segment, incomplete or complete phase data are preferably used. It should also be understood that several features provided herein improve on existing methods of detecting ploidy, and that many different combinations of these features can be used.

In certain embodiments, as shown in Figures 69-70, provided herein are computer systems and computer-readable media for performing any of the methods of the invention. These include systems and computer-readable media for performing methods of determining ploidy. Accordingly, as a non-limiting example of the system embodiments, which demonstrates that any method provided herein can be performed using the systems and computer-readable media disclosed herein, in another aspect provided herein is a system for detecting chromosomal ploidy in a sample from an individual, the system comprising:

A. an input processor configured to receive allele frequency data comprising the amount of each allele present in the sample at each locus of a set of polymorphic loci on a chromosome segment;

B. a modeler configured to:

i. generate phased allelic information for the set of polymorphic loci by estimating the phase of the allele frequency data; and

ii. generate individual probabilities of the allele frequencies at the polymorphic loci for different ploidy states using the allele frequency data; and

iii. generate joint probabilities for the set of polymorphic loci using the individual probabilities and the phased allelic information; and

C. a hypothesis manager configured to select, based on the joint probabilities, a best-fit model indicative of ploidy, thereby determining the ploidy of the chromosome segment.

In certain embodiments of the system, the allele frequency data are data generated by a nucleic acid sequencing system. In certain embodiments, the system further comprises an error correction unit configured to correct errors in the allele frequency data, wherein the corrected allele frequency data are used by the modeler to generate the individual probabilities. In certain embodiments, the error correction unit corrects allele amplification efficiency bias. In certain embodiments, the modeler generates the individual probabilities using a set of models having different ploidy states and allelic imbalance ratios at the polymorphic loci. In certain illustrative embodiments, the modeler generates the joint probabilities by taking into account linkage between polymorphic loci on the chromosome segment.

In one illustrative embodiment, provided herein is a system for detecting chromosomal ploidy in a sample from an individual, comprising:

A. an input processor configured to receive nucleic acid sequence data for alleles at a set of polymorphic loci on a chromosome segment of the individual, and to detect allele frequencies at the set of loci using the nucleic acid sequence data;

B. an error correction unit configured to correct errors in the detected allele frequencies and to generate corrected allele frequencies for the set of polymorphic loci;

C. a modeler configured to:

i. generate phased allelic information for the set of polymorphic loci by estimating the phase of the nucleic acid sequence data; and

ii. generate individual probabilities of allele frequencies at the polymorphic loci for different ploidy states by comparing the phased allelic information with a set of models having different ploidy states and allelic imbalance ratios at the set of polymorphic loci; and

iii. generate joint probabilities for the set of polymorphic loci by combining the individual probabilities, taking into account the relative distance between polymorphic loci on the chromosome segment; and

D. a hypothesis manager configured to select, based on the joint probabilities, a best-fit model indicative of chromosomal aneuploidy.

In certain exemplary system embodiments provided herein, the set of polymorphic loci comprises 1,000 to 50,000 polymorphic loci. In certain exemplary system embodiments provided herein, the set of polymorphic loci comprises 100 known heterozygosity hotspot loci. In certain exemplary system embodiments provided herein, the set of polymorphic loci comprises 100 loci that lie at or within 0.5 kb of a recombination hotspot.

In certain exemplary system embodiments provided herein, the best-fit model analyzes the following ploidy states for the first homolog and the second homolog of the chromosome segment:

(1) no cells have a deletion or amplification of the first homolog or the second homolog of the chromosome segment;

(2) some or all cells have a deletion of the first homolog or an amplification of the second homolog of the chromosome segment; and

(3) some or all cells have a deletion of the second homolog or an amplification of the first homolog of the chromosome segment.

In certain exemplary system embodiments provided herein, the corrected errors include allele amplification efficiency bias, contamination, and/or sequencing error. In certain exemplary system embodiments provided herein, the contamination includes ambient contamination and genotype contamination. In certain exemplary system embodiments provided herein, ambient contamination and genotype contamination are determined for homozygous alleles.

In certain exemplary system embodiments provided herein, the hypothesis manager is configured to analyze the magnitude of the difference between the phased allelic information and the estimated allele frequencies generated for the models. In certain exemplary system embodiments provided herein, the modeler generates the individual probabilities of allele frequencies based on a beta-binomial model of the expected and observed allele frequencies at the set of polymorphic loci. In certain exemplary system embodiments provided herein, the modeler generates the individual probabilities using a Bayesian classifier.

In certain exemplary system embodiments provided herein, the nucleic acid sequence data are generated by performing high-throughput DNA sequencing of multiple copies of a series of amplicons produced using a multiplex amplification reaction, wherein each amplicon in the series spans at least one polymorphic locus in the set of polymorphic loci and each polymorphic locus in the set is amplified. In certain exemplary system embodiments provided herein, the multiplex amplification reaction is performed under limiting primer conditions for at least 1/2 of the reactions. In certain exemplary system embodiments provided herein, the sample has an average allelic imbalance of between 0.4% and 5%.

In certain exemplary system embodiments provided herein, the sample is a plasma sample from an individual suspected of having cancer, and the hypothesis manager is further configured to determine, based on the best-fit model, whether a copy number variation is present in cells of a tumor of the individual.

In certain exemplary system embodiments provided herein, the sample is a plasma sample from an individual, and the hypothesis manager is further configured to determine, based on the best-fit model, whether cancer is present in the individual. In these embodiments, the hypothesis manager can be further configured to detect a single nucleotide variant at a single nucleotide variant site in a set of single nucleotide variant sites, wherein detection of a chromosomal aneuploidy, a single nucleotide variant, or both indicates the presence of circulating tumor nucleic acids in the sample.

In certain exemplary system embodiments provided herein, the input processor is further configured to receive haplotype information for the chromosome segment from a tumor of the individual, and the modeler is configured to use the haplotype information to generate the set of models having different ploidy states and allelic imbalance ratios at the set of polymorphic loci.

In certain exemplary system embodiments provided herein, the modeler generates models with allelic imbalance ratios ranging from 0% to 25%.

It should be appreciated that any method provided herein can be performed by computer-readable code stored on a non-transitory computer-readable medium. Accordingly, in one embodiment, provided herein is a non-transitory computer-readable medium for detecting ploidy in a sample from an individual, comprising computer-readable code that, when executed by a processing device, causes the processing device to:

A. receive allele frequency data comprising the amount of each allele present in the sample at each locus of a set of polymorphic loci on the chromosome segment;

B. generate phased allelic information for the set of polymorphic loci by estimating the phase of the allele frequency data;

C. generate individual probabilities of the allele frequencies at the polymorphic loci for different ploidy states using the allele frequency data;

D. generate joint probabilities for the set of polymorphic loci using the individual probabilities and the phased allelic information; and

E. select, based on the joint probabilities, a best-fit model indicative of ploidy, thereby determining the ploidy of the chromosome segment.

In certain computer-readable medium embodiments, the allele frequency data are generated from nucleic acid sequence data. Certain computer-readable medium embodiments further comprise correcting errors in the allele frequency data and using the corrected allele frequency data in the step of generating individual probabilities. In certain computer-readable medium embodiments, the corrected error is allele amplification efficiency bias. In certain computer-readable medium embodiments, the individual probabilities are generated using a set of models having different ploidy states and allelic imbalance ratios at the set of polymorphic loci. In certain computer-readable medium embodiments, the joint probabilities are generated by taking into account linkage between polymorphic loci on the chromosome segment.

In a specific embodiment, provided herein is a non-transitory computer-readable medium for detecting ploidy in a sample from an individual, comprising computer-readable code that, when executed by a processing device, causes the processing device to:

A. receive nucleic acid sequence data for alleles at a set of polymorphic loci on a chromosome segment of the individual;

B. detect allele frequencies at the set of loci using the nucleic acid sequence data;

C. correct the detected allele frequencies for allele amplification efficiency bias to generate corrected allele frequencies for the set of polymorphic loci;

D. generate phased allelic information for the set of polymorphic loci by estimating the phase of the nucleic acid sequence data;

E. generate individual probabilities of allele frequencies at the polymorphic loci for different ploidy states by comparing the corrected allele frequencies with a set of models having different ploidy states and allelic imbalance ratios at the set of polymorphic loci;

F. generate joint probabilities for the set of polymorphic loci by combining the individual probabilities, taking into account linkage between the polymorphic loci on the chromosome segment; and

G. select, based on the joint probabilities, the best-fit model indicative of chromosomal aneuploidy.

In certain illustrative computer-readable medium embodiments, the selecting is performed by analyzing the magnitude of the difference between the phased allelic information and the estimated allele frequencies generated for the models.

In certain illustrative computer-readable medium embodiments, the individual probabilities of allele frequencies are generated based on a beta-binomial model of the expected and observed allele frequencies at the set of polymorphic loci.

It should be appreciated that embodiments of any method provided herein can be performed by executing code stored on a non-transitory computer-readable medium.

Illustrative embodiments for detecting cancer

In certain aspects, the present invention provides methods for detecting cancer. Accordingly, the sample can be a tumor sample or a liquid sample, such as plasma, from an individual suspected of having cancer. The methods are particularly effective at detecting genetic variants, such as single nucleotide alterations (e.g., SNVs) or copy number alterations (e.g., CNVs), in samples in which these genetic changes are present at low levels as a fraction of the total DNA in the sample. The sensitivity for detecting cancer DNA or RNA in a sample is therefore exceptional. The methods can incorporate any or all of the improvements provided herein for detecting CNVs and SNVs to achieve this exceptional sensitivity.

Therefore, it is a kind of method in certain embodiments provided herein, is used to determine whether that circulating tumor nucleic acid exists In individual specimen and the computer-readable medium of a non-transitory includes computer-readable code, when by processing equipment When execution, when being executed by a processing unit.It the described method comprises the following steps:

C. one group of Genetic polymorphism site for analyzing the sample to determine ploidy, in the individual chromosome segment Place；With

D. it determines the unbalanced level of average allele being present at Genetic polymorphism site, is measured based on ploidy, Wherein be averaged allele imbalance be equal to or more than 0.4%, 0.45%, 0.5%, 0.6%, 0.7%, 0.75%, 0.8%, 0.9% or 1% indicates that there are circulating tumor nucleic acid, such as ctDNA, in the sample.
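A minimal sketch of the threshold test in step D follows. The text does not give a formula for average allelic imbalance, so the sketch assumes it is the absolute value of the mean deviation from 0.5 of the phased allele fraction (reads carrying the homolog-1 allele divided by total reads) across heterozygous loci. Orienting by phase lets random sampling noise average out while a consistent imbalance accumulates. Names and the 0.5% default threshold are illustrative.

```python
def average_allelic_imbalance(phased_loci):
    """phased_loci: iterable of (homolog1_allele_reads, total_reads) at heterozygous SNPs.

    Assumed definition: |mean over loci of (homolog-1 allele fraction - 0.5)|.
    """
    deviations = [h1_reads / total - 0.5 for h1_reads, total in phased_loci if total > 0]
    if not deviations:
        raise ValueError("no informative heterozygous loci")
    return abs(sum(deviations) / len(deviations))


def indicates_circulating_tumor_nucleic_acid(phased_loci, threshold=0.005):
    """Return True if the average allelic imbalance meets the threshold (0.005 = 0.5%)."""
    return average_allelic_imbalance(phased_loci) >= threshold
```

For example, phased loci with homolog-1 allele fractions of 0.505 and 0.510 give an average imbalance of 0.75%, which exceeds a 0.5% threshold.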

In certain illustrative embodiments, an average allelic imbalance greater than 0.4%, 0.45%, or 0.5% indicates the presence of ctDNA. In certain embodiments, the method for determining whether circulating tumor nucleic acids are present further comprises detecting a single nucleotide variant at a single nucleotide variant locus in a set of single nucleotide variant loci, wherein detection of an allelic imbalance equal to or greater than 0.5%, detection of a single nucleotide variant, or both, indicates the presence of circulating tumor nucleic acids in the sample. It should be understood that any of the methods provided herein for detecting ploidy or CNVs can be used to determine the level of allelic imbalance, which is typically expressed as average allelic imbalance. It should also be understood that any of the methods provided herein for detecting SNVs can be used to detect the single nucleotide variant in this aspect of the invention.

In certain embodiments, the method for determining whether circulating tumor nucleic acids are present further comprises performing the method on a control sample with a known average allelic imbalance ratio. The control can be, for example, a sample from a tumor of the individual. In some embodiments, the control has an expected average allelic imbalance for the analyzed sample, for example an AAI between 0.5% and 5%, or an average allelic imbalance ratio of 0.5%.

In certain embodiments, the analyzing step in the method for determining whether circulating tumor nucleic acids are present comprises analyzing a set of chromosome segments known to exhibit aneuploidy in cancer. In certain embodiments, the analyzing step comprises analyzing between 1,000 and 50,000, or between 100 and 1,000, polymorphic loci. In certain embodiments, the analyzing step comprises analyzing between 100 and 1,000 single nucleotide variant loci. For example, in these embodiments, the analyzing step can include performing a multiplex PCR to amplify amplicons spanning 1,000 to 50,000 polymorphic loci and 100 to 1,000 single nucleotide variant loci. The multiplex reaction can be set up as a single reaction or as pools of multiplex reactions covering different subsets. The multiplex reaction methods provided herein, such as the massively multiplex PCR disclosed herein, provide an exemplary process for carrying out the amplification reaction that helps achieve improved multiplexing and, therefore, improved sensitivity.

In certain embodiments, the multiplex PCR is carried out under limiting-primer conditions for at least 10%, 20%, 25%, 50%, 75%, 90%, 95%, 98%, 99%, or 100% of the reactions. The improved conditions provided herein for carrying out massively multiplex reactions can be used.

In some aspects, the above method for determining whether circulating tumor nucleic acids are present in a sample from an individual, and all embodiments thereof, can be carried out with a system. The present disclosure provides guidance regarding specific functional and structural features for carrying out the method. As a non-limiting example, the system comprises the following:

A. an input processor configured to analyze data from the sample to determine the ploidy at a set of polymorphic loci on a chromosome segment of the individual; and

B. a modeler configured to determine, based on the ploidy determination, the level of allelic imbalance at the polymorphic loci, wherein an allelic imbalance equal to or greater than 0.5% indicates the presence of circulating tumor nucleic acids.

Illustrative embodiments for detecting single nucleotide variants

In some aspects, provided herein are methods for detecting single nucleotide variants in a sample. The improved methods provided herein can achieve a limit of detection of 0.015%, 0.017%, 0.02%, 0.05%, 0.1%, 0.2%, 0.3%, 0.4%, or 0.5% SNV in a sample. All embodiments for detecting SNVs can be carried out with a system. The present disclosure provides guidance regarding specific functional and structural features for carrying out the method. In addition, provided herein are embodiments that include a non-transitory computer-readable medium comprising computer-readable code that, when executed by a processing device, causes the processing device to carry out the methods for detecting SNVs provided herein.

Accordingly, in one embodiment, provided herein is a method for determining whether a single nucleotide variant is present at a set of genomic positions in a sample from an individual, the method comprising:

A. for each genomic position, generating an estimate of amplification efficiency and a per-cycle error rate for an amplicon spanning that genomic position, using a training data set;

B. receiving observed nucleotide identity information for each genomic position in the sample;

C. determining, for each genomic position, a set of probabilities for single nucleotide variant percentages resulting from one or more real mutations at that position, by independently comparing the observed nucleotide identity information at each genomic position with a model of different variant percentages that uses the estimated amplification efficiency and per-cycle error rate; and

D. determining, from the set of probabilities for each genomic position, the most likely real variant percentage and a confidence.
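Steps A to D can be illustrated with a simplified likelihood over a grid of candidate SNV percentages. The sketch assumes, as a simplification rather than the document's exact model, that PCR errors arise at a fixed per-cycle rate on newly synthesized copies of wild-type molecules, that mutant molecules stay mutant, and that variant reads are binomially sampled from the resulting variant fraction. The reported confidence is the posterior mass on nonzero percentages under a flat prior over the grid. All names are hypothetical.

```python
import math


def background_error_fraction(efficiency, per_cycle_error, cycles):
    """Fraction of molecules carrying a PCR-introduced variant after `cycles` cycles.

    Each cycle, a fraction `efficiency` of molecules is copied; each new copy of a
    correct molecule is erroneous with probability `per_cycle_error`.
    """
    x = 0.0
    for _ in range(cycles):
        x += efficiency * per_cycle_error * (1.0 - x) / (1.0 + efficiency)
    return x


def most_likely_snv_fraction(variant_reads, total_reads, efficiency, per_cycle_error,
                             cycles, grid):
    """Return (most likely true SNV fraction, posterior mass on fractions > 0)."""
    bg = background_error_fraction(efficiency, per_cycle_error, cycles)
    log_liks = []
    for m in grid:
        # True mutant molecules plus background errors on wild-type molecules.
        q = m + (1.0 - m) * bg
        q = min(max(q, 1e-12), 1.0 - 1e-12)
        # Binomial log-likelihood, dropping the binomial coefficient (constant over the grid).
        log_liks.append(variant_reads * math.log(q)
                        + (total_reads - variant_reads) * math.log(1.0 - q))
    peak = max(log_liks)
    weights = [math.exp(ll - peak) for ll in log_liks]  # flat prior over the grid
    norm = sum(weights)
    posterior = [w / norm for w in weights]
    best_index = max(range(len(grid)), key=lambda i: posterior[i])
    confidence = sum(p for m, p in zip(grid, posterior) if m > 0)
    return grid[best_index], confidence
```

With 100 variant reads out of 10,000, an efficiency of 0.9, a per-cycle error rate of 1e-5 and 20 cycles, the grid point 1% is selected with a confidence close to 1, while a single variant read is best explained by background error alone.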

In an illustrative embodiment of the method for determining whether a single nucleotide variant is present, estimates of efficiency and per-cycle error rate are generated for a set of amplicons spanning the genomic position. For example, 2, 3, 4, 5, 10, 15, 20, 25, 50, 100, or more amplicons spanning the genomic position can be included.

In an illustrative embodiment of the method for determining whether a single nucleotide variant is present, the observed nucleotide identity information comprises the total number of reads observed at each genomic position and the number of variant allele reads observed at each genomic position.

In an illustrative embodiment of the method for determining whether a single nucleotide variant is present, the sample is a plasma sample and the single nucleotide variant is present in circulating tumor DNA of the sample.

In another embodiment, provided herein is a method for estimating the percentage of a single nucleotide variant present in a sample from an individual. The method comprises the following steps:

A. at a set of genomic positions, generating an estimate of efficiency and a per-cycle error rate for one or more amplicons spanning those genomic positions, using a training data set;

C. generating estimated means and variances for the total number of molecules, the number of background error molecules, and the number of real mutation molecules, for a search space comprising initial percentages of real mutation molecules, using the amplification efficiency and per-cycle error rate of the amplicons; and

D. determining the percentage of the single nucleotide variant in the sample that results from real mutations, by determining the most likely percentage of real single nucleotide variants, as determined by fitting a distribution based on the estimated means and variances to the observed nucleotide identity information in the sample.

In an illustrative embodiment of the method for estimating the percentage of a single nucleotide variant present in a sample, the sample is a plasma sample and the single nucleotide variant is present in circulating tumor DNA of the sample.

The training data set for this embodiment of the invention typically includes samples from one or, preferably, a set of healthy individuals. In certain illustrative embodiments, the training data set is analyzed on the same day as, or even in the same run as, the one or more test samples. For example, samples from 2, 3, 4, 5, 10, 15, 20, 25, 30, 36, 48, 96, 100, 192, 200, 250, 500, 1,000, or more healthy individuals can be used to generate the training data set. When data are available for a larger number of healthy individuals, such as 96 or more, confidence in the amplification efficiency estimates increases, even if those runs were performed before the method is performed on the test sample. The PCR error rate can be determined using nucleic acid sequence information not only at the SNV base position but across the entire amplified region surrounding the SNV, because the error rate is per amplicon. For example, using samples from 50 individuals and sequencing the 20 base pairs of the amplicon around the SNV, error frequency data from 1,000 base reads can be used to determine the error frequency.

Typically, the amplification efficiency is estimated by estimating the mean and standard deviation of the amplification efficiency for an amplicon and then fitting them to a distribution model, such as a binomial distribution or a beta-binomial distribution. The error rate is determined for a PCR reaction with a known number of cycles, and the per-cycle error rate is then estimated.
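The two estimation steps in the preceding paragraph can be sketched as a method-of-moments fit of a beta distribution to per-sample amplification efficiency estimates, and an inversion of a simple error-accumulation model to obtain a per-cycle error rate. The error model (each new copy of a correct molecule acquires an error with probability p per cycle, so the overall error fraction after c cycles is 1 - (1 - e*p/(1 + e))^c for efficiency e) is an assumption made for illustration; the function names are hypothetical.

```python
import statistics


def fit_beta_by_moments(efficiencies):
    """Method-of-moments beta fit (alpha, beta) to per-sample efficiencies in (0, 1)."""
    mean = statistics.fmean(efficiencies)
    var = statistics.variance(efficiencies)  # sample variance; needs >= 2 values
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("variance incompatible with a beta distribution")
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common


def per_cycle_error_rate(overall_error_fraction, efficiency, cycles):
    """Invert x_c = 1 - (1 - k)**c with k = efficiency * p / (1 + efficiency) to recover p."""
    k = 1.0 - (1.0 - overall_error_fraction) ** (1.0 / cycles)
    return k * (1.0 + efficiency) / efficiency
```

In the example above, `overall_error_fraction` would be the number of mismatching bases divided by the total bases observed across the amplicon (for instance, over the 1,000 base reads from 50 samples).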

In certain illustrative embodiments, estimating the starting molecules of the test data set further comprises updating the efficiency estimates for the test data set using the number of starting molecules estimated in step (b), if the observed number of reads differs significantly from the estimated number of reads. The estimates can then be updated for a new efficiency and/or new starting molecules.

The search space used to estimate the total number of molecules, background error molecules, and real mutation molecules can range from a lower limit of 0.1%, 0.2%, 0.25%, 0.5%, 1%, 2.5%, 5%, 10%, 15%, 20%, or 25% to an upper limit of 1%, 2%, 2.5%, 5%, 10%, 12.5%, 15%, 20%, 25%, 50%, 75%, 90%, or 95% for the percentage of copies carrying the SNV base at an SNV position. A lower range, with a lower limit of 0.1%, 0.2%, 0.25%, 0.5%, or 1% and an upper limit of 1%, 2%, 2.5%, 5%, 10%, 12.5%, or 15%, can be used for plasma samples in illustrative embodiments in which the method detects circulating tumor DNA. A higher range is used for tumor samples.

A distribution is fitted to the number of total error molecules (background error plus real mutation) among the total molecules to calculate the likelihood or probability for each possible real mutation percentage in the search space. This distribution can be a binomial distribution or a beta-binomial distribution.

The most likely real mutation is determined by identifying the most likely real mutation percentage and calculating a confidence, using the data from the fitted distribution. As an illustrative example, and not intended to limit the clinical interpretation of the methods provided herein, if the average mutation rate is high, the percentage confidence needed to make a positive call for an SNV is low. For example, if the average mutation rate of an SNV in a sample (using the most likely hypothesis) is 5% and the percentage confidence is 99%, a positive SNV call can be made. On the other hand, in the same illustrative example, if the average mutation rate of an SNV in the sample (using the most likely hypothesis) is 1% and the percentage confidence is 50%, a positive SNV call would in some cases not be made. It should be understood that the clinical interpretation of the data will be a function of sensitivity, specificity, prevalence, and the availability of alternative products.

In one illustrative embodiment, the sample is a circulating DNA sample, such as a circulating tumor DNA sample.

In another embodiment, provided herein is a method for detecting one or more single nucleotide variants in a test sample from an individual. The method according to this embodiment comprises the following steps:

D. determining a median variant allele frequency across a plurality of normal control samples, one from each of a plurality of normal individuals, for each single nucleotide variant position in a set of single nucleotide variant positions, based on results generated in a sequencing run, to identify selected single nucleotide variant positions whose median variant allele frequency in normal samples is below a threshold, and determining the background error for each selected single nucleotide variant position after removing outlier samples for that position;

E. determining an observed read-depth-weighted mean and variance for the selected single nucleotide variant positions of the test sample, based on data generated in a sequencing run of the test sample; and

F. identifying, using a computer, one or more single nucleotide variant positions having a read-depth-weighted mean that is statistically significant compared with the background error for that position, thereby detecting the one or more single nucleotide variants.

In certain embodiments of this method for detecting one or more SNVs, the sample is a plasma sample, the control samples are plasma samples, and the one or more detected single nucleotide variants are present in circulating tumor DNA of the sample. In certain embodiments of this method, the plurality of control samples includes at least 25 samples. In certain illustrative embodiments, the plurality of control samples has a lower limit of 5, 10, 15, 20, 25, 50, 75, 100, 200, or 250 samples and an upper limit of 10, 15, 20, 25, 50, 75, 100, 200, 250, 500, or 1,000 samples.

In certain embodiments of this method for detecting one or more SNVs, outliers are removed from the data generated in the high-throughput sequencing run to calculate the observed read-depth-weighted mean, and the observed variance is determined. In certain embodiments of this method, the read depth at each single nucleotide variant position of the test sample is at least 100 reads.

In certain embodiments of this method for detecting one or more SNVs, the sequencing run includes a multiplex amplification reaction performed under limiting-primer reaction conditions. In illustrative embodiments, the improved methods provided herein for carrying out multiplex amplification reactions are used to carry out these embodiments.

Without being limited by theory, the method of this embodiment uses a background error model, built from normal plasma samples sequenced in the same sequencing run as the test sample, to account for run-specific errors. Noisy positions whose normal median variant allele frequency is above a threshold, such as > 0.1%, 0.2%, 0.25%, 0.5%, 0.75%, or 1.0%, are removed.

Outlier samples are removed iteratively from the model to account for noise and contamination. For each base substitution at each genomic locus, the read-depth-weighted mean and the standard deviation of the error are calculated. In certain illustrative embodiments, a single nucleotide variant position in a sample, such as a tumor or cell-free plasma sample, is counted as a candidate mutation if it has at least a threshold number of variant reads, for example at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 250, 500, or 1,000 variant reads, and a Z-score greater than 2.5, 5, 7.5, or 10 (against the background error model, in certain embodiments).
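The background-error approach of steps D to F and the candidate-call rule above can be sketched as follows. Assumptions: the background at a position is summarized by a read-depth-weighted mean and standard deviation of the variant allele frequency across normal controls, outlier samples beyond 3 standard deviations are removed iteratively, and the thresholds (0.5% median noise cutoff, 5 variant reads, Z > 5) are picked from the ranges listed in the text for illustration. Function names are hypothetical.

```python
import math
import statistics


def _weighted_stats(pairs):
    """Read-depth-weighted mean and standard deviation of (vaf, depth) pairs."""
    total = sum(d for _, d in pairs)
    mean = sum(v * d for v, d in pairs) / total
    var = sum(d * (v - mean) ** 2 for v, d in pairs) / total
    return mean, var ** 0.5


def background_model(normal_vafs, normal_depths, noise_cutoff=0.005,
                     outlier_sd=3.0, max_iter=10):
    """Return (weighted mean, sd) of background VAF at one position, or None if noisy."""
    if statistics.median(normal_vafs) > noise_cutoff:
        return None  # noisy position: excluded from calling
    pairs = list(zip(normal_vafs, normal_depths))
    for _ in range(max_iter):
        mean, sd = _weighted_stats(pairs)
        kept = [(v, d) for v, d in pairs if sd == 0 or abs(v - mean) <= outlier_sd * sd]
        if len(kept) == len(pairs):
            break
        pairs = kept  # drop outlier samples and refit
    return _weighted_stats(pairs)


def is_candidate_mutation(variant_reads, depth, background, min_variant_reads=5, min_z=5.0):
    """Call a candidate if enough variant reads and the VAF Z-score exceeds `min_z`."""
    if background is None:
        return False
    mean, sd = background
    vaf = variant_reads / depth
    if sd == 0:
        z = math.inf if vaf > mean else 0.0
    else:
        z = (vaf - mean) / sd
    return variant_reads >= min_variant_reads and z > min_z
```

For example, with 20 normals at 0.08% to 0.12% VAF plus one contaminated normal at 0.4%, the contaminated sample is dropped, leaving a background of 0.1% with a standard deviation of 0.02%. A test sample with 30 variant reads out of 10,000 (Z = 10) is then a candidate.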

In certain embodiments, a read depth of greater than 100, 250, 500, 1,000, 2,000, 2,500, 5,000, 10,000, 20,000, 25,000, 50,000, or 100,000 reads (at the lower end of the range) and up to 2,000, 2,500, 5,000, 7,500, 10,000, 25,000, 50,000, 100,000, 250,000, or 500,000 reads (at the upper end) is obtained in the sequencing run for each single nucleotide variant position in the set of single nucleotide variant positions. Typically, the sequencing run is a high-throughput sequencing run. In illustrative embodiments, the mean or median generated for the test sample is weighted by read depth. Therefore, the likelihood that a variant allele determination is real is higher in a sample in which 1 variant allele is detected in 1,000 reads than in a sample in which 1 variant allele is detected in 10,000 reads. Because the determination of a variant allele (e.g., a mutation) is not 100% certain, an identified single nucleotide variant can be considered a candidate variant or candidate mutation.

Exemplary test statistic for phased data analysis

An illustrative test statistic is described below for analyzing phased data from a sample known or suspected to be a mixed sample containing DNA or RNA derived from two or more cells that are not genetically identical. Let f denote the fraction of DNA or RNA of interest, such as the fraction of DNA or RNA having a CNV of interest, or the fraction of DNA or RNA from cells of interest (such as cancer cells). In some embodiments for prenatal testing, f denotes the fraction of fetal DNA, RNA, or cells in a mixture of fetal and maternal DNA, RNA, or cells. Note that this refers to the fraction of DNA from cells of interest, assuming that each cell of interest contributes two copies of the DNA. This differs from the fraction of DNA from cells of interest at a segment that is deleted or duplicated.

The possible allele values of each SNP are denoted A and B. AA, AB, BA, and BB denote all possible ordered allele pairs. In some embodiments, SNPs with ordered alleles AB or BA are analyzed. Let N_i denote the number of sequence reads at the i-th SNP, and let A_i and B_i denote the numbers of reads at the i-th SNP indicating alleles A and B, respectively. Assume:

N_i = A_i + B_i

The allele ratio R_i is defined as:

R_i = A_i / N_i

Let T denote the number of target SNPs.

Without loss of generality, some embodiments focus on a single chromosome segment. For clarity, the phrase "the first homologous chromosome segment compared with the second homologous chromosome segment" in this specification refers to the first homolog of the chromosome segment and the second homolog of the chromosome segment. In some such embodiments, all target SNPs are contained in the chromosome segment of interest. In other embodiments, multiple chromosome segments are analyzed for possible copy number changes.

MAP (maximum a posteriori) estimation

This method uses knowledge of phase, through the ordered alleles, to detect a deletion or duplication of the target segment. For each SNP i, define

Then it defines

The distributions of X_i and S under various copy number hypotheses (such as the disomy hypothesis, the hypothesis of a deletion of the first or second homolog, or the hypothesis of a duplication of the first or second homolog) are described below.

Disomy hypothesis

Under the hypothesis that the target segment is neither deleted nor duplicated,

where

If we assume a constant read depth N, this gives S a binomial distribution with parameters and T.

Deletion hypothesis

Under the hypothesis that the first homolog is deleted (for example, an AB SNP becomes B and a BA SNP becomes A), R_i has a binomial distribution with parameters and T for AB SNPs, and with parameters and T for BA SNPs. Therefore,

If we assume a constant read depth N, this gives S a binomial distribution with parameters and T.

Under the hypothesis that the second homolog is deleted (for example, an AB SNP becomes A and a BA SNP becomes B), R_i has a binomial distribution with parameters and T for AB SNPs, and with parameters and T for BA SNPs. Therefore,

Duplication hypothesis

Under the hypothesis that the first homolog is duplicated (for example, an AB SNP becomes AAB and a BA SNP becomes BBA), R_i has a binomial distribution with parameters and T for AB SNPs, and with parameters and T for BA SNPs. Therefore,

Under the hypothesis that the second homolog is duplicated (for example, an AB SNP becomes ABB and a BA SNP becomes BAA), R_i has a binomial distribution with parameters and T for AB SNPs, and with parameters and T for BA SNPs. Therefore,

Classification

As shown in the sections above, X_i is a binary random variable with

This allows the probability of the test statistic S to be calculated under each hypothesis. The probability of each hypothesis given the measured data can then be calculated. In some embodiments, the hypothesis with the greatest probability is selected. If desired, the distribution of S can be simplified by approximating each N_i by a constant read depth N, or by truncating the read depths to a constant N. This simplification gives

The value of f can be estimated by selecting the most likely value of f given the measured data, such as the value of f that produces the best fit to the data, using an algorithm (for example, a search algorithm) such as maximum likelihood estimation, maximum a posteriori (MAP) estimation, or Bayesian estimation. In some embodiments, multiple chromosome segments are analyzed and a value of f is estimated from the data for each segment. If all target cells carry these duplications or deletions, the estimates of f based on the data from the different segments are similar. In some embodiments, f is measured experimentally, for example by determining the fraction of DNA or RNA from cancer cells based on methylation differences (hypomethylation or hypermethylation) between cancer and non-cancerous DNA or RNA.

In some embodiments involving mixed samples of fetal and maternal nucleic acids, the value of f is the fetal fraction, that is, the fraction of fetal DNA (or RNA) in the total amount of DNA (or RNA) in the sample. In some embodiments, the fetal fraction is determined by: obtaining genotype data from a maternal blood sample (or a fraction thereof) for a set of polymorphic loci on at least one chromosome that is expected to be disomic in both the mother and the fetus; creating a plurality of hypotheses, each corresponding to a different possible fetal fraction at that chromosome; building a model of the expected allele measurements in the blood sample at the set of polymorphic loci on the chromosome for the possible fetal fractions; calculating a relative probability for each fetal fraction hypothesis using the model and the allele measurements from the blood sample or fraction thereof; and determining the fetal fraction in the blood sample by selecting the fetal fraction corresponding to the hypothesis with the greatest probability. In some embodiments, the fetal fraction is determined by identifying polymorphic loci at which the mother is homozygous for a first allele and the father is (i) heterozygous for the first allele and a second allele or (ii) homozygous for the second allele, and using the amount of the second allele detected in the blood sample at each identified polymorphic locus to determine the fetal fraction in the blood sample (see, e.g., U.S. Publication No. 2012/0185176, filed March 29, 2012, and U.S. Publication No. 2004/0065621, filed March 13, 2013, the entire contents of which are incorporated herein by reference).
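The second approach above has a simple closed form in case (ii): if the mother is homozygous AA and the father is homozygous BB, every fetal genome is AB at that locus, so B reads make up f/2 of the reads and f = 2B/(A + B). The sketch below pools reads over such loci. It does not handle case (i), where the fetus inherits the paternal allele only half the time, and the function name is hypothetical.

```python
def fetal_fraction_from_informative_snps(loci):
    """Pooled fetal fraction from SNPs where the mother is AA and the father is BB.

    loci: iterable of (maternal_allele_reads, paternal_allele_reads).
    Every fetal genome is AB at such loci, so B reads are f/2 of the total.
    """
    loci = list(loci)
    a_reads = sum(a for a, _ in loci)
    b_reads = sum(b for _, b in loci)
    total = a_reads + b_reads
    if total == 0:
        raise ValueError("no reads at informative loci")
    return 2.0 * b_reads / total
```

For example, loci with (950, 50) and (1900, 100) maternal and paternal allele reads give 150 paternal reads out of 3,000, that is, a fetal fraction of 10%.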

Another method for measuring the fetal fraction involves using a high-throughput DNA sequencer to count alleles at a large number of polymorphic genetic loci (such as SNPs) and modeling the possible fetal fractions (see, e.g., U.S. Publication No. 2012/0264121, which is incorporated herein by reference in its entirety). Another method for calculating the fetal fraction can be found in Sparks et al., "Noninvasive prenatal detection and selective analysis of cell-free DNA obtained from maternal blood: evaluation for trisomy 21 and trisomy 18," Am J Obstet Gynecol 2012;206:319.e1-9, the entire contents of which are incorporated herein by reference. In some embodiments, the fetal fraction is determined using a methylation assay (see, e.g., U.S. Patent Nos. 7,754,428; 7,901,884; and 8,166,382, each of which is incorporated herein by reference), on the premise that certain loci are methylated or preferentially methylated in the fetus, while the same loci are unmethylated or preferentially unmethylated in the mother.

Figures 1A-13D are charts showing the distribution of the test statistic S divided by T (the number of SNPs), "S/T", under various copy number hypotheses, for various read depths and tumor fractions (where f is the fraction of tumor DNA in the total DNA), for increasing numbers of SNPs.

Single-hypothesis rejection

The distribution of S under the disomy hypothesis does not depend on f. Therefore, the probability of the measured data under the disomy hypothesis can be calculated without calculating f. A single-hypothesis rejection test can be applied to the null hypothesis of disomy. In some embodiments, the probability of S under the disomy hypothesis is calculated, and the disomy hypothesis is rejected if this probability is below a given threshold (for example, less than 1 in 1,000). This indicates that a duplication or deletion of the chromosome segment is present. If desired, the false positive rate can be changed by adjusting the threshold.
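A sketch of the single-hypothesis rejection test, under stated assumptions: S is treated as Binomial(T, p0) under disomy, where p0 is the disomy probability that X_i = 1. Its exact expression is not reproduced in this text, so it is passed in as a parameter. A two-sided tail probability is compared with a threshold of 1/1,000, and the binomial probabilities are computed in log space to avoid overflow at large T. Names are hypothetical.

```python
import math


def log_binomial_pmf(k, n, p):
    """log P(K = k) for K ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def reject_disomy(s, t, p0, threshold=1e-3):
    """Two-sided test of S = s against Binomial(t, p0); returns (p_value, rejected)."""
    pmf = [math.exp(log_binomial_pmf(k, t, p0)) for k in range(t + 1)]
    lower_tail = sum(pmf[: s + 1])   # P(S <= s)
    upper_tail = sum(pmf[s:])        # P(S >= s)
    p_value = min(1.0, 2.0 * min(lower_tail, upper_tail))
    return p_value, p_value < threshold
```

With T = 1,000 and an illustrative p0 = 0.5, S = 500 is consistent with disomy, whereas S = 600 is rejected. Lowering `threshold` trades sensitivity for a lower false positive rate, as described above.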

Illustrative methods for phased data analysis

Illustrative methods are described below for analyzing data from a sample known or suspected to be a mixed sample (containing DNA or RNA from two or more cells that are not genetically identical). In some embodiments, phased data are used. In some embodiments, the method includes determining, for each calculated allele ratio at a particular locus, whether the calculated allele ratio is above or below the expected allele ratio, and the magnitude of the difference. In some embodiments, a likelihood distribution is determined for the allele ratio at a locus under a particular hypothesis, and the closer the calculated allele ratio is to the center of the likelihood distribution, the more likely it is that the hypothesis is correct. In some embodiments, the method includes determining, for each locus, the likelihood that a hypothesis is correct. In some embodiments, the method includes determining, for each locus, the likelihood that a hypothesis is correct, and combining the probabilities of the hypothesis across loci; the hypothesis with the greatest combined probability is selected. In some embodiments, the method includes determining the likelihood that a hypothesis is correct for each locus and for each possible ratio of DNA or RNA from one or more target cells to the total DNA or RNA in the sample. In some embodiments, a combined probability for each hypothesis is determined by combining the probabilities of that hypothesis at each locus and for each possible ratio, and the hypothesis with the greatest combined probability is selected.

In one embodiment, the following hypotheses are considered: H11 (all cells are normal), H10 (cells having only homolog 1 are present, so homolog 2 is deleted), H01 (cells having only homolog 2 are present, so homolog 1 is deleted), H21 (cells having a duplication of homolog 1 are present), and H12 (cells having a duplication of homolog 2 are present). For a fraction f of target cells, such as cancer cells or mosaic cells (or a fraction f of DNA or RNA from target cells), the expected allele ratios for heterozygous (AB or BA) SNPs can be found as follows:

Equation (1):

r(AB, H11) = r(BA, H11) = 0.5,
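Equation (1) as reproduced here gives only the H11 case. Assuming that normal cells carry one copy of each homolog and that a fraction f of the DNA comes from target cells carrying the copy numbers implied by each hypothesis, the remaining expected ratios follow from counting copies. For example, r(AB, H10) = 1/(2 - f), r(BA, H10) = (1 - f)/(2 - f), r(AB, H21) = (1 + f)/(2 + f), and r(AB, H12) = 1/(2 + f). The sketch below computes these ratios; the names are hypothetical.

```python
# Copies of (homolog 1, homolog 2) in target cells under each hypothesis.
TARGET_COPIES = {"H11": (1, 1), "H10": (1, 0), "H01": (0, 1), "H21": (2, 1), "H12": (1, 2)}


def expected_allele_ratio(order, hypothesis, f):
    """Expected fraction of A reads at a heterozygous SNP.

    order -- "AB" if allele A is on homolog 1, "BA" if allele A is on homolog 2
    f     -- fraction of DNA contributed by target cells (0 <= f <= 1)
    """
    h1_copies, h2_copies = TARGET_COPIES[hypothesis]
    # Normal cells (fraction 1 - f) contribute one copy of each homolog.
    dose_h1 = (1.0 - f) + f * h1_copies
    dose_h2 = (1.0 - f) + f * h2_copies
    frac_h1 = dose_h1 / (dose_h1 + dose_h2)
    return frac_h1 if order == "AB" else 1.0 - frac_h1
```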

Bias, contamination, and sequencing error correction:

The observation D_s at SNP s consists of the number of originally mapped reads for each allele present (n_A^o and n_B^o). We can then find the corrected read counts n_A and n_B using the expected bias in the amplification of the A and B alleles.

Let c_a denote the ambient contamination (such as contamination from DNA in the air or environment), and let r(c_a) denote the allele ratio of the ambient contaminant (initially taken to be 0.5). In addition, let c_g denote the genotypic contamination rate (such as contamination from another sample), and let r(c_g) be the allele ratio of that contaminant. Let s_e(A, B) and s_e(B, A) denote the sequencing error rates for calling one allele as the other (for example, erroneously detecting an A allele as a B allele).

The observed allele ratio q(r, c_a, r(c_a), c_g, r(c_g), s_e(A, B), s_e(B, A)) for a given expected allele ratio r can be found by correcting for ambient contamination, genotypic contamination, and sequencing error.

Because the genotype of the contaminant is unknown, population frequencies can be used to find P(r(c_g)). More specifically, let p be the population frequency of one allele (which can be referred to as the reference allele). Then P(r(c_g) = 0) = (1 - p)², P(r(c_g) = 0.5) = 2p(1 - p), and P(r(c_g) = 1) = p². The conditional expectation over r(c_g) can be used to determine E[q(r, c_a, r(c_a), c_g, r(c_g), s_e(A, B), s_e(B, A))]. Note that the ambient and genotypic contamination are determined using homozygous SNPs, so they are not affected by the absence or presence of deletions or duplications. In addition, the ambient and genotypic contamination can be measured using a reference chromosome, if desired.
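The conditional expectation over the contaminant genotype can be sketched as below. The functional form of q is not specified in the text, so the sketch assumes a simple linear mixture (target DNA, ambient contaminant, and genotypic contaminant weighted by 1 - c_a - c_g, c_a, and c_g), followed by the two allele-swap sequencing error rates, with Hardy-Weinberg genotype probabilities for the contaminant and A taken as the reference allele. Under this linear assumption, the expectation reduces to substituting E[r(c_g)] = p. All names are hypothetical.

```python
def observed_ratio(r, c_a, r_ca, c_g, r_cg, s_ab, s_ba):
    """Assumed model of the observed A-allele fraction.

    s_ab -- probability an A read is called B; s_ba -- probability a B read is called A.
    """
    mixed = (1.0 - c_a - c_g) * r + c_a * r_ca + c_g * r_cg
    return mixed * (1.0 - s_ab) + (1.0 - mixed) * s_ba


def expected_observed_ratio(r, c_a, r_ca, c_g, p, s_ab, s_ba):
    """Average observed_ratio over contaminant genotypes with Hardy-Weinberg weights.

    p is the population frequency of allele A; r(c_g) is 0 (BB), 0.5 (AB) or 1 (AA).
    """
    genotype_probs = {0.0: (1.0 - p) ** 2, 0.5: 2.0 * p * (1.0 - p), 1.0: p ** 2}
    return sum(prob * observed_ratio(r, c_a, r_ca, c_g, r_cg, s_ab, s_ba)
               for r_cg, prob in genotype_probs.items())
```

For example, with r = 0.5, 10% genotypic contamination, p = 0.3, and no ambient contamination or sequencing error, the expected observed ratio is 0.9 × 0.5 + 0.1 × 0.3 = 0.48.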

Likelihood at each SNP:

The following equation gives the probability of observing n_A and n_B, given an allele ratio r:

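Equation (2):

P(n_A, n_B | r) = C(n_A + n_B, n_A) r^(n_A) (1 - r)^(n_B), where C(n, k) denotes the binomial coefficient.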

Allow D_sIndicate the data of SNP s.It, can be in equation (1) for each hypothesis hc { H11, H01, H10, H21, H12 } In allow r=r (AB, h) or r=r (BA, h), and it was found that more than r (c_g) conditional expectation, to determine the equipotential base observed Because of ratio E [q (r, c_a, r (c_a), c_g, r (c_g))].Then, r=E [q (r, c are allowed in equation (2)_a, r (c_a), c_g, r (c_g), s_e (A, B), s_e(B, A))] it can determine P (D_s∣ h, f).

Searching algorithm:

In some embodiments, the SNP with (seeming to be exceptional value) allele ratio, which is ignored, (such as passes through Ignore or eliminate the SNP with the allele ratio higher or lower than at least 2 or 3 standard deviations of average value).Note that mirror A fixed advantage (for this method) is that in the presence of higher chimeric rate percentage, the variability of allele ratio can It can be high, thus ensure that SNP will not be trimmed to about due to chimeric.

Allow F={ f1 ... ..., f_NIndicate for be fitted into rate percentage (such as tumour score) search space.It can be true Determine P (Ds ∣ h, j), in each SNP S and f ∈ F, and combines the likelihood on all SNP.

The algorithm is run for each f and for each hypothesis. Using a search method, mosaicism is concluded to be present if there is a range F* of f values over which the confidence of a deletion or duplication hypothesis is higher than the confidence of the no-deletion, no-duplication hypothesis. In some embodiments, the maximum likelihood estimate of P(D_s | h, f) over F* is determined. If desired, the conditional expectation over f ∈ F* can be determined. If desired, the confidence for each hypothesis can be determined.
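A compact sketch of the search over F for phased data. It assumes the H_ij copy-number reading used above, a binomial likelihood, and independent SNPs, so that log-likelihoods add. F* is taken to be the grid values at which the best deletion or duplication hypothesis has a higher total log-likelihood than H11. The maximum-likelihood (h, f) pair within F* is then reported. Names such as `search` are illustrative.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical copy-number encoding (first homolog, second homolog).
HYPOTHESES: Dict[str, Tuple[int, int]] = {
    "H11": (1, 1), "H01": (0, 1), "H10": (1, 0), "H21": (2, 1), "H12": (1, 2),
}
EPS = 1e-9  # keeps log() finite when an expected ratio reaches 0 or 1


def expected_ratio(h: str, f: float) -> float:
    """Expected A-read fraction for phased data (A on the first homolog)."""
    i, j = HYPOTHESES[h]
    r = ((1 - f) + f * i) / ((1 - f) * 2 + f * (i + j))
    return min(max(r, EPS), 1 - EPS)


def total_loglik(counts: Sequence[Tuple[int, int]], r: float) -> float:
    """Sum of binomial log-likelihoods over SNPs (constant terms dropped)."""
    return sum(a * math.log(r) + b * math.log1p(-r) for a, b in counts)


def search(counts: Sequence[Tuple[int, int]], grid: Sequence[float]
           ) -> Tuple[List[float], Optional[Tuple[str, float]]]:
    table = {(h, f): total_loglik(counts, expected_ratio(h, f))
             for f in grid for h in HYPOTHESES}
    cnv = [h for h in HYPOTHESES if h != "H11"]
    # F*: values of f where some deletion/duplication hypothesis beats H11.
    f_star = [f for f in grid
              if max(table[(h, f)] for h in cnv) > table[("H11", f)]]
    if not f_star:
        return f_star, None
    best = max(((h, f) for f in f_star for h in cnv), key=lambda k: table[k])
    return f_star, best


counts = [(588, 412)] * 50  # phased (A reads, B reads) at 50 SNPs
grid = [round(0.05 * k, 2) for k in range(11)]  # f = 0.00, 0.05, ..., 0.50
f_star, best = search(counts, grid)
print(best)  # ('H10', 0.3)
```

At f = 0 every hypothesis predicts 0.5, so that grid point can never enter F*. Note also that H21 at a somewhat larger f predicts nearly the same ratio as H10. Allele ratios alone only weakly separate these two hypotheses.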

Additional embodiments:

In some embodiments, a beta-binomial distribution is used instead of the binomial distribution. In some embodiments, a reference chromosome or chromosome segment is used to determine the sample-specific parameters of the beta-binomial.
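A sketch of the beta-binomial alternative. It is parameterised, by our choice, with a mean r and a concentration s, so that alpha = r·s and beta = (1 - r)·s. One hypothetical way to get sample-specific parameters from a reference chromosome is a method-of-moments fit of s to the allele ratios at that chromosome's heterozygous SNPs. The fit below assumes a constant depth n and an expected ratio of 0.5.

```python
import math
import statistics
from typing import Sequence


def log_beta(a: float, b: float) -> float:
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k: int, n: int, r: float, s: float) -> float:
    """log P(k | n) for a beta-binomial with mean r and concentration s.

    alpha = r * s and beta = (1 - r) * s; as s grows the distribution
    approaches Binomial(n, r).
    """
    alpha, beta = r * s, (1 - r) * s
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def estimate_concentration(ratios: Sequence[float], n: int, r: float = 0.5) -> float:
    """Method-of-moments estimate of s from reference-chromosome allele ratios.

    Uses Var(k/n) = r(1-r)/n * (1 + (n-1)/(s+1)) for a constant depth n.
    Returns math.inf when there is no excess variance (a binomial suffices).
    """
    v = statistics.pvariance(ratios)
    inflation = v * n / (r * (1 - r)) - 1
    if inflation <= 0:
        return math.inf
    return (n - 1) / inflation - 1
```

Once s has been fitted on the reference chromosome, it could be reused when scoring SNPs on the chromosome of interest in the same sample.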

Theoretical performance using simulations:

If desired, the theoretical performance of the algorithm can be assessed by randomly assigning the number of reference counts to a SNP with a given depth of read (DOR). For the normal case, p = 0.5 is used as the binomial probability parameter, and for deletions or duplications p is adjusted accordingly. Exemplary input parameters for each simulation are as follows: (1) the number of SNPs, S; (2) the constant DOR per SNP, D; (3) p; and (4) the number of experiments.
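A sketch of the simulation inputs listed above: S SNPs, a constant depth D, a probability p, and a number of experiments. The deletion case uses the hypothetical convention that the homolog carrying the non-reference allele is lost in a fraction of cells, which gives p = 1/(2 - fraction). Binomial draws are summed Bernoulli trials so that the sketch runs on any Python 3 version.

```python
import random
from typing import List


def binomial_draw(rng: random.Random, n: int, p: float) -> int:
    """Binomial count drawn as a sum of n Bernoulli trials (simple, O(n))."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_reference_counts(rng: random.Random, num_snps: int, depth: int,
                              mosaic_fraction: float, deletion: bool) -> List[int]:
    """Reference-allele counts at num_snps heterozygous SNPs with constant depth.

    Normal case: p = 0.5. Deletion case (hypothetical convention): the homolog
    carrying the non-reference allele is lost in a fraction mosaic_fraction of
    cells, so p = 1 / (2 - mosaic_fraction).
    """
    p = 1.0 / (2.0 - mosaic_fraction) if deletion else 0.5
    return [binomial_draw(rng, depth, p) for _ in range(num_snps)]


def run_experiments(num_snps: int, depth: int, mosaic_fraction: float,
                    deletion: bool, n_experiments: int, seed: int = 0) -> List[List[int]]:
    rng = random.Random(seed)  # seeded for reproducible experiments
    return [simulate_reference_counts(rng, num_snps, depth, mosaic_fraction, deletion)
            for _ in range(n_experiments)]


experiments = run_experiments(num_snps=500, depth=1000, mosaic_fraction=0.03,
                              deletion=True, n_experiments=3, seed=1)
for counts in experiments:
    print(sum(counts) / (500 * 1000))  # close to 1 / 1.97, i.e. about 0.5076
```

Each simulated experiment would then be scored with the detection algorithm, with and without phase. The fraction of calls that disagree with the simulated truth gives the false positive or false negative rate.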

First simulated experiment:

This experiment focuses on S ∈ {500, 1000}, D ∈ {500, 1000}, and p ∈ {0%, 1%, 2%, 3%, 4%, 5%}. We conducted 1000 simulated experiments in each setting (thus 24,000 experiments with phase and 24,000 without phase). Reads were simulated from a binomial distribution (other distributions can be used, if desired). False positive rates (p = 0%) and false negative rates (p > 0%) were determined with and without phase information. The false positive rates are listed in Figure 26. Note that phase information is very helpful, especially for S = 500, D = 1000. With S = 500 and D = 500, the algorithm has the highest false positive rate among the test conditions, with or without phase.

The false negative rates are listed in Figure 27.

Phase information is particularly useful for low mosaicism percentages (≤3%). Without phase information, a high false negative rate is observed for p = 1%. This is because the confidence of a deletion is determined by assigning equal probability to H10 and H01, and a slight deviation toward one hypothesis is insufficient to compensate for the low likelihood of the other hypothesis.
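To see why unphased data weaken the signal at low mosaicism, the sketch compares a deletion-versus-H11 log-likelihood ratio in two settings: when the phase is known, and when each SNP gives equal weight to H10 and H01 (one plausible reading of the text). The 503/497 counts are invented for illustration. The deletion ratio 1/(2 - f) assumes that A sits on the retained homolog.

```python
import math
from typing import List, Tuple


def log_lik(n_a: int, n_b: int, r: float) -> float:
    """Binomial log-likelihood, dropping the combinatorial constant."""
    return n_a * math.log(r) + n_b * math.log1p(-r)


def deletion_llr(counts: List[Tuple[int, int]], f: float, phased: bool) -> float:
    """Summed log-likelihood ratio of a mosaic deletion versus H11."""
    r_del = 1.0 / (2.0 - f)  # A on the retained homolog (assumed convention)
    total = 0.0
    for n_a, n_b in counts:
        null = log_lik(n_a, n_b, 0.5)
        if phased:
            alt = log_lik(n_a, n_b, r_del)
        else:
            # Equal weight on H10 and H01: the lost homolog may carry either allele.
            l1 = log_lik(n_a, n_b, r_del)
            l2 = log_lik(n_a, n_b, 1.0 - r_del)
            m = max(l1, l2)
            alt = m + math.log(0.5 * math.exp(l1 - m) + 0.5 * math.exp(l2 - m))
        total += alt - null
    return total


counts = [(503, 497)] * 500  # S = 500 SNPs at D = 1000, f = 1%
print(deletion_llr(counts, 0.01, phased=True) > 0)   # True: favours the deletion
print(deletion_llr(counts, 0.01, phased=False) < 0)  # True: favours H11
```

With phase, each SNP contributes a small positive amount of evidence. Without phase, averaging a slight gain under one orientation with a larger loss under the other gives net evidence against the deletion. This matches the high false negative rate reported for p = 1%.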

The same applies to duplications. Note also that the algorithm appears to be more sensitive to read depth than to the number of SNPs. The results with phase information assume that full phase information is available for a large number of consecutive heterozygous SNPs. If desired, haplotype information can be obtained by probabilistically combining haplotypes over smaller segments.

Second simulated experiment:

This experiment focuses on S ∈ {100, 200, 300, 400, 500}, D ∈ {1000, 2000, 3000, 4000, 5000}, and p ∈ {0%, 1%, 1.5%, 2%, 2.5%, 3%}, with 10,000 random experiments in each setting. False positive rates (p = 0%) and false negative rates (p > 0%) were determined with and without phase information. With haplotype information, the false negative rate is below 10% for D ≥ 3000 and S ≥ 200. Without it, the same performance is reached only for D = 5000 and S ≥ 400 (Figures 20A and 20B). The difference between the false negative rates is especially pronounced for small mosaicism percentages (Figures 21A-25B). For example, at p = 1% a false negative rate below 20% is never reached without haplotype data, whereas with haplotype data it approaches 0% for S > 300 and D ≥ 3000. At p = 3%, a false negative rate of 0% is observed with haplotype data, whereas S ≥ 300 and D ≥ 3000 are needed to reach the same performance without haplotype data.

Exemplary methods for detecting deletions and duplications without phased data

In some embodiments, unphased genetic data are used to determine whether there is an overrepresentation of the copy number of a first homologous chromosomal segment compared to a second homologous chromosomal segment in the genome of an individual (such as in the genome of one or more cells, or in cfDNA or cfRNA). In some embodiments, phased genetic data are used, but the phasing is ignored. In some embodiments, the DNA or RNA sample is a mixed sample of the individual's cfDNA or cfRNA that includes cfDNA or cfRNA from two or more genetically different cells. In some embodiments, the method uses the magnitude of the difference between the calculated allele ratio and the expected allele ratio at each locus.

In some embodiments, the method includes obtaining genetic data at a set of polymorphic loci on the chromosome or chromosome segment, in a sample of DNA or RNA from one or more cells of the individual, by measuring the amount of each allele at each locus. In some embodiments, allele ratios are calculated for loci that are heterozygous in at least one cell from which the sample was derived (such as loci that are heterozygous in the fetus and/or heterozygous in the mother). In some embodiments, the calculated allele ratio for a particular locus is the measured amount of one allele divided by the total measured amount of all alleles at that locus. In some embodiments, the calculated allele ratio for a particular locus is the measured amount of one allele (such as the allele on the first homologous chromosomal segment) divided by the measured amount of one or more other alleles (such as the allele on the second homologous chromosomal segment) at that locus. The calculated and expected allele ratios can be computed using any method described herein or any standard method (such as any mathematical transformation of the calculated or expected allele ratios described herein).
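The two definitions of a calculated allele ratio given above, sketched for a single locus. A helper then restricts the calculation to loci flagged as heterozygous. The heterozygosity flags are assumed to come from elsewhere, for example maternal genotyping.

```python
from typing import Dict, List


def ratio_of_total(counts: Dict[str, int], allele: str) -> float:
    """Measured amount of one allele divided by the total over all alleles."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads at this locus")
    return counts[allele] / total


def ratio_to_others(counts: Dict[str, int], allele: str) -> float:
    """Measured amount of one allele divided by the amount of the other allele(s)."""
    others = sum(v for k, v in counts.items() if k != allele)
    if others == 0:
        raise ValueError("no reads for the other allele(s)")
    return counts[allele] / others


def heterozygous_ratios(loci: List[Dict[str, int]], het_mask: List[bool],
                        allele: str = "A") -> List[float]:
    """Allele ratios restricted to loci flagged as heterozygous."""
    return [ratio_of_total(c, allele) for c, het in zip(loci, het_mask) if het]


locus = {"A": 60, "B": 40}
print(ratio_of_total(locus, "A"), ratio_to_others(locus, "A"))  # 0.6 1.5
```

For a biallelic locus the two forms carry the same information: if x is the ratio of total, the ratio to the other allele is x / (1 - x).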

In some embodiments, a test statistic is calculated based on the magnitude of the difference between the calculated allele ratio and the expected allele ratio at each locus. In some embodiments, the test statistic Δ is calculated using the following formula:

where δ_i is the magnitude of the difference between the calculated allele ratio and the expected allele ratio at the i-th locus;

μ_i is the mean of δ_i; and

σ_i² is the variance of δ_i.

For example, when the expected allele ratio is 0.5, δ_i can be defined as follows:

The values of μ_i and σ_i can be calculated using the fact that R_i is a binomial random variable. In some embodiments, the standard deviation is assumed to be the same for all loci. In some embodiments, an average or weighted average of the standard deviations, or an estimate of the standard deviation, is used for the value of σ_i². In some embodiments, the test statistic is assumed to have a normal distribution. For example, the central limit theorem implies that the distribution of Δ converges to a standard normal distribution as the number of loci (such as the number of SNPs, T) grows.
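The formulas for Δ and δ_i were lost in extraction, so this sketch rests on two stated assumptions. First, δ_i = |R_i/n_i - 0.5|, with R_i ~ Binomial(n_i, 0.5) under the null. Second, Δ = Σ(δ_i - μ_i) / sqrt(Σ σ_i²), a standardised sum that is consistent with the central-limit remark above. μ_i and σ_i² are computed exactly from the binomial distribution, and σ_i² is treated as a variance.

```python
import math
from functools import lru_cache
from typing import Sequence, Tuple


@lru_cache(maxsize=None)
def delta_moments(n: int) -> Tuple[float, float]:
    """Mean and variance of delta = |R/n - 0.5| when R ~ Binomial(n, 0.5)."""
    log_half_n = n * math.log(0.5)
    mean = second = 0.0
    for k in range(n + 1):
        w = math.exp(math.lgamma(n + 1) - math.lgamma(k + 1)
                     - math.lgamma(n - k + 1) + log_half_n)
        d = abs(k / n - 0.5)
        mean += w * d
        second += w * d * d
    return mean, second - mean * mean


def delta_statistic(loci: Sequence[Tuple[int, int]]) -> float:
    """Standardised sum over loci of (delta_i - mu_i); loci are (R_i, n_i) pairs."""
    numerator = 0.0
    variance = 0.0
    for r_i, n_i in loci:
        mu_i, var_i = delta_moments(n_i)
        numerator += abs(r_i / n_i - 0.5) - mu_i
        variance += var_i
    return numerator / math.sqrt(variance)


balanced = [(50, 100)] * 50  # allele ratios exactly at 0.5
shifted = [(70, 100)] * 50   # allele ratios at 0.7
print(delta_statistic(balanced) < 0, delta_statistic(shifted) > 3)  # True True
```

Because δ_i is an absolute deviation, shifts in either direction raise Δ. This is what lets the statistic work without knowing which homolog carries which allele.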

In some embodiments, a set of one or more hypotheses specifying the copy number of the chromosome or chromosome segment in the genome of one or more cells is enumerated. In some embodiments, the most likely hypothesis based on the test statistic is selected, thereby determining the copy number of the chromosome or chromosome segment in the genome of the one or more cells. In some embodiments, a hypothesis is selected if the probability that the test statistic belongs to the test-statistic distribution for that hypothesis is above an upper threshold. One or more hypotheses are rejected if that probability is below a lower threshold. A hypothesis is neither selected nor rejected if the probability lies between the lower and upper thresholds, or if the probability cannot be determined with sufficiently high confidence. In some embodiments, the upper and/or lower threshold is determined from an empirical distribution, such as a distribution from training data (for example, samples with a known copy number, such as diploid samples or samples known to have a particular deletion or duplication). Such an empirical distribution can be used to select thresholds for a single-hypothesis rejection test.
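A sketch of the three-way decision (select, reject, or no call) against an empirical distribution of training statistics. An upper-tail empirical probability with add-one smoothing stands in for "the probability that the test statistic belongs to the distribution". The default thresholds of 0.01 and 0.2 and the training values are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def upper_tail_probability(training: Sequence[float], value: float) -> float:
    """Empirical P(statistic >= value), with add-one smoothing."""
    ordered = sorted(training)
    n_at_or_above = len(ordered) - bisect_left(ordered, value)
    return (n_at_or_above + 1) / (len(ordered) + 1)


def three_way_call(training: Sequence[float], value: float,
                   lower: float = 0.01, upper: float = 0.2) -> str:
    """Accept, reject, or make no call for the hypothesis behind `training`."""
    p = upper_tail_probability(training, value)
    if p < lower:
        return "reject"
    if p > upper:
        return "accept"
    return "no call"


# Hypothetical training statistics from samples known to match the hypothesis.
training = [i / 100 for i in range(-300, 301)]
print([three_way_call(training, v) for v in (0.0, 2.5, 5.0)])
# ['accept', 'no call', 'reject']
```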

Note that the test statistic Δ does not depend on S; therefore, the two can be used independently, if desired.

Exemplary methods for detecting deletions and duplications using allele distributions or patterns

This section includes methods for determining whether there is an overrepresentation of the copy number of a first homologous chromosomal segment compared to a second homologous chromosomal segment. In some embodiments, the method includes enumerating either (i) a plurality of hypotheses specifying the copy number of the chromosome or chromosome segment present in the genome of one or more cells (such as cancer cells) of the individual, or (ii) a plurality of hypotheses specifying the degree of overrepresentation of the copy number of a first homologous chromosomal segment compared to a second homologous chromosomal segment in the genome of one or more cells of the individual. In some embodiments, the method includes obtaining genetic data from the individual at a plurality of polymorphic loci (such as SNP loci) on the chromosome or chromosome segment. In some embodiments, a probability distribution of the individual's expected genotypes is created for each hypothesis. In some embodiments, a data fit is calculated between the obtained genetic data of the individual and the probability distribution of the individual's expected genotypes.

In some embodiments, one or more hypotheses are ranked according to the data fit, and the highest-ranked hypothesis is selected. In some embodiments, a technique or algorithm, such as a search algorithm, is used for one or more of the following steps: calculating the data fit, ranking the hypotheses, or selecting the highest-ranked hypothesis. In some embodiments, the data fit is a fit to a beta-binomial distribution or a fit to a binomial distribution. In some embodiments, the technique or algorithm is selected from the group consisting of maximum likelihood estimation, maximum a posteriori estimation, Bayesian estimation, dynamic estimation (such as dynamic Bayesian estimation), and expectation-maximization estimation. In some embodiments, the method includes applying the technique or algorithm to the obtained genetic data and the expected genetic data.

In some embodiments, the method includes enumerating either (i) a plurality of hypotheses specifying the copy number of the chromosome or chromosome segment present in the genome of one or more cells (such as cancer cells) of the individual, or (ii) a plurality of hypotheses specifying the degree of overrepresentation of the copy number of a first homologous chromosomal segment compared to a second homologous chromosomal segment in the genome of one or more cells of the individual. In some embodiments, the method includes obtaining genetic data from the individual at a plurality of polymorphic loci (such as SNP loci) on the chromosome or chromosome segment. In some embodiments, the genetic data include allele counts for the plurality of polymorphic loci. In some embodiments, a joint distribution model of the expected allele counts at the plurality of polymorphic loci on the chromosome or chromosome segment is created for each hypothesis. In some embodiments, a relative probability for one or more of the hypotheses is determined using the joint distribution model and the allele counts measured on the sample, and the hypothesis with the greatest probability is selected.
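A sketch of how relative hypothesis probabilities might be derived from allele counts. It simplifies the joint distribution model to independent binomial SNPs and gives equal prior weight to each hypothesis; both are our assumptions. Linkage between neighbouring SNPs, which a fuller model could capture, is ignored here.

```python
import math
from typing import Dict, Sequence, Tuple


def relative_probabilities(
    counts: Sequence[Tuple[int, int]],
    expected: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Normalised probability of each hypothesis under equal prior weights.

    counts holds (n_A, n_B) per SNP; expected maps each hypothesis to its
    expected A-allele ratio at every SNP (each strictly between 0 and 1).
    SNPs are treated as independent, so the joint log-likelihood is a sum.
    """
    log_lik = {
        h: sum(a * math.log(r) + b * math.log1p(-r) for (a, b), r in zip(counts, ratios))
        for h, ratios in expected.items()
    }
    top = max(log_lik.values())
    weights = {h: math.exp(v - top) for h, v in log_lik.items()}  # log-sum-exp trick
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


posterior = relative_probabilities(
    [(60, 40)] * 10, {"no CNV": [0.5] * 10, "duplication": [0.6] * 10}
)
print(max(posterior, key=posterior.get))  # duplication
```

Replacing the binomial term with the beta-binomial discussed earlier would change only the per-SNP log-likelihood. The normalisation and selection steps stay the same.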

In some embodiments, the distribution or pattern of alleles (such as the pattern of calculated allele ratios) is used to determine the presence or absence of a CNV, such as a deletion or duplication. If desired, the parental origin of the CNV can be determined based on the pattern. A maternally inherited duplication is an additional copy of a maternal chromosome segment. A maternally inherited deletion is the loss of the maternal copy of the chromosome segment, so that the only copy of the segment present comes from the father. Exemplary patterns are shown in Figures 15A-19D and are described further below.

To determine the presence or absence of a deletion of a chromosome segment of interest, the algorithm considers the distribution of sequence counts (the counts for each of the two possible alleles at a large number of SNPs on each chromosome). It is important to note that some embodiments of the algorithm use an approach that does not lend itself to visualization. Therefore, for illustrative purposes, the data in Figures 15A-18 are shown in a simplified manner, as ratios of the two most likely alleles (labeled A and B), so that the relevant trends are easier to see. This simplified illustration does not account for some possible features of the algorithm. For example, two features of embodiments of the algorithm cannot be illustrated by a visualization of allele ratios. The first is the ability to exploit linkage disequilibrium, that is, the influence of a measurement at one SNP on the likely identity of neighboring SNPs. The second is the use of non-Gaussian data models that describe the expected distribution of allele measurements at a SNP, given platform characteristics and amplification bias. Note also that a simplified version of the algorithm considers only the two most common alleles at each SNP and ignores other possible alleles.

Deletions of interest can be detected in genomic and maternal blood samples. In some embodiments, genomic and maternal plasma samples are analyzed using the multiplex PCR and sequencing method of Example 1. Genomic DNA samples (in which a lack of heterozygous SNPs was detected in the target region) confirmed the ability of the assay to distinguish monosomy (affected) from disomy (unaffected). Analysis of cfDNA from a maternal blood sample made it possible to detect 22q11.2 deletion syndrome, Cri-du-Chat deletion syndrome, Wolf-Hirschhorn deletion syndrome, and the other deletion syndromes in Figure 14 in the fetus.

Figure 15 A-15C describes data, shows the presence of two chromosome, (does not have fetus when sample is entirely female parent CfDNA exists, Figure 15 A), comprising one 12% medium fetus cfDNA score (Figure 15 B), or include one 26% height Fetus cfDNA score (Figure 15 C).X-axis indicates the linear position in the individual Genetic polymorphism site along chromosome, y-axis table The reading for showing A allele, a part as total allele (A+B) reading.Maternal and fetus genotype is instructed to On the right side of figure.The figure colours, according to female genotype, so that red indicate a female genotype AA, blue table Show that a female genotype BB, and green indicate a female genotype AB.Note that measurement is to be located away from maternal blood And carried out in total cfDNA including female parent and fetus cfDNA；Therefore, each point indicates that fetus and female parent DNA (contribute to At the SNP) combination.Therefore, the ratio for increasing female parent cfDNA, from 0% to 100%, will gradually change some points it is upward or It moves down (in figure), according to maternal and fetus genotype.

In all cases, SNPs that are homozygous for the A allele (AA) in both mother and fetus lie tightly along the upper limit of the plot: the fraction of A-allele reads is high because no B alleles should be present. Conversely, SNPs that are homozygous for the B allele in both mother and fetus lie tightly along the lower limit of the plot: the fraction of A-allele reads is low because only B alleles should be present. Points not tightly associated with the upper and lower limits represent SNPs for which the mother, the fetus, or both are heterozygous. These points are useful for identifying fetal deletions or duplications, and they can also provide information for determining paternal versus maternal inheritance. They separate according to maternal genotype, fetal genotype, and fetal fraction, so the exact position of each point along the y-axis depends on both stoichiometry and fetal fraction. For example, loci where the mother is AA and the fetus is AB are expected to have a different fraction of A-allele reads, and therefore a different position along the y-axis, depending on the fetal fraction.

Figure 15 A has data, pregnant woman non-for one, therefore represents the mould when genotype is entirely female parent Formula.This mode include point " cluster ": be closely related at the top of red a cluster and figure (SNP, wherein female genotype be AA), the bottom of blue a cluster and figure is closely related (SNP, wherein female genotype is BB) and that one single is placed in the middle Green cluster (SNP, wherein female genotype be AB).For Figure 15 B, foetal allele is to A allele reading score Contribution, changes position of some allele points along y-axis upward or downward.For Figure 15 C, the mode, including two The red and peripheral band of two blues and a center are three recombinations of green stripes, are obvious.In described three Heart green stripes correspond in female parent be heterozygosity SNP, and the point of each comfortable top (red) and bottom (blue) Two " periphery " bands are corresponding to the SNP in female parent being homozygosity.

Analysis of a 22q11.2 deletion carrier (a mother with the deletion) is shown in Figure 16A. The carrier has no heterozygous SNPs in this region because she has only one copy of it, so the deletion is indicated by the absence of green AB SNPs. Analysis of a paternally inherited 22q11 deletion in a fetus is shown in Figure 16B. When the fetus inherits only a single copy of a chromosome segment, fetal heterozygosity is impossible, because there is only a single allele at each locus in that segment. (For a paternally inherited deletion, the copy present in the fetus comes from the mother.) The only possible fetal SNP identities are therefore A or B. Note the absence of the inner peripheral bands. For a paternally inherited deletion, the characteristic pattern has two central green bands, representing SNPs for which the mother is heterozygous. It has only single peripheral red and blue bands, representing SNPs for which the mother is homozygous, and these remain tightly associated with the upper and lower limits of the plot (1 and 0, respectively).

Analysis of a maternally inherited Cri-du-Chat deletion is shown in Figure 17. There are two central green bands rather than three, and there are two red and two blue peripheral bands. A maternally inherited deletion (such as in a maternal carrier of Duchenne muscular dystrophy) can also be detected from a small amount of signal in the region in a mixed sample of maternal and fetal DNA (such as a plasma sample), because both the mother and the fetus have the deletion.

Figure 18 is a plot of a paternally inherited Wolf-Hirschhorn deletion, indicated by the presence of a single red and a single blue peripheral band.

If desired, similar plots can be generated for a sample from an individual suspected of having a deletion or duplication (such as a cancer-associated CNV). In such plots, the following color coding can be used, based on the genotype of cells without the CNV: red indicates genotype AA, blue indicates genotype BB, and green indicates genotype AB. In some embodiments, for a deletion, the pattern includes two central green bands representing SNPs for which the individual is heterozygous. The top green band represents AB from cells without the deletion together with A from cells with the deletion. The bottom green band represents AB from cells without the deletion together with B from cells with the deletion. The pattern has only single peripheral red and blue bands, representing SNPs for which the individual is homozygous, and these remain tightly associated with the upper and lower limits of the plot (1 and 0, respectively). In some embodiments, the separation between the two green bands increases as the fraction of cells, DNA, or RNA carrying the deletion increases.
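The widening of the two green bands can be checked with the same dosage arithmetic. The sketch assumes that a fraction c of the DNA comes from cells carrying a one-copy deletion and the rest comes from normal AB cells. The top band sits at 1/(2 - c), the bottom band at (1 - c)/(2 - c), and their separation is c/(2 - c).

```python
from typing import Tuple


def deletion_band_positions(c: float) -> Tuple[float, float]:
    """Expected A fractions of the two green bands at a heterozygous SNP.

    A fraction c of the DNA carries a one-copy deletion; the rest is normal AB.
    Top band: the deleted cells keep A. Bottom band: they keep B.
    """
    total_dosage = (1 - c) * 2 + c * 1
    top = ((1 - c) * 1 + c * 1) / total_dosage
    bottom = ((1 - c) * 1 + c * 0) / total_dosage
    return top, bottom


for c in (0.0, 0.2, 0.5, 1.0):
    top, bottom = deletion_band_positions(c)
    print(c, round(top, 3), round(bottom, 3), round(top - bottom, 3))
```

Homozygous SNPs stay at exactly 1 or 0 regardless of c. This is why only single red and blue bands appear.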

Exemplary methods for identifying and analyzing multiple pregnancies

In some embodiments, any method of the invention is used to detect a multiple pregnancy, such as a twin pregnancy, in which at least one fetus is genetically different from at least one other fetus. In some embodiments, fraternal twins are identified based on the presence of two fetuses with different alleles, different allele ratios, or different allele distributions at some (or all) of the tested loci. In some embodiments, fraternal twins are identified by determining the expected allele ratio at each locus (such as a SNP site) for two fetuses that may have the same or different fetal fractions in the sample (such as a plasma sample). In some embodiments, the likelihood of a particular pair of fetal fractions (where f1 is the fetal fraction of fetus 1 and f2 is the fetal fraction of fetus 2) is calculated by considering some or all possible genotypes of the two fetuses, conditioned on the mother's genotype and on population frequencies. The mixture of the two fetal genotypes and the maternal genotype, combined with the fetal fractions, determines the expected allele ratio at a SNP. For example, if the mother is AA, fetus 1 is AA, and fetus 2 is AB, then the total fraction of B alleles at the SNP is half of f2. The likelihood calculation measures how well all SNPs jointly match the expected allele ratios, based on all possible combinations of fetal genotypes. The fetal fraction combination (f1, f2) that best matches the data is selected. It is not necessary to determine the specific genotypes of the fetuses; instead, for example, all possible genotypes can be considered in a statistical combination. In some embodiments, if the method does not distinguish between fraternal and identical twins, an ultrasound can be performed to determine whether the pregnancy is a fraternal or an identical twin pregnancy. If the ultrasound detects a twin pregnancy, the pregnancy can be considered an identical twin pregnancy, because a fraternal twin pregnancy would have been detected by the SNP analysis described above.
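A sketch of the (f1, f2) likelihood search just described. It makes several assumptions of ours: the mother's genotype is known at each SNP, the father is unknown and drawn from Hardy-Weinberg proportions with B-allele frequency p_b, fetal genotypes are marginalised rather than called, SNPs are independent, and a small floor EPS stands in for sequencing error. The grid enforces f2 ≤ f1; f2 = 0 corresponds to a single set of fetal genotypes, i.e. identical twins. The data are synthetic and noise-free.

```python
import math
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

EPS = 1e-3  # hypothetical floor on allele fractions (e.g. sequencing error)


def hwe(p_b: float) -> Dict[str, float]:
    """Hardy-Weinberg genotype probabilities given the B-allele frequency."""
    return {"AA": (1 - p_b) ** 2, "AB": 2 * p_b * (1 - p_b), "BB": p_b ** 2}


def twin_genotype_probs(mother: str, p_b: float) -> Dict[Tuple[str, str], float]:
    """P(fetus1, fetus2 | mother) for fraternal twins with one unknown father."""
    probs: Dict[Tuple[str, str], float] = {}
    for father, p_father in hwe(p_b).items():
        # Each twin independently receives one allele from each parent.
        for m1, d1, m2, d2 in product(mother, father, mother, father):
            key = ("".join(sorted(m1 + d1)), "".join(sorted(m2 + d2)))
            probs[key] = probs.get(key, 0.0) + p_father / 16
    return probs


def b_fraction(mother: str, g1: str, g2: str, f1: float, f2: float) -> float:
    """Expected fraction of B reads in plasma for given genotypes and fractions."""
    b = (1 - f1 - f2) * mother.count("B") + f1 * g1.count("B") + f2 * g2.count("B")
    return min(max(b / 2, EPS), 1 - EPS)


def snp_loglik(n_a: int, n_b: int, mother: str, p_b: float, f1: float, f2: float) -> float:
    """log P(counts | f1, f2), summing over the twins' unknown genotypes."""
    terms = []
    for (g1, g2), w in twin_genotype_probs(mother, p_b).items():
        r = b_fraction(mother, g1, g2, f1, f2)
        terms.append(math.log(w) + n_b * math.log(r) + n_a * math.log1p(-r))
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def estimate_twin_fractions(snps: Sequence[Tuple[int, int, str, float]],
                            grid: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Grid-search MLE of (f1, f2) with f2 <= f1; snps are (n_a, n_b, mother, p_b)."""
    best, best_ll = None, -math.inf
    for f1 in grid:
        for f2 in grid:
            if f2 > f1 or f1 + f2 >= 1:
                continue
            ll = sum(snp_loglik(a, b, m, p, f1, f2) for a, b, m, p in snps)
            if ll > best_ll:
                best, best_ll = (f1, f2), ll
    return best


# Synthetic, noise-free example: true f1 = 0.10, f2 = 0.06, depth 1000.
TRUE_F1, TRUE_F2, DEPTH = 0.10, 0.06, 1000
snps: List[Tuple[int, int, str, float]] = []
for mother in ("AA", "AB", "BB"):
    for g1, g2 in twin_genotype_probs(mother, 0.5):
        n_b = round(DEPTH * b_fraction(mother, g1, g2, TRUE_F1, TRUE_F2))
        snps.append((DEPTH - n_b, n_b, mother, 0.5))
grid = [round(0.02 * k, 2) for k in range(11)]  # 0.00 .. 0.20
estimate = estimate_twin_fractions(snps, grid)
print(estimate)
```

No reads are assigned to either fetus in this sketch. Each candidate pair is scored only by how well the mixture of all genotype combinations explains the observed allele fractions.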

In some embodiments, a pregnant mother is known to have a multiple pregnancy (such as a twin pregnancy) based on a previous test, such as an ultrasound. Any method of the invention can be used to determine whether the multiple pregnancy involves identical or fraternal twins. For example, the measured allele ratios can be compared with the expected values for identical twins (the same allele ratios as for a singleton pregnancy) or for fraternal twins (such as allele ratios calculated as described above). Some identical twins are monochorionic twins, which are at risk of twin-to-twin transfusion syndrome. Therefore, twins determined to be identical using a method of the invention can be tested (such as by ultrasound) to determine whether they are monochorionic. If they are, they can be monitored (such as by biweekly ultrasound starting at 16 weeks) for signs of twin-to-twin transfusion syndrome.

In some embodiments, any method of the invention is used to determine whether any of the fetuses in a multiple pregnancy (such as a twin pregnancy) is aneuploid. Detection of aneuploidy in twins begins with estimating the fetal fractions. In some embodiments, the fetal fractions (f1, f2) that best match the data are selected, as described above. In some embodiments, a maximum likelihood estimate of the parameter pair (f1, f2) is performed over the range of possible fetal fractions. In some embodiments, f2 ranges from 0 to f1, because f2 is defined as the smaller fetal fraction. Given a pair (f1, f2), the data likelihood is calculated from the allele ratios observed at a set of loci (such as SNP loci). In some embodiments, the data likelihood reflects the mother's genotype, the father's genotype (if available), population frequencies, and the resulting probabilities of the fetal genotypes. In some embodiments, SNPs are assumed to be independent. The estimated fetal fraction pair is the one that produces the maximum data likelihood. If f2 is 0, the data are best explained by a single set of fetal genotypes, indicating identical twins, and f1 is the combined fetal fraction. Otherwise, f1 and f2 are estimates of the fetal fractions of the individual twins. Once the best estimate of (f1, f2) has been established, the total fraction of B alleles in plasma can be predicted for any combination of maternal and fetal genotypes, if desired. Individual sequence reads need not be assigned to individual fetuses. Ploidy detection is performed using another maximum likelihood estimate that compares the data likelihoods of two hypotheses. In some embodiments for identical twins, the hypotheses considered are (i) both twins are euploid and (ii) both twins are trisomic. In some embodiments for fraternal twins, the hypotheses considered are (i) both twins are euploid and (ii) at least one twin is trisomic. The trisomy hypothesis for fraternal twins is based on the lower fetal fraction, because a trisomy in the twin with the higher fetal fraction would also be detected. Ploidy likelihoods are calculated using a method that predicts the expected reads at each target locus, conditioned on the disomy or trisomy hypothesis. A diploid reference chromosome is not required. The mean-variance model for the expected reads accounts for the performance of individual target loci and for correlations between loci (see, e.g., U.S. Serial No. 62/008,235, filed June 5, 2014, and U.S. Serial No. 62/032,785, filed August 4, 2014, each of which is hereby incorporated by reference in its entirety). If the smaller twin has fetal fraction f2, our ability to detect a trisomy in that twin equals our ability to detect a trisomy in a singleton pregnancy at the same fetal fraction. This is because, in some embodiments, the part of the method that detects trisomy does not depend on genotypes and does not distinguish multiple from singleton pregnancies. It simply looks for an increase in reads according to the determined fetal fraction.

In some embodiments, the method includes detecting the presence of twins based on SNP loci (as described above). If twins are detected, SNPs are used to determine the fetal fraction of each fetus (f1, f2), as described above. In some embodiments, samples with high-confidence disomy calls are used to determine amplification bias on a per-SNP basis. In some embodiments, these samples with high-confidence disomy calls are analyzed in the same run as one or more samples of interest. In some embodiments, the per-SNP amplification bias is used to model the read distribution for one or more chromosomes or chromosome segments of interest, such as chromosome 21, under the disomy hypothesis and under the trisomy hypothesis, given the lower of the two twin fetal fractions. The likelihood or probability of disomy or trisomy is calculated from the two models and the measured amount of the chromosome or chromosome segment of interest.

In some embodiments, the threshold value of a positive aneuploidy response (such as three-body response) is set, and is based on Twins with lower fetus score.In this way, if another twins is positive, or if be both positive , total chromosome indicates to be higher than threshold value certainly.

Exemplary counting/quantitative methods

In some embodiments, one or more counting methods (also referred to as quantitative methods) are used to detect one or more CNVs, such as deletions or duplications of chromosome segments or whole chromosomes. In some embodiments, one or more counting methods are used to determine whether an overrepresentation of the copy number of the first homologous chromosomal segment is due to a duplication of the first homologous chromosomal segment or to a deletion of the second homologous chromosomal segment. In some embodiments, one or more counting methods are used to determine the number of extra copies of a duplicated chromosome segment or chromosome (such as whether there are 1, 2, 3, 4, or more extra copies). In some embodiments, one or more counting methods are used to distinguish a sample with many duplications and a smaller tumor fraction from a sample with fewer duplications and a larger tumor fraction. For example, one or more counting methods can be used to distinguish a sample with four extra chromosome copies and a tumor fraction of 10% from a sample with two extra chromosome copies and a tumor fraction of 20%. Exemplary methods are disclosed, for example, in U.S. Publication Nos. 2007/0184467, 2013/0172211, and 2012/0003637; U.S. Patent Nos. 8,467,976, 7,888,017, 8,008,018, 8,296,076, and 8,915,415; U.S. Patent Application Serial No. 62/008,235, filed June 5, 2014; and U.S. Application Serial No. 62/032,785, filed August 4, 2014, each of which is hereby incorporated by reference in its entirety.

In some embodiments, counting methods include counting the number of DNA sequences based on reads that map to one or more given chromosomes or chromosome segments. Some such methods involve generating a reference value (cutoff value) for the number of DNA sequence reads mapping to a particular chromosome or chromosome segment, where a number of reads exceeding that value indicates a particular genetic abnormality.

In some embodiments, the total measured amount of all alleles at one or more loci (such as the total amount of a polymorphic or non-polymorphic locus) is compared with a reference amount. In some embodiments, the reference amount is (i) a threshold value or (ii) the expected amount for a particular copy-number hypothesis. In some embodiments, the reference amount for the absence of a CNV is the total measured amount of all alleles at one or more loci of one or more chromosomes or chromosome segments that are known or expected not to have a deletion or duplication. In some embodiments, the reference amount for the presence of a CNV is the total measured amount of all alleles at one or more loci of one or more chromosomes or chromosome segments that are known or expected to have a deletion or duplication. In some embodiments, the reference amount is the total measured amount of all alleles at one or more loci of one or more reference chromosomes or chromosome segments. In some embodiments, the reference amount is the mean or median of the values determined for two or more different chromosomes, chromosome segments, or samples. In some embodiments, random sequencing (such as massively parallel shotgun sequencing) or targeted sequencing is used to determine the amount of one or more polymorphic or non-polymorphic loci.

In some embodiments that use a reference amount, the method includes (a) measuring the amount of genetic material on a chromosome or chromosome segment of interest; (b) comparing the amount from step (a) with the reference amount; and (c) identifying the presence or absence of a deletion or duplication based on the comparison.
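Steps (a) to (c) can be sketched as follows. This is a minimal illustration, assuming the amounts are normalized read counts, that the reference amount is the median of several reference measurements (one of the options listed above), and that a fixed relative tolerance stands in for a proper statistical threshold. The function name and the 5% tolerance are hypothetical.

```python
from statistics import median


def call_cnv(amount_of_interest, reference_amounts, tolerance=0.05):
    """Compare a measured amount with a reference amount and call a CNV.

    (a) amount_of_interest: measured amount for the region of interest.
    (b) reference: median of the reference measurements.
    (c) call a deletion or duplication if the ratio falls outside 1 +/- tolerance.
    """
    reference = median(reference_amounts)
    ratio = amount_of_interest / reference
    if ratio < 1 - tolerance:
        return "deletion", ratio
    if ratio > 1 + tolerance:
        return "duplication", ratio
    return "no CNV", ratio


reference_counts = [1000, 1010, 990, 1005]
print(call_cnv(900, reference_counts))
```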

In some embodiments that use a reference chromosome or chromosome segment, the method includes sequencing DNA or RNA from a sample to obtain a plurality of sequence tags aligned to target loci. In some embodiments, the sequence tags are long enough to be assigned to a specific target locus (for example, 15 to 100 nucleotides in length); the target loci come from a plurality of different chromosomes or chromosome segments, including at least one first chromosome or chromosome segment suspected of having an abnormal distribution in the sample and at least one second chromosome or chromosome segment presumed to be normally distributed in the sample. In some embodiments, the plurality of sequence tags is assigned to their corresponding target loci. In some embodiments, the number of sequence tags aligned to target loci on the first chromosome or chromosome segment and the number of sequence tags aligned to target loci on the second chromosome or chromosome segment are determined. In some embodiments, these numbers are compared to determine the presence or absence of an abnormal distribution (such as a deletion or duplication) of the first chromosome or chromosome segment.
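The tag-counting comparison can be sketched as below, assuming each sequence tag has already been assigned to one target locus and that tag counts are normalized by the number of targeted loci per chromosome. The locus names and the normalization choice are hypothetical; a ratio near 1.0 would be expected when both chromosomes are normally distributed.

```python
from collections import Counter


def normalized_tag_ratio(assigned_loci, locus_to_chrom, test_chrom, ref_chrom):
    """Tags per target locus on the test chromosome divided by tags per locus on the reference.

    assigned_loci: the target locus each sequence tag was assigned to.
    locus_to_chrom: maps each target locus to its chromosome or segment.
    """
    tag_counts = Counter(locus_to_chrom[locus] for locus in assigned_loci)
    loci_per_chrom = Counter(locus_to_chrom.values())
    test_density = tag_counts[test_chrom] / loci_per_chrom[test_chrom]
    ref_density = tag_counts[ref_chrom] / loci_per_chrom[ref_chrom]
    return test_density / ref_density


# Hypothetical targets: t1, t2 on the suspect chromosome; r1, r2 on the reference.
locus_to_chrom = {"t1": "chr21", "t2": "chr21", "r1": "chr1", "r2": "chr1"}
tags = ["t1"] * 60 + ["t2"] * 60 + ["r1"] * 50 + ["r2"] * 50
print(normalized_tag_ratio(tags, locus_to_chrom, "chr21", "chr1"))  # 1.2
```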

In some embodiments, the value of f (for example, the fetal fraction or tumor fraction) is used in CNV determination, for example to compare the observed difference between the amounts of two chromosomes or chromosome segments with the difference expected for a particular type of CNV at a given value of f (see, for example, U.S. Publication Nos. 2012/0190020, 2012/0190021, 2012/0190557, and 2012/0191358, each of which is hereby incorporated by reference in its entirety). For example, in a blood sample from a mother carrying a fetus, the difference in amount between a chromosome segment that is duplicated in the fetus and a disomic reference chromosome segment increases as the fetal fraction increases. Similarly, the difference in amount between a chromosome segment that is duplicated in a tumor and a diploid reference chromosome segment increases as the tumor fraction increases. In some embodiments, the method includes comparing the relative frequency of a chromosome or chromosome segment of interest to a reference chromosome or chromosome segment (for example, one expected or known to be disomic) with the value of f, to determine the likelihood of the CNV. For example, the difference in amount between the first chromosome or chromosome segment and the reference chromosome or chromosome segment can be compared with the difference expected, at the given value of f, for various possible CNVs (such as one or two extra copies of the chromosome segment of interest).
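A minimal sketch of comparing an observed ratio with the ratio expected at a given f for several CNV hypotheses. It assumes Gaussian measurement noise with a known standard deviation, equal prior weight on each hypothesis, and that the extra or missing copy is present only in the fetal or tumor fraction; these choices and the function names are illustrative, not taken from the cited publications.

```python
import math


def expected_ratio(f, extra_copies):
    """Expected amount of the test segment relative to a disomic reference.

    extra_copies = -1 models loss of one homolog, 1 a single extra copy, etc.,
    present only in the fetal or tumor fraction f.
    """
    return 1 + f * extra_copies / 2


def hypothesis_likelihoods(observed_ratio, f, sd, hypotheses=(-1, 0, 1, 2)):
    """Gaussian likelihood of the observed ratio under each copy-number hypothesis."""
    likelihoods = {}
    for k in hypotheses:
        z = (observed_ratio - expected_ratio(f, k)) / sd
        likelihoods[k] = math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))
    return likelihoods


lik = hypothesis_likelihoods(observed_ratio=1.052, f=0.10, sd=0.01)
total = sum(lik.values())
posteriors = {k: v / total for k, v in lik.items()}  # equal priors assumed
print(max(posteriors, key=posteriors.get))  # 1 (one extra copy)
```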

The following prophetic example illustrates the use of a counting (quantitative) method to distinguish a duplication of a first homologous chromosome segment from a deletion of a second homologous chromosome segment. If the normal diploid genome of the host is taken as the baseline, analysis of a mixture of normal and cancer cells yields the average difference between the baseline and the cancer DNA in the mixture. For example, assume a situation in which 10% of the DNA in the sample originates from cells with a deletion in a region of a chromosome targeted by the assay. In some embodiments, a quantitative method would show that the number of reads corresponding to that region is expected to be 95% of that expected for a normal sample. This is because one of the two target chromosome regions is missing in each tumor cell carrying the deletion, so the total amount of DNA mapping to that region is 90% (from normal cells) + 1/2 × 10% (from tumor cells) = 95%. Alternatively, in some embodiments, an allelic method would show an average allele ratio at heterozygous loci of 18:20 (9:10), because the deleted homolog is contributed only by normal cells (45% of the baseline amount), whereas the intact homolog is contributed by both normal and tumor cells (50%).

Now assume a situation in which 10% of the DNA in the sample originates from cells with a 5-fold focal amplification in a region of a chromosome targeted by the assay. In some embodiments, a quantitative method would show that the number of reads corresponding to that region is expected to be 125% of that expected for a normal sample. This is because, in each tumor cell carrying the 5-fold focal amplification, one of the two target chromosome regions has been copied five additional times, so the total amount of DNA mapping to that region is 90% (from normal cells) + (2 + 5) × 10%/2 (from tumor cells) = 125%. Alternatively, in some embodiments, an allelic method would show an average allele ratio at heterozygous loci of 30:20 (3:2). Note that when an allelic method is used alone and the tumor fraction is not known, a 5-fold focal amplification of a chromosome region (in a sample in which 10% of the cfDNA is tumor-derived) may be confused with a deletion of the same region (in a sample in which 10% of the cfDNA is tumor-derived). An allelic method measures only the relative amounts of the two haplotypes, so it cannot by itself indicate which haplotype carries the CNV: the under-represented haplotype in the deletion case looks like the haplotype without a CNV in the focal amplification case, and the haplotype without a CNV in the deletion case looks like the over-represented haplotype in the focal amplification case. Combining the likelihoods generated by the allelic method with those generated by a quantitative method can distinguish between the two possibilities.
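The arithmetic of this prophetic example can be reproduced with a small mixing model. The sketch assumes that normal cells carry one copy of each homolog, that tumor cells carry the stated number of copies of each homolog, and that DNA contributions scale with copy number; the function name is hypothetical. It prints the total depth (the quantitative readout) and the ratio of the two haplotypes (the allelic readout) for both scenarios.

```python
def mixture_haplotype_amounts(tumor_fraction, tumor_copies_h1, tumor_copies_h2):
    """Amount of each haplotype relative to a normal diploid baseline (total 1.0).

    Normal cells carry one copy of each homolog; tumor cells carry the given counts.
    """
    normal = 1 - tumor_fraction
    h1 = normal / 2 + tumor_fraction * tumor_copies_h1 / 2
    h2 = normal / 2 + tumor_fraction * tumor_copies_h2 / 2
    return h1, h2


scenarios = {
    "deletion": (0, 1),                    # homolog 1 lost in tumor cells
    "5-fold focal amplification": (6, 1),  # homolog 1 copied five extra times
}
for label, copies in scenarios.items():
    h1, h2 = mixture_haplotype_amounts(0.10, *copies)
    print(f"{label}: total depth {h1 + h2:.0%}, haplotype ratio {h1 / h2:.2f}")
```

Both scenarios produce an allelic imbalance; only the total depth (below or above 100%) shows directly which homolog changed.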

Illustrative counting/calculation methods using reference samples

An illustrative calculation method using one or more reference samples is described in U.S. Serial No. 62/008,235 (filed June 5, 2014) and U.S. Serial No. 62/032,785 (filed August 4, 2014), which are hereby incorporated by reference in their entirety. In some embodiments, one or more reference samples most likely not to have any CNV on one or more chromosomes or chromosomes of interest (for example, normal samples) are identified by selecting the samples with the highest tumor DNA fraction, selecting the samples whose z-scores are closest to zero, selecting the samples whose data fit the hypothesis corresponding to no CNV with the highest confidence or likelihood, selecting samples known to be normal, selecting samples from individuals with the lowest likelihood of having cancer (for example, of young age, male when screening for breast cancer, with no family history, etc.), selecting the samples with the highest DNA input amount, selecting the samples with the highest signal-to-noise ratio, selecting samples based on other criteria considered relevant to the likelihood of having cancer, or selecting samples by a combination of such criteria. Once the reference set is selected, these samples can be assumed to be disomic, and the per-SNP bias, that is, the experiment-specific amplification and other processing bias for each locus, can then be estimated. This experiment-specific bias estimate can then be used to correct the bias in the measurements of the chromosome of interest (for example, chromosome 21 loci) and, as appropriate, of loci on other chromosomes, for samples that are not part of the subset assumed to be disomic for chromosome 21. Once the bias is corrected, the data for these samples of unknown ploidy can be analyzed a second time, using the same or a different method, to determine whether the individual (for example, a fetus) has trisomy 21.

For example, a quantitative method can be used on the remaining samples of unknown ploidy, and a z-score can be calculated using the corrected measured genetic data for chromosome 21. Alternatively, as part of a preliminary estimate of the chromosome 21 ploidy state, the fetal fraction (or the tumor fraction, for a sample from an individual suspected of having cancer) can be calculated. The ratio of corrected reads expected in the disomic case (disomy hypothesis) and the ratio of corrected reads expected in the trisomic case (trisomy hypothesis) can be calculated for a case with that fetal fraction. Alternatively, if the fetal fraction is not measured in advance, a set of disomy and trisomy hypotheses can be generated for different fetal fractions. For each case, an expected distribution of the ratio of corrected reads can be calculated, taking into account the expected statistical variation in the selection and measurement of the various DNA loci. The observed corrected read ratio can be compared with the distribution of expected corrected read ratios, and a likelihood ratio for the disomy and trisomy hypotheses can be calculated for each sample of unknown ploidy. The ploidy state associated with the hypothesis having the highest calculated probability can be selected as the correct ploidy state.
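The disomy-versus-trisomy comparison can be sketched as a likelihood ratio on the bias-corrected read ratio. This assumes Gaussian variation with a known standard deviation (for example, estimated from the reference set), an expected ratio of 1.0 under disomy and 1 + f/2 under trisomy, and, when f is unknown, a uniform average over a hypothetical grid of fetal fractions. All names and the grid are illustrative.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def trisomy_vs_disomy_lr(corrected_ratio, fetal_fraction, sd):
    """Likelihood ratio (trisomy : disomy) for a bias-corrected chr21 read ratio.

    Expected ratio is 1.0 under disomy and 1 + f/2 under trisomy, where the
    extra copy is present only in the fetal fraction f.
    """
    return (normal_pdf(corrected_ratio, 1.0 + fetal_fraction / 2, sd)
            / normal_pdf(corrected_ratio, 1.0, sd))


def lr_unknown_fetal_fraction(corrected_ratio, sd, ff_grid=(0.04, 0.08, 0.12, 0.16, 0.20)):
    """If f is not measured in advance, average the trisomy likelihood over a grid of f values."""
    trisomy = sum(normal_pdf(corrected_ratio, 1.0 + f / 2, sd) for f in ff_grid) / len(ff_grid)
    return trisomy / normal_pdf(corrected_ratio, 1.0, sd)


def call_ploidy(lr):
    return "trisomy" if lr > 1 else "disomy"


print(call_ploidy(trisomy_vs_disomy_lr(1.05, fetal_fraction=0.10, sd=0.01)))
print(call_ploidy(trisomy_vs_disomy_lr(1.002, fetal_fraction=0.10, sd=0.01)))
```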

In some embodiments, a subset of samples with a sufficiently low likelihood of cancer can be selected to serve as a control set of samples. The subset can be a fixed number of samples, or it can be a variable number based on selecting only those samples below a threshold. The quantitative data from the sample subset can be combined, averaged, or combined using a weighted average, wherein the weighting is based on the likelihood that each sample is normal. The quantitative data can be used to determine the per-locus bias of the amplification and sequencing of the control samples in the current batch. The per-locus bias can also include data from samples in other batches. The per-locus bias can indicate the relative over- or under-amplification observed for that locus compared with other loci, assuming that the sample subset does not contain any CNVs and that any observed over- or under-amplification is due to amplification and/or sequencing or other biases. The per-locus bias can take into account the GC content of the amplicon. Loci can be grouped into sets of loci for the purpose of calculating the per-locus bias.

Once the per-locus bias has been calculated for each of a plurality of loci, the sequencing data for one or more samples not in the sample subset (and, optionally, for one or more samples in the subset) can be corrected by adjusting the quantitative measurement of each locus to eliminate the effect of the bias at that locus. For example, if SNP 1 is observed, in the patient subset, to have a read depth twice the average, the adjustment can include replacing the read count corresponding to SNP 1 with a number half as large. Once the sequencing data (for each locus in one or more samples) have been adjusted, they can be analyzed using a method for detecting the presence of a CNV on one or more chromosome regions.
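A minimal sketch of per-locus bias estimation and correction, assuming each reference sample is first scaled by its own mean depth, that the bias is a (optionally weighted) average of those scaled values across the control set, and that correction divides each locus count by its bias. GC-content modeling and grouping of loci are not shown. With this scaling, a locus read at twice the average depth, like SNP 1 above, has its count halved.

```python
from statistics import mean


def per_locus_bias(reference_counts, weights=None):
    """Estimate per-locus amplification/sequencing bias from presumed-normal samples.

    reference_counts: one list of read counts per reference sample, same locus order.
    weights: optional per-sample weights (for example, the likelihood that each
    sample is normal); equal weights are used if omitted.
    Each sample is first scaled by its own mean depth, so a bias of 2.0 means
    the locus is read twice as often as the average locus.
    """
    if weights is None:
        weights = [1.0] * len(reference_counts)
    scaled = [[count / mean(sample) for count in sample] for sample in reference_counts]
    total_weight = sum(weights)
    n_loci = len(reference_counts[0])
    return [sum(w * s[i] for w, s in zip(weights, scaled)) / total_weight for i in range(n_loci)]


def correct_counts(sample_counts, bias):
    """Remove the per-locus bias from a sample's read counts."""
    return [count / b for count, b in zip(sample_counts, bias)]


# Locus 0 plays the role of "SNP 1": it is read at twice the average depth.
reference = [[200, 60, 80, 60], [300, 90, 120, 90]]
bias = per_locus_bias(reference)  # about [2.0, 0.6, 0.8, 0.6]
print(correct_counts([400, 120, 160, 120], bias))
```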

In one example, sample A is a mixture of amplified DNA derived from a mixture of normal and cancer cells, and it is analyzed using a quantitative method. The following describes illustrative possible data. A region on the q arm of chromosome 22 is found to have only 90% of the DNA mapping expected for that region; a focal region corresponding to the HER2 gene is found to have 150% of the DNA mapping expected for that region; and the p arm of chromosome 5 is found to have 105% of its expected DNA mapping. A clinician could infer that the sample has a deletion of a region on the q arm of chromosome 22 and a duplication of the HER2 gene. Because 22q deletions are common in breast cancer, and because cells with a deletion of the 22q region on both chromosomes usually do not survive, the clinician could infer that approximately 20% of the DNA in the sample comes from cells with a 22q deletion on one of their two chromosomes. The clinician could also infer that, if the tumor-derived DNA in the mixed sample comes from a genetically homogeneous population of tumor cells carrying both the HER2 and the 22q changes, those cells contain a five-fold duplication (five extra copies) of the HER2 region.
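The clinician's two inferences follow from the same mixing model used above. The sketch assumes the 22q deletion affects one homolog, that the HER2 gain is on one homolog, and that both changes are carried by the same tumor cell population; the function names are hypothetical.

```python
def tumor_fraction_from_single_copy_loss(relative_depth):
    """A one-homolog deletion in a fraction f of the DNA gives depth 1 - f/2."""
    return 2 * (1 - relative_depth)


def extra_copies_from_gain(relative_depth, tumor_fraction):
    """k extra copies of one homolog in a fraction f of the DNA give depth 1 + f*k/2."""
    return 2 * (relative_depth - 1) / tumor_fraction


f = tumor_fraction_from_single_copy_loss(0.90)  # 22q region at 90% of expected
k = extra_copies_from_gain(1.50, f)             # HER2 region at 150% of expected
print(f"tumor fraction ~{f:.2f}, HER2 extra copies ~{k:.1f}")
```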

In one example, sample A is also analyzed using an allelic method. The following describes illustrative possible data. The two haplotypes of the same region on the q arm of chromosome 22 are present in a ratio of 4:5; the two haplotypes in a focal region corresponding to the HER2 gene are present in a ratio of 1:2; and the two haplotypes on the p arm of chromosome 5 are present in a ratio of 20:21. No other measured region of the genome shows a statistically significant excess of either haplotype. A clinician could infer that the sample contains DNA from a tumor with CNVs in the 22q region, the HER2 region, and the 5p arm. Based on the knowledge that 22q deletions are very common in breast cancer and/or on the quantitative analysis (which shows under-representation of the amount of DNA mapping to the 22q region of the genome), the clinician could infer the presence of a tumor with a 22q deletion. Based on the knowledge that HER2 amplification is very common in breast cancer and/or on the quantitative analysis (which shows over-representation of the amount of DNA mapping to the HER2 region of the genome), the clinician could infer the presence of a tumor with a HER2 amplification.
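The allelic data are consistent with the quantitative interpretation: a 20% tumor fraction with a one-homolog 22q deletion and five extra copies of one HER2 homolog gives exactly the 4:5 and 1:2 haplotype ratios. The check below uses exact fractions and the same assumed mixing model (normal cells carry one copy of each homolog); the function name is hypothetical.

```python
from fractions import Fraction


def haplotype_ratio(tumor_fraction, tumor_copies_h1, tumor_copies_h2):
    """Ratio of haplotype 1 to haplotype 2 in a normal/tumor DNA mixture."""
    normal = 1 - tumor_fraction
    h1 = normal / 2 + tumor_fraction * tumor_copies_h1 / 2
    h2 = normal / 2 + tumor_fraction * tumor_copies_h2 / 2
    return h1 / h2


f = Fraction(1, 5)  # 20% tumor fraction
print(haplotype_ratio(f, 0, 1))  # 22q: deleted vs intact homolog -> 4/5
print(haplotype_ratio(f, 1, 6))  # HER2: intact vs amplified homolog -> 1/2
```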

Illustrative reference chromosomes or chromosome segments

In some embodiments, any method described herein is also performed on one or more reference chromosomes or chromosome segments, and the results are compared with the results for one or more chromosomes or chromosome segments of interest.

In some embodiments, a reference chromosome or chromosome segment is used as a control for a chromosome or chromosome segment that is expected not to have a CNV. In some embodiments, the reference is the same chromosome or chromosome segment from one or more different samples that are known or expected not to have a deletion or duplication of that chromosome or chromosome segment. In some embodiments, the reference is a different chromosome or chromosome segment, from the sample being tested, that is expected to be disomic. In some embodiments, the reference is a different segment of the same chromosome of interest, in the same sample being tested. For example, the reference can be one or more segments outside the region of the potential deletion or duplication. Having a reference on the same chromosome being tested avoids variability between different chromosomes, such as differences in metabolism, apoptosis, histones, inactivation, and/or amplification between chromosomes. Analyzing segments without a CNV on the same chromosome being tested can also be used to determine differences in metabolism, apoptosis, histones, inactivation, and/or amplification between homologous chromosomes, so that the level of variability between homologs in the absence of a CNV can be determined and compared with the results from a potential CNV. In some embodiments, a magnitude of the difference between the calculated and expected allele ratios for a potential CNV that is greater than the corresponding magnitude for the reference confirms the presence of a CNV.
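One way to compare the magnitude for a potential CNV with that of CNV-free reference segments is sketched below. It assumes the expected haplotype ratio is 1.0 at heterozygous loci, measures magnitude as the mean absolute difference across loci, and, as a hypothetical cut-off, requires the test magnitude to exceed the reference mean by three standard deviations. At least two reference segments are needed for the standard deviation.

```python
from statistics import mean, stdev


def allele_ratio_deviation(observed_ratios, expected_ratio=1.0):
    """Mean absolute difference between observed and expected allele ratios across loci."""
    return mean(abs(r - expected_ratio) for r in observed_ratios)


def confirms_cnv(test_ratios, reference_segments, n_sd=3.0):
    """True if the test segment deviates more than CNV-free reference segments do."""
    test_dev = allele_ratio_deviation(test_ratios)
    ref_devs = [allele_ratio_deviation(segment) for segment in reference_segments]
    cutoff = mean(ref_devs) + n_sd * stdev(ref_devs)
    return test_dev > cutoff, test_dev, cutoff


# Hypothetical CNV-free segments elsewhere on the same chromosome.
references = [[1.01, 0.99, 1.0], [0.98, 1.02, 1.0], [1.0, 1.01, 0.99]]
print(confirms_cnv([0.9, 0.88, 0.91], references))
```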

In some embodiments, a reference chromosome or chromosome segment is used as a control for a chromosome or chromosome segment that is expected to have a CNV, such as a specific deletion or duplication of interest. In some embodiments, the reference is the same chromosome or chromosome segment from one or more different samples that are known or expected to have a deletion or duplication of that chromosome or chromosome segment. In some embodiments, the reference is a different chromosome or chromosome segment, from the sample being tested, that is known or expected to have a CNV. In some embodiments, a magnitude of the difference between the calculated and expected allele ratios for a potential CNV that is similar to (for example, not significantly different from) the corresponding magnitude for the reference for that CNV confirms the presence of a CNV. In some embodiments, a magnitude of the difference between the calculated and expected allele ratios for a potential CNV that is less than (for example, significantly less than) the corresponding magnitude for the reference for that CNV confirms the absence of a CNV. In some embodiments, one or more loci at which the genotype of a cancer cell (or of DNA or RNA from a cancer cell, such as cfDNA or cfRNA) differs from the genotype of a noncancerous cell (or of DNA or RNA from a noncancerous cell, such as cfDNA or cfRNA) are used to determine the tumor fraction. The tumor fraction can be used to determine whether an over-representation of the copy number of a first homologous chromosome segment is due to a duplication of the first homologous chromosome segment or to a deletion of a second homologous chromosome segment. The tumor fraction can also be used to determine the number of extra copies of a duplicated chromosome segment or chromosome (for example, whether there are 1, 2, 3, 4, or more extra copies), such as to distinguish a sample with four extra chromosome copies and a tumor fraction of 10% from a sample with two extra chromosome copies and a tumor fraction of 20%. The tumor fraction can also be used to determine how well the observed data match the data expected for a possible CNV. In some embodiments, the degree of over-representation of a CNV is used to select a particular therapy or treatment regimen for the individual. For example, some therapeutic agents are effective only for a chromosome segment present in at least four, six, or more copies.

In some embodiments, the one or more loci used to determine the tumor fraction are on a reference chromosome or chromosome segment, such as a chromosome or chromosome segment that is known or expected to be diploid, a chromosome or chromosome segment that is rarely duplicated or deleted in cancer cells (in cancers in general, or in a specific type of cancer that the individual is known to have or to be at increased risk of having), or a chromosome or chromosome segment that is unlikely to be aneuploid (such as a segment whose deletion or duplication is expected to cause cell death). In some embodiments, any of the methods of the invention are used to confirm that the reference chromosome or chromosome segment is diploid in both cancer cells and noncancerous cells. In some embodiments, one or more chromosomes or chromosome segments for which the confidence of a disomy call is high are used.

Illustrative loci that can be used to determine the tumor fraction include polymorphisms or mutations (such as SNPs) that are present in the individual's cancer cells (or in DNA or RNA from cancer cells, such as cfDNA or cfRNA) and absent from the individual's noncancerous cells (or from DNA or RNA from noncancerous cells). In some embodiments, the tumor fraction is determined by identifying those polymorphic loci at which a cancer cell (or DNA or RNA from a cancer cell) has an allele that is absent from noncancerous cells (or from DNA or RNA from noncancerous cells) in a sample from the individual (such as a plasma sample or a tumor biopsy sample), and by using the amount of the cancer-cell-specific allele at one or more of the identified polymorphic loci to determine the tumor fraction in the sample. In some embodiments, a noncancerous cell is homozygous for a first allele at the polymorphic locus, and a cancer cell is (i) heterozygous for the first allele and a second allele, or (ii) homozygous for the second allele at the polymorphic locus. In some embodiments, a noncancerous cell is heterozygous for a first allele and a second allele at the polymorphic locus, and a cancer cell has one or two copies of a third allele at the polymorphic locus. In some embodiments, the cancer cells are assumed or known to have only one copy of the allele that is absent from the noncancerous cells. For example, if the genotype of the noncancerous cells is AA and that of the cancer cells is AB, and 5% of the signal at that locus in the sample comes from the B allele and 95% from the A allele, then the tumor fraction of the sample is 10%. In some embodiments, the cancer cells are assumed or known to have two copies of the allele that is absent from the noncancerous cells. For example, if the genotype of the noncancerous cells is AA and that of the cancer cells is BB, and 5% of the signal at that locus in the sample comes from the B allele and 95% from the A allele, then the tumor fraction of the sample is 5%. In some embodiments, multiple loci at which the cancer cells have an allele that the noncancerous cells lack are analyzed to determine at which loci the cancer cells are heterozygous and at which they are homozygous. For example, at loci where the noncancerous cells are AA, if the signal from the B allele is about 5% at some loci and about 10% at others, then the cancer cells are considered heterozygous at the loci with about 5% B allele and homozygous at the loci with about 10% B allele (indicating a tumor fraction of about 10%).
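The multi-locus reasoning in the last example can be sketched as a grid search: at loci where the noncancerous cells are AA, each locus is explained either as tumor AB (B fraction about f/2) or tumor BB (B fraction about f), and the f with the smallest squared error is kept. This assumes copy-neutral loci and a hypothetical 0.001 grid step. If all loci fall in a single cluster, AB at f and BB at f/2 fit equally well and this sketch returns the smaller f, so both clusters are needed to resolve the ambiguity.

```python
def estimate_tumor_fraction(b_fractions, grid_step=0.001, max_f=0.5):
    """Grid-search tumor fraction f from B-allele fractions at loci where normal cells are AA.

    Each locus is explained as tumor AB (expected B = f/2) or tumor BB (expected B = f).
    Returns the best f and the inferred tumor genotype at each locus.
    """
    best_f, best_err = None, float("inf")
    n_steps = int(round(max_f / grid_step))
    for i in range(1, n_steps + 1):
        f = i * grid_step
        err = sum(min((b - f / 2) ** 2, (b - f) ** 2) for b in b_fractions)
        if err < best_err:
            best_f, best_err = f, err
    genotypes = ["AB" if abs(b - best_f / 2) <= abs(b - best_f) else "BB" for b in b_fractions]
    return best_f, genotypes


f, genotypes = estimate_tumor_fraction([0.05, 0.051, 0.049, 0.1, 0.098, 0.102])
print(round(f, 3), genotypes)
```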

Illustrative loci that can be used to determine the tumor fraction also include loci at which a cancer cell and a noncancerous cell share a common allele (for example, loci at which the cancer cell is AB and the noncancerous cell is BB, or the cancer cell is BB and the noncancerous cell is AB). The amount of A signal, the amount of B signal, or the ratio of A to B signal in a mixed sample (containing DNA or RNA from cancer cells and from noncancerous cells) is compared with the corresponding value for (i) a sample containing DNA or RNA only from cancer cells, or (ii) a sample containing DNA or RNA only from noncancerous cells. The difference in values is used to determine the tumor fraction of the mixed sample.
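Under a linear mixing assumption, the B-allele fraction of the mixed sample lies between the values for pure noncancerous and pure cancer DNA, so the tumor fraction can be solved for directly. The sketch assumes the locus is copy-neutral in both cell types; the function name is hypothetical. The same formula reproduces the AA/AB (10%) and AA/BB (5%) examples given earlier.

```python
def tumor_fraction_from_mixture(b_mixed, b_normal, b_tumor):
    """Solve b_mixed = (1 - f) * b_normal + f * b_tumor for f.

    b_normal and b_tumor are the B-allele fractions of pure noncancerous and
    pure cancer DNA at the locus (0.0 for AA, 0.5 for AB, 1.0 for BB).
    """
    if b_tumor == b_normal:
        raise ValueError("locus is uninformative: tumor and normal B fractions are equal")
    return (b_mixed - b_normal) / (b_tumor - b_normal)


# Tumor AB (B = 0.5), noncancerous BB (B = 1.0), mixed sample shows 95% B signal.
print(tumor_fraction_from_mixture(0.95, b_normal=1.0, b_tumor=0.5))
```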

In some embodiments, the loci used to determine the tumor fraction are selected based on the genotype of (i) a sample containing DNA or RNA only from cancer cells, and/or (ii) a sample containing DNA or RNA only from noncancerous cells. In some embodiments, loci are selected based on analysis of the mixed sample, such as loci at which the absolute or relative amount of each allele differs from what would be expected if the cancer cells and noncancerous cells had the same genotype at that locus. For example, if the cancer cells and noncancerous cells have the same genotype, a locus is expected to produce 0% B signal if all cells are AA, 50% B signal if all cells are AB, or 100% B signal if all cells are BB. Other values of the B signal indicate that the genotypes of the cancer cells and noncancerous cells differ at that locus, so that locus can be used to determine the tumor fraction.
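The selection rule above can be sketched as a filter that keeps loci whose B-allele fraction is not close to 0%, 50%, or 100%. The tolerance is a hypothetical noise allowance; in practice it would depend on read depth and assay error.

```python
SHARED_GENOTYPE_B_FRACTIONS = (0.0, 0.5, 1.0)  # all cells AA, AB, or BB


def informative_loci(b_fractions, tolerance=0.02):
    """Return indices of loci whose B fraction is not near 0, 0.5, or 1."""
    return [
        i
        for i, b in enumerate(b_fractions)
        if min(abs(b - e) for e in SHARED_GENOTYPE_B_FRACTIONS) > tolerance
    ]


print(informative_loci([0.0, 0.05, 0.5, 0.45, 0.99, 0.9]))  # [1, 3, 5]
```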

In some embodiments, the tumor fraction calculated from alleles at one or more loci is compared with the tumor fraction calculated using one or more of the counting methods disclosed herein.

Illustrative methods for detecting a phenotype or analyzing multiple mutations

In some embodiments, the method includes analyzing a sample for a set of mutations associated with a disease or disorder (such as cancer) or with an increased risk of a disease or disorder. Strong correlations exist between events within a class (such as the M or C class of cancers); these correlations can be used to improve the signal-to-noise ratio of a method and to classify tumors into different clinical subsets. For example, the marginal results for several mutations (such as several CNVs) on one or more chromosomes or chromosome segments, considered jointly, may add up to a strong signal. In some embodiments, determining the presence or absence of multiple polymorphisms or mutations of interest (such as 2, 3, 4, 5, 8, 10, 12, 15, or more) increases the sensitivity and/or specificity of determining the presence or absence of a disease or disorder (such as cancer) or of an increased risk of a disease or disorder (such as cancer). In some embodiments, correlations between events across multiple chromosomes are used to observe a signal more effectively than by observing each event individually. The design of the method itself can be optimized to classify tumors as well as possible. This may be very useful for early detection and for screening for recurrence, where sensitivity to specific mutations or CNVs may be most important. In some embodiments, the events are not always correlated but have a probability of being correlated. In some embodiments, a matrix estimation formulation with a noise covariance matrix having off-diagonal terms is used.
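When the measurement noise of several events is correlated (the off-diagonal covariance terms mentioned above), those terms have to enter any joint score. The sketch below uses one standard technique, summing per-event z-scores and dividing by the standard deviation of the sum under the null, which is the sum of all entries of the correlation matrix. It is offered as an illustration, not as the matrix estimation formulation of this disclosure; the z-scores and correlations are made up.

```python
import math


def combined_z(z_scores, correlation):
    """Combine per-event z-scores whose noise is correlated.

    Under the null, z ~ N(0, R) with R the correlation matrix, so sum(z) has
    variance sum_ij R[i][j]; dividing by its square root gives a standard normal.
    """
    total = sum(z_scores)
    variance = sum(sum(row) for row in correlation)
    return total / math.sqrt(variance)


z = [1.8, 2.0, 1.6]
independent = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
correlated = [[1, 0.5, 0.5], [0.5, 1, 0.5], [0.5, 0.5, 1]]
print(combined_z(z, independent), combined_z(z, correlated))
```

Ignoring positive noise correlation would overstate the combined signal (about 3.12 instead of about 2.20 here).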

In some embodiments, the invention features a method for detecting a phenotype (such as a cancer phenotype) in an individual, wherein the phenotype is defined by the presence of at least one mutation in a set of mutations. In some embodiments, the method includes obtaining DNA or RNA measurements for a sample of DNA or RNA from one or more cells of the individual, wherein the one or more cells are suspected of having the phenotype, and analyzing the DNA or RNA measurements to determine, for each mutation in the set of mutations, the likelihood that at least one cell has that mutation. In some embodiments, the method includes determining that the individual has the phenotype if (i) for at least one mutation, the likelihood that at least one cell contains that mutation is greater than a threshold, or (ii) for at least one mutation, the likelihood that at least one cell has that mutation is less than the threshold, and, for multiple mutations, the combined likelihood that at least one cell has at least one of the mutations is greater than the threshold. In some embodiments, one or more cells have a subset of, or all of, the mutations in the set. In some embodiments, the subset of mutations is associated with cancer or with an increased risk of cancer. In some embodiments, the set of mutations includes a subset or all of the mutations in the M class of cancer mutations (Ciriello, Nat Genet. 45(10):1127-1133, 2013, doi:10.1038/ng.2762, which is hereby incorporated by reference in its entirety). In some embodiments, the set of mutations includes a subset or all of the mutations in the C class of cancer mutations (Ciriello, supra). In some embodiments, the sample includes cell-free DNA or RNA. In some embodiments, the DNA or RNA measurements include measurements at a set of polymorphic loci on one or more chromosomes or chromosome segments of interest.
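The two-part decision rule can be sketched as follows. The sketch assumes the per-mutation likelihoods are independent, so the combined likelihood that at least one mutation is present is 1 minus the product of (1 - p); correlated calls would make this combination overconfident. The 0.95 threshold and the function name are hypothetical.

```python
import math


def has_phenotype(mutation_probs, threshold=0.95):
    """Decide whether the phenotype is present from per-mutation likelihoods.

    (i) any single mutation exceeds the threshold, or
    (ii) the combined likelihood that at least one mutation is present
         (assuming independence) exceeds the threshold.
    """
    if any(p > threshold for p in mutation_probs):
        return True, max(mutation_probs)
    combined = 1 - math.prod(1 - p for p in mutation_probs)
    return combined > threshold, combined


print(has_phenotype([0.7, 0.8, 0.9]))  # no single call passes, but the combination does
print(has_phenotype([0.2, 0.3]))
```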

Illustrative methods for paternity testing or genetic relatedness testing

The methods of the invention can be used to improve the accuracy of paternity testing or other genetic relatedness testing (see, for example, U.S. Publication No. 2012/0122701, filed December 22, 2011, which is hereby incorporated by reference in its entirety). For example, multiplex PCR methods can allow thousands of polymorphic loci (such as SNPs) to be analyzed by the parental support algorithm described herein to determine whether an alleged father is the biological father of a fetus. In some embodiments, the invention features a method for determining whether an alleged father is the biological father of a fetus carried by a pregnant mother. In some embodiments, the method involves obtaining phased genetic data for the alleged father (for example, by using another method described herein for determining phased genetic data), wherein the phased genetic data include the identity of the allele present at each locus in a set of polymorphic loci on a first homologous chromosome segment and a second homologous chromosome segment in the alleged father. In some embodiments, the method includes obtaining genetic data at the set of polymorphic loci on the chromosome or chromosome segment in a mixed DNA sample containing fetal DNA and maternal DNA from the mother of the fetus, by measuring the amount of each allele at each locus. In some embodiments, the method includes calculating, on a computer, expected genetic data for the mixed DNA sample from the phased genetic data of the alleged father; determining, on a computer, the likelihood that the alleged father is the biological father of the fetus by comparing the genetic data obtained for the mixed DNA sample with the expected genetic data for the mixed DNA sample; and using the determined likelihood that the alleged father is the biological father of the fetus to determine whether the alleged father is the biological father. In some embodiments, the method includes obtaining phased genetic data for the biological mother of the fetus (for example, by using another method described herein for determining phased genetic data), wherein the phased genetic data include the identity of the allele present at each locus in a set of polymorphic loci on a first homologous chromosome segment and a second homologous chromosome segment in the mother. In some embodiments, the method includes obtaining phased genetic data for the fetus (for example, by using another method described herein for determining phased genetic data), wherein the phased genetic data include the identity of the allele present at each locus in a set of polymorphic loci on a first homologous chromosome segment and a second homologous chromosome segment in the fetus. In some embodiments, the method includes calculating, on a computer, the expected genetic data for the mixed DNA sample using the phased genetic data of the alleged father and the phased genetic data of the mother and/or of the fetus.
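A minimal sketch of comparing observed mixed-sample data with expected data, written as a log-likelihood ratio of "alleged father" against "unrelated man". It is not the parental support algorithm referred to above. It assumes unphased genotypes, a binomial read model, an expected B fraction of (1 - f) times the maternal B dosage plus f times the fetal B dosage, a population B-allele frequency for the unrelated-man hypothesis, and an arbitrary error floor. All names and values are hypothetical.

```python
import math


def locus_likelihood(b_reads, depth, mother_b, father_b_transmit_prob, fetal_fraction):
    """Probability of b_reads B reads out of depth at one locus in maternal plasma.

    mother_b: maternal B-allele copies (0, 1, 2).
    father_b_transmit_prob: chance the father transmits B (copies/2 for a known
    father, or the population B frequency for an unrelated man).
    """
    maternal_transmit = mother_b / 2
    total = 0.0
    for m_tx in (0, 1):
        p_m = maternal_transmit if m_tx else 1 - maternal_transmit
        for f_tx in (0, 1):
            p_f = father_b_transmit_prob if f_tx else 1 - father_b_transmit_prob
            fetal_b = m_tx + f_tx
            expected = (1 - fetal_fraction) * mother_b / 2 + fetal_fraction * fetal_b / 2
            expected = min(max(expected, 1e-4), 1 - 1e-4)  # hypothetical error floor
            binom = math.comb(depth, b_reads) * expected**b_reads * (1 - expected) ** (depth - b_reads)
            total += p_m * p_f * binom
    return total


def paternity_log10_lr(loci, fetal_fraction):
    """Sum over loci of log10[L(alleged father) / L(unrelated man)].

    Each locus: (b_reads, depth, mother_b_copies, father_b_copies, population_b_freq).
    """
    score = 0.0
    for b_reads, depth, mother_b, father_b, pop_b_freq in loci:
        l_father = locus_likelihood(b_reads, depth, mother_b, father_b / 2, fetal_fraction)
        l_random = locus_likelihood(b_reads, depth, mother_b, pop_b_freq, fetal_fraction)
        score += math.log10(l_father / l_random)
    return score


# Mother AA, about 5% B reads in plasma at 10% fetal fraction.
print(paternity_log10_lr([(10, 200, 0, 2, 0.5)], 0.10))  # alleged father BB: positive
print(paternity_log10_lr([(10, 200, 0, 0, 0.5)], 0.10))  # alleged father AA: strongly negative
```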

In some embodiments, the invention features a method for determining whether an alleged father is the biological father of a fetus carried by a pregnant mother. In some embodiments, the method includes obtaining phased genetic data for the alleged father (for example, by using another method described herein for determining phased genetic data), wherein the phased genetic data include the identity of the allele present at each locus in a set of polymorphic loci on a first homologous chromosome segment and a second homologous chromosome segment of the alleged father. In some embodiments, the method includes obtaining genetic data at a set of polymorphic loci on a chromosome or chromosome segment in a mixed sample containing fetal DNA and maternal DNA from the mother of the fetus, by measuring the amount of each allele at each locus. In some embodiments, the method includes identifying alleles that are present in the fetal DNA but absent from the maternal DNA at the polymorphic loci, and identifying alleles that are absent from both the fetal DNA and the maternal DNA at the polymorphic loci. In some embodiments, the method includes determining, on a computer, the probability that the alleged father is the biological father of the fetus, wherein the determination includes (1) comparing (i) the alleles present in the fetal DNA but absent from the maternal DNA at the polymorphic loci with (ii) the alleles at the corresponding polymorphic loci in the genetic material of the alleged father, and/or (2) comparing (i) the alleles absent from both the fetal DNA and the maternal DNA at the polymorphic loci with (ii) the alleles at the corresponding polymorphic loci in the genetic material of the alleged father; and using the determined probability that the alleged father is the biological father of the fetus to determine whether the alleged father is the biological father.
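Comparisons (1) and (2) can be sketched as an inconsistency count at loci where the mother is homozygous AA. If a B allele appears in the mixed sample, it must be fetal and paternal, so an AA alleged father is inconsistent; if no B allele appears, a BB alleged father (who always transmits B) is inconsistent. The detection threshold is hypothetical, sequencing error and dropout at low fetal fraction are ignored, and turning the counts into a probability of paternity is not shown.

```python
def paternity_inconsistencies(loci, min_b_fraction=0.01):
    """Count loci inconsistent with the alleged father, at loci where the mother is AA.

    Each locus: dict with 'b_fraction' (B signal in the mixed sample) and the
    alleged father's genotype under 'father', such as 'AA', 'AB', or 'BB'.
    """
    inconsistent = 0
    for locus in loci:
        fetal_has_b = locus["b_fraction"] >= min_b_fraction
        father_b_copies = locus["father"].count("B")
        if fetal_has_b and father_b_copies == 0:
            inconsistent += 1  # (1) fetal-specific allele the father lacks
        elif not fetal_has_b and father_b_copies == 2:
            inconsistent += 1  # (2) allele absent from fetus although father is BB
    return inconsistent, len(loci)


loci = [
    {"b_fraction": 0.05, "father": "AB"},
    {"b_fraction": 0.0, "father": "AB"},
    {"b_fraction": 0.06, "father": "AA"},
    {"b_fraction": 0.0, "father": "BB"},
]
print(paternity_inconsistencies(loci))  # (2, 4)
```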

In some embodiments, the above method for determining whether an alleged father is the biological father of a fetus is used to determine whether an alleged relative of the fetus (e.g., a grandparent, sibling, aunt, or uncle) is an actual biological relative of the fetus (e.g., by using the genetic data of the alleged relative instead of the genetic data of the alleged father).

Exemplary combination methods

In some embodiments, to improve the accuracy of the results, two or more methods for detecting the presence or absence of a CNV are performed (e.g., any of the methods of the invention or any known method). In some embodiments, one or more methods are performed for analyzing factors that indicate the presence or absence of a disease or disorder, or an increased risk of a disease or disorder (e.g., any method described herein or any known method).

In some embodiments, the covariance and/or correlation between two or more methods is calculated using standard mathematical techniques. Standard mathematical techniques can also be used to determine the combined probability of a particular hypothesis based on two or more tests. Exemplary techniques include meta-analysis, Fisher's combined probability test for independent tests, Brown's method for combining dependent p-values with known covariance, and Kost's method for combining dependent p-values with unknown covariance. Where the first method and the second method determine likelihoods in an orthogonal or uncorrelated manner, combining the likelihoods is straightforward and can be done by multiplication and normalization, or by using the following formula:

$$R_{\mathrm{comb}} = \frac{R_1 R_2}{R_1 R_2 + (1 - R_1)(1 - R_2)}$$

where $R_{\mathrm{comb}}$ is the combined likelihood and $R_1$ and $R_2$ are the individual likelihoods. For example, if method 1 gives a 90% likelihood of trisomy and method 2 gives a 95% likelihood of trisomy, combining the outputs of the two methods allows a clinician to conclude that the fetus is trisomic with a likelihood of (0.90)(0.95)/[(0.90)(0.95) + (1 - 0.90)(1 - 0.95)] ≈ 99.42%. Likelihoods can still be combined where the first and second methods are not orthogonal, that is, where the two methods are correlated.
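A minimal Python sketch of two of the combination rules named above: the closed-form formula for two independent likelihoods, and Fisher's method for independent p-values. The function names are illustrative, and both functions assume the tests are independent; Brown's and Kost's methods for dependent tests are not shown.

```python
import math
from typing import Sequence

def combine_likelihoods(r1: float, r2: float) -> float:
    """R_comb = R1*R2 / (R1*R2 + (1 - R1)*(1 - R2)) for two independent likelihoods."""
    both = r1 * r2
    return both / (both + (1.0 - r1) * (1.0 - r2))

def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p_i) follows chi-square with 2k degrees of freedom."""
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    # Closed-form chi-square survival function for even degrees of freedom (2k):
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Worked example from the text: 90% and 95% likelihoods of trisomy.
print(f"{combine_likelihoods(0.90, 0.95):.2%}")  # 99.42%
```

The closed-form survival function avoids a dependency on a statistics library; with SciPy available, `scipy.stats.combine_pvalues` offers Fisher's method and several alternatives.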

Exemplary methods for analyzing multiple factors or variables are disclosed in U.S. Patent No. 8,024,128, issued September 20, 2011; U.S. Publication No. 2007/0027636, filed July 31, 2006; and U.S. Publication No. 2007/0178501, filed December 6, 2006, each of which is incorporated herein by reference in its entirety.

In various embodiments, the combined probability of a particular hypothesis or diagnosis is greater than 80, 85, 90, 92, 94, 96, 98, 99, or 99.9%, or greater than some other threshold.

In some embodiments, the limit of detection of a mutation (such as an SNV or CNV) by the methods of the invention is less than or equal to 10, 5, 2, 1, 0.5, 0.1, 0.05, 0.01, or 0.005%. In some embodiments, the limit of detection of a mutation (such as an SNV or CNV) by the methods of the invention is 15% to 0.005%, such as 10% to 0.005%, 10% to 0.01%, 10% to 0.1%, 5% to 0.005%, 5% to 0.01%, 5% to 0.1%, 1% to 0.005%, 1% to 0.01%, 1% to 0.1%, 0.5% to 0.005%, 0.5% to 0.01%, 0.5% to 0.1%, or 0.1% to 0.01%. In some embodiments, the limit of detection is such that a mutation (such as an SNV or CNV) is detected (or is capable of being detected) if it is present in less than or equal to 10%, 5%, 2%, 1%, 0.5%, 0.1%, 0.05%, 0.01%, or 0.005% of the DNA or RNA molecules with that locus in a sample (such as a sample of cfDNA or cfRNA). For example, the mutation can be detected even if less than or equal to 10%, 5%, 2%, 1%, 0.5%, 0.1%, 0.05%, 0.01%, or 0.005% of the DNA or RNA molecules have that mutation at the locus (rather than, for example, the wild-type or non-mutated form of the locus, or a different mutation at that locus). In some embodiments, the limit of detection is such that a mutation (such as an SNV or CNV) is detected (or is capable of being detected) if it is present in less than or equal to 10%, 5%, 2%, 1%, 0.5%, 0.1%, 0.05%, 0.01%, or 0.005% of the DNA or RNA molecules in the sample (such as a sample of cfDNA or cfRNA). In some embodiments, the CNV is a deletion, and the deletion can be detected even if it is present in only less than or equal to 10%, 5%, 2%, 1%, 0.5%, 0.1%, 0.05%, 0.01%, or 0.005% of the DNA or RNA molecules in the sample that have the region of interest, with or without the deletion. In some embodiments, the CNV is a duplication, and the duplication can be detected even if the extra duplicated DNA or RNA is present in only less than or equal to 10%, 5%, 2%, 1%, 0.5%, 0.1%, 0.05%, 0.01%, or 0.005% of the DNA or RNA molecules in the sample that have the region of interest, with or without the duplication. Example 6 provides exemplary methods for calculating the limit of detection. In some embodiments, the "LOD-zs5.0-mr5" method of Example 6 is used.

Exemplary samples

In some embodiments of any aspect of the invention, the sample includes cellular and/or extracellular genetic material from cells suspected of having a deletion or duplication, such as cells suspected of being cancerous. In some embodiments, the sample comprises any tissue or bodily fluid suspected of containing cells, DNA, or RNA having a deletion or duplication (such as cancer cells, DNA, or RNA). The genetic measurements used as part of these methods can be made on any sample comprising DNA or RNA, such as but not limited to tissue, blood, serum, plasma, urine, hair, tears, saliva, skin, fingernails, lymph, cervical mucus, semen, or other cells or materials comprising nucleic acids. The sample may include any cell type, or DNA or RNA from any cell type may be used (such as cells from any organ or tissue suspected of being cancerous, or neurons). In some embodiments, the sample includes nuclear and/or mitochondrial DNA. In some embodiments, the sample is from any of the target individuals disclosed herein. In some embodiments, the target individual is a born individual, a gestating fetus, a non-gestating fetus (such as a products of conception sample), an embryo, or any other individual.

Exemplary samples include those containing cfDNA or cfRNA. In some embodiments, cfDNA is available for analysis without requiring a step of lysing cells. Cell-free DNA may be obtained from a variety of tissues, such as tissues in liquid form, for example blood, plasma, lymph, ascites fluid, or cerebrospinal fluid. In some cases, the cfDNA consists of DNA derived from fetal cells. In some cases, the cfDNA consists of DNA derived from both fetal and maternal cells. In some cases, the cfDNA is isolated from plasma that has been separated from whole blood by centrifugation to remove cellular material. The cfDNA may be a mixture of DNA derived from target cells (such as cancer cells) and non-target cells (such as non-cancer cells).

In some embodiments, the sample contains or is suspected to contain a mixture of DNA (or RNA), such as a mixture of cancer DNA (or RNA) and non-cancerous DNA (or RNA). In some embodiments, at least 0.5%, 1%, 3%, 5%, 7%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 92%, 94%, 95%, 96%, 98%, 99%, or 100% of the cells in the sample are cancer cells. In some embodiments, at least 0.5%, 1%, 3%, 5%, 7%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 92%, 94%, 95%, 96%, 98%, 99%, or 100% of the DNA (such as cfDNA) or RNA (such as cfRNA) in the sample is from cancer cells. In various embodiments, the percentage of cells in the sample that are cancer cells is between 0.5% and 99%, such as between 1% and 95%, 5% and 95%, 10% and 90%, 5% and 70%, 10% and 70%, 20% and 90%, or 20% and 70%, inclusive. In some embodiments, the sample is enriched for cancer cells or for DNA or RNA from cancer cells. In some embodiments in which the sample is enriched for cancer cells, at least 0.5%, 1%, 3%, 5%, 7%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 92%, 94%, 95%, 96%, 98%, 99%, or 100% of the cells in the enriched sample are cancer cells. In some embodiments in which the sample is enriched for DNA or RNA from cancer cells, at least 0.5%, 1%, 3%, 5%, 7%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 92%, 94%, 95%, 96%, 98%, 99%, or 100% of the DNA or RNA in the enriched sample is from cancer cells. In some embodiments, cell sorting (such as fluorescence-activated cell sorting) is used to enrich for cancer cells (Barteneva et al., Biochim. Biophys. Acta 1836(1):105-22, August 2013, doi:10.1016/j.bbcan.2013.02.004, epub February 24, 2013; and Abraham et al., Adv. Biochem. Eng. Biotechnol. 106:19-39, 2007, each of which is incorporated herein by reference in its entirety).

In some embodiments of any aspect of the invention, the sample comprises any tissue suspected of being at least partially of fetal origin. In some embodiments, the sample comprises cellular and/or extracellular genetic material from the fetus, contaminating cellular and/or extracellular genetic material (such as genetic material from the mother of the fetus), or a combination thereof. In some embodiments, the sample comprises cellular genetic material from the fetus, contaminating cellular genetic material, or a combination thereof.

In some embodiments, the sample is from a gestating fetus. In some embodiments, the sample is from a non-gestating fetus, such as a products of conception sample after fetal demise, or a sample from any fetal tissue. In some embodiments, the sample is a maternal whole blood sample, a blood sample from the mother, a maternal plasma sample, a maternal serum sample, an amniocentesis sample, a placental tissue sample (such as chorionic villus, decidua, or placental membrane), a cervical mucus sample, or another sample from the fetus. In some embodiments, at least 3%, 5%, 7%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 92%, 94%, 95%, 96%, 98%, 99%, or 100% of the cells in the sample are maternal cells. In various embodiments, the percentage of cells in the sample that are maternal cells is between 5% and 99%, such as between 10% and 95%, 20% and 95%, 30% and 90%, 30% and 70%, 40% and 90%, 40% and 70%, 50% and 90%, or 50% and 80%.

In some embodiments, the sample is enriched for fetal cells. In some embodiments in which the sample is enriched for fetal cells, at least 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, or more of the cells in the enriched sample are fetal cells. In some embodiments, the percentage of cells in the sample that are fetal cells is between 0.5% and 100%, such as between 1% and 99%, 5% and 95%, 10% and 95%, 20% and 90%, or 30% and 70%. In some embodiments, the sample is enriched for fetal DNA. In some embodiments in which the sample is enriched for fetal DNA, at least 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, or more of the DNA in the enriched sample is fetal DNA. In some embodiments, the percentage of DNA in the sample that is fetal DNA is between 0.5% and 100%, such as between 1% and 99%, 5% and 95%, 10% and 95%, 20% and 90%, or 30% and 70%.

In some embodiments, the sample includes a single cell or includes DNA and/or RNA from a single cell. In some embodiments, multiple single cells are analyzed in parallel (such as at least 5, 10, 20, 30, 40, or 50 cells from the same subject or from different subjects). In some embodiments, cells from multiple samples from the same individual are combined, which reduces the amount of work compared with analyzing the samples separately. Combining multiple samples may also allow multiple tissues to be tested for cancer simultaneously (which may be used to provide a more thorough screen for cancer or to determine whether a cancer may have metastasized to other tissues).

In some embodiments, the sample contains a single cell or a small number of cells, such as 2, 3, 5, 6, 7, 8, 9, or 10 cells. In some embodiments, the sample has between 1 and 100, 100 and 500, or 500 and 1,000 cells, inclusive. In some embodiments, the sample contains 1 to 10 picograms, 10 to 100 picograms, 100 picograms to 1 nanogram, 1 to 10 nanograms, 10 to 100 nanograms, or 100 nanograms to 1 microgram of RNA and/or DNA.

In some embodiments, the sample is embedded in paraffin. In some embodiments, the sample is preserved with a preservative such as formaldehyde and optionally embedded in paraffin, which may cause cross-linking of the DNA so that less of it is available for PCR. In some embodiments, the sample is a formalin-fixed, paraffin-embedded sample. In some embodiments, the sample is a fresh sample (such as a sample obtained within 1 or 2 days of analysis). In some embodiments, the sample is frozen prior to analysis. In some embodiments, the sample is a historical sample.

Any of these samples may be used in the methods of the invention.

Exemplary sample preparation methods

In some embodiments, the method includes isolating or purifying the DNA and/or RNA. There are a number of standard procedures known in the art to accomplish this. In some embodiments, the sample may be centrifuged to separate its layers. In some embodiments, the DNA or RNA may be isolated using filtration. In some embodiments, the preparation of the DNA or RNA may involve amplification, separation, purification by chromatography, liquid-liquid separation, isolation, preferential enrichment, preferential amplification, targeted amplification, or any of a number of other techniques known in the art. In some embodiments for isolating DNA, RNase is used to degrade RNA. In some embodiments for isolating RNA, DNase (such as DNase I from Invitrogen, Carlsbad, CA, USA) is used to degrade DNA. In some embodiments, RNA is isolated using an RNeasy mini kit (Qiagen) according to the manufacturer's protocol. In some embodiments, small RNA molecules are isolated using the mirVana PARIS kit (Ambion, Austin, TX, USA) according to the manufacturer's protocol (Gu et al., J. Neurochem. 122:641-649, 2012, which is incorporated herein by reference in its entirety). The concentration and purity of the RNA may optionally be determined with a Nanovue (GE Healthcare, Piscataway, NJ, USA), and RNA integrity may optionally be measured with a 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) (Gu et al., J. Neurochem. 122:641-649, 2012, which is incorporated herein by reference in its entirety). In some embodiments, TRIZOL or RNAlater (Ambion) is used to stabilize RNA during storage.

In some embodiments, adapters containing universal tags are ligated to the DNA to prepare a library. Prior to ligation, the sample DNA may be blunt-ended, and a single adenosine base then added to the 3' end. Prior to ligation, the DNA may be cleaved using a restriction enzyme or some other cleavage method. During ligation, the 3' adenosine overhang of the sample fragments and the complementary 3' thymidine overhang of the adapters can enhance ligation efficiency. In some embodiments, adapter ligation is performed using the ligation kit found in the Agilent SureSelect kit. In some embodiments, the library is amplified using universal primers. In one embodiment, the amplified library is fractionated by size separation or by using products such as AGENCOURT AMPURE beads or other similar methods. In some embodiments, PCR amplification is used to amplify target loci. In some embodiments, the amplified DNA is sequenced (such as using an ILLUMINA GAIIx or HiSeq sequencer). In some embodiments, the amplified DNA is sequenced from each end to reduce sequencing errors. If there is a sequencing error at a particular base when sequencing from one end of the amplified DNA, an error in the complementary base is less likely when sequencing from the other end of the amplified DNA (compared with sequencing multiple times from the same end).
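A toy Python sketch of why reading from both ends helps: when a short amplicon is covered by both reads of a pair, a base is kept only where the two independent calls agree, and disagreements are masked. It assumes read 2 spans exactly the same bases as read 1 after reverse-complementing and ignores base qualities; the function names are hypothetical, and real pipelines use quality-aware read merging.

```python
# Complement table for A/C/G/T, with N mapping to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def paired_consensus(read1: str, read2_aligned: str) -> str:
    """Merge read 1 with read 2 (already reverse-complemented onto read 1's strand).

    Positions where the two independent base calls disagree are masked as 'N'.
    """
    if len(read1) != len(read2_aligned):
        raise ValueError("reads must cover the same aligned span")
    return "".join(a if a == b else "N" for a, b in zip(read1, read2_aligned))

# Read 2 as it comes off the sequencer (opposite strand), with an error at position 3.
raw_read2 = "GTTCGT"
print(paired_consensus("ACGTAC", reverse_complement(raw_read2)))  # ACGNAC
```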

In some embodiments, whole genome amplification (WGA) is used to amplify a nucleic acid sample. Several methods are available for WGA: ligation-mediated PCR (LM-PCR), degenerate oligonucleotide primer PCR (DOP-PCR), and multiple displacement amplification (MDA). In LM-PCR, short DNA sequences called adapters are ligated to the ends of the DNA. These adapters contain universal amplification sequences, which are used to amplify the DNA by PCR. In DOP-PCR, random primers that also contain universal amplification sequences are used in a first round of annealing and PCR. A second round of PCR is then used to amplify the sequences further with the universal primer sequences. MDA uses phi-29 polymerase, a highly processive and non-specific enzyme that replicates DNA and has been used for single-cell analysis. In some embodiments, WGA is not performed.

In some embodiments, selective amplification or enrichment is used to amplify or enrich target loci. In some embodiments, the amplification and/or selective enrichment technique may involve PCR (such as ligation-mediated PCR), fragment capture by hybridization, molecular inversion probes, or other circularizing probes. In some embodiments, real-time quantitative PCR (RT-qPCR), digital PCR, emulsion PCR, or single allele base extension followed by mass spectrometry is used (Hung et al., J. Clin. Pathol. 62:308-313, 2009, which is incorporated herein by reference in its entirety). In some embodiments, capture by hybridization with hybrid capture probes is used to preferentially enrich the DNA. In some embodiments, methods for amplification or selective enrichment may involve using probes where, upon correct hybridization to the target sequence, the 3' end or 5' end of a nucleotide probe is separated from the polymorphic site of a polymorphic allele by a small number of nucleotides. This separation reduces preferential amplification of one allele, termed allele bias. This is an improvement over methods that use probes where the 3' end or 5' end of the correctly hybridized probe is directly adjacent to or very near the polymorphic site of an allele. In one embodiment, probes whose hybridizing region may or does contain a polymorphic site are excluded. A polymorphic site within the hybridization site can cause unequal hybridization, or inhibit hybridization altogether for some alleles, resulting in preferential amplification of certain alleles. These embodiments are improvements over other methods involving targeted amplification and/or selective enrichment because they better preserve the original allele frequencies of the sample at each polymorphic locus, whether the sample is a pure genomic sample from a single individual or a mixture of individuals.
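The probe-exclusion rule in the preceding paragraph can be sketched as an interval check. This Python sketch assumes primer or probe binding regions and known polymorphic sites are given as 0-based reference coordinates (hypothetical inputs) and only drops candidates whose binding region covers a polymorphic site; it does not model the preferred small offset between the probe end and the polymorphic site.

```python
from bisect import bisect_left
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) binding region on the reference

def primers_avoiding_snps(primers: Iterable[Interval], snp_positions: Iterable[int]) -> List[Interval]:
    """Keep only binding regions that contain no known polymorphic site."""
    snps = sorted(snp_positions)
    kept: List[Interval] = []
    for start, end in primers:
        i = bisect_left(snps, start)  # first polymorphic site at or after `start`
        if i < len(snps) and snps[i] < end:
            continue  # binding region covers a polymorphic site: risk of allele bias
        kept.append((start, end))
    return kept

print(primers_avoiding_snps([(0, 20), (30, 50), (60, 80)], [45, 100]))  # [(0, 20), (60, 80)]
```

Sorting the polymorphic sites once and using binary search keeps the check fast even for primer libraries with many thousands of candidates.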

In some embodiments, PCR is used to generate very short amplicons, referred to as mini-PCR (U.S. Application No. 13/683,604, filed November 21, 2012; U.S. Publication No. 2013/0123120; U.S. Application No. 13/300,235, filed November 18, 2011; U.S. Publication No. 2012/0270212, filed November 18, 2011; and U.S. Application No. 61/994,791, filed May 16, 2014, each of which is incorporated herein by reference in its entirety). cfDNA (such as fetal cfDNA in maternal serum, or cancer cfDNA released by necrosis or apoptosis) is highly fragmented. For fetal cfDNA, the fragment sizes follow an approximately Gaussian distribution with a mean of about 160 bp, a standard deviation of 15 bp, a minimum of about 100 bp, and a maximum of about 220 bp. The polymorphic site of a particular target locus may occupy any position, from the start to the end, within the various fragments originating from that locus. Because cfDNA fragments are short, the likelihood that a fragment of length L contains both the forward and reverse primer sites of an amplicon of length a is approximately (L - a)/L, that is, one minus the ratio of the amplicon length to the fragment length. Under ideal conditions, assays in which the amplicon is 45, 50, 55, 60, 65, or 70 bp will successfully amplify 72%, 69%, 66%, 63%, 59%, or 56%, respectively, of the available template fragment molecules. In certain embodiments, most preferably those involving cfDNA from a sample from an individual suspected of having cancer, the cfDNA is amplified using primers that yield a maximum amplicon length of 85, 80, 75, or 70 bp (75 bp in certain preferred embodiments) and that have a melting temperature between 50 and 65°C (between 54 and 60.5°C in certain preferred embodiments). The amplicon length is the distance between the 5' ends of the forward and reverse priming sites. Amplicon lengths shorter than those typically used in the art may allow more efficient measurement of the desired polymorphic loci because only short sequence reads are required. In one embodiment, most of the amplicons are less than 100 bp, less than 90 bp, less than 80 bp, less than 70 bp, less than 65 bp, less than 60 bp, less than 55 bp, less than 50 bp, or less than 45 bp.
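The amplification percentages quoted above follow from (L - a)/L when every fragment is set to the mean fetal cfDNA length of 160 bp. A short Python check under that single-length simplification reproduces them; the function names are illustrative, and a fuller model would average over the roughly Gaussian fragment-length distribution.

```python
import math

def capture_fraction(amplicon_bp: int, fragment_bp: int = 160) -> float:
    """Approximate fraction of fragments of length L containing an entire amplicon of length a: (L - a) / L."""
    if amplicon_bp >= fragment_bp:
        return 0.0  # the amplicon cannot fit inside the fragment
    return (fragment_bp - amplicon_bp) / fragment_bp

def capture_percent(amplicon_bp: int, fragment_bp: int = 160) -> int:
    """Percentage rounded half up, matching how the figures in the text are reported."""
    return math.floor(100 * capture_fraction(amplicon_bp, fragment_bp) + 0.5)

for a in (45, 50, 55, 60, 65, 70):
    print(f"{a} bp amplicon: {capture_percent(a)}% of template fragments")
```

Rounding half up (rather than Python's built-in `round`, which rounds 62.5 to 62) is what makes the 60 bp case come out at the 63% given in the text.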

In some embodiments, direct multiplexed PCR, sequential PCR, nested PCR, doubly nested PCR, one-and-a-half-sided nested PCR, fully nested PCR, one-sided fully nested PCR, one-sided nested PCR, hemi-nested PCR, triply hemi-nested PCR, semi-nested PCR, one-sided semi-nested PCR, a reverse semi-nested PCR method, or one-sided PCR is used, as described in U.S. Application No. 13/683,604, filed November 21, 2012; U.S. Application No. 13/300,235, filed November 18, 2011; U.S. Publication No. 2012/0270212; and U.S. Application No. 61/994,791, filed May 16, 2014, the entire contents of each of which are incorporated herein by reference. If desired, any of these methods can be used for mini-PCR.

If desired, the extension step of the PCR amplification may be limited in time to reduce amplification of fragments longer than 200, 300, 400, 500, or 1,000 nucleotides. This may enrich for fragmented or shorter DNA (such as fetal DNA, or DNA from cancer cells that have undergone apoptosis or necrosis) and improve test performance.

In some embodiments, multiplex PCR is used. In some embodiments, the method of amplifying target loci in a nucleic acid sample includes (i) contacting the nucleic acid sample with a library of primers that simultaneously hybridize to at least 100; 200; 500; 750; 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci to produce a reaction mixture; and (ii) subjecting the reaction mixture to primer extension reaction conditions (such as PCR conditions) to produce amplified products that include target amplicons. In some embodiments, at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.5% of the target loci are amplified. In various embodiments, less than 60%, 50%, 40%, 30%, 20%, 10%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.25%, 0.1%, or 0.05% of the amplified products are primer dimers. In some embodiments, the primers are in solution (such as dissolved in the liquid phase rather than in a solid phase). In some embodiments, the primers are in solution and are not immobilized on a solid support. In some embodiments, the primers are not part of a microarray. In some embodiments, the primers do not include molecular inversion probes (MIPs).

In some embodiments, two or more (such as 3 or 4) target amplicons (such as amplicons from the mini-PCR method disclosed herein) are ligated together, and the ligated products are then sequenced. Combining multiple amplicons into a single ligation product increases the efficiency of the subsequent sequencing step. In some embodiments, the target amplicons are less than 150, 100, 90, 75, or 50 base pairs in length before they are ligated. The selective enrichment and/or amplification may involve tagging each individual molecule with different tags, molecular barcodes, tags for amplification, and/or tags for sequencing. In some embodiments, the amplified products are analyzed by sequencing (such as by high-throughput sequencing) or by hybridization to an array, such as a SNP array, the ILLUMINA INFINIUM array, or the AFFYMETRIX gene chip. In some embodiments, nanopore sequencing is used, such as the nanopore sequencing technology developed by Genia (see, for example, the World Wide Web at geniachip.com/technology, which is incorporated herein by reference in its entirety). In some embodiments, duplex sequencing is used (Schmitt et al., "Detection of ultra-rare mutations by next-generation sequencing," Proc. Natl. Acad. Sci. USA 109(36):14508-14513, 2012, which is incorporated herein by reference in its entirety). This method greatly reduces errors by independently tagging and sequencing each of the two strands of a DNA duplex. Because the two strands are complementary, true mutations are found at the same position in both strands. In contrast, PCR or sequencing errors produce mutations in only one strand and can therefore be discounted as technical errors. In some embodiments, the method entails tagging both strands of duplex DNA with a random yet complementary double-stranded nucleotide sequence, referred to as a duplex tag. Double-stranded tag sequences are incorporated into standard sequencing adapters by first introducing a single-stranded randomized nucleotide sequence into one adapter strand and then extending the opposite strand with a DNA polymerase to yield a complementary, double-stranded tag. After the tagged adapters are ligated to sheared DNA, the individually tagged strands are PCR amplified from asymmetric primer sites on the adapter tails and subjected to paired-end sequencing. In some embodiments, the sample (such as a DNA or RNA sample) is divided into multiple fractions, such as different wells (for example, wells of a WaferGen SmartChip). Dividing the sample into different fractions (such as at least 5, 10, 20, 50, 75, 100, 150, 200, or 300 fractions) can increase the sensitivity of the analysis, because the percentage of molecules with a mutation is higher in some of the wells than in the bulk sample. In some embodiments, each fraction has fewer than 500, 400, 200, 100, 50, 20, 10, 5, 2, or 1 DNA or RNA molecules. In some embodiments, the molecules in each fraction are sequenced separately. In some embodiments, the same barcode (such as a random or non-human sequence) is added to all of the molecules in the same fraction (such as by amplification with a primer containing the barcode or by ligation of the barcode), and different barcodes are added to molecules in different fractions. The barcoded molecules may be pooled and sequenced together. In some embodiments, the molecules are amplified before they are pooled and sequenced, such as by using nested PCR. In some embodiments, one forward primer and two reverse primers, or two forward primers and one reverse primer, are used.
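The duplex-sequencing logic can be sketched in Python as a two-level consensus: first among the reads that share one strand's tag, then across the two complementary strands, so a variant is accepted only when both strands independently support it. The 90% agreement threshold and the function names are assumptions for illustration, not parameters from Schmitt et al.

```python
from collections import Counter
from typing import List, Optional

def single_strand_consensus(calls: List[str], min_agree: float = 0.9) -> Optional[str]:
    """Consensus base for one tagged strand family, or None if the family is ambiguous.

    min_agree is a hypothetical threshold on the fraction of reads that must agree.
    """
    if not calls:
        return None
    base, count = Counter(calls).most_common(1)[0]
    return base if count / len(calls) >= min_agree else None

def duplex_call(top_calls: List[str], bottom_calls: List[str]) -> Optional[str]:
    """Accept a base only if both strands of the duplex independently agree on it.

    bottom_calls are assumed to be already reverse-complemented to top-strand orientation.
    """
    top = single_strand_consensus(top_calls)
    bottom = single_strand_consensus(bottom_calls)
    if top is None or bottom is None or top != bottom:
        return None  # a change seen on only one strand is treated as PCR/sequencing error
    return top
```

For example, `duplex_call(["T"] * 5, ["T"] * 4)` returns `"T"`, whereas `duplex_call(["T"] * 5, ["C"] * 4)` returns `None` because the two strands disagree.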

In some embodiments, a mutation (such as an SNV or CNV) that is present in less than 10%, 5%, 2%, 1%, 0.5%, 0.1%, 0.05%, 0.01%, or 0.005% of the DNA or RNA molecules in a sample (such as a sample of cfDNA or cfRNA) is detected (or is capable of being detected). In some embodiments, a mutation (such as an SNV or CNV) that is present in fewer than 1,000, 500, 100, 50, 20, 10, 5, 4, 3, or 2 original DNA or RNA molecules (before amplification) in a sample (such as a sample of cfDNA or cfRNA from, for example, a blood sample) is detected (or is capable of being detected). In some embodiments, a mutation (such as an SNV or CNV) that is present in only 1 original DNA or RNA molecule (before amplification) in a sample (such as a sample of cfDNA or cfRNA from, for example, a blood sample) is detected (or is capable of being detected).

For example, if the limit of detection of a mutation (such as a single nucleotide variant (SNV)) is 0.1%, a mutation present at 0.01% can be detected by dividing the sample into multiple fractions (such as 100 wells). Most of the wells contain no copies of the mutation. In the few wells that do contain it, the mutation accounts for a much higher percentage of the reads. In one example, there are 20,000 initial copies of DNA from the target locus, and two of those copies contain the SNV of interest. If the sample is divided into 100 wells (about 200 copies per well), 98 wells have no SNV and 2 wells have the SNV at 0.5%. The DNA in each well can be barcoded, amplified, pooled with the DNA from the other wells, and sequenced. Wells without the SNV can be used to measure the background amplification/sequencing error rate, to determine whether the signal from the outlier wells is above the background noise level.
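The worked example can be reproduced with a small Python simulation that randomly distributes 20,000 template copies, 2 of them mutant, across 100 equally loaded wells. Random placement is an assumption of the sketch; in about 1% of runs both mutant copies land in the same well, giving a single well at 1% instead of two wells at 0.5%. The function name is hypothetical.

```python
import random
from typing import List

def partition_mutant_fractions(n_copies: int, n_mutant: int, n_wells: int, seed: int = 0) -> List[float]:
    """Return the mutant-allele fraction in each well after random, equal partitioning."""
    rng = random.Random(seed)
    molecules = [True] * n_mutant + [False] * (n_copies - n_mutant)
    rng.shuffle(molecules)
    totals = [0] * n_wells
    mutants = [0] * n_wells
    for i, is_mutant in enumerate(molecules):
        well = i % n_wells  # round-robin after shuffling gives equal loading per well
        totals[well] += 1
        mutants[well] += int(is_mutant)
    return [m / t for m, t in zip(mutants, totals) if t]

fractions = partition_mutant_fractions(20_000, 2, 100)
lod = 0.001  # the 0.1% limit of detection from the example
print("bulk mutant fraction:", 2 / 20_000)  # 0.0001, i.e. 0.01%
print("wells above LOD:", sum(f > lod for f in fractions))
```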

In some embodiments, the amplified products are detected using an array, such as a microarray, in particular a microarray with probes for one or more chromosomes of interest (for example, chromosome 13, 18, 21, X, Y, or any combination thereof). It will be appreciated that a commercially available SNP detection microarray could be used, for example the Illumina (San Diego, CA) GoldenGate, DASL, Infinium, or CytoSNP-12 genotyping assays, or a SNP detection microarray product from Affymetrix, such as the OncoScan microarray. In some embodiments, phased genetic data for one or both biological parents of the embryo or fetus are used to increase the accuracy of the analysis of array data from a single cell.

In some embodiments involving sequencing, the depth of read is the number of sequencing reads that map to a given locus. The depth of read may be normalized over the total number of reads. In some embodiments, for the depth of read of a sample, the depth of read refers to the mean depth of read over the targeted loci. In some embodiments, for the depth of read of a locus, the depth of read refers to the number of reads measured by the sequencer that map to that locus. In general, the greater the depth of read at a locus, the closer the ratio of alleles at that locus tends to be to the ratio of alleles in the original DNA sample. Depth of read can be expressed in a variety of ways, including but not limited to a percentage or a proportion. Thus, for example, in a highly parallel DNA sequencer such as an Illumina HISEQ that produces, for example, sequences of 1 million clones, sequencing one locus 3,000 times results in a depth of read of 3,000 reads at that locus. The proportion of reads at that locus is 3,000 divided by 1 million total reads, or 0.3% of the total reads.
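The three notions of depth of read described above (per locus, per sample as a mean over the targeted loci, and normalized to the total number of reads) can be sketched in Python, assuming each read has already been assigned to a single locus. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

def read_depths(mapped_loci: Iterable[str]) -> Dict[str, int]:
    """Depth of read per locus: the number of reads mapped to each locus."""
    return dict(Counter(mapped_loci))

def mean_depth(depths: Dict[str, int], targets: Iterable[str]) -> float:
    """Sample-level depth: mean depth over the targeted loci (targets with no reads count as 0)."""
    target_list = list(targets)
    return sum(depths.get(t, 0) for t in target_list) / len(target_list)

def depth_fraction(depth: int, total_reads: int) -> float:
    """Depth of read normalized to the total number of reads."""
    return depth / total_reads

# Worked example from the text: 3,000 of 1,000,000 reads map to one locus.
print(f"{depth_fraction(3_000, 1_000_000):.1%}")  # 0.3%
```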

In some embodiments, allelic data are obtained, wherein the allelic data include quantitative measurements indicative of the number of copies of a specific allele of a polymorphic locus. In some embodiments, the allelic data include quantitative measurements indicative of the number of copies of each of the alleles observed at a polymorphic locus. Typically, quantitative measurements are obtained for all possible alleles of the polymorphic locus of interest. For example, any of the methods discussed in the preceding paragraphs for determining the allele at an SNP or SNV locus, such as microarrays, qPCR, or DNA sequencing (for example, high-throughput DNA sequencing), can be used to generate quantitative measurements of the number of copies of a specific allele of a polymorphic locus. These quantitative measurements are referred to herein as allele frequency data or measured genetic allelic data. Methods using allelic data are sometimes referred to as quantitative allelic methods; this is in contrast to quantitative methods that use only quantitative data from non-polymorphic loci, or from polymorphic loci without regard to allele identity. When the allelic data are measured using high-throughput sequencing, they typically include the number of reads of each allele that map to the locus of interest.

In some embodiments, non-allelic data are obtained, wherein the non-allelic data include quantitative measurements indicative of the number of copies of a specific locus. The locus may be polymorphic or non-polymorphic. In some embodiments in which the locus is non-polymorphic, the non-allelic data do not contain information about the relative or absolute quantity of the individual alleles that may be present at that locus. Methods using only non-allelic data (that is, quantitative data from non-polymorphic alleles, or quantitative data from polymorphic loci without regard to the allele identity of each fragment) are referred to as quantitative methods. Typically, quantitative measurements are obtained for all possible alleles of the polymorphic locus of interest, with one value associated with the measured quantity of all of the alleles at that locus in total. Non-allelic data for a polymorphic locus may be obtained by summing the quantitative allelic data for each allele at that locus. When the allelic data are measured using high-throughput sequencing, the non-allelic data typically include the number of reads that map to the locus of interest. The sequencing measurements can indicate the relative and/or absolute number of each of the alleles present at the locus, whereas the non-allelic data include the sum of the reads mapping to the locus, regardless of allele identity. In some embodiments, the same set of sequencing measurements can be used to yield both allelic data and non-allelic data. In some embodiments, the allelic data are used as part of a method for determining copy number at a chromosome of interest, and the resulting non-allelic data can be used as part of a different method for determining copy number at the chromosome of interest. In some embodiments, the two methods are statistically orthogonal and are combined to give a more accurate determination of the copy number at the chromosome of interest.
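The relationship between the two kinds of data, in which non-allelic counts are obtained by summing the allelic counts at each locus, can be sketched in Python. The sketch assumes reads have already been mapped and genotyped into (locus, allele) pairs; the function names are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

def allele_counts(reads: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Allelic data: per-locus read counts for each observed allele; reads are (locus, allele) pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for locus, allele in reads:
        counts[locus][allele] += 1
    return dict(counts)

def non_allele_counts(allelic: Dict[str, Counter]) -> Dict[str, int]:
    """Non-allelic data: total reads per locus, summed over alleles (allele identity ignored)."""
    return {locus: sum(c.values()) for locus, c in allelic.items()}

reads = [("rs1", "A"), ("rs1", "A"), ("rs1", "G"), ("rs2", "C")]
print(non_allele_counts(allele_counts(reads)))  # {'rs1': 3, 'rs2': 1}
```

Because both outputs come from the same reads, this mirrors the text's point that one set of sequencing measurements can feed both an allelic and a non-allelic copy-number method.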

In some embodiments, obtaining genetic data includes (i) acquiring DNA sequence information by laboratory techniques, such as by using an automated high-throughput DNA sequencer, or (ii) acquiring information previously obtained by laboratory techniques, wherein the information is transmitted electronically, such as by a computer over the internet or by electronic transfer from the sequencing device.

Additional exemplary sample preparation, amplification, and quantification methods are described in U.S. Application No. 13/683,604, filed November 21, 2012 (U.S. Publication No. 2013/0123120), and U.S. Application No. 61/994,791, filed May 16, 2014, each of which is incorporated herein by reference in its entirety. These methods can be used to analyze any of the samples disclosed herein.

Exemplary quantification methods for cell-free DNA

If desired, the amount or concentration of cfDNA or cfRNA can be measured using standard methods. In some embodiments, the amount or concentration of cell-free mitochondrial DNA (cf mDNA) is determined. In some embodiments, the amount or concentration of cell-free DNA originating from nuclear DNA (cf nDNA) is determined. In some embodiments, the amounts or concentrations of cf mDNA and cf nDNA are determined simultaneously.

In some embodiments, qPCR is used to measure cf nDNA and/or cf mDNA (Kohler et al., "Levels of plasma circulating cell free nuclear and mitochondrial DNA as potential biomarkers for breast tumors," Mol Cancer 8:105, 2009, doi:10.1186/1476-4598-8-105, which is incorporated herein by reference in its entirety). For example, multiplex qPCR can be used to measure one or more loci from cf nDNA (such as glyceraldehyde-3-phosphate dehydrogenase, GAPDH) and one or more loci from cf mDNA (such as ATP synthase 8, MTATP8). In some embodiments, fluorescence-labeled PCR is used to measure cf nDNA and/or cf mDNA (Schwarzenbach et al., "Evaluation of cell-free tumour DNA and RNA in patients with breast cancer and benign breast disease," Mol BioSyst 7:2848-2854, 2011, which is incorporated herein by reference in its entirety). If desired, the normality of the data can be assessed using a standard method such as the Shapiro-Wilk test. If desired, cf nDNA and cf mDNA levels can be compared using a standard method such as the Mann-Whitney U test. In some embodiments, cf nDNA and/or cf mDNA levels are compared with other established prognostic factors using a standard method such as the Mann-Whitney U test or the Kruskal-Wallis test.
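The Mann-Whitney U test mentioned above compares two groups through rank sums. The sketch below computes only the two U statistics, giving tied values their average rank. Obtaining a p-value is left to a statistics package (SciPy, for instance, provides `scipy.stats.mannwhitneyu`). The helper names are hypothetical, and the input values in the usage line are placeholders, not measurements.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the rank positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for group x, U for group y) from the pooled ranks."""
    ranks = average_ranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    r1 = sum(ranks[:n1])           # rank sum of group x
    u1 = r1 - n1 * (n1 + 1) / 2
    return u1, n1 * n2 - u1        # U1 + U2 always equals n1 * n2


print(mann_whitney_u([2.1, 3.4, 1.8], [4.0, 5.2, 3.9]))  # placeholder levels
```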

Exemplary RNA amplification, quantification, and analysis methods

Any of the following exemplary methods can be used to amplify, and optionally quantify, RNA such as cfRNA, cellular RNA, cytoplasmic RNA, coding cytoplasmic RNA, non-coding cytoplasmic RNA, mRNA, miRNA, mitochondrial RNA, rRNA, or tRNA. In some embodiments, the miRNA is any of the miRNA molecules listed in the publicly available miRBase database on the World Wide Web at mirbase.org, which is incorporated herein by reference in its entirety. Exemplary miRNA molecules include miR-509, miR-21, and miR-146a.

In some embodiments, RNA is amplified using reverse transcriptase multiplex ligation-dependent probe amplification (RT-MLPA). In some embodiments, each set of hybridization probes consists of two short synthetic oligonucleotides spanning the SNP and one long oligonucleotide (Li et al., "Development of noninvasive prenatal diagnosis of trisomy 21 by RT-MLPA with a new set of SNP markers," Arch Gynecol Obstet, July 5, 2013, DOI 10.1007/s00404-013-2926-5; Schouten et al., "Relative quantification of 40 nucleic acid sequences by multiplex ligation-dependent probe amplification," Nucleic Acids Research 30:e57, 2002; Deng et al., "Non-invasive prenatal diagnosis of trisomy 21 by reverse transcriptase multiplex ligation-dependent probe amplification," Clin Chem Lab Med 49:641-646, 2011, each of which is incorporated herein by reference in its entirety).

In some embodiments, RNA is amplified with reverse transcriptase PCR. In some embodiments, RNA is amplified with real-time reverse transcriptase PCR, such as one-step real-time reverse transcriptase PCR using a chimeric fluorescence method as previously described (Li et al., "Development of noninvasive prenatal diagnosis of trisomy 21 by RT-MLPA with a new set of SNP markers," Arch Gynecol Obstet, July 5, 2013, DOI 10.1007/s00404-013-2926-5; Lo et al., "Plasma placental RNA allelic ratio permits noninvasive prenatal chromosomal aneuploidy detection," Nature Medicine 13:218-223, 2007; Tsui et al., "Systematic micro-array based identification of placental mRNA in maternal plasma: towards non-invasive prenatal gene expression profiling," J Med Genet 41:461-467, 2004; Gu et al., J Neurochem 122:641-649, 2012, each of which is incorporated herein by reference in its entirety).

In some embodiments, RNA is detected using a microarray. For example, a human microRNA microarray from Agilent Technologies can be used according to the manufacturer's protocol. Briefly, isolated RNA is dephosphorylated and ligated to pCp-Cy3. The labeled RNA is purified and hybridized to microRNA arrays containing probes for human mature microRNAs based on Sanger miRBase release 14.0. The arrays are washed and scanned using a microarray scanner (G2565BA, Agilent Technologies). The intensity of each hybridization signal is evaluated with Agilent Feature Extraction software v9.5.3. Labeling, hybridization, and scanning can be performed according to the protocols for the Agilent microRNA microarray system (Gu et al., J Neurochem 122:641-649, 2012, which is incorporated herein by reference in its entirety).

In some embodiments, RNA is detected using a TaqMan assay. An exemplary assay is the TaqMan Array Human MicroRNA Panel v1.0 (preview release) (Applied Biosystems), which contains 157 TaqMan microRNA assays, each including the respective reverse transcription primer, PCR primers, and TaqMan probe (Chim et al., "Detection and characterization of placental microRNAs in maternal plasma," Clin Chem 54(3):482-490, 2008, which is incorporated herein by reference in its entirety).

If desired, the mRNA splicing pattern of one or more mRNAs can be determined using standard methods (Fackenthal and Godley, Disease Models & Mechanisms 1:37-42, 2008, doi:10.1242/dmm.000331, which is incorporated herein by reference in its entirety). For example, high-density microarrays and/or high-throughput DNA sequencing can be used to detect mRNA splice variants.

In some embodiments, the transcriptome is measured using whole-transcriptome shotgun sequencing or arrays.

Exemplary amplification methods

Improved PCR amplification methods have been developed that minimize or prevent interference caused by amplifying nearby or adjacent target loci in the same reaction volume (for example, as part of a single-sample multiplex PCR reaction that amplifies all target loci simultaneously). These methods can be used to amplify nearby or adjacent target loci simultaneously, which is faster and cheaper than separating nearby target loci into different reaction volumes so that they can be amplified individually to avoid interference.

In some embodiments, target loci are amplified using a polymerase (for example, a DNA polymerase, an RNA polymerase, or a reverse transcriptase) with low 5'→3' exonuclease activity and/or low strand displacement activity. In some embodiments, the low level of 5'→3' exonuclease activity reduces or prevents degradation of a nearby primer (for example, an unextended primer or a primer to which one or more nucleotides have been added during primer extension). In some embodiments, the low level of strand displacement activity reduces or prevents displacement of a nearby primer (for example, an unextended primer or a primer to which one or more nucleotides have been added during primer extension). In some embodiments, the target loci are adjacent to each other (for example, with no bases between them) or nearby (for example, within 50, 40, 30, 20, 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 base of each other). In some embodiments, the 3' end of one locus is within 50, 40, 30, 20, 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 base of the 5' end of the next downstream locus.

In some embodiments, at least 100; 200; 500; 750; 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci are amplified, for example by simultaneous amplification in one reaction volume. In some embodiments, at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.5% of the amplified products are target amplicons. In various embodiments, the proportion of amplified products that are target amplicons is 50% to 99.5%, such as 60% to 99%, 70% to 98%, 80% to 98%, 90% to 99.5%, or 95% to 99.5%. In some embodiments, at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.5% of the target loci are amplified (for example, at least 5-, 10-, 20-, 30-, 50-, or 100-fold), such as by simultaneous amplification in one reaction volume. In various embodiments, the proportion of target loci that are amplified (for example, at least 5-, 10-, 20-, 30-, 50-, or 100-fold relative to before amplification) is 50% to 99.5%, such as 60% to 99%, 70% to 98%, 80% to 99%, 90% to 99.5%, 95% to 99.9%, or 98% to 99.99%. In some embodiments, fewer non-target amplicons are generated, such as fewer amplicons formed from the forward primer of a first primer pair and the reverse primer of a second primer pair. Such undesired non-target amplicons can be generated by existing amplification methods if, for example, the reverse primer of the first primer pair and/or the forward primer of the second primer pair is degraded and/or displaced.
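To make the percentages above concrete, the hypothetical helpers below compute (a) the fraction of amplified products, such as sequencing reads, that are target amplicons, and (b) the fraction of target loci amplified at least a given fold. The amplicon labels and copy numbers are illustrative placeholders.

```python
def on_target_fraction(product_counts, target_ids):
    """Fraction of amplified products (e.g. reads) assigned to target amplicons."""
    total = sum(product_counts.values())
    on_target = sum(n for product, n in product_counts.items() if product in target_ids)
    return on_target / total if total else 0.0


def fraction_loci_amplified(copies_before, copies_after, min_fold=10):
    """Fraction of target loci whose copy number rose at least `min_fold` times."""
    amplified = sum(
        1
        for locus, before in copies_before.items()
        if before > 0 and copies_after.get(locus, 0) / before >= min_fold
    )
    return amplified / len(copies_before)


# "F1xR2_dimer" stands for a non-target product of forward primer 1 and reverse primer 2.
products = {"locus1": 90, "locus2": 5, "F1xR2_dimer": 5}
print(on_target_fraction(products, {"locus1", "locus2"}))
print(fraction_loci_amplified({"a": 100, "b": 100}, {"a": 5000, "b": 500}))
```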

In some embodiments, these methods allow longer extension times because a polymerase bound to an extending primer is less likely to degrade and/or displace a nearby primer (such as the next downstream primer), owing to the low 5'→3' exonuclease activity and/or low strand displacement activity of the polymerase. In various embodiments, reaction conditions (such as extension time and temperature) are used such that the extension rate of the polymerase allows the number of nucleotides added to an extending primer to equal or exceed 80%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 175%, or 200% of the number of nucleotides between the 3' end of its primer binding site and the 5' end of the next downstream primer binding site on the same strand.
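The extension-time condition above reduces to simple arithmetic. The number of nucleotides the polymerase can add during the extension step should be at least the stated percentage of the gap between adjacent primer binding sites. The minimal sketch below assumes a constant extension rate supplied by the user. The rate is a placeholder parameter, since actual rates depend on the polymerase and conditions, and the function name is hypothetical.

```python
import math


def min_extension_time_s(gap_nt, rate_nt_per_s, percent=100):
    """Shortest extension step (seconds) letting the polymerase add at least
    `percent`% of the nucleotides up to the next downstream primer site.

    gap_nt: nucleotides between the 3' end of one primer binding site and the
            5' end of the next downstream primer binding site (same strand).
    rate_nt_per_s: assumed constant extension rate (placeholder value).
    """
    required_nt = math.ceil(gap_nt * percent / 100)
    return required_nt / rate_nt_per_s


for pct in (80, 100, 120):
    print(pct, min_extension_time_s(150, rate_nt_per_s=50, percent=pct))
```

Overshooting the gap (percent above 100) is only tolerable because, per the text, the polymerase has little 5'→3' exonuclease or strand displacement activity and so does not destroy the downstream primer.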

In some embodiments, a DNA polymerase is used to generate DNA amplicons using DNA as the template. In some embodiments, an RNA polymerase is used to generate RNA amplicons using DNA as the template. In some embodiments, a reverse transcriptase is used to generate cDNA amplicons using RNA as the template.

In some embodiments, the low level of 5'→3' exonuclease activity of the polymerase is less than 80%, 70%, 60%, 50%, 40%, 30%, 20%, 10%, 5%, 1%, or 0.1% of the activity of the same amount of Thermus aquaticus polymerase under the same conditions ("Taq" polymerase, a commonly used DNA polymerase from a thermophilic bacterium, PDB 1BGX, EC 2.7.7.7; Murali et al., "Crystal structure of Taq DNA polymerase in complex with an inhibitory Fab: the Fab is directed against an intermediate in the helix-coil dynamics of the enzyme," Proc Natl Acad Sci USA 95:12562-12567, 1998, which is incorporated herein by reference in its entirety). In some embodiments, the low level of strand displacement activity of the polymerase is less than 80%, 70%, 60%, 50%, 40%, 30%, 20%, 10%, 5%, 1%, or 0.1% of the activity of the same amount of Taq polymerase under the same conditions.

In some embodiments, the polymerase is a PHUSION DNA polymerase, such as PHUSION High-Fidelity DNA Polymerase (M0530S, New England Biolabs) or PHUSION Hot Start Flex DNA Polymerase (M0535S, New England Biolabs; Frey and Suppmann, BioChemica 2:34-35, 1995; Chester and Marshak, Analytical Biochemistry 209:284-290, 1993, each of which is incorporated herein by reference in its entirety). PHUSION DNA polymerase is a Pyrococcus-like enzyme fused to a processivity-enhancing domain. PHUSION DNA polymerase has 5'→3' polymerase activity and 3'→5' exonuclease activity, and generates blunt-ended products. PHUSION DNA polymerase lacks 5'→3' exonuclease activity and strand displacement activity.

In some embodiments, the polymerase is a Q5 DNA polymerase, such as Q5 High-Fidelity DNA Polymerase (M0491S, New England Biolabs) or Q5 Hot Start High-Fidelity DNA Polymerase (M0493S, New England Biolabs). Q5 High-Fidelity DNA Polymerase is a high-fidelity, thermostable DNA polymerase with 3'→5' exonuclease activity, fused to a processivity-enhancing Sso7d domain. Q5 High-Fidelity DNA Polymerase lacks 5'→3' exonuclease activity and strand displacement activity.

In some embodiments, the polymerase is T4 DNA polymerase (M0203S, New England Biolabs; Tabor and Struhl, "DNA-dependent DNA polymerases," in Ausubel et al. (eds.), Current Protocols in Molecular Biology, pp. 3.5.10-3.5.12, New York: John Wiley & Sons, 1989; Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd ed.), pp. 5.44-5.47, Cold Spring Harbor Laboratory Press, 1989, each of which is incorporated herein by reference in its entirety). T4 DNA polymerase catalyzes DNA synthesis in the 5'→3' direction and requires the presence of a template and a primer. The enzyme has a 3'→5' exonuclease activity that is much higher than that found in DNA polymerase I. T4 DNA polymerase lacks 5'→3' exonuclease activity and strand displacement activity.

In some embodiments, polymerase is that (M0327S, New England's biology are real by sulfolobus solfataricus DNA polymerase i V Yan Shi Co., Ltd；(Byrd rope et al. (2001) nucleic acids research, 29:4607-4616,2001；MacDonald, 2006).Nucleic acid Research " 34:1102-1111,2006, be integrally incorporated herein each by reference).Sulfo group bacillus DNA polymerase i V is Heat-staple Y family lesion bypasses archaeal dna polymerase, in a variety of DNA profiling lesion MacDonalds, JP et al. (2006) nucleic acid Research, 34,1102-1111, be incorporated herein by reference in their entirety).Sulfolobus solfataricus DNA polymerase i V lacks outside 5' → 3' nucleic acid Enzyme cutting activity and strand-displacement activity.

In some embodiments, if a primer binds a region containing a SNP, the primer may bind and amplify the different alleles with different efficiencies, or may bind and amplify only one allele. For a heterozygous subject, one of the alleles may therefore not be amplified by the primer. In some embodiments, a primer is designed for each allele. For example, if there are two alleles (for example, a biallelic SNP), two primers can be used to bind the same location of the target locus (for example, one forward primer that binds the "A" allele and another forward primer that binds the "B" allele). Standard resources, such as the dbSNP database, can be used to determine the locations of known SNPs, such as SNP hotspots with high heterozygosity rates.

In some embodiments, the amplicons are similar in size. In some embodiments, the lengths of the target amplicons span a range of less than 100, 75, 50, 25, 15, 10, or 5 nucleotides. In some embodiments (such as amplifying target loci in fragmented DNA or RNA), the target amplicons are 50 to 100 nucleotides long, such as 60 to 80 nucleotides or 60 to 75 nucleotides. In some embodiments (such as amplifying multiple target loci throughout an exon or gene), the target amplicons are 100 to 500 nucleotides long, such as 150 to 450 nucleotides, 200 to 400 nucleotides, 200 to 300 nucleotides, or 300 to 400 nucleotides.
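A design could be checked against these size constraints with a few lines of code. The sketch below is hypothetical: its defaults take the fragmented-DNA embodiment (50 to 100 nucleotides, a length range below 25 nucleotides), and any of the other stated values can be passed instead.

```python
def check_amplicon_lengths(lengths, min_len=50, max_len=100, max_spread=25):
    """Return a list of problems with a set of target amplicon lengths (nt)."""
    problems = []
    spread = max(lengths) - min(lengths)
    if spread >= max_spread:
        problems.append(f"length range {spread} nt is not below {max_spread} nt")
    for length in lengths:
        if not min_len <= length <= max_len:
            problems.append(f"amplicon of {length} nt outside {min_len}-{max_len} nt")
    return problems


print(check_amplicon_lengths([62, 70, 75]))  # satisfies the defaults
print(check_amplicon_lengths([45, 60, 90]))  # too wide a range, one too short
```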

In some embodiments, multiple target loci are amplified simultaneously using primer pairs that include a forward and a reverse primer for each target locus to be amplified in the reaction volume. In some embodiments, one round of PCR is performed with a single primer per target locus, followed by a second round of PCR with a primer pair per target locus. For example, the first round of PCR can be performed with a single primer per target locus such that all primers bind the same strand (such as by using a forward primer for each target locus). This allows the PCR to amplify in a linear manner and reduces or eliminates amplification bias between amplicons caused by differences in sequence or length. In some embodiments, the amplicons are then amplified using a forward and a reverse primer for each target locus.
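Why linear amplification limits bias can be seen with a simplified model. The model assumes a fixed per-cycle efficiency for each amplicon and ignores plateau effects. With a primer pair, efficiency differences compound every cycle. With a single primer, each cycle copies only the original strands, so the difference stays additive. The efficiencies 0.95 and 0.85 are hypothetical values chosen for illustration.

```python
def exponential_copies(start, efficiency, cycles):
    """Primer pair: every product also serves as template, so copies compound."""
    return start * (1 + efficiency) ** cycles


def linear_copies(start, efficiency, cycles):
    """Single primer per locus: only the original strands are copied each cycle."""
    return start * (1 + efficiency * cycles)


cycles = 10
fast, slow = 0.95, 0.85  # hypothetical per-cycle efficiencies of two amplicons
exp_bias = exponential_copies(1, fast, cycles) / exponential_copies(1, slow, cycles)
lin_bias = linear_copies(1, fast, cycles) / linear_copies(1, slow, cycles)
print(f"exponential bias {exp_bias:.2f}x, linear bias {lin_bias:.2f}x")
```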

Exemplary primer design methods

If desired, multiplex PCR can be performed using primers with a reduced likelihood of forming primer dimers. In particular, highly multiplexed PCR often produces a very high proportion of product DNA arising from unproductive side reactions such as primer dimer formation. In one embodiment, the particular primers most likely to cause unproductive side reactions are removed from the primer library, giving a primer library that yields a greater proportion of amplified DNA that maps to the genome. Removing problematic primers, that is, those particularly likely to form dimers, has unexpectedly enabled very high levels of PCR multiplexing for subsequent analysis by sequencing.

There are many ways to select primers for a library so that the amount of non-mapping primer dimer or other primer mischief products is minimized. Empirical data indicate that a small number of "bad" primers are responsible for a large share of non-mapping primer dimer side reactions. Removing these "bad" primers can increase the percentage of sequence reads that map to the targeted loci. One way to identify the "bad" primers is to examine DNA sequencing data from targeted amplification; the primers involved in the most frequently observed primer dimers can be removed, giving a primer library that is much less likely to produce side-product DNA that does not map to the genome. There are also publicly available programs that calculate the binding energy of various primer combinations, and removing the primers with the highest binding energy likewise gives a primer library that is much less likely to produce side-product DNA that does not map to the genome.
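The sequencing-based approach can be sketched as a simple tally. The sketch assumes an upstream step (not shown) has already classified each read as a primer dimer and attributed it to the two primers that formed it. It counts how often each primer appears and proposes the most frequent offenders for removal. The function name and primer IDs are hypothetical.

```python
from collections import Counter


def primers_to_remove(dimer_observations, n_remove):
    """Rank primers by how many primer-dimer reads they take part in.

    dimer_observations: one (primer_a, primer_b) tuple per read classified as a
    primer dimer; a self-dimer has primer_a == primer_b and is counted once.
    """
    involvement = Counter()
    for a, b in dimer_observations:
        involvement[a] += 1
        if b != a:
            involvement[b] += 1
    return [primer for primer, _ in involvement.most_common(n_remove)]


observed = [("P1", "P7")] * 50 + [("P7", "P3")] * 30 + [("P2", "P4")] * 2  # toy data
print(primers_to_remove(observed, 2))
```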

In some embodiments for selecting primers, an initial library of candidate primers is created by designing one or more primers or primer pairs for each candidate target locus. A set of candidate target loci (such as SNPs) can be selected based on publicly available information about desired parameters of the target loci, such as the frequency of the SNPs in a target population or the heterozygosity rate of the SNPs. In one embodiment, PCR primers can be designed using the Primer3 program (on the World Wide Web at primer3.sourceforge.net; libprimer3 release 2.2.3, which is incorporated herein by reference in its entirety). If desired, the primers can be designed to anneal within a particular annealing temperature range, to have a particular range of GC content, to have a particular size range, to produce target amplicons within a particular size range, and/or to have other parameter characteristics. Starting with multiple primers or primer pairs per candidate target locus increases the likelihood that a primer or primer pair will remain in the library for most or all of the target loci. In one embodiment, the selection criteria may require that at least one primer pair per target locus remain in the library. In this way, most or all of the target loci will be amplified when the final primer library is used. This is desirable for applications such as screening for deletions or duplications at a large number of locations in the genome, or screening for a large number of sequences (such as polymorphisms or other mutations) associated with a disease or an increased risk of disease. If a primer pair from the library would produce a target amplicon that overlaps a target amplicon produced by another primer pair, one of the primer pairs can be removed from the library to prevent interference.

In some embodiments, an "undesirability score" (a higher score indicating lower desirability) is calculated (for example, in silico) for most or all possible combinations of two primers from the library of candidate primers. In various embodiments, undesirability scores are calculated for at least 80%, 90%, 95%, 98%, 99%, or 99.5% of the possible combinations of candidate primers in the library. Each undesirability score is based at least in part on the likelihood of dimer formation between the two candidate primers. If desired, the undesirability score can also be based on one or more other parameters selected from the following: the heterozygosity rate of the target locus, the incidence of a sequence (for example, a polymorphism) at the target locus, the disease association of a sequence (for example, a polymorphism) at the target locus, the specificity of the candidate primer for the target locus, the size of the candidate primer, the melting temperature of the target amplicon, the GC content of the target amplicon, the amplification efficiency of the target amplicon, the size of the target amplicon, and the distance from the center of a recombination hotspot. In some embodiments, the specificity of a candidate primer for its target locus includes the likelihood that the candidate primer will mis-prime by binding and amplifying a locus other than the target locus it was designed to amplify. In some embodiments, one or more or all of the candidate primers that mis-prime are removed from the library. In some embodiments, to increase the number of candidate primers to choose from, candidate primers that may mis-prime are not removed from the library.

If multiple factors are considered, the undesirability score can be calculated as a weighted average of the various parameters. The parameters can be assigned different weights according to their importance for the particular application in which the primers will be used. In some embodiments, the primer with the highest undesirability score is removed from the library. If the removed primer is a member of a primer pair that hybridizes to one target locus, the other member of the primer pair can also be removed from the library. The process of removing primers can be repeated as needed. In some embodiments, the selection method is performed until the undesirability scores of all candidate primer combinations remaining in the library are equal to or below a minimum threshold. In some embodiments, the selection method is performed until the number of candidate primers remaining in the library is reduced to a desired number.
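The scoring and removal loop above can be sketched in a few lines. This is a minimal illustration, not the method as claimed. "The primer with the highest undesirability score" is interpreted here as the primer belonging to the highest-scoring remaining combination, and ties are broken by primer name. All function and parameter names are hypothetical.

```python
def weighted_score(params, weights):
    """Weighted average of undesirability parameters (higher = less desirable)."""
    total_weight = sum(weights[name] for name in params)
    return sum(weights[name] * value for name, value in params.items()) / total_weight


def prune_library(pair_scores, partner, threshold):
    """Repeatedly drop the primer in the worst remaining combination, plus its mate.

    pair_scores: {frozenset({primer_a, primer_b}): undesirability score}
    partner: {primer: the other member of its primer pair}
    Stops once every remaining combination scores at or below `threshold`.
    """
    remaining = set(partner)

    def worst(primer):
        # Highest score among combinations whose members are all still present.
        return max(
            (s for combo, s in pair_scores.items() if primer in combo and combo <= remaining),
            default=float("-inf"),
        )

    while remaining:
        candidate = max(sorted(remaining), key=worst)  # sorted() makes ties deterministic
        if worst(candidate) <= threshold:
            break
        remaining.discard(candidate)
        remaining.discard(partner[candidate])
    return remaining


partner = {"F1": "R1", "R1": "F1", "F2": "R2", "R2": "F2", "F3": "R3", "R3": "F3"}
scores = {frozenset({"F1", "R2"}): 0.9, frozenset({"F2", "R3"}): 0.2,
          frozenset({"F1", "R1"}): 0.1}  # toy scores
print(sorted(prune_library(scores, partner, threshold=0.5)))
```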

In various embodiments, after the undesirability scores are calculated, the candidate primer that is part of the greatest number of two-primer combinations with an undesirability score above a first minimum threshold is removed from the library. This step ignores interactions equal to or below the first minimum threshold because those interactions are less significant. If the removed primer is a member of a primer pair that hybridizes to one target locus, the other member of the primer pair can also be removed from the library. The process of removing primers can be repeated as needed. In some embodiments, the selection method is performed until the undesirability scores of all candidate primer combinations remaining in the library are equal to or below the first minimum threshold. If the number of candidate primers remaining in the library is higher than desired, the number of primers can be reduced by lowering the first minimum threshold to a lower second minimum threshold and repeating the process of removing primers. If the number of candidate primers remaining in the library is lower than desired, the method can be continued by raising the first minimum threshold to a higher second minimum threshold and repeating the process of removing primers starting from the original library of candidate primers, thereby allowing more candidate primers to remain in the library. In some embodiments, the selection method is performed until the undesirability scores of all candidate primer combinations remaining in the library are equal to or below the second minimum threshold, or until the number of candidate primers remaining in the library is reduced to a desired number.
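The threshold-based variant can be sketched as follows, under stated assumptions. Each round removes the primer found in the most above-threshold combinations, with ties broken by name, and that primer's pair mate is removed too. The threshold adjustment is simplified to trying a list of candidate thresholds from most to least permissive, restarting from the full candidate library each time. Function names and scores are hypothetical.

```python
from collections import Counter


def prune_by_bad_combinations(pair_scores, partner, threshold):
    """Remove, one at a time, the primer found in the most combinations scoring
    above `threshold` (its pair mate goes too) until no such combination remains."""
    remaining = set(partner)
    while True:
        degree = Counter()
        for combo, score in pair_scores.items():
            if score > threshold and combo <= remaining:
                for primer in combo:
                    degree[primer] += 1
        if not degree:
            return remaining
        # Most above-threshold combinations first; ties broken by primer name.
        worst = min(degree, key=lambda p: (-degree[p], p))
        remaining.discard(worst)
        remaining.discard(partner[worst])


def prune_to_size(pair_scores, partner, thresholds, max_primers):
    """Try thresholds from most to least permissive, each time restarting from the
    full candidate library; return the first (threshold, library) with at most
    `max_primers` primers, or None if no threshold is strict enough."""
    for threshold in sorted(thresholds, reverse=True):
        library = prune_by_bad_combinations(pair_scores, partner, threshold)
        if len(library) <= max_primers:
            return threshold, library
    return None


partner = {"F1": "R1", "R1": "F1", "F2": "R2", "R2": "F2", "F3": "R3", "R3": "F3"}
scores = {frozenset({"F1", "R2"}): 0.9, frozenset({"F1", "R3"}): 0.8,
          frozenset({"F2", "R3"}): 0.3}  # toy scores
print(prune_to_size(scores, partner, thresholds=[0.2, 0.5], max_primers=2))
```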

If desired, primer pairs that produce a target amplicon overlapping a target amplicon produced by another primer pair can be divided into separate amplification reactions. Multiple PCR amplification reactions may be preferable for applications in which it is desirable to analyze all candidate target loci (rather than omitting candidate target loci from the analysis because of overlapping target amplicons).
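One way to split overlapping amplicons into as few separate reactions as possible is greedy interval partitioning, a standard algorithm that is swapped in here for illustration and is not taken from the source. Amplicons are processed by start coordinate, and each is placed in a pool whose last amplicon has already ended, or in a new pool if none is free. The sketch assumes half-open coordinates on a single chromosome; the function name and amplicons are hypothetical.

```python
import heapq


def assign_pools(amplicons):
    """Split amplicons into the fewest pools with no overlaps inside a pool.

    amplicons: (name, start, end) tuples with half-open [start, end)
    coordinates on a single chromosome.
    """
    pools = []   # pools[i] lists the amplicon names in pool i
    ends = []    # min-heap of (end of last amplicon placed in pool, pool index)
    for name, start, end in sorted(amplicons, key=lambda a: (a[1], a[2])):
        if ends and ends[0][0] <= start:
            _, idx = heapq.heappop(ends)   # reuse the pool that frees up earliest
        else:
            idx = len(pools)               # every pool overlaps: open a new one
            pools.append([])
        pools[idx].append(name)
        heapq.heappush(ends, (end, idx))
    return pools


amplicons = [("A", 0, 100), ("B", 80, 180), ("C", 150, 250), ("D", 200, 300)]
print(assign_pools(amplicons))
```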

These selection methods minimize the number of candidate primers that must be removed from the library to achieve the desired reduction in primer dimers. Because fewer candidate primers are removed from the library, more (or all) of the target loci can be amplified using the resulting primer library.

Multiplexing large numbers of primer pairs imposes considerable constraints on the assays that can be included. Assays that interact unintentionally produce spurious amplification products. The size constraints of mini-PCR may impose further constraints. In one embodiment, it is possible to start with a very large number of potential SNP targets (from about 500 to more than 1 million) and attempt to design primers to amplify each SNP. Where primers can be designed, primer pairs likely to form spurious products can be identified by using published thermodynamic parameters for DNA duplex formation to evaluate the likelihood of spurious primer duplex formation between all possible pairs of primers. Primer interactions can be ranked by a scoring function related to the interaction, and the primers with the worst interaction scores eliminated until the desired number of primers is reached. Where SNPs likely to be heterozygous are most useful, the list of assays can also be ranked so that the most heterozygous compatible assays are selected. Experiments have validated that primers with high interaction scores are the most likely to form primer dimers. At high multiplexing it is not possible to eliminate all spurious interactions, but it is essential to remove the primers or primer pairs with the highest interaction scores, because they can dominate an entire reaction and greatly limit amplification from the intended targets. We have performed this procedure to create multiplex primer sets of up to, and in some cases more than, 10,000 primers. The improvement from this procedure is substantial: as determined by sequencing all PCR products, more than 80%, more than 90%, more than 95%, more than 98%, and even more than 99% of amplification was on-target, compared with 10% in a reaction in which the worst primers were not removed. When combined with a partial semi-nested approach as previously described, more than 90%, and even more than 95%, of amplicons may map to the targeted sequences.
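The text relies on published thermodynamic parameters for DNA duplex formation. The sketch below swaps in a much cruder proxy for illustration only: the longest stretch at a primer's 3' end that can base-pair antiparallel with another primer, scored in both directions and ignored below a minimum length. It then eliminates the primer with the worst interaction, self-dimers included, until a target count is reached. The scoring rule, function names, and sequences are assumptions, not the scoring function used by the inventors.

```python
from itertools import combinations_with_replacement

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_overlap(a, b):
    """Length of the longest 3'-terminal stretch of `a` that can pair with `b`."""
    b_rc = revcomp(b)
    best = 0
    for k in range(1, len(a) + 1):
        if a[-k:] in b_rc:   # a's last k bases pair antiparallel with part of b
            best = k
        else:
            break            # a longer suffix cannot match if this one does not
    return best


def interaction_score(a, b, min_len=3):
    """Crude dimer score: longest 3' overlap in either direction, 0 if short."""
    score = max(three_prime_overlap(a, b), three_prime_overlap(b, a))
    return score if score >= min_len else 0


def eliminate_worst(primers, target_count, min_len=3):
    """Drop the primer with the worst interaction (self-dimers included) until
    `target_count` primers remain. primers: {name: 5'->3' sequence}."""
    remaining = dict(primers)
    while len(remaining) > target_count:
        worst = {name: 0 for name in remaining}
        for x, y in combinations_with_replacement(sorted(remaining), 2):
            s = interaction_score(remaining[x], remaining[y], min_len)
            worst[x] = max(worst[x], s)
            worst[y] = max(worst[y], s)
        drop = min(worst, key=lambda n: (-worst[n], n))  # ties broken by name
        del remaining[drop]
    return remaining


toy = {"P1": "ATATTTGC", "P2": "GGGGCAAA", "P3": "AAAAAAAA"}  # toy sequences
print(sorted(eliminate_worst(toy, target_count=2)))
```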

Note that there are other methods for determining which PCR probes are likely to form dimers. In one embodiment, analyzing a pool of DNA amplified with a non-optimized primer set may be sufficient to identify problematic primers. For example, the analysis can be performed by sequencing, and the dimers present in the greatest numbers are identified as those most likely to form and can be removed. In one embodiment, this primer design method can be used in combination with the mini-PCR method described herein.

Tags on the primers can be used to reduce amplification and sequencing of primer dimer products. In some embodiments, the primer contains an internal region that forms a loop structure with a tag. In particular embodiments, the primer includes a 5' region that is specific to the target locus, an internal region that is not specific to the target locus and forms a loop structure, and a 3' region that is specific to the target locus. In some embodiments, the loop region lies between two binding regions that are designed to bind contiguous or neighboring regions of the template DNA. In various embodiments, the 3' region is at least 7 nucleotides long. In some embodiments, the 3' region is 7 to 20 nucleotides long, such as 7 to 15 nucleotides or 7 to 10 nucleotides. In various embodiments, the primer includes a 5' region that is not specific to the target locus (such as a tag or a universal primer binding site), followed by a region specific to the target locus, an internal region that is not specific to the target locus and forms a loop structure, and a 3' region specific to the target locus. Tagged primers can be used to shorten the required target-specific sequence to below 20, below 15, below 12, or even below 10 base pairs. This can occur by chance with standard primer design when the target sequence is fragmented within the primer binding site, or it can be designed into the primers deliberately. Advantages of this method include that it increases the number of assays that can be designed for a given maximum amplicon length, and that it shortens the "non-informative" sequencing of primer sequence. It can also be used in combination with internal tagging.

In one embodiment, the relative amount of nonproductive products in multiplexed targeted PCR amplification can be reduced by raising the annealing temperature. When amplifying a library that carries the same tag as the target-specific primers, the annealing temperature can be higher than for genomic DNA, because the tag contributes to primer binding. In some embodiments, the annealing time is longer than 3 minutes, longer than 5 minutes, longer than 8 minutes, longer than 10 minutes, longer than 15 minutes, longer than 20 minutes, longer than 30 minutes, longer than 60 minutes, longer than 120 minutes, longer than 240 minutes, longer than 480 minutes, or even longer than 960 minutes. In certain illustrative embodiments, longer annealing times are used together with reduced primer concentrations. In various embodiments, longer than normal extension times are used, such as longer than 3, 5, 8, 10, or 15 minutes. In some embodiments, primer concentrations as low as 50 nM, 20 nM, 10 nM, 5 nM, 1 nM, or below 1 nM are used. This surprisingly results in robust performance for highly multiplexed reactions, such as 1,000-plex, 2,000-plex, 5,000-plex, 10,000-plex, 20,000-plex, 50,000-plex, or even 100,000-plex reactions. In one embodiment, the amplification uses one, two, three, four, or five cycles with long annealing times, followed by PCR cycles with more usual annealing times using tagged primers.

To select target locations, one can start with a pool of candidate primer pair designs, create a thermodynamic model of potentially adverse interactions between primer pairs, and then use the model to eliminate designs that are incompatible with other designs in the pool.

In one embodiment, the invention features a method of reducing the number of target loci (such as loci that may contain a polymorphism or mutation associated with a disease or disorder, or with an increased risk of a disease or disorder such as cancer) and/or increasing the disease load detected (for example, increasing the number of polymorphisms or mutations detected). In some embodiments, the method includes ranking the loci (such as from highest to lowest) by the frequency or recurrence of a polymorphism or mutation (such as a single nucleotide variation, insertion, deletion, or any other variation described herein) at each locus among subjects with the disease or disorder, such as cancer. In some embodiments, PCR primers are designed for some or all of the loci. When PCR primers are selected for a primer library, primers for loci with higher frequency or recurrence (higher-ranking loci) are favored over primers for loci with lower frequency or recurrence (lower-ranking loci). In some embodiments, this parameter is included as one of the parameters in calculating the undesirability scores described herein. If desired, primers that are incompatible with other designs in the library (such as primers for high-ranking loci) can be included in a different PCR library/pool. In some embodiments, multiple libraries/pools (such as 2, 3, 4, 5, or more) are used in separate PCR reactions so that all (or most) of the loci represented by all the libraries/pools can be amplified. In some embodiments, the method is continued until enough primers are included in one or more libraries/pools that the primers, in aggregate, capture the desired disease load for the disease or disorder (for example, by detecting at least 80%, 85%, 90%, 95%, or 99% of the disease load).
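The ranking step can be sketched as a greedy selection. Loci are taken in order of how often mutations recur at them, and selection stops once the chosen loci cover the desired share of the disease load. For illustration, disease load is approximated as the share of all mutation observations in a reference cohort that fall at the selected loci. This proxy, the function name, and the toy counts are assumptions, and a per-patient coverage measure would need a different calculation.

```python
def select_loci_for_burden(mutation_counts, target_fraction):
    """Greedily pick loci in order of mutation recurrence until the picked loci
    account for at least `target_fraction` of all observed mutations.

    mutation_counts: {locus: mutations observed at that locus in a reference cohort}
    Returns (chosen loci, fraction of observed mutations they cover).
    """
    total = sum(mutation_counts.values())
    chosen, covered = [], 0
    ranked = sorted(mutation_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for locus, count in ranked:
        if covered / total >= target_fraction:
            break
        chosen.append(locus)
        covered += count
    return chosen, covered / total


counts = {"L1": 50, "L2": 30, "L3": 15, "L4": 5}  # toy recurrence counts
print(select_loci_for_burden(counts, 0.9))
```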

Exemplary primer libraries

In one aspect, the invention features primer libraries, such as primers selected from a library of candidate primers using any of the methods of the invention. In some embodiments, the library includes primers that simultaneously hybridize (or are capable of simultaneously hybridizing) to, or simultaneously amplify (or are capable of simultaneously amplifying), at least 100; 200; 500; 750; 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different target loci. In various embodiments, the library includes primers that simultaneously amplify (or are capable of simultaneously amplifying) between 100 to 500; 500 to 1,000; 1,000 to 2,000; 2,000 to 5,000; 5,000 to 7,500; 7,500 to 10,000; 10,000 to 20,000; 20,000 to 25,000; 25,000 to 30,000; 30,000 to 40,000; 40,000 to 50,000; 50,000 to 75,000; or 75,000 to 100,000 different target loci in one reaction volume. In various embodiments, the library includes primers that simultaneously amplify (or are capable of simultaneously amplifying) 1,000 to 100,000 different target loci in one reaction volume, such as between 1,000 to 50,000; 1,000 to 30,000; 1,000 to 20,000; 1,000 to 10,000; 2,000 to 30,000; 2,000 to 20,000; 2,000 to 10,000; 5,000 to 30,000; 5,000 to 20,000; or 5,000 to 10,000 different target loci. In some embodiments, the library includes primers that simultaneously amplify (or are capable of simultaneously amplifying) the target loci in one reaction volume such that less than 60%, 40%, 30%, 20%, 10%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.25%, 0.1%, or 0.05% of the amplified products are primer dimers. In various embodiments, the amount of amplified products that are primer dimers is between 0.5% to 60%, such as between 0.1% to 40%, 0.1% to 20%, 0.25% to 20%, 0.25% to 10%, 0.5% to 20%, 0.5% to 10%, 1% to 20%, or 1% to 10%. In some embodiments, the primers simultaneously amplify (or are capable of simultaneously amplifying) the target loci in one reaction volume such that at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.5% of the amplified products are target amplicons. In various embodiments, the amount of amplified products that are target amplicons is between 50% to 99.5%, such as between 60% to 99%, 70% to 98%, 80% to 98%, 90% to 99.5%, or 95% to 99.5%. In some embodiments, the primers simultaneously amplify (or are capable of simultaneously amplifying) the target loci in one reaction volume such that at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.5% of the target loci are amplified (for example, amplified at least 5, 10, 20, 30, 50, or 100-fold compared to the amount prior to amplification). In various embodiments, the fraction of target loci that are amplified (for example, amplified at least 5, 10, 20, 30, 50, or 100-fold compared to the amount prior to amplification) is between 50% to 99.5%, such as between 60% to 99%, 70% to 98%, 80% to 99%, 90% to 99.5%, 95% to 99.9%, or 98% to 99.99%. In some embodiments, the primer library includes at least 100; 200; 500; 750; 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 primer pairs, wherein each pair of primers includes a forward test primer and a reverse test primer, and each pair of test primers hybridizes to a target locus. In some embodiments, the primer library includes at least 100; 200; 500; 750; 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 primers that each hybridize to a different target locus, wherein each primer is not part of a primer pair.
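The three performance measures above (primer-dimer fraction, on-target fraction, and fraction of target loci amplified at least a given fold) can be computed from a run summary as in the sketch below. It assumes, hypothetically, that reads have already been classified into categories named `target_amplicon`, `primer_dimer`, and `off_target`, and that a per-locus fold change relative to pre-amplification input is available; the classification step itself is not shown.

```python
def amplification_qc(read_classes: dict[str, int],
                     fold_change: dict[str, float],
                     min_fold: float = 10.0) -> dict[str, float]:
    """Summarize a multiplex PCR run using the fractions defined in the text.

    read_classes: read counts per (hypothetical) product category.
    fold_change: per-locus amplification relative to the pre-amplification amount.
    """
    total_reads = sum(read_classes.values())
    return {
        "primer_dimer_fraction": read_classes.get("primer_dimer", 0) / total_reads,
        "on_target_fraction": read_classes.get("target_amplicon", 0) / total_reads,
        "loci_amplified_fraction":
            sum(f >= min_fold for f in fold_change.values()) / len(fold_change),
    }


qc = amplification_qc({"target_amplicon": 950, "primer_dimer": 20, "off_target": 30},
                      {"a": 120.0, "b": 15.0, "c": 3.0, "d": 50.0})
print(qc)  # 2% dimers, 95% on target, 3 of 4 loci amplified at least 10-fold
```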

In various embodiments, the concentration of each primer is less than 100, 75, 50, 25, 20, 10, 5, 2, or 1 nM, or less than 500, 100, 10, or 1 μM. In various embodiments, the concentration of each primer is between 1 μM and 100 nM, such as between 1 μM and 1 nM, 1 and 75 nM, 2 and 50 nM, or 5 and 50 nM, inclusive. In various embodiments, the GC content of the primers is between 30% and 80%, such as between 40% and 70% or between 50% and 60%, inclusive. In some embodiments, the range in GC content of the primers is less than 30%, 20%, 10%, or 5%. In some embodiments, the range in GC content of the primers is between 5% and 30%, such as between 5% and 20% or between 5% and 10%, inclusive. In some embodiments, the melting temperature (Tm) of the test primers is between 40°C and 80°C, such as between 50°C and 70°C, 55°C and 65°C, or 57°C and 60.5°C, inclusive. In some embodiments, the Tm is calculated using the Primer3 program (libprimer3 release 2.2.3) with its built-in SantaLucia parameters (primer3.sourceforge.net on the World Wide Web). In some embodiments, the range in melting temperature of the primers is less than 15°C, 10°C, 5°C, 3°C, or 1°C. In some embodiments, the range in melting temperature of the primers is between 1°C and 15°C, such as between 1°C and 10°C, 1°C and 5°C, or 1°C and 3°C, inclusive. In some embodiments, the length of the primers is between 15 and 100 nucleotides, such as between 15 and 75, 15 and 40, 17 and 35, 18 and 30, or 20 and 65 nucleotides. In some embodiments, the range in length of the primers is less than 50, 40, 30, 20, 10, or 5 nucleotides. In some embodiments, the range in length of the primers is between 5 and 50 nucleotides, such as between 5 and 40, 5 and 20, or 5 and 10 nucleotides. In some embodiments, the length of the target amplicons is between 50 and 100 nucleotides, such as between 60 and 80 or 60 and 75 nucleotides. In some embodiments, the range in length of the target amplicons is less than 50, 25, 15, 10, or 5 nucleotides. In some embodiments, the range in length of the target amplicons is between 5 and 50 nucleotides, such as between 5 and 25, 5 and 15, or 5 and 10 nucleotides. In some embodiments, the library does not comprise a microarray. In some embodiments, the library comprises a microarray.
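A quick screen of a primer set against the GC-content, Tm, and length ranges above might look like the following sketch. Note the loudly stated substitution: the text specifies Primer3 with SantaLucia nearest-neighbor parameters for Tm, whereas `basic_tm` here is the much cruder "basic" formula (2(A+T) + 4(G+C) below 14 nt, otherwise 64.9 + 41(G+C - 16.4)/N), used only so the sketch has no dependencies. The function names and the report keys are hypothetical.

```python
def gc_content(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm(seq: str) -> float:
    """Approximate Tm in °C with the basic formula (NOT Primer3/SantaLucia)."""
    n = len(seq)
    gc = seq.count("G") + seq.count("C")
    if n < 14:
        return 2 * (n - gc) + 4 * gc
    return 64.9 + 41 * (gc - 16.4) / n


def library_report(primers: list[str]) -> dict[str, float]:
    """Ranges across the library, plus a check of the 30-80% GC window."""
    gcs = [gc_content(p) for p in primers]
    tms = [basic_tm(p) for p in primers]
    lengths = [len(p) for p in primers]
    return {
        "gc_range": max(gcs) - min(gcs),
        "tm_range": max(tms) - min(tms),
        "length_range": max(lengths) - min(lengths),
        "all_gc_in_30_80": all(0.30 <= g <= 0.80 for g in gcs),
    }


print(library_report(["ACGTACGTACGTACGTACGT", "GCGCGCGCGCGCATATATAT"]))
```

For production use, the Tm would instead be taken from Primer3 as stated in the text.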

95%) or all adapters or primer include in some embodiments, some (for example, at least 80%, 90% or One or more keys between adjacent nucleotide in addition to naturally occurring phosphodiester bond.The embodiment of this connection includes Phosphamide, thiophosphate are connected with phosphorodithioate.In some embodiments, some (for example, at least 8%0,90% or 95%) or all adapters or primer in last 3' nucleotide and second to including thiophosphate between last 3' nucleotide (such as monothio phosphate).95%) or all adapters in some embodiments, some (for example, at least 80%, 90% or Or primer includes that thiophosphate (such as monothio phosphate) terminates between last 2,3,4 or 5 nucleotide of the end 3'. In some embodiments, some (95%) or all adapters or primer are included at least 1 for example, at least 80%, 90% or, Last 10 nucleotide of thiophosphate (such as monothio phosphate) between 2,3,4 or 5 nucleotide in the end 3'.? In some embodiments, such primer is less likely to be cut or degraded.In some embodiments, primer does not contain digestion position Point (such as protease cutting site).

Other exemplary multiplex PCR methods and libraries are described in U.S. Application No. 13/683,604, filed November 21, 2012 (U.S. Publication No. 2013/0123120), and U.S. Application No. 61/994,791, filed May 16, 2014, each of which is hereby incorporated by reference in its entirety. These methods and libraries can be used to analyze any of the samples disclosed herein and in any of the methods of the invention.

Exemplary primer libraries for detecting recombination

In some embodiments, the primers in the primer library are designed to determine whether recombination occurred at one or more known recombination hotspots (such as crossovers between homologous human chromosomes). Knowing which crossovers occurred between chromosomes allows more accurate phased genetic data to be determined for an individual. Recombination hotspots are local regions of chromosomes in which recombination events tend to be concentrated. They are often flanked by "coldspots", regions with a lower than average frequency of recombination. Recombination hotspots tend to share a similar morphology and are approximately 1 to 2 kb in length. The distribution of hotspots is positively correlated with GC content and with the distribution of repetitive elements. A partially degenerate 13-mer motif, CCNCCNTNNCCNC, plays a role in the activity of some hotspots. The zinc finger protein PRDM9 has been shown to bind this motif and to initiate recombination at its location. The average distance between the centers of recombination hotspots is reported to be ~80 kb. In some embodiments, the distance between the centers of recombination hotspots ranges from about 3 kb to about 100 kb. Public databases include a large number of known human recombination hotspots, such as the HUMHOT and International HapMap Project databases (see, for example, Nishant et al., "HUMHOT: a database of human meiotic recombination hot spots," Nucleic Acids Research, 34: D25-D28, 2006, Database issue; Mackiewicz et al., "Distribution of Recombination Hotspots in the Human Genome: A Comparison of Computer Simulations with Real Data," PLoS ONE 8(6): e65272, doi:10.1371/journal.pone.0065272; and hapmap.ncbi.nlm.nih.gov/downloads/index.html.en on the World Wide Web, each of which is hereby incorporated by reference in its entirety).
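Locating candidate PRDM9-binding sites amounts to scanning both strands for the degenerate 13-mer CCNCCNTNNCCNC, where N matches any base. The sketch below does this with a regular expression; the lookahead lets overlapping matches be reported, and minus-strand hits are mapped back to forward-strand start coordinates (0-based). The function names are illustrative, and only unambiguous A/C/G/T input is handled.

```python
import re

MOTIF = "CCNCCNTNNCCNC"
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_regex(motif: str) -> re.Pattern:
    # Wrapping the pattern in a lookahead reports overlapping matches.
    return re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")


def find_motif(seq: str, motif: str = MOTIF) -> list[tuple[int, str]]:
    """Return sorted (forward-strand start, strand) pairs for every motif hit."""
    seq = seq.upper()
    pattern = motif_regex(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = seq.translate(COMPLEMENT)[::-1]
    n, k = len(seq), len(motif)
    # A hit at rc position s covers forward positions [n - s - k, n - s).
    hits += [(n - m.start() - k, "-") for m in pattern.finditer(rc)]
    return sorted(hits)


print(find_motif("AAACCACCATAACCACTTT"))  # [(3, '+')]
print(find_motif("AAAGTGGTTATGGTGGTTT"))  # [(3, '-')]
```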

In some embodiments, the primers in the primer library are clustered at or near recombination hotspots (such as known human recombination hotspots). In some embodiments, the corresponding amplicons are used to determine the sequence within or near a recombination hotspot in order to determine whether recombination occurred at that particular hotspot (for example, whether the amplicon sequence is the sequence expected if recombination occurred or the sequence expected if it did not). In some embodiments, primers are designed to amplify part or all of a recombination hotspot (and optionally sequence flanking the hotspot). In some embodiments, long-read sequencing (such as sequencing of up to ~10 kb using the Moleculo technology developed by Illumina) or paired-end sequencing is used to sequence part or all of a recombination hotspot. Knowledge of whether a recombination event occurred can be used to determine which haplotype blocks flank the hotspot. If desired, the presence of specific haplotype blocks can be confirmed using primers specific to regions within those blocks. In some embodiments, it is assumed that no crossovers occur between known recombination hotspots. In some embodiments, the primers in the primer library are clustered at or near the ends of chromosomes. For example, such primers can be used to determine whether a particular arm or section at the end of a chromosome is present. In some embodiments, the primers in the primer library are clustered both at or near recombination hotspots and at or near the ends of chromosomes.

In some embodiments, the primer library includes one or more primers (such as at least 5; 10; 50; 100; 200; 500; 750; 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; or 50,000 different primers or different primer pairs) that are specific for a recombination hotspot (such as a known human recombination hotspot) and/or for a region near a recombination hotspot (such as within 10, 8, 5, 3, 2, 1, or 0.5 kb of the 5' or 3' end of the hotspot). In some embodiments, at least 1, 5, 10, 20, 40, 60, 80, 100, or 150 different primers (or primer pairs) are specific for the same recombination hotspot, or for the same recombination hotspot or a region near it. In some embodiments, at least 1, 5, 10, 20, 40, 60, 80, 100, or 150 different primers (or primer pairs) are specific for a region between recombination hotspots (such as a region unlikely to have undergone recombination); these primers can be used to confirm the presence of haplotype blocks (such as those expected depending on whether recombination has occurred). In some embodiments, at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the primers in the primer library are specific for a recombination hotspot and/or for a region near a recombination hotspot (such as within 10, 8, 5, 3, 2, 1, or 0.5 kb of the 5' or 3' end of the hotspot). In some embodiments, the primer library is used to determine whether recombination has occurred at 5; 10; 50; 100; 200; 500; 750; 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; or 50,000 or more different recombination hotspots (such as known human recombination hotspots). In some embodiments, the regions targeted by primers to a recombination hotspot or a nearby region are approximately evenly spread along that portion of the genome. In some embodiments, at least 1, 5, 10, 20, 40, 60, 80, 100, or 150 different primers (or primer pairs) are specific for a region at or near the end of a chromosome (such as a region within 20, 10, 5, 1, 0.5, 0.1, 0.01, or 0.001 Mb of the end of the chromosome). In some embodiments, at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the primers in the primer library are specific for a region at or near the end of a chromosome (such as a region within 20, 10, 5, 1, 0.5, 0.1, 0.01, or 0.001 Mb of the end of the chromosome). In some embodiments, at least 10, 20, 40, 60, 80, 100, or 150 different primers (or primer pairs) are specific for a region within a potential microdeletion in a chromosome. In some embodiments, at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the primers in the primer library are specific for a region within a potential microdeletion in a chromosome. In some embodiments, at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the primers in the primer library are specific for a recombination hotspot, a region near a recombination hotspot, a region at or near the end of a chromosome, or a region within a potential microdeletion in a chromosome.
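The library-composition percentages above can be checked with a simple annotation pass. In this minimal sketch, targets are `(chromosome, position)` pairs, hotspots are `(start, end)` intervals per chromosome, and the 10 kb hotspot flank and 1 Mb chromosome-end window are hypothetical parameter choices picked from the ranges in the text. A linear scan is used for clarity; a large library would use an interval index instead.

```python
def classify_targets(targets: list[tuple[str, int]],
                     hotspots: dict[str, list[tuple[int, int]]],
                     chrom_lengths: dict[str, int],
                     flank_bp: int = 10_000,
                     end_bp: int = 1_000_000) -> tuple[float, float]:
    """Fractions of targets within `flank_bp` of a hotspot and within `end_bp` of a chromosome end."""
    near_hot = near_end = 0
    for chrom, pos in targets:
        if any(start - flank_bp <= pos <= end + flank_bp
               for start, end in hotspots.get(chrom, [])):
            near_hot += 1
        if pos <= end_bp or chrom_lengths[chrom] - pos <= end_bp:
            near_end += 1
    n = len(targets)
    return near_hot / n, near_end / n


targets = [("chr1", 95_000), ("chr1", 2_500_000), ("chr1", 4_500_000), ("chr1", 111_000)]
print(classify_targets(targets, {"chr1": [(100_000, 102_000)]}, {"chr1": 5_000_000}))
# (0.5, 0.75)
```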

Exemplary kit

In one aspect, the invention features a kit, such as a kit for amplifying target loci in a nucleic acid sample to detect deletions and/or duplications of chromosome segments or entire chromosomes using any of the methods described herein. In some embodiments, the kit can include any of the primer libraries of the invention. In one embodiment, the kit includes a plurality of inner forward primers and optionally a plurality of inner reverse primers, and optionally outer forward primers and outer reverse primers, where each primer is designed to hybridize to the region of DNA immediately upstream and/or downstream of one of the target sites (for example, polymorphic sites) on the target chromosome(s) or chromosome segment(s), and optionally on additional chromosomes or chromosome segments. In some embodiments, the kit includes instructions for using the primer library to amplify the target loci, such as for detecting one or more deletions and/or duplications of one or more chromosome segments or entire chromosomes using any of the methods described herein.

In certain embodiments, kits of the invention provide primer pairs for detecting chromosomal aneuploidy and for CNV determination, such as primer pairs for massively multiplex reactions for detecting chromosomal aneuploidies such as CNVs (CoNVERGe, Copy Number Variant Events Revealed Genotypically) and/or SNVs. In these embodiments, the kits can include at least 100, 200, 250, 300, 500, 1,000, 2,000, 2,500, 3,000, 5,000, 10,000, 20,000, 25,000, 28,000, 50,000, or 75,000 and at most 200, 250, 300, 500, 1,000, 2,000, 2,500, 3,000, 5,000, 10,000, 20,000, 25,000, 28,000, 50,000, 75,000, or 100,000 primer pairs that are shipped together. The primer pairs can be contained in a single vessel, such as a single tube or box, or in multiple tubes or boxes. In certain embodiments, the primer pairs are pre-qualified by a commercial provider and sold together; in other embodiments, the customer selects custom gene targets and/or primers, and the commercial provider makes the primer pool and ships it to the customer in one tube or in multiple tubes. In certain exemplary embodiments, the kits include primers for detecting both CNVs and SNVs, especially CNVs and SNVs known to be associated with at least one type of cancer.

Kits for circulating DNA detection according to some embodiments of the invention include standards and/or controls for circulating DNA detection. For example, in certain embodiments, the standards and/or controls are sold, and optionally shipped and packaged, together with primers for performing the amplification reactions provided herein (such as primers for performing CoNVERGe). In certain embodiments, the controls include polynucleotides such as DNA, including isolated genomic DNA that exhibits one or more chromosomal aneuploidies, such as CNVs, and/or includes one or more SNVs. In certain embodiments, the standards and/or controls are called PlasmArt standards and include polynucleotides having sequence identity to regions of the genome known to exhibit CNVs, especially in certain inherited diseases and in certain disease states such as cancer, with a size distribution that reflects that of the cfDNA fragments naturally found in plasma. Exemplary methods for preparing PlasmArt standards are provided in the examples herein. In general, genomic DNA from a source known to include a chromosomal aneuploidy is isolated, fragmented, purified, and size selected.

Accordingly, artificial cfDNA polynucleotide standards and/or controls can be prepared by spiking the isolated polynucleotide samples prepared as outlined above into a DNA sample known not to exhibit the chromosomal aneuploidy and/or SNVs, at concentrations similar to those observed for cfDNA in vivo, for example between 0.01% and 20%, 0.1% and 15%, or 0.4% and 10% of the DNA in the fluid. These standards/controls can be used as controls for assay design, characterization, development, and/or validation, and as quality control standards during testing, such as cancer testing performed in a CLIA laboratory, and/or as standards included in research-use-only or diagnostic test kits.
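Choosing how much of the fragmented standard to spike in for a desired fraction f follows from f = c_s V_s / (c_s V_s + c_b V_b), which rearranges to V_s = f c_b V_b / (c_s (1 - f)). The sketch below assumes both DNA concentrations are measured in the same mass units (for example ng/μL) and ignores any difference in amplifiable fraction between the two sources; the function name is illustrative.

```python
def spike_in_volume(target_fraction: float, spike_conc: float,
                    background_conc: float, background_volume: float) -> float:
    """Volume of spike-in DNA so that it makes up `target_fraction` of total DNA mass.

    Concentrations must share units (e.g. ng/uL); the result is in the units of
    `background_volume`.
    """
    if not 0 < target_fraction < 1:
        raise ValueError("target_fraction must be strictly between 0 and 1")
    return (target_fraction * background_conc * background_volume
            / (spike_conc * (1 - target_fraction)))


# 10% spike-in at equal concentrations into 100 uL of background DNA
v = spike_in_volume(0.10, spike_conc=10.0, background_conc=10.0, background_volume=100.0)
print(round(v, 3))  # 11.111
```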

Exemplary normalization/correction methods

In some embodiments, measurements for different loci, chromosome segments, or chromosomes are adjusted for bias, such as bias due to differences in GC content, bias due to other differences in amplification efficiency, or bias due to sequencing errors. In some embodiments, measurements for different alleles at the same locus are adjusted for differences between the alleles in metabolism, apoptosis, histones, inactivation, and/or amplification. In some embodiments, measurements for different alleles at the same locus in RNA are adjusted for differences in transcription rate or stability between the different RNA alleles.
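One common way to adjust locus measurements for GC bias is median normalization within GC-content bins, sketched below. This is a generic technique offered for illustration, not necessarily the correction used in the invention. It assumes per-locus read depths and amplicon GC fractions are already available, that every populated GC bin has a positive median depth, and that 10 equal-width bins are adequate; all names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def gc_normalize(depths: dict[str, float], gc: dict[str, float],
                 n_bins: int = 10) -> dict[str, float]:
    """Rescale each locus so that its GC bin has the same median depth as all loci."""
    bin_of = {locus: min(int(gc[locus] * n_bins), n_bins - 1) for locus in depths}
    bins = defaultdict(list)
    for locus, d in depths.items():
        bins[bin_of[locus]].append(d)
    bin_median = {b: median(v) for b, v in bins.items()}
    global_median = median(depths.values())
    return {locus: d * global_median / bin_median[bin_of[locus]]
            for locus, d in depths.items()}


depths = {"a": 100, "b": 120, "c": 200, "d": 240}   # GC-rich loci "c", "d" over-amplified
gc = {"a": 0.41, "b": 0.43, "c": 0.61, "d": 0.63}
print(gc_normalize(depths, gc))  # "a" and "c" now agree, as do "b" and "d"
```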

Exemplary methods for determining phased genetic data

In some embodiments, phased genetic data is determined using the methods described herein or any known method for determining phased genetic data (see, for example, PCT Publication No. WO 2009/1053531, filed February 9, 2009, and PCT Publication No. WO 2010/017214, filed August 4, 2009; U.S. Publication No. 2013/0123120, filed November 21, 2012; U.S. Publication No. 2011/0033862, filed October 7, 2010; U.S. Publication No. 2011/0033862, 2010; U.S. Publication No. 2011/0178719, filed February 3, 2011; U.S. Patent No. 8,515,679, filed March 17, 2008; U.S. Publication No. 2007/0184467, filed November 22, 2006; U.S. Serial No. 2008/0243398, filed March 17, 2008; and U.S. Serial No. 61/994,791, filed May 16, 2014, each of which is hereby incorporated by reference in its entirety). In some embodiments, the phase is determined for one or more regions known or suspected to contain a CNV of interest. In some embodiments, the phase is also determined for one or more regions flanking the CNV region and/or for one or more reference regions. In one embodiment, the genetic data of an individual (for example, an individual being tested using a method of the invention, or a relative of that individual, such as the mother carrying a fetus or embryo) is phased by inference from measurements of haploid tissue from the individual, such as by measuring one or more sperm or eggs. In one embodiment, the genetic data of an individual is phased by inference using the measured genotype data of one or more first-degree relatives, such as the individual's parents (for example, sperm from the individual's father) or siblings.

In one embodiment, the genetic data of an individual is phased by dilution, in which the DNA or RNA is diluted into one or more wells, such as by using digital PCR. In some embodiments, the DNA or RNA is diluted to the point where no more than about one copy of each haplotype is expected in each well, and the DNA or RNA in one or more wells is then measured. In some embodiments, cells are arrested in mitosis, when the chromosomes are in tight bundles, and microfluidics is used to place the separated chromosomes in separate wells. Because the DNA or RNA is diluted, it is unlikely that more than one haplotype is present in the same fraction (or tube). Thus a single DNA molecule can effectively be present in a tube, allowing the haplotype on a single DNA or RNA molecule to be determined. In some embodiments, the method includes dividing a DNA or RNA sample into a plurality of fractions such that at least one fraction includes one chromosome or one chromosome segment from a pair of chromosomes, and genotyping (such as determining the presence of two or more polymorphic loci) the DNA or RNA sample in at least one fraction, thereby determining a haplotype. In some embodiments, the genotyping involves sequencing (such as shotgun sequencing or single-molecule sequencing), a SNP array for detecting polymorphic loci, or multiplex PCR. In some embodiments, the genotyping involves using a SNP array to detect polymorphic loci, such as at least 100; 200; 500; 750; 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different polymorphic loci. In some embodiments, the genotyping involves using multiplex PCR. In some embodiments, the method includes contacting the sample in a fraction with a library of primers that simultaneously hybridize to at least 100; 200; 500; 750; 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different polymorphic loci (such as SNPs) to produce a reaction mixture, and subjecting the reaction mixture to primer extension reaction conditions to produce amplified products that are measured with a high-throughput sequencer to produce sequencing data. In some embodiments, RNA (such as mRNA) is sequenced. Because mRNA contains only exons, sequencing mRNA allows the alleles of polymorphic loci (such as SNPs) to be determined over large genomic distances, such as several megabases. In some embodiments, the haplotypes of an individual are determined by chromosome sorting. An exemplary chromosome sorting method includes arresting cells in a mitotic stage, when the chromosomes are in tight bundles, and using microfluidics to place the separated chromosomes in separate wells. Another method involves FACS-mediated single-chromosome sorting and collection of single chromosomes. Standard methods (such as sequencing or arrays) can be used to identify the alleles on a single chromosome to determine the haplotypes of the individual.
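How far to dilute can be reasoned about with the usual digital-PCR assumption that molecules fall into wells at random, so the copy number per well is Poisson-distributed with mean λ. The sketch below (function names hypothetical) gives the probabilities of 0, 1, or 2 or more copies of one haplotype per well, and the chance that a well receives both haplotypes of a chromosome pair when each is loaded independently at mean λ.

```python
import math


def copy_number_probs(mean_copies: float) -> tuple[float, float, float]:
    """P(0), P(1), P(>=2) copies of one haplotype in a well under a Poisson model."""
    p0 = math.exp(-mean_copies)
    p1 = mean_copies * p0
    return p0, p1, 1 - p0 - p1


def prob_both_haplotypes(mean_copies_each: float) -> float:
    """Chance a well holds at least one copy of each of two independently loaded haplotypes."""
    return (1 - math.exp(-mean_copies_each)) ** 2


print(copy_number_probs(0.5))     # about (0.607, 0.303, 0.090)
print(prob_both_haplotypes(0.1))  # about 0.009: mixed wells become rare at low loading
```

The trade-off is visible directly: lowering λ makes mixed-haplotype wells rarer but also leaves more wells empty.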

In some embodiments, the haplotypes of an individual are determined by long-read sequencing, such as by using the Moleculo technology developed by Illumina. In some embodiments, the library preparation step includes shearing DNA into fragments, such as fragments of ~10 kb, diluting the fragments and placing them in wells (such that about 3,000 fragments are in a single well), amplifying the fragments in each well by long-range PCR, cutting them into short fragments and barcoding them, and pooling the barcoded fragments from all wells so that they can be sequenced together. After sequencing, the computational steps include separating the reads from each well based on the attached barcodes and grouping them into fragments, assembling fragments that overlap at heterozygous SNVs into haplotype blocks, and statistically phasing the blocks against a phased reference panel to produce long haplotype contigs.
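The "assemble fragments that overlap at heterozygous SNVs" step can be sketched as a graph walk. This is an illustrative simplification, not Illumina's pipeline: each reconstructed fragment is assumed to be a dict from heterozygous SNV position to the allele (0 or 1) it carries, fragments sharing a position are joined into one block, each fragment's orientation relative to the block is inferred from the first shared SNV encountered, and haplotype 1 of each block is then called by per-SNV majority vote. The final statistical phasing against a reference panel is not shown.

```python
from collections import defaultdict, deque


def assemble_blocks(fragments: list[dict[int, int]]) -> list[dict[int, int]]:
    """Merge fragments that share heterozygous SNVs into haplotype blocks (haplotype 1 alleles)."""
    by_pos = defaultdict(list)
    for i, frag in enumerate(fragments):
        for pos in frag:
            by_pos[pos].append(i)
    orient: dict[int, int] = {}  # 0 = fragment lies on haplotype 1, 1 = on haplotype 2
    blocks = []
    for seed in range(len(fragments)):
        if seed in orient:
            continue
        orient[seed] = 0
        members, queue = [seed], deque([seed])
        while queue:  # breadth-first walk over fragments linked by shared SNVs
            i = queue.popleft()
            for pos, allele in fragments[i].items():
                for j in by_pos[pos]:
                    if j not in orient:
                        # Same allele at a shared SNV means same haplotype as fragment i.
                        orient[j] = orient[i] ^ allele ^ fragments[j][pos]
                        members.append(j)
                        queue.append(j)
        votes = defaultdict(list)
        for i in members:
            for pos, allele in fragments[i].items():
                votes[pos].append(allele ^ orient[i])
        blocks.append({pos: int(2 * sum(v) > len(v)) for pos, v in sorted(votes.items())})
    return blocks


print(assemble_blocks([{1: 0, 2: 1}, {2: 0, 3: 1}, {5: 1, 6: 1}]))
# [{1: 0, 2: 1, 3: 0}, {5: 1, 6: 1}]
```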

In some embodiments, the haplotypes of an individual are determined using data from relatives of the individual. In some embodiments, a SNP array is used to determine the presence of at least 100; 200; 500; 750; 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different polymorphic loci in a DNA or RNA sample from the individual and from relatives of the individual. In some embodiments, the method includes contacting a DNA sample from the individual and/or from relatives of the individual with a library of primers that simultaneously hybridize to at least 100; 200; 500; 750; 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different polymorphic loci (such as SNPs) to produce a reaction mixture, and subjecting the reaction mixture to primer extension reaction conditions to produce amplified products that are measured with a high-throughput sequencer to produce sequencing data.
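With genotypes from both parents, a child's alleles can often be assigned to a parental origin by Mendelian inheritance alone. In the minimal sketch below, genotypes are coded as alternate-allele counts (0, 1, 2), and each locus returns `(maternal_allele, paternal_allele)`; `None` marks loci that cannot be phased this way (all three individuals heterozygous) or that violate Mendelian inheritance. The function names are hypothetical, and genotyping errors are not modeled.

```python
def transmissible(genotype: int) -> set[int]:
    """Alleles a parent with this alt-allele count can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def phase_child_locus(child: int, mother: int, father: int):
    """(maternal, paternal) allele if uniquely determined, else None."""
    candidates = [(m, p)
                  for m in transmissible(mother)
                  for p in transmissible(father)
                  if m + p == child]
    return candidates[0] if len(candidates) == 1 else None


def phase_child(child_gts: list[int], mother_gts: list[int], father_gts: list[int]):
    return [phase_child_locus(c, m, f) for c, m, f in zip(child_gts, mother_gts, father_gts)]


print(phase_child([1, 1, 2, 1], [0, 1, 1, 2], [1, 1, 2, 0]))
# [(0, 1), None, (1, 1), (1, 0)]
```

Reading the first element of each tuple along the chromosome gives the maternally inherited haplotype, with gaps at the ambiguous loci.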

In one embodiment, the genetic data of an individual is phased using a computer program that infers the most likely phase from population haplotype frequencies, such as HapMap-based phasing. For example, haploid data sets can be deduced directly from diploid data using statistical methods that exploit known haplotype blocks in the general population (such as those created by the public HapMap Project and the Perlegen Human Haplotype Project). A haplotype block is essentially a series of correlated alleles that occur repeatedly in a variety of populations. Because these haplotype blocks are often ancient and common, they can be used to predict haplotypes from diploid genotypes. Publicly available algorithms that accomplish this task include an imperfect phylogeny approach, Bayesian approaches based on conjugate priors, and approaches using priors from population genetics. Some of these algorithms use hidden Markov models.
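For a short stretch of SNPs, frequency-based phasing can be illustrated by brute force: list every pair of known haplotypes that sums to the observed genotype, weight each pair by its probability under random pairing of haplotypes (f² for a homozygous pair, 2·f1·f2 otherwise), and report the best pair with its posterior. This is a minimal sketch assuming Hardy-Weinberg pairing and a given haplotype-frequency table; it does not reproduce any of the published programs named in this section.

```python
def most_likely_phase(genotype: tuple[int, ...],
                      hap_freqs: dict[tuple[int, ...], float]):
    """Most probable unordered haplotype pair for `genotype` and its posterior probability.

    genotype: alt-allele count (0, 1 or 2) per SNP.
    hap_freqs: population frequency of each haplotype (tuples of 0/1 alleles).
    """
    pairs = {}
    for h, f in hap_freqs.items():
        comp = tuple(g - a for g, a in zip(genotype, h))
        if comp not in hap_freqs:
            continue  # h cannot be one of this individual's two haplotypes
        pair = tuple(sorted((h, comp)))
        pairs[pair] = f * f if comp == h else 2 * f * hap_freqs[comp]
    total = sum(pairs.values())
    if total == 0:
        return None, 0.0
    best = max(pairs, key=pairs.get)
    return best, pairs[best] / total


freqs = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}
print(most_likely_phase((1, 1), freqs))  # (((0, 0), (1, 1)), ~0.94)
```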

In one embodiment, the genetic data of an individual is phased using an algorithm that estimates haplotypes from genotype data, such as an algorithm that uses localized haplotype clustering (see, for example, Browning and Browning, "Rapid and Accurate Haplotype Phasing and Missing-Data Inference for Whole-Genome Association Studies by Use of Localized Haplotype Clustering," American Journal of Human Genetics, November 2007; 81(5): 1084-1097, which is hereby incorporated by reference in its entirety). An exemplary program is Beagle version 3.3.2 or version 4 (available on the World Wide Web at faculty.washington.edu/browning/beagle/beagle.html, which is hereby incorporated by reference in its entirety).

In one embodiment, the genetic data of an individual is phased using an algorithm that estimates haplotypes from genotype data, such as an algorithm that uses the decay of linkage disequilibrium with distance, the order and spacing of genotyped markers, missing-data imputation, recombination rate estimates, or a combination thereof (see, for example, Stephens and Scheet, "Accounting for Decay of Linkage Disequilibrium in Haplotype Inference and Missing-Data Imputation," American Journal of Human Genetics, 76: 449-462, 2005, which is hereby incorporated by reference in its entirety). An exemplary program is PHASE v.2.1 or v.2.1.1 (available on the World Wide Web at stephenslab.uchicago.edu/software.html, which is hereby incorporated by reference in its entirety).

In one embodiment, the genetic data of an individual is phased using an algorithm that estimates haplotypes from population genotype data, such as an algorithm that allows cluster membership to change continuously along the chromosome according to a hidden Markov model. This approach is flexible, allowing for both "block-like" patterns of linkage disequilibrium and a gradual decline of linkage disequilibrium with distance (see, for example, Scheet and Stephens, "A Fast and Flexible Statistical Model for Large-Scale Population Genotype Data: Applications to Inferring Missing Genotypes and Haplotypic Phase," American Journal of Human Genetics, 78: 629-644, 2006, which is hereby incorporated by reference in its entirety). An exemplary program is fastPHASE (available on the World Wide Web at stephenslab.uchicago.edu/software.html, which is hereby incorporated by reference in its entirety).

In one embodiment, the genetic data of an individual is phased using genotype imputation, such as by using one or more of the following reference data sets: the HapMap data set, data sets of controls genotyped on multiple SNP chips, and data from the 1000 Genomes Project. An exemplary method is a flexible modeling framework that increases accuracy and combines information across multiple reference panels (see, for example, Howie, Donnelly, and Marchini, "A Flexible and Accurate Genotype Imputation Method for the Next Generation of Genome-Wide Association Studies," PLoS Genetics 5(6): e1000529, 2009, which is hereby incorporated by reference in its entirety). Exemplary programs are IMPUTE and IMPUTE version 2 (also called IMPUTE2) (available on the World Wide Web at mathgen.stats.ox.ac.uk/impute/imputev2.html, which is hereby incorporated by reference in its entirety).

In one embodiment, the genetic data of an individual is phased using an algorithm that infers haplotypes, such as an algorithm that infers haplotypes under the genetic model of coalescence with recombination, such as the model developed by Stephens in PHASE v2.1. The main algorithmic improvements rely on the use of binary trees to represent the set of candidate haplotypes for each individual. These binary tree representations (1) speed up the computation of haplotype posterior probabilities by avoiding the redundant operations performed in PHASE v2.1, and (2) overcome the exponential aspect of the haplotype inference problem by smart exploration of the most plausible pathways (i.e., haplotypes) in the binary trees (see, for example, Delaneau, Coulonges, and Zagury, "Shape-IT: New Rapid and Accurate Algorithm for Haplotype Inference," BMC Bioinformatics 9: 540, 2008, doi:10.1186/1471-2105-9-540, which is hereby incorporated by reference in its entirety). An exemplary program is SHAPEIT (available on the World Wide Web at mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html, which is hereby incorporated by reference in its entirety).

In one embodiment, the genetic data of an individual is phased using an algorithm that estimates haplotypes from population genotype data, such as an algorithm that uses haplotype fragment frequencies to obtain empirically based probabilities for longer haplotypes. In some embodiments, the algorithm reconstructs haplotypes so that they have maximal local coherence (see, for example, Eronen, Geerts, and Toivonen, "HaploRec: Efficient and Accurate Large-Scale Reconstruction of Haplotypes," BMC Bioinformatics 7: 542, 2006, which is hereby incorporated by reference in its entirety). An exemplary program is HaploRec, such as HaploRec version 2.3 (available on the World Wide Web at cs.helsinki.fi/group/genetics/haplotyping.html, which is hereby incorporated by reference in its entirety).

In one embodiment, the genetic data of an individual is phased using an algorithm that estimates haplotypes from population genotype data, such as an algorithm that uses a partition-ligation strategy and an expectation-maximization-based algorithm (see, for example, Qin, Niu, and Liu, "Partition-Ligation-Expectation-Maximization Algorithm for Haplotype Inference with Single-Nucleotide Polymorphisms," American Journal of Human Genetics, 71(5): 1242-1247, 2002, which is hereby incorporated by reference in its entirety). An exemplary program is PL-EM (available on the World Wide Web at people.fas.harvard.edu/junliu/plem/click.html, which is hereby incorporated by reference in its entirety).

In one embodiment, the genetic data of an individual are phased using an algorithm that estimates haplotypes from population genotype data, such as an algorithm that simultaneously phases genotypes into haplotypes and partitions them into blocks. In some embodiments, an expectation-maximization algorithm is used (see, e.g., Kimmel and Shamir, "GERBIL: genotype resolution and block identification using likelihood," Proceedings of the National Academy of Sciences of the United States of America (PNAS) 102:158-162, 2005, which is incorporated herein by reference in its entirety). An exemplary program is GERBIL, which is available as part of the GEVALT version 2 program (available on the World Wide Web at acgt.cs.tau.ac.il/gevalt/, which is incorporated herein by reference in its entirety).

In one embodiment, the genetic data of an individual are phased using an algorithm that estimates haplotypes from population genotype data, such as an algorithm that uses the EM algorithm to calculate maximum likelihood estimates of haplotype frequencies from genotype measurements that do not specify phase. The algorithm also allows some genotype measurements to be missing (e.g., due to PCR failure), and it allows multiple imputation of individual haplotypes (see, e.g., Clayton, D. (2002), "SNPHAP: a program for estimating frequencies of large haplotypes of SNPs," which is incorporated herein by reference in its entirety). An exemplary program is SNPHAP (available on the World Wide Web at gene.cimr.cam.ac.uk/clayton/software/snphap.txt, which is incorporated herein by reference in its entirety).
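The EM approach to maximum likelihood haplotype frequencies from unphased genotypes can be sketched as follows. This is a generic textbook EM under Hardy-Weinberg assumptions, not SNPHAP itself; the 0/1/2 encoding, the use of None for a missing call (treated as unconstrained), and the function names are assumptions. Because it enumerates every phase resolution, it is only practical for a handful of SNPs.

```python
from collections import defaultdict
from itertools import product


def _ordered_pairs(genotype):
    """Ordered haplotype pairs (h1, h2) consistent with a genotype.

    Genotype entries are 0/1/2 alternate-allele counts, or None for a missing call.
    """
    options = []
    for g in genotype:
        if g is None:
            options.append([(0, 0), (0, 1), (1, 0), (1, 1)])
        elif g == 1:
            options.append([(0, 1), (1, 0)])
        else:
            options.append([(g // 2, g // 2)])
    return [(tuple(a for a, _ in combo), tuple(b for _, b in combo))
            for combo in product(*options)]


def em_haplotype_frequencies(genotypes, n_iter=50):
    """Maximum likelihood haplotype frequencies by EM (Hardy-Weinberg assumed)."""
    candidates = [_ordered_pairs(g) for g in genotypes]
    haplotypes = {h for pairs in candidates for pair in pairs for h in pair}
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in candidates:
            # E-step: posterior weight of each phase resolution for this individual.
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            if total == 0:
                continue
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        n_chrom = sum(counts.values())
        freq = {h: counts[h] / n_chrom for h in haplotypes}
    return freq
```

In a population dominated by 0-0 and 1-1 haplotypes, the EM assigns double heterozygotes to the cis resolution and drives the 0-1 and 1-0 frequencies toward zero.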

In one embodiment, the genetic data of an individual are phased using an algorithm that estimates haplotypes from population genotype data, such as an algorithm for statistical haplotype inference from genotypes collected at SNPs. The software can be used for comparatively accurate phasing of large numbers of long genome sequences, e.g., those obtained from DNA arrays. The exemplary program takes a genotype matrix as input and outputs the corresponding haplotype matrix (see, e.g., Brinza and Zelikovsky, "2SNP: scalable phasing based on 2-SNP haplotypes," Bioinformatics 22(3):371-3, 2006, which is incorporated herein by reference in its entirety). An exemplary program is 2SNP (available on the World Wide Web at alla.cs.gsu.edu/sofltware/2SNP, which is incorporated herein by reference in its entirety).
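The core idea of phasing from 2-SNP haplotypes can be illustrated for a single pair of SNPs: haplotype counts from individuals whose phase is unambiguous at that pair are used to decide whether double heterozygotes are more likely cis (0-0/1-1) or trans (0-1/1-0). This hypothetical helper does not reproduce the 2SNP program, which combines such pairwise decisions across many SNP pairs; the 0/1/2 encoding and the 0.5 pseudocount are assumptions.

```python
from collections import Counter


def _alleles(genotype):
    """Alleles at one SNP for a 0/1/2 alternate-allele count."""
    return {0: (0, 0), 1: (0, 1), 2: (1, 1)}[genotype]


def pairwise_phase(genotypes, i, j, pseudocount=0.5):
    """Decide 'cis' (0-0 / 1-1) or 'trans' (0-1 / 1-0) for double heterozygotes at SNPs i, j.

    genotypes: one list of 0/1/2 codes per individual.
    """
    counts = Counter()
    for g in genotypes:
        if g[i] == 1 and g[j] == 1:
            continue  # phase unknown: not used for counting
        # With at most one heterozygous site, both 2-SNP haplotypes are determined.
        ai, aj = _alleles(g[i]), _alleles(g[j])
        counts[(ai[0], aj[0])] += 1
        counts[(ai[1], aj[1])] += 1
    cis = (counts[(0, 0)] + pseudocount) * (counts[(1, 1)] + pseudocount)
    trans = (counts[(0, 1)] + pseudocount) * (counts[(1, 0)] + pseudocount)
    return "cis" if cis >= trans else "trans"
```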

In various embodiments, data on the probability of chromosomal crossovers at different locations in a chromosome or chromosome segment (e.g., recombination data such as that found in the HapMap database, used to generate a recombination risk score for any interval) are used when phasing the genetic data of a sequenced individual, to model the dependence between alleles at polymorphic loci on the chromosome or chromosome segment. In some embodiments, allele counts at the polymorphic loci are calculated on a computer from sequencing data or SNP array data. In some embodiments, a plurality of hypotheses is created (e.g., on a computer), each relating to a different possible state of the chromosome or chromosome segment (such as an overrepresentation of the copy number of a first homologous chromosome segment relative to a second homologous chromosome segment in the genome of one or more cells, a duplication of the first homologous chromosome segment, a deletion of the second homologous chromosome segment, or equal representation of the first and second homologous chromosome segments); a model of the expected allele counts at the polymorphic loci on the chromosome (e.g., a joint distribution model) is built for each hypothesis (e.g., on a computer); the relative probability of each hypothesis is determined using the joint distribution model and the allele counts (e.g., on a computer); and the hypothesis with the greatest probability is selected. In some embodiments, the steps of building the joint distribution model of allele counts and determining the relative probability of each hypothesis are performed without using a reference chromosome.
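A minimal sketch of these hypothesis-testing steps, assuming the phase of the segment is known, that reads at each heterozygous SNP are binomially distributed, that SNPs are independent given the hypothesis (no crossover modeling), and that all hypotheses have equal prior probability. The hypothesis names, the function name, and the assumption that each cell contributes DNA in proportion to its copy number are illustrative.

```python
import math

# Copies of (homolog 1, homolog 2) in the affected cells under each hypothesis.
HYPOTHESES = {
    "no CNV": (1, 1),
    "homolog 1 deleted": (0, 1),
    "homolog 2 deleted": (1, 0),
    "homolog 1 duplicated": (2, 1),
    "homolog 2 duplicated": (1, 2),
}


def hypothesis_posteriors(counts, hap1_alleles, affected_fraction):
    """Relative probability of each copy-number hypothesis for one segment.

    counts: (a_reads, b_reads) at each heterozygous SNP in the segment.
    hap1_alleles: allele ("A" or "B") on homolog 1 at each SNP; homolog 2
        carries the other allele.
    affected_fraction: fraction of contributing cells carrying the CNV (0 < f < 1).
    """
    f = affected_fraction
    loglik = {}
    for name, (c1, c2) in HYPOTHESES.items():
        total = 0.0
        for (a, b), allele in zip(counts, hap1_alleles):
            a_copies = c1 if allele == "A" else c2
            # Normal cells carry one A and one B; affected cells carry c1 + c2 copies.
            p_a = ((1 - f) + f * a_copies) / (2 * (1 - f) + f * (c1 + c2))
            total += a * math.log(p_a) + b * math.log(1 - p_a)
        loglik[name] = total
    # Equal priors: normalize the likelihoods (shifted by the max for stability).
    best = max(loglik.values())
    weights = {name: math.exp(ll - best) for name, ll in loglik.items()}
    norm = sum(weights.values())
    return {name: w / norm for name, w in weights.items()}
```

The hypothesis with the largest returned value is the one selected; the binomial coefficient is omitted because it is identical across hypotheses for the same data.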

In one embodiment, the genetic data of one or more relatives of the individual (e.g., one or more parents, siblings, children, fetuses, embryos, grandparents, uncles, aunts, or cousins) are used to phase the individual's genetic data. In one embodiment, the genetic data of one or more offspring of the individual (e.g., 1, 2, 3 or more offspring), such as an embryo, a fetus, a born child, or a sample from a miscarriage, are used. In one embodiment, the genetic data of a parent (such as a parent of a fetus or embryo) are phased using the phased haplotype data of the other parent and the unphased genetic data of one or more offspring of the parents.
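Phasing a parent from an offspring can be sketched for the simplest case: at the parent's heterozygous sites, the allele the child received from that parent is identified whenever the child or the other parent is homozygous. This simplified helper uses only genotypes (not the other parent's phased haplotypes), assumes no crossover within the segment and no genotyping errors, and its name and 0/1/2 encoding are hypothetical.

```python
def phase_parent_from_child(parent, other_parent, child):
    """Phase one parent's heterozygous sites from a child's genotype.

    All genotypes are alternate-allele counts (0, 1, 2) per SNP. Returns
    (transmitted, untransmitted) haplotypes as lists of 0/1, with None where
    the phase cannot be resolved. Assumes no crossover in the segment.
    """
    transmitted, untransmitted = [], []
    for p, o, c in zip(parent, other_parent, child):
        if p != 1:
            # Homozygous parent: both haplotypes carry the same allele.
            allele = p // 2
            transmitted.append(allele)
            untransmitted.append(allele)
            continue
        if c in (0, 2):
            t = c // 2            # homozygous child: each parent gave c // 2
        elif o in (0, 2):
            t = 1 - o // 2        # heterozygous child, homozygous other parent
        else:
            transmitted.append(None)   # all three heterozygous: ambiguous
            untransmitted.append(None)
            continue
        transmitted.append(t)
        untransmitted.append(1 - t)
    return transmitted, untransmitted
```

Sites left ambiguous by one child can often be resolved with additional offspring, consistent with the use of one or more offspring described above.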

In some embodiments, a sample from an individual (e.g., an individual suspected of having cancer, a fetus, or an embryo), such as a biopsy (e.g., a tumor biopsy), a blood sample, a plasma sample, a serum sample, or another sample that may contain mostly or only DNA or RNA carrying the CNV of interest, is analyzed to determine the phase of one or more regions known or suspected to contain a CNV of interest (such as a deletion or duplication). In some embodiments, the sample has a high tumor fraction (e.g., 30, 40, 50, 60, 70, 80, 90, 95, 98, 99 or 100%). In some embodiments, a sample from the pregnant mother of a fetus or from the fetus (such as a maternal whole blood sample, a maternal plasma sample, a maternal serum sample, an amniocentesis sample, a placental tissue sample (e.g., chorionic villus, decidua, or placental membrane), a cervical mucus sample, fetal tissue after fetal demise, another sample from the fetus, or another sample that may contain mostly or only cells, DNA or RNA carrying the CNV of interest) is analyzed to determine the phase of one or more regions known or suspected to contain a CNV of interest (such as a deletion or duplication). In some embodiments, the sample has a high fetal fraction (e.g., 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99% or 100%).

In some embodiments, the sample has a haplotype imbalance or any aneuploidy. In some embodiments, the sample includes any mixture of two types of DNA in which the two types have different haplotype ratios and share at least one haplotype. For example, in the case of a mother and fetus, the mother is 1:1 and the fetus is 1:0 (plus the paternal haplotype). For example, in the case of a tumor, normal tissue is 1:1 and tumor tissue is 1:0, 1:2, 1:3, 1:4, etc. In some embodiments, at least 10; 100; 500; 1,000; 2,000; 3,000; 5,000; 8,000; or 10,000 polymorphic sites are analyzed to determine the phase of the alleles at some or all of the sites. In some embodiments, the sample is from cells or tissue that have been treated to become aneuploid, such as through aneuploidy induced by prolonged cell culture.

In some embodiments, most or all of the DNA or RNA in the sample carries the CNV of interest. In some embodiments, the ratio of DNA or RNA from one or more target cells containing the CNV of interest to the total DNA or RNA in the sample is at least 80%, 85%, 90%, 95% or 100%. For a sample with a deletion, only one haplotype is present in the cells (or DNA or RNA) with the deletion. This first haplotype can be determined using standard methods to identify the alleles present in the deleted region. In a sample containing only cells (or DNA or RNA) with the deletion, there is signal only from the first haplotype present in those cells. In a sample that also contains a small amount of cells (or DNA or RNA) without the deletion (e.g., a small number of noncancerous cells), the weak signal from the second haplotype in these cells (or DNA or RNA) can be ignored. The second haplotype, which is absent from the deleted cells but present in the individual's other cells, DNA or RNA, can be determined by inference. For example, if the genotype of cells from the individual that lack the deletion is (AB, AB), and the individual's phased data indicate that the first haplotype is (A, A), then the other haplotype can be inferred to be (B, B).
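The inference step in the example above (genotype (AB, AB) with a first haplotype (A, A) implies a second haplotype (B, B)) amounts to removing the known allele from each genotype. A minimal sketch with a hypothetical function name and string-based allele encoding:

```python
def infer_other_haplotype(genotype, known_haplotype):
    """Infer the second haplotype from an unphased genotype and one known haplotype.

    genotype: unordered allele pairs per site, e.g. ["AB", "AB"].
    known_haplotype: allele on the determined homolog at each site, e.g. ["A", "A"].
    """
    other = []
    for g, allele in zip(genotype, known_haplotype):
        alleles = list(g)
        alleles.remove(allele)  # raises ValueError if the data are inconsistent
        other.append(alleles[0])
    return other
```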

Phase can still be determined for samples that contain both cells (or DNA or RNA) with the deletion and cells (or DNA or RNA) without it. For example, a plot similar to Figure 18 or 29 can be generated, in which the x-axis represents the linear position of individual sites along the chromosome and the y-axis represents the number of A allele reads as a fraction of total (A + B) reads. In some embodiments, for a deletion, the pattern includes two central bands representing SNPs at which the individual is heterozygous (the upper band represents AB from cells without the deletion plus A from cells with the deletion; the lower band represents AB from cells without the deletion plus B from cells with the deletion). In some embodiments, as the fraction of cells, DNA or RNA with the deletion increases, the separation of the two bands increases. Therefore, the identities of the A alleles can be used to determine the first haplotype, and the identities of the B alleles can be used to determine the second haplotype.
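The positions of the two bands follow from a simple mixing calculation, assuming every cell contributes DNA in proportion to its copy number at the locus. If a fraction f of the contributing cells has lost one homolog, the expected A-allele fraction at heterozygous SNPs is 1/(2 - f) where the deleted homolog carried B, and (1 - f)/(2 - f) where it carried A. The hypothetical helper below computes both.

```python
def deletion_band_positions(cnv_fraction):
    """Expected A-allele read fraction (upper, lower) at heterozygous SNPs
    when a fraction of contributing cells has lost one homolog.
    """
    f = cnv_fraction
    # Normal cells contribute one A and one B; deleted cells contribute one allele.
    upper = 1 / (2 - f)        # deleted homolog carried B, so A is retained
    lower = (1 - f) / (2 - f)  # deleted homolog carried A
    return upper, lower
```

The separation between the bands, f/(2 - f), grows with f, matching the behavior described above.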

For a sample with a duplication, an additional copy of one haplotype is present in the cells (or DNA or RNA) with the duplication. The haplotype of the duplicated region can be determined using standard methods to identify the alleles present in increased amounts in the duplicated region, or using standard methods to identify the alleles of the non-duplicated haplotype, which are present in reduced amounts. Once one haplotype is determined, the other haplotype can be determined by inference.

For samples that contain both cells (or DNA or RNA) with the duplication and cells (or DNA or RNA) without it, phase can still be determined using methods similar to those described above for deletions. For example, a plot similar to Figure 18 or 29 can be generated, in which the x-axis represents the linear position of individual sites along the chromosome and the y-axis represents the number of A allele reads as a fraction of total (A + B) reads. In some embodiments, for a duplication, the pattern includes two central bands representing SNPs at which the individual is heterozygous (the upper band represents AB from cells without the duplication plus AAB from cells with the duplication; the lower band represents AB from cells without the duplication plus ABB from cells with the duplication). In some embodiments, as the fraction of cells, DNA or RNA with the duplication increases, the separation of the two bands increases. Therefore, the identities of the A alleles can be used to determine the first haplotype, and the identities of the B alleles can be used to determine the second haplotype. In some embodiments, the phase of one or more CNV regions (e.g., the phase of at least 50%, 60%, 70%, 80%, 90%, 95% or 100% of the polymorphic sites in the measured region) is determined for a sample (e.g., a tumor biopsy or plasma sample) from an individual known to have cancer, and is used to analyze subsequent samples from the same individual to monitor the progression of the cancer (e.g., to monitor for remission or recurrence). In some embodiments, a sample with a high tumor fraction (e.g., a tumor biopsy or a plasma sample from an individual with a high tumor burden) is used to obtain phased data that are then used to analyze subsequent samples with lower tumor fractions (e.g., plasma samples from an individual undergoing cancer treatment or in remission).
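For the duplication pattern described above, the same mixing reasoning puts the two bands at (1 + f)/(2 + f) and 1/(2 + f), where f is the fraction of contributing cells carrying the duplication (assuming DNA contribution proportional to copy number). The sketch below also shows the phasing step in its simplest form: at each heterozygous SNP, the allele observed above 50% is assigned to the duplicated homolog. Function names are hypothetical, and a real analysis would use a statistical model rather than a hard a > b rule.

```python
def phase_from_duplication(counts):
    """Assign alleles to homologs at heterozygous SNPs in a duplicated region.

    counts: list of (a_reads, b_reads). Returns (duplicated, other) haplotypes;
    the allele observed above 50% is placed on the duplicated homolog.
    """
    duplicated, other = [], []
    for a, b in counts:
        if a == b:
            pair = (None, None)  # no imbalance: phase unresolved at this site
        else:
            pair = ("A", "B") if a > b else ("B", "A")
        duplicated.append(pair[0])
        other.append(pair[1])
    return duplicated, other


def duplication_band_positions(cnv_fraction):
    """Expected A-allele fraction (upper, lower) with a fraction f of duplicated cells."""
    f = cnv_fraction
    # Normal cells: one A and one B; duplicated cells: three copies (AAB or ABB).
    return (1 + f) / (2 + f), 1 / (2 + f)
```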

In another embodiment for prenatal diagnosis, phased parental haplotype data are used to detect the presence of more than one homolog from the father, which indicates that genetic material from more than one fetus is present in the maternal blood sample. By focusing on chromosomes that are expected to be euploid in the fetus, the possibility that the extra paternal homolog reflects a fetal trisomy can be excluded. In addition, it can be determined whether the fetal DNA originates from someone other than the putative father.

In some embodiments, two or more methods described herein are used to phase the genetic data of the individual. In some embodiments, both a bioinformatics method (e.g., inferring the most likely phase from population-based haplotype frequencies) and a molecular biology method (e.g., any molecular phasing method disclosed herein, which yields actual phased data rather than phase inferred by bioinformatics) are used. In some embodiments, phased data from other subjects (e.g., previous subjects) are used to improve the population data. For example, phased data from other subjects can be added to the population data used to calculate priors for the possible haplotypes of another subject. In some embodiments, phased data from other subjects (e.g., previous subjects) are used to calculate priors for the possible haplotypes of another subject.
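One simple way to fold phased data from previous subjects into population priors is to add their haplotypes to the reference counts and renormalize with a pseudocount. The sketch below assumes haplotypes are represented as strings such as "AB" over a fixed set of SNPs; the function name and smoothing scheme are illustrative choices, not a method prescribed here.

```python
from collections import Counter


def haplotype_prior(population_counts, phased_subjects, pseudocount=1.0):
    """Prior probability of each candidate haplotype for a new subject.

    population_counts: dict haplotype -> count from a reference panel.
    phased_subjects: iterable of (hap1, hap2) pairs from previously phased
        subjects; both haplotypes are added to the panel counts.
    """
    counts = Counter(population_counts)
    for h1, h2 in phased_subjects:
        counts[h1] += 1
        counts[h2] += 1
    total = sum(counts.values()) + pseudocount * len(counts)
    return {h: (c + pseudocount) / total for h, c in counts.items()}
```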

In some embodiments, probabilistic data may be used. For example, because of the probabilistic nature of how DNA molecules are represented in a sample, and because of various amplification and measurement biases, the relative numbers of DNA molecules measured at two different loci, or for different alleles at a given locus, do not always represent the relative numbers of molecules in the mixture or in the individual. If one attempts to determine the genotype of a normal diploid individual at a given locus on an autosome by sequencing DNA from the individual's plasma, one would expect to observe either only one allele (homozygous) or approximately equal numbers of two alleles (heterozygous). If 10 molecules of the A allele and 2 molecules of the B allele are observed at that locus, it is unclear whether the individual is homozygous at the locus and the two B molecules are due to noise or contamination, or whether the individual is heterozygous and the lower number of B molecules is due to random statistical variation in the number of DNA molecules in the plasma, amplification bias, contamination, or any number of other causes. In such a case, the probability that the individual is homozygous and the corresponding probability that the individual is heterozygous can be calculated, and these probabilistic genotypes can be used in further calculations.
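The homozygous-versus-heterozygous calculation for the 10 A / 2 B example can be sketched with a simple read model. The 2% per-read error rate, the 50% prior on heterozygosity, and the independence of reads are assumptions chosen for illustration. Under them the two hypotheses remain close (roughly 60% heterozygous versus 40% homozygous AA), which is why carrying the probabilistic genotype forward can be preferable to a hard call.

```python
import math


def genotype_posteriors(a_reads, b_reads, error_rate=0.02, prior_het=0.5):
    """Posterior probability that an individual is AA, AB or BB at a site.

    Assumes each read independently shows the wrong allele with probability
    error_rate and that, for a heterozygote, each read is A with probability 0.5.
    """
    p_a = {"AA": 1 - error_rate, "AB": 0.5, "BB": error_rate}
    prior = {"AA": (1 - prior_het) / 2, "AB": prior_het, "BB": (1 - prior_het) / 2}
    log_post = {
        g: math.log(prior[g]) + a_reads * math.log(p) + b_reads * math.log(1 - p)
        for g, p in p_a.items()
    }
    # Normalize in log space for numerical stability.
    top = max(log_post.values())
    weights = {g: math.exp(v - top) for g, v in log_post.items()}
    norm = sum(weights.values())
    return {g: w / norm for g, w in weights.items()}
```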

Note that the larger the number of molecules observed, the more likely it is that the observed ratio for a given allele closely represents the ratio of DNA molecules in the individual. For example, if 100 A molecules and 100 B molecules are measured, the likelihood that the actual ratio is 50% is much greater than if 10 A molecules and 10 B molecules are measured. In one embodiment, Bayes' theorem is used in combination with a detailed data model to determine the likelihood that a particular hypothesis is correct given the observations. For example, if two hypotheses are considered, one corresponding to a disomic individual and one corresponding to a trisomic individual, the probability that the disomy hypothesis is correct will be considerably higher when 100 molecules of each of the two alleles are observed than when 10 molecules of each are observed. As the data become noisier, owing to bias, contamination, or some other noise source, or as the number of observations at a given locus decreases, the probability that the maximum likelihood hypothesis is true given the observed data decreases. In practice, probabilities over many loci can be aggregated to increase the confidence with which the maximum likelihood hypothesis can be determined to be the correct hypothesis. In some embodiments, the probabilities are simply aggregated without considering recombination. In some embodiments, the calculation takes crossovers into account.
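The effect of the number of molecules can be checked with a small Bayesian comparison of disomy against trisomy at heterozygous sites, assuming binomial read sampling, equal priors, and independent sites (probabilities aggregated without modeling recombination). Under these assumptions, 100 reads of each allele give a disomy posterior above 0.9999, while 10 reads of each give about 0.76; aggregating ten such 10/10 sites restores the same confidence as a single 100/100 site. The function name is hypothetical.

```python
import math


def _logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def disomy_posterior(sites):
    """P(disomy | reads) versus a trisomy alternative, with equal priors.

    sites: list of (a_reads, b_reads) at heterozygous SNPs, treated as
    independent. Disomy: each read is A with probability 1/2. Trisomy: the
    site is AAB or ABB with equal probability (A with probability 2/3 or 1/3).
    """
    log_odds = 0.0  # log P(data | disomy) - log P(data | trisomy)
    for a, b in sites:
        ll_di = (a + b) * math.log(0.5)
        aab = a * math.log(2 / 3) + b * math.log(1 / 3)
        abb = a * math.log(1 / 3) + b * math.log(2 / 3)
        ll_tri = _logsumexp([aab, abb]) + math.log(0.5)
        log_odds += ll_di - ll_tri
    # Numerically stable logistic transform of the summed log odds.
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    return math.exp(log_odds) / (1 + math.exp(log_odds))
```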

In one embodiment, probabilistic phased data are used to determine copy number changes. In some embodiments, the probabilistic phased data are population-based haplotype block frequency data from a data source such as the HapMap database. In some embodiments, the probabilistic phased data are haplotype data obtained by a molecular method, such as phasing by dilution, in which each reaction contains a single molecule of each chromosome segment, but the identity of the haplotype may not be known with certainty owing to genetic noise. In some embodiments, the probabilistic phased data are haplotype data obtained by a molecular method in which the identity of the haplotype can be known with high certainty.

Consider a hypothetical case in which a physician wants to determine, by measuring an individual's plasma DNA, whether some cells in the person's body have a deletion at a particular chromosome segment. The physician can use the following knowledge: if all cells contributing plasma DNA are diploid and have the same genotype, then at heterozygous sites the relative numbers of DNA molecules observed for each of the two alleles will fall into a single distribution centered on 50% A allele and 50% B allele. However, if a fraction of the cells contributing plasma DNA have a deletion at the particular chromosome segment, then at heterozygous sites the relative numbers of DNA molecules observed for each of the two alleles are expected to fall into two distributions: one centered above 50% A allele, at sites where the deleted chromosome segment carries the B allele, and one centered below 50% A allele, at sites where the deleted chromosome segment carries the A allele. The larger the proportion of plasma DNA that originates from cells with the deletion, the farther these two distributions lie from 50%.

In this hypothetical case, imagine that a clinician wants to determine whether a person has a proportion of cells in their body in which a chromosomal region is deleted. The clinician can draw the individual's blood into a vacutainer or another type of blood tube, centrifuge the blood, and isolate the plasma layer. The clinician can isolate DNA from the plasma and enrich the DNA at the targeted loci, for example by targeted or other amplification, locus capture techniques, size enrichment, or other enrichment techniques. The clinician can then measure the enriched and/or amplified DNA, for example using qPCR, sequencing, microarrays, or other measurements of the allele counts at a set of SNPs, in other words generating allele frequency data for the DNA in the sample. We will consider the analysis of data in the case where the clinician amplifies cell-free plasma DNA using a targeted amplification technique and then sequences the amplified DNA, yielding the following exemplary possible data at six SNPs, found on a chromosome segment indicative of cancer, at which the individual is heterozygous:

SNP 1: 460 reads of the A allele; 540 reads of the B allele (46% A)

SNP 2: 530 reads of the A allele; 470 reads of the B allele (53% A)

SNP 3: 40 reads of the A allele; 60 reads of the B allele (40% A)

SNP 4: 46 reads of the A allele; 54 reads of the B allele (46% A)

SNP 5: 520 reads of the A allele; 480 reads of the B allele (52% A)

SNP 6: 200 reads of the A allele; 200 reads of the B allele (50% A)

From this set of data, it may be difficult to distinguish whether all of the individual's cells are normal diploid, or whether the individual has a fraction of cancer cells whose DNA contributes cell-free DNA with a deletion or duplication on the chromosome to the plasma. For example, the two hypotheses with the greatest likelihood may be that the individual has a deletion at this chromosome segment with a tumor fraction of 6%, where the deleted segment of the chromosome has the genotype (A, B, A, A, B, B) or (A, B, A, A, B, A) at the six SNPs. In this notation for genotypes over a set of SNPs, the first letter in the parentheses corresponds to the genotype of the haplotype at SNP 1, the second to SNP 2, and so on.

If the individual's haplotype at that chromosome segment were determined using some method, and one of the two chromosomes were found to have the haplotype (A, B, A, A, B, B), this would agree with the maximum likelihood hypothesis, and the calculated likelihood that the individual has a deletion at that segment, and therefore may have cancerous or precancerous cells, would increase. On the other hand, if the individual were found to have the haplotype (A, A, A, A, A, A), the likelihood that the individual has a deletion at that chromosome segment would decrease significantly, and the likelihood of the no-deletion hypothesis would be higher (the actual likelihood values will depend on other parameters, such as the measurement noise in the system).
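The comparison above can be checked numerically under simple assumptions: binomial read sampling, a 6% tumor fraction, and DNA contribution proportional to copy number. Under those assumptions, a deletion of the homolog carrying (A, B, A, A, B, B) scores higher than the no-deletion hypothesis on the six-SNP read counts listed earlier, whereas a deletion of (A, A, A, A, A, A) scores lower. The function name and constants are illustrative.

```python
import math

# Read counts (A, B) at the six heterozygous SNPs in the example.
COUNTS = [(460, 540), (530, 470), (40, 60), (46, 54), (520, 480), (200, 200)]


def segment_log_likelihood(counts, deleted_haplotype=None, tumor_fraction=0.06):
    """Binomial log-likelihood (up to a shared constant) of the read counts.

    deleted_haplotype: allele carried by the deleted homolog at each SNP
    (e.g. "ABAABB"), or None for the no-deletion hypothesis.
    """
    f = tumor_fraction
    total = 0.0
    for i, (a, b) in enumerate(counts):
        if deleted_haplotype is None:
            p = 0.5
        elif deleted_haplotype[i] == "A":
            p = (1 - f) / (2 - f)  # A under-represented
        else:
            p = 1 / (2 - f)        # B under-represented
        total += a * math.log(p) + b * math.log(1 - p)
    return total
```

For example, `segment_log_likelihood(COUNTS, "ABAABB") - segment_log_likelihood(COUNTS)` is positive, while the same difference for `"AAAAAA"` is negative.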

There are many ways to determine an individual's haplotype, many of which are described elsewhere in this document. A partial list is provided here; it is not meant to be exhaustive. One method is a biological method in which individual DNA molecules are diluted until approximately one molecule from each chromosomal region is present in any given reaction volume, and the genotype is then measured using a method such as sequencing. Another method is to use, in a probabilistic manner, population data based on information about various haplotypes and their frequencies. Another method is to measure the diploid data of the individual and of one or more related individuals who are expected to share haplotype blocks with the individual, and to infer the haplotype blocks. Still another method is to take a tissue sample with a high concentration of the deleted or duplicated segment and determine the haplotype based on allelic imbalance; for example, genotype measurements of a tumor tissue sample with a deletion can be used to determine phased data for the deleted region, and these data can then be used to determine whether the cancer has regrown after resection.

In practice, more than 20 SNPs, more than 50 SNPs, more than 100 SNPs, more than 500 SNPs, more than 1,000 SNPs, or more than 5,000 SNPs are typically measured on a given chromosome segment.

Exemplary methods for phasing, predicting allele ratios, and reconstructing fetal genetic data

In one aspect, the invention features methods for determining one or more haplotypes of a fetus. In various embodiments, the method makes it possible to determine which polymorphic sites (such as SNPs) were inherited by the fetus and to reconstruct which homologs (including recombination events) are present in the fetus (and thereby to interpolate the sequence between polymorphic sites). If desired, essentially the entire genome of the fetus can be reconstructed. If some residual ambiguity remains in the fetal genome (e.g., in an interval containing a crossover), this ambiguity can, if desired, be minimized by analyzing additional polymorphic sites. In various embodiments, polymorphic sites are selected to cover one or more chromosomes at a density that reduces any ambiguity to the desired level. This method has important applications for detecting polymorphisms or other mutations of interest (such as deletions or duplications) in a fetus, because they can be detected based on linkage (e.g., the presence of linked polymorphic sites in the fetal genome) rather than by directly detecting the polymorphism or other mutation of interest in the fetal genome. For example, if a parent is a carrier of a mutation associated with cystic fibrosis (CF), a nucleic acid sample including maternal DNA from the mother of the fetus and fetal DNA from the fetus can be analyzed to determine whether the fetal DNA includes the haplotype containing the CF mutation. In particular, polymorphic sites can be analyzed to determine whether the fetal DNA includes the haplotype containing the CF mutation without detecting the CF mutation itself in the fetal DNA. This can be used to screen for one or more mutations, such as disease-associated mutations, without directly detecting the mutations.
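For the case where the father carries the mutation, the linkage idea can be sketched using SNPs at which the mother is homozygous and the father is heterozygous. At each such SNP, one paternal haplotype carries an allele absent from the mother, so the presence or absence of that allele among maternal plasma reads indicates which paternal haplotype the fetus inherited. The data layout, the min_reads threshold, and the simple majority vote are assumptions; a maternal-carrier case would require allele-ratio modeling instead.

```python
def inherited_paternal_haplotype(sites, min_reads=3):
    """Infer which paternal haplotype ('hap1' or 'hap2') a fetus inherited.

    sites: dicts for SNPs where the mother is homozygous and the father is
    heterozygous, with keys 'maternal_allele', 'hap1', 'hap2' (the father's
    allele on each haplotype) and 'reads' (allele -> read count in maternal
    plasma). Assumes no crossover between the SNPs. Returns None on a tie.
    """
    votes = {"hap1": 0, "hap2": 0}
    for s in sites:
        # The paternal haplotype whose allele differs from the mother's is detectable.
        specific = "hap1" if s["hap1"] != s["maternal_allele"] else "hap2"
        other = "hap2" if specific == "hap1" else "hap1"
        seen = s["reads"].get(s[specific], 0) >= min_reads
        votes[specific if seen else other] += 1
    if votes["hap1"] == votes["hap2"]:
        return None
    return max(votes, key=votes.get)
```

If the mutation is known to lie on the father's hap1, a result of 'hap1' suggests the fetus inherited the mutation-bearing haplotype without the mutation itself being detected.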

In some embodiments, the method includes determining a parental haplotype (e.g., the haplotype of the mother or father of the fetus), for example by using any method described herein. In some embodiments, this determination is made without using data from relatives of the mother or father. In some embodiments, the parental haplotype is determined by SNP genotyping or sequencing using a dilution method described herein. In some embodiments, the haplotype of the mother (or father) is determined by any method herein using data from relatives of the mother (or father). In some embodiments, the haplotypes of both the mother and the father are determined.

Parental haplotype data can be used to determine whether the fetus inherited a parental haplotype. In some embodiments, a nucleic acid sample including maternal DNA and fetal DNA is analyzed using a SNP array to detect at least 100; 200; 500; 750; 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different polymorphic sites. In some embodiments, a nucleic acid sample including maternal DNA and fetal DNA is analyzed by contacting the sample with a library of primers that simultaneously hybridize to at least 100; 200; 500; 750; 1,000; 2,000; 5,000; 7,500; 10,000; 20,000; 25,000; 30,000; 40,000; 50,000; 75,000; or 100,000 different polymorphic sites (such as SNPs) to generate a reaction mixture. In some embodiments, the reaction mixture is subjected to primer extension reaction conditions to generate amplified products. In some embodiments, the amplified products are measured with a high-throughput sequencer to generate sequencing data.

In various embodiments, data on the probability of chromosomal crossovers at different locations in a chromosome or chromosome segment (e.g., recombination data such as that found in the HapMap database, used to generate a recombination risk score for any interval) are used when determining fetal haplotypes, to model the dependence between alleles at polymorphic loci on the chromosome or chromosome segment, as described above. In some embodiments, the method considers SNPs (e.g., SNPs flanking a gene or mutation of interest), location-specific recombination likelihoods from recombination data, and the physical distance between SNPs, together with data observed in genetic measurements of maternal plasma, to obtain the most likely fetal genotype. PARENTAL SUPPORT™ can then be performed on targeted sequencing or SNP array data obtained from these SNPs to determine which homologs the fetus inherited from each of the two parents (see, for example, U.S. Application No. 11/603,406 (US Publication No. 20070184467), U.S. Application No. 12/076,348 (US Publication No. 20080243398), U.S. Application No. 13/110,685 (US Publication No. 2011/0288780), PCT Application No. PCT/US09/52730 (PCT Publication No. WO/2010/017214), PCT Application No. PCT/US10/050824 (PCT Publication No. WO/2011/041485), U.S. Application No. 13/300,235 (US Publication No. 2012/0270212), U.S. Application No. 13/335,043 (US Publication No. 2012/0122701), U.S. Application No. 13/683,604, and U.S. Application No. 13/780,022, each of which is incorporated herein by reference in its entirety).
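Combining per-SNP evidence with location-specific recombination probabilities can be sketched as a two-state hidden Markov model whose hidden state is which of one parent's two homologs was transmitted, decoded with the Viterbi algorithm. This is a generic illustration, not the PARENTAL SUPPORT™ algorithm; the Haldane map function, the use of genetic-map positions in Morgans, and the precomputed per-SNP log-likelihoods are all assumptions.

```python
import math


def haldane_recombination_fraction(distance_morgans):
    """Haldane map function: probability of an odd number of crossovers."""
    return 0.5 * (1 - math.exp(-2 * distance_morgans))


def most_likely_inherited_homolog(log_emissions, positions_morgans):
    """Viterbi path over which of a parent's two homologs (0 or 1) was inherited.

    log_emissions[i][k]: log P(observed data at SNP i | homolog k inherited).
    positions_morgans: genetic-map position of each SNP, in Morgans.
    """
    score = [math.log(0.5) + log_emissions[0][k] for k in (0, 1)]
    backpointers = []
    for i in range(1, len(log_emissions)):
        r = haldane_recombination_fraction(positions_morgans[i] - positions_morgans[i - 1])
        log_stay = math.log(1 - r)
        log_switch = math.log(r) if r > 0 else -math.inf
        new_score, pointers = [], []
        for k in (0, 1):
            from_same = score[k] + log_stay
            from_other = score[1 - k] + log_switch
            pointers.append(k if from_same >= from_other else 1 - k)
            new_score.append(max(from_same, from_other) + log_emissions[i][k])
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

A crossover is inferred only when the evidence for switching homologs outweighs the recombination penalty, which shrinks as SNPs get closer on the genetic map.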

Assuming that the possibility allele at one of site is the general embodiment of A and B；It is any by identity A or B points The specific allele of dispensing.For the parent genotype of specific SNP, referred to as genetic background, it is expressed as female parent | male parent gene type. Therefore, if it is heterozygosis that mother, which is homozygous and father, this will be indicated as AA | AB.Similarly, if two parents couple Be in identical allele it is homozygous, then parent genotype will be indicated as AA | AA.In addition, fetus will never have AB or BB state, and the number of the sequence reads with B allele will be low, and therefore be determined for measurement and base Because of the noise response of parting platform, influence and sequencing mistake including such as low-level DNA pollution；These noise responses can be used for The genetic data of modeling prospective is composed.Only there are five types of possible maternal father's genetic backgrounds: AA | AA, AA | AB, AB | AA, AB | AB and AA|BB；Other backgrounds pass through symmetry equivalent.Wherein parent is that homozygous SNP is only used for determining for phase iso-allele The information of noise and level of pollution.Wherein parent is not that homozygous SNP is determining fetus score and copying for phase iso-allele It is informedness in terms of shellfish counting number.

Let $N_{A,i}$ and $N_{B,i}$ denote the numbers of reads of each allele at SNP $i$, and let $C_i$ denote the parental context at that site. The data set for a particular chromosome is represented by $N_{AB} = \{N_{A,i}, N_{B,i}\}_{i=1}^{N}$ and $C = \{C_i\}_{i=1}^{N}$. To reconstruct part or all of the fetal genome, it can optionally be determined whether the fetus is aneuploid (e.g., has a missing or extra copy of a chromosome or chromosome segment). For each individual chromosome or chromosome segment under study, let $H$ denote a hypothesis specifying the total number of copies of the chromosome, the parental origin of each copy, and the positions of recombination on the parental chromosomes during formation of the gametes that were fertilized to create the child. The probability $P(H)$ of each hypothesis can be calculated using data from the HapMap database and prior information associated with each ploidy state.

In addition, let $F$ denote the fetal cfDNA fraction in the sample. Given a set of possible $H$, $C$ and $F$, the probability of $N_{AB}$, $P(N_{AB} \mid H, F, C)$, can be calculated by modeling the noise sources in the molecular measurements and in the measurement platform. The goal is to find the hypothesis $H$ and fetal fraction $F$ that maximize $P(H, F \mid N_{AB}, C)$. Using standard Bayesian statistical techniques and assuming a uniform probability distribution for $F$ from 0 to 1, this can be rewritten as maximizing $P(N_{AB} \mid H, F, C)\,P(H)$ with respect to $H$ and $F$, which can now be calculated. The probabilities of all hypotheses associated with a specific copy number and fetal fraction (e.g., trisomy and $F = 10\%$, but covering all possible parental chromosome origins and crossover positions) are summed. The copy-number hypothesis with the greatest probability is selected as the test result; the fetal fraction associated with that hypothesis gives the fetal fraction, and the probability associated with that hypothesis gives the accuracy of the calculated result.
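The maximization over $H$ and $F$, with summation over the sub-hypotheses (parental origins and crossover positions) belonging to each copy-number class, can be sketched as a grid search. The caller supplies the log-likelihood function; the grid over F stands in for the uniform prior, sub-hypotheses are weighted equally within a class, and all names are hypothetical.

```python
import math


def call_copy_number(log_likelihood, hypotheses, f_grid, log_prior=None):
    """Return (copy-number label, fetal fraction) maximizing P(N_AB | H, F, C) P(H).

    hypotheses: dict mapping a copy-number label to its list of sub-hypotheses
        (e.g. parental origins and crossover positions).
    log_likelihood(sub_hypothesis, f): log P(N_AB | sub_hypothesis, F=f, C).
    f_grid: candidate fetal fractions (a uniform prior over F is assumed).
    log_prior: optional dict label -> log P(label); uniform if omitted.
    """
    best = None
    for label, subs in hypotheses.items():
        prior = 0.0 if log_prior is None else log_prior[label]
        for f in f_grid:
            terms = [log_likelihood(h, f) for h in subs]
            top = max(terms)
            # Sum the sub-hypothesis probabilities (equal weight within the class).
            summed = top + math.log(sum(math.exp(t - top) for t in terms))
            score = summed - math.log(len(subs)) + prior
            if best is None or score > best[0]:
                best = (score, label, f)
    return best[1], best[2]
```

A typical call would pass a likelihood function built from the read counts and parental contexts, for example `call_copy_number(loglik, {"disomy": [...], "trisomy": [...]}, [i / 100 for i in range(1, 51)])`.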

In some embodiments, the algorithm uses computer simulation to generate a very large number of hypothetical sequencing data sets that could arise from the possible fetal inheritance patterns, given the method, the sample parameters, and amplification and measurement artifacts. More specifically, the algorithm first uses the parental genotypes at a large number of SNPs and crossover frequency data from the HapMap database to predict the possible fetal genotypes. It then predicts the expected profile of the sequencing data that would be obtained by measuring a mixed sample containing the mother's DNA and the DNA of a fetus carrying each possible fetal genotype, taking into account various parameters, including the fetal fraction, the expected read-depth profile, the number of fetal genome equivalents present in the sample, the expected amplification bias at each SNP, and a number of noise parameters. The data model describes how the sequencing or SNP array data are expected to appear for each of these hypotheses given a specific set of parameters. The hypothesis with the best fit between the modeled data and the measured data is selected.

If desired, the inferred fetal haplotypes can be used to calculate the expected allele ratios for DNA or RNA from the fetus. Expected allele ratios can also be calculated for a mixed sample containing nucleic acids from both the mother and the fetus (these allele ratios represent the expected values for measurements of the total amount of each allele, including the amounts of the allele from maternal nucleic acids and from fetal nucleic acids). Expected allele ratios can be calculated for different hypotheses specifying the degree of overrepresentation of the first homologous chromosome segment.
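The expected allele ratio for a maternal and fetal mixture can be written as a weighted average of allele copies. The sketch assumes that each allele copy contributes reads in proportion to its source's share of the DNA, with fetal_fraction defined per diploid genome equivalent; names are illustrative. For example, with a 20% fetal fraction, a heterozygous (AB) mother and an AA fetus give an expected A fraction of 0.6, and a trisomic AAB fetus gives 1.2/2.2 (about 0.545).

```python
def expected_a_fraction(maternal_a, maternal_copies, fetal_a, fetal_copies, fetal_fraction):
    """Expected fraction of A-allele reads in a mother/fetus DNA mixture.

    Copy counts are per genome at the SNP (e.g. mother AB -> maternal_a=1,
    maternal_copies=2; trisomic fetus AAB -> fetal_a=2, fetal_copies=3).
    Each allele copy is assumed to contribute reads in proportion to its
    source's share of the DNA.
    """
    f = fetal_fraction
    a = (1 - f) * maternal_a + f * fetal_a
    total = (1 - f) * maternal_copies + f * fetal_copies
    return a / total
```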

In some embodiments, the method includes determining whether the fetus has one or more of the following conditions: cystic fibrosis, Huntington's disease, fragile X syndrome, thrombocytopenia, muscular dystrophy (such as Duchenne muscular dystrophy), Alzheimer's disease, Fanconi anemia, Gaucher disease, mucolipidosis IV, Niemann-Pick disease, Tay-Sachs disease, sickle-cell anemia, Parkinson's disease, torsion dystonia, and cancer. In some embodiments, fetal haplotypes are determined for one or more chromosomes selected from chromosomes 13, 18, 21, X and Y. In some embodiments, fetal haplotypes are determined for all fetal chromosomes. In various embodiments, the method determines substantially the entire genome of the fetus. In some embodiments, haplotypes are determined for at least 30%, 40%, 50%, 60%, 70%, 80%, 90% or 95% of the fetal genome. In some embodiments, the fetal haplotype determination includes information about which alleles are present at at least 100, 200, 500, 750, 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, or 100,000 different polymorphic sites. In some embodiments, the method is used to determine the haplotype or allele ratios of an embryo.

Exemplary methods for predicting allele ratios

An exemplary method for calculating the expected allele ratios of a sample is described below. Table 1 shows the expected allele ratios for a mixed sample (such as a maternal blood sample) containing nucleic acids from the mother and the fetus. These expected allele ratios represent the expected measurement of the total amount of each allele, including the amounts of the allele from the maternal nucleic acids and from the fetal nucleic acids in the mixed sample. In one embodiment, the mother is heterozygous at two adjacent sites that are expected to segregate together (for example, two sites between which no chromosomal crossover is expected). The mother is therefore (AB, AB). Now suppose that the maternal phasing data indicate that she is (A, A) for one haplotype; it can then be inferred that she is (B, B) for the other haplotype. Table 1 gives the expected allele ratios for the different hypotheses, assuming a fetal fraction of 20%. For this example, no knowledge of paternal data is assumed, and a heterozygosity rate of 50% is assumed. The expected allele ratios are given for each of the two SNPs as (expected number of A reads / total number of reads). These ratios were calculated both with maternal phasing data (one haplotype is (A, A) and the other is (B, B)) and without maternal phasing data. Table 1 includes different hypotheses for the copy number of the fetal chromosome segment inherited from each parent.

Table 1: Expected genetic data for a mixed sample of maternal and fetal nucleic acids

In addition to reducing the number of possible expected allele ratios, the use of phasing data also changes the prior likelihood of each expected allele ratio, so that the maximum-likelihood result is more likely to be correct. Eliminating impossible expected allele ratio hypotheses increases the chance of selecting the correct hypothesis. For example, suppose the measured allele ratios are (0.41, 0.59). Without phasing data, the hypothesis with the maximum likelihood might be taken to be the disomy hypothesis (given how closely the measured allele ratios match the disomy expected allele ratios of (0.40, 0.60)). With phasing data, however, the expected allele ratios of (0.40, 0.60) can be excluded for the disomy hypothesis, and a trisomy hypothesis may be selected as more likely.
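The effect of phasing on the disomy hypothesis can be reproduced with a short enumeration. This sketch assumes the mixture model in which each fetal copy contributes f/2 and each maternal copy (1 - f)/2 of the cfDNA, no crossover between the two SNPs, and an unknown paternal allele (A or B) at each SNP because no paternal data are used. The function names are hypothetical.

```python
import itertools


def expected_ratio(maternal_a, fetal_a, f, fetal_copies=2):
    """Expected A fraction when the mother (two copies) carries maternal_a A alleles."""
    return ((1 - f) * maternal_a + f * fetal_a) / ((1 - f) * 2 + f * fetal_copies)


def disomy_ratio_pairs(f, maternal_haplotypes=None):
    """Possible (SNP1, SNP2) expected A fractions for a disomic fetus of an (AB, AB) mother.

    With phasing, the fetus inherits one whole maternal haplotype; without phasing,
    the maternal allele at each SNP is unconstrained."""
    if maternal_haplotypes is None:
        maternal_choices = list(itertools.product("AB", repeat=2))
    else:
        maternal_choices = [tuple(h) for h in maternal_haplotypes]
    pairs = set()
    for maternal in maternal_choices:
        for paternal in itertools.product("AB", repeat=2):  # unknown paternal alleles
            pairs.add(tuple(
                round(expected_ratio(1, (m == "A") + (p == "A"), f), 3)
                for m, p in zip(maternal, paternal)
            ))
    return pairs


unphased = disomy_ratio_pairs(0.2)
phased = disomy_ratio_pairs(0.2, ["AA", "BB"])
```

Without phasing, all nine combinations of 0.4, 0.5 and 0.6 are possible, including (0.4, 0.6). With the maternal haplotypes (A, A) and (B, B), only seven pairs remain, and (0.4, 0.6) is no longer a disomy expectation.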

Suppose the measured allele ratios are (0.4, 0.4). Without any haplotype information, the probability of a maternal deletion at each SNP would be 0.5 × P(A deleted) + 0.5 × P(B deleted). Thus, although it appears that A is deleted in the fetus, the likelihood of the deletion will be the average of the two. For a sufficiently high fetal fraction, the most likely hypothesis can still be determined. For a sufficiently low fetal fraction, however, the averaging may count against the deletion hypothesis. With haplotype information, the probability that homolog 1 is deleted, P(A deleted), is larger and fits the measured data better. Where desired, the crossover probability between the two sites can also be taken into account.

In another exemplary embodiment, phased data are used to combine likelihoods. Consider two consecutive SNPs, s1 and s2, and let D1 and D2 denote the allele data at these SNPs. Here we give an example of how the probabilities of these two SNPs are combined. Let c denote the probability that two consecutive heterozygous SNPs carry their alleles in the same arrangement on the same homolog (that is, both SNPs are AB or both SNPs are BA). Then 1 − c is the probability that one SNP is AB and the other is BA. For example, consider hypothesis H10 and allele imbalance f. First, all likelihoods are calculated under the assumption that each SNP is AB (or, respectively, BA). The probabilities at the two consecutive SNPs can then be combined as follows:

$$\mathrm{Lik}(D_1, D_2 \mid H_{10}, f) = \mathrm{Lik}(D_1 \mid H_{10}, f)\, c\, \mathrm{Lik}(D_2 \mid H_{10}, f) + \mathrm{Lik}(D_1 \mid H_{10}, f)\,(1-c)\,\mathrm{Lik}(D_2 \mid H_{01}, f)$$

The combined likelihood of all SNPs, $\mathrm{Lik}(D_1, \ldots, D_N \mid H_{10}, f)$, can be determined recursively.
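One way to carry out this recursion is a two-state forward pass, treating the arrangement at each SNP (H10 or H01 orientation) as a Markov chain that keeps its state with probability c and switches with probability 1 − c. The sketch below assumes the per-SNP likelihoods have already been computed and, like the two-SNP formula, anchors the first SNP to H10. The function name is hypothetical.

```python
def combined_likelihood(lik_10, lik_01, c):
    """Combine per-SNP likelihoods across consecutive heterozygous SNPs.

    lik_10[i] = Lik(D_i | H10, f) and lik_01[i] = Lik(D_i | H01, f).
    c = probability that consecutive SNPs keep the same arrangement.
    The first SNP is anchored to H10, as in the two-SNP formula."""
    alpha_10 = lik_10[0]  # running likelihood with the current SNP in H10 orientation
    alpha_01 = 0.0        # running likelihood with the current SNP in H01 orientation
    for l10, l01 in zip(lik_10[1:], lik_01[1:]):
        alpha_10, alpha_01 = (
            (alpha_10 * c + alpha_01 * (1 - c)) * l10,
            (alpha_10 * (1 - c) + alpha_01 * c) * l01,
        )
    return alpha_10 + alpha_01
```

For two SNPs this reduces exactly to the equation above. For long runs of SNPs the products underflow, so a practical version would rescale the two running values at each step or work in log space.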

Exemplary mutations

Exemplary mutations associated with a disease or disorder (such as cancer), or with an increased risk (such as a higher-than-normal risk level) of a disease or disorder (such as cancer), include single nucleotide variants (SNVs), multinucleotide mutations, deletions (such as deletion of a region of 2 to 30 million base pairs), duplications and tandem repeats. In some embodiments, the mutation is in DNA, such as cfDNA, cell-free mitochondrial DNA (cf mDNA), cell-free DNA derived from nuclear DNA (cf nDNA), cellular DNA or mitochondrial DNA. In some embodiments, the mutation is in RNA, such as cfRNA, cellular RNA, cytoplasmic RNA, coding cytoplasmic RNA, non-coding cytoplasmic RNA, mRNA, miRNA, mitochondrial RNA, rRNA or tRNA. In some embodiments, the mutation is present at a higher frequency in subjects with a disease or disorder (such as cancer) than in subjects without the disease or disorder. In some embodiments, the mutation is indicative of cancer, such as a causative mutation. In some embodiments, the mutation is a driver mutation that has a causative effect in the disease or disorder. In some embodiments, the mutation is not a causative mutation. For example, in certain cancers multiple mutations accumulate, but some of them are not causative. Mutations that are not causative (such as mutations present at a higher frequency in subjects with a disease or disorder than in subjects without it) can still be used to diagnose the disease or disorder. In some embodiments, the mutation is a loss of heterozygosity (LOH) in one or more microsatellites.

In some embodiments, a subject is screened for one or more polymorphisms or mutations that the subject is known to have (for example, to test for their presence, for a change in the amount of cells, DNA or RNA carrying these polymorphisms or mutations, or for cancer remission or recurrence). In some embodiments, a subject is screened for one or more polymorphisms or mutations for which the subject is known to be at risk (such as a subject with a relative who has the polymorphism or mutation). In some embodiments, a subject is screened for a panel of polymorphisms or mutations associated with a disease or disorder such as cancer (for example, at least 5, 10, 50, 100, 200, 300, 500, 750, 1,000, 1,500, 2,000 or 5,000 polymorphisms or mutations).

Many coding variants associated with cancer are described in Abaan et al., "The exomes of the NCI-60 panel: a genomic resource for cancer biology and systems pharmacology", Cancer Research, July 15, 2013, and on the World Wide Web at dtp.nci.nih.gov/branches/btb/characterizationNCI60.html (each incorporated herein by reference in its entirety). The NCI-60 human cancer cell line panel consists of 60 different cell lines representing cancers of the lung, colon, brain, ovary, breast, prostate and kidney, as well as leukemia and melanoma. The genetic variants identified in these cell lines fall into two types: type I variants, which are found in the normal population, and type II variants, which are cancer-specific.

Exemplary polymorphisms or mutations (such as deletions or duplications) are in one or more of the following genes: TP53, PTEN, PIK3CA, APC, EGFR, NRAS, NF2, FBXW7, ERBB, ATAD5, KRAS, BRAF, VEGF, EGFR, HER2, ALK, p53, BRCA, BRCA1, BRCA2, SETD2, LRP1B, PBRM, SPTA1, DNMT3A, ARID1A, GRIN2A, TRRAP, STAG2, EPHA3/5/7, POLE, SYNE1, C20orfB0, CSMD1, CTNNB1, ERBB2, FBXW7, KIT, MUC4, ATM, CDH1, DDX11, DDX12, DSPP, EPPK1, FAM186A, GNAS, HRNR, KRTAP4-11, MAP2K4, MLL3, NRAS, RB1, SMAD4, TTN, ABCC9, ACVR1B, ADAM29, ADAMTS19, AGAP10, AKT2, CBWD1, CCDC30, CCDC93, CD5L, CDC27, CDC42BPA, CDH9, CDKN2A, CHD8, CHEK2, CDK2, CHIN9, CIZ1, CLSPN, CNTN6, COL14A1, CREBBP, CROCC, CTSF, CYP1A2, DCLK1, DHDDS, DHX32, DKK2, DLEC1, DNAH14, DNAH5, DNAH9, DNASE1L3, DUSP16, DYNC2H1, ECT2, EFHB, RRN3P2, TRIM49B, TUBB8P5, EPHA7, ERBB3, ERCC6, FAM21A, FAM21C, FCGBP, FGFR2, FLG2, FLT1, FOLR2, FRYL, FSCB, GAB1, GABRA4, GABRP, GH2, GOLGA6L1, GPHB5, GPR32, GPX5, GTF3C3, HECW1, HIST1H3B, HLA-A, HRAS, HS3ST1, HS6ST1, HSPD1, IDH1, JAK2, KDM5B, KIAA0528, KRT15, KRT38, KRTAP21-1, KRTAP4-5, KRTAP4-7, KRTAP5-4, KRTAP5-5, LAMA4, LATS1, LMF1, LPAR4, LPPR4, LRRFIP1, LUM, LYST, MAP2K1, MARCH1, MARCO, MB21D2, MEGF10, MMP16, MORC1, MRE11A, MTMR3, MUC12, MUC17, MUC2, MUC20, NBPF10, NBPF20, NEK1, NFE2L2, NLRP4, NOTCH2, NRK, NUP93, OBSCN, OR11H1, OR2B11, OR2M4, OR4Q3, OR5D13, OR8I2, OXSM, PIK3R1, PPP2R5C, PRAME, PRF1, PRG4, PRPF19, PTH2, PTPRC, PTPRJ, RAC1, RAD50, RBM12, RGPD3, RGS22, ROR1, RP11-671M22.1, RP13-996F3.4, RP1L1, RSBN1L, RYR3, SAMD3, SCN3A, SEC31A, SF1, SF3B1, SLC25A2, SLC44A1, SLC4A11, SMAD2, SPTA1, ST6GAL2, STK11, SZT2, TAF1L, TAX1BP1, TBP, TGFBI, TIF1, TMEM14B, TMEM74, TPTE, TRAPPC8, TRPS1, TXNDC6, USP32, UTP20, VASN, VPS72, WASH3P, WWTR1, XPO1, ZFHX4, ZMIZ1, ZNF167, ZNF436, ZNF492, ZNF598, ZRSR2, ABL1, AKT2, AKT3, ARAF, ARFRP1, ARID2, ASXL1, ATR, ATRX, AURKA, AURKB, AXL, BAP1, BARD1, BCL2, BCL2L2, BCL6, BCOR, BCORL1, BLM, BRIP1, BTK, CARD11, CBFB, CBL, CCND1, CCND2, CCND3, CCNE1, CD79A, CD79B, CD73, CDK12, CDK4, CDK6, CDK8, CDKN1B, CDKN2B, CDKN2C, CEBPA, CHEK1, CIC, CRKL, CRLF2, CSF1R, CTCF, CTNNA1, DAXX, DDR2, DOT1L, EMSY (C11orf30), EP300, EPHA3, EPHB1, ERBB4, ERG, ESR1, EZH2, FAM123B (WTX), FAM46C, FANCA, FANCC, FANCD2, FANCE, FANCF, FANCG, FANCL, FGF10, FGF14, FGF19, FGF23, FGF3, FGF4, FGF6, FGFR1, FGFR2, FGFR3, FGFR4, FLT3, FLT4, FOXL2, GATA1, GATA2, GATA3, GID4 (C17orf39), GNA11, GNA13, GNAQ, GNAS, GPR124, GSK3B, HGF, IDH1, IDH2, IGF1R, IKBKE, IKZF1, IL7R, IRF4, IRS2, JAK1, JAK3, JUN, KAT6A (MYST3), KDM5A, KDM5C, KDM6A, KDR, KEAP1, KLHL6, MAP2K2, MAP2K4, MAP3K1, MCL1, MDM2, MDM4, MED12, MEF2B, MEN1, MET, MITF, MLH1, MLL, MLL2, MPL, MSH2, MSH6, MTOR, MUTYH, MYC, MYCL1, MYCN, MYD88, NF1, NFKBIA, NKX2-1, NOTCH1, NPM1, NRAS, NTRK1, NTRK2, NTRK3, PAK3, PALB2, PAX5, PBRM1, PDGFRA, PDGFRB, PDK1, PIK3CG, PIK3R2, PPP2R1A, PRDM1, PRKAR1A, PRKDC, PTCH1, PTPN11, RAD51, RAF1, RARA, RET, RICTOR, RNF43, RPTOR, RUNX1, SMARCA4, SMARCB1, SMO, SOCS1, SOX10, SOX2, SPEN, SPOP, SRC, STAT4, SUFU, TET2, TGFBR2, TNFAIP3, TNFRSF14, TOP1, TP53, TSC1, TSC2, TSHR, VHL, WISP3, WT1, ZNF217, ZNF703, and combinations thereof (Su et al., J Mol Diagn 2011, 13:74-84; DOI:10.1016/j.jmoldx.2010.11.010; and Abaan et al., "The exomes of the NCI-60 panel: a genomic resource for cancer biology and systems pharmacology", Cancer Research, July 15, 2013; each incorporated herein by reference in its entirety). In some embodiments, the duplication is a chromosome 1p ("Chr1p") duplication associated with breast cancer. In some embodiments, one or more polymorphisms or mutations are in BRAF, such as the V600E mutation. In some embodiments, one or more polymorphisms or mutations are in K-ras. In some embodiments, there is a combination of one or more polymorphisms or mutations in K-ras and APC. In some embodiments, there is a combination of one or more polymorphisms or mutations in K-ras and p53. In some embodiments, there is a combination of one or more polymorphisms or mutations in APC and p53. In some embodiments, there is a combination of one or more polymorphisms or mutations in K-ras, APC and p53. In some embodiments, there is a combination of one or more polymorphisms or mutations in K-ras and EGFR. Exemplary polymorphisms or mutations are in one or more of the following microRNAs: miR-15a, miR-16-1, miR-23a, miR-23b, miR-24-1, miR-24-2, miR-27a, miR-27b, miR-29b-2, miR-29c, miR-146, miR-155, miR-221, miR-222 and miR-223 (Calin et al., "A MicroRNA signature associated with prognosis and progression in chronic lymphocytic leukemia", New England Journal of Medicine 353:1793-801, 2005, incorporated herein by reference in its entirety).

In some embodiments, the deletion is a deletion of at least 0.01 kb, 0.1 kb, 1 kb, 10 kb, 100 kb, 1 Mb, 2 Mb, 3 Mb, 5 Mb, 10 Mb, 15 Mb, 20 Mb, 30 Mb or 40 Mb. In some embodiments, the deletion is a deletion of between 1 kb and 40 Mb, such as 1 kb to 100 kb, 100 kb to 1 Mb, 1 to 5 Mb, 5 to 10 Mb, 10 to 15 Mb, 15 to 20 Mb, 20 to 25 Mb, 25 to 30 Mb, or 30 to 40 Mb.

In some embodiments, the duplication is a duplication of at least 0.01 kb, 0.1 kb, 1 kb, 10 kb, 100 kb, 1 Mb, 2 Mb, 3 Mb, 5 Mb, 10 Mb, 15 Mb, 20 Mb, 30 Mb or 40 Mb. In some embodiments, the duplication is a duplication of between 1 kb and 40 Mb, such as 1 kb to 100 kb, 100 kb to 1 Mb, 1 to 5 Mb, 5 to 10 Mb, 10 to 15 Mb, 15 to 20 Mb, 20 to 25 Mb, 25 to 30 Mb, or 30 to 40 Mb.

In some embodiments, the tandem repeat is a repeat of 2 to 60 nucleotides, for example a repeat of 2 to 6, 7 to 10, 10 to 20, 20 to 30, 30 to 40, 40 to 50, or 50 to 60 nucleotides. In some embodiments, the tandem repeat is a repeat of 2 nucleotides (a dinucleotide repeat). In some embodiments, the tandem repeat is a repeat of 3 nucleotides (a trinucleotide repeat).
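A tandem repeat of a given unit length can be located with a simple scan. This is a minimal sketch, not a method from the source: it reports runs in which a unit of `unit_len` nucleotides is repeated back to back at least `min_copies` times (the threshold of 3 is an arbitrary illustrative choice), and it does not merge rotations of the same unit or tolerate mismatches. The function name is hypothetical.

```python
def find_tandem_repeats(seq, unit_len, min_copies=3):
    """Return (start, unit, copies) for each run where a unit of unit_len
    nucleotides is repeated back to back at least min_copies times."""
    hits = []
    i = 0
    while i + unit_len <= len(seq):
        unit = seq[i:i + unit_len]
        copies = 1
        # extend while the next block of unit_len bases matches the unit
        while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
            copies += 1
        if copies >= min_copies:
            hits.append((i, unit, copies))
            i += copies * unit_len  # skip past the whole repeat run
        else:
            i += 1
    return hits
```

For example, `find_tandem_repeats("GGCACACACATT", 2)` finds the dinucleotide repeat `("CA" × 4)` starting at position 2, and `find_tandem_repeats("CAGCAGCAG", 3)` finds a trinucleotide repeat of three CAG units.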

In some embodiments, the polymorphism or mutation is prognostic. Exemplary prognostic mutations include K-ras mutations, such as K-ras mutations that are indicators of postoperative disease recurrence in colorectal cancer (Ryan et al., "A prospective study of circulating mutant KRAS2 in the serum of patients with colorectal neoplasia: strong prognostic indicator in postoperative follow up", Gut 52:101-108, 2003; and Lecomte T et al., "Detection of free-circulating tumor-associated DNA in plasma of colorectal cancer patients and its association with prognosis", International Journal of Cancer 100:542-548, 2002; each incorporated herein by reference in its entirety).

In some embodiments, the polymorphism or mutation is associated with an altered response to a particular treatment (such as increased or decreased efficacy or side effects). Examples include K-ras mutations, which are associated with a reduced response to EGFR-based treatment in non-small cell lung cancer (Wang et al., "Potential clinical significance of a plasma-based KRAS mutation analysis in patients with advanced non-small cell lung cancer", Clinical Cancer Research 16:1324-1330, 2010, incorporated herein by reference in its entirety).

K-ras is an oncogene that is activated in many cancers. K-ras cfDNA mutations have been identified in pancreatic cancer, lung cancer, colorectal cancer, bladder cancer and gastric cancer (Fleischhacker & Schmidt, "Circulating nucleic acids (CNAs) and cancer--a survey", Biochimica et Biophysica Acta 1775:181-232, 2007, incorporated herein by reference in its entirety).

p53 is a tumor suppressor gene that is mutated in many cancers and contributes to tumor progression (Levine & Oren, "The first 30 years of p53: growing ever more complex", Nature Reviews Cancer 9:749-758, 2009, incorporated herein by reference). Many different codons can be mutated, such as Ser249. p53 cfDNA mutations have been identified in breast cancer, lung cancer, ovarian cancer, bladder cancer, gastric cancer, pancreatic cancer, colorectal cancer, intestinal cancer and hepatocellular carcinoma (Fleischhacker & Schmidt, "Circulating nucleic acids (CNAs) and cancer--a survey", Biochimica et Biophysica Acta 1775:181-232, 2007, incorporated herein by reference in its entirety).

BRAF is an oncogene downstream of Ras. BRAF mutations have been identified in glioma, melanoma, thyroid cancer and lung cancer (Dias-Santagata et al., "BRAF V600E mutations are common in pleomorphic xanthoastrocytoma: diagnostic and therapeutic implications", PLoS ONE 2011; 6:e17948, 2011; Shinozaki et al., "Utility of circulating B-RAF DNA mutation in serum for monitoring melanoma patients receiving biochemotherapy", Clin Cancer Res 13:2068-2074, 2007; and Board et al., "Detection of BRAF mutations in the tumour and serum of patients enrolled in the AZD6244 (ARRY-142886) advanced melanoma phase II study", Br J Cancer 2009; 101:1724-1730; each incorporated herein by reference in its entirety). The BRAF V600E mutation occurs, for example, in melanoma tumors and is more common in advanced stages. The V600E mutation has been detected in cfDNA.

EGFR contributes to cell proliferation and is misregulated in many cancers (Downward J., "Targeting RAS signalling pathways in cancer therapy", Nature Reviews Cancer 3:11-22, 2003; and Levine & Oren, "The first 30 years of p53: growing ever more complex", Nature Reviews Cancer 9:749-758, 2009; each incorporated herein by reference in its entirety). Exemplary EGFR mutations include mutations in exons 18-21 that have been identified in lung cancer patients. EGFR cfDNA mutations have been identified in lung cancer patients (Jia et al., "Prediction of epidermal growth factor receptor mutations in the plasma/pleural effusion to efficacy of gefitinib treatment in advanced non-small cell lung cancer", J Cancer Res Clin Oncol 2010; 136:1341-1347, 2010, incorporated herein by reference in its entirety).

Exemplary polymorphisms or mutations associated with breast cancer include loss of heterozygosity in microsatellites (Kohler et al., "Levels of plasma circulating cell free nuclear and mitochondrial DNA as potential biomarkers for breast tumors", Molecular Cancer 8:doi:10.1186/1476-4598-8-105, 2009, incorporated herein by reference in its entirety), p53 mutations (such as mutations in exons 5-8) (Garcia et al., "Extracellular tumor DNA in plasma and overall survival in breast cancer patients", Genes, Chromosomes & Cancer 45:692-701, 2006, incorporated herein by reference in its entirety), HER2 (human epidermal growth factor receptor 2) (Sorensen et al., "Circulating HER2 DNA after trastuzumab treatment predicts survival and response in breast cancer", Anticancer Research 30:2463-2468, 2010, incorporated herein by reference in its entirety), and PIK3CA, MED1 and GAS6 polymorphisms or mutations (Murtaza et al., "Non-invasive analysis of acquired resistance to cancer therapy by sequencing of plasma DNA", Nature 2013; doi:10.1038/nature12065, 2013, incorporated herein by reference in its entirety).

Increased cfDNA levels and LOH are associated with decreased overall and disease-free survival. p53 mutations (exons 5-8) are associated with decreased overall survival. Decreased circulating HER2 cfDNA levels are associated with a better response to HER2-targeted therapy in subjects with HER2-positive breast tumors. Activating mutations in PIK3CA, a truncation of MED1 and a splice mutation in GAS6 result in resistance to treatment.

Exemplary polymorphisms or mutations associated with colorectal cancer include p53, APC, K-ras and thymidylate synthase mutations and p16 gene methylation (Wang et al., "Molecular detection of APC, K-ras, and p53 mutations in the serum of colorectal cancer patients as circulating biomarkers", World Journal of Surgery 28:721-726, 2004; Ryan et al., "A prospective study of circulating mutant KRAS2 in the serum of patients with colorectal neoplasia: strong prognostic indicator in postoperative follow up", Gut 52:101-108, 2003; Lecomte et al., "Detection of free-circulating tumor-associated DNA in plasma of colorectal cancer patients and its association with prognosis", International Journal of Cancer 100:542-548, 2002; Schwarzenbach et al., "Molecular analysis of the polymorphisms of thymidylate synthase on cell-free circulating DNA in blood of patients with advanced colorectal carcinoma", International Journal of Cancer 127:881-888, 2009; each incorporated herein by reference in its entirety). Postoperative detection of K-ras mutations in serum is a strong predictor of disease recurrence. Detection of K-ras mutations and p16 gene methylation is associated with decreased survival and increased disease recurrence. Detection of K-ras, APC and/or p53 mutations is associated with recurrence and/or metastasis. Polymorphisms (including LOH, SNPs, variable number tandem repeats and deletions) of thymidylate synthase (the target gene of fluoropyrimidine-based chemotherapy), detected using cfDNA, may be associated with treatment response.

Exemplary polymorphisms or mutations associated with lung cancer (such as non-small cell lung cancer) include K-ras mutations (such as mutations in codon 12) and EGFR mutations. Exemplary prognostic mutations include EGFR mutations (exon 19 deletions or exon 21 mutations), which are associated with increased overall and progression-free survival, and K-ras mutations (in codons 12 and 13), which are associated with decreased progression-free survival (Jia et al., "Prediction of epidermal growth factor receptor mutations in the plasma/pleural effusion to efficacy of gefitinib treatment in advanced non-small cell lung cancer", J Cancer Res Clin Oncol 136:1341-1347, 2010; Wang et al., "Potential clinical significance of a plasma-based KRAS mutation analysis in patients with advanced non-small cell lung cancer", Clinical Cancer Research 16:1324-1330, 2010; each incorporated herein by reference in its entirety). Exemplary polymorphisms or mutations indicative of response to treatment include EGFR mutations (exon 19 deletions or exon 21 mutations), which improve the response to treatment, and K-ras mutations (codons 12 and 13), which reduce the response to treatment. Resistance-conferring mutations in EGFR have been identified (Murtaza et al., "Non-invasive analysis of acquired resistance to cancer therapy by sequencing of plasma DNA", Nature doi:10.1038/nature12065, 2013, incorporated herein by reference in its entirety).

Exemplary polymorphisms or mutations associated with melanoma (such as uveal melanoma) include those in GNAQ, GNA11, BRAF and p53. Exemplary GNAQ and GNA11 mutations include R183 and Q209 mutations. Q209 mutations in GNAQ or GNA11 are associated with bone tumors. The BRAF V600E mutation can be detected in patients with metastatic or advanced melanoma. BRAF V600E is an indicator of aggressive melanoma. The presence of the BRAF V600E mutation after chemotherapy is associated with non-response to treatment.

Exemplary polymorphisms or mutations associated with pancreatic cancer include those in K-ras and p53 (such as p53 Ser249). p53 Ser249 is also associated with hepatitis B infection and hepatocellular carcinoma, as well as with ovarian cancer and non-Hodgkin lymphoma.

Even polymorphisms or mutations present at low frequency in a sample can be detected with the methods of the invention. For example, by performing tens of millions of sequencing reads, a polymorphism or mutation present at a frequency of one in a million can be observed 10 times. Where desired, the number of sequencing reads can be varied according to the required level of sensitivity. In some embodiments, the sample is reanalyzed, or another sample from the subject is analyzed, using a greater number of sequencing reads to improve sensitivity. For example, if no polymorphisms or mutations associated with cancer or an increased risk of cancer are detected, or only a small number (such as 1, 2, 3, 4 or 5) are detected, the sample is reanalyzed or another sample is tested.
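The read-count arithmetic above can be made explicit with a Poisson approximation. This sketch assumes that each read independently carries the variant with probability equal to its frequency and ignores sequencing and amplification errors, which in practice set a background that such low frequencies must exceed. The function names are hypothetical.

```python
import math


def expected_observations(n_reads, frequency):
    """Expected number of reads carrying a variant present at `frequency`."""
    return n_reads * frequency


def prob_observed_at_least(k, n_reads, frequency):
    """Poisson approximation to P(the variant is seen in at least k reads)."""
    lam = n_reads * frequency
    return 1.0 - sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))


def reads_needed(k, frequency, target=0.95, start=1000):
    """Approximate read count (searched upward in 1% steps from `start`)
    at which P(at least k observations) reaches `target`."""
    n = start
    while prob_observed_at_least(k, n, frequency) < target:
        n = math.ceil(n * 1.01)
    return n
```

With 10 million reads and a variant frequency of one in a million, `expected_observations` gives 10, and the chance of seeing the variant at least once is about 1 − e^−10. `reads_needed(1, 1e-6)` returns roughly 3 million reads for a 95% chance of at least one observation.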

In some embodiments, multiple polymorphisms or mutations are required for cancer or metastatic cancer. In such cases, screening for multiple polymorphisms or mutations can improve the ability to accurately diagnose cancer or metastatic cancer. In some embodiments, where a subject has some of the multiple polymorphisms or mutations required to progress to cancer or metastatic cancer, the subject can be screened later to see whether the subject has acquired additional mutations.

In some embodiments in which multiple polymorphisms or mutations are required for cancer or metastatic cancer, the frequencies of the individual polymorphisms or mutations can be compared to see whether they occur at similar frequencies. For example, if a cancer requires two mutations (denoted "A" and "B"), some cells may have neither, some may have A, some may have B, and some may have both A and B. If A and B are observed at similar frequencies, the subject is more likely to have some cells with both A and B. If A and B are observed at different frequencies, the subject is more likely to have different cell populations.
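Whether two mutations are observed at "similar frequencies" can be assessed with a standard two-proportion z-test on their read counts. This is an illustrative choice of test, not one specified in the source; it assumes independent reads and enough mutant reads for the normal approximation to hold, and it is undefined when neither mutation is observed at all. The function name is hypothetical.

```python
import math


def compare_mutation_frequencies(reads_a, depth_a, reads_b, depth_b):
    """Two-proportion z-test comparing the observed frequencies of mutations A and B.
    Returns (frequency of A, frequency of B, two-sided p-value)."""
    pa, pb = reads_a / depth_a, reads_b / depth_b
    pooled = (reads_a + reads_b) / (depth_a + depth_b)  # frequency if A and B were equal
    se = math.sqrt(pooled * (1 - pooled) * (1 / depth_a + 1 / depth_b))
    z = (pa - pb) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return pa, pb, p_value
```

For example, 50 of 5,000 reads carrying A against 52 of 5,000 carrying B gives a large p-value (consistent with similar frequencies), whereas 50 against 150 gives a very small one, pointing toward different cell populations.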

In some embodiments in which multiple polymorphisms or mutations are required for cancer or metastatic cancer, the number and identity of the polymorphisms or mutations present can be used to predict how soon the subject is likely to have, or may have, the disease or disorder. In some embodiments in which the polymorphisms or mutations tend to occur in a certain order, the subject can be tested periodically to see whether the subject has acquired the other polymorphisms or mutations.

In some embodiments, determining the presence or absence of multiple polymorphisms or mutations (such as 2, 3, 4, 5, 8, 10, 12, 15 or more) increases the sensitivity and/or specificity of determining the presence or absence of a disease or disorder such as cancer, or of an increased risk of a disease or disorder such as cancer.

In some embodiments, the polymorphism or mutation is detected directly. In some embodiments, the polymorphism or mutation is detected indirectly, by detecting one or more sequences (for example, a polymorphic site such as a SNP) that are linked to the polymorphism or mutation.

Exemplary nucleic acid changes

In some embodiments, there is a change in the integrity of RNA or DNA that is associated with a disease or disorder (such as cancer) or with an increased risk of a disease or disorder (such as cancer), for example a change in the size of fragmented cfRNA or cfDNA or a change in nucleosome composition. In some embodiments, there is a change in the methylation pattern of RNA or DNA that is associated with a disease or disorder (such as cancer) or with an increased risk of a disease or disorder (such as cancer), for example hypermethylation of tumor suppressor genes. For example, it has been suggested that methylation of CpG islands in the promoter regions of tumor suppressor genes triggers local gene silencing. Aberrant methylation of the p16 tumor suppressor gene occurs in subjects with liver, lung and breast cancer. Other frequently methylated tumor suppressor genes, including APC, Ras association domain family protein 1A (RASSF1A), glutathione S-transferase pi 1 (GSTP1) and DAPK, have been detected in various types of cancer, such as nasopharyngeal carcinoma, colorectal cancer, lung cancer, esophageal cancer, prostate cancer, bladder cancer, melanoma and acute leukemia. Methylation of certain tumor suppressor genes (such as p16) has been described as one of the earliest events in cancer formation and can therefore be used for early cancer screening.

In some embodiments, methylation patterns are determined using bisulfite conversion, or using non-bisulfite-based strategies that employ digestion with methylation-sensitive restriction enzymes (Hung et al., J Clin Pathol 62:308-313, 2009, incorporated herein by reference in its entirety). In bisulfite conversion, methylated cytosines remain cytosines, whereas unmethylated cytosines are converted to uracil. Methylation-sensitive restriction enzymes (such as BstUI) cut unmethylated DNA sequences at specific recognition sites (for example, 5'-CG^CG-3' for BstUI), whereas methylated DNA fragments remain intact. In some embodiments, the intact methylated DNA fragments are detected. In some embodiments, stem-loop primers are used to selectively amplify the restriction-enzyme-digested unmethylated fragments without co-amplifying the undigested methylated DNA.
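Both strategies can be illustrated at the sequence level. The sketch below is a simplified model, not a laboratory protocol: it works on one strand, represents methylation as a set of 0-based positions of methylated cytosines, writes converted uracil as T (as it reads after PCR), and treats a BstUI site as blocked if either of its CpG cytosines is methylated. The function names are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Unmethylated C is read as T (via uracil); methylated C stays C."""
    meth = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in meth else base
        for i, base in enumerate(seq)
    )


def bstui_cuts(seq, methylated_positions):
    """Cut positions for BstUI (5'-CG^CG-3'); a site is skipped if either
    of its CpG cytosines is methylated."""
    meth = set(methylated_positions)
    cuts = []
    for i in range(len(seq) - 3):
        if seq[i:i + 4] == "CGCG" and not ({i, i + 2} & meth):
            cuts.append(i + 2)  # cut between CG and CG
    return cuts
```

For example, `bisulfite_convert("ACGTCG", {1})` gives `"ACGTTG"`: the methylated C at position 1 survives and the unmethylated C at position 4 reads as T. `bstui_cuts("AACGCGTT", set())` returns `[4]`, while methylating the cytosine at position 2 leaves the fragment intact.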

Exemplary changes in mRNA splicing

In some embodiments, a change in mRNA splicing is associated with a disease or disorder (such as cancer) or with an increased risk of a disease or disorder (such as cancer). In some embodiments, the change in mRNA splicing is in one or more of the following nucleic acids associated with cancer or an increased risk of cancer: DNMT3B, BRCA1, KLF6, Ron or Gemin5. In some embodiments, the detected mRNA splice variant is associated with a disease or disorder (such as cancer). In some embodiments, multiple mRNA splice variants are produced by healthy cells (such as non-cancerous cells), but a change in the relative amounts of the mRNA splice variants is associated with a disease or disorder such as cancer. In some embodiments, the change in mRNA splicing is due to a change in the mRNA sequence (such as a mutation in a splice site), a change in splicing factor levels, a change in the amount of available splicing factor (such as a reduced amount of available splicing factor caused by binding of the splicing factor to repeats), altered splicing regulation, or the tumor microenvironment.

The splicing reaction is carried out by the spliceosome, a multiprotein/RNA complex (Fackenthal and Godley, Disease Models & Mechanisms 1:37-42, 2008, doi:10.1242/dmm.000331, incorporated herein by reference in its entirety). The spliceosome recognizes intron-exon boundaries and removes intervening introns through two transesterification reactions that join the two adjacent exons. The fidelity of this reaction must be exact, because if the joining occurs incorrectly, the normal protein-coding potential may be compromised. For example, where exon skipping preserves the reading frame of the codon triplets that specify the identity and order of amino acids during translation, the alternatively spliced mRNA may specify a protein that lacks critical amino acid residues. More commonly, exon skipping disrupts the translational reading frame, resulting in premature stop codons. These mRNAs are usually degraded (at least 90% of them) by a process known as nonsense-mediated mRNA decay, which reduces the likelihood that such defective messages accumulate and generate truncated protein products. If missspliced mRNAs escape this pathway, truncated, mutated or unstable proteins are produced.
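The frame argument can be checked with a small example. This sketch uses a made-up three-exon coding sequence (hypothetical, not from the source) and the standard stop codons TAA, TAG and TGA. It shows that skipping an exon whose length is not a multiple of three shifts the frame downstream and can create a premature stop codon. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice(exons, skipped=()):
    """Join exons in order, leaving out the exons whose indices are in `skipped`."""
    return "".join(seq for i, seq in enumerate(exons) if i not in skipped)


def preserves_frame(exons, skipped):
    """Skipping keeps the downstream reading frame only if the removed length is a multiple of 3."""
    return sum(len(exons[i]) for i in skipped) % 3 == 0


def first_stop_codon(cds):
    """Index (in codons) of the first in-frame stop codon, or None if there is none."""
    for k in range(0, len(cds) - 2, 3):
        if cds[k:k + 3] in STOP_CODONS:
            return k // 3
    return None


# Hypothetical transcript: ATG AAA CCT GAC CAA TAG split into three exons
exons = ["ATGAAA", "CC", "TGACCAATAG"]
```

With all exons included the first stop codon is the normal one at codon 5. Skipping the 2-nucleotide middle exon shifts the frame so that TGA appears at codon 2, a premature stop of the kind that would typically trigger nonsense-mediated decay.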

Alternative splicing is a means of expressing several or many different transcripts from the same genomic DNA. It results from the inclusion of a subset of the available exons for a particular protein. Excluding one or more exons can remove certain protein domains from the encoded protein, which can lead to a loss or gain of protein function. Several types of alternative splicing have been described: exon skipping; alternative 5' or 3' splice sites; mutually exclusive exons; and, less commonly, intron retention. Others have used bioinformatic approaches to compare the amount of alternative splicing in cancer and normal cells and found that cancers exhibit lower levels of alternative splicing than normal cells. Furthermore, the distribution of the types of alternative splicing events differed between cancer and normal cells. Cancer cells showed less exon skipping, but more alternative 5' and 3' splice site selection and more intron retention, than normal cells. When the phenomenon of exonization was examined (the use of sequences as exons that are predominantly used as introns by other tissues), the genes associated with exonization in cancer cells were preferentially related to mRNA processing, suggesting a direct link between cancer cells and the generation of aberrant mRNA splice forms.

Exemplary changes in DNA or RNA levels

In some embodiments, there is a change in the level of one or more types of DNA (such as cfDNA, cf mDNA, cf nDNA, cellular DNA, or mitochondrial DNA) or RNA (such as cfRNA, cellular RNA, cytoplasmic RNA, coding cytoplasmic RNA, non-coding cytoplasmic RNA, mRNA, miRNA, mitochondrial RNA, rRNA, or tRNA). In some embodiments, there is a change in the level of one or more specific DNA (such as cfDNA, cf mDNA, cf nDNA, cellular DNA, or mitochondrial DNA) or RNA (such as cfRNA, cellular RNA, cytoplasmic RNA, coding cytoplasmic RNA, non-coding cytoplasmic RNA, mRNA, miRNA, mitochondrial RNA, rRNA, or tRNA) molecules. In some embodiments, one allele is expressed more than another allele at a locus of interest. Exemplary miRNAs are short RNA molecules of 20-22 nucleotides that regulate gene expression. In some embodiments, there is a change in the transcriptome, such as a change in the identity or amount of one or more RNA molecules.

In some embodiments, an increase in the total amount or concentration of cfDNA or cfRNA is associated with a disease or disorder (such as cancer) or with an increased risk of a disease or disorder (such as cancer). In some embodiments, the total concentration of a type of DNA (such as cfDNA, cf mDNA, cf nDNA, cellular DNA, or mitochondrial DNA) or RNA (such as cfRNA, cellular RNA, cytoplasmic RNA, coding cytoplasmic RNA, non-coding cytoplasmic RNA, mRNA, miRNA, mitochondrial RNA, rRNA, or tRNA) is increased by at least 2, 3, 4, 5, 6, 7, 8, 9, 10 or more fold compared with the total concentration of that type of DNA or RNA in healthy (such as non-cancerous) subjects. In some embodiments, any of the following indicates cancer, an increased risk of cancer, an increased risk that a tumor is malignant rather than benign, a decreased likelihood that the cancer will go into remission, or a worse prognosis for the cancer: a total cfDNA concentration of 75 to 100 ng/mL, 100 to 150 ng/mL, 150 to 200 ng/mL, 200 to 300 ng/mL, 300 to 400 ng/mL, 400 to 600 ng/mL, 600 to 800 ng/mL, or 800 to 1,000 ng/mL (inclusive); or a total cfDNA concentration greater than 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1,000 ng/mL. In some embodiments, at least 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 14%, 16%, 18%, 20%, or 25% of a type of DNA (such as cfDNA, cf mDNA, cf nDNA, cellular DNA, or mitochondrial DNA) or RNA (such as cfRNA, cellular RNA, cytoplasmic RNA, coding cytoplasmic RNA, non-coding cytoplasmic RNA, mRNA, miRNA, mitochondrial RNA, rRNA, or tRNA) has one or more polymorphisms/mutations (such as deletions or duplications) associated with a disease or disorder (such as cancer) or with an increased risk of a disease or disorder (such as cancer). In some embodiments, at least 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 14%, 16%, 18%, 20%, or 25% of the total amount of a type of DNA (such as cfDNA, cf mDNA, cf nDNA, cellular DNA, or mitochondrial DNA) or RNA (such as cfRNA, cellular RNA, cytoplasmic RNA, coding cytoplasmic RNA, non-coding cytoplasmic RNA, mRNA, miRNA, mitochondrial RNA, rRNA, or tRNA) has a specific polymorphism or mutation (such as a deletion or duplication) associated with a disease or disorder (such as cancer) or with an increased risk of a disease or disorder (such as cancer).
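The comparisons in the preceding paragraph are ratios and threshold tests. The sketch below is a minimal illustration, not the patent's procedure. It assumes that a healthy-reference concentration is available, and it uses 2-fold and 100 ng/mL as example cut-offs because both appear in the lists above. The function names and all numbers in the usage lines are hypothetical.

```python
"""Sketch: compare a cfDNA measurement with a healthy reference and thresholds."""

from typing import Dict, Union


def fold_change(sample_ng_per_ml: float, healthy_ng_per_ml: float) -> float:
    """Ratio of the sample concentration to a healthy-reference concentration."""
    if healthy_ng_per_ml <= 0:
        raise ValueError("healthy reference concentration must be positive")
    return sample_ng_per_ml / healthy_ng_per_ml


def mutant_fraction(mutant_molecules: int, total_molecules: int) -> float:
    """Fraction of counted molecules that carry the polymorphism or mutation."""
    if total_molecules <= 0 or not 0 <= mutant_molecules <= total_molecules:
        raise ValueError("need total > 0 and 0 <= mutant_molecules <= total_molecules")
    return mutant_molecules / total_molecules


def flag_cfdna(
    sample_ng_per_ml: float,
    healthy_ng_per_ml: float,
    min_fold: float = 2.0,        # "at least 2-fold", one of the listed values
    min_ng_per_ml: float = 100.0,  # "greater than 100 ng/mL", one of the listed values
) -> Dict[str, Union[float, bool]]:
    """Report the fold change and whether each example cut-off is met."""
    fc = fold_change(sample_ng_per_ml, healthy_ng_per_ml)
    return {
        "fold_change": fc,
        "above_fold_cutoff": fc >= min_fold,
        "above_concentration_cutoff": sample_ng_per_ml > min_ng_per_ml,
    }


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    print(flag_cfdna(250.0, 25.0))
    print(mutant_fraction(30, 1000))
```

The cut-offs are parameters rather than constants because the text lists many alternative values for different embodiments.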

In some embodiments, the cfDNA is encapsulated. In some embodiments, the cfDNA is not encapsulated.

In some embodiments, the fraction of tumor DNA in total DNA is determined (such as the fraction of tumor cfDNA in total cfDNA, or the fraction of tumor cfDNA carrying a particular mutation in total cfDNA). In some embodiments, the fraction of tumor DNA can be determined for multiple mutations, where the mutations can be single nucleotide variants, copy number variants, differential methylation, or combinations thereof. In some embodiments, the average of the tumor fractions calculated for the one mutation or set of mutations with the highest calculated tumor fraction is used as the actual tumor fraction in the sample. In some embodiments, the average of the tumor fractions calculated for all mutations is used as the actual tumor fraction in the sample. In some embodiments, the tumor fraction is used to stage the cancer, because a higher tumor fraction can be associated with a more advanced stage of cancer. In some embodiments, the tumor fraction is used to estimate the size of the cancer, because larger tumors may be associated with a higher fraction of tumor DNA in plasma. In some embodiments, the tumor fraction is used to determine the proportion of the tumor that carries a single mutation or multiple mutations, because there may be a correlation between the tumor fraction measured in a plasma sample and the size of the tissue with a given mutation genotype. For example, the size of the tissue with a given mutation genotype may correlate with the fraction of tumor DNA calculated by focusing on that particular mutation.
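The two aggregation rules above (the mean over all mutations, or the mean over the mutation(s) with the highest estimates) can be sketched as follows. The conversion from variant allele frequency (VAF) to tumor fraction is an added assumption, not something the text specifies. It is a common approximation for a heterozygous, clonal SNV in a copy-neutral diploid region, where tumor fraction is roughly 2 × VAF, and it does not apply to copy number variants or methylation markers. All names are hypothetical.

```python
"""Sketch: combine per-mutation tumor-fraction estimates into one sample value."""

from statistics import mean
from typing import Dict, Optional


def tumor_fraction_from_vaf(vaf: float) -> float:
    """ASSUMPTION: heterozygous, clonal SNV in a copy-neutral diploid region,
    so about half of the tumor-derived copies carry the variant and the tumor
    fraction is roughly 2 * VAF (capped at 1.0)."""
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("VAF must be between 0 and 1")
    return min(1.0, 2.0 * vaf)


def sample_tumor_fraction(
    per_mutation: Dict[str, float], top_k: Optional[int] = None
) -> float:
    """Average the estimates over all mutations (top_k=None), or over the
    top_k mutations with the highest estimated tumor fraction."""
    values = sorted(per_mutation.values(), reverse=True)
    if not values:
        raise ValueError("no per-mutation estimates supplied")
    if top_k is not None:
        if top_k < 1:
            raise ValueError("top_k must be at least 1")
        values = values[:top_k]
    return mean(values)


if __name__ == "__main__":
    # Hypothetical per-mutation estimates for one plasma sample.
    estimates = {
        "snv_1": tumor_fraction_from_vaf(0.01),
        "snv_2": 0.05,
        "cnv_1": 0.04,
    }
    print(sample_tumor_fraction(estimates))           # mean over all mutations
    print(sample_tumor_fraction(estimates, top_k=1))  # highest-estimate mutation only
```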

Exemplary database

Features of the invention include a database of one or more results from the methods of the invention. For example, the database can include records with any of the following information for one or more subjects: the identity of any polymorphism/mutation (such as a copy number change); any known association of the polymorphism/mutation with a disease or disorder or with an increased risk of a disease or disorder; the effect of the polymorphism/mutation on the expression or activity level of the encoded mRNA or protein; the fraction of DNA, RNA, or cells associated with the disease or disorder (such as DNA, RNA, or cells carrying a polymorphism/mutation associated with the disease or disorder) out of the total DNA, RNA, or cells in the sample; the source of the sample used to identify the polymorphism/mutation (such as a blood sample or a sample from a particular tissue); the number of diseased cells; results from subsequent repeat tests (such as repeat testing to monitor the progression or remission of the disease or disorder); results of tests for other diseases or disorders; the type of disease or disorder diagnosed; the treatment administered; the response to treatment; the side effects of treatment; the type of symptoms and the symptoms themselves (such as symptoms associated with the disease or disorder); the duration and number of remissions; the length of survival (such as the time from initial testing to death and/or from diagnosis to death); the cause of death; and combinations thereof.

In some embodiments, the database includes records containing any of the information listed above for any one or more subjects. In some embodiments, the response to treatment includes any of the following: reducing or stabilizing the size of a tumor (for example, a benign or cancerous tumor); slowing or preventing an increase in tumor size; increasing or stabilizing the disease-free survival time between the disappearance of a tumor and its reappearance; preventing an initial or subsequent occurrence of a tumor; reducing or stabilizing an adverse symptom associated with a tumor; or combinations thereof. In some embodiments, the database includes the results of one or more other tests for a disease or disorder such as cancer, for example results from screening tests, medical imaging, or microscopic examination of tissue samples.
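One way to hold such records is a relational table keyed by subject. The sketch below uses Python's built-in `sqlite3` module and covers only a handful of the fields listed above. The table name, column names, helper functions, and example variant labels are all hypothetical, chosen for illustration. Column names are checked against a whitelist because they are interpolated into the SQL string, while the values go through `?` placeholders.

```python
"""Sketch of a per-subject results table using Python's built-in sqlite3."""

import sqlite3
from typing import List

# Hypothetical column set covering a few of the record fields described above.
ALLOWED_COLUMNS = frozenset({
    "subject_id", "variant", "variant_fraction", "sample_source",
    "diagnosis", "treatment", "treatment_response", "survival_days",
})

SCHEMA = """
CREATE TABLE IF NOT EXISTS subject_result (
    subject_id         TEXT NOT NULL,
    variant            TEXT NOT NULL,  -- e.g. an SNV or a copy number change
    variant_fraction   REAL,           -- fraction of DNA/RNA/cells carrying it
    sample_source      TEXT,           -- e.g. 'plasma' or a tissue type
    diagnosis          TEXT,
    treatment          TEXT,
    treatment_response TEXT,
    survival_days      INTEGER
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_result(conn: sqlite3.Connection, **fields) -> None:
    """Insert one record; column names are checked against the whitelist."""
    unknown = set(fields) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    columns = ", ".join(fields)
    placeholders = ", ".join("?" for _ in fields)
    conn.execute(
        f"INSERT INTO subject_result ({columns}) VALUES ({placeholders})",
        tuple(fields.values()),
    )
    conn.commit()


def subjects_with_variant(conn: sqlite3.Connection, variant: str) -> List[str]:
    """Sorted IDs of subjects with at least one record for `variant`."""
    rows = conn.execute(
        "SELECT DISTINCT subject_id FROM subject_result "
        "WHERE variant = ? ORDER BY subject_id",
        (variant,),
    )
    return [row[0] for row in rows]
```

For example, after `conn = open_db()` and a few `add_result(conn, subject_id="S1", variant="KRAS G12D", sample_source="plasma")` calls, `subjects_with_variant(conn, "KRAS G12D")` lists every stored subject carrying that variant.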

In one such aspect, the invention features an electronic database including at least 5, 10, 10², 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸ or more records. In some embodiments, the database has records for at least 5, 10, 10², 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸ or more different subjects.

In another aspect, the invention features a computer that includes a database of the invention and a user interface. In some embodiments, the user interface can display some or all of the information contained in one or more records. In some embodiments, the user interface can display (i) one or more types of cancer that have been identified as containing a polymorphism or mutation whose record is stored in the computer; (ii) one or more polymorphisms or mutations; (iii) prognostic information for a particular type of cancer, or for a particular polymorphism or mutation whose record is stored in the computer; (iv) one or more compounds or other therapeutic agents for use in cancers with a polymorphism or mutation whose record is stored in the computer; (v) one or more compounds that modulate the expression or activity of an mRNA or protein whose record is stored in the computer; and (vi) one or more mRNA molecules or proteins whose expression or activity is modulated by a compound whose record is stored in the computer. The internal components of the computer typically include a processor coupled to a memory. The external components typically include a mass-storage device, such as a hard disk drive; user input devices, such as a keyboard and a mouse; a display, such as a monitor; and, optionally, a network link that can connect the computer system to other computers to allow the sharing of data and processing tasks. Programs can be loaded into the memory of the system during operation.
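Items (v) and (vi) above are the two directions of a single many-to-many relationship between compounds and the mRNAs or proteins they modulate. A minimal sketch of an in-memory two-way index that could back such a display follows. The class and method names are hypothetical. The example pairs reflect well-known drug targets (trastuzumab and lapatinib for HER2, lapatinib also for EGFR) and are used only for illustration.

```python
"""Sketch: two-way lookup between compounds and the targets they modulate."""

from collections import defaultdict
from typing import DefaultDict, List, Set


class CompoundTargetIndex:
    """Keeps compound -> targets and target -> compounds in sync."""

    def __init__(self) -> None:
        self._targets_by_compound: DefaultDict[str, Set[str]] = defaultdict(set)
        self._compounds_by_target: DefaultDict[str, Set[str]] = defaultdict(set)

    def add(self, compound: str, target: str) -> None:
        """Record that `compound` modulates `target` (an mRNA or protein)."""
        self._targets_by_compound[compound].add(target)
        self._compounds_by_target[target].add(compound)

    def compounds_for(self, target: str) -> List[str]:
        """Item (v): compounds that modulate a stored mRNA or protein."""
        return sorted(self._compounds_by_target.get(target, set()))

    def targets_of(self, compound: str) -> List[str]:
        """Item (vi): mRNAs or proteins modulated by a stored compound."""
        return sorted(self._targets_by_compound.get(compound, set()))


if __name__ == "__main__":
    index = CompoundTargetIndex()
    index.add("trastuzumab", "HER2")
    index.add("lapatinib", "HER2")
    index.add("lapatinib", "EGFR")
    print(index.compounds_for("HER2"))
    print(index.targets_of("lapatinib"))
```

Because lookups use `.get` instead of indexing, querying an unknown name returns an empty list without adding an empty entry to either mapping.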

In another aspect, the invention features a computer-implemented process that includes one or more steps of any of the methods of the invention.

Exemplary risk factors

In some embodiments, one or more risk factors for a disease or disorder (such as cancer) are also assessed in the subject. Exemplary risk factors include a family history of the disease or disorder, lifestyle (such as smoking and exposure to carcinogens), and the levels of one or more hormones or serum proteins (such as alpha-fetoprotein (AFP) in liver cancer, carcinoembryonic antigen (CEA), or prostate-specific antigen (PSA) in prostate cancer). In some embodiments, the size and/or number of tumors is measured and used to determine the prognosis of the subject or to select a treatment for the subject.

Exemplary screening methods

If desired, the presence or absence of a disease or disorder (such as cancer) can be confirmed, or the disease or disorder (such as cancer) can be classified, using any standard method. For example, a disease or disorder such as cancer can be detected in a number of ways, including the presence of certain signs and symptoms, tumor biopsy, screening tests, or medical imaging (such as a mammogram or ultrasound). Once a possible cancer is detected, it can be diagnosed by microscopic examination of a tissue sample. In some embodiments, a diagnosed subject is retested at multiple time points, using a method of the invention or a known test for the disease or disorder, to monitor the progression of the disease or disorder or its remission or recurrence.

Exemplary cancers

Exemplary cancers that can be diagnosed, prognosed, stabilized, treated, or prevented using any of the methods of the invention include solid tumors, carcinomas, sarcomas, lymphomas, leukemias, germ cell tumors, and blastomas. In various embodiments, the cancer is acute lymphoblastic leukemia, acute myeloid leukemia, adrenocortical carcinoma, AIDS-related cancer, AIDS-related lymphoma, anal cancer, appendix cancer, astrocytoma (such as childhood cerebellar or cerebral astrocytoma), basal cell carcinoma, bile duct cancer (such as extrahepatic bile duct cancer), bladder cancer, bone tumor (such as osteosarcoma or malignant fibrous histiocytoma), brain stem glioma, brain cancer (such as cerebellar astrocytoma, cerebral astrocytoma/malignant glioma, ependymoma, medulloblastoma, neuroectodermal tumors, or visual pathway and hypothalamic glioma), glioblastoma, breast cancer, bronchial adenoma or carcinoid, Burkitt lymphoma, carcinoid tumor (such as childhood or gastrointestinal carcinoid tumor), carcinoma, central nervous system lymphoma, cerebellar astrocytoma or malignant glioma (such as childhood cerebellar astrocytoma or glioblastoma), cervical cancer, childhood cancer, chronic lymphocytic leukemia, chronic myelogenous leukemia, chronic myeloproliferative disorder, colon cancer, cutaneous T-cell lymphoma, small round cell tumor, endometrial cancer, ependymoma, esophageal cancer, Ewing sarcoma, tumor in the Ewing family of tumors, extracranial germ cell tumor (such as childhood extracranial germ cell tumor), extragonadal germ cell tumor, eye cancer (such as intraocular melanoma or retinoblastoma), gallbladder cancer, gastric cancer, gastrointestinal carcinoid tumor, gastrointestinal stromal tumor, germ cell tumor (such as extracranial, extragonadal, or ovarian germ cell tumor), gestational trophoblastic tumor, glioma (such as cerebral astrocytoma or childhood visual pathway and hypothalamic glioma), hairy cell leukemia, head and neck cancer, heart cancer, hepatocellular (liver) cancer, Hodgkin lymphoma, hypopharyngeal cancer, hypothalamic and visual pathway glioma, islet cell carcinoma (such as endocrine or pancreatic islet cell carcinoma), Kaposi sarcoma, kidney cancer, laryngeal cancer, leukemia (such as acute lymphoblastic, acute myeloid, chronic lymphocytic, chronic myelogenous, or hairy cell leukemia), oral cancer, liposarcoma, liver cancer, lung cancer (such as non-small cell or small cell lung cancer), lymphoma (such as AIDS-related, Burkitt, cutaneous T-cell, Hodgkin, non-Hodgkin, or central nervous system lymphoma), macroglobulinemia (such as Waldenström macroglobulinemia), malignant fibrous histiocytoma of bone or osteosarcoma, medulloblastoma (such as childhood medulloblastoma), melanoma, Merkel cell carcinoma, mesothelioma (such as adult or childhood mesothelioma), metastatic squamous neck cancer with occult primary, multiple endocrine neoplasia syndrome (such as childhood multiple endocrine neoplasia syndrome), multiple myeloma or plasma cell neoplasm, mycosis fungoides, myelodysplastic syndrome, myeloproliferative disorder (such as chronic myeloproliferative disorder), nasal cavity or paranasal sinus cancer, nasopharyngeal carcinoma, neuroblastoma, adult acute myeloid leukemia, oral cancer, oropharyngeal cancer, osteosarcoma or malignant fibrous histiocytoma of bone, ovarian cancer, epithelial ovarian cancer, ovarian germ cell tumor, ovarian low malignant potential tumor, pancreatic cancer (such as islet cell pancreatic cancer), paranasal sinus or nasal cavity cancer, parathyroid cancer, penile cancer, pharyngeal cancer, pheochromocytoma, pineal astrocytoma, pineal germinoma, pineoblastoma or supratentorial primitive neuroectodermal tumor, primary central nervous system lymphoma, prostate cancer, rectal cancer, renal cell carcinoma, renal pelvis or ureter cancer (such as transitional cell carcinoma of the renal pelvis or ureter), retinoblastoma, rhabdomyosarcoma (such as childhood rhabdomyosarcoma), salivary gland cancer, sarcoma (such as sarcoma in the Ewing family of tumors, Kaposi sarcoma, soft tissue sarcoma, or uterine sarcoma), Sézary syndrome, skin cancer (such as non-melanoma, melanoma, or Merkel cell skin cancer), small intestine cancer, squamous cell carcinoma, supratentorial primitive neuroectodermal tumor (such as childhood primitive neuroectodermal tumor), T-cell lymphoma (such as cutaneous T-cell lymphoma), testicular cancer, laryngeal cancer, thymoma (such as childhood thymoma), thymoma or thymic carcinoma, thyroid cancer (such as childhood thyroid cancer), trophoblastic tumor (such as gestational trophoblastic tumor), cancer of unknown primary site, urethral cancer, uterine cancer (such as endometrial uterine cancer), uterine sarcoma, vaginal cancer, visual pathway or hypothalamic glioma (such as childhood visual pathway or hypothalamic glioma), vulvar cancer, Waldenström macroglobulinemia, or Wilms tumor (such as childhood Wilms tumor). In various embodiments, the cancer has metastasized or has not metastasized.

The cancer may or may not be hormone-related or hormone-dependent (for example, an estrogen- or androgen-related cancer). Benign or malignant tumors can be diagnosed, prognosed, stabilized, treated, or prevented using the methods and/or compositions of the invention.

In some embodiments, the subject has a cancer syndrome. A cancer syndrome is a genetic disorder in which mutations in one or more genes predispose affected individuals to develop cancer and may also cause these cancers to begin early. Cancer syndromes often show not only a high lifetime risk of developing cancer but also the development of multiple independent primary tumors. Many of these syndromes are caused by mutations in tumor suppressor genes, which help protect cells from becoming cancerous. Other genes that may be affected are DNA repair genes, oncogenes, and genes involved in the production of blood vessels (angiogenesis). Common examples of inherited cancer syndromes are hereditary breast-ovarian cancer syndrome and hereditary non-polyposis colon cancer (Lynch syndrome).

In some embodiments, a subject with one or more polymorphisms or mutations in K-ras, p53, BRAF, EGFR, or HER2 is administered a treatment that targets K-ras, p53, BRAF, EGFR, or HER2, respectively.

The methods of the invention are generally applicable to the treatment of malignant or benign tumors of any cell, tissue, or organ type.

Exemplary treatment

If desired, a subject (for example, a subject identified as having cancer or an increased risk of cancer using any of the methods of the invention) can be administered a treatment for stabilizing, treating, or preventing a disease or disorder (such as cancer) or an increased risk of a disease or disorder (such as cancer). In various embodiments, the treatment is a known treatment or combination of treatments for the disease or disorder (such as cancer), including cytotoxic agents, targeted therapy, immunotherapy, hormonal therapy, radiation therapy, surgical removal of cancer cells or of cells likely to become cancerous, stem cell transplantation, bone marrow transplantation, photodynamic therapy, palliative treatment, or a combination thereof. In some embodiments, a treatment (such as a preventive medication) is used to prevent, delay, or reduce the severity of a disease or disorder (such as cancer) in a subject at increased risk of the disease or disorder (such as cancer).

In some embodiments, the targeted therapy is a treatment that targets the cancer's specific genes or proteins, or the tissue environment that contributes to cancer growth and survival. This type of treatment blocks the growth and spread of cancer cells while limiting damage to normal cells, and it usually has fewer side effects than other cancer drugs.

One of the more successful approaches has been to target angiogenesis, the growth of new blood vessels around a tumor. Targeted therapies such as bevacizumab (Avastin), lenalidomide (Revlimid), sorafenib (Nexavar), sunitinib (Sutent), and thalidomide (Thalomid) interfere with angiogenesis. Another example is the use of a treatment that targets HER2, such as trastuzumab or lapatinib, for cancers that overexpress HER2 (such as some breast cancers). In some embodiments, a monoclonal antibody is used to block a specific target on the outside of cancer cells. Examples include alemtuzumab (Campath-1H), bevacizumab, cetuximab (Erbitux), panitumumab (Vectibix), pertuzumab (Omnitarg), rituximab (Rituxan), and trastuzumab. In some embodiments, the monoclonal antibody tositumomab (Bexxar) is used to deliver radiation to the tumor. In some embodiments, oral small molecules inhibit cancer processes inside cancer cells. Examples include dasatinib (Sprycel), erlotinib (Tarceva), gefitinib (Iressa), imatinib (Gleevec), lapatinib (Tykerb), nilotinib (Tasigna), sorafenib, sunitinib, and temsirolimus (Torisel). In some embodiments, a proteasome inhibitor (such as the multiple myeloma drug bortezomib (Velcade)) interferes with specialized proteins called enzymes that break down other proteins in the cell.

In some embodiments, immunotherapy is designed to boost the body's natural defenses to fight the cancer. Exemplary types of immunotherapy use materials made either by the body or in a laboratory to bolster, target, or restore immune system function.

In some embodiments, hormonal therapy treats cancer by lowering the amount of hormones in the body. Several types of cancer, including some breast and prostate cancers, grow and spread only in the presence of natural chemicals in the body called hormones. In various embodiments, hormonal therapy is used to treat cancers of the prostate, breast, thyroid, and reproductive system.

In some embodiments, the treatment includes stem cell transplantation, in which diseased bone marrow is replaced by cells called hematopoietic stem cells. Hematopoietic stem cells are found both in the bloodstream and in the bone marrow.

In some embodiments, the treatment includes photodynamic therapy, which uses special drugs called photosensitizing agents together with light to kill cancer cells. The drugs work after they are activated by certain kinds of light.

In some embodiments, the treatment includes surgical removal of cancer cells or of cells likely to become cancerous (such as a lumpectomy or a mastectomy). For example, a woman with a breast cancer susceptibility gene mutation (a BRCA1 or BRCA2 gene mutation) can reduce her risk of breast and ovarian cancer with a risk-reducing salpingo-oophorectomy (removal of the fallopian tubes and ovaries) and/or a risk-reducing bilateral mastectomy (removal of both breasts). Lasers, which are very powerful and precise beams of light, can be used instead of blades (scalpels) for very careful surgical work, including the treatment of some cancers.

In addition to treatment that slows, stops, or eliminates the cancer (also called disease-directed treatment), an important part of cancer care is relieving the subject's symptoms and side effects, such as pain and nausea. This care supports the patient's physical, emotional, and social needs and is called palliative or supportive care. People often receive disease-directed treatment and treatment to ease symptoms at the same time.

Exemplary treatments include actinomycin D, Adcetris, Adriamycin, aldesleukin, alemtuzumab, Alimta, amsacrine, anastrozole, Aredia, Arimidex, Aromasin, L-asparaginase, Avastin, bevacizumab, bicalutamide, bleomycin, ibandronic acid injection, clodronate disodium capsules, bortezomib, Busilvex, busulfan, capecitabine, carboplatin, carmustine, cetuximab, ChiMax, chlorambucil, cimetidine, cisplatin, cladribine (2-CdA), clodronate, clofarabine, crisantaspase, cyclophosphamide, cyproterone, cytarabine, dacarbazine, dactinomycin, dasatinib, daunorubicin, dexamethasone, doxorubicin, estramustine, epirubicin, Eposin, Erbitux, erlotinib, estramustine phosphate, Etopophos, etoposide, Evoltra, exemestane, Fareston, filgrastim, fludarabine, fluorouracil, flutamide, gefitinib, gemcitabine, Gleevec, Gonapeptyl Depot, goserelin, eribulin mesylate, Herceptin, topotecan, hydroxycarbamide, ibandronic acid, ibritumomab, idarubicin, ifosfamide, interferon, imatinib mesylate, Iressa, irinotecan, cabazitaxel, Lanvis, lapatinib, letrozole, leuprorelin, Leustat, lomustine, MabThera, megestrol acetate, methotrexate, mitoxantrone, mitomycin, Mutulane, vinorelbine, pegfilgrastim, Nexavar, pentostatin, tamoxifen, vincristine, paclitaxel, pamidronate disodium, PCV, pemetrexed, pertuzumab, procarbazine, Provenge, prednisolone, Prostap, raltitrexed, rituximab, sorafenib, streptozocin, diethylstilbestrol, Stimuvax, sunitinib, Tabloid, Tagamet, Taxol, Taxotere, tegafur with uracil, temozolomide, thalidomide, Thioplex, thiotepa, toremifene, trastuzumab, tretinoin, triamcinolone acetonide, triptorelin, Velcade, crizotinib, ipilimumab, vandetanib, Zanosar, Zoladex, zoledronate, Zometa, zoledronic acid, and abiraterone.

For a subject with both a mutant form (for example, a cancer-associated form) and a wild-type form (for example, a form not associated with cancer) of an mRNA or protein, the treatment preferably inhibits the expression or activity of the mutant form at least 2, 5, 10, or 20 times more than it inhibits the expression or activity of the wild-type form. The simultaneous or sequential use of multiple therapeutic agents may greatly reduce the incidence of cancer and reduce the number of cancers that become resistant to treatment. In addition, a therapeutic agent used as part of a combination therapy may require a lower dose to treat cancer than the corresponding dose required when the agent is used alone. The low dose of each compound in the combination therapy reduces the severity of the compound's potential adverse side effects.
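The 2-, 5-, 10-, or 20-fold preference for the mutant form above can be expressed as a selectivity ratio. The text does not say how inhibition is quantified. The sketch below assumes half-maximal inhibitory concentrations (IC50), a common choice, so that fold selectivity is IC50(wild type) / IC50(mutant). Function names and example values are hypothetical.

```python
"""Sketch: fold selectivity of an inhibitor for a mutant over a wild-type form."""

from typing import Optional, Sequence

# Fold-selectivity levels named in the text above.
SELECTIVITY_TIERS = (2.0, 5.0, 10.0, 20.0)


def fold_selectivity(ic50_mutant: float, ic50_wild_type: float) -> float:
    """ASSUMPTION: inhibition is summarized by IC50 (lower = more potent), so a
    compound is N-fold selective when the wild type needs N times more drug."""
    if ic50_mutant <= 0 or ic50_wild_type <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_wild_type / ic50_mutant


def highest_tier_met(
    fold: float, tiers: Sequence[float] = SELECTIVITY_TIERS
) -> Optional[float]:
    """Largest listed tier that the fold selectivity reaches, or None."""
    met = [tier for tier in tiers if fold >= tier]
    return max(met) if met else None


if __name__ == "__main__":
    # Hypothetical IC50 values in nM for one compound.
    fold = fold_selectivity(ic50_mutant=5.0, ic50_wild_type=100.0)
    print(fold, highest_tier_met(fold))
```

Other readouts, such as percent inhibition at a fixed dose, would need a different ratio. The tier check stays the same.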

In some embodiments, a subject identified as having an increased risk of cancer (using the methods of the invention or any standard method) can avoid specific risk factors or change lifestyle to reduce any additional risk of cancer.

In some embodiments, a polymorphism, mutation, risk factor, or any combination thereof is used to select a treatment regimen for the subject. In some embodiments, a larger dose or a greater number of treatments is selected for a subject with a greater risk of cancer or with a worse prognosis.

Other compounds for inclusion in individual or combination therapies

If desired, other compounds for stabilizing, treating, or preventing a disease or disorder (such as cancer) or an increased risk of a disease or disorder (such as cancer) can be identified from large libraries of natural products, synthetic (or semi-synthetic) extracts, or chemical libraries according to methods known in the art. Those skilled in the field of drug discovery and development will understand that the precise source of the test extracts or compounds is not critical to the methods of the invention. Accordingly, virtually any number of chemical extracts or compounds can be screened for their effect on cells from a particular type of cancer or from a particular subject, or for their effect on the activity or expression of cancer-associated molecules (such as cancer-associated molecules known to have altered activity or expression in a particular type of cancer). When a crude extract is found to modulate the activity or expression of a cancer-associated molecule, the positive lead extract can be further fractionated, using methods known in the art, to isolate the chemical constituent responsible for the observed effect.

Exemplary assays and animal models for testing treatments

If desired, the effect of one or more of the treatments disclosed herein on a disease or disorder (such as cancer) can be tested using a cell line (such as a cell line with one or more mutations identified in a subject diagnosed, using the methods of the invention, with cancer or an increased risk of cancer) or an animal model of the disease or disorder, such as a SCID mouse model (Jain et al., Tumor Models in Cancer Research, ed. Teicher, Humana Press Inc., Totowa, NJ, pp. 647-671, 2001, which is hereby incorporated by reference in its entirety). In addition, there are numerous standard assays and animal models that can be used to determine the efficacy of particular therapies for stabilizing, treating, or preventing a disease or disorder (such as cancer) or an increased risk of a disease or disorder (such as cancer). Treatments can also be tested in standard human clinical trials.

To select a preferred therapy for a particular subject, compounds can be tested for their effect on the expression or activity of one or more genes that are mutated in the subject. For example, standard Northern, Western, or microarray analysis can be used to detect the ability of a compound to modulate the expression of specific mRNA molecules or proteins. In some embodiments, one or more compounds are selected that (i) inhibit the expression or activity of cancer-promoting mRNA molecules or proteins that are expressed at higher than normal levels or have higher than normal activity (such as in a sample from the subject), or (ii) promote the expression or activity of cancer-inhibiting mRNA molecules or proteins that are expressed at lower than normal levels or have lower than normal activity in the subject. Preferably, the individual or combination therapy modulates the greatest number of mRNA molecules or proteins that (i) have a mutation associated with the cancer in the subject and (ii) are dysregulated in the subject. In some embodiments, the selected individual or combination therapy has high drug efficacy and produces few, if any, adverse side effects.
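The selection criterion above (modulate the greatest number of dysregulated genes in the corrective direction) resembles a set-cover problem. The sketch below uses a greedy heuristic, which is our illustrative choice and not a method stated in the text. It encodes dysregulation as +1 (above normal, cancer-promoting) or -1 (below normal, cancer-inhibiting), and a compound effect as +1 (increases expression or activity) or -1 (decreases it). A compound corrects a gene when the two signs are opposite. Gene and compound names are placeholders, and side effects are not modeled.

```python
"""Sketch: greedily pick compounds that counteract the most dysregulated genes."""

from typing import Dict, List, Set


def corrected_genes(effects: Dict[str, int], dysregulation: Dict[str, int]) -> Set[str]:
    """Genes whose dysregulation a compound pushes back toward normal.

    dysregulation: +1 = above normal (inhibit it), -1 = below normal (promote it)
    effects:       +1 = compound increases expression/activity, -1 = decreases it
    """
    return {g for g, d in dysregulation.items() if effects.get(g) == -d}


def greedy_combination(
    compounds: Dict[str, Dict[str, int]],
    dysregulation: Dict[str, int],
    max_compounds: int,
) -> List[str]:
    """Add, one at a time, the compound that corrects the most not-yet-covered
    genes; stop when no remaining compound adds coverage. Ties go to the
    alphabetically first compound name."""
    chosen: List[str] = []
    covered: Set[str] = set()
    for _ in range(max_compounds):
        best, best_gain = None, 0
        for name, effects in sorted(compounds.items()):
            if name in chosen:
                continue
            gain = len(corrected_genes(effects, dysregulation) - covered)
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:
            break
        chosen.append(best)
        covered |= corrected_genes(compounds[best], dysregulation)
    return chosen


if __name__ == "__main__":
    # Hypothetical subject profile and compound effects.
    subject = {"GENE_A": +1, "GENE_B": +1, "GENE_C": -1}
    library = {
        "cmpd1": {"GENE_A": -1, "GENE_B": -1},
        "cmpd2": {"GENE_C": +1},
        "cmpd3": {"GENE_A": -1},
    }
    print(greedy_combination(library, subject, max_compounds=3))
```

In the usage example, `cmpd3` is never added because `cmpd1` already corrects GENE_A. This is how the greedy step avoids redundant agents in a combination.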

As an alternative to the subject-specific analysis described above, DNA chips can be used to compare the expression of mRNA molecules in a particular type of early-stage or late-stage cancer (such as breast cancer cells) with their expression in normal tissue (Marrack et al., Current Opinion in Immunology 12:206-209, 2000; Harkin, Oncologist 5:501-507, 2000; Pelizzari et al., Nucleic Acids Research 28(22):4577-4581, 2000, each of which is hereby incorporated by reference in its entirety). Based on this analysis, an individual or combination therapy can be selected for subjects with this type of cancer to modulate the expression of the mRNAs or proteins whose expression is altered in this type of cancer.

In addition to being used to select a treatment for a particular subject or group of subjects, expression profiles can be used to monitor changes in mRNA and/or protein expression that occur during treatment. For example, expression profiles can be used to determine whether the expression of cancer-related genes has returned to normal levels. If not, the dosage of one or more compounds in the treatment can be changed to increase or decrease the effect of the treatment on the expression of the corresponding cancer-related genes. In addition, this analysis can be used to determine whether the treatment affects the expression of other genes (for example, genes associated with adverse side effects). When necessary, the dosage or composition of the treatment can be changed to prevent or reduce undesirable side effects.

Exemplary formulations and routes of administration

For stabilizing, treating, or preventing a disease or disorder such as cancer, or an increased risk of a disease or disorder such as cancer, compositions can be formulated and administered using any method known to those skilled in the art (see, for example, U.S. Patent Nos. 8,389,578 and 8,389,557, each of which is hereby incorporated by reference in its entirety). General techniques for formulation and administration are found in "Remington: The Science and Practice of Pharmacy," 21st edition, David Troy, ed., Lippincott Williams & Wilkins, Philadelphia, 2006, which is hereby incorporated by reference in its entirety. Liquids, slurries, tablets, capsules, pills, powders, granules, gels, ointments, suppositories, injections, inhalants, and aerosols are examples of such formulations. For example, modified-release or extended-release oral formulations can be prepared using other methods known in the art. Other suitable matrix-forming materials include waxes (for example, carnauba wax, beeswax, paraffin wax, ceresin, shellac wax, fatty acids, and fatty alcohols), oils, hardened oils or fats (for example, hardened rapeseed oil, castor oil, beef tallow, palm oil, and soybean oil), and polymers (for example, hydroxypropyl cellulose, polyvinylpyrrolidone, hydroxypropyl methyl cellulose, and polyethylene glycol). Other suitable matrix tableting materials are microcrystalline cellulose, powdered cellulose, hydroxypropyl cellulose, ethyl cellulose, and other carriers and fillers. Tablets can also contain granulates, coated powders, or pellets. Tablets can also be multilayered. Optionally, the finished tablet can be coated or uncoated.

Typical routes of administration of such compositions include, but are not limited to, oral, sublingual, buccal, topical, transdermal, inhalation, parenteral (for example, subcutaneous, intravenous, intramuscular, or intrasternal injection or infusion techniques), rectal, vaginal, and intranasal. In preferred embodiments, the treatment is administered using an extended-release device. Compositions of the invention are formulated so that the active ingredients contained therein are bioavailable upon administration of the composition. Compositions can take the form of one or more dosage units. Compositions can contain 1, 2, 3, 4, or more active ingredients and can optionally contain 1, 2, 3, 4, or more inactive ingredients.

Alternative embodiments

Any method described herein can include outputting data in a physical format (such as on a computer screen or on a paper printout). Any method of the invention can be combined with the output of actionable data in a format that a physician can act on. Some embodiments described in this document for determining genetic data related to a target individual can be combined with notifying a medical professional of a potential chromosomal abnormality (such as a deletion or duplication) or its absence, optionally together with a decision to terminate, or not to terminate, a pregnancy in the context of prenatal diagnosis. Some embodiments described herein can be combined with the output of actionable data and with the execution of a clinical decision that results in a clinical treatment, or of a clinical decision to take no action.

In some embodiments, disclosed herein are methods for generating a report that discloses the results of any method of the invention (such as the presence or absence of a deletion or duplication). A report can be generated from the results of a method of the invention and can be sent to a physician electronically, displayed on an output device (such as a digital report), or delivered to a physician as a written report (such as a printed hard copy of the report). In addition, the described methods can be combined with the actual execution of a clinical decision that results in a clinical treatment, or of a clinical decision to take no action.

In certain embodiments, the present invention provides reagents, kits, and methods, as well as computer systems and computer media with encoded instructions, for detecting both CNVs and SNVs from the same sample using the multiplex PCR methods disclosed herein. In certain preferred embodiments, the sample is a single-cell sample or a plasma sample suspected of containing circulating tumor DNA. These embodiments take advantage of the discovery that testing DNA samples from single cells or plasma for both CNVs and SNVs, using the highly sensitive multiplex PCR methods disclosed herein, can improve cancer detection relative to detecting CNVs or SNVs alone, especially for cancers that display CNVs, such as breast cancer, ovarian cancer, and lung cancer. In certain illustrative embodiments, the method for analyzing CNVs interrogates 50 to 100,000, 50 to 10,000, or 50 to 1,000 SNPs, and the method for analyzing SNVs interrogates 50 to 1,000, 50 to 500, or 50 to 250 SNVs. The methods provided herein for detecting CNVs and/or SNVs in the plasma of a subject suspected of having cancer, including cancers known to display CNVs and SNVs such as breast cancer, lung cancer, and ovarian cancer, allow detection of CNVs and/or SNVs from tumors, which are typically composed of genetically heterogeneous populations of cancer cells. Conventional methods that analyze only certain regions of a tumor can therefore miss CNVs or SNVs present in cells in other regions of the tumor. A plasma sample serves as a liquid biopsy that can be interrogated to detect any CNV and/or SNV, including those present only in a subpopulation of tumor cells.

Example computer architecture

Figure 69 shows an example system architecture X00 for performing embodiments of the invention. System architecture X00 includes an analysis platform X08 connected to one or more laboratory information systems ("LIS") X04. As shown in Figure 69, analysis platform X08 can be connected to LIS X04 through network X02. Network X02 can include one or more networks of one or more network types, including any combination of LANs, WANs, the Internet, and so on. Network X02 can provide connections between any or all components of system architecture X00. Analysis platform X08 can alternatively or additionally be connected directly to LIS X06. In one embodiment, analysis platform X08 analyzes genetic data provided by LIS X04 in a software-as-a-service model, where LIS X04 is a third-party LIS, and analyzes genetic data provided by LIS X06 in a full-service or in-house model, where LIS X06 and analysis platform X08 are controlled by the same party. In embodiments in which analysis platform X08 provides information over network X02, analysis platform X08 can be a server.

In an exemplary embodiment, laboratory information system X04 includes one or more public or private institutions that collect, manage, and/or store genetic data. Those skilled in the relevant art will understand that methods and standards for securing genetic data are known and can be implemented using various information security techniques and policies, such as usernames and passwords, Transport Layer Security (TLS), Secure Sockets Layer (SSL), and/or other cryptographic protocols that provide communication security.

In an exemplary embodiment, system architecture X00 operates as a service-oriented architecture and uses a client-server model, as will be understood by those skilled in the relevant art, to enable various forms of interaction and communication between LIS X04 and analysis platform X08. System architecture X00 can be distributed over various types of networks X02 and/or can operate as a cloud computing architecture. A cloud computing architecture can include any type of distributed network architecture. By way of example and not limitation, a cloud computing architecture can be used to provide software as a service (SaaS), infrastructure as a service (IaaS), platform as a service (PaaS), network as a service (NaaS), data as a service (DaaS), database as a service (DBaaS), backend as a service (BaaS), test environment as a service (TEaaS), API as a service (APIaaS), integration platform as a service (IPaaS), and so on.

In an exemplary embodiment, LIS X04 and X06 each include a computer, device, interface, or the like, or any subsystem thereof. LIS X04 and X06 can include an operating system (OS) and installed applications that perform various functions, such as accessing and/or navigating data that is stored locally, in memory, and/or accessible over network X02. In one embodiment, LIS X04 accesses analysis platform X08 through an application programming interface ("API"). LIS X04 can also include one or more native applications that operate independently of the API.

In an exemplary embodiment, analysis platform X08 includes one or more of an input processor X12, a hypothesis manager X14, a modeler X16, an error correction unit X18, a machine learning unit X20, and an output processor X22. Input processor X12 receives and processes input from LIS X04 and/or X06. Processing can include, but is not limited to, parsing, transcoding, translating, adapting, or otherwise handling any input received from LIS X04 and/or X06. Input can be accessed via one or more streams, feeds, databases, or other data sources, such as LIS X04 and X06. Data errors can be corrected by error correction unit X18 by executing the error correction mechanisms described above.

In an exemplary embodiment, hypothesis manager X14 is configured to receive the input passed from input processor X12 in a form ready to be processed according to genetic-analysis hypotheses expressed as models and/or algorithms. Modeler X16 can use such models and/or algorithms to generate probabilities based, for example, on dynamic, real-time, and/or historical statistics or other indicators. The data used to derive and populate such models and/or algorithms can be made available to hypothesis manager X14 via, for example, genetic data source X10. Genetic data source X10 can include, for example, a nucleic acid sequencer. Hypothesis manager X14 can be configured to formulate hypotheses based, for example, on the variables required to populate its models and/or algorithms. Once populated, the models and/or algorithms can be used by modeler X16 to generate one or more hypotheses as described above. Hypothesis manager X14 can select, as output, a particular value, a range of values, or an estimate based on the most likely hypothesis, as described above. Modeler X16 can operate according to models and/or algorithms trained by machine learning unit X20. For example, machine learning unit X20 can develop such models and/or algorithms by applying a classification algorithm, as described above, to a training set database (not shown). In certain embodiments, the machine learning unit analyzes one or more control samples to generate training data sets that are useful in the SNV detection methods provided herein.

Once hypothesis manager X14 has identified a particular output, output processor X22 can return that output to the particular LIS X04 or X06 that requested the information.
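The data flow described for analysis platform X08 (input processor, hypothesis manager and modeler, output processor) can be sketched as follows. Everything in this block is hypothetical: the function and class names are illustrative, and the "model" is a stand-in that applies Bayes' rule to per-hypothesis likelihoods supplied with the input, then returns the most probable hypothesis as the output estimate. It is not the platform's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical output record returned to the requesting LIS."""
    sample_id: str
    best_hypothesis: str
    probability: float


def parse_input(record: dict) -> tuple[str, dict[str, float]]:
    """Input-processor stand-in: extract the sample ID and per-hypothesis likelihoods."""
    return record["sample_id"], {h: float(v) for h, v in record["likelihoods"].items()}


def posterior(likelihoods: dict[str, float], priors: dict[str, float]) -> dict[str, float]:
    """Modeler stand-in: Bayes' rule over a finite set of competing hypotheses."""
    joint = {h: likelihoods[h] * priors.get(h, 0.0) for h in likelihoods}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("every hypothesis has zero probability")
    return {h: p / total for h, p in joint.items()}


def analyze(record: dict, priors: dict[str, float]) -> Result:
    """Hypothesis-manager stand-in: choose the most probable hypothesis as the output."""
    sample_id, likelihoods = parse_input(record)
    probs = posterior(likelihoods, priors)
    best = max(probs, key=probs.get)
    return Result(sample_id, best, probs[best])


if __name__ == "__main__":
    record = {"sample_id": "S1",
              "likelihoods": {"monosomy": 0.01, "disomy": 0.2, "trisomy": 0.05}}
    uniform = {"monosomy": 1 / 3, "disomy": 1 / 3, "trisomy": 1 / 3}
    print(analyze(record, uniform))
```

The ploidy hypotheses and likelihood values in the usage example are made up; a real deployment would obtain likelihoods from models trained by machine learning unit X20.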

Various aspects of the disclosure can be implemented on a computing device by software, firmware, hardware, or a combination thereof. Figure 70 shows an example computer system Y00 in which the contemplated embodiments, or portions thereof, can be implemented as computer-readable code. Various embodiments are described in terms of this example computer system Y00.

Processing tasks in the embodiment of Figure 70 are performed by one or more processors Y02. It should be noted, however, that various types of processing technology can be used here, including programmable logic arrays (PLAs), application-specific integrated circuits (ASICs), multi-core processors, multiple processors, or distributed processors. Additional specialized processing resources, such as graphics, multimedia, or mathematical processing capabilities, can also be used to assist certain processing tasks. These processing resources can be hardware, software, or an appropriate combination thereof. For example, one or more of processors Y02 can be a graphics processing unit (GPU). In an embodiment, a GPU is a processor built as a specialized electronic circuit designed to rapidly process mathematically intensive applications on electronic devices. A GPU can have a highly parallel structure that is efficient for parallel processing of large blocks of data, such as mathematically intensive data. Alternatively or additionally, one or more of processors Y02 can be a specialized parallel processor without graphics optimization; such parallel processors perform the mathematically intensive functions described herein. One or more of processors Y02 can include a processor accelerator (for example, a DSP or other special-purpose processor).

Computer system Y00 also includes main memory Y30 and can also include secondary memory Y40. Main memory Y30 can be volatile or non-volatile memory and can be divided into channels. Secondary memory Y40 can include, for example, non-volatile memory such as a hard disk drive Y50, a removable storage drive Y60, and/or a memory stick. Removable storage drive Y60 can include a floppy disk drive, a magnetic tape drive, an optical disk drive, flash memory, and so on. Removable storage drive Y60 reads from and/or writes to removable storage unit Y70 in a well-known manner. Removable storage unit Y70 can include a floppy disk, magnetic tape, optical disk, and so on, which is read by and written to by removable storage drive Y60. As will be understood by persons skilled in the relevant art, removable storage unit Y70 includes a computer-usable storage medium having computer software and/or data stored therein.

In alternative implementations, secondary memory Y40 can include other similar means for allowing computer programs or other instructions to be loaded into computer system Y00. Such means can include, for example, a removable storage unit Y70 and an interface (not shown). Examples of such means include a program cartridge and cartridge interface (such as those found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, and other removable storage units Y70 and interfaces that allow software and data to be transferred from removable storage unit Y70 to computer system Y00.

Computer system Y00 can also include a memory controller Y75. Memory controller Y75 controls data access to main memory Y30 and secondary memory Y40. In some embodiments, memory controller Y75 can be external to processor Y10, as shown in Figure 70. In other embodiments, memory controller Y75 can be directly part of processor Y10. For example, many AMD™ and Intel™ processors use an integrated memory controller that is part of the same chip as processor Y10 (not shown in Figure 70).

Computer system Y00 can also include a communications and network interface Y80. Communications and network interface Y80 allows software and data to be transferred between computer system Y00 and external devices. Communications and network interface Y80 can include a modem, a communications port, a PCMCIA slot and card, and so on. Software and data transferred via communications and network interface Y80 are in the form of signals, which can be electronic, electromagnetic, optical, or other signals capable of being received by communications and network interface Y80. These signals are provided to communications and network interface Y80 via a communications path Y85. Communications path Y85 carries signals and can be implemented using wire or cable, fiber optics, a phone line, a cellular phone link, an RF link, or other communications channels.

Communications and network interface Y80 allows computer system Y00 to communicate over a communications network or medium, such as a LAN, a WAN, the Internet, and so on. Communications and network interface Y80 can interface with remote sites or networks via wired or wireless connections.

In this document, the terms "computer program medium," "computer-usable medium," and "non-transitory medium" are used to refer generally to tangible media such as removable storage unit Y70, removable storage drive Y60, and a hard disk installed in hard disk drive Y50. Signals carried over communications path Y85 can also embody the logic described herein. Computer program medium and computer-usable medium can also refer to memories, such as main memory Y30 and secondary memory Y40, which can be memory semiconductors (such as DRAMs). These computer program products are means for providing software to computer system Y00.

Computer programs (also called computer control logic) are stored in main memory Y30 and/or secondary memory Y40. Computer programs can also be received via communications and network interface Y80. Such computer programs, when executed, enable computer system Y00 to implement embodiments as discussed herein. In particular, the computer programs, when executed, enable processor Y10 to implement the disclosed processes. Accordingly, such computer programs represent controllers of computer system Y00. Where the embodiments are implemented using software, the software can be stored in a computer program product and loaded into computer system Y00 using removable storage drive Y60, an interface, hard disk drive Y50, or communications and network interface Y80.

Computer system Y00 can also include input/output/display devices Y90, such as keyboards, monitors, pointing devices, touchscreens, and so on.

It should be noted that the simulation, synthesis, and/or manufacture of the various embodiments can be accomplished, in part, through the use of computer-readable code, including general programming languages (such as C or C++), hardware description languages (HDLs) such as Verilog HDL, VHDL, or Altera HDL (AHDL), or other available programming tools. This computer-readable code can be disposed in any known computer-usable medium, including semiconductor memory, magnetic disk, or optical disk (such as CD-ROM or DVD-ROM). As such, the code can be transmitted over communications networks, including the Internet.

Embodiments are also directed to computer program products comprising software stored on any computer-usable medium. Such software, when executed in one or more data processing devices, causes the data processing device(s) to operate as described herein. Embodiments can employ any computer-usable or computer-readable medium, whether known now or in the future. Examples of computer-usable or computer-readable media include, but are not limited to, primary storage devices (for example, any type of random access memory), secondary storage devices (for example, hard disk drives, floppy disks, CD-ROMs, ZIP disks, magnetic storage devices, optical storage devices, MEMS, nanotechnological storage devices, and so on), and communication media (for example, wired and wireless communications networks, local area networks, wide area networks, intranets, and so on). Computer-usable or computer-readable media can include any form of transitory medium (which includes signals) or non-transitory medium (which excludes signals). Non-transitory media include, by way of non-limiting example, the physical storage devices described above (for example, primary and secondary storage devices).

It is to be understood that any embodiment disclosed herein can be used in combination with any other embodiment disclosed herein.

Experimental section

The presently disclosed embodiments are described in the following examples, which are set forth to aid in understanding the disclosure and should not be construed as limiting in any way the scope of the disclosure as defined in the claims that follow. The following examples are put forth to provide those of ordinary skill in the art with a complete disclosure and description of how to use the described embodiments; they are not intended to limit the scope of the disclosure, nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to the numbers used (for example, amounts, temperatures, and so on), but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by volume, and temperature is in degrees Celsius. It should be understood that variations in the methods as described can be made without changing the fundamental aspects that the experiments are meant to illustrate.

Example 1

Exemplary sample preparation and amplification methods are described in U.S. Application No. 13/683,604, filed November 21, 2012 (U.S. Publication No. 2013/0123120), and in U.S. Application No. 61/994,791, filed May 16, 2014, the entire contents of which are incorporated herein by reference. These methods can be used to analyze any of the samples disclosed herein.

In one experiment, plasma samples were prepared and amplified using a hemi-nested 19,488-plex protocol. The samples were prepared as follows: up to 20 mL of blood was centrifuged to separate the buffy coat and plasma. Genomic DNA was prepared from the buffy coat of the blood sample; genomic DNA can also be prepared from saliva samples. Cell-free DNA in the plasma was isolated using the QIAGEN CIRCULATING NUCLEIC ACID kit and eluted in 50 µL of TE buffer according to the manufacturer's instructions. Universal ligation adapters were attached to the ends of each molecule of 40 µL of purified plasma DNA, and the libraries were amplified for 9 cycles using adapter-specific primers. The libraries were purified with AGENCOURT AMPURE beads and eluted in 50 µL of DNA suspension buffer.

6 µL of the DNA was amplified with 15 cycles of STAR 1 (95 °C for 10 minutes for initial polymerase activation, then 15 cycles of 96 °C for 30 seconds; 65 °C for 1 minute; 58 °C for 6 minutes; 60 °C; 65 °C for 4 minutes; and 72 °C for 30 seconds; and a final extension at 72 °C for 2 minutes), using 19,488 target-specific tagged reverse primers at a primer concentration of 7.5 nM and one library-adapter-specific forward primer.

The hemi-nested PCR protocol involves a second amplification (STAR 2) of a dilution of the STAR 1 product for 15 cycles (95 °C for 10 minutes for initial polymerase activation, then 15 cycles of 95 °C for 30 seconds; 65 °C for 1 minute; 60 °C for 5 minutes; 65 °C for 5 minutes; and 72 °C for 30 seconds; and a final extension at 72 °C for 2 minutes), using the reverse tag primer at a concentration of 1000 nM and each of the 19,488 target-specific forward primers at a concentration of 20 nM.

An aliquot of the STAR 2 product was then amplified by standard PCR for 12 cycles with 1 µM of tag-specific forward and barcoded reverse primers to generate barcoded sequencing libraries. An aliquot of each library was mixed with libraries carrying different barcodes and purified using a spin column.

In this way, 19,488 primers were used in a single-well reaction; the primers were designed to target SNPs found on chromosomes 1, 2, 3, 13, 18, 21, X, and Y. The amplicons were then sequenced using an ILLUMINA GAIIX sequencer. When necessary, the number of sequencing reads can be increased to increase the number of targeted SNPs that are amplified and sequenced.

The corresponding genomic DNA samples were amplified in STAR 1 using the hemi-nested 19,488 outer forward primers and tagged reverse primers at 7.5 nM. The thermocycling conditions and composition for STAR 2 and for the barcoding PCR were the same as in the hemi-nested protocol.

Example 2

Exemplary primer selection methods are described in U.S. Application No. 13/683,604, filed November 21, 2012 (U.S. Publication No. 2013/0123120), and in U.S. Serial No. 61/994,791, filed May 16, 2014, which are hereby incorporated by reference in their entirety. These methods can be used to analyze any of the samples disclosed herein.

The following test describes an exemplary method for designing and selecting a primer library for use in any of the multiplex PCR methods of the invention. The goal is to select, from an initial library of candidate primers, primers that can be used to simultaneously amplify a large number of target sites (or a subset of target sites) in a single reaction volume. Primers need not be selected for every site in the initial set of candidate target sites.

Step 1

A set of candidate target sites (such as SNPs) is selected based on publicly available information about desired parameters of the target sites, such as the frequency of the SNPs within a target population or the heterozygosity rate of the SNPs (World Wide Web at ncbi.nlm.nih.gov/projects/SNP/; Sherry ST, Ward MH, Kholodov M, et al., dbSNP: the NCBI database of genetic variation, Nucleic Acids Res. 2001 Jan 1;29(1):308-11, each of which is hereby incorporated by reference in its entirety). For each candidate site, one or more PCR primer pairs are designed using the Primer3 program (World Wide Web at primer3.sourceforge.net; libprimer3 release 2.2.3, which is hereby incorporated by reference in its entirety). If no feasible PCR primer design exists for a particular target site, that target site is eliminated from further consideration.

When necessary, a "target site score" (in which a higher score represents higher desirability) can be calculated for most or all of the target sites, for example as a weighted average of various desired parameters of the target site. The parameters can be assigned different weights based on their importance for the particular application in which the primers will be used. Exemplary parameters include the heterozygosity rate of the target site, the disease prevalence associated with a sequence (such as a polymorphism) at the target site, the disease penetrance associated with a sequence (such as a polymorphism) at the target site, the specificity of the candidate primers used to amplify the target site, the size of the candidate primers used to amplify the target site, and the size of the target amplicon. In some embodiments, the specificity of a candidate primer for the target site reflects the likelihood that the candidate primer will mis-prime by binding to and amplifying a site other than the target site it was designed to amplify. In some embodiments, one or more, or all, of the candidate primers that mis-prime are removed from the library.
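The weighted-average target site score described above can be sketched as follows. This is a minimal illustration under stated assumptions: the parameter names are hypothetical, and each parameter is assumed to have already been rescaled so that larger values are more desirable (for example, a shorter amplicon mapped to a larger number). The source does not specify how parameters are normalized.

```python
def target_site_score(params: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-site parameters; a higher score means a more desirable site.

    Assumes every parameter is pre-normalized so that larger values are better.
    """
    total_weight = sum(weights[name] for name in params)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * value for name, value in params.items()) / total_weight


if __name__ == "__main__":
    # Hypothetical, pre-normalized parameters for one candidate SNP.
    site = {"heterozygosity": 0.5, "primer_specificity": 0.9, "amplicon_size": 0.7}
    weights = {"heterozygosity": 2.0, "primer_specificity": 1.0, "amplicon_size": 1.0}
    print(round(target_site_score(site, weights), 3))  # 0.65
```

Doubling the weight on heterozygosity here mirrors the idea that weights reflect each parameter's importance for the intended application; the specific values are invented.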

Step 2

A thermodynamic interaction score is calculated between each primer from step 1 and all primers for all other target sites (see, for example, Allawi, H.T. and SantaLucia, J., Jr. (1998), "Thermodynamics of internal C·T mismatches in DNA," Nucleic Acids Res. 26, 2694-2701; Peyret, N., Seneviratne, P.A., Allawi, H.T., and SantaLucia, J., Jr. (1999), "Nearest-neighbor thermodynamics and NMR of DNA sequences with internal A·A, C·C, G·G, and T·T mismatches," Biochemistry 38, 3468-3477; Allawi, H.T. and SantaLucia, J., Jr. (1998), "Nearest-neighbor thermodynamics of internal A·C mismatches in DNA: sequence dependence and pH effects," Biochemistry 37, 9435-9444; Allawi, H.T. and SantaLucia, J., Jr. (1998), "Nearest neighbor thermodynamic parameters for internal G·A mismatches in DNA," Biochemistry 37, 2170-2179; Allawi, H.T. and SantaLucia, J., Jr. (1997), "Thermodynamics and NMR of internal G·T mismatches in DNA," Biochemistry 36, 10581-10594; MultiPLX 2.1 (Kaplinski L, Andreson R, Puurand T, Remm M. MultiPLX: automatic grouping and evaluation of PCR primers. Bioinformatics. 2005 Apr 15;21(8):1701-2), each of which is hereby incorporated by reference in its entirety). This step produces a 2D matrix of interaction scores. The interaction score predicts the likelihood of a primer dimer involving the two interacting primers. The score is calculated as follows:

$$\text{interaction score} = \max\bigl(-\Delta G_2,\; 0.8 \times (-\Delta G_1)\bigr)$$

where

$\Delta G_2$ is the Gibbs free energy (the energy required to break the dimer) for a dimer that is extensible by PCR at both ends, that is, a dimer in which the 3' end of each primer anneals to the other primer; and

$\Delta G_1$ is the Gibbs free energy for a dimer that is extensible by PCR at at least one end.
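The scoring rule in step 2, interaction score = max(-ΔG_2, 0.8 × (-ΔG_1)), and the resulting pairwise table can be sketched as follows. The sketch assumes the two Gibbs energies for each primer pair are supplied by a caller-provided function (in practice they would come from a nearest-neighbor thermodynamic calculation such as those cited above); that function, the primer names, and the use of kcal/mol are assumptions for illustration. A more negative ΔG means a more stable dimer, so a higher score means a more dimer-prone pair.

```python
from itertools import combinations
from typing import Callable, Dict, Tuple

# Hypothetical callable: (seq_a, seq_b) -> (deltaG_2, deltaG_1), assumed in kcal/mol.
#   deltaG_2: dimer extensible by PCR at both ends
#   deltaG_1: dimer extensible by PCR at at least one end
DimerEnergies = Callable[[str, str], Tuple[float, float]]


def interaction_score(delta_g_2: float, delta_g_1: float) -> float:
    """max(-deltaG_2, 0.8 * (-deltaG_1)); higher means a more likely primer dimer."""
    return max(-delta_g_2, 0.8 * (-delta_g_1))


def score_matrix(primers: Dict[str, str],
                 site_of: Dict[str, str],
                 energies: DimerEnergies) -> Dict[Tuple[str, str], float]:
    """Symmetric score table for every pair of primers from different target sites."""
    matrix: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(primers), 2):
        if site_of[a] == site_of[b]:
            continue  # step 2 scores each primer against primers of other target sites
        s = interaction_score(*energies(primers[a], primers[b]))
        matrix[(a, b)] = matrix[(b, a)] = s
    return matrix
```

Storing both (a, b) and (b, a) keeps later lookups order-independent; a dense NumPy array would work equally well for large primer sets.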

Step 3

For each target site, if more than one primer pair design exists, one design can be selected using the following method:

1. For each primer pair design for the site, find the worst-case (highest) interaction score between the two primers in that design and all primers from all designs for all other target sites.

2. Select the design with the best (lowest) worst-case interaction score.
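The two-part selection in step 3 is a minimax choice: each design is judged by its worst (highest) interaction score against primers of other sites, and the design whose worst case is lowest wins. A minimal sketch, assuming a caller-supplied `score(p, q)` function (hypothetical) that looks up the step 2 interaction score for two primers:

```python
from typing import Callable, Hashable, Iterable, Sequence


def pick_design(designs: Sequence[Sequence[Hashable]],
                other_primers: Iterable[Hashable],
                score: Callable[[Hashable, Hashable], float]) -> Sequence[Hashable]:
    """Return the primer-pair design with the lowest worst-case interaction score."""
    others = list(other_primers)

    def worst_case(design: Sequence[Hashable]) -> float:
        # Highest score between any primer in this design and any other-site primer.
        return max((score(p, q) for p in design for q in others), default=float("-inf"))

    return min(designs, key=worst_case)


if __name__ == "__main__":
    table = {("a1", "x"): 3, ("a1", "y"): 9, ("a2", "x"): 1, ("a2", "y"): 2,
             ("b1", "x"): 4, ("b1", "y"): 5, ("b2", "x"): 6, ("b2", "y"): 2}
    best = pick_design([("a1", "a2"), ("b1", "b2")], ["x", "y"], lambda p, q: table[(p, q)])
    print(best)  # ('b1', 'b2'): its worst case is 6, versus 9 for ('a1', 'a2')
```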

Step 4

A graph is constructed in which each node represents one site and its associated primer pair design (for example, as in a maximum clique problem). An edge is created between every pair of nodes. The weight of each edge equals the worst-case (highest) interaction score between the primers associated with the two nodes that the edge connects.

Step 5

When necessary, for each pair of designs for two different target sites in which one primer from one design and one primer from the other design would anneal to overlapping target regions, an additional edge is added between the nodes for the two designs. The weight of these edges is set equal to the highest weight assigned in step 4. Step 5 therefore prevents two designs whose primers anneal to overlapping target regions from both being retained in the library.
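Steps 4 and 5 can be sketched as building an edge list for a graph with one node per target site (standing for that site's chosen design). Because step 5 adds an additional edge rather than replacing the existing one, this sketch keeps parallel edges (a multigraph); that reading matters later, since step 8 counts edges to compute node degree. The callables `worst_score` and `overlaps` are hypothetical stand-ins for the step 2/3 score lookup and for a check of overlapping target regions.

```python
from itertools import combinations
from typing import Callable, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable, float]


def build_graph(nodes: List[Hashable],
                worst_score: Callable[[Hashable, Hashable], float],
                overlaps: Callable[[Hashable, Hashable], bool]) -> List[Edge]:
    """Edge list for the primer-design graph; parallel edges are allowed.

    worst_score(u, v): highest interaction score between the primers of u and v.
    overlaps(u, v): True if a primer of u and a primer of v anneal to overlapping regions.
    """
    pairs = list(combinations(nodes, 2))
    # Step 4: complete graph weighted by worst-case interaction score.
    edges: List[Edge] = [(u, v, worst_score(u, v)) for u, v in pairs]
    top = max((w for _, _, w in edges), default=0.0)
    # Step 5 (optional): extra edge at the highest step 4 weight for overlapping designs.
    edges += [(u, v, top) for u, v in pairs if overlaps(u, v)]
    return edges
```

For example, with sites s1, s2, s3, pairwise worst-case scores 2.0 (s1-s2), 5.0 (s1-s3), and 3.0 (s2-s3), and overlapping designs only for s2 and s3, the list gains a second s2-s3 edge of weight 5.0.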

Step 6

The initial interaction score threshold is calculated as follows:

$$\text{weight threshold} = \max(\text{edge weight}) - 0.05 \times \bigl(\max(\text{edge weight}) - \min(\text{edge weight})\bigr)$$

where max(edge weight) is the maximum edge weight in the graph and min(edge weight) is the minimum edge weight in the graph. The initial bounds of the threshold are set as follows:

$$\text{maximum weight threshold} = \max(\text{edge weight}), \qquad \text{minimum weight threshold} = \min(\text{edge weight})$$

Step 7

A new graph is constructed that consists of the same set of nodes as the graph from step 5 but contains only the edges whose weights are above the weight threshold. This step therefore ignores interactions with scores equal to or below the weight threshold.

Step 8

Node (and all edges for being connected to the node of removal) are removed from the figure of step 7, until there is no edge to leave. By repeating to apply following procedure removing node:

1. Find the node with the highest degree (highest number of edges). If there are several, any one of them may be chosen.

2. Define the set of nodes consisting of the node chosen above and all nodes connected to it, excluding any node whose degree is lower than that of the node chosen above.

3. From the set defined in item 2, select the node with the lowest target site score (a lower score represents lower desirability) and remove it from the graph.
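Steps 7 and 8 can be sketched in Python as shown below. This is a minimal illustration under assumed data structures: edges are keyed by `frozenset` node pairs and `locus_score` maps each node to its target site score. Ties in item 1 are broken by whichever node `max()` returns first, which is one valid reading of "any one of them may be chosen".

```python
def prune_graph(nodes, weighted_edges, threshold, locus_score):
    """Steps 7-8 sketch: keep only edges heavier than `threshold`, then
    remove nodes until no edge remains.

    nodes: iterable of node ids (one per target site/design).
    weighted_edges: dict frozenset({u, v}) -> edge weight.
    locus_score: dict node -> target site score (lower = less desirable).
    Returns the set of surviving nodes.
    """
    # Step 7: same nodes, only edges with weight strictly above threshold.
    adj = {n: set() for n in nodes}
    for edge, weight in weighted_edges.items():
        if weight > threshold:
            u, v = tuple(edge)
            adj[u].add(v)
            adj[v].add(u)

    # Step 8: repeat until no edges are left.
    while any(adj[n] for n in adj):
        # 1. A node with the highest degree.
        top = max(adj, key=lambda n: len(adj[n]))
        top_degree = len(adj[top])
        # 2. That node plus its neighbours, excluding lower-degree neighbours.
        candidates = [top] + [n for n in adj[top] if len(adj[n]) >= top_degree]
        # 3. Remove the candidate with the lowest target site score.
        victim = min(candidates, key=lambda n: locus_score[n])
        for neighbour in adj.pop(victim):
            adj[neighbour].discard(victim)
    return set(adj)
```

For example, in a triangle whose edges all exceed the threshold, the lowest-scoring node is removed first, then the lower-scoring of the remaining pair.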

Step 9

If the number of nodes remaining in the graph matches the required number of target sites for the multiplex PCR pool (within an acceptable tolerance), the method continues at Step 10.

If there are too many or too few nodes in the graph, a binary search is performed to determine which threshold leaves the desired number of nodes in the graph. If there are too many nodes in the graph, the weight threshold bounds are adjusted as follows:

Maximum weight threshold = weight threshold

Otherwise (if there are too few nodes in the graph), the weight threshold bounds are adjusted as follows:

Minimum weight threshold = weight threshold

The weight threshold is then adjusted as follows:

Weight threshold = (maximum weight threshold + minimum weight threshold) / 2

Steps 7 to 9 are then repeated.
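The threshold initialization of Step 6 and the bisection of Step 9 can be sketched as follows. This is an illustrative Python sketch, not the patent's code. The callable `nodes_remaining` is a hypothetical hook that would run Steps 7 and 8 at a given threshold and return the surviving node count; it is assumed to be non-decreasing in the threshold, because a higher threshold keeps fewer edges. The `max_iter` cap is an added safeguard that the text does not mention.

```python
def initial_threshold_and_bounds(edge_weights):
    """Step 6: initial threshold and search bounds from the edge weights."""
    w_max, w_min = max(edge_weights), min(edge_weights)
    threshold = w_max - 0.05 * (w_max - w_min)
    return threshold, w_min, w_max


def search_threshold(edge_weights, nodes_remaining, target, tolerance=0,
                     max_iter=50):
    """Step 9 sketch: bisect the weight threshold until the number of nodes
    left after pruning is within `tolerance` of `target`.

    nodes_remaining: hypothetical callable threshold -> surviving node count
        (e.g. Steps 7-8 run at that threshold).
    """
    threshold, lower, upper = initial_threshold_and_bounds(edge_weights)
    for _ in range(max_iter):
        n = nodes_remaining(threshold)
        if abs(n - target) <= tolerance:
            break
        if n > target:       # too many nodes: lower the upper bound
            upper = threshold
        else:                # too few nodes: raise the lower bound
            lower = threshold
        threshold = (upper + lower) / 2
    return threshold
```

Passing a toy monotone function such as `lambda t: int(t * 100)` with `target=30` converges to a threshold between 0.30 and 0.31.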

Step 10

Select the primer pairs associated with the nodes remaining in the graph for the primer library. This primer library can be used in any method of the invention.

If desired, this method of designing and selecting primers can also be applied to primer libraries in which only one primer (rather than a primer pair) is used to amplify each target site. In this case, each node represents one primer for a target site (rather than a primer pair).

Embodiment 3

If desired, the methods of the invention can be assessed for their ability to detect deletions or duplications of a chromosome or chromosome segment. The following experiment was performed to demonstrate detection of an over-representation of the paternally inherited X chromosome, or a segment of it, relative to the maternally inherited X chromosome or X chromosome segment. The assay is designed to simulate a deletion or duplication of a chromosome or chromosome segment. Different amounts of DNA from the father (with XY sex chromosomes) were mixed with DNA from the father's offspring (with XX sex chromosomes) to analyze additional amounts of the paternal X chromosome (Figures 19A-19D).

DNA was extracted from the father and offspring cell lines and quantified using Qubit. The paternal cell line AG16782 (cAG16782-2-F) and the daughter cell line AG16777 (cAG16777-2-P) were used. To determine the father's X chromosome haplotype, SNPs present on the X chromosome but absent from the Y chromosome were detected, so that the signal comes from the father's X chromosome rather than his Y chromosome. The daughter inherits this haplotype from her father. The haplotype of the daughter's other X chromosome is inherited from her mother. This maternal haplotype can be determined by assigning to it the SNP alleles in the daughter's DNA that were not inherited from the father.
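The subtraction logic for recovering the maternal haplotype can be sketched as below. This is a minimal Python illustration that assumes a simple dictionary of genotype calls per SNP. Skipping SNPs where the daughter carries no copy of the paternal allele is a hypothetical choice for handling inconsistent calls, not something the text specifies.

```python
def infer_maternal_haplotype(paternal_x, daughter_genotypes):
    """Assign to the maternal haplotype the daughter's allele at each SNP that
    was not inherited from the father.

    paternal_x: dict snp_id -> allele seen in the father's DNA at an
        X-specific SNP (the father is hemizygous, so there is one allele).
    daughter_genotypes: dict snp_id -> (allele1, allele2) in the daughter.
    SNPs where the daughter carries no copy of the paternal allele are left
    unassigned as inconsistent (an assumed handling rule).
    """
    maternal = {}
    for snp, (a1, a2) in daughter_genotypes.items():
        pat = paternal_x.get(snp)
        if pat is None:
            continue                 # paternal allele unknown at this SNP
        if a1 == pat:
            maternal[snp] = a2
        elif a2 == pat:
            maternal[snp] = a1
        # otherwise: no paternal allele in the daughter; skip
    return maternal
```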

To determine whether over-representation of the paternal X chromosome could be detected, different amounts of DNA from the paternal cell line were mixed with DNA from the daughter cell line. The total DNA input was about 75 ng of genomic DNA (about 25,000 copies). About 3,456 SNPs on the X and Y chromosomes were measured using direct multiplex PCR amplification. The amplified products were sequenced in Rapid/HT mode using 50 bp single-end reads with a 7 bp barcode. Each SNP received about 10,000 reads.
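The mass-to-copy conversion behind "75 ng (about 25,000 copies)" can be checked with a one-line calculation. This sketch assumes a mass per haploid human genome copy. A value of 3.0 pg reproduces the stated figure exactly, while the commonly quoted value of about 3.3 pg gives roughly 22,700 copies, which is the same order.

```python
def genome_copies(mass_ng, pg_per_copy=3.0):
    """Approximate number of haploid genome copies in `mass_ng` of DNA.

    pg_per_copy is an assumed mass per haploid human genome copy: 3.0 pg
    reproduces the text's 75 ng ~ 25,000 copies; ~3.3 pg gives ~22,700.
    """
    return mass_ng * 1000.0 / pg_per_copy  # ng -> pg, then divide
```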

As seen in Figures 19A-19D, chimerism from the father's DNA can be detected. These figures show that over-representation of a chromosome segment or of a whole chromosome can be detected.

All patents, patent applications, and published references cited herein are incorporated by reference in their entirety. Although the disclosed methods have been described in connection with specific embodiments, it will be understood that they are capable of further modification. This application is intended to cover any variations, uses, or adaptations of the disclosed methods, including departures from the present disclosure that come within known or customary practice in the art to which the disclosed methods pertain and that fall within the scope of the appended claims. Any embodiment of the invention can be carried out by analyzing DNA and/or RNA in a sample. For example, any method disclosed herein for DNA can be readily adapted to RNA, for instance by including a reverse transcription step that converts RNA to DNA.

Embodiment 4

This embodiment describes an exemplary noninvasive method for detecting breast cancer-associated copy number variations (CNVs) in cell-free tumor DNA. Breast cancer screening relies on mammography, which has a high false-positive rate and misses some cancers. Analysis of cancer-associated CNVs in circulating tumor-derived cell-free DNA (ctDNA) could allow earlier, safer, and more accurate screening. A SNP-based massively multiplex PCR (mmPCR) method was used to screen for CNVs in ctDNA isolated from the plasma of breast cancer patients. The mmPCR assay was designed to target 3,168 SNPs on chromosomes 1, 2, and 22, which frequently carry CNVs in cancer (for example, 49% of breast cancer samples have a 22q deletion). Six plasma samples from breast cancer patients were analyzed: one stage IIa, four stage IIb, and one stage IIIb. Each sample had CNVs on one or more of the targeted chromosomes. The assay identified CNVs in all six plasma samples, including correctly calling a stage IIb sample with a ctDNA fraction of 0.58% (Figures 30, 31B, 32A, 32B and 33); detection required only 86 heterozygous SNPs. Using about 636 heterozygous SNPs (Figures 29, 31A and 32A), a stage IIa sample with a ctDNA fraction of 4.33% was also correctly called. This shows that focal or whole-arm CNVs are common in cancer and can be readily detected.

To further evaluate sensitivity, 22 artificial mixtures were prepared by mixing DNA from a cancer cell line carrying a 3 Mb 22q CNV with DNA from a normal cell line (5:95), simulating ctDNA fractions between 0.43% and 7.35% (Figures 28A-28C). The method correctly detected the CNV in 100% of these samples. Accordingly, artificial cfDNA polynucleotide standards/controls can be prepared by spiking an isolated polynucleotide sample into another DNA sample. The spiked-in sample comprises a mixture of fragmented polynucleotides generated from a non-cfDNA source known to exhibit a CNV (such as a tumor cell line), at a concentration similar to that of cfDNA, for example 0.01% to 20%, 0.1% to 15%, or 0.4% to 10% of the DNA in the fluid. These standards/controls can be used as controls for assay design, characterization, development, and/or validation, and as quality control standards during testing, for example for cancer tests performed in a CLIA laboratory and/or as standards in research-use-only or diagnostic test kits. In many cancers, including breast and ovarian cancer, CNVs are more common than point mutations. This supports the use of this SNP-based mmPCR method as a cost-effective noninvasive approach for detecting these cancers.
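Preparing such a standard at a chosen fraction reduces to a mass balance. The Python sketch below is illustrative only. It assumes the fraction is defined by mass, and it lets the spiked-in stock itself be a mixture (`stock_fraction` is a hypothetical parameter, with 1.0 meaning pure fragmented tumor-cell-line DNA).

```python
def spike_in_masses(target_fraction, total_ng, stock_fraction=1.0):
    """Masses (ng) of CNV-bearing stock and normal background DNA needed so
    that `target_fraction` of the final DNA derives from the CNV source.

    stock_fraction: fraction of CNV-source DNA in the stock being spiked in
        (1.0 = pure fragmented tumor-cell-line DNA; an assumption).
    """
    if not 0 < target_fraction <= stock_fraction:
        raise ValueError("target fraction must be in (0, stock_fraction]")
    stock_ng = target_fraction / stock_fraction * total_ng
    return stock_ng, total_ng - stock_ng
```

For example, a 0.5% standard in 100 ng total needs 0.5 ng of pure tumor-derived fragments, or 10 ng of a 5% stock.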

Embodiment 5

This embodiment describes an exemplary method for detecting copy number variations in breast cancer samples using SNP-targeted massively multiplex PCR. CNV evaluation in tumor tissue usually relies on SNP microarrays or aCGH. These methods have high genome-wide resolution but require large amounts of input material, have high fixed costs, and do not perform well on formalin-fixed paraffin-embedded (FFPE) samples. For this embodiment, 28,000-plex SNP-targeted PCR with next-generation sequencing (NGS) was used to target 1p, 1q, 2p, 2q, 4p16, 5p15, 7q11, 15q, 17p, 22q11, 22q13, and chromosomes 13, 18, 21, and X to detect CNVs in breast cancer samples. Accuracy was verified on 96 samples with aneuploidies or microdeletions. Single-molecule sensitivity was established by analyzing single cells. In 17 breast cancer samples (15 fresh frozen and 2 FFPE tumor tissues) and 5 matched tumor and normal cell line pairs, whole or partial CNVs in 1 to 15 targets (mean: 7.8) were observed in 16 samples (including both FFPE samples), and evidence of tumor heterogeneity was observed. All three tissues with a single CNV had a 1q duplication, the most common cytogenetic abnormality in breast cancer. The regions most often carrying CNVs were 1q, 7p, and 22q1. Only one tumor tissue (with 9 CNVs) had a region with LOH; this LOH was also detected in the adjacent presumed normal tissue, which lacked the other 8 CNVs. In contrast, the cell lines showed LOH in 5 or more regions and a high overall CNV incidence (mean: 12.8). Thus, massively multiplex PCR provides an economical, high-throughput method for studying CNVs in a targeted manner and is suitable for hard-to-analyze samples such as FFPE tissue.

Embodiment 6

This embodiment illustrates exemplary methods for calculating the limit of detection of any method of the invention. These methods were used to calculate the limit of detection for single nucleotide variants (SNVs) in tumor biopsies (Figure 34) and plasma samples (Figure 35).

The first method (denoted "LOD-mr5" in Figures 34 and 35) calculates the limit of detection based on a minimum of 5 reads, which was chosen as the minimum number of reads supporting an SNV in the sequencing data needed to conclude with sufficient confidence that the SNV is truly present. The limit of detection therefore depends on whether the observed depth of read (DOR) is high enough to reach this minimum of 5 reads. The grey lines in Figures 34 and 35 indicate SNVs whose limit of detection is limited by DOR; in these cases, too few reads were measured to reach the limit set by the assay error. If desired, the limit of detection for these SNVs can be improved (that is, lowered) by increasing DOR.
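A read-count limit like LOD-mr5 can be expressed as the smallest variant allele fraction that would still yield 5 supporting reads at the observed depth. The Python sketch below assumes that reading (LOD = 5 / DOR); the text describes the criterion but does not give this formula explicitly.

```python
def lod_min_reads(depth_of_read, min_reads=5):
    """LOD-mr5 sketch: smallest variant allele fraction that yields
    `min_reads` supporting reads at the observed depth of read (DOR).
    Assumes LOD = min_reads / DOR."""
    if depth_of_read <= 0:
        return float("inf")  # no reads: nothing is detectable
    return min_reads / depth_of_read
```

At 10,000 reads this gives 0.05%, so increasing DOR directly lowers the limit, as the text notes.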

The second method (denoted "LOD-zs5.0" in Figures 34 and 35) calculates the limit of detection based on a z-score. The z-score is the number of standard deviations by which the observed error percentage lies from the background mean error. If desired, outliers can be removed, the z-score recalculated, and the process repeated. The final weighted mean and standard deviation of the error rate are used to calculate the z-score. The mean is weighted by DOR, because precision increases as DOR increases.

For the exemplary z-score calculation in this embodiment, the background mean error and standard deviation for each genomic locus and substitution type were calculated from all other samples in the same sequencing run, weighted by read depth. A sample's background value was excluded if it lay 5 or more standard deviations from the mean background value. The orange lines in Figures 34 and 35 indicate SNVs whose limit of detection is limited by the error rate. For these SNVs, enough reads were obtained to reach the 5-read minimum, so the limit of detection is set by the error rate. If desired, the limit of detection can be improved by optimizing the assay to reduce the error rate.

The third method (denoted "LOD-zs5.0-mr5" in Figures 34 and 35) calculates the limit of detection as the maximum of the two measures above.
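The z-score limit and its combination with the read-count limit can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' pipeline. It uses a DOR-weighted population standard deviation, iterative removal of backgrounds lying 5 or more SDs from the mean, and takes the limit as mean + 5 × SD (the error level at which an observed fraction would reach a z-score of 5). That last step is an assumed reading of the text.

```python
import math


def weighted_mean_sd(rates, depths):
    """DOR-weighted mean and (population) standard deviation of error rates."""
    total = sum(depths)
    mean = sum(r * d for r, d in zip(rates, depths)) / total
    var = sum(d * (r - mean) ** 2 for r, d in zip(rates, depths)) / total
    return mean, math.sqrt(var)


def lod_zscore(rates, depths, z=5.0, max_rounds=10):
    """LOD-zs5.0 sketch for one genomic locus and substitution type.

    rates, depths: background error rate and DOR at this locus/substitution
        in each of the other samples from the same sequencing run.
    Backgrounds lying `z` or more SDs from the weighted mean are dropped and
    the statistics recomputed until none are dropped (or `max_rounds`).
    The LOD is taken as mean + z * SD (an assumed reading of the text).
    """
    data = list(zip(rates, depths))
    for _ in range(max_rounds):
        mean, sd = weighted_mean_sd(*zip(*data))
        kept = [(r, d) for r, d in data if sd == 0 or abs(r - mean) < z * sd]
        if len(kept) == len(data):
            break
        data = kept
    mean, sd = weighted_mean_sd(*zip(*data))
    return mean + z * sd


def lod_combined(lod_reads, lod_z):
    """LOD-zs5.0-mr5: the larger (more conservative) of the two limits."""
    return max(lod_reads, lod_z)
```

In a run where 30 backgrounds sit at 0.1% or 0.2% error and one sample sits at 5%, the outlier is dropped in the first round and the limit settles at 0.4%.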

For the tumor samples shown in Figure 34, the mean limit of detection was 0.36% and the median limit of detection was 0.28%. There were 934 DOR-limited SNVs (grey lines) and 738 error-rate-limited SNVs (orange lines).

For the plasma samples shown in Figure 35, the mean limit of detection was 0.24% and the median limit of detection was 0.09%. There were 732 DOR-limited SNVs (grey lines) and 921 error-rate-limited SNVs (orange lines).

Embodiment 7

This embodiment illustrates detection of CNVs and SNVs from the same single cell. The following primer libraries were used: a library of ~28,000 primers for detecting CNVs, a library of ~3,000 primers for detecting CNVs, and a primer library for detecting SNVs. For single-cell analysis, cells were serially diluted until each drop contained 3 or 4 cells. Individual cells were pipetted into PCR tubes. Cells were lysed with proteinase K, salt, and DTT under the following conditions: 56 °C for 20 minutes, 95 °C for 10 minutes, then a 4 °C hold. For genomic DNA analysis, DNA from the same cell line as the analyzed single cells was purchased or obtained by culturing cells and extracting DNA.

For amplification with the ~28,000-primer library, the following PCR conditions were used: 40 µL reaction volume, 7.5 nM of each primer, and 2× master mix (MM). In some embodiments, the QIAGEN Multiplex PCR Kit is used for the master mix (QIAGEN catalog no. 206143; see www.qiagen.com/products/catalog/assay-technologies/End-point-pcr-and-pr-pcr-reagents/qiagen-multiplex-pcr-kit, incorporated herein by reference in its entirety). The kit contains 2× QIAGEN Multiplex PCR Master Mix (providing a final concentration of 3 mM MgCl2; 3 × 0.85 ml), 5× Q-Solution (1 × 2.0 ml), and RNase-free water (2 × 1.7 ml). The QIAGEN Multiplex PCR Master Mix (MM) contains a combination of KCl and (NH4)2SO4, as well as the PCR additive Factor MP, which increases the local concentration of primers at the template. Factor MP stabilizes specifically bound primers, allowing efficient primer extension by, for example, HotStarTaq DNA Polymerase. HotStarTaq DNA Polymerase is a modified form of Taq DNA polymerase that has no polymerase activity at ambient temperatures. The following thermal cycling conditions were used for the first round of PCR: 95 °C for 10 minutes; 25 cycles of 96 °C for 30 seconds, 65 °C for 29 minutes, and 72 °C for 30 seconds; then 72 °C for 2 minutes and a 4 °C hold. For the second round of PCR, a 10 µL reaction volume, 1× MM, and 5 nM of each primer were used, with the following thermal cycling conditions: 95 °C for 15 minutes; 25 cycles of 94 °C for 30 seconds, 65 °C for 1 minute, 60 °C for 5 minutes, 65 °C for 5 minutes, and 72 °C for 30 seconds; then 72 °C for 2 minutes and a 4 °C hold.
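Cycling programs like these can be written down as data, which makes it easy to check their length. The Python sketch below uses a hypothetical representation in which each step is `(temperature_C, seconds)` and a cycle block is `("cycle", n, steps)`. It sums the programmed hold times only, ignoring ramp times, and counts the final 4 °C hold as indefinite.

```python
# Hypothetical encoding: (temperature_C, seconds); ("cycle", n, [steps]);
# a final (4, None) step is an indefinite hold and is not counted.
FIRST_ROUND = [
    (95, 10 * 60),
    ("cycle", 25, [(96, 30), (65, 29 * 60), (72, 30)]),
    (72, 2 * 60),
    (4, None),
]

SECOND_ROUND = [
    (95, 15 * 60),
    ("cycle", 25, [(94, 30), (65, 60), (60, 5 * 60), (65, 5 * 60), (72, 30)]),
    (72, 2 * 60),
    (4, None),
]


def total_hold_minutes(program):
    """Sum of programmed hold times in minutes, ignoring ramp times."""
    total = 0
    for step in program:
        if step[0] == "cycle":
            _, n, steps = step
            total += n * sum(seconds for _, seconds in steps)
        elif step[1] is not None:
            total += step[1]
    return total / 60
```

By this count the first round's long 65 °C anneals add up to about 762 minutes (roughly 12.7 hours) of programmed holds, and the second round to about 317 minutes.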

For the ~3,000-primer library, exemplary reaction conditions include a 10 µL reaction volume, 2× MM, 70 mM TMAC, and 2 nM of each primer. For the primer library for detecting SNVs, exemplary reaction conditions include a 10 µL reaction volume, 2× MM, 4 mM EDTA, and 7.5 nM of each primer. Exemplary thermal cycling conditions include 95 °C for 15 minutes; 20 cycles of 94 °C for 30 seconds, 65 °C for 15 minutes, and 72 °C for 30 seconds; then 72 °C for 2 minutes and a 4 °C hold.

The amplified products were barcoded. In the sequencing run performed, each sample received approximately the same number of reads.

Figures 36A and 36B show the results of analyzing genomic DNA (Figure 36A) or DNA from single cells (Figure 36B) using the library of about 28,000 primers designed to detect CNVs. About 4 million reads were measured per sample. Two central bands, rather than one central band, indicate the presence of a CNV. For the three DNA samples from single cells, the percentages of mapped reads were 89.9%, 94.0%, and 93.4%, respectively. For the two genomic DNA samples, the percentage of mapped reads was 99.1% for each sample.

Figures 37A and 37B show the results of analyzing genomic DNA (Figure 37A) or DNA from single cells (Figure 37B) using the library of about 3,000 primers designed to detect CNVs. About 1.2 million reads were measured per sample. Two central bands, rather than one central band, indicate the presence of a CNV. For the three DNA samples from single cells, the percentages of mapped reads were 98.2%, 98.2%, and 97.9%, respectively. For the two genomic DNA samples, the percentage of mapped reads was 98.8% for each sample. Figure 38 shows the uniformity of DOR across these ~3,000 sites.

For SNV calling, the percentages of true-positive mutations called from single-cell DNA and from genomic DNA were similar. Plotting the percentage of true-positive mutations called in single cells (y-axis) against the percentage of positive mutations called in genomic DNA (x-axis) gives a fitted line of y = 1.0076x - 0.3088, with R² = 0.9834. Figure 39 shows that error-call measurements were similar for genomic DNA and single-cell DNA. Figure 40 shows that the error rate for detecting transition mutations is higher than that for detecting transversion mutations, suggesting that transversion mutations may be preferable to transition mutations for detection.
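Because transitions showed the higher error rate, a simple way to act on this observation is to classify candidate SNVs by substitution type. The Python sketch below uses the standard definitions: a transition is purine to purine (A and G) or pyrimidine to pyrimidine (C and T), and a transversion crosses between the two classes. The `transversions_only` helper is a hypothetical filtering step, not one described in the text.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref, alt):
    """Return 'transition' (purine<->purine or pyrimidine<->pyrimidine) or
    'transversion' (purine<->pyrimidine) for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def transversions_only(snvs):
    """Keep the (ref, alt) pairs that are transversions."""
    return [s for s in snvs if substitution_class(*s) == "transversion"]
```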

Embodiment 8

This embodiment further demonstrates a massively multiplex PCR method for measuring chromosome aneuploidy and CNVs, called CoNVERGe (Copy Number Variant Events Revealed Genotypically). It also illustrates the development and use of "PlasmArt" standards for PCR of ctDNA samples. PlasmArt standards comprise polynucleotides with sequence identity to genomic regions known to exhibit CNVs, with a size distribution that reflects that of naturally occurring cfDNA fragments in plasma.

Sample collection

Human breast cancer cell lines (HCC38, HCC1143, HCC1395, HCC1937, HCC1954, and HCC2218) and matched normal cell lines (HCC38BL, HCC1143BL, HCC1395BL, HCC1937BL, HCC1954BL, and HCC2218BL) were obtained from the American Type Culture Collection (ATCC). The trisomy 21 B-lymphocyte line (AG16777) and the paired father/child DiGeorge syndrome (DGS) cell lines (GM10383 and GM10382, respectively) were obtained from the Coriell Cell Repositories (Camden, New Jersey). GM10382 cells carry only the paternal 22q11.2 region.

We collected tumor tissues from 16 breast cancer patients, including 11 fresh frozen (FF) samples from Geneticist (Glendale, California) and 5 formalin-fixed paraffin-embedded (FFPE) samples from North Shore-LIJ (Manhasset, New York). We obtained matched buffy coat samples for 8 patients and matched plasma samples for 9 patients. FF tumor tissues with matched buffy coat and plasma samples from five ovarian cancer patients were obtained from North Shore-LIJ. For 8 breast tumor FF samples, the resected tissue was sectioned for analysis. Approval for sample collection was obtained from the North Shore/LIJ IRB and from the ethics committee of Kharkiv National Medical University, and informed consent was obtained from all subjects.

Blood samples were collected into EDTA tubes. Circulating tumor DNA was isolated from 1 mL of plasma using the QIAamp Circulating Nucleic Acid Kit (Qiagen, Valencia, CA).

To prepare PlasmArt standards according to one exemplary method, 9 × 10^6 cells were first lysed with hypotonic lysis buffer (20 mM Tris-Cl (pH 7.5), 10 mM NaCl, and 3 mM MgCl2) on ice for 15 minutes. Then 10% IGEPAL CA-630 (Sigma, St. Louis, MO) was added to a final concentration of 0.5%. After centrifugation at 3,000 g for 10 minutes at 4 °C, the pelleted nuclei were resuspended in 1× micrococcal nuclease (MNase) buffer (New England Biolabs, Ipswich, MA), 1000 U MNase (New England Biolabs) was added, and the mixture was incubated at 37 °C for 5 minutes. The reaction was stopped by adding EDTA to a final concentration of 15 mM. Undigested chromatin was removed by centrifugation at 2,000 g for 1 minute. The fragmented DNA was purified with the DNA Clean & Concentrator™-500 kit (Zymo Research, Irvine, CA). Mononucleosomal DNA generated by micrococcal nuclease digestion was purified and size-selected using AMPure XP magnetic beads (Beckman Coulter Inc., Brea, CA). DNA fragments were sized and quantified with the Bioanalyzer DNA 1000 chip (Agilent Technologies, Santa Clara, CA).

To simulate different ctDNA concentrations, different fractions of PlasmArts from HCC1954 and HCC2218 cancer cells were mixed into PlasmArts from the corresponding matched normal cell lines (HCC1954BL and HCC2218BL, respectively). Three samples at each concentration were analyzed. Similarly, to simulate allelic imbalance in a 3.5 Mb focal region in plasma DNA, we generated PlasmArts from DNA mixtures containing different ratios of DNA from the child with the maternal 22q11.2 deletion and DNA from the father. Samples containing only the father's DNA were used as negative controls. Eight samples at each concentration were analyzed.

Therefore, to evaluate the sensitivity and reproducibility of CoNVERGe, especially when the fraction of abnormal DNA carrying the CNV, or the average allelic imbalance (AAI), is low, we used it to detect CNVs in DNA mixtures in which previously characterized abnormal samples were titrated into matched normal samples. The mixtures consisted of artificial cfDNA, called "PlasmArt", with a fragment size distribution close to that of natural cfDNA (see above). Figure 42 compares the size distribution of an exemplary PlasmArt prepared from a cancer cell line with that of cfDNA; CNVs were observed on chromosome arms 1p, 1q, 2p, and 2q. In the first pair, a child's DNA sample carrying a 3 Mb CNV deletion of the 22q11.2 region was titrated into the matched normal sample from the father at 0% to 1.5% of total cfDNA (Figure 41a). CoNVERGe reproducibly identified the CNV corresponding to the known abnormality, estimating AAI > 0.35% in mixtures with > 0.5% ± 0.2% AAI; it failed to detect the CNV in 6 of 8 replicates at 0.25% abnormal DNA, and the estimated AAI was < 0.05% for all 8 negative control samples. AAI values estimated by CoNVERGe showed high linearity (R² = 0.940) and reproducibility (error variance = 0.087). The assay is sensitive to different levels of amplification within the same sample. Based on these data, a conservative detection threshold of 0.45% AAI can be used for subsequent analyses. Using this cutoff, another experiment was performed in which PlasmArt-synthesized ctDNA was mixed at known concentrations to generate synthetic cancer plasma containing about 0.5% to about 3.5% ctDNA. Negative plasma was included as a control. All synthetic cancer plasma samples gave estimates above 0.45%, whereas the readings for negative plasma were far below 0.45% (Figures 43A-C). The right panel of Figure 43A shows the maximum likelihood estimate of the tumor DNA fraction, plotted as an odds ratio. Figure 43B is a plot for detecting transversion events. Figure 43C is a plot for detecting transition events.

Two additional PlasmArt titrations, made from paired tumor and normal cell line samples with CNVs on chromosome 1 or chromosome 2, were also evaluated (Figures 41b, 41c). In negative controls, all values were < 0.45%. The estimates showed high linearity (R² = 0.952 for HCC1954 1p, R² = 0.993 for HCC1954 1q, R² = 0.977 for HCC2218 2p, R² = 0.967 for HCC2218 2q) and reproducibility (error variance = … for HCC1954 1p, 0.029 for HCC1954 1q, 0.250 for HCC2218 2p, and 0.350 for HCC2218 2q). Concordance was observed between the known input DNA amounts and the amounts calculated by CoNVERGe. The difference between the regression slopes for the 1p and 1q regions of one sample pair correlated with the relative difference in copy number observed in the B-allele frequencies (BAFs) of the 1p and 1q regions of the same sample, indicating that CoNVERGe calculates AAI estimates with relative accuracy (Figures 41c, 41d).

The workflow for processing samples is shown in Figure 63. CoNVERGe is applicable to a variety of sample sources, including FFPE, fresh frozen, single cells, germline controls, and cfDNA. We applied CoNVERGe to six human breast cancer cell lines and their matched normal cell lines to assess whether it could detect somatic CNVs. Arm-level and focal CNVs were present in all six tumor cell lines but absent from their matched normal cell lines, except for chromosome 2 in HCC1143, where the normal cell line showed a deviation from the 1:1 homolog ratio (Figure 63b). To verify these results on a different platform, we performed CytoSNP-12 microarray analysis, which gave consistent results for all samples (Figures 63d, 63e). In addition, the maximum homolog ratios of the CNVs identified by CoNVERGe and by the CytoSNP-12 microarray showed a strong linear correlation (R² = 0.987, P < 0.001) (Figure 63f). We next applied CoNVERGe to fresh frozen (FF) (Figure 64a) and formalin-fixed paraffin-embedded (FFPE) breast tumor tissue samples (Figures 64b, 64d). Several arm-level and focal CNVs were present in both sample types; however, no CNVs were detected in DNA from the matched buffy coat samples. CoNVERGe results were highly correlated with the results of microarray analysis of the same samples (Figures 64e-h; R² = 0.909, P < 0.001 for CytoSNP-12 on FF; R² = 0.992, P < 0.001 for OncoScan on FFPE). CoNVERGe also produced consistent results from small amounts of DNA extracted from laser capture microdissection (LCM) samples, to which microarray methods are not applicable.

Detection of CNVs in single cells with CoNVERGe

To test the limits of applicability of this mmPCR method, we isolated single cells from the six cancer cell lines described above and from a B-lymphocyte cell line with no CNVs in the target regions. The CNV profiles from these single-cell experiments were consistent across three replicates and with those from genomic DNA (gDNA) extracted from bulk samples of about 20,000 cells (Figure 65). Based on the number of SNPs with no sequencing reads, the average assay dropout rate for bulk samples was 0.48% (range: 0.41%-0.60%), which was attributed to synthesis or assay design failures. For single cells, the additional average assay dropout rate observed was 0.39% (range: 0.19%-0.67%). For single-cell measurements that did not fail (that is, with no assay dropout), the average single-cell allele dropout (ADO) rate, calculated using heterozygous SNPs, was only 0.05% (range: 0.00%-0.43%). In addition, the percentage of SNPs with high-confidence genotypes (that is, SNP genotypes determined with at least 98% confidence) was similar for single cells and bulk samples, and the genotypes in the single-cell samples matched those in the bulk samples (average 99.52%, range: 92.63%-100.00%).
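The two dropout measures can be illustrated for a single cell as follows. This Python sketch assumes simple per-SNP read-count dictionaries. It uses the usual definitions, with an assay dropout being a SNP with no reads and an allele dropout being a bulk-heterozygous SNP where the cell shows only one allele, and it excludes dropped-out SNPs from the ADO denominator as the text describes. The exact bookkeeping of the original analysis is not given, so treat this as an approximation.

```python
def dropout_rates(bulk_genotypes, cell_reads):
    """Assay dropout and allele dropout (ADO) rates for one single cell.

    bulk_genotypes: dict snp -> set of alleles called in bulk gDNA.
    cell_reads: dict snp -> dict allele -> read count in the single cell.
    Assay dropout: SNP with no reads at all in the cell.
    ADO: among SNPs heterozygous in bulk that did not drop out, the fraction
        where the cell shows only one of the two alleles.
    """
    snps = list(bulk_genotypes)
    dropped = {s for s in snps if sum(cell_reads.get(s, {}).values()) == 0}
    het_ok = [s for s in snps
              if len(bulk_genotypes[s]) == 2 and s not in dropped]
    ado = [s for s in het_ok
           if sum(1 for a in bulk_genotypes[s]
                  if cell_reads[s].get(a, 0) > 0) < 2]
    assay_rate = len(dropped) / len(snps) if snps else 0.0
    ado_rate = len(ado) / len(het_ok) if het_ok else 0.0
    return assay_rate, ado_rate
```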

In single cells, allele frequencies are expected to reflect chromosomal copy number directly, unlike in tumor samples, which may be confounded by tumor heterogeneity (TH) and contamination with non-tumor cells. BAFs of 1/n and (n-1)/n indicate n chromosome copies in a region. Chromosomal copy numbers are indicated on the allele frequency plots of the single-cell and matched gDNA samples (Figure 65).
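The relationship between copy number and BAF bands in a pure single-cell region can be made concrete with exact fractions. The Python sketch below lists the expected levels k/n and the outermost heterozygous bands 1/n and (n-1)/n. It also includes a hypothetical helper, `copies_from_band`, that inverts an observed lower band (about 1/n) to a copy number by rounding 1/band, which is a simplification for illustration.

```python
from fractions import Fraction


def expected_bafs(copies):
    """Expected B-allele fractions in a pure region with `copies` chromosome
    copies: k/n for k = 0..n copies of the B allele."""
    if copies < 1:
        raise ValueError("copy number must be >= 1")
    return [Fraction(k, copies) for k in range(copies + 1)]


def heterozygous_bands(copies):
    """Outermost heterozygous BAF levels, 1/n and (n-1)/n."""
    if copies < 2:
        raise ValueError("a heterozygous SNP needs at least 2 copies")
    return Fraction(1, copies), Fraction(copies - 1, copies)


def copies_from_band(lower_band):
    """Copy number implied by an observed lower heterozygous band (~1/n)."""
    if not 0 < lower_band <= 0.5:
        raise ValueError("lower band must lie in (0, 0.5]")
    return round(1 / lower_band)
```

For example, three copies give bands at 1/3 and 2/3, and an observed lower band near 0.34 maps back to n = 3.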

Application of CoNVERGe to plasma samples

To study the ability of CoNVERGe to detect CNVs in real plasma samples, we applied our method to matched tumor biopsies and cfDNA from each of two stage II primary breast cancer patients and five patients with advanced ovarian cancer. In all seven patients, CNVs were detected in both the FF tumor tissue and the corresponding plasma sample (Figure 66). Figure 67 lists SNV breast cancer mutations. Across the five regions assayed, which represent about 20% of the genome, a total of 32 CNVs at levels > 0.45% AAI were detected in the seven plasma samples (range: 0.48%-12.99% AAI). Note that, owing to the lack of an alternative orthogonal method, the presence of these CNVs in plasma could not be confirmed.

Although AAI estimates may correlate with BAFs in the tumor, they are not necessarily directly proportional, because of tumor heterogeneity. For example, the ellipse in the upper-left region of Figure 66a marks a region in sample BC5 with BAFs compatible with N = 11; combining it with the AAI calculated from the plasma sample gives estimates of c for the two regions of 2.33% and 2.67%. Estimating c from the other regions in the sample gives values between 4.46% and 9.53%, which clearly demonstrates the presence of tumor heterogeneity.

These data indicate that CNVs can be detected in plasma in most samples, and that CNVs that are more prevalent in the tumor are more likely to be observed in cfDNA. In addition, CoNVERGe detected CNVs from liquid biopsies that might otherwise not be observed in conventional tumor biopsies.

Embodiment 9

This embodiment provides details of certain exemplary sample preparation methods for CoNVERGe analysis of different sample types.

Single-cell CNV protocol for 28,000-plex PCR

Multiplex PCR allows many targets to be amplified simultaneously in a single reaction. Target SNPs were identified in each genomic region with a minimum population minor allele frequency of 10% (1000 Genomes Project data; April 30, 2012 release). For each SNP, multiple semi-nested primers were designed with a maximum amplicon length of 75 bp and melting temperatures between 54 and 60.5 °C. Primer interaction scores were calculated for all possible primer combinations, and primers with high scores were eliminated to reduce the likelihood of primer-dimer formation. Candidate PCR assays were ranked and selected based on target SNP minor allele frequency, observed heterozygosity rate (from dbSNP), presence in HapMap, and amplicon length.
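The filtering and ranking of candidate assays can be sketched as below. This Python sketch applies the stated hard constraints: MAF of at least 10%, amplicon of at most 75 bp, and primer Tm between 54 and 60.5 °C. The lexicographic sort key (higher MAF, then higher heterozygosity, then HapMap presence, then shorter amplicon) is an assumed ordering, because the text lists the criteria without saying how they are weighed. Primer-interaction elimination is omitted here.

```python
from dataclasses import dataclass


@dataclass
class CandidateAssay:
    snp_id: str
    maf: float             # population minor allele frequency (1000 Genomes)
    heterozygosity: float  # observed heterozygosity rate (dbSNP)
    in_hapmap: bool
    amplicon_len: int      # bp
    primer_tms: tuple      # primer melting temperatures, deg C


def passes_filters(a, min_maf=0.10, max_len=75, tm_range=(54.0, 60.5)):
    """Hard constraints stated in the text."""
    lo, hi = tm_range
    return (a.maf >= min_maf and a.amplicon_len <= max_len
            and all(lo <= tm <= hi for tm in a.primer_tms))


def rank_assays(candidates):
    """Keep assays meeting the constraints, then rank them with an assumed
    lexicographic key (the text does not give the weighting)."""
    kept = [a for a in candidates if passes_filters(a)]
    return sorted(kept, key=lambda a: (-a.maf, -a.heterozygosity,
                                       not a.in_hapmap, a.amplicon_len))
```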

In some experiments, single-cell samples were prepared and amplified using the mmPCR 28,000-plex protocol. Samples were prepared as follows. To analyze single cells, cells were serially diluted until each drop contained 3 or 4 cells. Single cells were pipetted into PCR tubes. Cells were lysed with proteinase K, salt, and DTT under the following conditions: 56 °C for 20 minutes, 95 °C for 10 minutes, then a 4 °C hold. For genomic DNA analysis, DNA from the same cell line as the analyzed single cells was purchased or obtained by culturing cells and extracting DNA. DNA was amplified in a 40 µL reaction volume containing Qiagen mp-PCR master mix (2× MM final concentration) and 7.5 nM primers. For the 28K primer pairs with semi-nested Rev primers, the conditions were 95 °C for 10 minutes; 25 × [96 °C for 30 seconds, 65 °C for 29 minutes, 72 °C for 30 seconds]; 72 °C for 2 minutes; 4 °C hold. The amplified product was diluted 1:200 in water, 2 µL was added to a STAR 2 reaction (10 µL reaction volume, 1× MM, 5 nM primer concentration), and PCR was performed with semi-nested inner Fwd primers and tag-specific Rev primers: 95 °C for 15 minutes; 25 × [94 °C for 30 seconds, 65 °C for 1 minute, 60 °C for 5 minutes, 65 °C for 5 minutes, 72 °C for 30 seconds]; 72 °C for 2 minutes; 4 °C hold.

Full sequencing tags and barcodes were attached to the amplified products by 9 cycles of amplification with adapter primers. Before sequencing, the barcoded library products were pooled, purified with the QIAquick PCR Purification Kit (Qiagen), and quantified with the Qubit dsDNA BR Assay Kit (Life Technologies, USA). Amplicons were sequenced on an Illumina HiSeq 2500 sequencer.

DNA extraction from blood/plasma samples

By blood sample collection into EDTA pipe.Whole blood sample is centrifuged and is divided into three layers: upper layer, 55% blood sample are Blood plasma, and contain Cell-free DNA (cfDNA)；Contain white thin less than 1% DNA with total DNA amount in buffy coat middle layer Born of the same parents；And bottom, the 45% of collected blood sample contains red blood cell, because red blood cell is picked-off, does not deposit in the fraction In DNA.Using QIAamp circle nucleic acid kit Qia-Amp (Qiagen, Valencia, CA) according to the scheme of manufacturer to Circulating tumor DNA is separated in few 1mL blood plasma.The plasma C NV of 3, the 168-plex for chromosome 1p, lq, 2p, 2q and 22q11 Scheme.

Plasma DNA libraries were prepared and amplified using the mmPCR 3,168-plex protocol. Samples were prepared as follows: up to 20 mL of blood was centrifuged to separate the buffy coat and the plasma. cfDNA extraction and library preparation were performed on the plasma. DNA was eluted in 50 µL TE buffer. The mmPCR input was 6.7 µL of amplified and purified Natera plasma library, an input amount of about 1200 ng. Plasma DNA was amplified in a 20 µL reaction volume containing Qiagen mp-PCR master mix (2X MM final concentration) and 2 nM of each tagged primer (12.7 µM in total), with PCR amplification: 95 °C for 10 minutes; 25 × [96 °C for 30 seconds, 65 °C for 20 minutes, 72 °C for 30 seconds]; 72 °C for 2 minutes; 4 °C hold. The amplified product was diluted 1:2000 in water, and 1 µL was added to a 10 µL barcoding-PCR reaction. Barcodes were attached to the amplified products by 12 cycles of PCR amplification using tag-specific primers. Products from multiple samples were pooled, purified with the QIAquick PCR Purification Kit (Qiagen), and eluted in 50 µL DNA suspension buffer. Samples were sequenced by NGS as described for the 28,000-plex single-cell CNV PCR protocol.

Feasibility of an SNV panel for breast cancer from plasma

cfDNA from breast cancer patient blood samples was prepared and amplified using 336 primer pairs distributed across four 84-plex pools. Natera plasma libraries were prepared as described for the plasma CNV 3,168-plex protocol for chromosomes 1p, 1q, 2p, 2q and 22q11. DNA was eluted in 50 µL TE buffer. The mPCR input was 2.5 µL of amplified and purified Natera plasma library, an input amount of about 600 ng. Figures 68A-B show the major and minor allele frequencies of the SNPs used in the 3,168-plex mmPCR reaction; the x-axis indicates the SNP number for chromosomes 1q, 1p, 2q, 2p and 22q, from left to right. Target SNPs were selected from the human 1000 Genomes Project, hg19 and dbSNP, but only SNPs from the 1000 Genomes Project were used to screen by minor allele frequency. Plasma DNA was amplified in four parallel reactions, one per 84-plex overlapping primer pool, each in a 10 µL reaction volume containing Qiagen mp-PCR master mix (2X MM final concentration), 4 mM EDTA and 7.5 nM of each primer (1.26 µM in total), with PCR amplification: 95 °C for 15 minutes; 25 × [94 °C for 30 seconds, 65 °C for 15 minutes, 72 °C for 30 seconds]; 72 °C for 2 minutes; 4 °C hold. The amplified products of the four sub-pools were each diluted 1:200 in water, and 1 µL was added to a barcoding-PCR reaction in a 10 µL reaction volume containing Q5 Hot Start High-Fidelity Master Mix (1X final) and 1 µM of each barcode primer. Each pool was amplified as follows: 98 °C for 1 minute; 25 × [98 °C for 10 seconds, 70 °C for 10 seconds, 60 °C for 30 seconds, 65 °C for 15 seconds, 72 °C for 15 seconds]; 72 °C for 2 minutes; 4 °C hold. Libraries were purified with the QIAquick PCR Purification Kit (Qiagen) and eluted in 50 µL DNA suspension buffer. Samples were sequenced by paired-end sequencing.

Example 10

This example provides details of certain exemplary methods for analyzing sequencing data to identify SNVs.

SNV method 1: In this example, a background error model was constructed from normal plasma samples sequenced in the same sequencing run, to account for run-specific artifacts. In certain embodiments, 5, 10, 15, 20, 25, 30, 40, 50, 100, 150, 200, 250 or more than 250 normal plasma samples are analyzed in the same sequencing run. In certain illustrative embodiments, 20, 25, 40 or 50 normal plasma samples are analyzed in the same sequencing run. Noisy positions whose median variant allele frequency across the normal samples exceeds a cutoff are removed. For example, in certain embodiments the cutoff is >0.1%, 0.2%, 0.25%, 0.5%, 1%, 2%, 5% or 10%. In certain illustrative embodiments, noisy positions with a normal median variant allele frequency greater than 0.5% are removed. Outlier samples are iteratively removed from the model to account for noise and contamination. In certain embodiments, samples with a Z-score greater than 5, 6, 7, 8, 9 or 10 are removed from the data analysis. For each base substitution at each genomic locus, the depth-of-read weighted mean and standard deviation of the error are calculated. Positions in tumor or cell-free plasma samples with at least 5 variant reads and a Z-score of 10 against the background error model are called as candidate mutations.
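The steps of SNV method 1 can be sketched as follows for a single locus and base substitution. This is a minimal illustration, not the method's actual implementation: the input layout (per-sample error and depth counts), the leave-one-out outlier test and the `sd_floor` guard against a zero standard deviation are assumptions, and the thresholds default to the illustrative values in the text (0.5% median cutoff, outlier Z of 5, at least 5 variant reads, Z of 10, treated here as "at least 10").

```python
import math
import statistics


def weighted_mean_sd(errors, depths):
    """Depth-weighted mean and SD of per-sample error rates (errors / depth)."""
    total = sum(depths)
    rates = [e / d for e, d in zip(errors, depths)]
    mean = sum(r * d for r, d in zip(rates, depths)) / total
    var = sum(d * (r - mean) ** 2 for r, d in zip(rates, depths)) / total
    return mean, math.sqrt(var)


def build_background(errors, depths, max_median_vaf=0.005, outlier_z=5.0, sd_floor=1e-4):
    """Background error model for one locus and one base substitution.

    errors[k], depths[k]: substitution-supporting reads and total reads in
    normal sample k. Returns (mean, sd), or None for a noisy position.
    """
    rates = [e / d for e, d in zip(errors, depths)]
    if statistics.median(rates) > max_median_vaf:
        return None  # noisy position: removed from the model

    keep = list(range(len(rates)))

    def fit(indices):
        mean, sd = weighted_mean_sd([errors[k] for k in indices], [depths[k] for k in indices])
        return mean, max(sd, sd_floor)  # floor is an assumption to avoid dividing by ~0

    def loo_z(k):
        # Z-score of sample k against a model fitted on the other kept samples.
        mean, sd = fit([i for i in keep if i != k])
        return (rates[k] - mean) / sd

    while len(keep) > 2:
        worst = max(keep, key=loo_z)
        if loo_z(worst) <= outlier_z:
            break
        keep.remove(worst)  # iteratively drop outliers (noise or contamination)
    return fit(keep)


def is_candidate(variant_reads, depth, background, min_reads=5, min_z=10.0):
    """Call a tumor or plasma position as a candidate mutation."""
    if background is None:
        return False
    mean, sd = background
    z = (variant_reads / depth - mean) / sd
    return variant_reads >= min_reads and z >= min_z
```

With 20 clean normals at about 0.1% error and one contaminated normal at 0.6%, the contaminated sample is dropped before the final fit, so it does not inflate the background used to score tumor or plasma positions.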

SNV method 2: In this example, we aim to determine single nucleotide variants (SNVs) from plasma ctDNA data. We model the PCR process as a stochastic process, estimate its parameters using a training set, and make the final SNV calls using a separate test set. The main idea is to determine how errors propagate over multiple PCR cycles, calculate the mean and variance of the background error, and distinguish background error from true mutations.

The following parameters are estimated for each base:

$p$ = efficiency (the probability that each read is replicated in each cycle)

$p_e$ = error rate per cycle for mutation type $e$ (the probability that an error of type $e$ occurs)

$X_0$ = initial number of molecules

Because reads are replicated during PCR, errors accumulate over successive replications. The error profile of a read is therefore determined by its degree of separation from the original read. We call a read kth generation if it has undergone k replications before it was produced. For each base, we define the following variables:

$X_{ij}$ = number of generation-$i$ reads generated in PCR cycle $j$

$Y_{ij}$ = total number of generation-$i$ reads at the end of cycle $j$

$X_{ij}^e$ = number of generation-$i$ reads carrying mutation $e$ generated in PCR cycle $j$

In addition to the $X_0$ normal molecules, suppose that at the start of PCR there are an additional $f_e X_0$ molecules carrying mutation $e$ (so that $f_e/(1+f_e)$ is the fraction of mutant molecules in the original mixture).

Given the total number of generation-$(i-1)$ reads at cycle $j-1$, the number of generation-$i$ reads produced in cycle $j$ follows a binomial distribution with sample size $Y_{i-1,j-1}$ and probability parameter $p$. Therefore $E(X_{ij} \mid Y_{i-1,j-1}, p) = p\,Y_{i-1,j-1}$ and $\mathrm{Var}(X_{ij} \mid Y_{i-1,j-1}, p) = p(1-p)\,Y_{i-1,j-1}$.

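We also have $Y_{ij} = Y_{i,j-1} + X_{ij} = \sum_{k=1}^{j} X_{ik}$ for $i \ge 1$, with $Y_{0,j} = X_0$. Therefore, by recursion, simulation or a similar method, we can determine $E(X_{ij})$. Similarly, we can use the distribution of $p$ to determine $\mathrm{Var}(X_{ij}) = E(\mathrm{Var}(X_{ij} \mid p)) + \mathrm{Var}(E(X_{ij} \mid p))$.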
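As a sketch of the recursion for a fixed efficiency $p$: assuming that generation-$i$ reads, once created, persist (so that $Y_{ij} = Y_{i,j-1} + X_{ij}$), taking expectations of the binomial model gives $E(Y_{ij}) = E(Y_{i,j-1}) + p\,E(Y_{i-1,j-1})$. The function name below is hypothetical; the same loop with $p_e$ in place of $p$ gives the expected counts of error-carrying reads.

```python
from math import comb


def expected_generation_counts(x0, p, n_cycles):
    """Return [E[Y_{i,n}] for i = 0..n] after n PCR cycles with fixed efficiency p.

    Uses E[Y_{i,j}] = E[Y_{i,j-1}] + p * E[Y_{i-1,j-1}], which follows from
    E[X_{ij} | Y_{i-1,j-1}] = p * Y_{i-1,j-1} and Y_{i,j} = Y_{i,j-1} + X_{ij}.
    """
    y = [float(x0)] + [0.0] * n_cycles  # before cycle 1: only generation 0 (originals)
    for _ in range(n_cycles):
        # Build the next cycle entirely from the previous cycle's values.
        y = [y[0]] + [y[i] + p * y[i - 1] for i in range(1, n_cycles + 1)]
    return y
```

The recursion reproduces the closed form $E(Y_{i,n}) = X_0 \binom{n}{i} p^i$, and the generations sum to the familiar total $X_0 (1+p)^n$.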

Finally, E (X_ij ^e/Y_{I-1, j-1}, pe) and=pe Y_{I-1, j-1}With Var (X_ij ^e/Y_{I-1, j-1}, pe) and=pe (1-pe) Y_{I-1, j-1}Meter Calculate E (X_ij ^e) and Var (X_ij ^e)。


Algorithm

The algorithm begins by using the training set to estimate the efficiency and error rate of each cycle. Let $n$ denote the total number of PCR cycles.

The number of reads $R_b$ at each base $b$ can be approximated as $(1+p_b)^n X_0$, where $p_b$ is the efficiency at base $b$. Then $(R_b/X_0)^{1/n}$ can be used to approximate $1+p_b$. We can then determine the mean and standard deviation of $p_b$ across all training samples and use them to estimate the parameters of a probability distribution (such as a normal, beta or similar distribution).
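The efficiency estimate above can be sketched as follows, assuming that the starting copy number $X_0$ is known for each training sample (for example, from a quantified input). The function names and the input layout are hypothetical.

```python
import statistics


def efficiency_from_reads(reads, x0, n_cycles):
    """Invert R_b ~= (1 + p_b)^n * X_0 to approximate the per-cycle efficiency p_b."""
    return (reads / x0) ** (1.0 / n_cycles) - 1.0


def summarize_efficiency(training, n_cycles):
    """training: (R_b, X_0) pairs for one base b, one pair per training sample.

    Returns the mean and standard deviation of p_b across the training samples.
    """
    effs = [efficiency_from_reads(r, x0, n_cycles) for r, x0 in training]
    return statistics.mean(effs), statistics.stdev(effs)
```

For instance, 1,000 starting copies that yield 1,024,000 reads after 10 cycles imply $p_b \approx 1$, that is, perfect doubling.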

Similarly, the number of reads $R_b^e$ carrying error $e$ at each base $b$ can be used to estimate $p_e$. After determining the mean and standard deviation of this error rate across all training samples, we approximate its probability distribution (for example, as a normal, beta or similar distribution), estimating the distribution's parameters from the mean and standard deviation.
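One common way to turn a mean and standard deviation into a beta distribution is the method of moments, sketched below for any rate in (0, 1), such as $p_b$ or $p_e$. This is an illustrative choice, not necessarily the fitting procedure used in the original method.

```python
def beta_from_moments(mean, sd):
    """Method-of-moments Beta(alpha, beta) fit for a rate in (0, 1)."""
    var = sd ** 2
    if not 0.0 < mean < 1.0 or not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("mean/sd are not compatible with a beta distribution")
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common
```

For example, a mean of 0.2 with variance 16/1100 recovers Beta(2, 8). The guard rejects combinations that no beta distribution can produce, which can occur when the training set is small or noisy.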

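Next, we estimate the initial number of starting copies at each base of the test data by inverting the approximation above, $\hat{X}_0 = R_b/(1+p_b)^n$ with $p_b \sim f(\cdot)$, where $f(\cdot)$ is the distribution estimated from the training set.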

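A Monte Carlo sketch of the starting-copy estimate, assuming that $f(\cdot)$ is a beta distribution fitted to the training-set efficiencies: it draws $p_b$ repeatedly and reports the mean and spread of the implied $X_0$. The function name, draw count and seed are hypothetical.

```python
import random
import statistics


def estimate_start_copies(reads, n_cycles, alpha, beta, draws=10000, seed=0):
    """Monte Carlo estimate of X_0 at one base of a test sample.

    Draws p_b from the training-set Beta(alpha, beta) and inverts
    R_b ~= (1 + p_b)^n * X_0. Returns the mean and SD of the implied X_0.
    """
    rng = random.Random(seed)
    samples = [reads / (1.0 + rng.betavariate(alpha, beta)) ** n_cycles for _ in range(draws)]
    return statistics.mean(samples), statistics.stdev(samples)
```

Because $X_0$ depends on $(1+p_b)^{-n}$, even a small spread in $p_b$ becomes a noticeable spread in $\hat{X}_0$ after 20 or more cycles. This is one reason step (c) below allows the efficiency estimate to be refined.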

We have now estimated the parameters used in the stochastic process. Using these estimates, we can estimate the mean and variance of the molecules created in each cycle (we do this separately for normal molecules, error molecules and mutant molecules).

Finally, using a probabilistic method (such as maximum likelihood or a similar method), we can determine the value of $f_e$ that best fits the observed distribution of error, mutant and normal molecules. More specifically, for various values of $f_e$, we estimate the expected ratio of error molecules to total molecules in the final reads, determine the likelihood of our data under each of these values, and then select the value with the highest likelihood.

In certain embodiments, method 2 above is carried out as follows:

a) estimate the PCR efficiency and the per-cycle error rate using a training data set;

b) estimate the number of starting molecules at each base of the test data set, using the distribution of the efficiency estimated in step (a);

c) if necessary, update the efficiency estimate for the test data set using the number of starting molecules estimated in step (b);

d) using the test set data and the parameters estimated in steps (a), (b) and (c), estimate the mean and variance of the total number of molecules, the background error molecules and the true mutant molecules, over a search space of initial percentages of true mutant molecules;

e) determine the distribution of the total error molecules (background error plus true mutation) among the total molecules, and calculate the likelihood of each true mutation percentage in the search space; and

f) determine the most likely true mutation percentage, and calculate a confidence level using the data from step (e).
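Steps (d) to (f) can be sketched with a simplified mean-value model. This sketch propagates only expected molecule counts, not their variances, and scores each candidate $f_e$ with a binomial likelihood of the observed variant reads. The model (normal molecules yield an erroneous copy with probability $p_e$ per cycle, mutant molecules are copied with probability $p$, no back-errors), the grid range and the "relative support" confidence measure are all assumptions made for illustration.

```python
import math


def expected_variant_fraction(f_e, p, p_e, n_cycles):
    """Expected fraction of reads showing variant e after n_cycles of PCR.

    Mean-value recursion (a simplification of the stochastic model): per cycle,
    a normal molecule yields a correct copy with probability p - p_e and an
    erroneous copy carrying e with probability p_e; true mutant molecules
    (f_e per normal molecule at the start) are copied with probability p.
    """
    normal, variant = 1.0, f_e  # X_0 normalised to 1
    for _ in range(n_cycles):
        new_errors = p_e * normal
        normal *= 1.0 + p - p_e
        variant = variant * (1.0 + p) + new_errors
    return variant / (normal + variant)


def call_mutation_fraction(variant_reads, depth, p, p_e, n_cycles, grid=None):
    """Grid-search maximum-likelihood estimate of f_e under a binomial read model.

    Returns (best f_e, mutant fraction f_e / (1 + f_e), relative support), where
    relative support is the best grid point's share of the normalised likelihood.
    """
    if grid is None:
        grid = [i / 10000 for i in range(501)]  # f_e from 0 to 0.05 (hypothetical range)
    log_liks = []
    for f in grid:
        q = min(max(expected_variant_fraction(f, p, p_e, n_cycles), 1e-12), 1 - 1e-12)
        log_liks.append(variant_reads * math.log(q) + (depth - variant_reads) * math.log(1 - q))
    best = max(range(len(grid)), key=log_liks.__getitem__)
    weights = [math.exp(ll - log_liks[best]) for ll in log_liks]
    f_best = grid[best]
    return f_best, f_best / (1 + f_best), weights[best] / sum(weights)
```

With $p_e = 0$, the expected variant fraction stays at $f_e/(1+f_e)$ through every cycle. With $p_e > 0$, background errors add a floor that the likelihood search has to account for before any excess is attributed to a true mutation.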

Example 11

This example presents the results of using the multiplex PCR CoNVERGe method provided herein to detect cancer by detecting CNVs in circulating DNA. The plasma CNV 3,168-plex protocol for chromosomes 1p, 1q, 2p, 2q and 22q11 provided herein was used. Plasma from 21 breast cancer patients (stages I-IIIB) was analyzed. As shown in Figure 44, CNVs were detected in all samples using an AAI of ≥0.45%, requiring as few as 62 heterozygous SNPs. Plasma from ovarian cancer patients was analyzed using a similar procedure. Using a 0.45% cutoff, a 100% ovarian cancer detection rate was achieved, as shown in Figure 25; each of the five samples also had a matched tumor sample.

Example 12

This example demonstrates that measuring the presence of both CNVs and SNVs in plasma significantly improves the ability to detect cancer. CNVs and SNVs were detected using the methods provided in the examples above. Samples were prepared according to the appropriate protocol in Example 9. SNVs were identified using SNV method 1 described above. As shown in Figure 46, analyzing plasma from stage I-III cancer patients for both CNVs and SNVs significantly improved the sensitivity of detecting breast and lung cancer compared with testing for SNVs alone. Analyzing SNVs alone detected 71% of cancers in the plasma samples. By analyzing for the presence of SNVs and/or CNVs, however, the detection rate in the analyzed patient population reached 83% for breast cancer and 92% for lung cancer. If all SNVs and CNVs identified in the TCGA and COSMIC data sets are considered, the expected diagnostic yield would be greater than 97% for breast cancer and greater than 98% for lung cancer.

Samples from 41 patients at different cancer stages were further analyzed using the plasma sample preparation method provided in Example 9 above and SNV method 1. As shown in Figure 47, when CNVs and SNVs were measured in circulating tumor DNA from breast cancer patients, 60% of stage I, 88% of stage II and 100% of stage III primary breast cancers were detected, using a limit of determination of 0.2% ctDNA for SNVs and 0.45% ctDNA for CNVs. As shown in Figure 48, when CNVs and SNVs were detected in ctDNA and the 41 patient samples were examined by breast cancer stage, 60% of stage I, 100% of stage II, 90% of stage IIA, 80% of stage IIB, and 100% of stage III, IIIA and IIIB primary breast cancers were detected, using limits of determination of 0.2% ctDNA for SNVs and 0.45% ctDNA for CNVs. As shown in Figure 49, when CNVs and SNVs were measured in circulating tumor DNA from 24 lung cancer patient samples, 88% of stage I, 100% of stage II and 100% of stage III lung cancers were detected, using quantitation limits of 0.2% ctDNA for SNVs and 0.45% ctDNA for CNVs. As shown in Figure 50, when CNVs and SNVs were detected in ctDNA and the 24 patient samples were examined by lung cancer stage, a 100% detection rate was achieved for all stages except stage IB, for which an 82% detection rate was achieved, using quantitation limits of 0.2% ctDNA for SNVs and 0.45% ctDNA for CNVs.

Example 13

This example demonstrates that detecting SNVs in ctDNA overcomes the limitations, caused by tumor heterogeneity, of identifying variant alleles in biopsy samples. TRACERx samples from three small cell lung cancer patients and one lung adenocarcinoma patient were used; for each patient, tumor biopsies and matched preoperative plasma samples were collected to analyze tumor heterogeneity. The samples were obtained from the Cancer Research UK Lung Cancer Centre of Excellence, University College London, London WC1E 6BT, UK. The samples were primary lung cancer samples analyzed for SNV mutations. Two to three biopsies were taken from each patient, each from a different region of the cancerous lung (Figure 51A). Each biopsy sample was assayed by whole-exome sequencing (Illumina HiSeq 2000; Illumina, San Diego, CA), followed by AmpliSeq sequencing on the PGM (Ion Torrent, South San Francisco, CA), to identify potential clonal heterogeneity. After sequencing and SNV analysis, the variant allele frequency (VAF) of each biopsy sample was determined (Figure 51B).

Plasma samples from each of the four patients were used to isolate ctDNA and to identify clonal and subclonal SNV mutations in plasma, in order to overcome tumor heterogeneity (Figure 52). Clonal populations have VAF allele calls in all measured biopsy samples and in plasma, whereas subclonal populations have VAF allele calls in at least one, but not all, biopsy samples. Plasma is considered a cumulative representation of the SNVs found in each patient's ctDNA. Not all SNVs identified by sequencing could be matched to a designed PCR assay.
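The clonal/subclonal definitions above can be expressed as a small classification rule. In this sketch, `None` stands for a biopsy that was not available (a blank cell) and 0 stands for no VAF detected. The detection threshold and the "unclassified" label for cases the text does not define (for example, detected in every biopsy but not in plasma) are assumptions.

```python
def classify_variant(biopsy_vafs, plasma_vaf, min_vaf=0.0):
    """Classify an SNV as clonal or subclonal from per-biopsy and plasma VAFs.

    biopsy_vafs: one entry per biopsy; None means no biopsy sample was available,
    0 means no VAF was detected. min_vaf is a hypothetical detection threshold.
    """
    measured = [v for v in biopsy_vafs if v is not None]
    n_detected = sum(v > min_vaf for v in measured)
    if measured and n_detected == len(measured) and plasma_vaf > min_vaf:
        return "clonal"  # called in every measured biopsy and in plasma
    if 0 < n_detected < len(measured):
        return "subclonal"  # called in at least one, but not all, biopsies
    return "unclassified"  # combinations the definitions above do not cover
```

For example, a variant with VAFs of 0.10 and 0 in two biopsies (the third unavailable) is subclonal, even though a single biopsy could have reported it as present or absent depending on which region was sampled.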

To compare AmpliSeq (Swanton) with the mmPCR/NGS assay for identifying tumor heterogeneity, Natera designed a PCR assay for each SNV mutation to detect its VAF in the biopsies and in the corresponding ctDNA from plasma (Figure 53). Blank cells indicate that no biopsy sample was available, and zero indicates that no VAF was detected. The following 11 genes were initially identified as negative by the AmpliSeq test (FP or FN; false VAF calls), but were correctly called (TP or TN) by the Natera mmPCR/NGS test: L12: CYFIP1, FAT1, MLLT4 and RASA1; L13: HERC4, JAK2, MSH2, MTOR and PLCG2; L15: GABRG1; L17: TRIM67. Surprisingly, re-examination of the raw AmpliSeq sequencing data confirmed these results: the original AmpliSeq sequencing files showed that these variants were present in the data but below the detection thresholds set for the PGM or Illumina. Sixteen of the 38 variants were detected in plasma and in several biopsy samples as dominant clonal SNV mutations: patient L12: BRIP1, CARS, FAT1, MLLT4, NFE2L2, TP53, TP53; patient L13: EGFR, EGFR, TP53; and patient L15: KDM6A, ROS1. In addition, a total of four subclonal variant mutations were found in plasma from two patients: L12: CIC, KDM6A; and L17: NF1, TRIM67. These results are summarized in Figure 54A as a whisker plot of the mean VAF for each sample listed in Figure 53. Figure 54B is a direct comparison, shown as a linear regression plot of the mean VAF of the samples for each assay.

Example 14

This example shows that using low primer concentrations, so that the primers are the limiting reagent in multiplex PCR, improves the uniformity of depth of read across the amplification reaction pool, and therefore the limit of detection, in a workflow followed by next-generation sequencing. Some plasma CNV experiments were carried out using the 3,168-plex protocol of Example 9 above, except that the total reaction volume was 10 µL instead of 20 µL. In addition, PCR was run for 15, 20 or 25 cycles. Other experiments were carried out on breast cancer samples using the four 84-plex pools according to the protocol of Example 9, except that the primer concentration was 2 nM and PCR amplification was run for 15, 20 or 25 cycles.

Without being bound by theory, primer-limited multiplex PCR is believed to provide improved uniformity of multiplex PCR read depth before massively parallel sequencing (for example, on Illumina HiSeq or MiSeq systems, or on Ion Torrent PGM- or Proton-based systems), based on the following consideration. If some amplicons in a multiplex PCR amplify less efficiently than others, normal multiplex PCR yields a wide range of depth-of-read ("DOR") values. However, if the primers are limiting and the multiplex PCR is run for more cycles than needed to exhaust them, the more efficient amplicons stop doubling (because they have used up their primers) while the less efficient ones continue to double. This yields more similar amounts of product across all amplicons, which translates into a more uniform DOR distribution.
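The reasoning above can be illustrated with a deterministic toy model. Each amplicon has its own efficiency and its own primer budget, and each new strand is assumed to consume one primer molecule; real PCR kinetics (primer-dimer formation, polymerase limits, stochastic sampling) are ignored. All names are hypothetical.

```python
import math


def simulate_amplicons(efficiencies, start_copies, primers_per_amplicon, cycles):
    """Deterministic toy model of multiplex PCR with per-amplicon primer budgets."""
    copies = [float(start_copies)] * len(efficiencies)
    primers = [float(primers_per_amplicon)] * len(efficiencies)
    for _ in range(cycles):
        for i, eff in enumerate(efficiencies):
            new = min(eff * copies[i], primers[i])  # each new strand uses one primer molecule
            copies[i] += new
            primers[i] -= new
    return copies


def spread(copies):
    """Max/min ratio of final product amounts (1.0 = perfectly uniform)."""
    return max(copies) / min(copies)


if __name__ == "__main__":
    effs = [1.0, 0.9, 0.8]
    print("unlimited primer:", spread(simulate_amplicons(effs, 1e5, math.inf, 25)))
    print("primer-limited:  ", spread(simulate_amplicons(effs, 1e5, 1.2e10, 25)))
```

With unlimited primer, a 100% efficient amplicon ends up roughly 14-fold ahead of an 80% efficient one after 25 cycles. With a budget of about 1.2 × 10^10 primer molecules, every amplicon exhausts its primers by cycle 25 and the final amounts converge.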

The following calculation determines the precise amount of primer and the number of cycles for a given amount of starting nucleic acid template:

Assume a given starting DNA input level of 100k copies (10^5) per target (this is readily achieved using an amplified library).

Assume 2 nM of each primer as an exemplary concentration, although other concentrations, such as 0.2, 0.5, 1, 1.5, 2, 2.5, 5 or 10 nM, can also work.

Calculate the number of primer molecules for each primer: 2 × 10^-9 M (molar concentration, 2 nM) × 10 × 10^-6 L (reaction volume, 10 µL) × 6 × 10^23 (molecules per mole, Avogadro's number) = 12 × 10^9

Calculate the fold amplification needed to consume all primers: 12 × 10^9 (number of primer molecules) / 10^5 (copies per target) = 12 × 10^4

Calculate the number of cycles needed to reach this fold amplification, assuming 100% efficiency in each cycle: log2(12 × 10^4) ≈ 17 cycles (log base 2, because the copy number doubles in each cycle).

Therefore, under these conditions (100k copy input, 2 nM primer, 10 µL reaction volume, 100% PCR efficiency in each cycle), the primers will be consumed after 17 PCR cycles.
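The same arithmetic as a small helper, so that other primer concentrations, volumes, inputs or efficiencies can be checked. This is a sketch: it uses 6.022 × 10^23 for Avogadro's number (the text rounds to 6 × 10^23) and the same simplification that fold amplification equals primer molecules divided by input copies.

```python
import math

AVOGADRO = 6.022e23  # molecules per mole (the text rounds this to 6 x 10^23)


def cycles_to_exhaust_primer(primer_nM, volume_uL, copies_per_target, efficiency=1.0):
    """Return (primer molecules, fold amplification, cycles) for one primer."""
    primer_molecules = primer_nM * 1e-9 * volume_uL * 1e-6 * AVOGADRO
    fold = primer_molecules / copies_per_target
    cycles = math.log(fold) / math.log(1.0 + efficiency)  # log base (1 + efficiency)
    return primer_molecules, fold, cycles
```

`cycles_to_exhaust_primer(2, 10, 1e5)` gives about 1.2 × 10^10 primer molecules and about 16.9 cycles, which rounds to the 17 cycles above. At 90% efficiency the same budget lasts about 18.2 cycles, consistent with the point below that less efficient products need more than 17 cycles.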

However, the key assumption is that some products do not amplify with 100% efficiency (their efficiencies are not measured here, although this would be feasible for a small number of them), so consuming their primers will take more than 17 cycles.

Figures 55-58 show the results for the four 84-plex SNV PCR primer pools. For each pool, DOR efficiency improved as the number of cycles increased from 15 to 20 to 25. Similar results were obtained in experiments using the 3,168-plex panel (Figures 59-61). As depth of read increases, the limit of detection decreases (i.e., SNV sensitivity increases). In addition, sensitivity was consistently better for detecting transversion mutations than for detecting transition mutations. When primer-limited multiplex PCR is used before massively parallel sequencing, additional cycles may yield further increases in DOR efficiency.

Thus, in one aspect, provided herein is a method for amplifying a plurality of target loci in a nucleic acid sample, comprising (i) contacting the nucleic acid sample with a library of primers and other primer extension reaction components to provide a reaction mixture, wherein the relative amount of each primer in the reaction mixture, compared with the other primer extension reaction components, results in a reaction in which the primers are present at a limiting concentration, and wherein the primers hybridize to a plurality of different target loci; and (ii) subjecting the reaction mixture to primer extension reaction conditions for a sufficient number of cycles to consume or exhaust the primers in the primer library, thereby producing amplified products comprising target amplicons. For example, the plurality of different target loci in the reaction mixture can include at least 2, 3, 5, 10, 25, 50, 100, 200, 250, 500, 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000 or 100,000 different target loci, and at most 50, 100, 200, 250, 500, 1,000, 2,000, 5,000, 7,500, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 75,000, 100,000, 200,000, 250,000, 500,000 or 1,000,000 different target loci.

In illustrative embodiments, the method includes determining the amount of primer that will be rate-limiting. This calculation typically involves estimating and/or determining the number of target molecules, and analyzing and/or determining the number of amplification cycles to be performed. For example, in illustrative embodiments, the concentration of each primer is less than 100, 75, 50, 25, 10, 5, 2, 1, 0.5, 0.25, 0.2 or 0.1 nM. In various embodiments, the GC content of the primers is between 30 and 80%, such as between 40 and 70% or between 50 and 60%, inclusive. In some embodiments, the range of primer GC content (i.e., maximum GC content minus minimum GC content, such as 80% - 60% = 20%) is less than 30%, 20%, 10% or 5%. In some embodiments, the melting temperature (Tm) of the primers is 40 °C to 80 °C, such as 50 °C to 70 °C, 55 °C to 65 °C or 57 °C to 60.5 °C, inclusive. In some embodiments, the range of primer melting temperatures is less than 20 °C, 15 °C, 10 °C, 5 °C, 3 °C or 1 °C. In some embodiments, the primers are 15 to 100 nucleotides long, such as 15 to 75, 15 to 40, 17 to 35, 18 to 30 or 20 to 65 nucleotides. In some embodiments, the primers include a non-target-specific tag, such as a tag that forms an internal loop structure. In some embodiments, the tag is located between two DNA-binding regions. In various embodiments, the primers include a 5' region specific to the target locus, an internal region that is not specific to the target locus and forms a loop structure, and a 3' region specific to the target locus. In various embodiments, the 3' region is at least 7 nucleotides long. In some embodiments, the 3' region is 7 to 20 nucleotides long, such as 7 to 15 or 7 to 10 nucleotides. In various embodiments, the test primers include a 5' region that is not specific to the target locus (such as a tag or universal primer binding site), followed by a region specific to the target locus, an internal region that is not specific to the target locus and forms a loop structure, and a 3' region specific to the target locus. In some embodiments, the range of primer lengths is less than 50, 40, 30, 20, 10 or 5 nucleotides. In some embodiments, the target amplicons are 50 to 100 nucleotides long, such as 60 to 80 or 60 to 75 nucleotides. In some embodiments, the range of target amplicon lengths is less than 100, 75, 50, 25, 15, 10 or 5 nucleotides.
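A sketch of how a candidate primer pool might be screened against some of the ranges above: per-primer GC content and length, and the pool-wide GC and length ranges. The default bounds reproduce the broadest illustrative values in the text. A melting-temperature check is deliberately omitted, because an accurate Tm requires a nearest-neighbor thermodynamic model that is not specified here. The function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C bases in a primer sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def check_primer_pool(primers, gc_bounds=(0.30, 0.80), max_gc_range=0.30,
                      length_bounds=(15, 100), max_length_range=50):
    """Return a list of problems found in a primer pool (empty list = passes)."""
    problems = []
    gcs = [gc_content(p) for p in primers]
    lengths = [len(p) for p in primers]
    for p, gc, n in zip(primers, gcs, lengths):
        if not gc_bounds[0] <= gc <= gc_bounds[1]:
            problems.append(f"{p}: GC content {gc:.0%} outside bounds")
        if not length_bounds[0] <= n <= length_bounds[1]:
            problems.append(f"{p}: length {n} outside bounds")
    if max(gcs) - min(gcs) >= max_gc_range:
        problems.append("GC content range too wide")
    if max(lengths) - min(lengths) >= max_length_range:
        problems.append("length range too wide")
    return problems
```

Tightening `max_gc_range` (for example, to 0.10) corresponds to the narrower embodiments and makes primer behavior more uniform across the pool.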

In various embodiments of any aspect of the invention, the primer extension reaction conditions are polymerase chain reaction (PCR) conditions. In various embodiments, the annealing step is longer than 3, 5, 8, 10 or 15 minutes but shorter than 240, 120, 60 or 30 minutes. In various embodiments, the extension step is longer than 3, 5, 8, 10 or 15 minutes but shorter than 240, 120, 60 or 30 minutes.

Example 15

This example demonstrates the ability of the SNV detection methods of the invention to identify chimeras in single-cell analysis (also called single-molecule analysis). Figure 62 shows the results of multiplex PCR on tumor cell genomic DNA and on single-cell/single-molecule inputs, using the 28K-plex primer set according to the 28K single-cell method provided in Example 9. Using this method, more than 85% of reads were mapped (more than 4.7M mapped reads, about 167 reads per target). Chimerism observed in the cells is shown in the lower part of the figure.

Claims

1. A method for detecting one or more mutations or genetic variations in a blood, serum or plasma sample from a subject having cancer or suspected of having cancer, the method comprising:

identifying a plurality of mutations or genetic variations in a tumor sample of the subject by whole-exome sequencing;

collecting a blood, serum or plasma sample from the subject, and isolating cell-free DNA from the blood, serum, plasma or tumor sample;

amplifying, from the cell-free DNA, a plurality of loci corresponding to the mutations or genetic variations to obtain amplicons;

sequencing the amplicons to obtain sequence reads; and

detecting, from the sequence reads, one or more of the mutations or genetic variations present in the cell-free DNA.

2. The method of claim 1, wherein the cell-free DNA comprises circulating tumor DNA.

3. The method of claim 1, wherein the mutations or genetic variations comprise one or more single nucleotide variant (SNV) mutations.

4. The method of claim 1, wherein the mutations or genetic variations comprise one or more copy number variations (CNVs).

5. The method of claim 3, wherein at least one of the SNV mutations is in a gene selected from the following: TP53, PTEN, PIK3CA, APC, EGFR, NRAS, NF2, FBXW7, ERBBs, ATAD5, KRAS, BRAF, VEGF, EGFR, HER2, ALK, p53, BRCA, BRCA1, BRCA2, SETD2, LRP1B, PBRM, SPTA1, DNMT3A, ARID1A, GRIN2A, TRRAP, STAG2, EPHA3/5/7, POLE, SYNE1, C20orf80, CSMD1, CTNNB1, ERBB2, FBXW7, KIT, MUC4, ATM, CDH1, DDX11, DDX12, DSPP, EPPK1, FAM186A, GNAS, HRNR, KRTAP4-11, MAP2K4, MLL3, NRAS, RB1, SMAD4, TTN, ABCC9, ACVR1B, ADAM29, ADAMTS19, AGAP10, AKT1, AMBN, AMPD2, ANKRD30A, ANKRD40, APOBR, AR, BIRC6, BMP2, BRAT1, BTNL8, C12orf4, C1QTNF7, C20orf186, CAPRIN2, CBWD1, CCDC30, CCDC93, CD5L, CDC27, CDC42BPA, CDH9, CDKN2A, CHD8, CHEK2, CHRNA9, CIZ1, CLSPN, CNTN6, COL14A1, CREBBP, CROCC, CTSF, CYP1A2, DCLK1, DHDDS, DHX32, DKK2, DLEC1, DNAH14, DNAH5, DNAH9, DNASE1L3, DUSP16, DYNC2H1, ECT2, EFHB, RRN3P2, TRIM49B, TUBB8P5, EPHA7, ERBB3, ERCC6, FAM21A, FAM21C, FCGBP, FGFR2, FLG2, FLT1, FOLR2, FRYL, FSCB, GAB1, GABRA4, GABRP, GH2, GOLGA6L1, GPHB5, GPR32, GPX5, GTF3C3, HECW1, HIST1H3B, HLA-A, HRAS, HS3ST1, HS6ST1, HSPD1, IDH1, JAK2, KDM5B, KIAA0528, KRT15, KRT38, KRTAP21-1, KRTAP4-5, KRTAP4-7, KRTAP5-4, KRTAP5-5, LAMA4, LATS1, LMF1, LPAR4, LPPR4, LRRFIP1, LUM, LYST, MAP2K1, MARCH1, MARCO, MB21D2, MEGF10, MMP16, MORC1, MRE11A, MTMR3, MUC12, MUC17, MUC2, MUC20, NBPF10, NBPF20, NEK1, NFE2L2, NLRP4, NOTCH2, NRK, NUP93, OBSCN, OR11H1, OR2B11, OR2M4, OR4Q3, OR5D13, OR8I2, OXSM, PIK3R1, PPP2R5C, PRAME, PRF1, PRG4, PRPF19, PTH2, PTPRC, PTPRJ, RAC1, RAD50, RBM12, RGPD3, RGS22, ROR1, RP11-671M22.1, RP13-996F3.4, RP1L1, RSBN1L, RYR3, SAMD3, SCN3A, SEC31A, SF1, SF3B1, SLC25A2, SLC44A1, SLC4A1, SMAD2, SPTA1, ST6GAL2, STK11, SZT2, TAF1L, TAX1BP1, TBP, TGFBI, TIF1, TMEM14B, TMEM74, TPTE, TRAPPC8, TRPS1, TXNDC6, USP32, UTP20, VASN, VPS72, WASH3P, WWTR1, XPO1, ZFHX4, ZMIZ1, ZNF167, ZNF436, ZNF492, ZNF598, ZRSR2, ABL1, AKT2, AKT3, ARAF, ARFRP1, ARID2, ASXL1, ATR, ATRX, AURKA, AURKB, AXL, BAP1, BARD1, BCL2, BCL2L2, BCL6, BCOR, BCORL1, BLM, BRIP1, BTK, CARD11, CBFB, CBL, CCND1, CCND2, CCND3, CCNE1, CD79A, CD79B, CDC73, CDK12, CDK4, CDK6, CDK8, CDKN1B, CDKN2B, CDKN2C, CEBPA, CHEK1, CIC, CRKL, CRLF2, CSF1R, CTCF, CTNNA1, DAXX, DDR2, DOT1L, EMSY (C11orf30), EP300, EPHA3, EPHA5, EPHB1, ERBB4, ERG, ESR1, EZH2, FAM123B (WTX), FAM46C, FANCA, FANCC, FANCD2, FANCE, FANCF, FANCG, FANCL, FGF10, FGF14, FGF19, FGF23, FGF3, FGF4, FGF6, FGFR1, FGFR2, FGFR3, FGFR4, FLT3, FLT4, FOXL2, GATA1, GATA2, GATA3, GID4 (C17orf39), GNA11, GNA13, GNAQ, GNAS, GPR124, GSK3B, HGF, IDH1, IDH2, IGF1R, IKBKE, IKZF1, IL7R, INHBA, IRF4, IRS2, JAK1, JAK3, JUN, KAT6A (MYST3), KDM5A, KDM5C, KDM6A, KDR, KEAP1, KLHL6, MAP2K2, MAP2K4, MAP3K1, MCL1, MDM2, MDM4, MED12, MEF2B, MEN1, MET, MITF, MLH1, MLL, MLL2, MPL, MSH2, MSH6, MTOR, MUTYH, MYC, MYCL1, MYCN, MYD88, NF1, NFKBIA, NKX2-1, NOTCH1, NPM1, NRAS, NTRK1, NTRK2, NTRK3, PAK3, PALB2, PAX5, PBRM1, PDGFRA, PDGFRB, PDK1, PIK3CG, PIK3R2, PPP2R1A, PRDM1, PRKAR1A, PRKDC, PTCH1, PTPN11, RAD51, RAF1, RARA, RET, RICTOR, RNF43, RPTOR, RUNX1, SMARCA4, SMARCB1, SMO, SOCS1, SOX10, SOX2, SPEN, SPOP, SRC, STAT4, SUFU, TET2, TGFBR2, TNFAIP3, TNFRSF14, TOP1, TP53, TSC1, TSC2, TSHR, VHL, WISP3, WT1, ZNF217 and ZNF703.

6. The method of claim 3, wherein at least one of the SNV mutations is in a gene selected from the following: CYFIP1, FAT1, MLLT4, RASA1, HERC4, JAK2, MSH2, MTOR, PLCG2, GABRG1 and TRIM67.

7. The method of claim 3, wherein the SNV mutations comprise one or more clonal SNV mutations.

8. The method of claim 7, wherein at least one of the clonal SNV mutations is in a gene selected from the following: BRIP1, CARS, FAT1, MLLT4, NFE2L2, TP53, TP53, EGFR, EGFR, TP53, KDM6A and ROS1.

9. The method of claim 3, wherein the SNV mutations comprise one or more subclonal SNV mutations.

10. The method of claim 9, wherein at least one of the subclonal SNV mutations is in a gene selected from the following: CIC, KDM6A, NF1 and TRIM67.

11. The method of claim 3, wherein the SNV mutations comprise one or more clonal SNV mutations and one or more subclonal SNV mutations.

12. The method of claim 11, wherein the method further comprises determining the clonal heterogeneity of the tumor sample.

13. The method of claim 1, wherein the amplifying step comprises amplifying 10 to 50 loci corresponding to the mutations or genetic variations.

14. The method of claim 1, wherein the amplifying step comprises targeted amplification by multiplex PCR.

15. The method of claim 14, wherein the method further comprises designing targeted PCR assays for the mutations or genetic variations identified in the tumor sample.

16. The method of claim 1, wherein the sequencing step comprises high-throughput sequencing.

17. The method of claim 1, wherein the sequencing step comprises next-generation sequencing.

18. The method of claim 1, wherein the mutations or genetic variations are tumor-specific mutations or genetic variations.

19. The method of claim 1, wherein the method further comprises detecting recurrence and/or metastasis of the cancer from the mutations or genetic variations detected in the cell-free DNA.

20. The method of claim 19, wherein the cancer is colorectal cancer, lung cancer, bladder cancer or breast cancer.