WO2018160907A1 - Extraction of nucleic acids for reduced probe count variability

Info

Publication number
WO2018160907A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
lysis buffer
detergent
buffer
probe
Application number
PCT/US2018/020553
Other languages
French (fr)
Inventor
James F. CREGG
Karina Liang WONG
Nathan Daniel COPELAND
Original Assignee
Counsyl, Inc.
Application filed by Counsyl, Inc.
Publication of WO2018160907A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806: Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay

Definitions

  • the present disclosure relates generally to methods and compositions for extracting nucleic acids from a biological sample, and more particularly to methods and compositions for extracting nucleic acids from a biological sample so that probe counts determined from the extracted nucleic acids have reduced probe count variability.
  • DTS Direct Targeted Sequencing
  • a sequencing substrate i.e., a flow cell
  • the DTS protocol modifies the sequencing surface to capture genomic DNA from a specially prepared library.
  • the captured library is then sequenced as a normal gDNA library would be, and the captured and sequenced DNA can be used for a variety of diagnostic applications.
  • Recent improvements to the DTS system involve the sequencing of specific target sequences in a sample. For example, barcode-tagged polynucleotides are sequenced simultaneously and sample sources are identified on the basis of barcode sequences. The sequencing data can then be used, for example, to determine one or more genotypes at one or more loci comprising a causal genetic variant. For example, a copy number variant of a gene can arise when a subject has more or fewer than two copies of a gene.
  • To determine a copy number variant, the DTS system relies on determining interactions between capture probes used in the DTS system and the DNA fragments with which the probes interact. Determining such interactions, however, is complicated by the fact that the biological samples inherently contain a variety of cellular contaminants.
  • a method for reducing probe count variability of a biological sample includes, for example, contacting a biological sample, such as a saliva sample, with a lysis buffer to form an extraction solution.
  • the lysis buffer includes a detergent, and in certain aspects a metal chelator.
  • the detergent is a non-ionic detergent, such as Triton X100.
  • the detergent is an anionic detergent, such as sodium dodecyl sulfate (SDS).
  • the buffer of the lysis buffer is a Tris buffer.
  • the lysis buffer includes 25.0 mM Triton X-100, 2.5 mM Tris-HCl, and 0.025 mM EDTA.
  • the extraction solution formed by contacting the biological fluid sample with the lysis buffer has a ratio of about 1:12 (v/v) of lysis buffer to biological fluid sample.
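  • As a rough illustration of the composition and ratio stated above, the following Python sketch estimates the concentration of each lysis buffer component once the buffer is mixed with the sample. It assumes that "1:12 (v/v)" means one part lysis buffer to twelve parts sample and that the volumes are additive; the function name `final_concentrations` and its parameters are illustrative and do not come from the disclosure.

```python
def final_concentrations(buffer_mM, buffer_parts=1.0, sample_parts=12.0):
    """Dilute each buffer component into the combined extraction solution.

    Assumes additive volumes and that the sample itself contributes none
    of the listed components.
    """
    fraction = buffer_parts / (buffer_parts + sample_parts)
    return {name: conc * fraction for name, conc in buffer_mM.items()}


# Example composition taken from the text (concentrations in mM).
lysis_buffer_mM = {"Triton X-100": 25.0, "Tris-HCl": 2.5, "EDTA": 0.025}

diluted = final_concentrations(lysis_buffer_mM)
for name, conc in diluted.items():
    print(f"{name}: {conc:.4f} mM in the extraction solution")
```

  Under these assumptions each component is diluted 13-fold; a different reading of the ratio would change only the `buffer_parts` and `sample_parts` arguments.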
  • the method includes isolating polynucleotide fragments from the extraction solution.
  • the polynucleotide fragments are then contacted with multiple homologous capture probes.
  • the homologous capture probes, for example, bind to at least a portion of the polynucleotide fragments.
  • a probe count for the multiple homologous capture probes is then determined, the probe count providing an indication of the number of polynucleotide fragments that bind the multiple homologous capture probes.
  • a composition for reducing probe count variability of a biological sample includes, for example, a detergent, a buffer, and a metal chelator.
  • the detergent is a non-ionic detergent, such as Triton X100.
  • the detergent is an anionic detergent, such as SDS.
  • the buffer is a Tris buffer.
  • the metal chelator is EDTA.
  • the lysis buffer composition includes 25 mM Triton X100, 0.02 mM EDTA, and 2.0 mM TRIS.
  • Figure 1 is an illustration depicting probe counts leading to copy number variant (CNV) calls after normalization for multiple patient sample runs on a single flow cell of a direct targeted sequencing (DTS) system, in accordance with certain example embodiments.
  • Figure 2 is a graph showing aggregation of multiple probe counts of a single patient blood sample that was subjected to pretreatment with a lysis buffer, in accordance with certain example embodiments. As shown, tight banding of normalized probe counts occurs around 2 copies of the gene of interest when the blood sample is pretreated with lysis buffer.
  • Figure 3 is a graph showing aggregation of multiple probe counts of a single patient saliva sample processed without lysis buffer pretreatment, in accordance with certain example embodiments. As shown, the normalized probe counts are highly variable. This probe count variability is particularly evident when compared to the graph shown in Figure 2.
  • Figure 4 is a graph showing identification of a 1-copy CNV of the BRCA1 gene, as determined by aggregating, normalizing, and then comparing multiple probe counts of a single patient blood sample that was pretreated with lysis buffer, in accordance with certain example embodiments. As shown, tight banding occurs due to the lack of probe count variability. Notably, the BRCA1 copy number variant is easily discernible.
  • nucleic acids are written left to right in 5' to 3' orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively.
  • Ranges or values can be expressed herein as from “about” one particular value, and/or to "about” another particular value. When such a range is expressed, another aspect includes from the one particular value of the range and/or to the other particular value of the range. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint. Similarly, when values are expressed as approximations, by use of the antecedent "about,” it will be understood that the particular value forms another aspect. In certain example embodiments, the term “about” is understood as within a range of normal tolerance in the art, for example within 2 standard deviations of the mean.
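  • One way to read the "within 2 standard deviations of the mean" tolerance is sketched below in Python. The helper `is_about` and the replicate measurements are hypothetical and serve only to show the arithmetic; the disclosure does not prescribe how the mean and standard deviation are obtained.

```python
from statistics import mean, stdev


def is_about(value, measurements, n_sd=2.0):
    """Return True if value lies within n_sd sample standard deviations
    of the mean of the given replicate measurements."""
    center = mean(measurements)
    spread = stdev(measurements)
    return abs(value - center) <= n_sd * spread


# Hypothetical replicate measurements of a nominal 25 mM solution.
replicates_mM = [24.8, 25.1, 25.0, 24.9, 25.2]

print(is_about(25.3, replicates_mM))  # inside the 2-SD band for this data
print(is_about(25.5, replicates_mM))  # outside the 2-SD band for this data
```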
  • amplification or “amplify” refer to any process by which the copy number of a polynucleotide sequence is increased.
  • Methods for primer-directed amplification of polynucleotides are known in the art, and include without limitation, methods based on the polymerase chain reaction (PCR).
  • Conditions favorable to the amplification of polynucleotide sequences by PCR are known in the art, can be optimized at a variety of steps in the process, and depend on characteristics of elements in the reaction, such as polynucleotide type, concentration, sequence length to be amplified, sequence of the polynucleotide and/or one or more primers, primer length, primer concentration, polymerase used, reaction volume, ratio of one or more elements to one or more other elements, and others, some or all of which can be altered.
  • PCR involves the steps of denaturation of the polynucleotide to be amplified (if double stranded), hybridization of one or more primers to the polynucleotide, and extension of the primers by a DNA polymerase, with the steps repeated (or "cycled") in order to amplify the polynucleotide sequence.
  • Steps in this process can be optimized for various outcomes, such as to enhance yield, decrease the formation of spurious products, and/or increase or decrease specificity of primer annealing.
  • an amplification reaction comprises at least 5, 10, 15, 20, 25, 30, 35, 50, or more cycles. In some example embodiments, an amplification reaction comprises no more than 5, 10, 15, 20, 25, 35, 50, or more cycles. Cycles can contain any number of steps, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more steps. Steps can include any temperature or gradient of temperatures, suitable for achieving the purpose of the given step, including but not limited to, strand denaturation, primer annealing, and primer extension.
  • Steps can be of any duration, including but not limited to about, less than about, or more than about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 70, 80, 90, 100, 120, 180, 240, 300, 360, 420, 480, 540, 600, or more seconds, including indefinitely until manually interrupted. Cycles of any number comprising different steps can be combined in any order.
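  • To make the effect of cycle number and step duration concrete, the Python sketch below uses the standard idealized model in which each cycle multiplies the template by (1 + efficiency). The function names, the three-step program, and the starting copy number are illustrative assumptions rather than values from the disclosure, and real reactions plateau well before the ideal figure is reached.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after a number of PCR cycles.

    efficiency = 1.0 means perfect doubling in every cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def program_seconds(step_seconds, cycles):
    """Total cycling time for a program whose cycle repeats the given steps."""
    return cycles * sum(step_seconds)


# Hypothetical program: denaturation 30 s, annealing 30 s, extension 60 s.
steps = [30, 30, 60]
print(pcr_copies(100, 30))                 # ideal doubling for 30 cycles
print(pcr_copies(100, 30, efficiency=0.9))
print(program_seconds(steps, 30) / 60, "minutes of cycling")
```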
  • “biological sample” refers to a sample obtained from a subject, including a sample of biological tissue or fluid origin obtained in vivo or in vitro. Such samples can be from, without limitation, body fluids, organs, tissues, fractions, and cells isolated from a biological subject. Biological samples can also include extracts from a biological sample, such as for example an extract from a biological fluid (e.g., blood or urine). Samples can be obtained from a subject, such as a cell sample, tissue sample, fluid sample, or organ sample derived therefrom (or cell cultures derived from any of these), including, for example, cultured cell lines, biopsy, blood sample, cheek swab, or fluid sample containing a cell (e.g. saliva).
  • the sample includes genomic DNA.
  • samples include mitochondrial DNA, chloroplast DNA, plasmid DNA, bacterial artificial chromosomes, yeast artificial chromosomes, oligonucleotide tags, or combinations thereof.
  • a “biological fluid” or “biological fluid sample” refers to any bodily fluid (e.g., blood, blood plasma, sputum, lavage fluid, cerebrospinal fluid, urine, semen, sweat, tears, saliva, and the like), as well as solid tissues that have, at least in part, been converted to a fluid form through one or more known protocols or from which a fluid has been extracted.
  • a liquid tissue extract such as from a biopsy, can be a biological fluid sample.
  • a biological fluid sample is a saliva sample collected from a subject.
  • the biological fluid sample is a blood sample collected from a subject.
  • As used herein, the terms “blood,” “plasma” and “serum” include fractions or processed portions thereof. Similarly, where a sample is taken from a biopsy, swab, smear, etc., the “sample” encompasses a processed fraction or portion derived from the biopsy, swab, smear, etc.
  • a "chaotropic agent” refers generally to substances that, without being bound by any particular theory, are thought to disrupt the three dimensional hydrate shell structure of water. Chaotropic agents are understood to interfere with stabilizing intra-molecular interactions mediated by non-covalent forces, such as hydrogen bonds, Van der Waals forces, and/or hydrophobic effects. With regard to DNA, chaotropic agents are believed to disrupt the stabilizing hydrate shell that forms around DNA in an aqueous solution. Some inorganic, organic, and/or mixed salts can have chaotropic properties. Without wishing to be bound by any particular theory, such salts are thought to act, for example, by shielding charges and preventing the stabilization of salt bridges.
  • Example chaotropic agents include guanidinium salts generally, guanidinium isothiocyanate, guanidinium chloride, urea, alkali salts, and sodium dodecyl sulfate.
  • a "copy number variant” refers to a variation in the number of copies of a particular gene in a genome.
  • the genome generally has two copies of most genes - one inherited paternally and the other inherited maternally.
  • alterations in parental chromosomes can lead to the gain or loss of a copy of the gene.
  • a deletion can occur, for example, when a fragment of DNA is lost, such as during copying, or when the genes shuffle during meiosis.
  • a duplication can occur when an additional copy of a gene is gained.
  • deletions and duplications of greater than about 1,000 nucleotides are considered copy number variants, although the present disclosure is not intended to be constrained by this value.
  • a difference in the copy number of a gene can increase or decrease the level of that gene's activity. For example, when a copy of a gene is deleted, the cell may produce half as much protein from the gene as compared to a normal cell. Many disease states are associated with changes in gene copy number.
  • a "DNA-binding particle” refers to any conventional solid-phase material that interacts with, or that has been modified to interact with, a DNA fragment.
  • the solid-phase material, for example, is any type of insoluble, usually rigid material, matrix, or stationary phase material that interacts with DNA, either directly or indirectly.
  • the DNA-binding particle is a bead.
  • a "bead” refers to a solid-phase particle of any convenient size, and can have an irregular or regular shape.
  • the surface of the bead is modified to bind nucleic acids, either directly and/or indirectly.
  • the bead can include silanol groups, carboxylic groups, or other groups that facilitate the direct and/or indirect interaction of the bead with DNA.
  • silica beads (and gels) can be functionalized by adding primary amines, thiols, sulfhydryls, propyl, octyl, as well as other derivatives to the hydroxyl group (silanol) attached to silica.
  • the bead can be fabricated from any number of known materials, including cellulose, cellulose derivatives, acrylic resins, glass, silica gels, polystyrene, gelatin, polyvinyl pyrrolidone, co-polymers of vinyl and acrylamide, polystyrene cross-linked with divinylbenzene, or the like, polyacrylamides, latex gels, polystyrene, dextran, rubber, silicon, plastics, nitrocellulose, natural sponges, silica gels, controlled pore glass (CPG), metals, cross-linked dextrans (e.g., Sephadex®), agarose gel (Sepharose®), and other solid phase bead supports known to those of skill in the art.
  • the beads can be packed together so as to form a column that can be used with conventional column chromatography.
  • the beads are magnetic beads, and more particularly paramagnetic beads, meaning that the beads are only magnetic in the presence of a magnetic field.
  • magnetic particles can include an iron-oxide core coated with silane.
  • Magnetic particles useful for magnetic DNA purification can be made from synthetic polymers, porous glass, or metallic materials like iron-oxide.
  • the particles can be coated with functional groups or, in certain examples, can be left uncoated. While coated particles bound with carboxylic acid are more efficient at binding DNA, other molecules such as streptavidin or those containing free thiol groups can also be attached to the silane coat.
  • any high yield magnetic particles that do not require a coating may be desired, as the lack of a coating and functional groups can allow for a higher surface area for binding nucleic acid. Additionally, particles without a coating are more responsive to an applied electric field.
  • magnetic beads include, for example, silica-based magnetic beads or carboxylated magnetic beads.
  • silica-based magnetic beads include, for example, Dynabeads™ MyOne™ Silane beads, available from ThermoFisher Scientific™.
  • the term “elution” or “eluting” refers generally to the process of extracting one material from another by washing with a solvent to remove adsorbed material from an adsorbent. In certain example embodiments, elution is used to remove DNA that is bound directly and/or indirectly to a DNA-binding particle. The eluate is the product that results from the elution process.
  • the term "genetic marker” refers generally to any gene or short genetic sequence that is known or understood to be associated with a disease condition of a subject.
  • the genetic marker can be a variation (which may arise due to mutation or alteration in the genomic loci) that can be observed.
  • a genetic marker can be a short DNA sequence, such as a sequence surrounding a single base-pair change (single nucleotide polymorphism, SNP), or a long one, like minisatellites.
  • genetic marker for example, can be the presence or absence of a gene or short genetic sequence, which may provide an indication of a disease state.
  • As used herein, the terms “polynucleotide,” “nucleotide,” “nucleotide sequence,” “nucleic acid,” and “oligonucleotide” are used interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three dimensional structure, and may perform any function, known or unknown.
  • Non-limiting examples of polynucleotides include coding or non-coding regions of a gene or gene fragment, intergenic DNA, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), small nucleolar RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, adapters, and primers.
  • a polynucleotide may comprise modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component, tag, reactive moiety, or binding partner. Polynucleotide sequences, when provided, are listed in the 5' to 3' direction, unless stated otherwise.
  • the terms “isolate” and “purify” are used interchangeably and mean to reduce by about 1%, 2%, 3%, 4%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%, or more, the amount of heterogenous elements, for example biological macromolecules DNA, that may be present in a biological sample.
  • the presence of DNA and other nucleic acids can be assayed by any appropriate method including gel electrophoresis and staining and/or assays employing polymerase chain reaction.
  • target polynucleotide refers to a nucleic acid molecule or polynucleotide in a population of nucleic acid molecules having a target sequence to which one or more oligonucleotides, such as the capture probes described herein, are designed to hybridize.
  • a target sequence uniquely identifies a sequence derived from a sample, such as a particular genomic, mitochondrial, bacterial, viral, or RNA (e.g. mRNA, miRNA, primary miRNA, or pre-miRNA) sequence.
  • a target sequence is a common sequence shared by multiple different target polynucleotides, such as a common adapter sequence joined to different target polynucleotides.
  • “Target polynucleotide” may also be used to refer to a double-stranded nucleic acid molecule comprising a target sequence on one or both strands, or a single-stranded nucleic acid molecule comprising a target sequence, and may be derived from any source of or process for isolating or generating nucleic acid molecules.
  • a target polynucleotide may comprise one or more (e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) target sequences, which may be the same or different.
  • different target polynucleotides comprise different sequences, such as one or more different nucleotides or one or more different target sequences.
  • hybridization and “annealing” refer to a reaction in which one or more polynucleotides react to form a complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues.
  • the hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner.
  • the complex may comprise two strands forming a duplex structure, three or more strands forming a multi-stranded complex, a single self-hybridizing strand, or any combination of these.
  • a hybridization reaction may constitute a step in a more extensive process, such as the initiation of a PCR, or the enzymatic cleavage of a polynucleotide by a ribozyme.
  • a first sequence that can be stabilized via hydrogen bonding with the bases of the nucleotide residues of a second sequence is said to be "hybridizable" to the second sequence.
  • the second sequence can also be said to be hybridizable to the first sequence.
  • a "complement" of a given sequence is a sequence that is fully complementary to and hybridizable to the given sequence.
  • a first sequence that is hybridizable to a second sequence or set of second sequences is specifically or selectively hybridizable to the second sequence or set of second sequences, such that hybridization to the second sequence or set of second sequences is preferred (e.g. thermodynamically more stable under a given set of conditions, such as stringent conditions commonly used in the art) to hybridization with non-target sequences during a hybridization reaction.
  • hybridizable sequences share a degree of sequence complementarity over all or a portion of their respective lengths, such as between 25%-100% complementarity, including at least about 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, and 100% sequence complementarity.
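  • A percentage of complementarity of the kind listed above can be computed as in the following Python sketch. It is a simplification that assumes two equal-length DNA sequences written 5' to 3', paired antiparallel with no gaps; the function names and example sequences are hypothetical and are not taken from the disclosure.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(first, second):
    """Percentage of positions at which `first` can base-pair with `second`
    when the two strands are aligned antiparallel without gaps."""
    if len(first) != len(second):
        raise ValueError("this sketch only handles equal-length sequences")
    partner = reverse_complement(second)
    matches = sum(a == b for a, b in zip(first.upper(), partner))
    return 100.0 * matches / len(first)


# Hypothetical 6-mers: a perfect complement and one with a single mismatch.
print(percent_complementarity("ATGCGT", "ACGCAT"))
print(round(percent_complementarity("ATGCGT", "ACGCAA"), 1))
```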
  • hybridized refers to a polynucleotide in a complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues.
  • the hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner.
  • the complex may comprise two strands forming a duplex structure, three or more strands forming a multi-stranded complex, a single self-hybridizing strand, or any combination of these.
  • the hybridization reaction may constitute a step in a more extensive process, such as the initiation of a PCR reaction, ligation reaction, sequencing reaction, or cleavage reaction.
  • the term "homologous” denotes a characteristic of a nucleic acid sequence, wherein a nucleic acid sequence has at least about 60 percent sequence identity as compared to a reference sequence, typically at least about 75 percent sequence identity, and preferably at least about 95 percent sequence identity as compared to a reference sequence. The percentage of sequence identity is calculated excluding small deletions or additions which total less than 25 percent of the reference sequence.
  • the reference sequence may be a subset of a larger sequence, such as a portion of a gene or flanking sequence, or a repetitive portion of a chromosome.
  • the reference sequence is at least 12-18 nucleotides long, typically at least about 30 nucleotides long, and preferably at least about 50 to 100 nucleotides long.
  • recombination efficiency increases with the length of the targeting polynucleotide portion that is substantially complementary to a reference sequence present in the target DNA.
  • capture probe refers to an oligonucleotide sequence that is hybridizable to a target polynucleotide.
  • a capture probe can be used to hybridize with a target polynucleotide fragment and then localize that fragment to the substrate of the DTS system, either directly or indirectly.
  • capture probes are "homologous capture probes" when they share sequence identity with each other and hence target the same target polynucleotide sequence.
  • homologous capture probes may be about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to each other.
  • heterologous capture probes are probes that share less sequence identity and hence target different target polynucleotide sequences.
  • a set of capture probes may include a subset of capture probes that are homologous to each other but that are heterologous to other subsets of homologous capture probes in the set of capture probes.
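  • The idea of a probe set containing homologous subsets that are heterologous to one another can be sketched in Python as follows. The sketch assumes equal-length probes compared position by position without gaps, an 80% identity threshold taken from the low end of the range above, and a simple greedy grouping against the first member of each group; the probe sequences and function names are made up for illustration.

```python
def percent_identity(a, b):
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-length probes")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def group_homologous(probes, threshold=80.0):
    """Greedily place each probe in the first group whose representative
    (first member) it matches at or above the identity threshold."""
    groups = []
    for probe in probes:
        for group in groups:
            if percent_identity(probe, group[0]) >= threshold:
                group.append(probe)
                break
        else:
            groups.append([probe])
    return groups


# Hypothetical 10-mer probes: two pairs that differ by one base each.
probes = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "TTTTGGGGCA"]
for index, group in enumerate(group_homologous(probes), start=1):
    print(f"homologous subset {index}: {group}")
```

  A production pipeline would more likely use a gapped alignment and a proper clustering method; the greedy pass is used here only because it keeps the example short.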
  • a “probe count” refers generally to the collective number of target polynucleotide fragments that a plurality of homologous capture probes hybridizes to and captures when contacted with fragments of a genomic library of a subject, as determined, for example, by a sequencing reaction. For example, if a plurality of homologous probes is mixed with a fragmented genomic library of a subject, direct targeted sequencing may determine that the plurality of homologous capture probes binds 500 target polynucleotide fragments. In this example, the probe count for the plurality of homologous capture probes is 500.
  • a different plurality of homologous probes may bind 300 target polynucleotide fragments from the same subject.
  • the probe count is 300.
  • each of the probes of the plurality of homologous probes is identical to the others, i.e., they have the same sequence and are hence directed to the same target polynucleotide.
  • fragmented polynucleotides are captured, bound to a substrate of the flow cell, and imaged via an Illumina sequencing system.
  • the probe count can be used to identify a copy number variant.
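  • The text does not spell out how a probe count is converted into a copy number call. As one plausible sketch, the Python below scales a sample's probe count by the median count of reference samples that are assumed to carry two copies, then rounds to the nearest whole copy. The normalization scheme, function names, and all counts are assumptions made for illustration and should not be read as the method of the disclosure.

```python
from statistics import median


def estimate_copy_number(sample_count, reference_counts, ploidy=2):
    """Scale a probe count so that the reference median maps to `ploidy`."""
    return ploidy * sample_count / median(reference_counts)


def call_copy_number(sample_count, reference_counts):
    """Round the estimate to the nearest integer copy number."""
    return round(estimate_copy_number(sample_count, reference_counts))


# Hypothetical probe counts for one homologous probe group.
reference_counts = [480, 500, 520]   # samples assumed to have 2 copies
print(call_copy_number(505, reference_counts))  # near the reference median
print(call_copy_number(250, reference_counts))  # about half the median
```

  With these made-up numbers a count close to the reference median maps to two copies and a count of about half the median maps to one copy, which mirrors the 1-copy versus 2-copy distinction described for Figures 2 and 4.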
  • a "subject” refers to an animal, including a vertebrate animal.
  • the vertebrate can be a mammal, for example, a human.
  • the subject can be a human patient.
  • a subject can be a “patient,” such as a human or veterinary patient suffering from or suspected of suffering from a disease or condition, and can be in need of treatment or diagnosis or in need of monitoring for the progression of the disease or condition.
  • the patient can also be on a treatment therapy that needs to be monitored for efficacy.
  • a mammal refers to any animal classified as a mammal, including, for example, humans, chimpanzees, domestic and farm animals, as well as zoo, sports, or pet animals, such as dogs, cats, cattle, rabbits, horses, sheep, pigs, and so on.
  • a "detergent” refers generally to a surfactant or a mixture of surfactants.
  • the surfactant molecule, for example, is an amphipathic molecule that contains both hydrophobic and hydrophilic groups.
  • the surfactant molecules generally contain a polar, hydrophilic group (head) at the end of a long hydrophobic carbon chain (tail).
  • non-ionic detergent refers generally to a detergent (surfactant) molecule that contains an uncharged, hydrophilic head group(s).
  • “Ionic detergents” include a hydrophobic chain and a charged headgroup that can be either anionic or cationic.
  • An “anionic detergent” refers generally to a detergent (surfactant) that carries a negative charge, while a "cationic detergent” carries a positive charge.
  • Detergents also include zwitterionic detergents.
  • a biological sample such as a saliva sample
  • the sample is pretreated with a lysis buffer that includes a detergent, thus forming an extraction solution.
  • Nucleic acids from the extraction solution are then used, for example, in a direct sequencing protocol, such as a direct targeted sequencing (DTS) reaction.
  • the nucleic acids are isolated from the extraction solution and fragmented into polynucleotide fragments, which are then mixed with homologous capture probes.
  • the capture probes bind to targeted sequences of the polynucleotide fragments, thereby capturing the targeted polynucleotide fragments and, for example, facilitating their binding to the substrate of the DTS system. Based on binding of polynucleotide fragments to the homologous capture probes, a probe count is determined for the homologous group of capture probes. By mixing the lysis buffer with the biological sample to form the extraction solution, variability of the determined probe count is substantially reduced.
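  • The text does not name a statistic for "probe count variability." One common choice, used here purely as an assumption, is the coefficient of variation (sample standard deviation divided by the mean) of normalized probe counts. In the Python sketch below both lists contain invented placeholder values chosen only to show how the comparison would be made; they are not data from the disclosure.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    return stdev(values) / mean(values)


# Placeholder normalized probe counts (copies); NOT measured data.
placeholder_tight = [1.98, 2.01, 2.03, 1.99, 2.00]
placeholder_noisy = [1.40, 2.60, 1.90, 2.80, 1.30]

cv_tight = coefficient_of_variation(placeholder_tight)
cv_noisy = coefficient_of_variation(placeholder_noisy)
print(f"CV of tightly banded counts: {cv_tight:.3f}")
print(f"CV of scattered counts:      {cv_noisy:.3f}")
```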
  • a biological sample is collected from a subject.
  • the biological sample can be any type of biological sample as described herein.
  • the samples are from the same subject, from different subjects, or combinations thereof.
  • a sample includes nucleic acids from a single subject.
  • a sample includes multiple nucleic acids from two or more subjects.
  • a biological sample can be collected by a variety of conventional collection methods.
  • collection can include the collection of passive drool, use of an oral swab to collect saliva, or simply having a subject, such as a human patient, expel saliva into a collection vessel.
  • Various commercial kits are also available for saliva collection.
  • a saliva sample can be collected using a commercially available saliva collection kit (e.g., Oragene™).
  • a conventional preservative can be added to the collected sample to extend the shelf life of the sample.
  • Example preservatives include formalin, formaldehyde, alcohol, and imidazolidinyl urea. Additional examples of potential preservatives include, for example, octadecyldimethylbenzyl ammonium chloride, hexamethonium chloride, benzalkonium chloride (a mixture of alkylbenzyldimethylammonium chlorides in which the alkyl groups are long-chain compounds), and benzethonium chloride.
  • preservatives include aromatic alcohols such as phenol, butyl and benzyl alcohol, alkyl parabens such as methyl or propyl paraben, catechol, resorcinol, cyclohexanol, 3-pentanol, and m-cresol.
  • a metal chelator can also be added to the sample before storing the sample.
  • certain biological samples contain substantial amounts of salts and hence dissociated ions, such as calcium and magnesium.
  • the addition of a chelating agent to the biological sample can reduce the salt load of the collected sample, thereby stabilizing the sample for storage.
  • Any suitable chelating agent or combination of chelating agents can be used in accordance with the methods described herein.
  • Specific metal chelators include, for example, ethylenediaminetetraacetic acid (EDTA), ethylene glycol-bis(β-aminoethyl ether)-N,N,N',N'-tetraacetic acid (EGTA), as well as other conventional chelators.
  • the collected biological sample is pretreated with a lysis buffer by mixing the sample with a lysis buffer, thereby forming an extraction solution.
  • the lysis buffer includes, for example, a suitable buffer and detergent.
  • the lysis buffer can also include a metal chelator.
  • the lysis buffer is prepared, for example, by combining the detergent, the buffer, and the chelator. The prepared buffer is then mixed with at least a portion of the collected biological sample to form the extraction solution.
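  • Combining the detergent, buffer, and chelator typically comes down to the dilution relation C1·V1 = C2·V2 for each component. The Python sketch below works out stock volumes for the example composition given earlier in this document (25.0 mM Triton X-100, 2.5 mM Tris-HCl, 0.025 mM EDTA). The stock concentrations, the 100 mL batch size, and the function name `stock_volumes` are hypothetical choices, not values from the disclosure.

```python
def stock_volumes(targets_mM, stocks_mM, final_volume_mL):
    """Volume of each stock (mL) needed to reach the target concentration
    in the final volume, using C1 * V1 = C2 * V2, plus water to volume."""
    volumes = {}
    for name, target in targets_mM.items():
        stock = stocks_mM[name]
        if target > stock:
            raise ValueError(f"stock of {name} is too dilute")
        volumes[name] = target / stock * final_volume_mL
    volumes["water"] = final_volume_mL - sum(volumes.values())
    return volumes


targets_mM = {"Triton X-100": 25.0, "Tris-HCl": 2.5, "EDTA": 0.025}
# Hypothetical stock solutions (mM); adjust to whatever is on the shelf.
stocks_mM = {"Triton X-100": 250.0, "Tris-HCl": 1000.0, "EDTA": 500.0}

recipe = stock_volumes(targets_mM, stocks_mM, final_volume_mL=100.0)
for name, volume in recipe.items():
    print(f"{name}: {volume:.3f} mL")
```

  Very small volumes, such as the EDTA volume in this example, are usually handled by preparing an intermediate dilution of the stock first.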
  • the detergent of the lysis buffer can be any detergent that, when used in the pretreatment step described herein, reduces probe count variability of the sample, such as during a direct targeted sequencing analysis.
  • Non-limiting examples of the disclosed detergents include, for example, sodium dodecyl sulfate (SDS), deoxycholate and cholate, sarcosyl or sodium lauroyl sarcosinate, the Triton family of detergents (e.g., Triton X100, Triton X114, Triton X102, Triton X165, Nonidet P40 [NP-40], Igepal™ CA-630, and derivatives thereof), n-dodecyl-β-D-maltoside and other maltosides, digitonin, the Tween family of detergents (e.g., Tween 20 and Tween 80), as well as zwitterionic detergents (e.g., 3-[(3-cholamidopropyl) dimethylammonio
  • the detergent of the lysis buffer is Triton X100.
  • the concentration of the Triton X100 in the lysis buffer can be at least about 5 mM, for example, at least about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 mM or greater.
  • the concentration of the Triton X100 in the lysis buffer does not exceed 50, 55, 60, 65, 70, 75, 80, or 85 mM.
  • the concentration of the Triton X100 in the lysis buffer can be approximately 25 mM; for example, the concentration of the Triton X100 in the lysis buffer can be about 24.1, 24.2, 24.3, 24.4, 24.5, 24.6, 24.7, 24.8, 24.9, 25.0, 25.1, 25.2, 25.3, 25.4, 25.5, 25.6, 25.7, 25.8, 25.9, 26.0, 26.1, 26.2, 26.3, 26.4, 26.5, 26.6, 26.7, 26.8, 26.9, or 27.0 mM.
  • the detergent of the lysis buffer can be sodium dodecyl sulfate (SDS).
  • the concentration of the SDS in the lysis buffer can be about 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 mM. In certain example embodiments, the SDS concentration of the lysis buffer may not exceed 6.5 mM. In other example embodiments, the concentration of SDS in the lysis buffer can be higher, such as about 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 mM. As those skilled in the art will appreciate, however, SDS can have a negative impact on PCR, and hence high concentrations of SDS may be disadvantageous for the methods and compositions described herein.
  • the lysis buffer described herein can include a mixture of different detergents.
  • the lysis buffer can include a Triton detergent, such as Triton X100, as well as SDS.
  • the lysis buffer can include, as the detergent, Triton X100 at a concentration of about 15 to 35 mM, such as about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 mM or greater.
  • the concentration of the Triton X100 can be about 25 mM.
  • the lysis buffer can also include SDS in a concentration of about 1-10 mM, such as about 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 mM.
  • the SDS concentration of the lysis buffer may not exceed about 6.5 mM.
  • the concentration of SDS in the lysis buffer can be higher, such as about 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 mM.
  • the buffer can be any buffer that serves to provide the desired pH of the lysis buffer.
  • the buffer can be a histidine-buffer, citrate-buffer, succinate-buffer, acetate-buffer, gluconate buffer, or phosphate-buffer (e.g., phosphate buffered saline) or mixtures thereof.
  • the lysis buffer can include a HEPES buffer (4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid), a TRICINE buffer (N-(Tri(hydroxymethyl)methyl)glycine), a TRIS buffer (tris(hydroxymethyl)aminomethane), a BICINE buffer (2-(Bis(2-hydroxyethyl)amino)acetic acid), a TAPS buffer (Tris(hydroxymethyl)methylAminoPropaneSulfonic), or combinations thereof.
  • the lysis buffer is buffered to a pH of from 6 to 10.
  • the pH of the lysis buffer can be about 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, or 10.
  • the pH can be neutral to a slightly basic pH, for example, a pH of about 7.0, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8.0, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, or 9.0.
  • the lysis buffer is a Tris buffer, such as a Tris-HCl buffer, having a pH of around 8.0.
  • the disclosed Tris-HCl lysis buffer can have a pH of, for example, about 7.0, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8.0, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, or 9.0.
  • the concentration of the Tris-HCl in the lysis buffer is about 2.50 mM Tris-HCl.
  • the concentration of the Tris-HCl in the lysis buffer can be about 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, or 3.5 mM Tris-HCl.
  • a metal chelator can also be included in the lysis buffer. Any suitable chelating agent or combination of chelating agents can be used in accordance with the methods described herein. Specific metal chelators include, for example, ethylenediaminetetraacetic acid (EDTA), ethylene glycol-bis(β-aminoethyl ether)-N,N,N',N'-tetraacetic acid (EGTA), as well as other conventional chelators. When EDTA is used as a chelating agent, the concentration of the EDTA in the lysis buffer can be about 25 µM. For example, the lysis buffer can have an EDTA concentration of about 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 µM.
  • the lysis buffer and the biological sample can be mixed together at a ratio of about 1 : 10 lysis buffer to biological sample.
  • the ratio of lysis buffer to biological sample in the extraction solution is about 1:5, 1:6, 1:7, 1:8, 1:9, 1:10, 1:11, 1:12, 1:13, 1:14, 1:15, 1:16, 1:17, 1:18, 1:19, or 1:20 v/v.
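The effect of the mixing ratio on component concentrations can be sketched with a short calculation. This is only an illustration: it assumes that "1:10 lysis buffer to biological sample" means 1 part buffer added to 10 parts sample (11 parts total), that volumes are additive, and that the sample itself contributes none of the listed components. The function name is hypothetical.

```python
def diluted_concentration(buffer_conc, buffer_parts=1.0, sample_parts=10.0):
    """Concentration of a lysis-buffer component after mixing with sample (v/v).

    Assumes additive volumes and no contribution from the sample itself.
    """
    return buffer_conc * buffer_parts / (buffer_parts + sample_parts)


# Lysis buffer concentrations named in the text, in mM (25 uM EDTA = 0.025 mM).
lysis_buffer_mM = {"Triton X-100": 25.0, "Tris-HCl": 2.5, "EDTA": 0.025}

for name, conc in lysis_buffer_mM.items():
    final = diluted_concentration(conc)
    print(f"{name}: {final:.4f} mM in the extraction solution")
```

Under a different reading of the ratio (1 part buffer in 10 parts total), the divisor would be 10 rather than 11; the text does not say which is intended.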
  • the extraction solution can be incubated for about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 minutes.
  • the extraction solution can be incubated, for example, at or about room temperature, for example, at about 20 °C, 21 °C, 22 °C, 23 °C, 24 °C, or 25 °C.
  • the extraction solution can be warmed above room temperature, for example, at about 26 °C, 27 °C, 28 °C, 29 °C, 30 °C, 31 °C, 32 °C, 33 °C, 34 °C, 35 °C, 36 °C, 37 °C, 38 °C, 39 °C, 40 °C, 41 °C, 42 °C, 43 °C, 44 °C, 45 °C or greater for the incubation.
  • DNA from the extraction solution is immediately subjected to nucleic acid isolation without any incubation period.
  • the composition can be used to contact the biological sample before nucleic acids from the sample are isolated as described herein.
  • the composition includes a detergent and a suitable buffer.
  • the detergent of the composition can be any detergent that, when used in the pretreatment step described herein, reduces probe count variability of the sample, such as during a direct targeted sequencing analysis.
  • Non-limiting examples of the detergents of the composition include sodium dodecyl sulfate (SDS), Deoxycholate and cholate, sarcosyl or sodium lauroyl sarcosinate, the Triton family of detergents (e.g.
  • a chaotropic agent can be substituted for the detergent and/or used in combination with the detergent.
  • the detergent of the composition is Triton X100.
  • the concentration of the Triton X100 in the composition can be at least about 15 mM, for example, at least about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 mM or greater. In certain example embodiments, the concentration of the Triton X100 in the composition does not exceed about 50, 55, 60, 65, 70, 75, 80, or 85 mM.
  • the concentration of the Triton X100 in the composition can be approximately 25 mM; for example, the concentration of the Triton X100 in the composition can be about 24.1, 24.2, 24.3, 24.4, 24.5, 24.6, 24.7, 24.8, 24.9, 25.0, 25.1, 25.2, 25.3, 25.4, 25.5, 25.6, 25.7, 25.8, 25.9, 26.0, 26.1, 26.2, 26.3, 26.4, 26.5, 26.6, 26.7, 26.8, 26.9, or 27.0 mM.
  • the detergent of the composition is sodium dodecyl sulfate (SDS).
  • the concentration of the SDS in the composition can be about 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 mM. In certain example embodiments, the SDS concentration of the composition may not exceed 6.5 mM. In other example embodiments, the concentration of SDS in the composition can be higher, such as about 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 mM. As those skilled in the art will appreciate, however, SDS can have a negative impact on PCR, and hence high concentrations of SDS may be disadvantageous when used with methods and compositions described herein.
  • the composition can include a mixture of different detergents.
  • the composition can include a Triton detergent, such as Triton X100, as well as SDS.
  • the composition can include, as the detergent, Triton X100 at a concentration of about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 mM or greater, such as about 25 mM.
  • the composition can also include SDS at a concentration of about 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 mM.
  • the SDS concentration of the composition may not exceed 6.5 mM.
  • the concentration of SDS in the composition can be higher, such as about 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 mM.
  • the buffer component of the composition can be any buffer that serves to provide the desired pH of the composition.
  • the buffer can be a histidine-buffer, citrate-buffer, succinate-buffer, acetate-buffer, gluconate buffer, or phosphate-buffer (e.g., phosphate buffered saline) or mixtures thereof.
  • the buffer of the composition can include a HEPES buffer (4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid), a TRICINE buffer (N-(Tri(hydroxymethyl)methyl)glycine), a TRIS buffer (tris(hydroxymethyl)aminomethane), a BICINE buffer (2-(Bis(2-hydroxyethyl)amino)acetic acid), a TAPS buffer (Tris(hydroxymethyl)methylAminoPropaneSulfonic), or combinations thereof.
  • the composition is buffered to a pH of from about 6 to 10.
  • the pH of the composition can be about 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, or 10.
  • the pH can be neutral to a slightly basic pH, for example, a pH of about 7.0, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8.0, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, or 9.0.
  • the buffer is a Tris buffer, such as a Tris-HCl buffer, having a pH of around 8.0.
  • the disclosed Tris-HCl composition can have a pH of, for example, about 7.0, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8.0, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, or 9.0.
  • when the buffer of the composition is a Tris-HCl buffer, the concentration of the Tris-HCl in the composition is about 2.50 mM.
  • the concentration of the Tris-HCl in the composition can be about 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, or 3.5 mM Tris-HCl.
  • the composition can include a metal chelator.
  • Any suitable chelating agent or combination of chelating agents can be used in accordance with the methods described herein.
  • Specific metal chelators include, for example, ethylenediaminetetraacetic acid (EDTA), ethylene glycol-bis(β-aminoethyl ether)-N,N,N',N'-tetraacetic acid (EGTA), as well as other conventional chelators.
  • when EDTA is used as the chelating agent, the concentration of the EDTA in the composition can be about 25 µM.
  • the composition can have an EDTA concentration of about 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 µM.
  • the composition described herein may be included as part of a kit.
  • the kit may include, for example, instructions for using the composition to pretreat a biological sample according to the methods described herein.
  • the kit may include a composition having about 25.0 mM Triton X-100, 2.50 mM Tris-HCl, and 0.025 mM EDTA.
  • the kit may include individual components, such as Triton X-100, Tris-HCl, and EDTA, with instructions on how to prepare the composition as described herein.
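When the kit supplies individual components, preparing the composition comes down to a standard dilution calculation (C1·V1 = C2·V2). The sketch below is illustrative only: the stock concentrations and the 100 mL batch size are hypothetical values chosen for the example and are not stated in the text; only the target concentrations (25.0 mM Triton X-100, 2.50 mM Tris-HCl, 0.025 mM EDTA) come from the description.

```python
def stock_volume_ml(target_mM, stock_mM, final_volume_ml):
    """Volume of stock needed so that C1 * V1 = C2 * V2."""
    if target_mM > stock_mM:
        raise ValueError("stock must be at least as concentrated as the target")
    return target_mM * final_volume_ml / stock_mM


# Target concentrations from the kit description (mM).
targets_mM = {"Triton X-100": 25.0, "Tris-HCl": 2.5, "EDTA": 0.025}

# Hypothetical stock concentrations (mM); assumptions for illustration only.
stocks_mM = {"Triton X-100": 250.0, "Tris-HCl": 1000.0, "EDTA": 500.0}

final_volume_ml = 100.0  # hypothetical batch size
volumes = {
    name: stock_volume_ml(targets_mM[name], stocks_mM[name], final_volume_ml)
    for name in targets_mM
}
water_ml = final_volume_ml - sum(volumes.values())

for name, vol in volumes.items():
    print(f"{name}: {vol:.3f} mL of stock")
print(f"Water to volume: {water_ml:.3f} mL")
```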
  • nucleic acids are isolated from the extraction solution.
  • the nucleic acids can be isolated by a variety of conventional nucleic acids isolation methods and techniques.
  • Potential extraction and isolation methods include, for example, organic extraction, the use of DNA-binding particles, such as silica-based isolation technology, magnetic separation, anion exchange technology, and others (e.g., salting out, cesium chloride density gradients, and chelex 100 resin based methods).
  • the extraction solution can be subjected to organic extraction to recover the nucleic acids from the extraction solution.
  • proteins within the extraction solution can be denatured and digested using a conventional protease.
  • the proteins can thereafter be precipitated with organic solvents such as phenol, phenol/chloroform/isoamyl alcohol, or similar formulations, including TRIzol and TriReagent.
  • the protein precipitate can then be removed by centrifugation.
  • Purified nucleic acids can then be recovered, for example, via precipitation using ethanol, isopropanol, or other alcohol.
  • extraction techniques include: (1) organic extraction followed by ethanol precipitation, e.g., using a phenol/chloroform organic reagent (Ausubel et al., 1993), with or without the use of an automated nucleic acid extractor, e.g., the Model 341 DNA Extractor available from Applied Biosystems (Foster City, Calif.); (2) stationary phase adsorption methods (U.S. Pat. No. 5,234,809; Walsh et al., 1991); and (3) salt-induced nucleic acid precipitation methods (Miller et al., 1988), such precipitation methods being typically referred to as "salting-out" methods.
  • silica-based nucleic acid isolation methods can be used to isolate nucleic acids from the extraction solution.
  • DNA adsorbs specifically to silica membrane/beads/particles in the presence of certain salts and at a particular pH.
  • the DNA binds to the silica membrane/beads/particles and cellular contaminants can be removed by one or more wash steps.
  • DNA can then be eluted using a low salt buffer or elution buffer.
  • chaotropic salts can be included to aid in protein denaturation and extraction of DNA.
  • Silica-based isolation kits include PureLink™ Genomic DNA extraction kit (Invitrogen™) and DNeasy Blood and Tissue Kit (Qiagen™).
  • nucleic acid isolation and/or purification technique involves the use of magnetic particles to which nucleic acids can specifically or non-specifically bind, followed by isolation of the beads using a magnet, and washing and eluting the nucleic acids from the beads (see e.g. U.S. Pat. No. 5,705,628).
  • magnetic particle purification methods rely on reversible binding of DNA to a magnetic solid surface/bead/particles, which has been coated with a DNA binding antibody or functional group that interacts specifically with DNA.
  • the beads are silanol magnetic beads.
  • the beads with the DNA bound thereto can be separated from cellular contaminants by applying a magnetic field to the beads and removing the beads from the solution.
  • the bound DNA can then be eluted from the beads, such as with an alcoholic solution (e.g., ethanol).
  • the DNA can be eluted using a Tris/EDTA buffer.
  • the above isolation methods may be preceded by an enzyme digestion step to help eliminate unwanted protein from the sample, e.g., digestion with proteinase K, or other like proteases. See, e.g., U.S. Pat. No. 7,001,724.
  • RNase inhibitors may be added to the lysis buffer.
  • Purification methods may be directed to isolate DNA, RNA, or both. When both DNA and RNA are isolated together during or subsequent to an extraction procedure, further steps may be employed to purify one or both separately from the other.
  • Sub-fractions of extracted nucleic acids can also be generated, for example, purification by size, sequence, or other physical or chemical characteristic.
  • purification of nucleic acids can be performed after any step in the methods of the invention, such as to remove excess or unwanted reagents, reactants, or products.
  • Methods for determining the amount and/or purity of nucleic acids in a sample include absorbance (e.g. absorbance of light at 260 nm, 280 nm, and a ratio of these) and detection of a label (e.g. fluorescent dyes and intercalating agents, such as SYBR green, SYBR blue, DAPI, propidium iodide, Hoechst stain, SYBR gold, ethidium bromide).
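The absorbance-based measurement mentioned above can be sketched as follows. The example assumes the widely used convention that an A260 of 1.0 at a 1 cm path length corresponds to roughly 50 ng/µL of double-stranded DNA; that conversion factor and the function names are assumptions of this sketch, not values given in the text. The A260/A280 ratio is reported without applying a pass/fail cutoff, since the text does not specify one.

```python
DSDNA_NG_PER_UL_PER_A260 = 50.0  # conventional approximation for dsDNA, 1 cm path


def dsdna_concentration_ng_per_ul(a260, dilution_factor=1.0, path_cm=1.0):
    """Estimate dsDNA concentration from absorbance at 260 nm."""
    if path_cm <= 0:
        raise ValueError("path length must be positive")
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor / path_cm


def a260_a280_ratio(a260, a280):
    """Ratio of absorbance at 260 nm to 280 nm, a common purity indicator."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


# Hypothetical readings for a sample diluted 10-fold before measurement.
conc = dsdna_concentration_ng_per_ul(a260=0.45, dilution_factor=10.0)
ratio = a260_a280_ratio(a260=0.45, a280=0.25)
print(f"Estimated concentration: {conc:.1f} ng/uL, A260/A280 = {ratio:.2f}")
```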
  • kits for nucleic acid isolation are also available.
  • Potential kits for isolating DNA from the extraction solution include, for example, AccuPrep™ Genomic DNA Extraction Kit (Bioneer), Arcturus™ PicoPure® DNA Extraction Kit (Invitrogen), GFX Genomic Blood DNA Purification Kit (GE Healthcare), QIAamp™ DNA mini kit (Qiagen™), AllPrep DNA/RNA Mini Kit (Qiagen™), Gentra™ Puregene Blood Kit (Qiagen™), Agencourt™ DNAdvance Kit (Beckman Coulter™), and InnuPrep™ DNA minikit (AJ Innuscreen).
  • Examples of magnetic bead extraction systems include Agencourt DNAdvance Kit (Beckman Coulter™) and Magnetic Beads Genomic DNA Extraction Kit (Geneaid™).
  • the isolated nucleic acids are used, for example, to prepare a genomic DNA library associated with the subject from which the biological sample was obtained.
  • the isolated nucleic acids are fragmented into multiple polynucleotide fragments. Fragmentation may be accomplished by a variety of methods known in the art, including chemical, enzymatic, and mechanical fragmentation.
  • the nucleic acids can be fragmented via acoustic shearing, Adaptive Focused Acoustics™ (AFA), nebulization, sonication, needle or high-pressure shearing, point-sink shearing, chemical fragmentation, or via the use of enzyme-based treatments (i.e., digestion with restriction enzymes), or combinations thereof.
  • the isolated nucleic acids are fragmented into a population of fragmented polynucleotide fragments of one or more specific size range(s).
  • the amount of sample polynucleotides subjected to fragmentation is about, less than about, or more than about 50 ng, 100 ng, 200 ng, 300 ng, 400 ng, 500 ng, 600 ng, 700 ng, 800 ng, 900 ng, 1000 ng, 1500 ng, 2000 ng, 2500 ng, 5000 ng, ⁇ g, or more.
  • fragments are generated from about, less than about, or more than about 1 , 10, 100, 1000, 10000, 100000, 300000, 500000, or more genome-equivalents of starting DNA.
  • the fragments have an average or median length from about 10 to about 10,000 nucleotides.
  • the fragments have an average or median length from about 50 to about 2,000 nucleotides.
  • the fragments have an average or median length of about, less than about, more than about, or between about 100-2500, 200-1000, 10-800, 10-500, 50-500, 50-250, or 50-150 nucleotides.
  • the fragments have an average or median length of about, less than about, or more than about 200, 300, 500, 600, 800, 1000, 1500 or more nucleotides.
  • the fragmentation is accomplished mechanically by subjecting sample polynucleotides to acoustic sonication.
  • the fragmentation comprises treating the sample polynucleotides with one or more enzymes under conditions suitable for the one or more enzymes to generate double-stranded nucleic acid breaks. Examples of enzymes useful in the generation of polynucleotide fragments include sequence specific and non-sequence specific nucleases.
  • Non-limiting examples of nucleases include DNase I, Fragmentase, restriction endonucleases, variants thereof, and combinations thereof.
  • digestion with DNase I can induce random double-stranded breaks in DNA in the absence of Mg++ and in the presence of Mn++.
  • fragmentation comprises treating the sample polynucleotides with one or more restriction endonucleases. Fragmentation can produce fragments having 5' overhangs, 3' overhangs, blunt ends, or a combination thereof.
  • when fragmentation comprises the use of one or more restriction endonucleases, cleavage of sample polynucleotides leaves overhangs having a predictable sequence.
  • the method includes the step of size selecting the fragments via standard methods such as column purification or isolation from an agarose gel. In some embodiments, the method comprises determining the average and/or median fragment length after fragmentation. In some embodiments, samples having an average and/or median fragment length above a desired threshold are again subjected to fragmentation. In certain example embodiments, samples having an average and/or median fragment length below a desired threshold are discarded.
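The post-fragmentation check described above (re-fragment samples whose fragments are too long, discard samples whose fragments are too short) can be sketched as a simple decision function. The thresholds, the use of the median rather than the mean, and the function name are assumptions made for illustration; the text leaves the "desired threshold" unspecified.

```python
from statistics import mean, median


def fragment_qc(lengths, lower, upper):
    """Classify a sample by its median fragment length (in nucleotides).

    Returns "refragment" if the median is above `upper`, "discard" if it is
    below `lower`, and "accept" otherwise.
    """
    if not lengths:
        raise ValueError("no fragment lengths supplied")
    med = median(lengths)
    if med > upper:
        return "refragment"
    if med < lower:
        return "discard"
    return "accept"


# Hypothetical fragment lengths and thresholds.
sample_lengths = [180, 210, 250, 300, 320]
print(f"mean={mean(sample_lengths):.0f}, median={median(sample_lengths)}")
print(fragment_qc(sample_lengths, lower=100, upper=500))
```

A variant could apply the same thresholds to the mean, or to both statistics, which the text also allows ("average and/or median").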
  • the polynucleotide fragments can be modified as described in U.S. Pat. Pub. 2014/0162278, titled "Methods and compositions for enrichment of target polynucleotides," which is hereby expressly incorporated herein by reference in its entirety.
  • the isolated polynucleotide fragments may be modified to include one or more adapter oligonucleotides, the adapter oligonucleotides including one or more of a variety of different sequence elements that can be joined to the polynucleotide fragments (see U.S. Pat. Pub. 2014/0162278).
  • the adapter oligonucleotides joined to fragmented polynucleotides from one sample include one or more sequences common to all adapter oligonucleotides and a "barcode" sequence that is unique to the adapters joined to polynucleotides of that particular sample (see U.S. Pat. Pub. 2014/0162278).
  • the barcode sequence for example, can be used to distinguish polynucleotides originating from one sample or adapter joining reaction from polynucleotides originating from another sample or another adapter joining reaction.
  • the fragmented polynucleotide sequences of a given sample are modified to include a unique nucleic acid sequence (i.e., the barcode) so that the fragments including the modification can later be traced back to the biological sample from which the polynucleotides originated (see U.S. Pat. Pub. 2014/0162278).
  • the adapted polynucleotide fragments are subjected to an amplification reaction that amplifies the fragmented polynucleotides.
  • the amplification relies on, for example, primers that include a barcode associated with the sample (see U.S. Pat. Pub. 2014/0162278).
  • the amplified product includes the barcode sequence unique to the sample, such that the sample can be subsequently identified.
  • the amplification primers may be of any suitable length, such as about, less than about, or more than about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, or more nucleotides, any portion or all of which may be complementary to the corresponding target sequence to which the primer hybridizes (e.g. about, less than about, or more than about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, or more nucleotides) (see U.S. Pat. Pub. 2014/0162278).
  • the amplified polynucleotide fragments are mixed with a set of capture probes for sequencing.
  • the capture probes include sequences that are hybridizable to target polynucleotides within the fragmented polynucleotide fragments.
  • the capture probes can include any other sequences that are needed for the sequencing reaction. For example, when the sequencing is via the Illumina™ sequencing system, the capture probes can include all of the sequence elements and features needed for the Illumina™ sequencing system.
  • the capture probes are homologous to each other, and hence are hybridizable to the same sequence.
  • the homologous capture probes thus, in certain example embodiments, target the same target polynucleotide fragments within a population of fragmented polynucleotides.
  • the capture probes include subsets of probes that are homologous to each other but that are different from other subsets of capture probes within the capture probe mixture.
  • one subset of capture probes targets a first group of the same polynucleotide fragments whereas the other subset of capture probes targets a different, second group of the sample polynucleotide fragments.
  • the collective set of capture probes can be used to target a variety of different target polynucleotide fragments.
  • the probes can be any length suitable for capturing a target polynucleotide fragment in a sequencing reaction such as a direct targeting sequencing reaction.
  • the probes are about 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, or 115 base pairs in length.
  • the probe length is close to 101 base pairs, such as 99, 100, 101, 102, or 103 base pairs in length.
  • the average length of the capture probes in the set of capture probes is about 100-102 base pairs.
  • the region of a capture probe that is hybridizable to a given polynucleotide fragment can be any size that results in capture of the polynucleotide fragment.
  • the region of the capture probe that is hybridizable to a given polynucleotide fragment is about 40 base pairs in length, such as about 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, or 45 base pairs in length.
  • the average length of the region of the capture probe that is hybridizable to a given polynucleotide fragment is about 39 base pairs.
  • a variety of methods can be used to hybridize a set of capture probes to the polynucleotide fragments. For example, the concentration of polynucleotide fragments in the sample can be determined, and the amount of probes incubated with the polynucleotide fragments can be based on the determined concentration of the polynucleotide fragments. For example, the capture probes can be added in a molar excess of the polynucleotide fragments so as to saturate the polynucleotide fragments.
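Basing the probe amount on the measured fragment concentration amounts to converting DNA mass to moles and multiplying by the desired excess. The sketch below assumes an average mass of about 660 g/mol per base pair of double-stranded DNA (a common approximation) and treats all fragments as potential targets; the fold excess, the example inputs, and the function names are hypothetical.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate average mass of one dsDNA base pair


def dsdna_pmol(mass_ng, mean_length_bp):
    """Convert a dsDNA mass (ng) to picomoles of fragments.

    ng -> g is a factor of 1e-9 and mol -> pmol is 1e12, giving a net 1e3.
    """
    if mean_length_bp <= 0:
        raise ValueError("mean fragment length must be positive")
    return mass_ng * 1e3 / (mean_length_bp * AVG_BP_MASS_G_PER_MOL)


def probe_pmol_for_excess(mass_ng, mean_length_bp, fold_excess):
    """Picomoles of capture probe giving the requested molar excess."""
    return dsdna_pmol(mass_ng, mean_length_bp) * fold_excess


# Hypothetical input: 500 ng of fragments averaging 250 bp, 10-fold probe excess.
fragments = dsdna_pmol(500.0, 250)
probes = probe_pmol_for_excess(500.0, 250, fold_excess=10)
print(f"Fragments: {fragments:.2f} pmol; probes needed: {probes:.1f} pmol")
```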
  • the capture probes can then be incubated with the polynucleotide fragments for a suitable amount of time and at a suitable temperature so that the capture probes hybridize to their target polynucleotide fragments.
  • the incubation is at ambient room temperature, such as about 20°C, 21 °C, 22°C, 23°C, 24°C, or 25°C.
  • the reaction solution can be warmed above room temperature, such as to 26°C, 27°C, 28°C, 29°C, 30°C, 31 °C, 32°C, 33°C, 34°C, 35°C, 36°C, 37°C, 38°C, 39°C, 40°C, 41 °C, 42°C, 43°C, 44°C, 45°C, 46°C, 47°C, 48°C, 49°C, 50°C, 51 °C, 52°C, 53°C, 54°C, 55°C, 56°C, 57°C, 58°C, 59°C, 60°C, 61 °C, 62°C, 63°C, 64°C, 65°C, 66°C, 67°C, 68°C, 69°C, 70°C, 71 °C, 72°C, 73°C, 74°C, 75°C, 76°C, 77°C, 78°C, 79
  • the incubation is closer to 65°C, such as about 62°C, 63°C, 64°C, 65°C, 66°C, or 67°C.
  • the incubation time can also be varied.
  • the incubation time can be in minutes, such as about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 45, 50, 55, or 60 min.
  • the incubation can be hours, such as about 1.0, 1.5, 2.0, 2.5, or 3.0 hours.
  • the incubation is at about 62°C, 63°C, 64°C, 65°C, 66°C, or 67°C for about 1.5-2.5 hours.
  • sequencing may be performed according to any method of sequencing known in the art, including those described at length in U.S. Pat. Pub. 2014/0162278.
  • One example sequencing system is the Illumina Genome Analyzer System, which is based on technology described in at least WO 98/44151, hereby incorporated by reference in its entirety.
  • DNA molecules are bound to a sequencing platform (flow cell) via an anchor probe binding site (otherwise referred to as a flow cell binding site) and amplified in situ, such as on a glass slide.
  • a solid surface on which DNA molecules are amplified typically comprises a plurality of first and second bound oligonucleotides, the first complementary to a sequence near or at one end of a target polynucleotide and the second complementary to a sequence near or at the other end of a target polynucleotide.
  • This arrangement permits bridge amplification, as described in U.S. Pat. Pub. 2014/0162278.
  • the DNA molecules are then annealed to a sequencing primer and sequenced in parallel base-by-base using a reversible terminator approach.
  • Hybridization of a sequencing primer may be preceded by cleavage of one strand of a double-stranded bridge polynucleotide at a cleavage site in one of the bound oligonucleotides anchoring the bridge, thus leaving one single strand not bound to the solid substrate that may be removed by denaturing, and the other strand bound and available for hybridization to a sequencing primer.
  • the Illumina™ Genome Analyzer System utilizes flow cells with 8 channels, generating sequencing reads of 18 to 36 bases in length and >1.3 Gbp of high-quality data per run (see www.illumina.com).
  • multiple biological samples are processed as described herein and are combined into a single sample for sequencing.
  • biological samples from two or more subjects are separately contacted with the lysis buffer as described herein to form separate extraction solutions. While keeping the extraction solutions separate, nucleic acids are isolated from the extraction solutions, and thereafter the nucleic acids are fragmented as described herein, modified to include barcodes, and amplified. Capture probes are then added to the separate mixtures of fragmented polynucleotides.
  • the multiple capture probe/fragmented polynucleotide mixtures can be combined into a single sample.
  • the single sample of mixed biological samples, for example, can then be loaded onto a single flow cell of the Illumina™ sequencing system. Thereafter, the barcode sequences of the polynucleotide fragments can be used to identify the specific biological sample (and hence subject) from which the captured sequence arose.
  • polynucleotide fragments obtained as described herein from different biological samples can be combined into a single sample and then incubated with the capture probes for subsequent sequencing.
  • the barcode sequences can similarly be used to identify the biological sample (and hence subject) origin of a given sequence.
  • a single biological sample may be processed and analyzed on a single flow cell.
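Assigning pooled reads back to their samples by barcode can be sketched as below. This is a simplified illustration, not the method of the cited publication: it assumes the barcode occupies a fixed number of bases at the start of each read and that only exact matches are assigned. The barcode sequences, sample names, and function name are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group reads by sample using an exact-match barcode prefix.

    Reads whose prefix matches no known barcode go to "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Hypothetical 4-base barcodes and pooled reads.
barcodes = {"ACGT": "subject_1", "TGCA": "subject_2"}
pooled_reads = ["ACGTGGATTC", "TGCACCTTAG", "ACGTTTGACA", "NNNNGGGCCC"]

for sample, inserts in demultiplex(pooled_reads, barcodes, barcode_len=4).items():
    print(sample, inserts)
```

Real pipelines usually tolerate a small number of barcode mismatches and read the barcode from a dedicated index read; those refinements are omitted here.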
  • a probe count is determined for one or more of the capture probe subsets used in the sequencing reaction described herein.
  • each sequencing read associated with the Illumina™ sequencing system can be equated with the binding of a specific one of the capture probe molecules to a single target polynucleotide fragment.
  • the cumulative number of reads associated with a homologous capture probe subset provides an indication of the overall number of target polynucleotide fragments captured by the capture probe subset. For example, if a given subset of homologous capture probes has 218 reads for a biological sample, then the 218 reads correspond to a raw probe count of 218 for the homologous subset of capture probes.
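  • likewise, if a given subset of homologous capture probes has 523 reads for a biological sample, then the 523 reads correspond to a raw probe count of 523 for the homologous subset of capture probes used in the sequencing reaction.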
  • the probe count thus provides an indication of the number of polynucleotide fragments that a given subset of homologous capture probes interacts with as part of the sequencing reaction for a biological sample.
  • a portion of a subset of homologous probes may capture 323 polynucleotide fragments from one biological sample while a different portion of the same subset of homologous probes may capture 672 polynucleotide fragments from a different biological sample.
  • the raw probe count for one biological sample is 323 whereas the probe count for the other biological sample is 672, with the raw probe counts being assigned to the specific biological samples (and hence the subjects from which the samples were obtained) based on the barcode sequences (see U.S. Pat. Pub. 2014/0162278) associated with the polynucleotide fragments.
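The tallying described above can be sketched in a few lines. This is a minimal illustration, not the patented pipeline: it assumes each read has already been reduced to a (barcode, probe subset) pair, and the barcode sequences, sample IDs, and probe-subset names are hypothetical.

```python
from collections import Counter


def raw_probe_counts(reads, barcode_to_sample):
    """Tally reads per (sample, probe subset).

    reads: iterable of (barcode, probe_subset) pairs, one per sequencing read.
    barcode_to_sample: dict mapping a barcode sequence to a sample ID.
    Reads whose barcode is not recognized are skipped.
    """
    counts = Counter()
    for barcode, probe_subset in reads:
        sample = barcode_to_sample.get(barcode)
        if sample is not None:
            counts[(sample, probe_subset)] += 1
    return counts


# Hypothetical barcodes, sample IDs, and probe-subset name.
barcodes = {"ACGT": "sample_1", "TTAG": "sample_2"}
reads = (
    [("ACGT", "probe_A")] * 323      # reads from the first sample
    + [("TTAG", "probe_A")] * 672    # reads from the second sample
    + [("GGGG", "probe_A")]          # unknown barcode, ignored
)
counts = raw_probe_counts(reads, barcodes)
```

Here the same probe subset yields a raw count of 323 for one sample and 672 for the other, mirroring the example in the text.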
  • the raw probe counts for a given probe can vary across samples in a manner that is unrelated to gene copy number variations among samples.
  • a variety of parameters can affect the samples, for example, including starting nucleic acid concentrations, amplification efficiencies, probe capture efficiencies, and sample purities.
  • the starting nucleic acid concentration in the samples can affect the levels of target polynucleotide fragments in a processed sample, thus affecting the raw probe count (again, in a manner that is unrelated to copy number variations in the sample).
  • the nucleic acid concentration in the original sample from one subject may be higher or lower as compared to the starting nucleic acid concentration in a sample from a different subject.
  • sample-to-sample variations contribute to higher or lower amounts of fragmented target polynucleotides available to bind to the capture probes among different samples.
  • the varying amounts of target polynucleotide fragments then result in variations among raw probe counts for a given capture probe - variations that are independent of gene copy numbers in the samples.
  • the raw probe counts determined from a sequencing run are normalized for a given sample and across different samples of the sequencing run.
  • the normalization can account for the sample-to-sample variation in the input nucleic acid concentration of the starting sample, as well as for other parameters that affect the raw probe counts as described herein.
  • raw probe counts of a sample can be normalized across all samples for each capture probe, adjusting for probe-to-probe differences in probe binding efficiency (capture efficiency).
  • the probe counts can be normalized for (and within) a particular sample.
  • a "normal" value of probe reads can be established for a particular sample and the homologous probe subset across the flow cell for that sample. From this value, the variation of the probe against that sample's average can be used to find significant deviations that would indicate a difference in the initial input concentration of DNA at that particular region (e.g. a copy number variant).
  • different and/or additional methods can be used to normalize the raw probe counts for a given sample and/or across several different samples.
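One way to picture the two kinds of normalization is the sketch below. It is an assumption-laden illustration rather than the method of the disclosure: it scales each sample by its total read count (to absorb differences in input nucleic acid concentration) and then divides each probe by its median across samples (to absorb probe-to-probe capture efficiency). The function name and data layout are hypothetical.

```python
from statistics import median


def normalize_counts(raw):
    """Normalize raw probe counts within and across samples.

    raw: dict mapping sample -> dict mapping probe -> raw count.
    Assumes every sample has the same probes and a nonzero total.
    Returns the same shape, with values near 1.0 for a typical probe.
    """
    # Step 1 (within sample): express each probe as a fraction of the
    # sample's total reads, removing sample-to-sample depth differences.
    depth_scaled = {}
    for sample, probes in raw.items():
        total = sum(probes.values())
        depth_scaled[sample] = {p: c / total for p, c in probes.items()}

    # Step 2 (across samples): divide by the per-probe median so that
    # probes with different capture efficiencies become comparable.
    probe_names = list(next(iter(raw.values())).keys())
    probe_median = {
        p: median(depth_scaled[s][p] for s in raw) for p in probe_names
    }
    return {
        s: {p: depth_scaled[s][p] / probe_median[p] for p in probe_names}
        for s in raw
    }


# Hypothetical counts: sample s2 was sequenced twice as deeply as s1 and s3.
raw = {
    "s1": {"A": 100, "B": 200},
    "s2": {"A": 200, "B": 400},
    "s3": {"A": 100, "B": 200},
}
normalized = normalize_counts(raw)
```

In this toy input the only difference among samples is sequencing depth, so every normalized value comes out at 1.0; a real copy number change would show up as a departure from 1.0 at the affected probes.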
  • a copy number variant is identified from the probe counts, such as by comparing the normalized probe counts arising from the sequencing reaction described herein.
  • a gene of interest arising from a biological sample will have multiple capture probes that captured target polynucleotide fragments from a biological sample.
  • the probe count frequencies provide an average number of probe counts over the region of interest in a given sequencing assay.
  • the overall average of the normalized probe counts will, for example, center on two copies of a gene (one copy per chromosome, i.e., one paternal copy and one maternal copy). Variations in the copy number (i.e., copy number variants) can thus be identified by comparing the normalized probe counts.
  • a normalized probe count for a capture probe subset for one biological sample can be compared to a normalized probe count for the same capture probe for a different biological sample - the relative difference between the probe counts indicating a decrease or increase in the presence of the gene to which the capture probe is targeted.
  • the normalized probe counts for different subsets of homologous probes for the same sample can be compared to determine the increase or decrease in the copy numbers of the gene (relative to a two-gene-copy average).
  • combinations of comparing normalized probe counts among the same sample and across different samples can be used to identify a copy number variant.
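As a rough illustration of the comparison against a two-copy average, the sketch below converts normalized probe ratios for one gene into an integer copy number estimate. It assumes ratios have been normalized so that 1.0 corresponds to two copies; the function name and the simple mean-and-round rule are illustrative choices, not the caller referenced in this disclosure.

```python
def estimate_copy_number(normalized_ratios, baseline_copies=2):
    """Estimate gene copy number from normalized probe ratios.

    normalized_ratios: ratios for the probes covering one gene, where
    1.0 means the probe matches the two-copy average.
    Returns the nearest integer copy number.
    """
    mean_ratio = sum(normalized_ratios) / len(normalized_ratios)
    return round(baseline_copies * mean_ratio)


# Hypothetical ratios for three genes in one sample.
gene_a = estimate_copy_number([1.0, 1.0, 1.0])   # typical two-copy gene
gene_b = estimate_copy_number([1.5, 1.5, 1.5])   # about 50% more reads
gene_c = estimate_copy_number([0.5, 0.5, 0.5])   # about 50% fewer reads
```

Averaging over several probes before rounding is what makes low probe count variability matter: the noisier the individual ratios, the more probes are needed before the mean reliably separates 1.0 from 0.5 or 1.5.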
  • a copy number variant caller can be used to identify copy number variants.
  • a caller, for example, is described in U.S. Pat. App. No. 62/476,361, filed March 24, 2017, and titled "COPY NUMBER VARIANT CALLER," the content of which is expressly incorporated herein by reference in its entirety.
  • sequencing reads generated for a test sequencing library are mapped to a segment or segments within a region, or regions of interest.
  • the number of sequencing reads mapped at the segment(s) within region(s) of interest can then be determined.
  • a copy number likelihood model can then be determined which is used to set the transition probability of a copy number state given the observed number of mapped sequencing reads.
  • a hidden Markov model is built which includes the hidden layer, the observation layer and transition probabilities.
  • the hidden Markov model is parameterized.
  • the hidden Markov model includes at least two unknown parameters: the copy number state and the transition probabilities between the copy number state and observed number of sequencing reads, which are determined by the copy number likelihood model. Expectation-Maximization can be used to determine these parameters based on the best fit of the data (that is, parameterize the model) and to determine the most probable copy number.
  • the process may consider other variables that affect the observation states, such as GC content bias, spuriosity of a capture probe associated with a segment, and noisy test sequencing libraries, which affect the transition probabilities.
  • the additional variables can be treated as latent and determined by EM given the available data.
  • the transition probabilities are then adjusted to account for these other variables.
  • the EM process can be cumulative (adjusting for all variables at once) or it can adjust for the variables in separate EM iterations before the HMM is solved to determine a most probable copy number state of the segment.
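To make the hidden Markov model idea concrete, the following is a minimal decoding sketch. It is not the caller described in the referenced application: the parameters are fixed by hand instead of being fitted by Expectation-Maximization, emissions are assumed to be Poisson with a mean proportional to copy number, and the names `viterbi_copy_number`, `reads_per_copy`, and `stay_prob` are hypothetical. The copy number states must be positive so that the Poisson mean is nonzero, and at least two states are required.

```python
import math


def poisson_logpmf(k, lam):
    """Log probability of observing k reads under a Poisson mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_copy_number(read_counts, reads_per_copy,
                        states=(1, 2, 3), stay_prob=0.99):
    """Most probable copy number state for each segment (Viterbi decoding)."""
    n = len(states)
    switch = (1.0 - stay_prob) / (n - 1)
    log_trans = [
        [math.log(stay_prob if i == j else switch) for j in range(n)]
        for i in range(n)
    ]

    def emit(count, state):
        return poisson_logpmf(count, reads_per_copy * state)

    # Uniform prior over states for the first segment.
    score = [math.log(1.0 / n) + emit(read_counts[0], s) for s in states]
    back = []
    for count in read_counts[1:]:
        prev, score, pointers = score, [], []
        for j, s in enumerate(states):
            best_i = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best_i)
            score.append(prev[best_i] + log_trans[best_i][j] + emit(count, s))
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    idx = max(range(n), key=lambda i: score[i])
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [states[i] for i in path]


# Hypothetical mapped-read counts per segment, about 50 reads per gene copy.
calls = viterbi_copy_number([100, 98, 103, 52, 49, 51, 101, 99],
                            reads_per_copy=50)
```

The high `stay_prob` discourages the decoded path from changing state on a single noisy segment, which is the role the transition probabilities play in the description above.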
  • contacting the biological sample with the lysis buffer as described herein to form an extraction solution reduces the variability associated with determining a probe count. That is, contacting the biological sample with the lysis buffer reduces the likelihood that a given normalized probe count will (for example) aberrantly deviate from a two-copy average for probe count frequency and/or result in an incorrect raw probe count number.
  • the probe count variability of a biological sample pretreated with the lysis buffer as described herein is reduced by about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% or more as compared to an untreated biological sample.
  • the probe count variability of a biological sample pretreated with the lysis buffer as described herein is reduced by about 100%, 200%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, or 1000% or more as compared to an untreated biological sample.
  • nucleic acids are believed to be bound to proteins such as histone proteins.
  • the bound proteins are believed to interfere with isolation of the nucleic acid and/or preparation of the polynucleotide fragments used in a sequencing reaction as described herein. But by mixing the biological sample with the lysis buffer before isolating nucleic acids from the sample, it is believed the nucleic acids to be isolated can be released from the protein contaminants.
  • nucleic acids from the proteins results in cleaner isolation of the nucleic acids and/or generation of fragmented polynucleotides, thereby improving the probe count determinations as described herein.
  • pretreatment of the biological sample with the lysis buffer as described herein can substantially reduce the overall protein concentration in the biological sample as compared to a biological sample that has not been treated.
  • a biological sample that is pretreated with a lysis buffer that includes Triton X100 and/or SDS as described herein can reduce the overall protein concentration in a biological sample by about 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, or 85% as compared to an untreated biological sample.
  • using a lysis buffer that includes both SDS and Triton X100 as compared to lysis buffer that includes Triton X100 alone or SDS alone as the detergent component, can further enhance the level of protein concentration reduction in the biological sample.
  • using a lysis buffer that includes both Triton X100 and SDS may reduce overall protein concentration by 5%, 10%, 15%, 20%, or 25% more than the use of Triton X100 as the only detergent or SDS as the only detergent.
  • Overall protein concentration can be determined by a variety of methods known in the art, such as via a Bradford assay, Bio-Rad™ protein assay, the Lowry method, a NanoOrange® Protein Quantitation Kit, and others.
  • Saliva samples were collected with an Oragene™ collection kit in accordance with the manufacturer's guidelines. Briefly, a subject expels saliva in an Oragene™ collection tube to a fill line, the cap of the tube is closed, and the collection tube is inverted to mix the saliva with Oragene™ collection buffer located within the tube. If needed, the collection tube is shipped to a testing facility. Upon receipt, all samples are stored at ambient temperature until the sample is pretreated and extracted, as described herein.
  • This example describes the preparation of saliva samples for DNA isolation. More particularly, a batch saliva lysis buffer was prepared by mixing 62.2 g Triton X-100, 1.0 L of TE buffer (10.0 mM Tris, 0.1 mM EDTA, buffered to pH 8.0), and 2954 mL water. Once all of the components are in solution, the volume should be approximately 4015 mL. With the addition of the Triton X-100 to the TE buffer, the final concentrations of each of the components in the saliva lysis buffer were approximately 25.1 mM Triton X-100, 2.49 mM Tris-HCl, and 0.0249 mM EDTA. Thereafter, 50 µL of the saliva lysis buffer was pipetted into each well of a 96-well Nunc™ 2.2 mL round bottom deep well plate.
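The final concentrations quoted above follow from simple dilution arithmetic, sketched below. The Tris and EDTA values use only numbers stated in the example. The Triton X-100 value additionally needs a molar mass, which the example does not state; the 625 g/mol used here is an assumed average and is the reason the computed Triton X-100 figure lands slightly below the quoted 25.1 mM.

```python
def diluted_mM(stock_mM, stock_volume_L, final_volume_L):
    """Concentration after dilution, from C1 * V1 = C2 * V2."""
    return stock_mM * stock_volume_L / final_volume_L


def mass_to_mM(mass_g, molar_mass_g_per_mol, final_volume_L):
    """Millimolar concentration of a weighed solute in a final volume."""
    return mass_g / molar_mass_g_per_mol / final_volume_L * 1000.0


FINAL_VOLUME_L = 4.015  # approximate batch volume stated in the example

# 1.0 L of TE buffer (10.0 mM Tris, 0.1 mM EDTA) diluted into the batch.
tris_mM = diluted_mM(10.0, 1.0, FINAL_VOLUME_L)
edta_mM = diluted_mM(0.1, 1.0, FINAL_VOLUME_L)

# Assumption: average molar mass of Triton X-100, not given in the source.
TRITON_X100_G_PER_MOL = 625.0
triton_mM = mass_to_mM(62.2, TRITON_X100_G_PER_MOL, FINAL_VOLUME_L)

print(f"Tris {tris_mM:.2f} mM, EDTA {edta_mM:.4f} mM, Triton X-100 {triton_mM:.1f} mM")
```

Because Triton X-100 is a polydisperse mixture, the computed molarity shifts by a few percent depending on which average molar mass is assumed.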
  • This example describes the isolation of DNA from the lysed and pretreated saliva sample. Following the pretreatment of the saliva sample (see Example 2), DNA is isolated using an Agencourt Genfind™ v2 DNA isolation system, according to the manufacturer's instructions and as modified as described herein.
  • a magnetic field was then applied across the plate (via an Alpaqua™ EX) for 20 min, so that the magnetic beads and associated DNA are drawn to the bottom of the wells (i.e., a "pulldown" incubation was performed according to the manufacturer's instructions).
  • the well plate was moved to a deep well plate washer.
  • This plate washer is fitted with a strong magnetic base, which operates to further sequester the magnetic beads and associated DNA from the saliva samples while the supernatant (containing the non-DNA cellular components, e.g., proteins and lipids) is aspirated from the wells (BioTek™ ELx405 plate washer).
  • the aspiration leaves approximately 50-80 µL of supernatant, with the magnetic beads and associated DNA sequestered at the bottom of the well via the magnetic field.
  • the samples were then washed twice, following a similar pattern.
  • each well of the well plate was filled with wash reagent, mixed on an orbital shaker, and subjected to magnetic bead pulldown, followed by aspiration of the wash supernatant.
  • 800 µL of a high salt solution (Agencourt Genfind™ v2 - Wash 1) was added to each well of the well plate.
  • the plate was then mixed on the orbital shaker for 10 minutes at 1550 rpm.
  • the plate was subjected to the magnetic field for a 12-minute pulldown incubation.
  • the supernatant was then removed by aspiration, again leaving approximately 50-80 µL.
  • for the second wash, 750 µL of an ethanol-based wash (Agencourt Genfind™ v2 - Wash 2) was added to each well of the well plate.
  • the plate was then mixed on the orbital shaker for 5 minutes at 1550 rpm. After mixing, the plate was subjected to a magnetic pulldown for 8 minutes, and the supernatant was removed via aspiration (leaving 35-50 µL in the well).
  • the second (and final) wash removes as much of the wash and material not bound to the magnetic beads as possible.
  • the plate is incubated at approximately 50°C for about 10 min in order to evaporate off more of the second wash buffer.
  • This example describes the preparation of the isolated DNA for sequencing and the direct targeted sequencing (DTS) of the isolated DNA (from Example 3).
  • the DNA was sonicated to fragment it, and the ends were cleaned and prepared for ligation.
  • adapter oligos are ligated to the ends of the fragmented DNA.
  • These adapters contain molecular barcodes (for sample identification) as well as sequences required for Illumina™ sequencing (see U.S. Pat. Pub. 2014/0162278).
  • the fragmented DNA including the adapter sequences was then non-specifically amplified by PCR using the common sequences in the adapters as primer targets.
  • the amplified samples with the adapters attached were introduced to the flow cell which has already been prepared with genomic-specific probes (i.e., capture probes).
  • the probes contain sequences required for Illumina™ sequencing and a region homologous to the portion of the human genome of interest for sequencing.
  • the conditions were controlled (65°C for 2 hours) to allow the probes to hybridize to the regions with which they share homology (e.g., the regions of interest).
  • the homologous region binds only to the paired sequences of the fragmented and amplified DNA. Any genetic material that is not bound by a capture probe was washed off the flow cell at this point.
  • the collective set of probes includes many subsets of probes that were, in this example, identical to each other (and hence target the same DNA sequence), the subsets being different from other identical subsets of probes within the collective set of probes (the different subsets targeting different DNA sequences).
  • Copy number variants can be identified, for example, using a copy number variant caller, such as described in U.S. Pat. App. No. 62/476,361 (see above discussion).
  • the counts for the probes of a given flow-cell were normalized across all the samples on the flow cell.
  • the counts for each specific identical probe subset were added to determine frequency of occurrence of that probe for that patient.
  • a gene of interest will have many hundreds of probes which captured small regions of DNA from the biological sample, and collectively these probe frequency counts show an average number of probe counts over the region of interest for the DTS assays. The overall average of these counts will overwhelmingly center on two copies of a gene (one copy per chromosome). However, local deviations in a region of a sample may exist.
  • Figure 1 provides a schematic illustration of probe counts leading to CNV calls after normalization for multiple patient sample runs on a single flow cell.
  • Patient 1 has an even number of reads for all genes of interest, thus indicating that this patient has 2 copies of all genes sequenced.
  • Patient 2 has about 50% more reads (counts) for gene B, which is due to an extra copy of that gene (e.g., a 3-copy copy number variant).
  • Patient 3 has about 50% fewer reads for gene C, which is due to having only a single copy of that gene (e.g., a 1-copy copy number variant).
  • Figure 2 is a graph showing aggregation of multiple probe counts of a single patient blood sample processed as described in Examples 1-4 (but for blood), including pretreatment of the blood sample with lysis buffer (Example 2).
  • the graph (or "jitter plot") allows visualization of the variability among the multiple probe counts of a flow-cell run. More particularly, every black dot on the graph, separated along the x-axis - the x-axis being further subdivided into genes (delineated by the grey lines) examined in the assay - represents a specific, normalized probe count.
  • the center of the y-axis represents the average probe count of the sample, which corresponds to 2 copies of a gene (i.e., 2.0 on the y-axis).
  • the use of the lysis buffer in the pretreatment step shows tight banding of the normalized probe counts across the x-axis, indicating a low variability of the probe counts used in generation of the jitter plot.
  • Figure 3 is a graph showing aggregation of multiple probe counts of a single patient saliva sample processed as described in Example 1 and Examples 3-4, i.e., processing of the saliva sample without first pretreating the saliva sample with the lysis buffer of Example 2.
  • the average probe count of Figure 3 was algorithmically centered on 2 gene copies (i.e., 2.0 on the y-axis).
  • the normalized probe counts for untreated saliva samples had very high variance among a single patient's probes. This is especially evident when comparing Figure 2 and Figure 3.
  • pretreating the saliva sample as described herein in Example 2 substantially reduces probe count variability (compare Figure 2 (blood) and Figure 3 (saliva)).
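The comparison between a tightly banded plot and a highly variable one can be expressed numerically. The sketch below is one possible measure, not one defined in this disclosure: it assumes probe count variability is summarized as the standard deviation of a sample's normalized probe counts and reports the percent reduction between two conditions. The data values are invented for illustration.

```python
from statistics import pstdev


def percent_variability_reduction(untreated, treated):
    """Percent drop in the spread of normalized probe counts.

    Each argument is a list of normalized probe counts for one sample,
    centered on 2.0 for a two-copy gene. Spread is measured here as the
    population standard deviation (an assumption for illustration).
    """
    sd_untreated = pstdev(untreated)
    sd_treated = pstdev(treated)
    return 100.0 * (sd_untreated - sd_treated) / sd_untreated


# Hypothetical normalized probe counts, both centered on 2.0.
untreated_counts = [1.0, 3.0, 1.0, 3.0]   # wide scatter
treated_counts = [1.5, 2.5, 1.5, 2.5]     # tighter banding
reduction = percent_variability_reduction(untreated_counts, treated_counts)
```

With these invented values the spread falls from 1.0 to 0.5, a 50% reduction under this particular definition of variability.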
  • Figure 4 is a graph showing identification of a 1-copy CNV of the BRCA1 gene, as determined by aggregating, normalizing, and then comparing multiple probe counts of a single patient blood sample processed as described in Examples 1-4 (but for blood).
  • because the blood sample was pretreated with the lysis buffer as described in Example 2, the low probe count variability resulted in tightly grouped probe counts.
  • the BRCA1 gene was easily discerned by visualization (arrow).
  • the 1-copy CNV was also easily determined by probe count computation.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Physics & Mathematics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Immunology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

Provided herein are methods and compositions for reducing probe count variability of a biological sample. After obtaining a biological sample, such as a saliva sample, the sample is pretreated with a lysis buffer that includes a detergent to form an extraction solution. Nucleic acids are isolated from the extraction solution and fragmented into polynucleotide fragments, which are then mixed with homologous capture probes, which, for example, are bound to a flow cell of a direct targeted sequencing system. The capture probes bind to targeted sequences of the polynucleotide fragments, thereby capturing the targeted polynucleotide fragments. Based on binding of polynucleotides fragments to the homologous capture probes, a probe count is determined for the homologous probes. By mixing the lysis buffer with the biological sample to form the extraction solution, variability of the determined probe count is substantially reduced.

Description

EXTRACTION OF NUCLEIC ACIDS FOR REDUCED PROBE COUNT VARIABILITY
CROSS-REFERENCE TO RELATED APPLICATIONS
[001] This application claims priority benefit to U.S. Provisional Patent Application No. 62/530,779, filed July 10, 2017, which is titled "EXTRACTION OF NUCLEIC ACIDS FOR REDUCED PROBE COUNT VARIABILITY," and to U.S. Provisional Patent Application No. 62/466,789, filed March 3, 2017, which is also titled "EXTRACTION OF NUCLEIC ACIDS FOR REDUCED PROBE COUNT VARIABILITY." The entire disclosures of the above-identified priority applications are hereby fully incorporated herein by reference.
TECHNICAL FIELD
[002] The present disclosure relates generally to methods and compositions for extracting nucleic acids from a biological sample, and more particularly to methods and compositions for extracting nucleic acids from a biological sample so that probe counts determined from the extracted nucleic acids have reduced probe count variability.
BACKGROUND
[003] Direct Targeted Sequencing (DTS) is a next generation sequencing technique in which a sequencing substrate (i.e., a flow cell) becomes a genomic sequence capture substrate. Without adding another instrument to the normal flow of a typical next generation sequencing protocol, the DTS protocol modifies the sequencing surface to capture genomic DNA from a specially prepared library. The captured library is then sequenced as a normal gDNA library would be, and the captured and sequenced DNA can be used for a variety of diagnostic applications.
[004] Recent improvements to the DTS system involve the sequencing of specific target sequences in a sample. For example, barcode-tagged polynucleotides are sequenced simultaneously and sample sources are identified on the basis of barcode sequences. The sequencing data can then be used, for example, to determine one or more genotypes at one or more loci comprising a causal genetic variant. For example, a copy number variant of a gene can arise when a subject has more or less than two copies of a gene.
[005] To determine a copy number variant, the DTS system relies on determining interactions between capture probes used in the DTS system and the DNA fragments with which the probes interact. Determining such interactions, however, is complicated by the fact that the biological samples inherently contain a variety of cellular contaminants. These contaminants are believed to interfere with the sequencing process. Hence, determining interactions between the capture probes - and the DNA fragments that the probes are designed to capture - is challenging and often results in highly variable data. The variability of the data then makes identification of copy number variants difficult, and in some cases even impossible. As a result, multiple flow cells are often used in an effort to obtain data that is of sufficient quality to identify a copy number variant - an endeavor that is expensive, time-consuming, and not guaranteed to work. Improvements to the DTS process, and more particularly to the methods for determining interactions between the capture probes and DNA fragments for copy number variant calling, are therefore desirable.
SUMMARY
[006] In certain example aspects, provided herein is a method for reducing probe count variability of a biological sample. The method includes, for example, contacting a biological sample, such as a saliva sample, with a lysis buffer to form an extraction solution. The lysis buffer includes a detergent, and in certain aspects a metal chelator. In certain aspects, the detergent is a non-ionic detergent, such as Triton X100. In other aspects, the detergent is an anionic detergent, such as sodium dodecyl sulfate (SDS). In certain example aspects, the buffer of the lysis buffer is a Tris buffer. In certain example aspects, the lysis buffer includes 25.0 mM Triton X-100, 2.5 mM Tris-HCl, and 0.025 mM EDTA. In certain example aspects, the extraction solution formed by contacting the biological fluid sample with the lysis buffer has a ratio of about 1:12 (v/v) of lysis buffer to biological fluid sample.
[007] In addition to contacting the biological sample with a lysis buffer to form an extraction solution, in certain example aspects the method includes isolating polynucleotide fragments from the extraction solution. The polynucleotide fragments are then contacted with multiple homologous capture probes. The homologous capture probes, for example, bind to at least a portion of the polynucleotide fragments. A probe count for the multiple homologous capture probes is then determined, the probe count providing an indication of the number of polynucleotide fragments that bind the multiple homologous capture probes. By contacting the biological sample with a lysis buffer to form an extraction solution, the method reduces variability of the probe count.
[008] In certain other example aspects, provided is a composition for reducing probe count variability of a biological sample. The composition includes, for example, a detergent, a buffer, and a metal chelator. In certain example aspects, the detergent is a non-ionic detergent, such as Triton X100. In other example aspects, the detergent is an anionic detergent, such as SDS. In certain example aspects, the buffer is a Tris buffer. In certain example aspects, the metal chelator is EDTA. In certain example aspects, the lysis buffer composition includes 25 mM Triton X100, 0.02 mM EDTA, and 2.0 mM TRIS.
[009] These and other aspects, objects, features and advantages of the example embodiments will become apparent to those having ordinary skill in the art upon consideration of the following detailed description of illustrated example embodiments.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] Figure 1 is an illustration depicting probe counts leading to copy number variant (CNV) calls after normalization for multiple patient sample runs on a single flow cell of a direct targeted sequencing (DTS) system, in accordance with certain example embodiments.
[0011] Figure 2 is a graph showing aggregation of multiple probe counts of a single patient blood sample that was subjected to pretreatment with a lysis buffer, in accordance with certain example embodiments. As shown, tight banding of normalized probe counts occurs around 2 copies of the gene of interest when the blood sample is pretreated with lysis buffer.
[0012] Figure 3 is a graph showing aggregation of multiple probe counts of a single patient saliva sample processed without lysis buffer pretreatment, in accordance with certain example embodiments. As shown, the normalized probe counts are highly variable. This probe count variability is particularly evident when compared to the graph shown in Figure 2.
[0013] Figure 4 is a graph showing identification of a 1-copy CNV of the BRCA1 gene, as determined by aggregating, normalizing, and then comparing multiple probe counts of a single patient blood sample that was pretreated with lysis buffer, in accordance with certain example embodiments. As shown, tight banding occurs due to the lack of probe count variability. Notably, the BRCA1 copy number variant is easily discernable.
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
[0014] The embodiments described herein can be understood more readily by reference to the following detailed description, examples, and claims, and their previous and following description. Before the present systems, devices, compositions, and/or methods are disclosed and described, it is to be understood that the embodiments described herein are not limited to the specific systems, devices, compositions, and/or methods disclosed unless otherwise specified, as such can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular aspects only and is not intended to be limiting.
[0015] Further, the following description is provided as an enabling teaching of the various embodiments in their best, currently known aspect. Those skilled in the relevant art will recognize that many changes can be made to the aspects described, while still obtaining the beneficial results of this disclosure. It will also be apparent that some of the desired benefits of the present invention can be obtained by selecting some of the features of the various embodiments without utilizing other features. Accordingly, those who work in the art will recognize that many modifications and adaptations to the various embodiments described herein are possible and can even be desirable in certain circumstances and are a part of the present disclosure. Thus, the following description is provided as illustrative of the principles of the embodiments described herein and not in limitation thereof.
Summary of Terms & Nomenclature
[0016] The invention will now be described in detail by way of reference only using the following definitions and examples. All patents and publications, including all sequences disclosed within such patents and publications, referred to herein are expressly incorporated by reference in their entirety.
[0017] Unless defined otherwise herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Singleton, et al., DICTIONARY OF MICROBIOLOGY AND MOLECULAR BIOLOGY, 2D ED., John Wiley and Sons, New York (1994), and Hale & Marham, THE HARPER COLLINS DICTIONARY OF BIOLOGY, Harper Perennial, NY (1991) provide one of skill with a general dictionary of many of the terms used in this invention. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are described. Practitioners are particularly directed to Sambrook et al., 1989, and Ausubel FM et al., 1993, for definitions and terms of the art. It is to be understood that this invention is not limited to the particular methodology, protocols, and reagents described, as these may vary.
[0018] Unless otherwise indicated, nucleic acids are written left to right in 5' to 3' orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively.
[0019] The headings provided herein are not limitations of the various aspects or embodiments of the invention which can be had by reference to the specification as a whole. Accordingly, the terms defined immediately below are more fully defined by reference to the specification as a whole.
[0020] As used herein, the singular forms "a," "an" and "the" include plural referents unless the context clearly dictates otherwise.
[0021] Ranges or values can be expressed herein as from "about" one particular value, and/or to "about" another particular value. When such a range is expressed, another aspect includes from the one particular value of the range and/or to the other particular value of the range. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint. Similarly, when values are expressed as approximations, by use of the antecedent "about," it will be understood that the particular value forms another aspect. In certain example embodiments, the term "about" is understood as within a range of normal tolerance in the art, for example within 2 standard deviations of the mean. About can be understood as within 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.1%, 0.05%, or 0.01% of the stated value. Unless otherwise clear from context, all numerical values provided herein can be modified by the term about. Further, terms used herein such as "example," "exemplary," or "exemplified," are not meant to show preference, but rather to explain that the aspect discussed thereafter is merely one example of the aspect presented.
[0022] As used herein, the terms "amplification" or "amplify" refer to any process by which the copy number of a polynucleotide sequence is increased. Methods for primer-directed amplification of polynucleotides are known in the art, and include without limitation, methods based on the polymerase chain reaction (PCR). Conditions favorable to the amplification of polynucleotide sequences by PCR are known in the art, can be optimized at a variety of steps in the process, and depend on characteristics of elements in the reaction, such as polynucleotide type, concentration, sequence length to be amplified, sequence of the polynucleotide and/or one or more primers, primer length, primer concentration, polymerase used, reaction volume, ratio of one or more elements to one or more other elements, and others, some or all of which can be altered.
[0023] In general, PCR involves the steps of denaturation of the polynucleotide to be amplified (if double stranded), hybridization of one or more primers to the polynucleotide, and extension of the primers by a DNA polymerase, with the steps repeated (or "cycled") in order to amplify the polynucleotide sequence. Steps in this process can be optimized for various outcomes, such as to enhance yield, decrease the formation of spurious products, and/or increase or decrease specificity of primer annealing. Methods of optimization are well known in the art and include adjustments to the type or amount of elements in the amplification reaction and/or to the conditions of a given step in the process, such as temperature at a particular step, duration of a particular step, and/or number of cycles. In some example embodiments, an amplification reaction comprises at least 5, 10, 15, 20, 25, 30, 35, 50, or more cycles. In some example embodiments, an amplification reaction comprises no more than 5, 10, 15, 20, 25, 35, or 50 cycles. Cycles can contain any number of steps, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more steps. Steps can include any temperature or gradient of temperatures suitable for achieving the purpose of the given step, including but not limited to strand denaturation, primer annealing, and primer extension. Steps can be of any duration, including but not limited to about, less than about, or more than about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 70, 80, 90, 100, 120, 180, 240, 300, 360, 420, 480, 540, 600, or more seconds, including indefinitely until manually interrupted. Cycles of any number comprising different steps can be combined in any order.
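To make the effect of cycle number concrete, the following Python sketch uses the standard idealized model in which each cycle multiplies the template count by (1 + efficiency). The function name `ideal_copies` and the `efficiency` parameter are assumptions introduced for illustration and are not part of the disclosure; real reactions plateau and deviate from this model.

```python
def ideal_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` PCR cycles.

    `efficiency` is the assumed fraction of templates duplicated per cycle
    (1.0 means perfect doubling). Illustrative model only.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# 100 starting copies, 10 cycles of perfect doubling: 100 * 2**10 copies.
print(ideal_copies(100, 10))
```

Under this model the yield is exponential in the number of cycles, which is why small changes in cycle number or per-cycle efficiency have a large effect on the final amount of product.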
[0024] As used herein, "biological sample" refers to a sample obtained from a subject, including a sample of biological tissue or fluid origin obtained in vivo or in vitro. Such samples can be from, without limitation, body fluids, organs, tissues, fractions, and cells isolated from a biological subject. Biological samples can also include extracts from a biological sample, such as for example an extract from a biological fluid (e.g., blood or urine). Samples can be obtained from a subject, such as a cell sample, tissue sample, fluid sample, or organ sample derived therefrom (or cell cultures derived from any of these), including, for example, cultured cell lines, biopsy, blood sample, cheek swab, or fluid sample containing a cell (e.g., saliva). In certain example embodiments, the sample includes genomic DNA. In some embodiments, samples include mitochondrial DNA, chloroplast DNA, plasmid DNA, bacterial artificial chromosomes, yeast artificial chromosomes, oligonucleotide tags, or combinations thereof.
[0025] As used herein, a "biological fluid" or "biological fluid sample" refers to any bodily fluid (e.g., blood, blood plasma, sputum, lavage fluid, cerebrospinal fluid, urine, semen, sweat, tears, saliva, and the like), as well as solid tissues that have, at least in part, been converted to a fluid form through one or more known protocols or from which a fluid has been extracted. For example, a liquid tissue extract, such as from a biopsy, can be a biological fluid sample. In certain examples, a biological fluid sample is a saliva sample collected from a subject. In certain examples, the biological fluid sample is a blood sample collected from a subject. As used herein, the terms "blood," "plasma" and "serum" include fractions or processed portions thereof. Similarly, where a sample is taken from a biopsy, swab, smear, etc., the "sample" encompasses a processed fraction or portion derived from the biopsy, swab, smear, etc.
[0026] As used herein, a "chaotropic agent" refers generally to substances that, without being bound by any particular theory, are thought to disrupt the three dimensional hydrate shell structure of water. Chaotropic agents are understood to interfere with stabilizing intra-molecular interactions mediated by non-covalent forces, such as hydrogen bonds, Van der Waals forces, and/or hydrophobic effects. With regard to DNA, chaotropic agents are believed to disrupt the stabilizing hydrate shell that forms around DNA in an aqueous solution. Some inorganic, organic, and/or mixed salts can have chaotropic properties. Without wishing to be bound by any particular theory, such salts are thought to act, for example, by shielding charges and preventing the stabilization of salt bridges. Example chaotropic agents include guanidinium salts generally, guanidinium isothiocyanate, guanidinium chloride, urea, alkali salts, and sodium dodecyl sulfate.
[0027] As used herein, a "copy number variant" refers to a variation in the number of copies of a particular gene in a genome. As those skilled in the art will appreciate, in human subjects, for example, the genome generally has two copies of most genes - one inherited paternally and the other inherited maternally. Occasionally, alterations in parental chromosomes can lead to the gain or loss of a copy of the gene. A deletion can occur, for example, when a fragment of DNA is lost, such as during copying, or when the genes shuffle during meiosis. Similarly, a duplication can occur when an additional copy of a gene is gained. Typically, deletions and duplications of greater than about 1,000 nucleotides are considered copy number variants, although the present disclosure is not intended to be constrained by this value. As those skilled in the art will appreciate, a difference in the copy number of a gene can increase or decrease the level of that gene's activity. For example, when a copy of a gene is deleted, the cell may produce half as much protein from the gene as compared to a normal cell. Many disease states are associated with changes in gene copy number.
[0028] As used herein, a "DNA-binding particle" refers to any conventional solid-phase material that interacts with, or that has been modified to interact with, a DNA fragment. The solid-phase material, for example, is any type of insoluble, usually rigid material, matrix, or stationary phase material that interacts with DNA, either directly or indirectly. In certain example embodiments, the DNA-binding particle is a bead.
[0029] As used herein, a "bead" refers to a solid-phase particle of any convenient size, and can have an irregular or regular shape. In certain example embodiments, the surface of the bead is modified to bind nucleic acids, either directly and/or indirectly. For example, the bead can include silanol groups, carboxylic groups, or other groups that facilitate the direct and/or indirect interaction of the bead with DNA. In certain example embodiments, silica beads (and gels) can be functionalized by adding primary amines, thiols, sulfhydryls, propyl, octyl, as well as other derivatives to the hydroxyl group (silanol) attached to silica. The bead can be fabricated from any number of known materials, including cellulose, cellulose derivatives, acrylic resins, glass, silica gels, polystyrene, gelatin, polyvinyl pyrrolidone, co-polymers of vinyl and acrylamide, polystyrene cross-linked with divinylbenzene or the like, polyacrylamides, latex gels, dextran, rubber, silicon, plastics, nitrocellulose, natural sponges, controlled pore glass (CPG), metals, cross-linked dextrans (e.g., Sephadex®), agarose gel (Sepharose®), and other solid phase bead supports known to those of skill in the art. In certain example embodiments, the beads can be packed together so as to form a column that can be used with conventional column chromatography.
[0030] In certain examples, the beads are magnetic beads, and more particularly paramagnetic beads, meaning that the beads are magnetic only in the presence of a magnetic field. As those skilled in the art will appreciate, magnetic particles can include an iron-oxide core coated with silane. Magnetic particles useful for magnetic DNA purification can be made from synthetic polymers, porous glass, or metallic materials like iron-oxide. The particles can be coated with functional groups or, in certain examples, can be left uncoated. While coated particles bound with carboxylic acid are more efficient at binding DNA, other molecules such as streptavidin or those containing free thiol groups can also be attached to the silane coat. In certain examples, high yield magnetic particles that do not require a coating may be desired, as the lack of a coating and functional groups can allow for a higher surface area for binding nucleic acid. Additionally, particles without a coating are more responsive to an applied electric field. Examples of magnetic beads include silica-based magnetic beads and carboxylated magnetic beads. Example silica-based magnetic beads include Dynabeads™ MyOne™ Silane beads, available from ThermoFisher Scientific™.
[0031] As used herein, the term "elution" or "eluting" refers generally to the process of extracting one material from another by washing with a solvent to remove adsorbed material from an adsorbent. In certain example embodiments, elution is used to remove DNA that is bound directly and/or indirectly to a DNA-binding particle. The eluate is the product that results from the elution process.
[0032] As used herein, the term "genetic marker" refers generally to any gene or short genetic sequence that is known or understood to be associated with a disease condition of a subject. The genetic marker can be a variation (which may arise due to mutation or alteration in the genomic loci) that can be observed. A genetic marker can be a short DNA sequence, such as a sequence surrounding a single base-pair change (single nucleotide polymorphism, SNP), or a long one, like a minisatellite. In certain example embodiments, a genetic marker can be, for example, the presence or absence of a gene or short genetic sequence, which may provide an indication of a disease state.
[0033] As used herein, the terms "polynucleotide," "nucleotide," "nucleotide sequence," "nucleic acid," and "oligonucleotide" are used interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three dimensional structure, and may perform any function, known or unknown. The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, intergenic DNA, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), small nucleolar RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, adapters, and primers. A polynucleotide may comprise modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component, tag, reactive moiety, or binding partner. Polynucleotide sequences, when provided, are listed in the 5' to 3' direction, unless stated otherwise.
[0034] As used herein, the terms "isolate" and "purify" are used interchangeably and mean to reduce by about 1%, 2%, 3%, 4%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%, or more, the amount of heterogeneous elements, for example biological macromolecules such as DNA, that may be present in a biological sample. The presence of DNA and other nucleic acids can be assayed by any appropriate method including gel electrophoresis and staining and/or assays employing polymerase chain reaction.
[0035] As used herein, the term "target polynucleotide" refers to a nucleic acid molecule or polynucleotide in a population of nucleic acid molecules having a target sequence to which one or more oligonucleotides, such as the capture probes described herein, are designed to hybridize. In some example embodiments, a target sequence uniquely identifies a sequence derived from a sample, such as a particular genomic, mitochondrial, bacterial, viral, or RNA (e.g., mRNA, miRNA, primary miRNA, or pre-miRNA) sequence. In some embodiments, a target sequence is a common sequence shared by multiple different target polynucleotides, such as a common adapter sequence joined to different target polynucleotides. "Target polynucleotide" may also be used to refer to a double-stranded nucleic acid molecule comprising a target sequence on one or both strands, or a single-stranded nucleic acid molecule comprising a target sequence, and may be derived from any source of or process for isolating or generating nucleic acid molecules. A target polynucleotide may comprise one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) target sequences, which may be the same or different. In general, different target polynucleotides comprise different sequences, such as one or more different nucleotides or one or more different target sequences.
[0036] As used herein, the terms "hybridization" and "annealing" refer to a reaction in which one or more polynucleotides react to form a complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues. The hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner. The complex may comprise two strands forming a duplex structure, three or more strands forming a multi-stranded complex, a single self-hybridizing strand, or any combination of these. A hybridization reaction may constitute a step in a more extensive process, such as the initiation of a PCR, or the enzymatic cleavage of a polynucleotide by a ribozyme. A first sequence that can be stabilized via hydrogen bonding with the bases of the nucleotide residues of a second sequence is said to be "hybridizable" to the second sequence. In such a case, the second sequence can also be said to be hybridizable to the first sequence.
[0037] As used herein, a "complement" of a given sequence is a sequence that is fully complementary to and hybridizable to the given sequence. In general, a first sequence that is hybridizable to a second sequence or set of second sequences is specifically or selectively hybridizable to the second sequence or set of second sequences, such that hybridization to the second sequence or set of second sequences is preferred (e.g., thermodynamically more stable under a given set of conditions, such as stringent conditions commonly used in the art) to hybridization with non-target sequences during a hybridization reaction. Typically, hybridizable sequences share a degree of sequence complementarity over all or a portion of their respective lengths, such as between 25%-100% complementarity, including at least about 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, and 100% sequence complementarity.
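The notion of percent sequence complementarity can be illustrated with a short Python sketch. It assumes two DNA sequences of equal length, both written 5' to 3', compared antiparallel with no gaps; the function names are hypothetical, and the disclosure does not specify any particular algorithm for computing complementarity.

```python
_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(first: str, second: str) -> float:
    """Percentage of positions at which `first` base-pairs with `second`.

    Simplifying assumptions: equal lengths, antiparallel alignment over the
    full length, Watson-Crick pairs only, no gaps or partial overlaps.
    """
    if not first or len(first) != len(second):
        raise ValueError("sequences must be non-empty and of equal length")
    ideal_partner = reverse_complement(second)
    matches = sum(a == b for a, b in zip(first.upper(), ideal_partner))
    return 100.0 * matches / len(first)


print(percent_complementarity("ATGC", "GCAT"))  # fully complementary pair
print(percent_complementarity("ATGC", "GCAA"))  # one mismatched position
```

A practical comparison would typically use an alignment tool that handles gaps and partial overlaps; the sketch only shows what the percentage expresses.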
[0038] As used herein, the term "hybridized" as applied to a polynucleotide refers to a polynucleotide in a complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues. The hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner. The complex may comprise two strands forming a duplex structure, three or more strands forming a multi-stranded complex, a single self-hybridizing strand, or any combination of these. The hybridization reaction may constitute a step in a more extensive process, such as the initiation of a PCR, ligation reaction, sequencing reaction, or cleavage reaction.
[0039] As used herein, the term "homologous" denotes a characteristic of a nucleic acid sequence, wherein a nucleic acid sequence has at least about 60 percent sequence identity as compared to a reference sequence, typically at least about 75 percent sequence identity, and preferably at least about 95 percent sequence identity as compared to a reference sequence. The percentage of sequence identity is calculated excluding small deletions or additions which total less than 25 percent of the reference sequence. The reference sequence may be a subset of a larger sequence, such as a portion of a gene or flanking sequence, or a repetitive portion of a chromosome. However, the reference sequence is at least 12-18 nucleotides long, typically at least about 30 nucleotides long, and preferably at least about 50 to 100 nucleotides long. In general, recombination efficiency increases with the length of the targeting polynucleotide portion that is substantially complementary to a reference sequence present in the target DNA.
[0040] As used herein, the term "capture probe" refers to an oligonucleotide sequence that is hybridizable to a target polynucleotide. For example, in a direct targeting sequencing (DTS) system, a capture probe can be used to hybridize with a target polynucleotide fragment and then localize that fragment to the substrate of the DTS system, either directly or indirectly. In certain example embodiments, capture probes are "homologous capture probes" when they share sequence identity with each other and hence target the same target polynucleotide sequence. For example, homologous capture probes may be about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to each other. Conversely, heterologous capture probes are probes that share less sequence identity and hence target different target polynucleotide sequences. In certain example embodiments, a set of capture probes may include a subset of capture probes that are homologous to each other but that are heterologous to other subsets of homologous capture probes in the set of capture probes.
[0041] As used herein, a "probe count" refers generally to the collective number of target polynucleotide fragments that a plurality of homologous capture probes hybridizes to and captures when contacted with fragments of a genomic library of a subject, as determined, for example, by a sequencing reaction. For example, if a plurality of homologous probes is mixed with a fragmented genomic library of a subject, direct targeting sequencing may determine that the plurality of homologous capture probes binds 500 target polynucleotide fragments. In this example, the probe count for the plurality of homologous capture probes is 500. In another example, a different plurality of homologous probes, such as those directed to a different target polynucleotide, may bind 300 target polynucleotide fragments from the same subject. For that plurality of homologous probes, the probe count is 300. In certain example embodiments, each of the probes of the plurality of homologous probes is identical to the others, i.e., they have the same sequence and are hence directed to the same target polynucleotide. In certain example embodiments, for the sequencing reaction to determine a probe count, fragmented polynucleotides are captured, bound to a substrate of the flow cell, and imaged via an Illumina sequencing system. In certain example embodiments, the probe count can be used to identify a copy number variant.
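The probe count definition above amounts to tallying captured fragments per homologous probe group. The Python sketch below reproduces the 500 and 300 counts from the example and then computes a coefficient of variation across repeated measurements. The group labels, the function names, and the choice of coefficient of variation as a variability measure are assumptions made for illustration; this passage does not specify how probe count variability is quantified.

```python
from collections import Counter
from statistics import mean, stdev


def probe_counts(captures):
    """Tally captured fragments per homologous probe group.

    `captures` is an iterable with one probe-group label per captured
    target polynucleotide fragment (labels are hypothetical).
    """
    return Counter(captures)


def coefficient_of_variation(counts):
    """Sample standard deviation divided by the mean of repeated probe counts."""
    if len(counts) < 2:
        raise ValueError("need at least two counts")
    average = mean(counts)
    if average == 0:
        raise ValueError("mean probe count is zero")
    return stdev(counts) / average


# One probe group captures 500 fragments, another captures 300.
counts = probe_counts(["group_A"] * 500 + ["group_B"] * 300)
print(counts["group_A"], counts["group_B"])

# Hypothetical repeated probe counts for one group across replicate extractions.
print(coefficient_of_variation([90, 100, 110]))
```

A lower coefficient of variation across replicates would correspond to reduced probe count variability under this assumed measure.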
[0042] As used herein, a "subject" refers to an animal, including a vertebrate animal. The vertebrate can be a mammal, for example, a human. In certain examples, the subject can be a human patient. A subject can be a "patient," for example, a human or veterinary patient suffering from or suspected of suffering from a disease or condition, and can be in need of treatment or diagnosis or in need of monitoring for the progression of the disease or condition. The patient can also be on a treatment therapy that needs to be monitored for efficacy. A mammal refers to any animal classified as a mammal, including, for example, humans, chimpanzees, domestic and farm animals, as well as zoo, sports, or pet animals, such as dogs, cats, cattle, rabbits, horses, sheep, pigs, and so on.
[0043] As used herein, a "detergent" refers generally to a surfactant or a mixture of surfactants. The surfactant molecule, for example, is an amphipathic molecule that contains both hydrophobic and hydrophilic groups. For example, the surfactant molecules generally contain a polar, hydrophilic group (head) at the end of a long hydrophobic carbon chain (tail). The term "non-ionic detergent" refers generally to a detergent (surfactant) molecule that contains one or more uncharged, hydrophilic head groups. "Ionic detergents" include a hydrophobic chain and a charged head group that can be either anionic or cationic. An "anionic detergent" refers generally to a detergent (surfactant) that carries a negative charge, while a "cationic detergent" carries a positive charge. Detergents also include zwitterionic detergents.
Example Embodiments
[0044] Provided herein are methods and compositions for reducing probe count variability of a biological sample. Generally, after obtaining a biological sample, such as a saliva sample, the sample is pretreated with a lysis buffer that includes a detergent, thus forming an extraction solution. Nucleic acids from the extraction solution are then used, for example, in a direct sequencing protocol, such as a direct targeting sequencing (DTS) reaction. For example, the nucleic acids are isolated from the extraction solution and fragmented into polynucleotide fragments, which are then mixed with homologous capture probes. The capture probes bind to targeted sequences of the polynucleotide fragments, thereby capturing the targeted polynucleotide fragments and, for example, facilitating their binding to the substrate of the DTS system. Based on binding of polynucleotide fragments to the homologous capture probes, a probe count is determined for the homologous group of capture probes. By mixing the lysis buffer with the biological sample to form the extraction solution, variability of the determined probe count is substantially reduced.
Sample Collection & Preservation
[0045] In accordance with the methods and compositions described herein, a biological sample is collected from a subject. The biological sample can be any type of biological sample as described herein. In certain example embodiments, the samples are from the same subject, from different subjects, or combinations thereof. In certain example embodiments, a sample includes nucleic acids from a single subject. In some example embodiments, a sample includes multiple nucleic acids from two or more subjects.
[0046] A biological sample can be collected by a variety of conventional collection methods. When the biological sample is saliva, for example, collection can include the collection of passive drool, use of an oral swab to collect saliva, or simply having a subject, such as a human patient, expel saliva into a collection vessel. Various commercial kits are also available for saliva collection. For example, a saliva sample can be collected using a commercially available saliva collection kit (e.g., Oragene™). Once the biological sample is collected, the sample can be stored for later use, for example, or can be immediately processed for a probe count determination.
[0047] In certain example embodiments, a conventional preservative can be added to the collected sample to extend the shelf life of the sample. Example preservatives include formalin, formaldehyde, alcohol, and imidazolidinyl urea. Additional examples of potential preservatives include octadecyldimethylbenzyl ammonium chloride, hexamethonium chloride, benzalkonium chloride (a mixture of alkylbenzyldimethylammonium chlorides in which the alkyl groups are long-chain compounds), and benzethonium chloride. Other types of preservatives include aromatic alcohols such as phenol, butyl and benzyl alcohol, alkyl parabens such as methyl or propyl paraben, catechol, resorcinol, cyclohexanol, 3-pentanol, and m-cresol.
[0048] Additionally, in certain example embodiments a metal chelator can also be added to the sample before storing the sample. For example, certain biological samples contain substantial amounts of salts and hence dissociated ions, such as calcium and magnesium. Thus, the addition of a chelating agent to the biological sample can reduce the salt load of the collected sample, thereby stabilizing the sample for storage. Any suitable chelating agent or combination of chelating agents can be used in accordance with the methods described herein. Specific metal chelators include, for example, ethylenediaminetetraacetic acid (EDTA), ethylene glycol-bis(β-aminoethyl ether)-N,N,N',N'-tetraacetic acid (EGTA), as well as other conventional chelators.
Pretreatment of Biological Sample with Lysis Buffer
[0049] In order to reduce probe count variability as described herein, at least a portion of the collected biological sample is pretreated with a lysis buffer by mixing the sample with a lysis buffer, thereby forming an extraction solution. The lysis buffer includes, for example, a suitable buffer and detergent. In certain example embodiments, the lysis buffer can also include a metal chelator. The lysis buffer is prepared, for example, by combining the detergent, the buffer, and the chelator. The prepared buffer is then mixed with at least a portion of the collected biological sample to form the extraction solution.
[0050] The detergent of the lysis buffer can be any detergent that, when used in the pretreatment step described herein, reduces probe count variability of the sample, such as during a direct targeted sequencing analysis. Non-limiting examples of the disclosed detergents include sodium dodecyl sulfate (SDS), deoxycholate and cholate, sarcosyl or sodium lauroyl sarcosinate, the Triton family of detergents (e.g., Triton X100, Triton X114, Triton X102, Triton X165, Nonidet P40 [NP-40], Igepal™ CA-630, and derivatives thereof), n-dodecyl-β-D-maltoside and other maltosides, digitonin, the Tween family of detergents (e.g., Tween 20 and Tween 80), as well as zwitterionic detergents (e.g., 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate, better known as CHAPS). In certain embodiments, a chaotropic agent can be substituted for the detergent and/or used in combination with the detergent.
[0051] In certain example embodiments, the detergent of the lysis buffer is Triton X100. The concentration of the Triton X100 in the lysis buffer, for example, can be at least about 5 mM, for example, at least about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 mM or greater. In certain example embodiments, the concentration of the Triton X100 in the lysis buffer does not exceed 50, 55, 60, 65, 70, 75, 80, or 85 mM. In certain embodiments, the concentration of the Triton X100 in the lysis buffer can be approximately 25 mM; for example, the concentration of the Triton X100 in the lysis buffer can be about 24.1, 24.2, 24.3, 24.4, 24.5, 24.6, 24.7, 24.8, 24.9, 25.0, 25.1, 25.2, 25.3, 25.4, 25.5, 25.6, 25.7, 25.8, 25.9, 26.0, 26.1, 26.2, 26.3, 26.4, 26.5, 26.6, 26.7, 26.8, 26.9, or 27.0 mM.
[0052] In certain example embodiments, the detergent of the lysis buffer can be sodium dodecyl sulfate (SDS). The concentration of the SDS in the lysis buffer, for example, can be about 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 mM. In certain example embodiments, the SDS concentration of the lysis buffer may not exceed 6.5 mM. In other example embodiments, the concentration of SDS in the lysis buffer can be higher, such as about 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 mM. As those skilled in the art will appreciate, however, SDS can have a negative impact on PCR, and hence high concentrations of SDS may be disadvantageous for the methods and compositions described herein.
[0053] In certain example embodiments, the lysis buffer described herein can include a mixture of different detergents. For example, the lysis buffer can include a Triton detergent, such as Triton X100, as well as SDS. In certain example embodiments, the lysis buffer can include, as the detergent, Triton X100 at a concentration of about 15 to 35 mM, such as about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 mM or greater. In certain example embodiments the concentration of the Triton X100 can be about 25 mM. The lysis buffer can also include SDS in a concentration of about 1-10 mM, such as about 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 mM. In certain example embodiments, the SDS concentration of the lysis buffer may not exceed about 6.5 mM. In other example embodiments, the concentration of SDS in the lysis buffer can be higher, such as about 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 mM.
[0054] With regard to the buffer of the lysis buffer, the buffer can be any buffer that serves to provide the desired pH of the lysis buffer. In certain example embodiments, the buffer can be a histidine buffer, citrate buffer, succinate buffer, acetate buffer, gluconate buffer, or phosphate buffer (e.g., phosphate buffered saline) or mixtures thereof. In certain example embodiments, the lysis buffer can include a HEPES buffer (4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid), a TRICINE buffer (N-(tri(hydroxymethyl)methyl)glycine), a TRIS buffer (tris(hydroxymethyl)aminomethane), a BICINE buffer (2-(bis(2-hydroxyethyl)amino)acetic acid), a TAPS buffer (tris(hydroxymethyl)methylaminopropanesulfonic acid), or combinations thereof.
[0055] In certain example embodiments, the lysis buffer is buffered to a pH of from 6 to 10. For example, the pH of the lysis buffer can be about 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, or 10. In other embodiments, the pH can be neutral to slightly basic, for example, a pH of about 7.0, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8.0, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, or 9.0. In certain embodiments, the lysis buffer is a Tris buffer, such as a Tris-HCl buffer, having a pH of around 8.0. The disclosed Tris-HCl lysis buffer can have a pH of, for example, about 7.0, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8.0, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, or 9.0. In certain example embodiments, when the buffer is a Tris-HCl buffer, the concentration of the Tris-HCl in the lysis buffer is about 2.50 mM. In a further example embodiment, the concentration of the Tris-HCl in the lysis buffer can be about 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, or 3.5 mM.
[0056] As noted above, in certain embodiments a metal chelator can also be included in the lysis buffer. Any suitable chelating agent or combination of chelating agents can be used in accordance with the methods described herein. Specific metal chelators include, for example, ethylenediaminetetraacetic acid (EDTA), ethylene glycol-bis(β-aminoethyl ether)-N,N,N',N'-tetraacetic acid (EGTA), as well as other conventional chelators. When EDTA is used as a chelating agent, the concentration of the EDTA in the lysis buffer can be about 25 μM. For example, the lysis buffer can have an EDTA concentration of about 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 μM.
[0057] To pretreat the biological sample with the lysis buffer, at least a portion of the collected biological sample is mixed with the lysis buffer to form an extraction solution. For example, the lysis buffer and the biological sample can be mixed together at a ratio of about 1:10 lysis buffer to biological sample. In certain embodiments, the ratio of lysis buffer to biological sample in the extraction solution is about 1:5, 1:6, 1:7, 1:8, 1:9, 1:10, 1:11, 1:12, 1:13, 1:14, 1:15, 1:16, 1:17, 1:18, 1:19, or 1:20 v/v. As those skilled in the art will appreciate based on this disclosure, for example, lower volumes of lysis buffer are needed in the extraction solution when the lysis buffer includes a higher concentration of detergent.
[0058] In certain example embodiments, following the formation of the extraction solution (i.e., following the mixing of the lysis buffer with the biological sample), the extraction solution can be incubated for about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 minutes. In certain example embodiments, the extraction solution can be incubated, for example, at or about room temperature, for example, at about 20 °C, 21 °C, 22 °C, 23 °C, 24 °C, or 25 °C. In other example embodiments, the extraction solution can be warmed above room temperature, for example, to about 26 °C, 27 °C, 28 °C, 29 °C, 30 °C, 31 °C, 32 °C, 33 °C, 34 °C, 35 °C, 36 °C, 37 °C, 38 °C, 39 °C, 40 °C, 41 °C, 42 °C, 43 °C, 44 °C, 45 °C or greater for the incubation. In certain example embodiments, DNA from the extraction solution is immediately subjected to nucleic acid isolation without any incubation period.
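The relationship in paragraph [0057] between detergent concentration and the volume of lysis buffer needed can be worked through with simple dilution arithmetic. The Python sketch below assumes that volumes are additive and that the biological sample itself contributes no detergent; both function names are hypothetical and the numbers are only the example values given above (25 mM Triton X100, 1:10 v/v).

```python
def final_concentration(stock_mM: float, buffer_parts: float, sample_parts: float) -> float:
    """Detergent concentration in the extraction solution after mixing.

    Assumes additive volumes and no detergent in the sample itself.
    """
    if buffer_parts <= 0 or sample_parts < 0:
        raise ValueError("buffer_parts must be positive and sample_parts non-negative")
    return stock_mM * buffer_parts / (buffer_parts + sample_parts)


def buffer_volume_needed(target_mM: float, stock_mM: float, sample_volume: float) -> float:
    """Lysis buffer volume that brings `sample_volume` to `target_mM` detergent."""
    if not 0 < target_mM < stock_mM:
        raise ValueError("target must be positive and below the stock concentration")
    return target_mM * sample_volume / (stock_mM - target_mM)


# 25 mM Triton X100 lysis buffer mixed 1:10 (v/v) with sample: 25/11 mM final.
final = final_concentration(25.0, 1, 10)
print(round(final, 3))

# Doubling the stock concentration lowers the buffer volume needed to reach
# the same final concentration in 10 volume units of sample.
print(round(buffer_volume_needed(final, 25.0, 10.0), 3))
print(round(buffer_volume_needed(final, 50.0, 10.0), 3))
```

The second function makes the stated trend explicit: for a fixed target concentration in the extraction solution, the required lysis buffer volume falls as the detergent concentration of the lysis buffer rises.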
[0059] In certain example embodiments, also provided is a composition for reducing probe count variability. For example, the composition can be used to contact the biological sample before nucleic acids from the sample are isolated as described herein. In certain example embodiments, the composition includes a detergent and a suitable buffer. The detergent of the composition can be any detergent that, when used in the pretreatment step described herein, reduces probe count variability of the sample, such as during a direct targeted sequencing analysis. Non-limiting examples of the detergents of the composition include sodium dodecyl sulfate (SDS), deoxycholate and cholate, sarcosyl or sodium lauroyl sarcosinate, the Triton family of detergents (e.g., Triton X100, Triton X114, Triton X102, Triton X165, Nonidet P40 [NP-40], Igepal™ CA-630, and derivatives thereof), n-dodecyl-β-D-maltoside and other maltosides, digitonin, the Tween family of detergents (e.g., Tween 20 and Tween 80), as well as zwitterionic detergents (e.g., 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate, better known as CHAPS). In certain embodiments, a chaotropic agent can be substituted for the detergent and/or used in combination with the detergent.
[0060] In certain example embodiments, the detergent of the composition is Triton X100. The concentration of the Triton X100 in the composition, for example, can be at least about 15 mM, for example, at least about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 mM or greater. In certain example embodiments, the concentration of the Triton X100 in the composition does not exceed about 50, 55, 60, 65, 70, 75, 80, or 85 mM. In certain embodiments, the concentration of the Triton X100 in the composition can be approximately 25 mM, for example, about 24.1, 24.2, 24.3, 24.4, 24.5, 24.6, 24.7, 24.8, 24.9, 25.0, 25.1, 25.2, 25.3, 25.4, 25.5, 25.6, 25.7, 25.8, 25.9, 26.0, 26.1, 26.2, 26.3, 26.4, 26.5, 26.6, 26.7, 26.8, 26.9, or 27.0 mM.
[0061] In certain example embodiments, the detergent of the composition is sodium dodecyl sulfate (SDS). The concentration of the SDS in the composition, for example, can be about 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 mM. In certain example embodiments, the SDS concentration of the composition may not exceed 6.5 mM. In other example embodiments, the concentration of SDS in the composition can be higher, such as about 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 mM. As those skilled in the art will appreciate, however, SDS can have a negative impact on PCR, and hence high concentrations of SDS may be disadvantageous when used with methods and compositions described herein.
[0062] In certain example embodiments, the composition can include a mixture of different detergents. For example, the composition can include a Triton detergent, such as Triton X100, as well as SDS. In certain example embodiments, the composition can include, as the detergent, Triton X100 at a concentration of about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 mM or greater, such as about 25 mM. The composition can also include SDS at about 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 mM. In certain example embodiments, the SDS concentration of the composition may not exceed 6.5 mM. In other example embodiments, the concentration of SDS in the composition can be higher, such as about 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 mM.
[0063] The buffer component of the composition can be any buffer that serves to provide the desired pH of the composition. In certain example embodiments, the buffer can be a histidine buffer, citrate buffer, succinate buffer, acetate buffer, gluconate buffer, or phosphate buffer (e.g., phosphate buffered saline) or mixtures thereof. In certain example embodiments, the buffer of the composition can include a HEPES buffer (4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid), a TRICINE buffer (N-(tri(hydroxymethyl)methyl)glycine), a TRIS buffer (tris(hydroxymethyl)aminomethane), a BICINE buffer (2-(bis(2-hydroxyethyl)amino)acetic acid), a TAPS buffer (tris(hydroxymethyl)methylaminopropanesulfonic acid), or combinations thereof.
[0064] In certain example embodiments, the composition is buffered to a pH of from about 6 to 10. For example, the pH of the composition can be about 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, or 10. In other example embodiments, the pH can be neutral to slightly basic, for example, a pH of about 7.0, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8.0, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, or 9.0. In certain embodiments, the buffer is a Tris buffer, such as a Tris-HCl buffer, having a pH of around 8.0. The disclosed Tris-HCl composition can have a pH of, for example, about 7.0, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8.0, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, or 9.0. In certain example embodiments, when the buffer of the composition is a Tris-HCl buffer, the concentration of the Tris-HCl in the composition is about 2.50 mM. In a further example embodiment, the concentration of the Tris-HCl in the composition can be about 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, or 3.5 mM.
[0065] In certain example embodiments, the composition can include a metal chelator. Any suitable chelating agent or combination of chelating agents can be used in accordance with the methods described herein. Specific metal chelators include, for example, ethylenediaminetetraacetic acid (EDTA), ethylene glycol-bis(β-aminoethyl ether)-N,N,N',N'-tetraacetic acid (EGTA), as well as other conventional chelators. When EDTA is used as a chelating agent, the concentration of the EDTA in the composition can be about 25 μM. For example, the composition can have an EDTA concentration of about 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 μM.
[0066] In certain example embodiments, the composition described herein may be included as part of a kit. The kit may include, for example, instructions for using the composition to pretreat a biological sample according to the methods described herein. For example, the kit may include a composition having about 25.0 mM Triton X-100, 2.50 mM Tris-HCl, and 0.025 mM EDTA. In other example embodiments, the kit may include individual components, such as Triton X-100, Tris-HCl, and EDTA, with instructions on how to prepare the composition as described herein.
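By way of illustration only, preparing the example composition from individual components reduces to the standard dilution relation C1·V1 = C2·V2. The following is a minimal sketch under stated assumptions: the stock concentrations, the final volume, and the function names are hypothetical and do not come from this disclosure; only the target concentrations (about 25.0 mM Triton X-100, 2.50 mM Tris-HCl, and 0.025 mM EDTA) are taken from the text above.

```python
def stock_volume_ml(final_mm, stock_mm, final_volume_ml):
    """Volume of stock needed so that C1 * V1 = C2 * V2 (all concentrations in mM)."""
    if stock_mm <= final_mm:
        raise ValueError("stock must be more concentrated than the target")
    return final_mm * final_volume_ml / stock_mm


def recipe(final_volume_ml, targets_mm, stocks_mm):
    """Return the volume (mL) of each stock plus the water needed to reach the final volume."""
    volumes = {
        name: stock_volume_ml(targets_mm[name], stocks_mm[name], final_volume_ml)
        for name in targets_mm
    }
    volumes["water"] = final_volume_ml - sum(volumes.values())
    return volumes


# Target concentrations from the example kit composition (mM).
TARGETS_MM = {"Triton X-100": 25.0, "Tris-HCl": 2.50, "EDTA": 0.025}
# Hypothetical stock concentrations (mM); substitute the stocks actually on hand.
STOCKS_MM = {"Triton X-100": 250.0, "Tris-HCl": 1000.0, "EDTA": 500.0}

volumes = recipe(100.0, TARGETS_MM, STOCKS_MM)
for component, volume in volumes.items():
    print(f"{component}: {volume:.3f} mL")
```

With these assumed stocks, a 100 mL batch would use roughly 10 mL of the Triton X-100 stock, 0.25 mL of the Tris-HCl stock, and 5 µL of the EDTA stock, with water making up the balance.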
Nucleic Acid Isolation from the Extraction Solution
[0067] Following pretreatment of the biological sample as described herein, nucleic acids are isolated from the extraction solution. The nucleic acids can be isolated by a variety of conventional nucleic acid isolation methods and techniques. Potential extraction and isolation methods include, for example, organic extraction, the use of DNA-binding particles, such as silica-based isolation technology, magnetic separation, anion exchange technology, and others (e.g., salting out, cesium chloride density gradients, and Chelex 100 resin-based methods).
[0068] In certain example embodiments, the extraction solution can be subjected to organic extraction to recover the nucleic acids from the extraction solution. For example, and as those skilled in the art will appreciate, proteins within the extraction solution can be denatured and digested using a conventional protease. The proteins can thereafter be precipitated with organic solvents such as phenol, phenol/chloroform/isoamyl alcohol, or similar formulations, including TRIzol and TriReagent. The protein precipitate can then be removed by centrifugation. Purified nucleic acids can then be recovered, for example, via precipitation using ethanol, isopropanol, or other alcohol. Other non-limiting examples of extraction techniques include: (1) organic extraction followed by ethanol precipitation, e.g., using a phenol/chloroform organic reagent (Ausubel et al., 1993), with or without the use of an automated nucleic acid extractor, e.g., the Model 341 DNA Extractor available from Applied Biosystems (Foster City, Calif.); (2) stationary phase adsorption methods (U.S. Pat. No. 5,234,809; Walsh et al., 1991); and (3) salt-induced nucleic acid precipitation methods (Miller et al., 1988), such precipitation methods being typically referred to as "salting-out" methods.
[0069] In other example embodiments, silica-based nucleic acid isolation methods can be used to isolate nucleic acids from the extraction solution. As those skilled in the art will appreciate, DNA adsorbs specifically to silica membrane/beads/particles in the presence of certain salts and at a particular pH. Following introduction of the extraction solution to a silica-based isolation system, the DNA binds to the silica membrane/beads/particles and cellular contaminants can be removed by one or more wash steps. DNA can then be eluted using a low salt buffer or elution buffer. In certain example embodiments, chaotropic salts can be included to aid in protein denaturation and extraction of DNA. Silica-based isolation kits include PureLink™ Genomic DNA extraction kit (Invitrogen™) and DNeasy Blood and Tissue Kit (Qiagen™).
[0070] Another example of a suitable nucleic acid isolation and/or purification technique involves the use of magnetic particles to which nucleic acids can specifically or non-specifically bind, followed by isolation of the beads using a magnet, and washing and eluting the nucleic acids from the beads (see e.g. U.S. Pat. No. 5,705,628). As those skilled in the art will appreciate, magnetic particle purification methods rely on reversible binding of DNA to a magnetic solid surface/bead/particles, which has been coated with a DNA binding antibody or functional group that interacts specifically with DNA. In certain example embodiments, the beads are silanol magnetic beads. After the DNA binds the magnetic beads, the beads with the DNA bound thereto can be separated from cellular contaminants by applying a magnetic field to the beads and removing the beads from the solution. The bound DNA can then be eluted from the beads, such as with an alcoholic solution (e.g., ethanol). In certain example embodiments, the DNA can be eluted using a Tris/EDTA buffer.
[0071] In certain example embodiments, the above isolation methods may be preceded by an enzyme digestion step to help eliminate unwanted protein from the sample, e.g., digestion with proteinase K, or other like proteases. See, e.g., U.S. Pat. No. 7,001,724. If desired, RNase inhibitors may be added to the lysis buffer. For certain cell or sample types, it may be desirable to add a protein denaturation/digestion step to the protocol. Purification methods may be directed to isolate DNA, RNA, or both. When both DNA and RNA are isolated together during or subsequent to an extraction procedure, further steps may be employed to purify one or both separately from the other. Sub-fractions of extracted nucleic acids can also be generated, for example, by purification by size, sequence, or other physical or chemical characteristic. In addition to an initial nucleic acid isolation step, purification of nucleic acids can be performed after any step in the methods of the invention, such as to remove excess or unwanted reagents, reactants, or products. Methods for determining the amount and/or purity of nucleic acids in a sample are known in the art, and include absorbance (e.g., absorbance of light at 260 nm, 280 nm, and a ratio of these) and detection of a label (e.g., fluorescent dyes and intercalating agents, such as SYBR green, SYBR blue, DAPI, propidium iodide, Hoechst stain, SYBR gold, ethidium bromide).
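As one hedged illustration of the absorbance-based assessment mentioned above, the sketch below estimates a double-stranded DNA concentration from the absorbance at 260 nm and computes the A260/A280 ratio. It relies on two widely used laboratory conventions that are not stated in this disclosure: an A260 of 1.0 at a 1 cm path length corresponds to roughly 50 ng/µL of double-stranded DNA, and a ratio near 1.8 is commonly taken to indicate DNA largely free of protein. The function names and example readings are hypothetical.

```python
DSDNA_NG_PER_UL_PER_A260 = 50.0  # common convention for dsDNA at a 1 cm path length


def dsdna_concentration_ng_per_ul(a260, dilution_factor=1.0, path_length_cm=1.0):
    """Estimate dsDNA concentration (ng/uL) from the absorbance at 260 nm."""
    if path_length_cm <= 0:
        raise ValueError("path length must be positive")
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor / path_length_cm


def a260_a280_ratio(a260, a280):
    """Ratio of absorbance at 260 nm to absorbance at 280 nm."""
    if a280 <= 0:
        raise ValueError("A280 must be positive to form a ratio")
    return a260 / a280


# Hypothetical spectrophotometer readings for one eluate.
concentration = dsdna_concentration_ng_per_ul(a260=0.45)
ratio = a260_a280_ratio(a260=0.45, a280=0.25)
print(f"{concentration:.1f} ng/uL, A260/A280 = {ratio:.2f}")
```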
[0072] Commercial kits for nucleic acid isolation are also available. Potential kits for isolating DNA from the extraction solution include, for example, AccuPrep™ Genomic DNA Extraction Kit (Bioneer), Arcturus™ PicoPure® DNA Extraction Kit (Invitrogen), GFX Genomic Blood DNA Purification Kit (GE Healthcare), QIAamp™ DNA mini kit (Qiagen™), AllPrep DNA/RNA Mini Kit (Qiagen™), Gentra™ Puregene Blood Kit (Qiagen™), Agencourt™ DNAdvance Kit (Beckman Coulter™), and InnuPrep™ DNA minikit (AJ Innuscreen). Examples of magnetic bead extraction systems include Agencourt DNAdvance Kit (Beckman Coulter™) and Magnetic Beads Genomic DNA Extraction Kit (Geneaid™).
Preparation of Polynucleotide Fragments & Sequencing
[0073] Once the nucleic acids from the extraction solution are isolated, the isolated nucleic acids are used, for example, to prepare a genomic DNA library associated with the subject from which the biological sample was obtained. To prepare the library, the isolated nucleic acids are fragmented into multiple polynucleotide fragments. Fragmentation may be accomplished by a variety of methods known in the art, including chemical, enzymatic, and mechanical fragmentation. For example, the nucleic acids can be fragmented via acoustic shearing, Adaptive Focused Acoustics™ (AFA), nebulization, sonication, needle or high-pressure shearing, point-sink shearing, chemical fragmentation, or via the use of enzyme-based treatments (e.g., digestion with restriction enzymes), or combinations thereof.
[0074] In certain example embodiments, the isolated nucleic acids are fragmented into a population of fragmented polynucleotide fragments of one or more specific size range(s). In some embodiments, the amount of sample polynucleotides subjected to fragmentation is about, less than about, or more than about 50 ng, 100 ng, 200 ng, 300 ng, 400 ng, 500 ng, 600 ng, 700 ng, 800 ng, 900 ng, 1000 ng, 1500 ng, 2000 ng, 2500 ng, 5000 ng, ^g, or more. In some embodiments, fragments are generated from about, less than about, or more than about 1, 10, 100, 1000, 10000, 100000, 300000, 500000, or more genome-equivalents of starting DNA. In certain example embodiments, the fragments have an average or median length from about 10 to about 10,000 nucleotides. In some embodiments, the fragments have an average or median length from about 50 to about 2,000 nucleotides. In certain example embodiments, the fragments have an average or median length of about, less than about, more than about, or between about 100-2500, 200-1000, 10-800, 10-500, 50-500, 50-250, or 50-150 nucleotides. In certain example embodiments, the fragments have an average or median length of about, less than about, or more than about 200, 300, 500, 600, 800, 1000, 1500 or more nucleotides. In certain example embodiments, the fragmentation is accomplished mechanically, comprising subjecting sample polynucleotides to acoustic sonication.
[0075] In certain example embodiments, the fragmentation comprises treating the sample polynucleotides with one or more enzymes under conditions suitable for the one or more enzymes to generate double-stranded nucleic acid breaks. Examples of enzymes useful in the generation of polynucleotide fragments include sequence specific and non-sequence specific nucleases. Non-limiting examples of nucleases include DNase I, Fragmentase, restriction endonucleases, variants thereof, and combinations thereof. For example, digestion with DNase I can induce random double-stranded breaks in DNA in the absence of Mg++ and in the presence of Mn++. In some embodiments, fragmentation comprises treating the sample polynucleotides with one or more restriction endonucleases. Fragmentation can produce fragments having 5' overhangs, 3' overhangs, blunt ends, or a combination thereof. In some embodiments, such as when fragmentation comprises the use of one or more restriction endonucleases, cleavage of sample polynucleotides leaves overhangs having a predictable sequence. In some embodiments, the method includes the step of size selecting the fragments via standard methods such as column purification or isolation from an agarose gel. In some embodiments, the method comprises determining the average and/or median fragment length after fragmentation. In some embodiments, samples having an average and/or median fragment length above a desired threshold are again subjected to fragmentation. In certain example embodiments, samples having an average and/or median fragment length below a desired threshold are discarded.
[0076] In certain example embodiments, the polynucleotide fragments can be modified as described in U.S. Pat. Pub. 2014/0162278, titled "Methods and compositions for enrichment of target polynucleotides," which is hereby expressly incorporated herein by reference in its entirety. For example, the isolated polynucleotide fragments may be modified to include one or more adapter oligonucleotides, the adapter oligonucleotides including one or more of a variety of different sequence elements that can be joined to the polynucleotide fragments (see U.S. Pat. Pub. 2014/0162278).
[0077] In certain example embodiments, the adapter oligonucleotides joined to fragmented polynucleotides from one sample include one or more sequences common to all adapter oligonucleotides and a "barcode" sequence that is unique to the adapters joined to polynucleotides of that particular sample (see U.S. Pat. Pub. 2014/0162278). The barcode sequence, for example, can be used to distinguish polynucleotides originating from one sample or adapter joining reaction from polynucleotides originating from another sample or another adapter joining reaction. As such, in certain example embodiments, the fragmented polynucleotide sequences of a given sample are modified to include a unique nucleic acid sequence (i.e., the barcode) so that the fragments including the modification can later be traced back to the biological sample from which the polynucleotides originated (see U.S. Pat. Pub. 2014/0162278).
[0078] In certain example embodiments, the adapted polynucleotide fragments are subjected to an amplification reaction that amplifies the fragmented polynucleotides. The amplification relies on, for example, primers that include a barcode associated with the sample (see U.S. Pat. Pub. 2014/0162278). Hence, in certain example embodiments, the amplified product includes the barcode sequence unique to the sample, such that the sample can be subsequently identified. The amplification primers may be of any suitable length, such as about, less than about, or more than about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, or more nucleotides, any portion or all of which may be complementary to the corresponding target sequence to which the primer hybridizes (e.g. about, less than about, or more than about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, or more nucleotides) (see U.S. Pat. Pub. 2014/0162278).
[0079] In certain example embodiments, following the fragmentation of nucleic acids into polynucleotide fragments, and/or modifications of the polynucleotide fragments as described herein, the amplified polynucleotide fragments are mixed with a set of capture probes for sequencing. As described herein, the capture probes include sequences that are hybridizable to target polynucleotides within the fragmented polynucleotide fragments. Additionally, the capture probes can include any other sequences that are needed for the sequencing reaction. For example, when the sequencing is via the Illumina™ sequencing system, the capture probes can include all of the sequence elements and features needed for the Illumina™ sequencing system.
[0080] In certain example embodiments, the capture probes are homologous to each other, and hence are hybridizable to the same sequence. The homologous capture probes thus, in certain example embodiments, target the same target polynucleotide fragments within a population of fragmented polynucleotides. In certain example embodiments, the capture probes include subsets of probes that are homologous to each other but that are different from other subsets of capture probes within the capture probe mixture. Hence, in certain example embodiments, one subset of capture probes targets a first group of the same polynucleotide fragments whereas the other subset of capture probes targets a different, second group of the sample polynucleotide fragments. By using a variety of capture probe subsets, the collective set of capture probes can be used to target a variety of different target polynucleotide fragments.
[0081] The probes can be any length suitable for capturing a target polynucleotide fragment in a sequencing reaction such as a direct targeted sequencing reaction. In certain example embodiments, the probes are about 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, or 115 base pairs in length. In certain example embodiments, the probe length is close to 101 base pairs, such as 99, 100, 101, 102, or 103 base pairs in length. In certain example embodiments, the average length of the capture probes in the set of capture probes is about 100-102 base pairs. Further, the region of a capture probe that is hybridizable to a given polynucleotide fragment can be any size that results in capture of the polynucleotide fragment. In certain example embodiments, the region of the capture probe that is hybridizable to a given polynucleotide fragment is about 40 base pairs in length, such as about 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, or 45 base pairs in length. In certain example embodiments, the average length of the region of the capture probe that is hybridizable to a given polynucleotide fragment is about 39 base pairs.
[0082] A variety of methods can be used to hybridize a set of capture probes to the polynucleotide fragments. For example, the concentration of polynucleotide fragments in the sample can be determined, and the amount of probes incubated with the polynucleotide fragments can be based on the determined concentration of the polynucleotide fragments. For example, the capture probes can be added in a molar excess of the polynucleotide fragments so as to saturate the polynucleotide fragments. The capture probes can then be incubated with the polynucleotide fragments for a suitable amount of time and at a suitable temperature so that the capture probes hybridize to their target polynucleotide fragments.
[0083] In certain example embodiments, the incubation is at ambient room temperature, such as about 20°C, 21°C, 22°C, 23°C, 24°C, or 25°C. In other example embodiments, the reaction solution can be warmed above room temperature, such as to 26°C, 27°C, 28°C, 29°C, 30°C, 31°C, 32°C, 33°C, 34°C, 35°C, 36°C, 37°C, 38°C, 39°C, 40°C, 41°C, 42°C, 43°C, 44°C, 45°C, 46°C, 47°C, 48°C, 49°C, 50°C, 51°C, 52°C, 53°C, 54°C, 55°C, 56°C, 57°C, 58°C, 59°C, 60°C, 61°C, 62°C, 63°C, 64°C, 65°C, 66°C, 67°C, 68°C, 69°C, 70°C, 71°C, 72°C, 73°C, 74°C, 75°C, 76°C, 77°C, 78°C, 79°C, or 80°C. In certain example embodiments, the incubation is closer to 65°C, such as about 62°C, 63°C, 64°C, 65°C, 66°C, or 67°C. The incubation time can also be varied. For example, the incubation time can be in minutes, such as about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 45, 50, 55, or 60 min. In other example embodiments, the incubation can be hours, such as about 1.0, 1.5, 2.0, 2.5, or 3.0 hours. In certain example embodiments, the incubation is at about 62°C, 63°C, 64°C, 65°C, 66°C, or 67°C for about 1.5-2.5 hours.
[0084] To sequence the captured polynucleotide fragments, sequencing may be performed according to any method of sequencing known in the art, including those described at length in U.S. Pat. Pub. 2014/0162278. One example sequencing system is the Illumina Genome Analyzer System, which is based on technology described in at least WO 98/44151, hereby incorporated by reference in its entirety. Generally, DNA molecules are bound to a sequencing platform (flow cell) via an anchor probe binding site (otherwise referred to as a flow cell binding site) and amplified in situ, such as on a glass slide. A solid surface on which DNA molecules are amplified typically comprises a plurality of first and second bound oligonucleotides, the first complementary to a sequence near or at one end of a target polynucleotide and the second complementary to a sequence near or at the other end of a target polynucleotide. This arrangement permits bridge amplification, as described in U.S. Pat. Pub. 2014/0162278. The DNA molecules are then annealed to a sequencing primer and sequenced in parallel base-by-base using a reversible terminator approach. Hybridization of a sequencing primer may be preceded by cleavage of one strand of a double-stranded bridge polynucleotide at a cleavage site in one of the bound oligonucleotides anchoring the bridge, thus leaving one single strand not bound to the solid substrate that may be removed by denaturing, and the other strand bound and available for hybridization to a sequencing primer. Typically, the Illumina™ Genome Analyzer System utilizes flow cells with 8 channels, generating sequencing reads of 18 to 36 bases in length and >1.3 Gbp of high quality data per run (see www.illumina.com).
[0085] In certain example embodiments, before subjecting the captured polynucleotide fragments to the sequencing reaction, multiple biological samples are processed as described herein and are combined into a single sample for sequencing. For example, biological samples from two or more subjects are separately contacted with the lysis buffer as described herein to form separate extraction solutions. While keeping the extraction solutions separate, nucleic acids are isolated from the extraction solutions, and thereafter the nucleic acids are fragmented as described herein, modified to include barcodes, and amplified. Capture probes are then added to the separate mixtures of fragmented polynucleotides.
[0086] Following incubation of the capture probes with the separate mixtures of fragmented polynucleotides, the multiple capture probe/fragmented polynucleotide mixtures can be combined into a single sample. The single sample of mixed biological samples, for example, can then be loaded onto a single flow cell of the Illumina™ sequencing system. Thereafter, the barcode sequences of the polynucleotide fragments can be used to identify the specific biological sample (and hence subject) from which the captured sequence arose. In certain example embodiments, polynucleotide fragments obtained as described herein from different biological samples can be combined into a single sample and then incubated with the capture probes for subsequent sequencing. In such embodiments, the barcode sequences can similarly be used to identify the biological sample (and hence subject) origin of a given sequence. In other example embodiments, a single biological sample may be processed and analyzed on a single flow cell.
Probe Count Determination & Identification of Copy Number Variances
[0087] In certain example embodiments, a probe count is determined for one or more of the capture probe subsets used in the sequencing reaction described herein. As an example, to determine the probe count, each sequencing read associated with the Illumina™ sequencing system can be equated with the binding of a specific one of the capture probe molecules to a single target polynucleotide fragment. Hence, the cumulative number of reads associated with a homologous capture probe subset provides an indication of the overall number of target polynucleotide fragments captured by the capture probe subset. For example, if a given subset of homologous capture probes has 218 reads for a biological sample, then the 218 reads correspond to a raw probe count of 218 for the homologous subset of capture probes. If a different subset of homologous capture probes has 523 reads for the same biological sample, then the 523 reads correspond to a raw probe count of 523 for the homologous subset of capture probes used in the sequencing reaction. For a given biological sample, the probe count thus provides an indication of the number of polynucleotide fragments that a given subset of homologous capture probes interacts with as part of the sequencing reaction for a biological sample. By this, and as described herein, one can estimate the amount of the portion of polynucleotide fragments that a specific probe subset targets.
[0088] While the above examples relate to determining a probe count for a single biological sample, the use of barcodes (see U.S. Pat. Pub. 2014/0162278) allows multiple biological samples to be processed and analyzed on a single flow cell of the sequencing system. Hence, a given subset of homologous capture probes can capture target polynucleotides from a variety of different biological samples, with the sequencing reads being traced back to a specific biological sample (via the barcodes) from which they originated. For example, on a single flow cell, a portion of a subset of homologous probes may capture 323 polynucleotide fragments from one biological sample while a different portion of the same subset of homologous probes may capture 672 polynucleotide fragments from a different biological sample. Hence, the raw probe count for one biological sample is 323 whereas the raw probe count for the other biological sample is 672, with the raw probe counts being assigned to the specific biological samples (and hence the subjects from which the samples were obtained) based on the barcode sequences (see U.S. Pat. Pub. 2014/0162278) associated with the polynucleotide fragments.
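As a non-limiting illustration of how raw probe counts might be tallied per sample on a pooled flow cell, the sketch below counts reads by (sample, probe subset) using the barcode of each read. It assumes, for simplicity, that each read has already been reduced to a barcode and the name of the capture probe subset it maps to; the barcode sequences, sample names, and function name are hypothetical, and only the counts of 323 and 672 echo the example above.

```python
from collections import Counter


def raw_probe_counts(reads, barcode_to_sample):
    """Tally reads per (sample, probe subset); reads with unknown barcodes are skipped."""
    counts = Counter()
    for barcode, probe_subset in reads:
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            continue  # barcode not assigned to any sample on this flow cell
        counts[(sample, probe_subset)] += 1
    return counts


# Hypothetical barcodes assigned during adapter joining.
barcode_to_sample = {"ACGT": "sample_1", "TGCA": "sample_2"}

# Simulated pooled reads: (barcode, probe subset the read maps to).
reads = (
    [("ACGT", "probe_A")] * 323
    + [("TGCA", "probe_A")] * 672
    + [("NNNN", "probe_A")] * 5  # unassignable reads
)

counts = raw_probe_counts(reads, barcode_to_sample)
print(counts[("sample_1", "probe_A")], counts[("sample_2", "probe_A")])
```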
[0089] Because of the nature of the sample preparation and/or sequencing procedures described herein, the raw probe counts for a given probe can vary across samples in a manner that is unrelated to gene copy number variations among samples. A variety of parameters can affect the samples, for example, including starting nucleic acid concentrations, amplification efficiencies, probe capture efficiencies, and sample purities. With regard to the starting nucleic acid concentration in the samples, the nucleic acid concentration can affect the levels of target polynucleotide fragments in a processed sample, thus affecting the raw probe count (again, in a manner that is unrelated to copy number variations in the sample). For example, the nucleic acid concentration in the original sample from one subject may be higher or lower as compared to the starting nucleic acid concentration in a sample from a different subject. Without wishing to be bound by any particular theory, such sample-to-sample variations contribute to higher or lower amounts of fragmented target polynucleotides available to bind to the capture probes among different samples. The varying amounts of target polynucleotide fragments then result in variations among raw probe counts for a given capture probe - variations that are independent of gene copy numbers in the samples.
[0090] As a result of the variation that can arise for raw probe counts among different samples, in certain example embodiments the raw probe counts determined from a sequencing run are normalized for a given sample and across different samples of the sequencing run. The normalization, for example, can account for the sample-to-sample variation in the input nucleic acid concentration of the starting sample, as well as for other parameters that affect the raw probe counts as described herein. For example, raw probe counts of a sample can be normalized across all samples for each capture probe, adjusting for probe-to-probe differences in probe binding efficiency (capture efficiency). Further, in certain example embodiments the probe counts can be normalized for (and within) a particular sample. For example, a "normal" value of probe reads can be established for a particular sample and the homologous probe subset across the flow cell for that sample. From this value, the variation of the probe against that sample's average can be used to find significant deviations that would indicate a difference in the initial input concentration of DNA at that particular region (e.g., a copy number variant). As those skilled in the art can appreciate based on this disclosure, different and/or additional methods can be used to normalize the raw probe counts for a given sample and/or across several different samples.
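One possible way to carry out such a two-step normalization is sketched below, offered only as an illustration and not as the method of this disclosure. The sketch assumes a median-based scheme: each raw count is first divided by the median count of its sample (adjusting for input amount and depth), and each resulting value is then divided by the median of that probe across samples (adjusting for capture efficiency). The sample names, probe names, and counts are hypothetical.

```python
from statistics import median


def normalize_counts(raw):
    """raw maps sample -> {probe: raw count}; returns the same shape, centered near 1.0."""
    # Step 1: within-sample scaling removes differences in input DNA amount and depth.
    within = {}
    for sample, probes in raw.items():
        sample_median = median(probes.values())
        within[sample] = {p: c / sample_median for p, c in probes.items()}

    # Step 2: per-probe scaling across samples removes capture-efficiency differences.
    probe_names = list(next(iter(raw.values())))
    normalized = {sample: {} for sample in raw}
    for probe in probe_names:
        probe_median = median(within[sample][probe] for sample in raw)
        for sample in raw:
            normalized[sample][probe] = within[sample][probe] / probe_median
    return normalized


# Hypothetical run: sample_B was sequenced twice as deeply as sample_A,
# and sample_C carries extra signal at probe_3.
raw = {
    "sample_A": {"probe_1": 100, "probe_2": 200, "probe_3": 50},
    "sample_B": {"probe_1": 200, "probe_2": 400, "probe_3": 100},
    "sample_C": {"probe_1": 100, "probe_2": 200, "probe_3": 75},
}
normalized = normalize_counts(raw)
print(normalized["sample_C"])
```

In this toy data set, the depth difference between sample_A and sample_B disappears after normalization, while the elevated value for probe_3 in sample_C remains visible. A median over three probes is used only to keep the example small; a practical run would involve many more probes and samples.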
[0091] In certain example embodiments, a copy number variant is identified from the probe counts, such as by comparing the normalized probe counts arising from the sequencing reaction described herein. For example, a gene of interest arising from a biological sample will have multiple capture probes that captured target polynucleotide fragments from a biological sample. Collectively, the probe count frequencies provide an average number of probe counts over the region of interest in a given sequencing assay. The overall average of the normalized probe counts will, for example, center on two copies of a gene (one copy per chromosome, i.e., one paternal copy and one maternal copy). Variations in the copy number (i.e., copy number variants) can thus be identified by comparing the normalized probe counts.
[0092] In certain example embodiments, a normalized probe count for a capture probe subset for one biological sample can be compared to a normalized probe count for the same capture probe for a different biological sample - the relative difference between the probe counts indicating a decrease or increase in the presence of the gene to which the capture probe is targeted. In certain other example embodiments, the normalized probe counts for different subsets of homologous probes for the same sample can be compared to determine the increase or decrease in the copy numbers of the gene (relative to a two-gene-copy average). In certain example embodiments, combinations of comparing normalized probe counts among the same sample and across different samples can be used to identify a copy number variant.
[0093] In certain example embodiments, a copy number variant caller can be used to identify copy number variants. Such a caller, for example, is described in U.S. Pat. App. No. 62/476,361, filed March 24, 2017, and titled "COPY NUMBER VARIANT CALLER," the content of which is expressly incorporated herein by reference in its entirety. Briefly, sequencing reads generated for a test sequencing library are mapped to a segment or segments within a region, or regions, of interest. The number of sequencing reads mapped at the segment(s) within the region(s) of interest can then be determined. A copy number likelihood model can then be determined, which is used to set the transition probability of a copy number state given the observed number of mapped sequencing reads. Further, a hidden Markov model (HMM) is built which includes the hidden layer, the observation layer, and transition probabilities. The hidden Markov model is parameterized. In its simplest form, the hidden Markov model includes at least two unknown parameters: the copy number state and the transition probabilities between the copy number state and observed number of sequencing reads, which are determined by the copy number likelihood model. Expectation-Maximization (EM) can be used to determine these parameters based on the best fit of the data (that is, parameterize the model) and to determine the most probable copy number. In the model, it is desirable to maximize the probability of a copy number state given the observed number of sequencing reads, to determine the most probable copy number of the segment. A most probable copy number state of the segment can then be determined. In certain example embodiments, the process may consider other variables that affect the observation states, such as GC content bias, spuriosity of a capture probe associated with a segment, and noisy test sequencing libraries, which affect the transition probabilities. The additional variables can be treated as latent and determined by EM given the available data. The transition probabilities are then adjusted to account for these other variables. The EM process can be cumulative (adjusting for all variables at once) or it can adjust for the variables in separate EM iterations before the HMM is solved to determine a most probable copy number state of the segment.
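For illustration only, the general idea of solving a hidden Markov model for a most probable sequence of copy number states can be sketched as below. This is not the caller of the referenced application: the Poisson likelihood, the fixed `reads_per_copy` and `stay_prob` values (which the described process would instead fit by Expectation-Maximization), and all names are assumptions of this sketch.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing k reads when lam reads are expected."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def most_probable_copy_numbers(read_counts, reads_per_copy,
                               states=(0, 1, 2, 3, 4), stay_prob=0.99):
    """Viterbi decoding of copy number states along consecutive segments.

    read_counts: mapped read counts per segment, in genomic order.
    reads_per_copy: assumed expected reads contributed by one gene copy.
    """
    if not read_counts:
        return []
    n = len(states)
    switch_prob = (1.0 - stay_prob) / (n - 1)
    log_trans = [[math.log(stay_prob if i == j else switch_prob)
                  for j in range(n)] for i in range(n)]

    def emit(k, copies):
        # A small floor avoids log(0) for the zero-copy state.
        return poisson_logpmf(k, max(copies * reads_per_copy, 0.1))

    score = [math.log(1.0 / n) + emit(read_counts[0], s) for s in states]
    backpointers = []
    for k in read_counts[1:]:
        prev, score, pointers = score, [], []
        for j, s in enumerate(states):
            best_i = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best_i)
            score.append(prev[best_i] + log_trans[best_i][j] + emit(k, s))
        backpointers.append(pointers)

    # Trace back from the best final state to recover the full path.
    idx = max(range(n), key=lambda i: score[i])
    path = [idx]
    for pointers in reversed(backpointers):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [states[i] for i in path]


# Hypothetical read counts with a dip across three adjacent segments.
calls = most_probable_copy_numbers([100, 98, 103, 52, 49, 51, 101, 99],
                                   reads_per_copy=50)
```

The high probability of remaining in the same state discourages isolated single-segment calls, so a change in copy number is reported only where the read counts support it strongly enough.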
[0094] Regardless of how a copy number variant is determined, contacting the biological sample with the lysis buffer as described herein to form an extraction solution reduces the variability associated with determining a probe count. That is, contacting the biological sample with the lysis buffer reduces the likelihood that a given normalized probe count will (for example) aberrantly deviate from a two-copy average for probe count frequency and/or result in an incorrect raw probe count number. In certain example embodiments, the probe count variability of a biological sample pretreated with the lysis buffer as described herein is reduced by about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% or more as compared to an untreated biological sample. In certain example embodiments, the probe count variability of a biological sample pretreated with the lysis buffer as described herein is reduced by about 100%, 200%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, or 1000% or more as compared to an untreated biological sample.
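As a non-limiting sketch, one way a percentage reduction in probe count variability might be computed is shown below. The disclosure does not specify the variability metric; the coefficient of variation used here, the function names, and the example values are assumptions for illustration only.

```python
from statistics import mean, pstdev


def coefficient_of_variation(normalized_counts):
    """Spread of normalized probe counts relative to their mean."""
    return pstdev(normalized_counts) / mean(normalized_counts)


def percent_reduction(untreated_counts, treated_counts):
    """Percent reduction in variability of treated versus untreated counts."""
    cv_untreated = coefficient_of_variation(untreated_counts)
    cv_treated = coefficient_of_variation(treated_counts)
    return 100.0 * (cv_untreated - cv_treated) / cv_untreated


# Hypothetical normalized probe counts centered on two copies.
untreated = [2.0, 2.4, 1.6, 2.0]
treated = [2.0, 2.1, 1.9, 2.0]
reduction = percent_reduction(untreated, treated)
```

With these illustrative values the treated sample shows one quarter of the spread of the untreated sample, i.e., a reduction of about 75%.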
[0095] With a reduction in probe count variability, identifying copy number variants is much more reliable. As such, identification of various genetic markers in a subject is made easier. And, with increased CNV calling reliability, the overall process of identifying copy number variants is both more efficient and less expensive. For example, when a sequencing run has a high probe count variability, a second run - which may require recollection of a sample and/or extraction of the sample - may be necessary to identify the copy number variants associated with a particular subject. The second run, for example, requires both time investment to complete and additional materials, including an additional flow cell. Hence, the methods and compositions described herein address these and other challenges by greatly reducing probe count variability. Further, by reducing probe count variability, more samples can be run on a single flow cell, thus reducing costs and improving overall output volume and efficiency.
[0096] Without wishing to be bound by any particular theory, it is believed that mixing the biological sample with the detergent-containing lysis buffer before isolating nucleic acids from the biological sample frees the nucleic acids from bound proteins within the sample. For example, in the biological sample, such as in a saliva sample, the nucleic acids are believed to be bound to proteins such as histone proteins. The bound proteins, for example, are believed to interfere with isolation of the nucleic acid and/or preparation of the polynucleotide fragments used in a sequencing reaction as described herein. But by mixing the biological sample with the lysis buffer before isolating nucleic acids from the sample, it is believed the nucleic acids to be isolated can be released from the protein contaminants. Hence, and again without wishing to be bound by any particular theory, it is believed that releasing the nucleic acids from the proteins results in cleaner isolation of the nucleic acids and/or generation of fragmented polynucleotides, thereby improving the probe count determinations as described herein.
[0097] In certain example embodiments, pretreatment of the biological sample with the lysis buffer as described herein can substantially reduce the overall protein concentration in the biological sample as compared to a biological sample that has not been treated. For example, pretreating a biological sample with a lysis buffer that includes Triton X100 and/or SDS as described herein can reduce the overall protein concentration in the biological sample by about 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, or 85% as compared to an untreated biological sample. In certain example embodiments, using a lysis buffer that includes both SDS and Triton X100, as compared to a lysis buffer that includes Triton X100 alone or SDS alone as the detergent component, can further enhance the level of protein concentration reduction in the biological sample. For example, using a lysis buffer that includes both Triton X100 and SDS may reduce overall protein concentration by 5%, 10%, 15%, 20%, or 25% more than the use of Triton X100 as the only detergent or SDS as the only detergent. Overall protein concentration can be determined by a variety of methods known in the art, such as via a Bradford assay, Bio-Rad™ protein assay, the Lowry method, a NanoOrange® Protein Quantitation Kit, and others.
EXAMPLES
[0098] The present invention is described in further detail in the following examples, which are not in any way intended to limit the scope of the invention as claimed. The attached Figures are meant to be considered as integral parts of the specification and description of the invention. All references cited are herein specifically incorporated by reference for all that is described therein. The following examples are offered to illustrate, but not to limit, the claimed invention.
Example 1
Sample Collection
[0099] Saliva samples, for example, were collected with an Oragene™ collection kit in accordance with the manufacturer's guidelines. Briefly, a subject expels saliva in an Oragene™ collection tube to a fill line, the cap of the tube is closed, and the collection tube is inverted to mix the saliva with Oragene™ collection buffer located within the tube. If needed, the collection tube is shipped to a testing facility. Upon receipt, all samples are stored at ambient temperature until the sample is pretreated and extracted, as described herein.
Example 2
Pretreatment of Samples for DNA Isolation
[00100] This example describes the preparation of saliva samples for DNA isolation. More particularly, a batch saliva lysis buffer was prepared by mixing 62.2 g Triton X-100, 1.0 L of TE buffer (10.0 mM Tris, 0.1 mM EDTA, pH buffered 8.0), and 2954 mL water. Once all of the components are in solution the volume should be approximately 4015 mL. With the addition of the Triton X-100 to the TE buffer, the final concentrations of each of the components in the saliva lysis buffer were approximately 25.1 mM Triton X-100, 2.49 mM Tris-HCl, and 0.0249 mM EDTA. Thereafter, 50 μL of the saliva lysis buffer was pipetted into each well of a 96-well Nunc™ 2.2 mL round bottom deep well plate.
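The stated final concentrations follow from simple dilution arithmetic, which can be checked with the short sketch below. The Triton X-100 molecular weight used here (617 g/mol) is an assumption back-calculated from the stated 25.1 mM figure; Triton X-100 is a mixture, and vendor-quoted average molecular weights differ somewhat. The function and variable names are illustrative.

```python
def diluted_concentration(stock_conc, stock_volume, final_volume):
    """C1 * V1 = C2 * V2, solved for C2 (volumes in the same units)."""
    return stock_conc * stock_volume / final_volume


FINAL_VOLUME_L = 4.015      # approximate batch volume stated in the example
TE_VOLUME_L = 1.0           # volume of TE buffer added
TRITON_MASS_G = 62.2        # mass of Triton X-100 added
TRITON_MW_G_PER_MOL = 617.0  # assumed average molecular weight (see lead-in)

tris_mM = diluted_concentration(10.0, TE_VOLUME_L, FINAL_VOLUME_L)
edta_mM = diluted_concentration(0.1, TE_VOLUME_L, FINAL_VOLUME_L)
triton_mM = TRITON_MASS_G / TRITON_MW_G_PER_MOL / FINAL_VOLUME_L * 1000.0

print(f"Tris {tris_mM:.2f} mM, EDTA {edta_mM:.4f} mM, Triton X-100 {triton_mM:.1f} mM")
```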
[00101] To extract DNA from the saliva samples, multiple Oragene™ collection tubes were removed from storage and incubated at 50°C for approximately 2 hours to allow for completion of the lysis of any cells within the Oragene™ collection buffer. Following the incubation, the collection tubes were transferred to an automated DNA extractor. From each collection tube, 600 μL of the sample was removed and deposited into a separate well of the 96-well plate (each well of the well plate containing 50 μL of the saliva lysis buffer, as described above) to form an extraction solution. The 600 μL saliva sample and 50 μL saliva lysis buffer were then tip-mixed approximately six times within the well. After the plate was filled with the saliva samples and mixed, the plate was immediately moved to the bulk dispenser to begin the isolation process via the Agencourt Genfind™ v2 DNA isolation system.
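The volumes above correspond to a lysis buffer to sample ratio of 1:12 v/v. As a rough sketch, the ratio and the Triton X-100 concentration contributed by the lysis buffer to the extraction solution can be computed as follows. The calculation assumes simple volume additivity and ignores any components of the collection buffer; the variable names are illustrative.

```python
from fractions import Fraction

LYSIS_BUFFER_UL = 50     # saliva lysis buffer per well
SAMPLE_UL = 600          # saliva sample per well
TRITON_STOCK_MM = 25.1   # Triton X-100 in the saliva lysis buffer

# Volume ratio of lysis buffer to sample, reduced to lowest terms.
ratio = Fraction(LYSIS_BUFFER_UL, SAMPLE_UL)

# Triton X-100 concentration after mixing, from the lysis buffer alone.
triton_in_extraction_mM = (
    TRITON_STOCK_MM * LYSIS_BUFFER_UL / (LYSIS_BUFFER_UL + SAMPLE_UL)
)

print(f"ratio {ratio.numerator}:{ratio.denominator} v/v, "
      f"Triton X-100 about {triton_in_extraction_mM:.2f} mM")
```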
Example 3
Isolation of Saliva DNA from Pretreated Saliva Samples
[00102] This example describes the isolation of DNA from the lysed and pretreated saliva sample. Following the pretreatment of the saliva sample (see Example 2), DNA is isolated using an Agencourt Genfind™ v2 DNA isolation system, according to the manufacturer's instructions and as modified as described herein.
[00103] More particularly, approximately 185 μL magnetic beads (Agencourt Genfind™ v2) were added via bulk dispenser (BioTek™ MultiFlo) to each well of the 96-well Nunc™ 2.2 mL round bottom deep well plate containing the lysis solution/saliva sample mixture (i.e., the extraction solution). The plate was then placed on an orbital shaker (Q-instrument™ ELM-3000) and mixed for 10 minutes at 1450 rpm to facilitate binding of the DNA within the pretreated sample to the beads. A magnetic field was then applied across the plate (via an Alpaqua™ EX) for 20 min, so that the magnetic beads and associated DNA are drawn to the bottom of the wells (i.e., a "pulldown" incubation was performed according to the manufacturer's instructions).
[00104] Following the pulldown incubation, the well plate was moved to a deep well plate washer. This plate washer is fitted with a strong magnetic base, which operates to further sequester the magnetic beads and associated DNA from the saliva samples while the supernatant (containing the non-DNA cellular components, e.g., proteins and lipids) was aspirated (BioTek™ ELx405 plate washer) from the wells. Generally, the aspiration leaves approximately 50-80 μL of supernatant, with the magnetic beads and associated DNA sequestered at the bottom of the well via the magnetic field.
[00105] The samples were then washed twice, following a similar pattern. That is, each well of the well plate was filled with wash reagent, mixed on the orbital shaker, and subjected to magnetic bead pulldown, followed by aspiration of the wash supernatant. For the first wash, 800 μL of a high salt solution (Agencourt Genfind™ v2 - Wash 1) was added to each well of the well plate. The plate was then mixed on the orbital shaker for 10 minutes at 1550 rpm. After mixing, the plate was subjected to the magnetic field for a 12-minute pulldown incubation. The supernatant was then removed by aspiration, again leaving approximately 50-80 μL. For the second wash, 750 μL of an ethanol-based wash (Agencourt Genfind™ v2 - Wash 2) was added to each well of the well plate. The plate was then mixed on the orbital shaker for 5 minutes at 1550 rpm. After mixing, the plate was subjected to a magnetic pulldown for 8 minutes, and the supernatant was removed via aspiration (leaving 35-50 μL in the well). The second (and final) wash removes as much of the wash and material not bound to the magnetic beads as possible. Following aspiration of the second wash buffer, the plate is incubated at approximately 50°C for about 10 min in order to evaporate off more of the second wash buffer.
[00106] To elute the bound DNA from the magnetic beads, 100 μL of undiluted TE buffer (10.0 mM Tris, 0.1 mM EDTA, pH buffered 8.0) (Teknova™ DSB) is added to each well of the well plate. The plate was then moved to a heated orbital shaker (Q-instruments™ ELM-3000) where it was agitated at approximately 50°C for 15 minutes at 1200 rpm, to aid in the unbinding of the magnetic beads and the purified DNA. The plate was then subjected to a magnetic field so that the magnetic beads, which no longer bind the DNA, are sequestered at the bottom of the well. The supernatant, which contains the purified DNA, was removed and transferred to a separate deep well plate for sequencing.
Example 4
Direct Targeted Sequencing
[00107] This example describes the preparation of the isolated DNA for sequencing and the direct targeted sequencing (DTS) of the isolated DNA (from Example 3).
[00108] To determine the concentration of the extracted DNA (from Example 3), a PicoGreen™ (Life Technologies™) dye based quantitation method was used. The DNA concentration then informs an automated dilution method, in which all samples are individually diluted to 20 ng/μL with undiluted TE for assays and storage. Thereafter, 55 μL of the normalized DNA was used in our library preparation for Direct Targeted Sequencing (DTS) assays on an Illumina™ HiSeq 2500 (see also U.S. Pat. Pub. 2014/0162278, which is expressly incorporated herein in its entirety).
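As an illustrative sketch of the dilution step, the volume of TE needed to bring a quantified sample to 20 ng/μL can be computed from the measured concentration. The function name, the handling of samples already at or below the target, and the example values are assumptions of this sketch and are not taken from the automated method itself.

```python
def te_volume_to_add(measured_ng_per_ul, sample_volume_ul, target_ng_per_ul=20.0):
    """Volume of TE (in microliters) to add to reach the target concentration.

    Returns 0.0 when the sample is already at or below the target, because
    dilution cannot raise the concentration.
    """
    if measured_ng_per_ul <= target_ng_per_ul:
        return 0.0
    final_volume_ul = measured_ng_per_ul * sample_volume_ul / target_ng_per_ul
    return final_volume_ul - sample_volume_ul


# Hypothetical sample: 40 uL measured at 50 ng/uL needs 60 uL of TE.
te_needed = te_volume_to_add(50.0, 40.0)
```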
[00109] Briefly, in the first step of the DTS protocol, the DNA was sonicated to fragment it, and the ends were cleaned and prepared for ligation. At this point, adapter oligos were ligated to the ends of the fragmented DNA. These adapters contain molecular barcodes (for sample identification) as well as sequences required for Illumina™ sequencing (see U.S. Pat. Pub. 2014/0162278). The fragmented DNA including the adapter sequences was then non-specifically amplified by PCR using the common sequences in the adapters as primer targets.
[00110] At this point, the amplified samples with the adapters attached were introduced to the flow cell which has already been prepared with genomic-specific probes (i.e., capture probes). The probes contain sequences required for Illumina™ sequencing and a region homologous to the portion of the human genome of interest for sequencing. The conditions were controlled (65°C for 2 hours) to allow for the probes to hybridize to the regions with which they share homology (e.g., the regions of interest). The homologous region binds only to the paired sequences of the fragmented and amplified DNA. Any genetic material that is not bound by a capture probe was washed off the flow cell at this point. With regard to the capture probes, the collective set of probes includes many subsets of probes that were, in this example, identical to each other (and hence target the same DNA sequence), the subsets being different from other identical subsets of probes within the collective set of probes (the different subsets targeting different DNA sequences).
[00111] Following introduction of genomic-specific probes to the samples, all of the samples to be analyzed on a single flow cell were recombined into a single, mixed pool. This pool was then loaded onto a flow cell prepared with short oligos that will bind to genomic-specific capture probes, the genomic-specific probes being bound to the regions of interest in all the patient samples. In this manner, only the specified regions of the isolated DNA fragments are sequenced while the vast majority of the DNA remains in solution and is washed away. Sequencing was performed via the Illumina™ HiSeq 2500 system.
Example 5
Determination of Probe Counts & Copy Number Variants
[00112] This example describes the process for determining a probe count. By comparing probe counts within and between biological samples, copy number variants can be identified. Copy number variants can be identified, for example, using a copy number variant caller, such as described in U.S. Pat. App. No. 62/476,361 (see above discussion).
[00113] Briefly, after a sample has been sequenced (see Example 4), a count of each probe used to capture a specific portion of DNA for DTS sequencing was determined and recorded. For the purposes of this analysis each of these sequenced pieces of DNA can be considered a single count, rather than a sequenced read. That is, for each subset of homologous probes within the probes added to the samples, the collective number of DNA fragments that that subset of probes binds is determined as the probe count for that subset of probes.
[00114] Following determination of the probe counts, the counts for the probes of a given flow-cell were normalized across all the samples on the flow cell. The counts for each specific identical probe subset were added to determine frequency of occurrence of that probe for that patient. A gene of interest will have many hundreds of probes which captured small regions of DNA from the biological sample, and collectively these probe frequency counts show an average number of probe counts over the region of interest for the DTS assays. The overall average of these counts will overwhelmingly center on two copies of a gene (one copy per chromosome). However, local deviations in a region of a sample may exist. If a group of probes that are genomically close show a deviation of >50% from this average, it is believed to be due to a copy number variant (CNV) in that deviant region. A CNV is a mutation in which the person carries more or less than 2 copies of that gene, e.g., both a person with one copy of a gene and a person with three or four copies of a gene have a CNV.
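The idea of flagging a group of genomically close probes that deviate from the two-copy average can be sketched as follows. This is a simplified illustration and not the CNV caller referenced herein; the function name, the minimum run length, and the deviation threshold (set here to 0.4 so that the roughly 50% shifts expected for 1-copy and 3-copy variants are captured despite noise) are assumptions of this sketch.

```python
def flag_deviant_runs(normalized, expected=2.0, min_deviation=0.4, min_run=3):
    """Find runs of adjacent probes deviating from the expected copy number.

    normalized: normalized probe counts ordered by genomic position.
    Returns a list of (start, end, direction) tuples, with end exclusive and
    direction either "gain" or "loss".
    """
    runs = []
    start, direction = None, None
    # A trailing sentinel at the expected value closes any run still open.
    for i, value in enumerate(list(normalized) + [expected]):
        relative = (value - expected) / expected
        if relative >= min_deviation:
            current = "gain"
        elif relative <= -min_deviation:
            current = "loss"
        else:
            current = None
        if current != direction:
            if direction is not None and i - start >= min_run:
                runs.append((start, i, direction))
            start, direction = i, current
    return runs


# Hypothetical normalized counts: three adjacent probes near one copy,
# plus a single high probe that is too isolated to be flagged.
flags = flag_deviant_runs([2.0, 2.1, 1.0, 1.1, 0.9, 2.0, 3.1, 1.9])
```

Requiring several adjacent probes to agree reflects the point made above that a deviation is attributed to a CNV when it appears across a group of probes that are genomically close, rather than at a single probe.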
Results & Discussion
[00115] Figure 1 provides a schematic illustration of probe counts leading to CNV calls after normalization for multiple patient sample runs on a single flow cell. As shown, Patient 1 has an even number of reads for all genes of interest, thus indicating that this patient has 2 copies of all genes sequenced. With regard to Patient 2, this patient has ~50% more reads (counts) for gene B, which is due to an extra copy of that gene (e.g., a 3-copy copy number variant). By comparison, Patient 3 has about 50% fewer reads for gene C, which is due to having only a single copy of that gene (e.g., a 1-copy copy number variant).
[00116] Figure 2 is a graph showing aggregation of multiple probe counts of a single patient blood sample processed as described in Examples 1-4 (but for blood), including pretreatment of the blood sample with lysis buffer (Example 2). The graph (or "jitter plot") allows visualization of the variability among the multiple probe counts of a flow-cell run. More particularly, every black dot on the graph, separated along the x-axis - the x-axis being further subdivided into genes (delineated by the grey lines) examined in the assay - represents a specific, normalized probe count. The center of the y-axis represents the average probe count of the sample, which corresponds to 2 copies of a gene (i.e., 2.0 on the y-axis). If a black dot (i.e., a normalized probe count) were to move lower on the y-axis, toward the x-axis, for example, this would indicate that this specific probe giving rise to the probe count appeared less frequently in the overall probe counts in relation to the average of this patient. As shown, the use of the lysis buffer in the pretreatment step (Example 2) shows tight banding of the normalized probe counts across the x-axis, indicating a low variability of the probe counts used in generation of the jitter plot.
[00117] Figure 3 is a graph showing aggregation of multiple probe counts of a single patient saliva sample processed as described in Example 1 and Examples 3-4, i.e., processing of the saliva sample without first pretreating the saliva sample with the lysis buffer of Example 2. As with Figure 2, the average probe count of Figure 3 was algorithmically centered on 2 gene copies (i.e., 2.0 on the y-axis). But as is evident from the wide and sporadic banding shown in Figure 3, the normalized probe counts for untreated saliva samples had very high variance among a single patient's probes. This is especially evident when comparing Figure 2 and Figure 3. Hence, pretreating the saliva sample as described herein in Example 2 substantially reduces probe count variability (compare Figure 2 (blood) and Figure 3 (saliva)).
[00118] With the large probe count variability that arises without the lysis buffer pretreatment step, the ability to call CNVs is greatly diminished. For example, the CNV calling algorithm examines the probe in context of other local probes, and this analysis is compared to the average. With samples having high variance, such as those without pretreatment with the lysis buffer (Figure 3), CNV determination analysis is impossible and the sample must be re-extracted or re-collected to correct, thus reducing efficiency of the overall processes and increasing costs. Use of the methods described herein, however, improves the quality of the DNA from a biological sample and greatly reduces the number of saliva samples, for example, with high variance of probe counts, improving CNV calling and reducing sample failures.
[00119] Figure 4 is a graph showing identification of a 1-copy CNV of the BRCA1 gene, as determined by aggregating, normalizing, and then comparing multiple probe counts of a single patient blood sample processed as described in Examples 1-4 (but for blood). As shown in Figure 4, when the blood sample was pretreated with the lysis buffer as described in Example 2, the low probe count variability resulted in tightly grouped deviation in the probe counts. As such, the BRCA1 gene was easily discerned by visualization (arrow). The 1-copy CNV was also easily determined by probe count computation.

Claims

We claim:
1. A method for reducing probe count variability of a biological sample, comprising:
contacting a biological sample with a lysis buffer to form an extraction solution, wherein the lysis buffer comprises a detergent;
isolating polynucleotides from the extraction solution;
contacting fragments of the polynucleotides with a plurality of homologous capture probes, wherein the homologous capture probes bind to at least a portion of the polynucleotide fragments; and,
determining a probe count for the plurality of homologous capture probes, wherein the determined probe count provides an indication of the number of polynucleotide fragments bound to the plurality of homologous capture probes and wherein contacting the biological sample with the lysis buffer reduces variability of the probe count.
2. The method of claim 1, wherein the biological fluid sample is a saliva sample or a blood sample.
3. The method of claim 1 or 2, wherein the detergent is a non-ionic detergent.
4. The method of claim 2, wherein the non-ionic detergent is Triton X100.
5. The method of claim 3, wherein the Triton X100 of the lysis buffer is at least 5 mM Triton X100.
6. The method of claim 1 or 2, wherein the detergent is an anionic detergent.
7. The method of claim 6, wherein the detergent is sodium dodecyl sulfate (SDS).
8. The method of claim 7, wherein the SDS of the lysis buffer is at least 1.0 mM SDS.
9. The method of any of claims 1-8, wherein the lysis buffer comprises a TRIS buffer.
10. The method of claim 9, wherein the TRIS buffer is at least 2.0 mM TRIS.
11. The method of any of claims 1-9, wherein the lysis buffer further comprises a metal chelator.
12. The method of claim 11, wherein the metal chelator is ethylenediaminetetraacetic acid (EDTA).
13. The method of claim 12, wherein the EDTA of the lysis buffer is at least 0.02 mM EDTA.
14. The method of any of claims 1-13, wherein the extraction solution formed by contacting the biological fluid sample with the lysis buffer comprises a ratio of about 1:12 v/v of lysis buffer to biological fluid sample.
15. The method of any of claims 1-14, further comprising identifying a copy number variant from the determined probe count.
16. A composition for reducing probe count variability of a biological sample, the composition comprising a detergent, a buffer, and a metal chelator.
17. The composition of claim 16, wherein the biological sample is a saliva sample.
18. The composition of claim 16 or 17, wherein the detergent is a non-ionic detergent.
19. The composition of claim 18, wherein the non-ionic detergent is Triton X100.
20. The composition of claim 19, wherein the Triton X100 of the lysis buffer is about 25 mM Triton X100.
21. The composition of claim 16 or 17, wherein the detergent is an anionic detergent.
22. The composition of claim 21, wherein the detergent is sodium dodecyl sulfate (SDS).
23. The composition of claim 22, wherein the SDS of the lysis buffer is at least 1.0 mM SDS.
24. The composition of any of claims 16-23, wherein the buffer is a TRIS buffer.
25. The composition of claim 24, wherein the TRIS buffer is at least 2.0 mM TRIS.
26. The composition of any of claims 16-25, wherein the metal chelator is ethylenediaminetetraacetic acid (EDTA).
27. The composition of claim 26, wherein the EDTA of the lysis buffer is at least 0.02 mM EDTA.
PCT/US2018/020553 2017-03-03 2018-03-02 Extraction of nucleic acids for reduced probe count variability WO2018160907A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201762466789P 2017-03-03 2017-03-03
US62/466,789 2017-03-03
US201762530779P 2017-07-10 2017-07-10
US62/530,779 2017-07-10

Publications (1)

Publication Number Publication Date
WO2018160907A1 true WO2018160907A1 (en) 2018-09-07

Family

ID=63370260

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2018/020553 WO2018160907A1 (en) 2017-03-03 2018-03-02 Extraction of nucleic acids for reduced probe count variability

Country Status (1)

Country Link
WO (1) WO2018160907A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110318745A1 (en) * 2009-02-26 2011-12-29 Steen Hauge Matthiesen Compositions and methods for performing hybridizations with separate denaturation of the sample and probe
WO2012166425A2 (en) * 2011-05-27 2012-12-06 President And Fellows Of Harvard College Methods of amplifying whole genome of a single cell
US20130171615A1 (en) * 2009-12-08 2013-07-04 Biocartis Sa Selective lysis of cells
US20140274740A1 (en) * 2013-03-15 2014-09-18 Verinata Health, Inc. Generating cell-free dna libraries directly from blood
US8877436B2 (en) * 2008-10-27 2014-11-04 Qiagen Gaithersburg, Inc. Fast results hybrid capture assay on an automated platform
US20150322524A1 (en) * 2014-05-12 2015-11-12 Good Start Genetics, Inc Methods for detecting aneuploidy
WO2016029020A1 (en) * 2014-08-20 2016-02-25 Abogen, Inc. Devices, solutions and methods for sample collection related applications

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18761625

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18761625

Country of ref document: EP

Kind code of ref document: A1