CN111295443A

CN111295443A - Transposase-based genomic analysis

Info

Publication number: CN111295443A
Application number: CN201880071427.7A
Authority: CN
Inventors: R·雷伯弗斯基; J·陈
Original assignee: Bio Rad Laboratories Inc
Current assignee: Bio Rad Laboratories Inc
Priority date: 2017-11-02
Filing date: 2018-11-01
Publication date: 2020-06-16
Anticipated expiration: 2038-11-01
Also published as: CN111295443B; US11834710B2; US20240158847A1; EP3704247A4; US20210164036A1; US20190127792A1; EP3704247B1; EP4180534A1; US10907205B2; EP3704247A1; WO2019089959A1

Abstract

Methods and reagents are provided for barcoding and analyzing DNA samples using a partitioning (e.g., droplet) technique while avoiding amplification in the droplets.

Description

Transposase-based genomic analysis

Cross Reference to Related Applications

This application claims the benefit of U.S. Provisional Application No. 62/580,946, filed on November 2, 2017, which is incorporated herein by reference in its entirety for all purposes.

Sequence listing

This application contains a sequence listing that has been submitted electronically in ASCII format and is incorporated herein by reference in its entirety. The ASCII copy, created on October 31, 2018, is named 094868-.

Background

Modern sequencing library preparation typically involves a hyperactive variant of Tn5 transposase that mediates fragmentation of double-stranded DNA and ligation of synthetic oligonucleotides at both ends in a 5-minute reaction (Adey A et al., Genome Biol 11: R119 (2010)). The wild-type Tn5 transposon is a composite transposon in which two nearly identical insertion sequences (IS50L and IS50R) flank three antibiotic resistance genes (Reznikoff WS, Annu Rev Genet 42: 269-286 (2008)). Each IS50 contains two inverted 19-bp end sequences (ESs), an outside end (OE) and an inside end (IE). However, the wild-type ESs have relatively low activity and are replaced in vitro by a hyperactive mosaic end (ME) sequence. A complex of the transposase with the 19-bp ME is therefore all that is necessary for transposition to occur, provided that the intervening DNA is long enough to bring two of these sequences together to form an active Tn5 transposase homodimer (Reznikoff WS, Mol Microbiol 47: 1199-1206 (2003)). Transposition is a very rare event in vivo, and hyperactive mutants have historically been derived by introducing three missense mutations (E54K, M56A, L372P) into the 476 residues of the Tn5 protein, which is encoded by IS50R (Goryshin IY, Reznikoff WS, J Biol Chem 273: 7367-7374 (1998)). Transposition works by a "cut-and-paste" mechanism, in which Tn5 excises itself from the donor DNA and inserts into a target sequence, creating a 9-bp duplication of the target (Schaller H, Cold Spring Harb Symp Quant Biol 43: 401-408 (1979); Reznikoff WS, Annu Rev Genet 42: 269-286 (2008)). In the current commercial solution (Nextera DNA kit, Illumina), free synthetic ME adaptors are end-joined to the 5' end of the target DNA by the transposase.

Summary of The Invention

In some embodiments, methods of barcoding DNA are provided. In some embodiments, the method comprises

randomly introducing oligonucleotide adaptors into the DNA by contacting the DNA with a transposase loaded with the oligonucleotide adaptors, wherein each oligonucleotide adaptor comprises a 3' single-stranded portion and a double-stranded portion, with a first oligonucleotide having a 3' end and a 5' end and being one strand of the double-stranded portion, and a second oligonucleotide comprising the single-stranded portion and the complementary strand of the double-stranded portion, and

wherein the transposase introduces double-strand breaks into the DNA, wherein each double-strand break forms two DNA ends, and the transposase ligates the first oligonucleotide to one strand of each DNA end to form DNA fragments comprising oligonucleotide adaptors at both ends;

forming a droplet, wherein the droplet comprises DNA fragments and a first oligonucleotide primer having a bead-specific barcode sequence, wherein the first oligonucleotide primer is attached to a bead and comprises a free 3' end that is complementary to the 3' single-stranded portion of an oligonucleotide adaptor;

hybridizing the 3' end of the first oligonucleotide primer (optionally released from the bead) to the 3' single-stranded portion of the oligonucleotide adaptor;

combining the contents of the droplets to form a reaction mixture; and

contacting the reaction mixture with a ligase to ligate the first oligonucleotide primer to the 5' end of the first oligonucleotide ligated to the DNA end, thereby forming barcoded DNA fragments.

In some embodiments, the method further comprises amplifying the barcoded fragments. In some embodiments, the amplification comprises polymerase chain reaction.

In some embodiments, the method comprises stripping the transposase from the DNA prior to hybridization. In some embodiments, the stripping occurs in a droplet. In some embodiments, the DNA is in a nucleus and the stripping occurs prior to droplet formation.

In some embodiments, the method comprises cleaving the oligonucleotide primer from the bead prior to hybridization.

In some embodiments, the transposase carries two different adaptor oligonucleotides having the same double-stranded portion and different single-stranded portions. In some embodiments, the droplet further comprises a second oligonucleotide primer, wherein the second oligonucleotide primer comprises a 3' end sequence that is complementary to at least 50% (e.g., at least 60%, 70%, 80%, 90%, or 100%) of one of the single-stranded portions, and the first oligonucleotide primer comprises a free 3' end that is complementary to at least 50% (e.g., at least 60%, 70%, 80%, 90%, or 100%) of the other 3' single-stranded portion, and the hybridizing comprises hybridizing the second oligonucleotide primer to the complementary 3' single-stranded portion. In some embodiments, one single-stranded portion comprises GACGCTGCCGACGA (A14; SEQ ID NO: 1) and the other single-stranded portion comprises CCGAGCCCACGAGAC (B15; SEQ ID NO: 2).
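
The complementarity relationship described above can be sketched in a few lines of Python. This is a minimal illustration only: it derives the sequence that a primer's free 3' end would need in order to be 100% complementary to each single-stranded portion, and scores percent complementarity for a candidate primer end. The function names, and the choice to score an ungapped antiparallel alignment position by position, are assumptions made for the example and are not taken from the application.

```python
# Single-stranded portions quoted in the text, written 5'->3'.
A14 = "GACGCTGCCGACGA"   # SEQ ID NO: 1
B15 = "CCGAGCCCACGAGAC"  # SEQ ID NO: 2

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def percent_complementary(primer_3prime_end: str, target: str) -> float:
    """Percent of target bases paired by the primer's 3' end.

    Assumption for this sketch: ungapped antiparallel alignment in which the
    primer's 3'-terminal base faces the 5'-terminal base of the target.
    """
    ideal = reverse_complement(target)          # perfectly matching primer end
    tail = primer_3prime_end[-len(ideal):]      # bases that can face the target
    # Compare from the 3' terminus backwards so the strands stay right-aligned.
    matches = sum(a == b for a, b in zip(reversed(tail), reversed(ideal)))
    return 100.0 * matches / len(target)


ideal_end_for_a14 = reverse_complement(A14)
ideal_end_for_b15 = reverse_complement(B15)
print(ideal_end_for_a14, percent_complementary(ideal_end_for_a14, A14))
print(ideal_end_for_b15, percent_complementary(ideal_end_for_b15, A14))
```

Under these assumptions, a primer end designed against one single-stranded portion scores 100% against that portion and well below the 50% threshold against the other, which is the property that lets the two primers address different adaptors.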

In some embodiments, the transposase carries two identical adaptor oligonucleotides.

In some embodiments, the first oligonucleotide primer comprises a 5' PCR handle sequence. In some embodiments, the 5' PCR handle sequence of the first oligonucleotide primer comprises AATGATACGGCGACCACCGAGATCTACAC (P5; SEQ ID NO: 3). In some embodiments, the droplet further comprises a second oligonucleotide primer, and wherein the second oligonucleotide primer comprises a 5' PCR handle. In some embodiments, the 5' PCR handle of the second oligonucleotide primer comprises CAAGCAGAAGACGGCATACGAGAT (P7; SEQ ID NO: 4). In some embodiments, the second oligonucleotide primer further comprises an index tag (e.g., a barcode).
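
As a rough sketch of how such primers could be laid out, the Python below concatenates a 5' PCR handle, a middle element (a bead barcode or an index tag), and a 3' end complementary to an adaptor's single-stranded portion. The barcode and index sequences are hypothetical, the pairing of P5 with A14 and P7 with B15 is an assumption for illustration, and real primers may contain additional sequence elements that are not modeled here.

```python
P5 = "AATGATACGGCGACCACCGAGATCTACAC"  # SEQ ID NO: 3
P7 = "CAAGCAGAAGACGGCATACGAGAT"       # SEQ ID NO: 4
A14 = "GACGCTGCCGACGA"                # SEQ ID NO: 1
B15 = "CCGAGCCCACGAGAC"               # SEQ ID NO: 2

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_primer(handle: str, middle: str, adaptor_overhang: str) -> str:
    """Assemble 5'-handle / middle element / 3' end complementary to the overhang."""
    return handle + middle + reverse_complement(adaptor_overhang)


# Hypothetical bead barcode and index tag, chosen only for the example.
first_primer = build_primer(P5, "ACGTACGT", A14)
second_primer = build_primer(P7, "TTGCA", B15)

print(first_primer)
print(second_primer)
```

Because the bead barcode sits between the handle and the hybridizing 3' end in this layout, it would be carried into every fragment the primer is ligated to.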

In some embodiments, the single stranded portion of the second oligonucleotide comprises:

i. a 3' end sequence that is less than 50% complementary to the first oligonucleotide primer; and

ii. an intermediate sequence that is at least 50% (e.g., at least 60%, 70%, 80%, 90%, or 100%) complementary to the free 3' end of the first oligonucleotide primer.

In some embodiments, the DNA comprises a DNA-binding protein during the introduction. In some embodiments, the method further comprises removing the DNA-bound protein from the DNA after combining. In some embodiments, removing comprises contacting the DNA with a chaotropic agent or a protease. In some embodiments, the method further comprises removing DNA-bound protein from the DNA prior to pooling. In some embodiments, removing comprises contacting the DNA with a chaotropic agent or a protease.

In some embodiments, the droplet formation maintains the contiguity of the DNA fragments as compared to the DNA. In some embodiments, the DNA is purified after the combining and before the contacting.

In some embodiments, the method further comprises, during the combining, mixing the contents of the droplets with a competitor oligonucleotide comprising a single-stranded portion that hybridizes to the 3' end of unbound copies of the first oligonucleotide primer, thereby preventing de novo binding to unbound DNA fragments after the combining.

In some embodiments, the method further comprises, during the combining, mixing the contents of the droplets with a competitor oligonucleotide comprising a single-stranded portion that hybridizes to the 3' end of unbound copies of the oligonucleotide adaptor, thereby preventing de novo binding to unbound DNA fragments after the combining.

In some embodiments, the competitor oligonucleotide comprises a 3' terminus that is not extendable by a polymerase.

In some embodiments, the polymerase in the contacting is a strand displacing polymerase.

In some embodiments, the polymerase in the contacting has 5'-3' exonuclease activity.

In some embodiments, the transposase is Tn5 transposase.

In some embodiments, the transposase is attached to a bead.

In some embodiments, the method further comprises sequencing the barcoded DNA sequence, wherein the sequencing comprises hybridizing a sequencing primer to the barcoded DNA sequence and extending the sequencing primer. In some embodiments, the sequencing primer comprises one or more artificial nucleotides that form higher-affinity base pairs than natural nucleotides.

Definitions

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Generally, the nomenclature used herein and the laboratory procedures in cell culture, molecular genetics, organic chemistry, and nucleic acid chemistry and hybridization described below are those well known and commonly employed in the art. Standard techniques are used for nucleic acid and peptide synthesis. These techniques and procedures are performed according to conventional methods as described in the art and in various general references (see generally, Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd edition (1989), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., which is incorporated herein by reference in its entirety). The nomenclature used herein and the laboratory procedures in analytical chemistry and organic synthesis described below are those well known and commonly employed in the art.

The term "amplification reaction" refers to any of various in vitro methods for multiplying copies of a nucleic acid target sequence in a linear or exponential manner. Such methods include, but are not limited to, polymerase chain reaction (PCR) and DNA ligase chain reaction (LCR) (see U.S. Pat. Nos. 4,683,195 and 4,683,202; PCR Protocols: A Guide to Methods and Applications (Innis et al., eds., 1990)); QBeta RNA replicase-based and RNA transcription-based amplification reactions (e.g., involving T7, T3, or SP6-directed RNA polymerization), such as the transcription amplification system (TAS), nucleic acid sequence-based amplification (NASBA), and self-sustained sequence replication (3SR); isothermal amplification reactions (e.g., single primer isothermal amplification (SPIA)); and other methods known to those skilled in the art.

"amplification" refers to the step of subjecting the solution to conditions sufficient to amplify the polynucleotide (if all components of the reaction are intact). Components of the amplification reaction include, for example, primers, polynucleotide templates, polymerases, nucleotides, and the like. The term "amplification" generally refers to "exponential" growth of a target nucleic acid. However, "amplification" as used herein may also refer to a linear increase in the number of selected target sequences of a nucleic acid, as obtained by cycle sequencing or linear amplification. In an exemplary embodiment, amplification refers to PCR amplification using first and second amplification primers.

The term "amplification reaction mixture" refers to an aqueous solution comprising various reagents for amplifying a target nucleic acid. These reagents include enzymes, aqueous buffers, salts, amplification primers, target nucleic acids, and nucleoside triphosphates. The amplification reaction mixture may also contain stabilizers and other additives to optimize efficiency and specificity. Depending on the context, the mixture may also be a complete or incomplete amplification reaction mixture.

"polymerase chain reaction" or "PCR" refers to a method in which a specific segment or subsequence of a target double-stranded DNA is amplified geometrically. PCR is well known to those skilled in the art; see, for example, U.S. Pat. nos. 4,683,195 and 4,683,202; and PCR protocol: guidelines for methods and applications, eds. Innis et al, 1990. Exemplary PCR reaction conditions generally include two or three step cycles. The two-step cycle has a denaturation step followed by a hybridization/extension step. The three step cycle includes a denaturation step followed by a hybridization step followed by a separate extension step.

"primer" refers to a polynucleotide sequence that hybridizes to a sequence on a target nucleic acid and serves as a point of initiation for template-based nucleic acid synthesis (e.g., by primer extension or PCR). Primers can be of various lengths and are typically less than 50 nucleotides in length, for example 12-30 nucleotides in length. The length and sequence of primers for primer extension or PCR can be designed based on principles known to those skilled in the art, see, e.g., Innis et al (supra). The primer may be DNA, RNA or a chimera of a DNA portion and an RNA portion. In some cases, a primer may include one or more modified or non-natural nucleobases. In some cases, the primer is labeled.

The term "adaptor" is simply a term used to distinguish different oligonucleotides in a mixture. As used herein, "adaptor" refers to an oligonucleotide (which may be chemically distinguishable or indistinguishable from other oligonucleotides) that has been loaded onto a transposase or that is subsequently ligated to a DNA end by the transposase following fragmentation of the DNA by the transposase.

A nucleic acid, or a portion thereof, "hybridizes" to another nucleic acid under conditions such that non-specific hybridization is minimal at a defined temperature in a physiological buffer (e.g., pH 6-9, 25-150 mM chloride salt). In some cases, a nucleic acid, or portion thereof, hybridizes to a conserved sequence that is common among a set of target nucleic acids. In some cases, a primer or portion thereof can hybridize to a primer binding site if there are at least about 6, 8, 10, 12, 14, 16, or 18 contiguous complementary nucleotides, including "universal" nucleotides that are complementary to more than one nucleotide partner. Alternatively, a primer or portion thereof can hybridize to a primer binding site if there are fewer than 1 or 2 complementarity mismatches over at least about 12, 14, 16, or 18 contiguous complementary nucleotides. In some embodiments, the defined temperature at which specific hybridization occurs is room temperature. In some embodiments, the defined temperature at which specific hybridization occurs is above room temperature. In some embodiments, the defined temperature at which specific hybridization occurs is at least about 37, 40, 42, 45, 50, 55, 60, 65, 70, 75, or 80 °C. In some embodiments, the defined temperature at which specific hybridization occurs is 37, 40, 42, 45, 50, 55, 60, 65, 70, 75, or 80 °C. In order for hybridization to occur, the primer binding site and the hybridizing portion of the primer will be at least substantially complementary. By "substantially complementary" is meant that the primer binding site has a base sequence containing a contiguous region of at least 6, 8, 10, 15, or 20 bases (e.g., 4-30, 6-30, 4-50) that is at least 50%, 60%, 70%, 80%, 90%, or 95% complementary to a contiguous region of bases of equal length present in the primer sequence. By "complementary" is meant that a contiguous plurality of nucleotides of two nucleic acid strands are available for standard Watson-Crick base pairing. For a particular reference sequence, 100% complementary means that each nucleotide of one strand is complementary (by standard base pairing) to a nucleotide in a contiguous sequence of the second strand.
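
The two sequence-based criteria in this definition (a minimum run of contiguous complementary nucleotides, and a window of a given length meeting a percent-complementarity threshold) can be illustrated with a short Python sketch. It assumes plain A/C/G/T sequences written 5'->3', ungapped antiparallel pairing, and no "universal" nucleotides; the function names and default thresholds are illustrative choices, and temperature and buffer effects are not modeled.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_complementary_run(primer: str, site: str) -> int:
    """Length of the longest stretch of contiguous Watson-Crick pairs.

    Equivalent to the longest common substring of the primer and the reverse
    complement of the binding site (dynamic programming, O(len*len)).
    """
    target = reverse_complement(site)
    best = 0
    prev = [0] * (len(target) + 1)
    for p in primer:
        cur = [0] * (len(target) + 1)
        for j, t in enumerate(target, start=1):
            if p == t:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def substantially_complementary(primer: str, site: str,
                                window: int = 10,
                                min_percent: float = 80.0) -> bool:
    """True if some window of `window` bases is at least `min_percent` complementary."""
    target = reverse_complement(site)
    for i in range(len(primer) - window + 1):
        for j in range(len(target) - window + 1):
            matches = sum(a == b for a, b in
                          zip(primer[i:i + window], target[j:j + window]))
            if 100.0 * matches / window >= min_percent:
                return True
    return False


site = "GACGCTGCCGACGA"
perfect_primer = "TTTT" + reverse_complement(site)
mismatched_primer = "TCGTCGGAAGCGTC"  # one mismatch relative to a perfect match

print(longest_complementary_run(perfect_primer, site))
print(longest_complementary_run(mismatched_primer, site))
print(substantially_complementary(mismatched_primer, site, window=10, min_percent=90.0))
```

The single mismatch splits the contiguous run, so the mismatched primer fails a 12-nucleotide contiguous-run criterion while still satisfying a 90%-over-10-bases "substantially complementary" criterion.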

"template" refers to a polynucleotide sequence comprising a polynucleotide to be amplified flanked by or as a pair of primer hybridization sites. Thus, a "target template" comprises a target polynucleotide sequence adjacent to at least one hybridization site of a primer. In some cases, a "target template" comprises a target polynucleotide sequence flanked by hybridization sites for a "forward" primer and a "reverse" primer.

As used herein, "nucleic acid" refers to DNA, RNA, single-stranded, double-stranded, or more highly aggregated hybridization motifs, and any chemical modifications thereof. Modifications include, but are not limited to, those that provide chemical groups that introduce additional charge, polarizability, hydrogen bonding, electrostatic interaction, or points of attachment and functionality to the nucleic acid ligand bases or to the nucleic acid ligand as a whole. Such modifications include, but are not limited to, peptide nucleic acids (PNAs), phosphodiester group modifications (e.g., phosphorothioates, methylphosphonates), sugar modifications at the 2' position, pyrimidine modifications at the 5 position, purine modifications at the 8 position, modifications at exocyclic amines, substitution of 4-thiouridine, substitution of 5-bromo- or 5-iodo-uracil, backbone modifications, methylation, unusual base-pairing combinations such as the isobases isocytidine and isoguanidine, and the like. The nucleic acid may also comprise non-natural bases, such as nitroindole. Modifications may also include 3' and 5' modifications, including but not limited to capping with fluorophores (e.g., quantum dots) or other moieties.

"polymerase" refers to an enzyme capable of template-directed polynucleotide (e.g., DNA and/or RNA) synthesis. The term includes both full-length polypeptides and domains with polymerase activity. DNA polymerases are well known to those of skill in the art and include, but are not limited to, DNA polymerases isolated or derived from Pyrococcus furiosus, Thermococcus maritima (Thermococcuslitalalis) and Thermotoga maritima (Thermotoga maritime) or modified versions thereof. Other examples of commercially available polymerases include, but are not limited to: klenow fragment (New England)

Company), Taq DNA polymerase (QIAGEN), 9 ℃ N^TMDNA polymerase (New England)

Company), Deep Vent^TMDNA polymerase (New England)

Company), Manta DNA polymerase (enzymology Co., Ltd.),BstDNA polymerase (New England)

Company) and phi29 DNA polymerase (New England)

Company).

Polymerases include DNA-dependent polymerases and RNA-dependent polymerases, such as reverse transcriptases. At least 5 families of DNA-dependent DNA polymerases are known, although most fall into families A, B, and C. Other types of DNA polymerases include phage polymerases. Similarly, RNA polymerases typically include eukaryotic RNA polymerases I, II, and III, and bacterial RNA polymerases, as well as bacteriophage and viral polymerases. RNA polymerases can be DNA-dependent or RNA-dependent.

The term "partitioning" or "partitioned" as used herein refers to the division of a sample into multiple portions, or "partitions". Partitions are generally physical, such that the sample in one partition does not mix, or does not substantially mix, with the sample in an adjacent partition. Partitions may be solid or fluid. In some embodiments, a partition is a solid partition, such as a microchannel. In some embodiments, a partition is a fluid partition, such as a droplet. In some embodiments, a fluid partition (e.g., a droplet) is a mixture of immiscible fluids (e.g., water and oil). In some embodiments, a fluid partition (e.g., a droplet) is an aqueous droplet surrounded by an immiscible carrier fluid (e.g., oil).

As used herein, a "barcode" is a short nucleotide sequence (e.g., at least about 4,6, 8, 10, 12, 15, 20, or 50 nucleotides or more in length) that identifies the molecule to which it is coupled. For example, barcodes can be used to identify molecules in a partition. Such a partition-specific barcode should be unique to that partition relative to the barcodes of other partitions. For example, partitions containing target RNA from a single cell can be subjected to reverse transcription conditions, using primers comprising barcode sequences of different partition specificities in each partition, thereby incorporating copies of the unique "cell barcode" into the reverse transcribed nucleic acids of each partition. Thus, the nucleic acid from each cell can be distinguished from the nucleic acids of other cells by a unique "cell barcode". In other examples, partitions containing CPT-DNA can be subjected to PCR conditions using primers comprising different partition-specific barcode sequences in each partition, thereby incorporating copies of the unique CPT-DNA barcode into the PCR amplicons of each partition. The substrate may be cellular RNA, cellular DNA and/or long continuous DNA molecules. In some cases, the substrate barcode is provided by a "particle barcode" (also referred to as a "bead-specific barcode") present on an oligonucleotide coupled to a particle, wherein the particle barcode is common to (e.g., identical or substantially identical between) all or substantially all of the oligonucleotides coupled to the particle. Thus, the substrate and particle barcodes may be present in the partition, attached to the particle, or bound to cellular nucleic acid, in multiple copies of the same barcode sequence. Substrates or particle barcodes having the same sequence can be identified as originating from the same substrate (e.g., long DNA molecules that have been cleaved but remain adjacent), cell, partition, or particle.

In other cases, the barcode uniquely identifies the molecule to which it is coupled. For example, reverse transcription can be performed using primers that each contain a unique "molecular barcode". In still other embodiments, primers comprising both a "partition-specific barcode" unique to each partition and a "molecular barcode" unique to each molecule can be used. After barcoding, the partitions can be combined and optionally amplified while virtual partitioning is maintained. Thus, for example, the presence or absence of target nucleic acids (e.g., reverse-transcribed nucleic acids) comprising each barcode can be counted (e.g., by sequencing) without maintaining physical partitions.
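
The idea of maintaining virtual partitions after the physical partitions have been combined can be sketched as follows. The example groups pooled reads by partition-specific barcode and counts distinct molecular barcodes within each group. The read layout (partition barcode, then molecular barcode, then insert), the barcode lengths, and all sequences are hypothetical and serve only to illustrate the bookkeeping.

```python
from collections import defaultdict


def count_molecules_per_partition(reads, partition_len, molecular_len):
    """Group pooled reads by partition barcode and count distinct molecular barcodes.

    Assumed (hypothetical) read layout: partition barcode, then molecular
    barcode, then insert sequence.
    """
    molecules = defaultdict(set)
    for read in reads:
        partition_bc = read[:partition_len]
        molecular_bc = read[partition_len:partition_len + molecular_len]
        molecules[partition_bc].add(molecular_bc)
    return {bc: len(seen) for bc, seen in molecules.items()}


pooled_reads = [
    "AAAA" + "CG" + "TTGACC",
    "AAAA" + "CG" + "TTGACC",  # amplification duplicate of the same molecule
    "AAAA" + "GT" + "CCATGA",
    "CCCC" + "AT" + "GGATTC",
]
counts = count_molecules_per_partition(pooled_reads, partition_len=4, molecular_len=2)
print(counts)
```

Reads sharing a partition barcode are attributed to the same original partition even though they were sequenced from a single pooled mixture, and duplicates carrying the same molecular barcode are counted once.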

The length of the barcode sequence determines how many unique samples can be distinguished. For example, a 1-nucleotide barcode can distinguish no more than 4 samples or molecules; a 4-nucleotide barcode can distinguish no more than 4⁴ (i.e., 256) samples; a 6-nucleotide barcode can distinguish no more than 4,096 different samples; and an 8-nucleotide barcode can index no more than 65,536 different samples. In addition, barcodes can be attached, for example, by ligation or in a transposase reaction.
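
The arithmetic behind these figures is simply 4 raised to the barcode length, since each position can hold one of four nucleotides. The short Python sketch below reproduces the numbers in the paragraph and inverts the relationship to find the shortest barcode that can distinguish a given number of samples; the function names are illustrative.

```python
def barcode_capacity(length: int) -> int:
    """Maximum number of distinct barcodes of a given length (4 bases per position)."""
    return 4 ** length


def min_barcode_length(n_samples: int) -> int:
    """Shortest barcode length whose capacity is at least n_samples."""
    length = 1
    while barcode_capacity(length) < n_samples:
        length += 1
    return length


for n in (1, 4, 6, 8):
    print(n, barcode_capacity(n))

print(min_barcode_length(10_000))
```

This gives only the theoretical upper bound; practical designs typically use longer barcodes than the minimum so that synthesis errors and chance collisions are tolerated.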

Barcodes are often synthesized and/or polymerized (e.g., amplified) using inherently imprecise processes. Thus, barcodes intended to be uniform (e.g., cell, substrate, particle, or partition-specific barcodes common to all barcoded nucleic acids of a single partition, cell, or bead) may contain various N-1 deletions or other mutations relative to the template barcode sequence. Accordingly, barcodes referred to as "identical" or "substantially identical" copies include barcodes that differ from one another because they contain various N-1 deletions or other mutations relative to the template barcode sequence, due to, for example, errors in one or more of synthesis, polymerization, or purification. Furthermore, during synthesis using, for example, split-and-pool methods and/or equimolar mixtures of nucleotide precursor molecules, random coupling of barcode nucleotides may lead to low-probability events in which a barcode is not absolutely unique (e.g., different from all other barcodes of a population, or different from the barcodes of different partitions, cells, or beads). However, such slight deviations from the theoretically ideal barcode do not interfere with the high-throughput sequencing analysis methods, compositions, and kits described herein. Thus, as used herein, the term "unique", in the context of particle, substrate, cell, partition-specific, or molecular barcodes, encompasses various unintended N-1 deletions and mutations that deviate from the ideal barcode sequence. In some cases, problems due to the imprecise nature of barcode synthesis, polymerization, and/or amplification are overcome by oversampling the possible barcode sequences relative to the number of barcode sequences to be distinguished (e.g., at least about 2-, 5-, 10-fold or more possible barcode sequences). For example, 10,000 cells can be analyzed with a cell barcode having 9 barcode nucleotides (representing 262,144 possible barcode sequences). The use of barcode technology is well known in the art; see, e.g., Katsuyuki Shiroguchi et al., Proc Natl Acad Sci U S A, 2012 Jan 24; 109(4): 1347-52 and Smith, AM et al., Nucleic Acids Research Can 11, (2010). Other methods and compositions using barcode technology include those described in U.S. 2016/0060621.
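
To give a feel for why oversampling helps, the Python sketch below works through the 10,000-cell, 9-nucleotide example. It computes the oversampling factor and, under the simplifying assumption that every cell receives a barcode drawn independently and uniformly at random from all possible sequences (an assumption of this sketch, not a statement from the application), the expected fraction of cells whose barcode is shared with no other cell.

```python
def oversampling_factor(n_items: int, barcode_length: int) -> float:
    """Ratio of possible barcode sequences to the number of items to distinguish."""
    return (4 ** barcode_length) / n_items


def expected_unique_fraction(n_items: int, barcode_length: int) -> float:
    """Expected fraction of items whose barcode is shared with no other item.

    Assumes independent, uniformly random barcode assignment: a given item is
    unique if each of the other (n_items - 1) items drew a different barcode.
    """
    n_barcodes = 4 ** barcode_length
    return (1.0 - 1.0 / n_barcodes) ** (n_items - 1)


factor = oversampling_factor(10_000, 9)
unique = expected_unique_fraction(10_000, 9)
print(f"oversampling factor: {factor:.1f}x")
print(f"expected fraction of uniquely barcoded cells: {unique:.3f}")
```

Under this model, roughly 26-fold oversampling leaves a few percent of cells sharing a barcode with another cell; a longer barcode pushes the unique fraction closer to 1.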

"transposase" or "tagase" (used synonymously herein) refers to an enzyme that is capable of forming a functional complex with a transposon end containing composition and catalyzes the insertion or transfer of the transposon end containing composition into double stranded target DNA that is incubated with the composition in an in vitro transposition reaction. Exemplary transposases include, but are not limited to, modified TN5 transposases that are overactive as compared to wild-type TN5, e.g., may have one or more mutations selected from E54K, M56A, or L372P or as described in the background section.

"merging the contents of the droplets" refers to any manner of forming a continuous mixture of the contents of the plurality of droplets. For example, when droplets are present in the emulsion, breaking the emulsion (thereby mixing the contents of the droplets in the emulsion) is achieved by adding reagents or by applying physical forces. For example, a surfactant (e.g., perfluorooctanol) may be added and/or heat. The force selection includes gravity and/or centrifugation.

Drawings

FIG. 1 depicts exemplary oligonucleotides for use in the methods described herein. The Tn5 transposase adaptors contain a 3' overhang, and all strands of the adaptors are 5' phosphorylated. For gel bead primers, the bead barcode may vary and may comprise a combination of fixed and variable sequences. The gel bead oligonucleotide competitor is not extendable from the 3' end. This can be achieved by using an inverted dT as shown or by using any known non-extendible base. FIG. 1 discloses, in order of appearance, SEQ ID NOs: 5-6, 5, 7, 3, and 8-10.

FIGS. 2a-b depict 3' overhang transposase adaptor hybridization with bulk extension and ligation. The two types of Tn5 transposase complexes used in this embodiment are shown at the top of the figure. The first step involves reacting the Tn5 complex with DNA that contains no nucleosomes or no proteins that prevent Tn5 binding. The second step involves tagmentation of the DNA, that is, fragmentation of the double-stranded substrate followed by ligation of one of the two mosaic end strands to the target DNA. The Tn5-bound DNA is then encapsulated in a droplet and optionally stripped of Tn5. Oligonucleotide beads are encapsulated with the transposed DNA in the previous step (for example, upon droplet formation), and their oligonucleotides are released from the beads using reagents introduced into the droplets. The 3' end of the bead oligonucleotide hybridizes to the transposed DNA, as does the reverse PCR primer. The emulsion is then broken, and competitor non-extendible oligonucleotides are optionally added to the solution to prevent de novo binding of unbound oligonucleotides to unoccupied Tn5 adaptors. DNA is purified from cellular material, including nucleosomes, using guanidine thiocyanate and/or other protein denaturants. The 3' end of the DNA is then extended, and the strands are ligated at nicks where the 5' and 3' ends are juxtaposed.

FIG. 3 depicts 3' overhang transposase adaptor hybridization followed by a bulk extension reaction using a strand-displacing polymerase. The same process as shown in FIG. 2 occurs, except that a strand-displacing polymerase is used to generate the complementary ends of the DNA in bulk.

FIG. 4 depicts 3' overhang transposase adaptor hybridization followed by a bulk extension reaction using only a polymerase with 5' to 3' exonuclease activity. The same process as shown in FIG. 2 occurs, except that a DNA polymerase having 5' to 3' exonuclease activity is used to generate the complementary ends of the DNA in bulk.

FIG. 5 depicts exemplary oligonucleotides for use in the methods described herein. The Tn5 adaptor contains a 3' overhang that contains the entire P7 sequence, Tn5 A14, and a mosaic end (ME) sequence. All strands of the adaptor are 5' phosphorylated. For gel bead primers, the bead barcode may vary and may comprise a combination of fixed and variable sequences. The gel bead oligonucleotide competitor is not extendable from the 3' end. This can be achieved by using an inverted dT as shown or by using a non-extendible base. Examples of P7 and P5 grafting sequences and LNA sequencing primers are shown. FIG. 5 discloses SEQ ID NOs: 5, 11, 3, 8, 12, 4, and 13.

FIGS. 6a-c depict exemplary methods described herein, wherein the adaptor oligonucleotide includes a single-stranded portion having a 3' portion and a 5' portion. The 5' portion of the single-stranded portion hybridizes to the 3' end of the first oligonucleotide primer, while the 3' portion of the single-stranded portion is not complementary to, and therefore does not hybridize to, the first oligonucleotide primer, thereby generating a "Y"-type hybridization. FIG. 6c depicts a possible sequencing reaction for the resulting DNA fragment using a sequencing primer with one or more locked nucleic acid (LNA) nucleotides.

Detailed Description

Introduction

The methods and reagents described herein provide for the barcoding and analysis of DNA (e.g., purified DNA or nucleosomes) using a partitioning (e.g., droplet) technique, while avoiding amplification or other enzymatic reactions (e.g., ligation, DNA extension, exonuclease treatment) in the droplets. For example, amplification can occur in bulk after the contents of the droplets have been combined. The inventors have determined how to obtain the benefits of partitioning techniques, which allow partition-specific barcoding of DNA, while performing amplification (e.g., PCR) and other steps in bulk, thereby avoiding performance problems that can arise, for example, when performing PCR on DNA samples fragmented by transposases.

An advantage of avoiding PCR and optionally other enzymatic steps in the droplets is that some reagents can be used in the droplets, which are otherwise avoided due to the sensitivity of the enzymes to them. For example, a chaotropic agent (e.g., guanidine thiocyanate) or a protease may be used in the droplets. This allows for improved reactions and, in some embodiments, may increase sensitivity.

Transposase reaction

The method includes a step of randomly fragmenting DNA by transposase and a step of introducing oligonucleotide adaptors on the ends resulting from the fragmentation. The transposase carries two oligonucleotide adaptors. In some embodiments, the transposase-loaded oligonucleotide adaptor comprises a 3 'single-stranded portion (i.e., a 3' overhang) and a double-stranded portion, the first oligonucleotide having a 3 'end and a 5' end and being one strand of the double-stranded portion and the second oligonucleotide comprising a single-stranded portion and a complementary strand of the double-stranded portion. Exemplary single stranded portions may be, for example, 6-30, 10-20, or 12-18 nucleotides in length. In some embodiments, the transposase is loaded with heteroadapters (heteroadapters), wherein the single stranded portion of one of the oligonucleotide adapters is GACGCTGCCGACGA (A14; SEQ ID NO: 1) and the single stranded portion of one of the oligonucleotide adapters is CCGAGCCCACGAGAC (B15; SEQ ID NO: 2). Exemplary oligonucleotide adaptors are shown in the top row of FIG. 1. In some embodiments, the transposase carries two different adaptor oligonucleotides having the same double stranded portion and different 3' single stranded portions. In these embodiments, the shorter strand (the one forming the double-stranded region) is the same in both adaptors and it is this strand that is transferred to the end of the DNA. See, for example, fig. 1 and 2 a. However, the complementary strand of the transferred strand (the strand having the single-stranded and double-stranded portions) is different from the complementary strand of the second transferred strand. Two alternatives in the first row of fig. 1 are an exemplary set of oligonucleotide adaptors that form a heteroadapter (heteroadapter) to be loaded onto a transposase. 
In some embodiments, one or both strands of the adaptor oligonucleotide are phosphorylated at the 5' end, thereby allowing for use in later ligation. Thus, contacting a target polynucleotide (e.g., genomic DNA or double-stranded cDNA) with a homoadapter-loaded transposase covalently attaches the individual transferred strands to the 5' ends of the fragments produced by the transposase. In some embodiments, the homoadapter-loaded transposase is used in a reaction mixture that does not contain a differently loaded transposase (e.g., does not contain a transposase loaded with a different homoadapter and does not contain a heteroadapter-loaded tagmentase). In this reaction mixture, the transferred strand is the same for every product of the tagmentation reaction.
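As a small illustration of the length ranges above, the following sketch checks the two single-stranded portions quoted in the text (A14 and B15) against the exemplary ranges of 6-30, 10-20, and 12-18 nucleotides. The helper names `ranges_satisfied` and `is_heteroadapter` are hypothetical and are not part of the described method.

```python
# Single-stranded (3' overhang) portions quoted in the text.
OVERHANGS = {
    "A14": "GACGCTGCCGACGA",   # SEQ ID NO: 1
    "B15": "CCGAGCCCACGAGAC",  # SEQ ID NO: 2
}

# Exemplary length ranges from the text, in nucleotides (inclusive).
LENGTH_RANGES = [(6, 30), (10, 20), (12, 18)]


def ranges_satisfied(seq):
    """Return every exemplary range that contains the length of seq."""
    return [(lo, hi) for lo, hi in LENGTH_RANGES if lo <= len(seq) <= hi]


def is_heteroadapter(overhang_a, overhang_b):
    """Hypothetical helper: a pair counts as a heteroadapter when the
    two single-stranded portions differ."""
    return overhang_a != overhang_b


for name, seq in OVERHANGS.items():
    print(name, len(seq), "nt; ranges satisfied:", ranges_satisfied(seq))
print("heteroadapter pair:", is_heteroadapter(*OVERHANGS.values()))
```

Both quoted sequences fall inside the narrowest exemplary range, so they satisfy all three.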

Transposases loaded with adapters are further described, for example, in U.S. Patent Publication Nos. 2010/0120098; 2012/0301925; and 2015/0291942, and in U.S. Patent Nos. 5,965,443; 6,437,109; 7,083,980; 9,005,935; and 9,238,671, each of which is incorporated by reference herein in its entirety for all purposes. Oligonucleotides can be loaded onto a transposase, for example, by first mixing the two strands of an oligonucleotide adaptor, thereby making them double-stranded, and then contacting the double-stranded adaptor oligonucleotide with the transposase. See, for example, U.S. Patent No. 6,294,385.

In some embodiments, the single-stranded portion of the second (adaptor) oligonucleotide comprises a 5' nucleotide sequence and a 3' nucleotide sequence (note that the 3' nucleotide sequence is not at the 3' end of the oligonucleotide, but is the 3' portion of the single-stranded portion). In some embodiments, the 5' nucleotide sequence is complementary to the free 3' end of the first oligonucleotide primer, and the 3' nucleotide sequence of the single-stranded portion of the second (adaptor) oligonucleotide is less than 50% (e.g., less than 40%, 30%, 20%, or 10%) complementary to the first oligonucleotide primer. This is shown, for example, in FIGS. 6a-b. For example, in part 4 of FIG. 6b, the 5' portion of the single-stranded portion of the adapter hybridizes to the 3' end of the first oligonucleotide primer, but the 3' portion of the single-stranded region of the adapter does not hybridize to the first oligonucleotide primer (depicted in FIG. 6b as the portion diagonal to the "bead barcode"). This configuration improves the conversion of the library for sequencing, making more fragments available for sequencing.

In some embodiments, the transposase is attached to a bead. The linkage may be covalent or non-covalent (e.g., via biotin-streptavidin or other linkage). In these embodiments, the beads linked to the transposase are different from the beads linked to the first oligonucleotide primer described below. The beads to which the transposase is attached may be magnetic or non-magnetic. Exemplary beads for this purpose include, but are not limited to, Dynabeads™ M-280 Streptavidin (ThermoFisher).

The double-stranded DNA fragmented by the transposase can be from any source as desired. For example, any genomic DNA may be used in these methods. In some embodiments, the DNA is from a single cell or from a single type of cell of an organism. In some embodiments, the genomic DNA is from a eukaryote, e.g., from a mammal, e.g., a human. In some embodiments, the DNA is from a plant or fungus. In some embodiments, the starting DNA is purified as desired and used directly in the method. Alternatively, the DNA may be processed to generate DNA of a desired average size, for example, using size-selection columns or gel purification.

The biological sample may be obtained from any organism, such as an animal, a plant, a fungus, a pathogen (e.g., a bacterium or virus), or any other organism. In some embodiments, the biological sample is from an animal, such as a mammal (e.g., a human or non-human primate, cow, horse, pig, sheep, cat, dog, mouse, or rat), a bird (e.g., a chicken), or a fish. The biological sample can be any tissue or body fluid obtained from the organism, for example blood, a blood component or blood product (e.g., serum, plasma, platelets, red blood cells, etc.), sputum or saliva, tissue (e.g., kidney, lung, liver, heart, brain, neural tissue, thyroid, eye, skeletal muscle, cartilage, or bone tissue), cultured cells (e.g., primary cultures, explants, transformed cells, or stem cells), stool, urine, and the like. In some embodiments, the sample comprises cells. In some embodiments, the sample is a single-cell sample. In some embodiments, DNA from cells (e.g., including cancer cells in some aspects) can be shed into the blood as cell-free DNA. Thus, in some embodiments, the sample is a cell-free sample containing such DNA (e.g., including, but not limited to, nucleosomes from cell-free DNA).

In some embodiments, the transposase is applied to DNA in chromatin form (e.g., DNA associated with nucleosome-forming histones and/or with other DNA-binding factors that form chromatin). In these embodiments, the transposase will not have equal access to all of the DNA because of the presence of nucleosomes. These methods are sometimes referred to as "ATAC-seq" (see, e.g., U.S. Patent Publication No. 20160060691; Buenrostro et al., (2015) Curr Protoc Mol Biol. 109: 21.29.1-21.29.9) and can be used, for example, to determine chromatin changes under different conditions.

In other embodiments, the DNA is substantially free of protein. For example, the DNA sample may have been extracted with phenol to remove DNA-binding proteins.

In some embodiments, the DNA is contained within its native cell. For example, native cells may be fixed and permeabilized so that the transposase can enter the nucleus and cleave the DNA to the extent that chromatin structure allows. This may be considered a measure of the transposase accessibility of chromatin. Thus, in some embodiments, the DNA is in chromatin form. In some embodiments, the DNA is a contiguity-preserved tagmented polynucleotide (e.g., DNA) sequence. In contiguity-preserving transposition or tagmentation, a transposase (e.g., Tn5 transposase) is used to modify DNA with adaptor sequences while maintaining the contiguity of the DNA segments. Conditions for preparing contiguity-preserved tagmented polynucleotide sequences are known in the art. See, e.g., Amini et al., Nature Genetics, 2014, 46: 1343-1349; WO 2016/061517; and U.S. Provisional Patent Application No. 62/436,288; each incorporated herein by reference.

Once the DNA sample has been treated with the transposase, the DNA can be distributed into a plurality of separate partitions (e.g., droplets). Any type of partition may be used in the methods described herein. While the method has been illustrated using droplets, it should be understood that other types of partitions may be used.

Methods and compositions for partitioning are described, for example, in published patent applications WO 2010/036352, US 2010/0173394, US 2011/0092373, and US 2011/0092376, the entire contents of which are incorporated herein by reference. The plurality of partitions may be a plurality of emulsion droplets, a plurality of microwells, or the like.

In some embodiments, one or more reagents are added during droplet formation, or one or more reagents are added to the droplets after droplet formation. Methods and compositions for delivering reagents to one or more partitions include microfluidic methods known in the art; merging, coalescing, fusing, rupturing, or degrading droplets or microcapsules (e.g., as described in US 2015/0027892; US 2014/0227684; WO 2012/149042; and WO 2014/028537); droplet injection methods (e.g., as described in WO 2010/151776); and combinations thereof.

As described herein, the partitions may be picowells, nanowells, or microwells. The partitions may be pico-, nano-, or micro-reaction chambers, such as pico-, nano-, or microcapsules. The partitions may be pico-, nano-, or microchannels. The partitions may be droplets, such as emulsion droplets.

In some embodiments, the partitions are droplets. In some embodiments, the droplets comprise an emulsion composition, i.e., a mixture of immiscible fluids (e.g., water and oil). In some embodiments, the droplets are aqueous droplets surrounded by an immiscible carrier fluid (e.g., oil). In some embodiments, the droplets are oil droplets surrounded by an immiscible carrier fluid (e.g., an aqueous solution). In some embodiments, the droplets described herein are relatively stable and show minimal coalescence between two or more droplets. In some embodiments, less than 0.0001%, 0.0005%, 0.001%, 0.005%, 0.01%, 0.05%, 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or 10% of the droplets generated from the sample coalesce with other droplets. These emulsions may also show limited flocculation, a process in which the dispersed phase comes out of suspension in flakes. In some cases, this stability or minimal coalescence may be maintained for up to 4, 6, 8, 10, 12, 24, or 48 hours or more (e.g., at room temperature, or at about 0, 2, 4, 6, 8, 10, or 12 °C). In some embodiments, the oil phase is flowed through the aqueous phase or reagents, thereby forming droplets.

The oil phase may comprise a fluorinated base oil, which may be further stabilized by use in combination with a fluorinated surfactant, such as a perfluoropolyether. In some embodiments, the base oil comprises one or more of: HFE 7500, FC-40, FC-43, FC-70, or other common fluorinated oils. In some embodiments, the oil phase comprises an anionic fluorosurfactant. In some embodiments, the anionic fluorosurfactant is ammonium Krytox (Krytox-AS), the ammonium salt of Krytox FSH, or a morpholino derivative of Krytox FSH. The concentration of Krytox-AS may be about 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%, 2.0%, 3.0%, or 4.0% (w/w). In some embodiments, the concentration of Krytox-AS is about 1.8%. In some embodiments, the concentration of Krytox-AS is about 1.62%. The concentration of the morpholino derivative of Krytox FSH may be about 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%, 2.0%, 3.0%, or 4.0% (w/w). In some embodiments, the concentration of the morpholino derivative of Krytox FSH is about 1.8%. In some embodiments, the concentration of the morpholino derivative of Krytox FSH is about 1.62%.

In some embodiments, the oil phase further comprises an additive for adjusting properties of the oil (such as vapor pressure, viscosity, or surface tension). Non-limiting examples include perfluorooctanol and 1H,1H,2H,2H-perfluorodecanol. In some embodiments, 1H,1H,2H,2H-perfluorodecanol is added to a concentration of about 0.05%, 0.06%, 0.07%, 0.08%, 0.09%, 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%, 1.25%, 1.50%, 1.75%, 2.0%, 2.25%, 2.5%, 2.75%, or 3.0% (w/w). In some embodiments, 1H,1H,2H,2H-perfluorodecanol is added to a concentration of about 0.18% (w/w).

In some embodiments, the emulsion is formulated to produce highly monodisperse droplets having a liquid-like interfacial film that can be converted by heating into microcapsules having a solid-like interfacial film; such microcapsules may act as bioreactors able to retain their contents through an incubation period. The conversion into microcapsules may take place upon heating. For example, such conversion can occur at a temperature greater than about 40, 50, 60, 70, 80, 90, or 95 °C. A fluid or mineral oil overlay may be used to prevent evaporation during heating. Excess continuous-phase oil may be removed prior to heating or left in place. These microcapsules are resistant to coalescence and/or flocculation under a wide range of thermal and mechanical treatments.

After converting the droplets into microcapsules, the microcapsules can be stored at about -70 °C, -20 °C, 0 °C, 3 °C, 4 °C, 5 °C, 6 °C, 7 °C, 8 °C, 9 °C, 10 °C, 15 °C, 20 °C, 25 °C, 30 °C, 35 °C, or 40 °C. In some embodiments, these microcapsules can be used to store or transport partitioned mixtures. For example, a sample can be collected at one location and partitioned into droplets containing enzymes, buffers, and/or primers or other probes; optionally, one or more polymerization reactions can be performed; the partitions can then be heated for microencapsulation; and the microcapsules can be stored or transported for further analysis.

In some embodiments, the sample is divided into at least 500, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 10,000, 15,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 200,000, 300,000, 400,000, 500,000, 600,000, 700,000, 800,000, 900,000, 1,000,000, 2,000,000, 3,000,000, or 4,000,000 partitions, or more.

In some embodiments, the droplets produced are substantially uniform in shape and/or size. For example, in some embodiments, the droplets are substantially uniform in average diameter. In some embodiments, the droplets produced have an average diameter of about 0.001 microns, about 0.005 microns, about 0.01 microns, about 0.05 microns, about 0.1 microns, about 0.5 microns, about 1 micron, about 5 microns, about 10 microns, about 20 microns, about 30 microns, about 40 microns, about 50 microns, about 60 microns, about 70 microns, about 80 microns, about 90 microns, about 100 microns, about 150 microns, about 200 microns, about 300 microns, about 400 microns, about 500 microns, about 600 microns, about 700 microns, about 800 microns, about 900 microns, or about 1000 microns. In some embodiments, the droplets produced have an average diameter of less than about 1000 microns, less than about 900 microns, less than about 800 microns, less than about 700 microns, less than about 600 microns, less than about 500 microns, less than about 400 microns, less than about 300 microns, less than about 200 microns, less than about 100 microns, less than about 50 microns, or less than about 25 microns. In some embodiments, the droplets generated are non-uniform in shape and/or size.

In some embodiments, the droplets generated are substantially uniform in volume. For example, the standard deviation of the droplet volume can be less than about 1 picoliter, 5 picoliters, 10 picoliters, 100 picoliters, 1 nL, or less than about 10 nL. In some cases, the standard deviation of the droplet volume may be less than about 10-25% of the average droplet volume. In some embodiments, the droplets produced have an average volume of about 0.001 nL, about 0.005 nL, about 0.01 nL, about 0.02 nL, about 0.03 nL, about 0.04 nL, about 0.05 nL, about 0.06 nL, about 0.07 nL, about 0.08 nL, about 0.09 nL, about 0.1 nL, about 0.2 nL, about 0.3 nL, about 0.4 nL, about 0.5 nL, about 0.6 nL, about 0.7 nL, about 0.8 nL, about 0.9 nL, about 1 nL, about 1.5 nL, about 2 nL, about 2.5 nL, about 3 nL, about 3.5 nL, about 4 nL, about 4.5 nL, about 5 nL, about 5.5 nL, about 6 nL, about 6.5 nL, about 9 nL, or about 10 nL.
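The relationship between droplet diameter, droplet volume, and the uniformity criterion (standard deviation relative to the average volume) can be sketched as follows. This is a minimal illustration that assumes perfectly spherical droplets; the diameters are hypothetical values, not measurements from the described method.

```python
import math
import statistics


def droplet_volume_nl(diameter_um):
    """Volume of a spherical droplet in nanoliters.

    V = (pi / 6) * d^3 in cubic microns, and 1 nL = 1e6 cubic microns.
    """
    return math.pi / 6.0 * diameter_um ** 3 / 1e6


def coefficient_of_variation(volumes_nl):
    """Sample standard deviation divided by the mean volume."""
    return statistics.stdev(volumes_nl) / statistics.mean(volumes_nl)


# Hypothetical droplet diameters in microns.
diameters_um = [98.0, 100.0, 102.0, 101.0, 99.0]
volumes = [droplet_volume_nl(d) for d in diameters_um]
cv = coefficient_of_variation(volumes)

print(f"mean volume = {statistics.mean(volumes):.3f} nL")
print(f"standard deviation / mean = {cv:.1%}")
print("within 10% of mean volume:", cv < 0.10)
```

Because volume scales with the cube of the diameter, a small spread in diameter produces roughly three times that relative spread in volume.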

In some embodiments, droplet formation produces droplets comprising DNA previously treated with a transposase and a first oligonucleotide primer attached to a bead. The term "bead" refers to any solid support that may be present in a partition, for example, a small particle or other solid support. Exemplary beads may include hydrogel beads. In some cases, the hydrogel is in sol form. In some cases, the hydrogel is in gel form. An exemplary hydrogel is an agarose hydrogel. Other hydrogels include, but are not limited to, those described in the following documents: U.S. Patent Nos. 4,438,258; 6,534,083; 8,008,476; and 8,329,763; U.S. Patent Application Nos. 2002/0009591; 2013/0022569; and 2013/0034592; and International Patent Application Nos. WO/1997/030092 and WO/2001/049240.

Methods for attaching oligonucleotides to beads are described, for example, in WO 2015/200541. In some embodiments, the oligonucleotide configured to link the hydrogel and the barcode is covalently linked to the hydrogel. Many methods are known in the art for covalently linking oligonucleotides to one or more hydrogel matrices. As just one example, aldehyde-derivatized agarose may be covalently linked to the 5' -amine group of a synthetic oligonucleotide.

As described above, a partition will comprise one or several (e.g., 1, 2, 3, or 4) beads, wherein each bead is linked to a first oligonucleotide primer having a free 3' end. The first oligonucleotide primer will have a bead-specific barcode and a 3' end complementary to an adaptor. In some embodiments, the barcode will be, for example, 2-10 nucleotides in length, such as 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides in length. The barcode may be discontinuous, i.e., interrupted by other nucleotides in the sequence. In some embodiments, the 3' end will be at least 50% complementary (e.g., at least 60%, 70%, 80%, 90%, or 100% complementary) to the entire adaptor sequence, such that they hybridize.

As described above, in some embodiments, the partition further comprises a second oligonucleotide primer that functions as a reverse primer in combination with the first oligonucleotide primer. See, for example, FIGS. 1 and 2b. The 3' end of the second oligonucleotide primer is at least 50% complementary (e.g., at least 60%, 70%, 80%, 90%, or 100% complementary) to the 3' single-stranded portion of the oligonucleotide adaptor ligated to the DNA fragment. See, for example, FIG. 2b. In some embodiments, the 3' end of the second oligonucleotide primer will be complementary to the entire adaptor sequence. In some embodiments, the 3'-most 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides of the second oligonucleotide primer are complementary to a sequence in the adaptor. In some embodiments, the second oligonucleotide primer comprises a barcode sequence, which, for example, can have the same length as listed above for the barcode of the first oligonucleotide primer. In some embodiments, the barcode comprises an index barcode, such as a sample barcode, e.g., an Illumina i7 or i5 sequence.
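The percent-complementarity thresholds above can be made concrete with a short sketch. It assumes a simple ungapped, end-to-end antiparallel alignment between a primer 3' end and an adaptor single-stranded portion of equal length; the function names and the mismatched primer are hypothetical, and real hybridization depends on conditions that this count does not model.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def percent_complementary(primer_3prime, adaptor_overhang):
    """Percent of positions that base-pair when the two equal-length
    sequences (both written 5'->3') are aligned antiparallel."""
    if len(primer_3prime) != len(adaptor_overhang):
        raise ValueError("this sketch only compares equal-length sequences")
    perfect_partner = reverse_complement(adaptor_overhang)
    matches = sum(p == q for p, q in zip(primer_3prime, perfect_partner))
    return 100.0 * matches / len(adaptor_overhang)


A14 = "GACGCTGCCGACGA"                 # SEQ ID NO: 1, from the text
perfect_primer = reverse_complement(A14)
mismatched_primer = perfect_primer[:-2] + "AA"   # hypothetical: 2 mismatches

print(perfect_primer, percent_complementary(perfect_primer, A14))
print(mismatched_primer, round(percent_complementary(mismatched_primer, A14), 1))
```

A primer 3' end with two mismatches over 14 positions would still exceed the 50%, 60%, 70%, and 80% thresholds, but not the 90% threshold.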

The amount and type of DNA in the partitions can vary as desired. For example, in some embodiments, all of the sample DNA in a particular partition is from a single cell or nucleus. In some embodiments, the DNA in the droplet is no more than 0.02% of the haploid genome. In some embodiments, the partitions comprise 60 megabases or less of DNA. When each partition contains less DNA, more partitions are required to achieve the same data resolution. In some embodiments, the partitions contain on average 1 kb to 10 megabases of DNA.
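The trade-off between DNA per partition and the number of partitions can be sketched with simple arithmetic. The total of 3.1 gigabases used below is an assumed, approximate size for one human haploid genome, chosen only for illustration; the per-partition loads are the 60 megabase, 10 megabase, and 1 kb figures mentioned above.

```python
def partitions_needed(total_dna_bp, dna_per_partition_bp):
    """Smallest number of partitions that can hold total_dna_bp when each
    partition holds at most dna_per_partition_bp (integer ceiling division)."""
    if dna_per_partition_bp <= 0:
        raise ValueError("dna_per_partition_bp must be positive")
    return -(-total_dna_bp // dna_per_partition_bp)


# Assumption for illustration: roughly one human haploid genome.
TOTAL_DNA_BP = 3_100_000_000

for load_bp in (60_000_000, 10_000_000, 1_000):
    count = partitions_needed(TOTAL_DNA_BP, load_bp)
    print(f"{load_bp:>11,} bp per partition -> {count:,} partitions")
```

Lowering the load from 60 megabases to 1 kb per partition raises the required partition count by several orders of magnitude, which is the scaling the paragraph above describes.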

In some embodiments, where information about a haploid genome is desired, DNA in the droplets is maintained such that contiguity between fragments produced by the transposase is maintained. This can be achieved, for example, by selecting conditions such that the transposase is not released from the DNA, thereby forming a bridge that links DNA fragments having the same relationship (haplotype) as in the case of genomic DNA. For example, it has been observed that transposases remain bound to DNA until detergents such as SDS are added to the reaction (Amini et al Nature Genetics 46 (12): 1343-1349).

Optionally, in some embodiments, the transposase is stripped from the DNA after droplet formation, i.e., within the partitions/droplets. In embodiments where no enzymatic reaction occurs in the droplets, harsher reagents may be used in the droplets. This may improve reactions that use such reagents, and the reagents may be removed before subsequent enzymatic reactions take place. For example, the DNA/transposase complex can be combined with an agent that removes the transposase from the DNA. In some embodiments, the agent is a detergent, e.g., an ionic or nonionic detergent. An exemplary detergent is sodium dodecyl sulfate (SDS). In some embodiments, SDS concentrations of 0.1% and 0.2% are sufficient to remove the tagmentase, yet low enough not to interfere with amplification. In some embodiments, the transposase is digested, for example, by protease digestion (e.g., proteinase K digestion). In some embodiments, the transposase is stripped from the DNA by contact with a chaotropic agent, such as guanidine thiocyanate. In some embodiments (e.g., where droplets are used), the agent is compatible with droplet formation.

In some embodiments, the partitions (e.g., droplets) formed above may further comprise a second oligonucleotide primer. See, for example, FIG. 2b. In some embodiments, the second oligonucleotide primer is not attached to a bead or other solid support. In some embodiments, the second oligonucleotide primer serves, together with the first oligonucleotide primer, as the second member of a pair of amplification primers. In some embodiments, for example, the second oligonucleotide primer acts as a reverse primer. For example, the second oligonucleotide primer may have a 3' end that is complementary to one of the single-stranded portions of the oligonucleotide adaptors, wherein the 3' end of the first oligonucleotide primer and the 3' end of the second oligonucleotide primer are complementary to the single-stranded portions of different oligonucleotide adaptors. In some embodiments, the second oligonucleotide primer will comprise a 5' PCR handle sequence (e.g., a P5 or P7 sequence; optionally, the first oligonucleotide primer has the other of the two sequences). The length of the PCR handle may be, for example, 2-40 nucleotides, such as 10-30 nucleotides.

Thus, in some embodiments, a partition comprises fragmented, transposase-treated DNA, together with a first oligonucleotide primer and a second oligonucleotide primer that serve as forward and reverse primers for fragments having different adaptors at the two ends. See, for example, FIG. 2b. Because the first oligonucleotide primer comprises a bead-specific (or partition-specific) barcode, and the different partitions will typically contain one or a few first-oligonucleotide-primer beads, each having a unique bead-specific barcode, each partition (e.g., droplet) can be used to barcode its DNA fragments with a partition-specific (e.g., bead-specific) barcode. In embodiments where contiguity is retained, for example, a haploid genome will therefore carry the same bead-specific barcode. Likewise, in embodiments where the DNA is in chromatin and contained within the nucleus, the ATAC DNA of the cell (DNA in chromatin form that is accessible to the transposase) will also carry the same bead-specific barcode.
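How a partition-specific barcode lets fragments be traced back to a shared partition can be sketched as a simple grouping step. The read layout below (a fixed-length barcode at the start of each read, followed by the insert) and the example reads are assumptions made only for illustration; the text does not specify a read structure, and `group_by_barcode` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_barcode(reads, barcode_len):
    """Group insert sequences by the barcode assumed to occupy the first
    barcode_len bases of each read (hypothetical read layout)."""
    groups = defaultdict(list)
    for read in reads:
        if len(read) <= barcode_len:
            raise ValueError("read is too short to contain barcode and insert")
        barcode, insert = read[:barcode_len], read[barcode_len:]
        groups[barcode].append(insert)
    return dict(groups)


# Hypothetical reads: an 8-nucleotide bead barcode followed by the insert.
reads = [
    "ACGTACGT" + "TTGACCA",
    "ACGTACGT" + "GGCATTA",
    "TTTTCCCC" + "CATGCAT",
]

for barcode, inserts in group_by_barcode(reads, barcode_len=8).items():
    print(barcode, inserts)
```

Fragments that share a barcode are inferred to have shared a partition, and therefore, in the embodiments described above, the same cell, nucleus, or contiguous molecule.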

Thus, after forming the partition comprising at least one oligonucleotide primer, and in some embodiments, the first and second oligonucleotide primers, the primers hybridize to the adaptor sequence at the ends of the DNA fragments. Hybridization conditions can be selected as desired to allow specific hybridization of the primer to the adapter.

After hybridization has occurred, but before the enzymatic reaction, the contents of the various (e.g., hundreds, thousands, or more) partitions can be combined. Any method of combining droplets may be used. Exemplary methods of merging droplets can be found, for example, in Priest et al., (2006) Appl. Phys. Lett., 89: 134101:1-134101:3; Ahn et al., (2006) Appl. Phys. Lett., 88: 264105; Fidalgo et al., (2007) Lab Chip, 7 (8): 984-; Tan et al., (2004) Lab Chip, 4 (4): 292-298.

In some embodiments, histones, non-DNA nucleosome factors, and/or chromatin can be removed in the droplets. For example, once the droplets are formed, the resulting mixture may be contacted with a reagent to remove these substances. In some embodiments, the droplets are formed by combining two aqueous streams (e.g., with an immiscible liquid), wherein a reagent carried by one of the aqueous streams used to generate the droplets is combined with the other stream, which contains the DNA substrate. Approaches for removing these substances from the DNA may include, for example, protease digestion, such as proteinase K digestion, or contact with a chaotropic agent, such as guanidine thiocyanate. This may help to maximize the number of binding sites available to the first oligonucleotide primer released from the bead.

Optionally, once the droplets are combined, the resulting bulk mixture may be contacted with a reagent to remove the histones. An exemplary reagent is, for example, guanidine thiocyanate.

Optionally, competitor oligonucleotides can be introduced into the mixture before, during, or after pooling to hybridize to unbound copies of the reagents, thereby reducing misassignment of barcodes across multiple droplets. For example, in some embodiments, the competitor oligonucleotide may be introduced at a sufficient concentration such that it, or a single-stranded portion thereof, hybridizes to the 3' end of unbound copies of the first oligonucleotide primer, thereby preventing de novo binding of unbound DNA fragments after pooling. In some embodiments, the competitor oligonucleotide may be introduced at a sufficient concentration such that it, or a single-stranded portion thereof, hybridizes to the 3' end of unbound copies of the oligonucleotide adaptor, thereby preventing de novo binding of unbound DNA fragments after pooling. The length of the competitor oligonucleotide may be, for example, at least 10 nucleotides and, in some embodiments, no longer than the primer-binding portion of the first oligonucleotide primer. In some embodiments, the competitor oligonucleotide comprises the reverse complement of A14, GACGCTGCCGACGA (SEQ ID NO: 1), or of B15, CCGAGCCCACGAGAC (SEQ ID NO: 2), or different competitor oligonucleotides are used, each having a separate one of these sequences. In some embodiments, the concentration of the competitor oligonucleotide is at least 2-fold higher than the final concentration of the first oligonucleotide primer. In some embodiments, the concentration of the competitor oligonucleotide is between 200 nM and 10 µM.
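A competitor sequence of the kind described above can be derived by taking the reverse complement of an adaptor single-stranded portion. The sketch below does this for the A14 and B15 sequences (SEQ ID NOS: 1 and 2) and checks a proposed concentration against the two criteria mentioned in the text. Combining both criteria in one check is a choice made for this illustration (the text presents them as separate embodiments), and the concentrations are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def competitor_concentration_ok(competitor_nm, primer_nm):
    """Illustrative check: at least 2-fold over the first oligonucleotide
    primer AND between 200 nM and 10 uM (10,000 nM)."""
    return competitor_nm >= 2 * primer_nm and 200 <= competitor_nm <= 10_000


A14 = "GACGCTGCCGACGA"    # SEQ ID NO: 1
B15 = "CCGAGCCCACGAGAC"   # SEQ ID NO: 2

print("A14 competitor:", reverse_complement(A14))
print("B15 competitor:", reverse_complement(B15))

# Hypothetical concentrations in nM.
print(competitor_concentration_ok(competitor_nm=1_000, primer_nm=400))
print(competitor_concentration_ok(competitor_nm=500, primer_nm=400))
```

Both derived competitors are longer than the 10-nucleotide minimum mentioned above.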

Once the contents of the droplets are combined, the DNA is contacted with one or more enzymes to manipulate the DNA. For example, in some embodiments, a DNA fragment hybridized to an oligonucleotide primer described herein can be contacted with a ligase, a polymerase, or both, thereby ligating the 3' end of the first oligonucleotide primer to the 5' end of the oligonucleotide adaptor at the end of the DNA fragment.

In some embodiments, after hybridization, the 3' end of the first oligonucleotide primer (and the 3' end of the second oligonucleotide primer, if present) can be ligated to an adaptor on the DNA fragment. Optionally, in conjunction with ligation (before, after, or simultaneously), a polymerase can fill in the 5' overhang ("gap-fill") by extending the 3' end to form a double-stranded sequence. See, for example, FIG. 2b. After the gap is filled, further ligation may be performed. In some embodiments, the polymerase may have strand-displacement activity (see FIG. 3) or 5' to 3' exonuclease activity (see FIG. 4).

If desired, the resulting DNA product can be amplified, for example, using forward and reverse primers that hybridize to the PCR handle sequences on the first and second oligonucleotide primers. Any type of amplification may be used, including but not limited to PCR.

In embodiments where complementary sequences are present at the two ends of the amplicon, the complementary sequences may form a hairpin when the DNA is rendered single-stranded. To avoid hairpin formation, a sequencing primer comprising one or more artificial nucleotides that form higher-affinity base pairs (e.g., giving a higher Tm) than the natural nucleotides can be used, thereby favoring hybridization of the sequencing primer over hairpin formation. Exemplary artificial nucleotides can include, but are not limited to, Locked Nucleic Acids (LNA™).
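Whether an amplicon carries complementary sequences at its two ends can be checked with a short sketch. It looks only for a perfect terminal stem (the 5' end pairing with the 3' end), which is a simplifying assumption; the amplicons, the insert, and the function name `terminal_stem_length` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def terminal_stem_length(amplicon, max_stem=30):
    """Longest k such that the first k bases can pair perfectly with the
    last k bases of the same single strand (0 if the ends cannot pair)."""
    best = 0
    for k in range(1, min(max_stem, len(amplicon) // 2) + 1):
        if amplicon[:k] == reverse_complement(amplicon[-k:]):
            best = k
    return best


A14 = "GACGCTGCCGACGA"    # SEQ ID NO: 1
B15 = "CCGAGCCCACGAGAC"   # SEQ ID NO: 2
insert = "ACCTGATTCA"     # hypothetical insert sequence

same_ends = A14 + insert + reverse_complement(A14)
different_ends = A14 + insert + reverse_complement(B15)

print(terminal_stem_length(same_ends))       # ends are complementary
print(terminal_stem_length(different_ends))  # ends are not complementary
```

An amplicon carrying the same adaptor at both ends yields a stem spanning the whole adaptor length, whereas one carrying different adaptors yields no terminal stem.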

Any desired nucleotide sequencing method may be used so long as at least some of the DNA segment sequence and the barcode sequence can be determined. Methods of high-throughput sequencing and genotyping are known in the art. For example, such sequencing techniques include, but are not limited to: pyrosequencing, sequencing by ligation, single-molecule sequencing, sequencing by synthesis (SBS), massively parallel clonal sequencing, massively parallel single-molecule SBS, massively parallel single-molecule real-time sequencing, massively parallel single-molecule real-time nanopore technology, and the like. Morozova and Marra provide an overview of some of these technologies in Genomics, 92: 255 (2008), which is hereby incorporated by reference in its entirety.

Exemplary DNA sequencing techniques include fluorescence-based sequencing techniques (see, e.g., Birren et al., Genome Analysis: Analyzing DNA, Vol. 1, Cold Spring Harbor, N.Y., incorporated herein by reference in its entirety). In some embodiments, automated sequencing techniques understood in the art are used. In some embodiments, the present technology provides parallel sequencing of partitioned amplicons (PCT Publication No. WO 2006/084132, which is herein incorporated by reference in its entirety). In some embodiments, DNA sequencing is achieved by parallel oligonucleotide extension (see, e.g., U.S. Pat. Nos. 5,750,341 and 6,306,597, both of which are incorporated herein by reference in their entirety). Additional examples of sequencing technologies include: the Church polony technology (Mitra et al., 2003, Analytical Biochemistry 320, 55-65; Shendure et al., 2005 Science 309, 1728-).

Typically, high-throughput sequencing methods share the common feature of massively parallel, high-throughput strategies, with the goal of lower costs compared with older sequencing methods (see, e.g., Voelkerding et al., Clinical Chem., 55: 641-658, 2009). Such methods can be broadly divided into those that typically use template amplification and those that do not. Methods requiring amplification include pyrosequencing commercialized by Roche as the 454 technology platforms (e.g., GS 20 and GS FLX), the Solexa platform commercialized by Illumina, and the Supported Oligonucleotide Ligation and Detection (SOLiD) platform commercialized by Applied Biosystems. Non-amplification methods, also known as single-molecule sequencing, are exemplified by the HeliScope platform commercialized by Helicos BioSciences and by the platforms commercialized by VisiGen, Oxford Nanopore Technologies, Life Technologies/Ion Torrent, and Pacific Biosciences.

In pyrosequencing (Voelkerding et al., Clinical Chem., 55: 641-658, 2009; MacLean et al., Nature Rev. Microbiol., 7: 287-296; U.S. Pat. Nos. 6,210,891 and 6,258,568, each of which is incorporated herein by reference in its entirety), template DNA is fragmented, end-repaired, ligated to adaptors, and clonally amplified in situ by capturing single template molecules with beads carrying oligonucleotides complementary to the adaptors. Each bead carrying a single template type is compartmentalized into a water-in-oil microvesicle, and the template is clonally amplified using a technique known as emulsion PCR. After amplification, the emulsion is broken and the beads are deposited into the wells of a picotiter plate, which serves as a flow cell during the sequencing reactions. Ordered, iterative introduction of each of the four dNTP reagents occurs in the flow cell in the presence of sequencing enzymes and a luminescent reporter, such as luciferase. When the appropriate dNTP is added to the 3' end of the sequencing primer, the ATP generated causes a burst of luminescence within the well, which is recorded with a CCD camera. Read lengths of 400 bases or more can be achieved, and 10⁶ sequence reads can be obtained, yielding up to 500 million base pairs (Mb) of sequence.
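The quoted output figure follows from multiplying the number of reads by the read length, as the short sketch below shows. The 500-base read length is an assumed example of a read longer than 400 bases; `total_bases` is a hypothetical helper.

```python
def total_bases(num_reads, read_length):
    """Total sequenced bases for a run with uniform read length."""
    return num_reads * read_length


NUM_READS = 10**6  # number of sequence reads quoted in the text

for read_length in (400, 500):
    megabases = total_bases(NUM_READS, read_length) / 1e6
    print(f"{NUM_READS:,} reads x {read_length} bases = {megabases:.0f} Mb")
```

At 400 bases per read the run yields 400 Mb, and 500 Mb is reached when reads average 500 bases.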

In the Solexa/Illumina platform (Voelkerding et al., Clinical Chem., 55: 641-658, 2009; MacLean et al., Nature Rev. Microbiol., 7: 287-296; U.S. Pat. Nos. 6,833,246, 7,115,400, and 6,969,488; each of which is incorporated herein by reference in its entirety), sequencing data are produced in the form of shorter reads. In this method, single-stranded fragmented DNA is end-repaired to generate 5'-phosphorylated blunt ends, followed by Klenow-mediated addition of a single A base to the 3' end of the fragments. The A addition facilitates addition of T-overhang adaptor oligonucleotides, which are subsequently used to capture the template-adaptor molecules on the surface of a flow cell that is studded with oligonucleotide anchors. The anchor is used as a PCR primer, but because of the length of the template and its proximity to other nearby anchor oligonucleotides, extension by PCR results in the molecule "arching over" to hybridize with an adjacent anchor oligonucleotide, forming a bridge structure on the flow cell surface. These loops of DNA are denatured and cleaved. Forward strands are then sequenced with reversible dye terminators. The sequence of incorporated nucleotides is determined by detecting post-incorporation fluorescence, with each fluorophore and block removed prior to the next cycle of dNTP addition. Sequence read length ranges from 36 nucleotides to over 50 nucleotides, with overall output exceeding 1 billion nucleotide pairs per analytical run.

Sequencing nucleic acid molecules using SOLiD technology (Voelkerding et al., Clinical Chem., 55: 641-658, 2009; MacLean et al., Nature Rev. Microbiol., 7: 287-296; U.S. Pat. Nos. 5,912,148 and 6,130,073; each of which is incorporated herein by reference in its entirety) also involves fragmentation of the template, ligation to oligonucleotide adaptors, attachment to beads, and clonal amplification by emulsion PCR. Following this, beads bearing template are immobilized on a derivatized surface of a glass flow cell, and a primer complementary to the adaptor oligonucleotide is annealed. However, rather than being used for 3' extension, this primer is used to provide a 5' phosphate group for ligation to interrogation probes, which contain two probe-specific bases followed by 6 degenerate bases and one of four fluorescent labels. In the SOLiD system, interrogation probes have 16 possible combinations of the two bases at the 3' end of each probe and one of four fluorescent labels at the 5' end. The fluorescent color, and thus the identity of each probe, corresponds to a specified color-space coding scheme. Multiple rounds (usually 7) of probe annealing, ligation, and fluorescence detection are followed by denaturation, and then by a second round of sequencing using a primer that is offset by one base relative to the initial primer. In this manner, the template sequence can be computationally reconstructed and template bases are interrogated twice, resulting in increased accuracy. Sequence read length averages 35 nucleotides, and overall output exceeds 4 billion bases per sequencing run.
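The two-base color-space scheme described above can be illustrated with a short sketch. It assumes the commonly published convention in which each base is given a 2-bit code (A=0, C=1, G=2, T=3) and each color is the XOR of the codes of two adjacent bases; that convention, the function names, and the example sequence are assumptions made for illustration and are not taken from this application.

```python
# Illustrative sketch only: two-base (color-space) encoding, assuming the
# convention A=0, C=1, G=2, T=3 with color = XOR of adjacent base codes.
# Function names and the example sequence are hypothetical.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = {code: base for base, code in BASE_TO_CODE.items()}


def encode_color_space(bases):
    """Return the list of colors (0-3), one per adjacent pair of bases."""
    codes = [BASE_TO_CODE[b] for b in bases.upper()]
    return [left ^ right for left, right in zip(codes, codes[1:])]


def decode_color_space(first_base, colors):
    """Rebuild the base sequence from a known first base and the colors."""
    code = BASE_TO_CODE[first_base.upper()]
    decoded = [CODE_TO_BASE[code]]
    for color in colors:
        code ^= color  # XOR is its own inverse, so this recovers the next base
        decoded.append(CODE_TO_BASE[code])
    return "".join(decoded)


colors = encode_color_space("ATGGC")
print(colors)                           # colors for AT, TG, GG, GC
print(decode_color_space("A", colors))  # recovers the original bases
```

Under this convention the 16 possible dinucleotides map onto four colors, four dinucleotides per color, and every template base contributes to two adjacent colors. A true single-base change therefore alters two neighboring colors, whereas an isolated color error does not, which is one way to see why interrogating each base twice can improve accuracy.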

In certain embodiments, nanopore sequencing is used (see, e.g., Astier et al., J. Am. Chem. Soc. 2006 Feb 8; 128(5): 1705-10, incorporated herein by reference). The principle of nanopore sequencing relates to what occurs when a nanopore is immersed in a conducting fluid and a potential (voltage) is applied across it. Under these conditions, a slight electric current due to conduction of ions through the nanopore can be observed, and the amount of current is exceedingly sensitive to the size of the nanopore. As each base of a nucleic acid passes through the nanopore, it changes the magnitude of the current through the nanopore in a manner that is distinct for each of the four bases, thereby allowing the sequence of the DNA molecule to be determined.

In certain embodiments, HeliScope by Helicos BioSciences Corporation is used (Voelkerding et al., Clinical Chem., 55: 641-658, 2009; MacLean et al., Nature Rev. Microbiol., 7: 287-296; U.S. Pat. Nos. 7,169,560, 7,282,337, 7,482,120, 7,501,245, 6,818,395, and 6,911,345, each of which is incorporated herein by reference in its entirety). Template DNA is fragmented and polyadenylated at the 3' end, with the final adenosine bearing a fluorescent label. Denatured polyadenylated template fragments are ligated to poly(dT) oligonucleotides on the surface of a flow cell. The initial physical locations of captured template molecules are recorded by a CCD camera, and the label is then cleaved and washed away. Sequencing is achieved by addition of polymerase and serial addition of fluorescently labeled dNTP reagents. Incorporation events produce a fluorescent signal corresponding to the dNTP, and the signal is captured by a CCD camera before each round of dNTP addition. Sequence read length ranges from 25 to 50 nucleotides, with overall output exceeding 1 billion nucleotide pairs per analytical run.

The Ion Torrent technology is a method of DNA sequencing based on the detection of hydrogen ions that are released during DNA polymerization (see, e.g., Science 327(5970): 1190 (2010); U.S. patent application Nos. 2009/0026082; 2009/0127589; 2010/0301398; 2010/0197507; 2010/0188073 and 2010/0137143; all of which are incorporated herein by reference in their entirety for all purposes). A microwell contains a template DNA strand to be sequenced. Beneath the layer of microwells is a hypersensitive ISFET ion sensor. All layers are contained within a CMOS semiconductor chip similar to that used in the electronics industry. When a dNTP is incorporated into the growing complementary strand, a hydrogen ion is released, which triggers the hypersensitive ion sensor. If homopolymer repeats are present in the template sequence, multiple dNTP molecules will be incorporated in a single cycle. This leads to a corresponding number of released hydrogen ions and a proportionally higher electronic signal. This technology differs from other sequencing technologies in that no modified nucleotides or optics are used. The per-base accuracy of the Ion Torrent sequencer is about 99.6% for 50-base reads, with about 100 Mb generated per run. The read length is 100 base pairs. The accuracy for homopolymer repeats 5 repeats in length is about 98%. The advantages of ion semiconductor sequencing are rapid sequencing speed and low upfront and operating costs.
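The proportional relationship between homopolymer length and signal can be sketched as a simple flow-space decoder. This is a minimal illustration and not the vendor's base-calling algorithm: the flow order `TACG`, the assumption that signals are already normalized so that one incorporation reads close to 1.0, the example signal values, and the function name are all hypothetical.

```python
# Illustrative sketch only. Assumes signals are normalized so that a single
# incorporation reads close to 1.0; flow order, data, and names are hypothetical.
def call_bases_from_flows(signals, flow_order="TACG"):
    """Convert per-flow signals into a base sequence by rounding each
    signal to the nearest whole number of incorporated nucleotides."""
    called = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        n_incorporated = max(0, int(signal + 0.5))  # round half up, never negative
        called.append(base * n_incorporated)
    return "".join(called)


# Successive flows of T, A, C, G, T, A, C, G
signals = [1.02, 0.05, 2.10, 0.97, 0.03, 2.96, 0.00, 1.08]
print(call_bases_from_flows(signals))
```

In this toy model a signal near 2 or 3 is read as a run of two or three identical bases. Signal variability tends to make rounding less reliable for longer runs, which is consistent with the lower accuracy stated above for 5-base homopolymers.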

Another exemplary nucleic acid sequencing approach that may be adapted for use with the present invention was developed by Stratos Genomics and involves the use of Xpandomers. This sequencing process typically includes providing a daughter strand produced by template-directed synthesis. The daughter strand generally comprises a plurality of subunits coupled in a sequence corresponding to a contiguous nucleotide sequence of all or a portion of a target nucleic acid, each subunit containing a tether, at least one probe or nucleobase residue, and at least one selectively cleavable bond. The selectively cleavable bond(s) are cleaved to yield an Xpandomer having a length greater than that of the plurality of subunits of the daughter strand. The Xpandomer typically includes the tethers and reporter elements for parsing genetic information in a sequence corresponding to the contiguous nucleotide sequence of all or a portion of the target nucleic acid. The reporter elements of the Xpandomer are then detected. Additional details on Xpandomer-based methods are described in, for example, U.S. patent publication No. 2009/0035777, which is incorporated herein by reference in its entirety.

Other single-molecule sequencing methods include real-time sequencing by synthesis using the VisiGen platform (Voelkerding et al., Clinical Chem., 55: 641-658, 2009; U.S. Pat. No. 7,329,492; U.S. patent application Ser. Nos. 11/671,956 and 11/781,166; each of which is incorporated herein by reference in its entirety), in which an immobilized, primed DNA template is subjected to strand extension using a fluorescently modified polymerase and fluorescent acceptor molecules, resulting in detectable fluorescence resonance energy transfer (FRET) upon nucleotide addition.

Another real-time single-molecule sequencing system, developed by Pacific Biosciences (Voelkerding et al., Clinical Chem., 55: 641-658, 2009; MacLean et al., Nature Rev. Microbiol., 7: 287-296; U.S. Pat. Nos. 7,170,050, 7,302,146, 7,313,308, and 7,476,503, each of which is incorporated herein by reference in its entirety), utilizes reaction wells 50-100 nm in diameter that encompass a reaction volume of approximately 20 zeptoliters (20 × 10^-21 L). Sequencing reactions are performed using an immobilized template, a modified Φ29 DNA polymerase, and high local concentrations of fluorescently labeled dNTPs. The high local concentrations and continuous reaction conditions allow incorporation events to be captured in real time by fluorescent signal detection using laser excitation, an optical waveguide, and a CCD camera.

In certain embodiments, single molecule real time (SMRT) DNA sequencing methods using zero-mode waveguides (ZMWs) developed by Pacific Biosciences, or similar methods, are employed. With this technology, DNA sequencing is performed on SMRT chips, each containing thousands of ZMWs. A ZMW is a hole, tens of nanometers in diameter, fabricated in a 100 nm metal film deposited on a silica substrate. Each ZMW becomes a nanophotonic visualization chamber providing a detection volume of just 20 zeptoliters (20 × 10^-21 L). At this volume, the activity of a single molecule can be detected against a background of thousands of labeled nucleotides. The ZMW provides a window for watching DNA polymerase as it performs sequencing by synthesis. Within each ZMW chamber, a single DNA polymerase molecule is attached to the bottom surface so that it permanently resides within the detection volume. Phospholinked nucleotides, each type labeled with a different colored fluorophore, are then introduced into the reaction solution at high concentrations that promote enzyme speed, accuracy, and processivity. Because the ZMW is so small, even at these high concentrations the detection volume is occupied by nucleotides only a small fraction of the time. In addition, visits to the detection volume are fast, lasting only a few microseconds, because of the very short distance over which diffusion has to carry the nucleotides. The result is a very low background.
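The low-background argument above can be checked with simple arithmetic: the mean number of molecules in a volume V at molar concentration C is N = C × V × N_A, where N_A is the Avogadro constant. The sketch below applies this to the 20 zeptoliter detection volume; the nucleotide concentrations are hypothetical values chosen for illustration and are not specified in this application.

```python
# Illustrative arithmetic only; the concentrations below are assumed values.
AVOGADRO = 6.02214076e23     # molecules per mole
DETECTION_VOLUME_L = 20e-21  # 20 zeptoliters, as stated in the text


def mean_molecules_in_volume(concentration_molar, volume_liters=DETECTION_VOLUME_L):
    """Mean number of molecules present in the volume at a given concentration."""
    return concentration_molar * volume_liters * AVOGADRO


for conc_um in (0.1, 1.0, 10.0):  # hypothetical concentrations in micromolar
    mean_n = mean_molecules_in_volume(conc_um * 1e-6)
    print(f"{conc_um:5.1f} uM -> {mean_n:.4f} molecules on average")
```

At an assumed 1 micromolar, for example, the mean occupancy is roughly 0.012 molecules, so the detection volume is empty of free labeled nucleotide most of the time.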

Methods and systems for such real-time sequencing that can be adapted for use with the methods described herein are described, for example, in U.S. Pat. Nos. 7,405,281, 7,315,019, 7,313,308, 7,302,146, and 7,170,050; U.S. patent publication Nos. 2008/0212960, 2008/0206764, 2008/0199932, 2008/0199874, 2008/0176769, 2008/0176316, 2008/0176241, 2008/0165346, 2008/0160531, 2008/0157005, 2008/0153100, 2008/0153095, 2008/0152281, 2008/0152280, 2008/0145278, 2008/0128627, 2008/0108082, 2008/0095488, 2008/0080059, 2008/0050747, 2008/0032301, 2008/0030628, 2008/0009007, 2007/0238679, 2007/0231804, 2007/0206187, 2007/0196846, 2007/0188750, 2007/0161017, 2007/0141598, 2007/0134128, 2007/0128133, 2007/0077564, 2007/0072196, and 2007/0036511; and Korlach et al. (2008) "Selective aluminum passivation for targeted immobilization of single DNA polymerase molecules in zero-mode waveguide nanostructures" PNAS 105(4): 1176-81, all of which are incorporated herein by reference in their entirety.

After sequencing is complete, sequences can be sorted by barcode, where sequences having the same barcode came from the same partition and thus are contiguous. In some embodiments, sequences linked on the basis of a common barcode sequence can be determined and, optionally, SNPs can be detected per fragment per barcode. In some embodiments, co-localization of fragments to a single barcode at a frequency greater than expected by chance (a skewed distribution) can be detected, thereby detecting a rearrangement.
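The sorting step can be sketched as grouping reads by barcode. The example assumes, purely for illustration, that each read is available as a (barcode, sequence) pair with the barcode already extracted; the read data and the function name are hypothetical.

```python
# Illustrative sketch only; read layout and example data are hypothetical.
from collections import defaultdict


def group_reads_by_barcode(reads):
    """Group (barcode, sequence) pairs so that reads sharing a barcode,
    and therefore a partition, are collected together."""
    groups = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    return dict(groups)


reads = [
    ("ACGTACGT", "TTGACCA"),
    ("GGCATTCA", "CCATGGA"),
    ("ACGTACGT", "GATTACA"),
]
for barcode, sequences in sorted(group_reads_by_barcode(reads).items()):
    print(barcode, sequences)
```

Downstream analyses such as per-barcode SNP calling, or testing whether fragments share a barcode more often than expected by chance, would operate on these per-barcode groups.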

It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and the scope of the appended claims. All publications, sequence accession numbers, patents, and patent applications cited herein are incorporated by reference in their entirety for all purposes.

Sequence listing

<110> Bio-Rad Laboratories, Inc. (BIO-RAD LABORATORIES, INC.)

<120> transposase-based genomic analysis

<130>094868-1103056 (115910PC)

<140>

<141>

<150>62/580,946

<151>2017-11-02

<160>13

<170>PatentIn version 3.5

<210>1

<211>14

<212>DNA

<213> Artificial sequence

<220>

<223> Artificial sequence description: synthetic oligonucleotides

<400>1

gacgctgccg acga 14

<210>2

<211>15

<212>DNA

<213> Artificial sequence

<220>

<223> Artificial sequence description: synthetic oligonucleotides

<400>2

ccgagcccac gagac 15

<210>3

<211>29

<212>DNA

<213> Artificial sequence

<220>

<223> Artificial sequence description: synthetic primer

<400>3

aatgatacgg cgaccaccga gatctacac 29

<210>4

<211>24

<212>DNA

<213> Artificial sequence

<220>

<223> Artificial sequence description: synthetic primer

<400>4

caagcagaag acggcatacg agat 24

<210>5

<211>19

<212>DNA

<213> Artificial sequence

<220>

<223> Artificial sequence description: synthetic oligonucleotides

<400>5

agatgtgtat aagagacag 19

<210>6

<211>33

<212>DNA

<213> Artificial sequence

<220>

<223> Artificial sequence description: synthetic oligonucleotides

<400>6

ctgtctctta tacacatctg acgctgccga cga 33

<210>7

<211>34

<212>DNA

<213> Artificial sequence

<220>

<223> Artificial sequence description: synthetic oligonucleotides

<400>7

ctgtctctta tacacatctc cgagcccacg agac 34

<210>8

<211>14

<212>DNA

<213> Artificial sequence

<220>

<223> Artificial sequence description: synthetic primer

<400>8

tcgtcggcag cgtc 14

<210>9

<211>15

<212>DNA

<213> Artificial sequence

<220>

<223> Artificial sequence description: synthetic oligonucleotides

<400>9

gacgctgccg acgat 15

<210>10

<211>47

<212>DNA

<213> Artificial sequence

<220>

<223> Artificial sequence description: synthetic primer

<220>

<221> modified_base

<222>(25)..(32)

<223> a, c, t, g, unknown or other

<400>10

caagcagaag acggcatacg agatnnnnnn nngtctcgtg ggctcgg 47

<210>11

<211>57

<212>DNA

<213> Artificial sequence

<220>

<223> Artificial sequence description: synthetic oligonucleotides

<400>11

ctgtctctta tacacatctg acgctgccga cgaatctcgt atgccgtctt ctgcttg 57

<210>12

<211>15

<212>DNA

<213> Artificial sequence

<220>

<223> Artificial sequence description: synthetic oligonucleotides

<400>12

gacgctgccg acgat 15

<210>13

<211>33

<212>DNA

<213> Artificial sequence

<220>

<223> Artificial sequence description: synthetic primer

<400>13

tcgtcggcag cgtcagatgt gtataagaga cag 33
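Several of the listed sequences are related to one another by reverse complementation or concatenation, which can be confirmed directly from the listing. The sketch below uses only sequences given above (with spacing removed); the helper name and dictionary layout are illustrative and not part of the listing.

```python
# Sequences copied from the listing above with spaces removed.
# The helper name and the dictionary layout are illustrative only.
SEQ = {
    1: "gacgctgccgacga",
    2: "ccgagcccacgagac",
    4: "caagcagaagacggcatacgagat",
    5: "agatgtgtataagagacag",
    6: "ctgtctcttatacacatctgacgctgccgacga",
    7: "ctgtctcttatacacatctccgagcccacgagac",
    8: "tcgtcggcagcgtc",
    11: "ctgtctcttatacacatctgacgctgccgacgaatctcgtatgccgtcttctgcttg",
    13: "tcgtcggcagcgtcagatgtgtataagagacag",
}

COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


checks = {
    "SEQ 8 = revcomp(SEQ 1)": reverse_complement(SEQ[1]) == SEQ[8],
    "SEQ 6 = revcomp(SEQ 5) + SEQ 1": reverse_complement(SEQ[5]) + SEQ[1] == SEQ[6],
    "SEQ 7 = revcomp(SEQ 5) + SEQ 2": reverse_complement(SEQ[5]) + SEQ[2] == SEQ[7],
    "SEQ 13 = revcomp(SEQ 6)": reverse_complement(SEQ[6]) == SEQ[13],
    "SEQ 11 = SEQ 6 + revcomp(SEQ 4)": SEQ[6] + reverse_complement(SEQ[4]) == SEQ[11],
}
for label, holds in checks.items():
    print(label, holds)
```

Each relation holds for the sequences as listed. For example, SEQ ID NO: 8 is the exact reverse complement of the A14 sequence of SEQ ID NO: 1, and SEQ ID NO: 13 is the reverse complement of SEQ ID NO: 6.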

Claims

1. A method of barcoding DNA, the method comprising randomly introducing oligonucleotide adaptors into DNA by contacting the DNA with a transposase carrying the oligonucleotide adaptors,

wherein the oligonucleotide adaptor comprises a 3' single-stranded portion and a double-stranded portion, the first oligonucleotide having a 3' end and a 5' end and being a strand of the double-stranded portion, the second oligonucleotide comprising the single-stranded portion and the complementary strand of the double-stranded portion, and

combining the contents of the droplets to form a reaction mixture;

2. The method of claim 1, further comprising amplifying the barcoded fragments.

3. The method of claim 2, wherein the amplification comprises polymerase chain reaction.

4. The method of claim 1, comprising stripping the transposase from the DNA prior to hybridization.

5. The method of claim 4, wherein the stripping occurs in a droplet.

6. The method of claim 4, wherein the DNA is in a nucleus and the stripping occurs prior to droplet formation.

7. The method of claim 1, comprising cleaving the oligonucleotide primer from the bead prior to hybridizing.

8. The method of claim 1, wherein the transposase carries two different adaptor oligonucleotides having the same double-stranded portion and different single-stranded portions.

9. The method of claim 8, wherein the droplets further comprise a second oligonucleotide primer, wherein the second oligonucleotide primer comprises a 3' end sequence that is complementary to at least 50% (e.g., at least 60%, 70%, 80%, 90%, or 100%) of one of the single-stranded portions, and the first oligonucleotide primer comprises a free 3' end that is complementary to at least 50% (e.g., at least 60%, 70%, 80%, 90%, or 100%) of a different 3' single-stranded portion, and the hybridizing comprises hybridizing the second oligonucleotide primer to the complementary 3' single-stranded portion.

10. The method of claim 9, wherein one single-stranded portion comprises GACGCTGCCGACGA (A14; SEQ ID NO: 1) and the other single-stranded portion comprises CCGAGCCCACGAGAC (B15; SEQ ID NO: 2).

11. The method of claim 1, wherein the transposase carries two identical adaptor oligonucleotides.

12. The method of any one of claims 1-11, wherein the first oligonucleotide primer comprises a 5' PCR handle sequence.

13. The method of claim 12, wherein the 5' PCR handle sequence of the first oligonucleotide primer comprises AATGATACGGCGACCACCGAGATCTACAC (P5; SEQ ID NO: 3).

14. The method of claim 12 or 13, wherein the droplet further comprises a second oligonucleotide primer, and wherein the second oligonucleotide primer comprises a 5' PCR handle.

15. The method of claim 14, wherein the 5' PCR handle sequence of the second oligonucleotide primer comprises CAAGCAGAAGACGGCATACGAGAT (P7; SEQ ID NO: 4).

16. The method of claim 15, wherein the second oligonucleotide primer further comprises an index tag.

17. The method of claim 1, wherein the single stranded portion of the second oligonucleotide comprises:

18. The method of any one of claims 1-15, wherein the DNA comprises a DNA-binding protein during introduction.

19. The method of claim 18, further comprising removing DNA-bound protein from the DNA after pooling.

20. The method of claim 19, wherein removing comprises contacting the DNA with a chaotropic agent or a protease.

21. The method of claim 18, further comprising removing DNA-bound protein from the DNA prior to pooling.

22. The method of claim 19, wherein removing comprises contacting the DNA with a chaotropic agent or a protease.

23. The method of claim 1, wherein the forming maintains the contiguity of the DNA fragments compared to the DNA.

24. The method of claim 1, wherein the DNA is purified after combining and prior to contacting.

25. The method of claim 1, further comprising mixing the contents of the droplets with a competitor oligonucleotide comprising a single-stranded portion that hybridizes to the 3' end of the unbound copy of the first oligonucleotide primer during pooling, thereby preventing de novo binding of unbound DNA fragments after pooling.

26. The method of claim 1, further comprising mixing the contents of the droplets with a competitor oligonucleotide comprising a single-stranded portion that hybridizes to the 3' end of the unbound copy of the oligonucleotide adaptor during pooling, thereby preventing de novo binding of unbound DNA fragments after pooling.

27. The method of claim 25 or 26, wherein the competitor oligonucleotide comprises a 3' terminus that is not extendable by a polymerase.

28. The method of claim 1, wherein the polymerase in the contacting is a strand displacing polymerase.

29. The method of claim 1, wherein the polymerase in the contacting has 5 '-3' exonuclease activity.

30. The method of claim 1, wherein the transposase is a Tn5 transposase.

31. The method of claim 1, wherein the transposase is attached to a bead.

32. The method of any one of claims 1-31, further comprising sequencing the barcoded DNA sequence, wherein the sequencing comprises hybridizing a sequencing primer to the barcoded DNA sequence and extending the sequencing primer.

33. The method of claim 32, wherein the sequencing primer comprises one or more artificial nucleotides that form higher-affinity base pairs than natural nucleotides.