CN116829734A - Enrichment of nucleic acid sequences - Google Patents


Info

Publication number
CN116829734A
Authority
CN
China
Prior art keywords
nucleic acid
sequence
interest
acid sequence
primer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180059186.6A
Other languages
Chinese (zh)
Inventor
W·J·麦克唐奈
K·法伊弗
R·拉梅纳尼
M·斯图宾顿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
10X Genomics Inc
Original Assignee
10X Genomics Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 10X Genomics Inc filed Critical 10X Genomics Inc
Priority claimed from PCT/US2021/035311 external-priority patent/WO2021247618A1/en
Publication of CN116829734A publication Critical patent/CN116829734A/en
Pending legal-status Critical Current

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Provided herein are methods for enriching nucleic acid sequences of interest from a library using primers directed to an identification sequence. The nucleic acid sequence of interest can then be cloned and the protein product can be analyzed.

Description

Enrichment of nucleic acid sequences
Background
The sample may be processed for various purposes, such as identifying certain types of moieties within the sample. The sample may be a biological sample. Biological samples may be processed, such as to detect a disease (e.g., cancer) or to identify a particular species. There are various methods of processing the sample, such as Polymerase Chain Reaction (PCR) and sequencing.
Biological samples may be processed within various reaction environments, such as partitions. The partitions may be wells or droplets. The droplets or wells may be used to process the biological sample in a manner that enables the biological sample to be separately partitioned and processed. For example, such droplets may be fluidly isolated from other droplets, enabling accurate control of the respective environments in the droplets.
Various processes, such as chemical or physical processes, may be performed on the biological samples in the partitions. The sample in the partition may be heated or cooled or chemically reacted to produce species that can be processed qualitatively or quantitatively. Specific species (e.g., nucleic acids) produced from biological samples can be processed in selection and enrichment reactions to more efficiently recover nucleic acid sequences of interest.
Disclosure of Invention
Provided herein are methods for enriching a nucleic acid sequence of interest from a plurality of nucleic acid molecules (such as a library of nucleic acid molecules). The methods provided herein can be used, for example, to enrich for a nucleic acid sequence of interest so that the nucleic acid sequence can be further cloned or analyzed. The methods herein may provide low noise, high specificity, or both. In some embodiments, the methods herein can be used for selection of nucleic acid sequences of interest (e.g., candidate antibodies) or antibody discovery applications.
Provided herein are methods for enriching a nucleic acid sequence of interest. In the method, a plurality of nucleic acid molecules having a plurality of identification sequences are provided. The nucleic acid molecules of the plurality of nucleic acid molecules comprise: (i) an identification sequence of the plurality of identification sequences that identifies the nucleic acid molecule and includes a barcode sequence, a Template Switching Oligonucleotide (TSO) sequence, and/or a Unique Molecular Identifier (UMI) sequence; and (ii) a nucleic acid sequence of interest, wherein the nucleic acid sequence of interest comprises a nucleic acid sequence encoding a T Cell Receptor (TCR), a B Cell Receptor (BCR), or a fragment thereof. A first nucleic acid primer complementary to at least a portion of the identification sequence is used in a first round of amplification to amplify the nucleic acid sequence of interest. A second nucleic acid primer complementary to at least a portion of the V(D)J sequence is used in a second round of amplification to further amplify the nucleic acid sequence.
Provided herein are methods for enriching a nucleic acid sequence of interest, the methods comprising: (a) Providing a plurality of nucleic acid molecules comprising a plurality of identification sequences, wherein a nucleic acid molecule of the plurality of nucleic acid molecules comprises (i) an identification sequence of the plurality of identification sequences that identifies the nucleic acid molecule and (ii) the nucleic acid sequence of interest; and (b) amplifying the nucleic acid sequence of interest using a nucleic acid primer complementary to the identification sequence, thereby enriching the nucleic acid sequence of interest.
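The selection logic of step (b) can be illustrated in silico. The following is a minimal sketch and not the disclosed laboratory protocol: it assumes a toy library in which each molecule is a 5'-to-3' DNA string that begins with its identification sequence, and it treats a primer as annealing only when it is the exact reverse complement of a site on the molecule. The function names and all sequences are hypothetical.

```python
# Hypothetical illustration: keep only the library molecules that carry a
# chosen identification sequence, mimicking a primer that anneals to it.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_anneals(primer: str, template: str) -> bool:
    """True if the primer is exactly complementary to a site on the template."""
    return reverse_complement(primer) in template


def select_by_identification_sequence(library, identification_sequence):
    """Return the molecules a primer against the identification sequence would pick up."""
    primer = reverse_complement(identification_sequence)
    return [molecule for molecule in library if primer_anneals(primer, molecule)]


# Toy library: 8-nt identification sequence followed by a sequence of interest.
library = [
    "AGTCCGTA" + "ATGGCCTTAGGA",
    "CTTGAGGC" + "ATGGCGTTAGCA",
    "GACTTACG" + "ATGGCATTAGTA",
]
enriched = select_by_identification_sequence(library, "AGTCCGTA")
print(enriched)
```

Exact matching is used only to keep the sketch short; real primer binding tolerates mismatches and depends on reaction conditions.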
In some embodiments, the providing comprises generating the plurality of nucleic acid molecules comprising the plurality of identification sequences that identify the plurality of nucleic acid molecules.
In some embodiments, the plurality of nucleic acid molecules corresponds to a plurality of cell surface proteins from the plurality of cells. In some embodiments, the plurality of cells comprises a plurality of T cells. In some embodiments, the plurality of cell surface proteins comprises a plurality of T cell receptors. In some embodiments, the plurality of cells comprises a plurality of B cells. In some embodiments, the plurality of cell surface proteins comprises a plurality of B cell receptors.
In some embodiments, the plurality of nucleic acid molecules comprises a library of nucleic acid molecules encoding a plurality of variants of an amino acid sequence. In some embodiments, the plurality of variants of the amino acid sequence are variants of a T cell receptor. In some embodiments, the plurality of variants of the amino acid sequence are variants of an antibody or antigen binding fragment thereof. In some embodiments, the plurality of nucleic acid molecules comprises complementary deoxyribonucleic acid (cDNA) molecules. In some embodiments, the nucleic acid sequence of interest comprises a nucleic acid sequence encoding a T cell receptor or fragment thereof. In some embodiments, the nucleic acid sequence of interest comprises a nucleic acid sequence encoding an antibody or antigen-binding fragment thereof. In some embodiments, the nucleic acid sequence of interest comprises a nucleic acid sequence encoding a V(D)J sequence.
In some embodiments, the identification sequence comprises a barcode sequence. In some embodiments, the nucleic acid sequence complementary to the identification sequence is complementary to at least a portion of the barcode sequence. In some embodiments, the nucleic acid sequence complementary to the identification sequence is complementary to the barcode sequence and a read sequence of the nucleic acid sequence of interest. In some embodiments, the identification sequence comprises a template switching oligonucleotide sequence. In some embodiments, the nucleic acid sequence complementary to the identification sequence is complementary to at least a portion of the template switching oligonucleotide sequence. In some embodiments, the identification sequence comprises a unique molecular identifier sequence. In some embodiments, the nucleic acid sequence complementary to the identification sequence is complementary to at least a portion of the unique molecular identifier sequence. In some embodiments, the nucleic acid primer further comprises a nucleic acid sequence complementary to a portion of the coding sequence. In some embodiments, the nucleic acid primer further comprises a nucleic acid sequence complementary to at least a portion of the V(D)J sequence. In some embodiments, the nucleic acid sequence complementary to at least a portion of the V(D)J sequence is complementary to a V sequence of the V(D)J sequence. In some embodiments, the nucleic acid primer further comprises a non-binding handle.
Some embodiments further comprise amplifying the nucleic acid sequence of interest using another nucleic acid primer, wherein the other nucleic acid primer is different from the nucleic acid primer. In some embodiments, the other nucleic acid primer comprises a non-binding handle. In some embodiments, the nucleic acid primer and the other nucleic acid primer are configured to anneal to sequences flanking at least a portion of the nucleic acid sequence of interest.
Some embodiments further comprise a polymerase chain reaction, thereby further enriching the nucleic acid sequences of the plurality of nucleic acid molecules.
In some embodiments, no other nucleic acid molecules of the plurality of nucleic acid molecules are amplified. In some embodiments, the nucleic acid sequence of interest is enriched by at least 1000-fold, at least 10,000-fold, or at least 100,000-fold.
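The fold-enrichment thresholds above can be made concrete with a short calculation. This is a minimal sketch under an assumption not stated in the text: fold enrichment is taken to be the ratio of the target's fraction of all molecules (or reads) after enrichment to its fraction before enrichment. The function name and the example counts are hypothetical.

```python
def fold_enrichment(target_before, total_before, target_after, total_after):
    """Ratio of the target's fractional abundance after vs. before enrichment.

    Assumed definition for illustration only; counts may be molecules or reads.
    """
    if min(target_before, total_before, total_after) <= 0:
        raise ValueError("before-counts and totals must be positive")
    fraction_before = target_before / total_before
    fraction_after = target_after / total_after
    return fraction_after / fraction_before


# Hypothetical counts: the target is 1 in 1,000,000 molecules before
# enrichment and 900,000 in 1,000,000 afterwards.
fold = fold_enrichment(1, 1_000_000, 900_000, 1_000_000)
print(f"{fold:,.0f}-fold enrichment")
```

Under this definition, the hypothetical counts correspond to roughly 900,000-fold enrichment, which would satisfy the "at least 100,000-fold" threshold.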
Some embodiments further comprise contacting the plurality of nucleic acid sequences with a second nucleic acid primer, wherein the second nucleic acid primer comprises a nucleic acid sequence that is complementary to a binding sequence on a complementary sequence of the nucleic acid sequence of interest. In some embodiments, the second nucleic acid primer further comprises a non-binding handle. In some embodiments, the nucleic acid sequence of interest comprises a sequence encoding an antibody or antigen binding fragment thereof. In some embodiments, the second nucleic acid primer is complementary to at least a portion of the complement of a nucleic acid sequence encoding the V(D)J sequence of the antibody or antigen binding fragment thereof. In some embodiments, the second nucleic acid primer is complementary to at least a portion of the complement of a nucleic acid sequence encoding the constant region of the antibody or antigen binding fragment thereof. In some embodiments, the second nucleic acid primer is further complementary to at least a portion of the complement of a nucleic acid sequence encoding the J region of the antibody or antigen binding fragment thereof.
Some embodiments further comprise cloning the nucleic acid sequence of interest into a vector. In some embodiments, the vector is selected from the group consisting of a viral vector, a plasmid, a phage, a cosmid, and an artificial chromosome. In some embodiments, the cloning comprises combining two or more nucleic acid sequences. In some embodiments, the two or more nucleic acid sequences comprise a nucleic acid sequence of a heavy chain and a nucleic acid sequence of a light chain of an antibody or antigen binding fragment. In some embodiments, the two or more nucleic acid sequences comprise a nucleic acid sequence of an alpha chain of a T cell receptor and a nucleic acid sequence of a beta chain of a T cell receptor. In some embodiments, the vector comprises at least a portion of a constant region of a T cell receptor. In some embodiments, the vector comprises at least a portion of the constant region of an antibody or antigen binding fragment thereof. In some embodiments, the vector comprises a promoter.
Some embodiments further comprise determining the level of enrichment of the nucleic acid sequence of interest. Some embodiments further comprise performing a second round of amplification to further enrich the nucleic acid sequence of interest. Some such embodiments include contacting the nucleic acid sequence of interest with a third primer and a fourth primer. In some embodiments, the third primer is complementary to a portion of a barcode of the identification sequence. In some embodiments, the third primer is complementary to the 5' end of the barcode of the identification sequence. In some embodiments, the third primer is further complementary to a read sequence. In some embodiments, the third primer is complementary to a portion of the identification sequence upstream of the barcode of the identification sequence. In some embodiments, the fourth primer is complementary to at least a portion of the complement of the constant region of the nucleic acid sequence of interest.
Some embodiments further comprise performing fragmentation of the nucleic acid sequence of interest. Some embodiments further comprise adding an A-tail to the nucleic acid sequence of interest. Some embodiments further comprise performing SI-PCR on the nucleic acid sequence of interest. Some embodiments also include V(D)J enrichment of the nucleic acid sequence of interest. In some embodiments, the nucleic acid sequence of interest comprises a restriction site.
Another aspect of the present disclosure provides a non-transitory computer-readable medium containing machine-executable code that, when executed by one or more computer processors, implements any of the methods above or elsewhere herein.
Another aspect of the present disclosure provides a system comprising one or more computer processors and computer memory coupled thereto. The computer memory includes machine executable code that when executed by one or more computer processors implements any of the methods above or elsewhere herein.
Also provided herein are methods comprising: enriching a nucleic acid sequence of interest based at least on at least a portion of a constant region of the nucleic acid sequence of interest, thereby producing an enriched nucleic acid sequence of interest; and modifying the enriched nucleic acid sequence to produce a modified enriched nucleic acid sequence compatible with the vector.
In some embodiments, the enriching is performed using a first nucleic acid primer and a second nucleic acid primer. In some embodiments, one of the first nucleic acid primer and the second nucleic acid primer is a framework region 1 (FWR1) primer. In some embodiments, the first nucleic acid primer is complementary to at least a barcode or portion thereof on the nucleic acid sequence of interest. In some embodiments, the first nucleic acid primer is complementary to at least a unique molecular identifier sequence or a portion thereof on the nucleic acid sequence of interest. In some embodiments, the first nucleic acid primer is complementary to at least a 5' untranslated region (5' UTR) or a portion thereof on the nucleic acid sequence of interest. In some embodiments, the second nucleic acid primer is complementary to at least a constant region or portion thereof on the nucleic acid sequence of interest. In some embodiments, the second nucleic acid primer is complementary to at least a J sequence or a portion thereof on the nucleic acid sequence of interest. In some embodiments, the second nucleic acid primer is complementary to at least a nucleic acid sequence of a junction region or portion thereof on the nucleic acid sequence of interest. In some embodiments, the enriching is performed using hybridization capture. In some embodiments, the hybridization capture is based on hybridization of a primer to a linking sequence on the nucleic acid sequence of interest. In some embodiments, the linking sequence is a V(D)J sequence or a portion thereof. In some embodiments, the nucleic acid primers are selected based on Rapid Amplification of cDNA Ends (RACE) sequencing. In some embodiments, the nucleic acid sequence of interest comprises a complementary deoxyribonucleic acid (cDNA) molecule. In some embodiments, the nucleic acid sequence of interest further comprises a barcode.
In some embodiments, the nucleic acid sequence of interest encodes at least a portion of a cell surface protein of a cell.
In some embodiments, the cell surface protein is a T cell receptor or fragment thereof. In some embodiments, the cell surface protein is a B cell receptor or fragment thereof. In some embodiments, the cell is a T cell. In some embodiments, the cell is a B cell.
In some embodiments, the constant region of a nucleic acid sequence of interest comprises a nucleic acid sequence encoding a V(D)J sequence or a portion thereof. In some embodiments, the portion is a V sequence. In some embodiments, the portion is a J sequence.
In some embodiments, the modification comprises adding Gibson ends to the amplified nucleic acid sequence. In some embodiments, the adding of the Gibson ends is performed using Polymerase Chain Reaction (PCR). In some embodiments, the modification comprises combining a second nucleic acid of interest with the enriched nucleic acid of interest. In some embodiments, the combining comprises ligating the enriched nucleic acid sequence of interest to the second nucleic acid sequence of interest using overlapping extension primers. In some embodiments, the combining comprises ligating the second nucleic acid sequence of interest to the enriched nucleic acid sequence of interest using a nucleic acid linker. In some embodiments, the second nucleic acid sequence of interest is enriched. In some embodiments, the second nucleic acid sequence of interest encodes at least a portion of a cell surface protein of a cell. In some embodiments, the cell surface protein is a T cell receptor or fragment thereof. In some embodiments, the cell surface protein is a B cell receptor or fragment thereof. In some embodiments, the cell is a T cell.
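Adding Gibson ends by PCR can be sketched as a primer-design step. The example below is illustrative only and assumes the usual tailed-primer approach: each primer has a 3' portion that anneals to the enriched insert and a 5' tail that matches the vector sequence flanking the insertion site. The function names, sequences, and the annealing length are hypothetical and are not taken from the disclosure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def design_gibson_primers(insert, vector_left, vector_right, anneal_len=12):
    """Return (forward, reverse) primers, each written 5'-to-3'.

    vector_left / vector_right: top-strand vector sequence immediately
    upstream / downstream of the insertion site (the homology arms).
    """
    if len(insert) < anneal_len:
        raise ValueError("insert is shorter than the annealing length")
    forward = vector_left + insert[:anneal_len]
    reverse = reverse_complement(insert[-anneal_len:] + vector_right)
    return forward, reverse


def amplicon_with_gibson_ends(insert, vector_left, vector_right):
    """Top strand of the PCR product expected from the tailed primers."""
    return vector_left + insert + vector_right


# Hypothetical sequences, kept short for readability.
insert = "ATGGAGCCCTGTGCGAGATGGGGC"
vector_left = "GGTACCGAGCTC"
vector_right = "TCTAGAGGGCCC"

fwd, rev = design_gibson_primers(insert, vector_left, vector_right)
product = amplicon_with_gibson_ends(insert, vector_left, vector_right)
print(fwd, rev, product, sep="\n")
```

The homology arms here are arbitrary placeholders; practical designs choose arm lengths and melting temperatures to suit the assembly reagents being used.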
Some embodiments further comprise cloning the modified enriched nucleic acid sequence into the vector. In some embodiments, the cloning includes vector restriction digestion. In some embodiments, the vector restriction digestion comprises digestion at an FspI restriction site. In some embodiments, the vector comprises a native leader sequence.
Further provided herein is another method for enriching a nucleic acid sequence of interest. In the method, a plurality of nucleic acid molecules are provided. The plurality of nucleic acid molecules comprises a plurality of identification sequences. The nucleic acid molecules of the plurality of nucleic acid molecules comprise: (i) an identification sequence of the plurality of identification sequences that identifies the nucleic acid molecule and includes a barcode sequence and a Unique Molecular Identifier (UMI) sequence; and (ii) a nucleic acid sequence of interest, wherein the nucleic acid sequence of interest comprises a nucleic acid sequence encoding a B Cell Receptor (BCR) or a fragment thereof. A first amplification reaction is performed using a first set of primers. The first set of primers includes a first primer having a sequence complementary to at least a portion of the barcode sequence and/or the UMI sequence; and a second primer having a sequence complementary to at least a portion of the nucleic acid sequence of interest encoding a joining (J) region and/or an isotype region of the BCR or fragment thereof. A second amplification reaction is performed using a second set of primers. The second set of primers comprises a third primer having a sequence complementary to at least a portion of a nucleotide sequence encoding a leader sequence and/or framework region 1 (FWR1) of the BCR or fragment thereof; and a fourth primer having a sequence complementary to at least a portion of the nucleic acid sequence of interest encoding complementarity-determining region 3 (CDR3), framework region 4 (FWR4), the J region, the D region, and/or the V region of the BCR or fragment thereof, or a junction between any one or more thereof.
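The two-round scheme described above can be sketched as two successive in-silico selections. This is a simplified model under stated assumptions, not the disclosed protocol: molecules are top-strand strings laid out as barcode, UMI, leader, V segment, CDR3, J region, constant region; primer sites are given as top-strand sequences (a real reverse primer would be the reverse complement of its site); and a round yields a product only when both sites are present in the correct order. All names and sequences are hypothetical.

```python
def amplify(templates, forward_site, reverse_site):
    """Return the sub-sequence spanning both primer sites for each template that has them."""
    products = []
    for template in templates:
        start = template.find(forward_site)
        if start == -1:
            continue  # forward primer has no binding site
        end = template.find(reverse_site, start + len(forward_site))
        if end == -1:
            continue  # reverse primer has no downstream binding site
        products.append(template[start:end + len(reverse_site)])
    return products


# Shared V(D)J region of an expanded clonotype (leader, V, CDR3, J, constant).
vdj = "ATGGAG" + "CCCTGT" + "GCGAGA" + "TGGGGC" + "GCCTCC"
library = [
    "AGTCCGTA" + "TTAGGC" + vdj,  # target cell: barcode + UMI + V(D)J
    "CTTGAGGC" + "GATTAC" + vdj,  # different cell, same V(D)J sequence
    "GACTTACG" + "CCATGA" + "ATGAAA" + "CCCTGT" + "GCAAGG" + "TGGGGC" + "GCCTCC",
]

# Round 1 (outer primers): barcode + UMI site and J-region site.
round1 = amplify(library, "AGTCCGTA" + "TTAGGC", "TGGGGC")
# Round 2 (inner primers): leader site and CDR3 site, applied to round-1 product.
nested = amplify(round1, "ATGGAG", "GCGAGA")
# For comparison: inner primers applied directly to the whole library.
inner_only = amplify(library, "ATGGAG", "GCGAGA")

print(nested)
print(len(inner_only))
```

In this toy model the inner primers alone cannot distinguish two cells that share a V(D)J sequence, whereas the barcode and UMI site used in the first round restricts the product to the single targeted molecule.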
In any aspect or embodiment of the methods provided herein, the plurality of nucleic acid molecules may be prepared from a cell sample of one or more donors. In some of these embodiments, the one or more donors may have been exposed to a target antigen, and the plurality of nucleic acid molecules comprising a nucleic acid sequence of interest may have a nucleic acid sequence encoding a selected B Cell Receptor (BCR) or fragment thereof that binds to the target antigen. Furthermore, in aspects of these embodiments of the methods, the plurality of nucleic acid molecules comprising a nucleic acid sequence encoding a selected B Cell Receptor (BCR) or fragment thereof (that binds to the target antigen) may be prepared from a cell sample of the one or more donors according to steps that may include a first step of partitioning a reaction mixture into a plurality of partitions, wherein the reaction mixture comprises (i) a plurality of cells of the cell sample and (ii) the target antigen, which may be coupled to a reporter oligonucleotide. The reaction mixture comprises cells that bind to the target antigen. The first step of partitioning provides a partition comprising: (i) a partitioned cell that binds to the target antigen and (ii) a plurality of nucleic acid barcode molecules comprising a partition-specific barcode sequence. A second step of the preparation from the cell sample of the one or more donors may generate barcoded nucleic acid molecules in the partition. The barcoded nucleic acid molecules may include a first barcoded nucleic acid molecule and a second barcoded nucleic acid molecule. The first barcoded nucleic acid molecule may comprise the sequence of the reporter oligonucleotide or its reverse complement and the partition-specific barcode sequence or its reverse complement. If the first barcoded nucleic acid molecule is detected, the second barcoded nucleic acid molecule may comprise a nucleic acid molecule of interest to be included in the plurality of nucleic acid molecules.
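The conditional selection described above (keep a BCR-encoding molecule only when an antigen reporter molecule with the same partition-specific barcode is detected) can be sketched as a simple join on the barcode. This is an illustrative sketch under assumptions: molecules are represented as (partition barcode, payload) tuples, and "detected" is modeled as a minimum number of distinct reporter UMIs, a threshold that is hypothetical and not specified in the text.

```python
from collections import defaultdict


def partitions_with_reporter(reporter_molecules, min_umis=1):
    """Partition barcodes whose reporter oligonucleotide is seen with enough distinct UMIs."""
    umis_by_partition = defaultdict(set)
    for partition_barcode, umi in reporter_molecules:
        umis_by_partition[partition_barcode].add(umi)
    return {bc for bc, umis in umis_by_partition.items() if len(umis) >= min_umis}


def molecules_of_interest(bcr_molecules, positive_partitions):
    """BCR-encoding molecules that share a barcode with a reporter-positive partition."""
    return [seq for bc, seq in bcr_molecules if bc in positive_partitions]


# Hypothetical data: (partition barcode, UMI) for reporter-derived molecules
# and (partition barcode, sequence label) for BCR-derived molecules.
reporter_molecules = [("BC01", "UMI_A"), ("BC01", "UMI_B"), ("BC03", "UMI_C")]
bcr_molecules = [
    ("BC01", "heavy_chain_cDNA_1"),
    ("BC02", "heavy_chain_cDNA_2"),
    ("BC03", "light_chain_cDNA_3"),
]

positive = partitions_with_reporter(reporter_molecules, min_umis=2)
selected = molecules_of_interest(bcr_molecules, positive)
print(selected)
```

Raising `min_umis` makes the call more conservative; with `min_umis=1`, the molecule from partition BC03 would also be selected.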
Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in the art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments and its several details are capable of modification in various obvious respects, all without departing from the present disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
Incorporated by reference
All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. In the event that publications and patents or patent applications incorporated by reference contradict the disclosure contained in this specification, this specification is intended to supersede and/or take precedence over any such contradictory material.
Drawings
The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings (also referred to herein as "figures") of which:
Fig. 1 shows an example of a microfluidic channel structure for separating individual biological particles.
Fig. 2 shows an example of a microfluidic channel structure for delivering barcode-bearing beads to droplets.
Fig. 3 shows an example of a microfluidic channel structure for co-separating biological particles and reagents.
Fig. 4 shows an example of a microfluidic channel structure for controlled separation of beads into discrete droplets.
Fig. 5 shows an example of a microfluidic channel structure for achieving increased droplet generation throughput.
Fig. 6 shows another example of a microfluidic channel structure for achieving increased droplet generation throughput.
Fig. 7A shows a cross-sectional view of another example of a microfluidic channel structure with geometric features for controlled separation. Fig. 7B shows a perspective view of the channel structure of fig. 7A.
Fig. 8 shows an example of a barcode-bearing bead.
FIG. 9 shows a workflow for enriching a nucleic acid sequence of interest.
FIG. 10 shows a nested PCR protocol for amplifying a nucleic acid sequence of interest.
FIG. 11 shows an exemplary labeling agent comprising a reporter oligonucleotide attached thereto.
Fig. 12A shows a workflow for analyzing one or more analytes.
FIGS. 12B-C illustrate the processing of nucleic acid molecules derived from cells to append barcode sequences.
Fig. 13A-C illustrate a workflow for analyzing a plurality of analytes using a labeling agent.
FIG. 14 illustrates a computer system programmed or otherwise configured to implement the methods provided herein.
FIG. 15 shows an exemplary labeling agent comprising an attached reporter oligonucleotide.
FIG. 16 shows an example of a primer design configured to generate a clonable sequence from a nucleic acid sequence of interest using the enrichment methods provided herein.
FIG. 17 provides a graphical overview of a method for enriching nucleic acid sequences of interest.
FIG. 18 provides a graphical overview of nucleic acid sequences compatible with a vector, including incorporation of the nucleic acid sequences into the vector.
FIG. 19 provides a graphical overview of exemplary probes and protocols for capture-based enrichment of nucleic acid sequences of interest.
FIG. 20 shows the product of a nested PCR amplification reaction (FIG. 20A) compared to a one-step PCR amplification reaction (FIG. 20B) for enriching a target nucleic acid sequence of interest (e.g., a fragment encoding a BCR).
FIG. 21 shows Bioanalyzer (BioA) results indicating that nested PCR cleanly amplified the nucleic acid sequence of the target product of interest (e.g., a fragment encoding a BCR) for three of the four cell clones targeted from the pooled barcoded cDNA library (FIGS. 21B-D; clones B, C, and D). Nested PCR amplification targeting the fourth cell clone yielded multiple products (FIG. 21A; clone A).
FIG. 22 shows the sequencing results of enrichment of the product after nested amplification of a nucleic acid sequence of interest (e.g., a target nucleic acid sequence encoding a fragment of BCR produced by clone A (an expanded clonotype with multiple subclones)) from a pooled barcoded cDNA library when the forward external primer lacks sufficient specificity.
FIG. 23 shows the sequencing results of enrichment of the product after nested amplification of the nucleic acid sequence of interest (e.g., target nucleic acid sequence encoding a fragment of BCR produced by clone C (a single cell clone with many potent UMI)) from a pooled barcoded cDNA library when the forward external primer lacks sufficient specificity.
FIG. 24 shows the sequencing results of enrichment of the product after nested amplification of the nucleic acid sequence of interest (e.g., target nucleic acid sequence encoding a fragment of BCR produced by clone B (expanded clonotype with single unique subclones)) from a pooled barcoded cDNA library when the forward external primer binds to the cell barcode and UMI with sufficient specificity.
Detailed Description
While various embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.
Where values are described as ranges, it is understood that such disclosure includes disclosure of all possible sub-ranges within such ranges, as well as specific values falling within such ranges, whether or not the specific values or sub-ranges are explicitly stated.
The terms "a," "an," and "the" as used herein generally refer to both singular and plural referents unless the context clearly dictates otherwise. "A and/or B" is used herein to include all of the following alternatives: "A", "B", "A or B" and "A and B".
Headings (e.g., (a), (b), (i), etc.) are presented merely for convenience in reading the specification and claims. The use of headings in the specification or claims does not require that the steps or elements be performed in alphabetical or numerical order or the order in which they are presented.
Use of ordinal terms such as "first," "second," "third," etc., in the claims does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed, but are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term) to distinguish the claim elements. Similarly, the use of these terms in the description does not itself imply any desired priority, precedence, or order.
As used herein, the term "barcode" generally refers to a label or identifier that conveys or is capable of conveying information about an analyte. The barcode may be part of the analyte. The barcode may be independent of the analyte. The barcode may be a tag attached to an analyte (e.g., a nucleic acid molecule) or a combination of the tag plus an inherent property of the analyte (e.g., the size of the analyte or a terminal sequence). Barcodes may be unique. Barcodes may have a variety of different formats. For example, a barcode may include: a polynucleotide barcode; random nucleic acid and/or amino acid sequences; and synthetic nucleic acid and/or amino acid sequences. The barcode may be attached to the analyte in a reversible or irreversible manner. The barcode may be added to a fragment of, for example, a deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) sample before, during, and/or after sequencing of the sample. Barcodes may allow identification and/or quantification of individual sequencing reads.
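How a barcode supports identification and quantification of sequencing reads can be shown with a short example. The sketch below assumes a hypothetical read layout in which each read starts with a fixed-length barcode followed by a fixed-length UMI and then the insert; the lengths, names, and sequences are illustrative only. Reads are grouped by barcode, and distinct UMIs are counted so that amplification duplicates are not counted twice.

```python
from collections import defaultdict

BARCODE_LEN = 8  # hypothetical barcode length
UMI_LEN = 4      # hypothetical UMI length


def split_read(read):
    """Split a read into (barcode, UMI, insert) using the assumed fixed layout."""
    barcode = read[:BARCODE_LEN]
    umi = read[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    insert = read[BARCODE_LEN + UMI_LEN:]
    return barcode, umi, insert


def count_molecules_per_barcode(reads):
    """Count distinct UMIs observed for each barcode."""
    umis_by_barcode = defaultdict(set)
    for read in reads:
        barcode, umi, _ = split_read(read)
        umis_by_barcode[barcode].add(umi)
    return {barcode: len(umis) for barcode, umis in umis_by_barcode.items()}


reads = [
    "AGTCCGTA" + "AAAC" + "ATGGCC",
    "AGTCCGTA" + "AAAC" + "ATGGCC",  # duplicate of the read above (same UMI)
    "AGTCCGTA" + "GGTT" + "ATGGCC",
    "CTTGAGGC" + "CCAT" + "ATGGCG",
]
counts = count_molecules_per_barcode(reads)
print(counts)
```

Counting distinct UMIs rather than raw reads is one common way to estimate the number of original molecules per barcode; it ignores sequencing errors in the barcode or UMI, which a production pipeline would need to handle.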
As used herein, the term "real-time" may refer to a response time of less than about 1 second, a tenth of a second, a hundredth of a second, a millisecond, or less. The response time may be greater than 1 second. In some cases, real-time may refer to simultaneous or substantially simultaneous processing, detection, or identification.
As used herein, the term "subject" generally refers to an animal such as a mammal (e.g., a human) or an avian (e.g., a bird), or other organism such as a plant. For example, the subject can be a vertebrate, mammal, rodent (e.g., mouse), primate, ape, or human. Animals may include, but are not limited to, farm animals, sports animals, and pets. The subject may be a healthy or asymptomatic individual, an individual who has or is suspected of having a disease (e.g., cancer) or is susceptible to the disease, and/or an individual in need of treatment or suspected of being in need of treatment. The subject may be a patient. The subject may be a microorganism or microbe (e.g., bacteria, fungi, archaea, viruses).
As used herein, the term "genome" generally refers to genomic information from a subject, which may be, for example, at least a portion or all of the genetic information of the subject. The genome may be encoded in DNA or RNA. The genome may comprise coding regions (e.g., encoding a protein) as well as non-coding regions. The genome may comprise sequences of all chromosomes together in an organism. For example, the human genome typically has a total of 46 chromosomes. The sequence of all these chromosomes together may constitute the human genome.
The terms "adaptor", "adapter" and "tag" may be used synonymously. An adaptor or tag may be coupled to a polynucleotide sequence to be "tagged" by any method, including ligation, hybridization or other methods.
As used herein, the term "sequencing" generally refers to methods and techniques for determining the sequence of nucleotide bases in one or more polynucleotides. These polynucleotides may be, for example, nucleic acid molecules such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), including variants or derivatives thereof (e.g., single stranded DNA). Sequencing may be performed by various systems currently available, such as, but not limited to, sequencing systems produced by Pacific Biosciences, Oxford Nanopore, or Life Technologies (Ion Torrent). Alternatively or in addition, sequencing may be performed using nucleic acid amplification, polymerase chain reaction (PCR) (e.g., digital PCR, quantitative PCR, or real-time PCR), or isothermal amplification. Such systems can provide a plurality of raw genetic data corresponding to genetic information of a subject (e.g., a human) as generated by the systems from a sample provided by the subject. In some examples, such systems provide sequencing reads (also referred to herein as "reads"). Reads may include a sequence of nucleobases corresponding to the sequence of a nucleic acid molecule that has been sequenced. In some cases, the systems and methods provided herein may be used with proteome information.
As used herein, the term "bead" generally refers to a particle. The beads may be solid or semi-solid particles. The beads may be gel beads. The gel beads may include a polymer matrix (e.g., a matrix formed by polymerization or cross-linking). The polymer matrix may include one or more polymers (e.g., polymers having different functional groups or repeating units). The polymers in the polymer matrix may be randomly arranged, for example in a random copolymer, and/or have an ordered structure, for example in a block copolymer. Crosslinking may be via covalent, ionic or induced interactions or physical entanglement. The beads may be macromolecules. Beads may be formed from nucleic acid molecules that are bound together. Beads may be formed via covalent or non-covalent assembly of molecules (e.g., macromolecules) such as monomers or polymers. Such polymers or monomers may be natural or synthetic. Such polymers or monomers may be or include, for example, nucleic acid molecules (e.g., DNA or RNA). The beads may be formed of a polymeric material. The beads may be magnetic or non-magnetic. The beads may be rigid. The beads may be flexible and/or compressible. The beads may be destructible or dissolvable. The beads may be solid particles (e.g., metal-based particles including, but not limited to, iron oxide, gold, or silver) covered with a coating comprising one or more polymers. Such coatings may be destructible or dissolvable.
As used herein, the term "sample" generally refers to a biological sample of a subject. The biological sample may comprise any number of macromolecules, such as cellular macromolecules. The sample may be a cell sample. The sample may be a cell line or a cell culture sample. The sample may comprise one or more cells. The sample may comprise one or more microorganisms. The biological sample may be a nucleic acid sample or a protein sample. The biological sample may also be a carbohydrate sample or a lipid sample. The biological sample may be derived from another sample. The sample may be a tissue sample, such as a biopsy sample, core needle biopsy sample, needle aspirate, or fine needle aspirate. The sample may be a fluid sample, such as a blood sample, a urine sample, or a saliva sample. The sample may be a skin sample. The sample may be a cheek swab. The sample may be a plasma or serum sample. The sample may be a cell-free sample. The cell-free sample may comprise extracellular polynucleotides. The extracellular polynucleotides may be isolated from a body sample, which may be selected from the group consisting of blood, plasma, serum, urine, saliva, mucosal secretions, sputum, stool, and tears.
As used herein, the term "biological particle" generally refers to a discrete biological system derived from a biological sample. The biological particles may be macromolecules. The biological particles may be small molecules. The biological particle may be a virus. The biological particles may be cells or derivatives of cells. The biological particles may be organelles. The biological particles may be rare cells from a population of cells. The biological particles can be any type of cell including, but not limited to, prokaryotic cells, eukaryotic cells, bacteria, fungi, plants, mammalian or other animal cell types, mycoplasma, normal tissue cells, tumor cells, or any other cell type whether derived from a single-cell organism or a multicellular organism. The biological particles may be a component of a cell. The biological particles may be or may include DNA, RNA, organelles, proteins, or any combination thereof. The biological particles may be or include a matrix (e.g., a gel or polymer matrix) comprising cells or one or more components from cells (e.g., cell beads), such as DNA, RNA, organelles, proteins, or any combination thereof from cells. The biological particles may be obtained from a tissue of a subject. The biological particles may be hardened cells. Such hardened cells may or may not include cell walls or cell membranes. The biological particles may include one or more components of the cell, but may not include other components of the cell. Examples of such components are nuclei or organelles. The cells may be living cells. Living cells may be capable of being cultured, for example, when enclosed in a gel or polymer matrix, or when comprising a gel or polymer matrix.
As used herein, the term "macromolecular composition" generally refers to macromolecules contained within or derived from a biological particle. The macromolecular composition may comprise a nucleic acid. In some cases, the biological particles may be macromolecules. The macromolecular composition may comprise DNA. The macromolecular composition may comprise RNA. The RNA may be coding or non-coding. The RNA may be, for example, messenger RNA (mRNA), ribosomal RNA (rRNA), or transfer RNA (tRNA). The RNA may be a transcript. The RNA may be a small RNA less than 200 nucleobases in length, or a large RNA greater than 200 nucleobases in length. Small RNAs can include 5.8S ribosomal RNA (rRNA), 5S rRNA, transfer RNA (tRNA), microRNA (miRNA), small interfering RNA (siRNA), small nucleolar RNA (snoRNA), Piwi-interacting RNA (piRNA), tRNA-derived small RNA (tsRNA), and small rDNA-derived RNA (srRNA). The RNA may be double-stranded RNA or single-stranded RNA. The RNA may be circular RNA. The macromolecular composition may comprise a protein. The macromolecular composition may comprise a peptide. The macromolecular composition may comprise a polypeptide.
As used herein, the term "molecular tag" generally refers to a molecule capable of binding to a macromolecular component. Molecular tags can bind to macromolecular components with high affinity. Molecular tags can bind to macromolecular components with high specificity. The molecular tag may comprise a nucleotide sequence. The molecular tag may comprise a nucleic acid sequence. The nucleic acid sequence may be at least a portion or all of a molecular tag. The molecular tag may be a nucleic acid molecule or may be part of a nucleic acid molecule. The molecular tag may be an oligonucleotide or a polypeptide. The molecular tag may comprise a DNA aptamer. The molecular tag may be or comprise a primer. The molecular tag may be or comprise a protein. The molecular tag may comprise a polypeptide. The molecular tag may be a barcode.
As used herein, the term "partition" generally refers to a space or volume that may be suitable for containing one or more species or carrying out one or more reactions. The partitions may be physical compartments such as droplets or wells. A partition may isolate a space or volume from another space or volume. The droplets may be a first phase (e.g., an aqueous phase) in a second phase (e.g., oil) that is immiscible with the first phase. The droplets may be a first phase in a second phase that is not phase separated from the first phase, such as capsules or liposomes in an aqueous phase. A partition may include one or more other (internal) partitions. In some cases, a partition may be a virtual compartment that may be defined and identified across multiple and/or remote physical compartments by an index (e.g., an indexed library).
Enrichment of sequences of interest from a library of nucleic acid molecules
Provided herein are methods for enriching a nucleic acid sequence of interest from a plurality of nucleic acid molecules (such as a library of nucleic acid molecules). The methods provided herein can be used, for example, to enrich for a nucleic acid sequence of interest so that the nucleic acid sequence can be further cloned or analyzed. The methods herein may provide low noise, high specificity, or both. In some embodiments, the methods herein can be used for selection of nucleic acid sequences of interest (e.g., candidate antibodies) or antibody discovery applications.
A general workflow of the methods provided herein is provided in FIG. 9. Briefly, (1) a library (e.g., a barcoded library of sequences of a single cell immune repertoire of subjects) can be generated; (2) sequences of interest (e.g., V(D)J sequences, such as paired TCR (e.g., TRA/TRB), BCR, or antibody (e.g., heavy/light chain) sequences) can be identified, for example, by sequencing; (3) the sequences of interest can be enriched from the library (e.g., by using 1 or 2 rounds of PCR); (4) the enriched sequences of interest can be cloned into an appropriate expression vector (and optionally expressed); and (5) the protein (e.g., antibody) encoded by the sequence of interest can optionally be analyzed.
Method
A library of nucleic acid molecules having nucleic acid sequences (e.g., nucleic acid sequences encoding proteins such as paired T cell receptors (TCRs), B cell receptors (BCRs), and antibodies or antigen binding fragments thereof) can be used to enrich for a nucleic acid sequence of interest among those nucleic acid sequences, e.g., a sequence encoding an amino acid sequence of interest (e.g., a specific T cell receptor, B cell receptor, or antibody or antigen binding fragment thereof). The library may be generated, for example, by isolating and/or amplifying RNA encoding the amino acid sequence of interest or by using DNA (e.g., genomic DNA). An RNA library can be reverse transcribed to produce a cDNA library, and an identification sequence (e.g., a barcode sequence or unique molecular identifier sequence) can be added to the members of the library and used to identify them.
In some cases, a barcoded nucleic acid library comprising immune molecules (e.g., from single cells) is generated as described herein. For example, in some embodiments, the RNA molecules are processed as generally described in FIGS. 12B-C. Referring to FIG. 12B, in some cases, nucleic acid molecules (such as RNA molecules) derived from cells are processed to append the cell (e.g., partition) specific barcode sequences 1222 to these molecules or derivatives thereof (e.g., cDNA molecules). For example, referring to FIG. 12B, in some embodiments, primer 1250 comprises a sequence that is complementary to a sequence of an RNA molecule 1260 (such as an RNA encoding an immune molecule (such as a light chain or heavy chain antibody sequence)) from a cell. In some cases, primer 1250 comprises one or more adaptor sequences 1251 that are not complementary to RNA molecule 1260. In some cases, primer 1250 comprises a poly-T sequence. In some cases, primer 1250 comprises a sequence that is complementary to a target sequence in an RNA molecule. In some cases, primer 1250 comprises a sequence that is complementary to a region of an immune molecule (such as a constant region of an RNA encoding a TCR, BCR, or antibody molecule). Primer 1250 hybridizes to RNA molecule 1260, and cDNA molecule 1270 is generated in a reverse transcription reaction. In some cases, the reverse transcriptase is selected such that several non-template bases 1280 (e.g., a poly-C sequence) are added to the cDNA. Nucleic acid barcode molecule 1290 comprises a sequence 1224 complementary to the non-template bases, and the reverse transcriptase performs a template switching reaction on nucleic acid barcode molecule 1290 to generate a barcoded nucleic acid molecule comprising cellular (e.g., partition specific) barcode sequence 1222 (or the reverse complement thereof) and cDNA sequence 1270 (or a portion thereof).
In another example, referring to FIG. 12C, in some embodiments, nucleic acid barcode molecule 1290 comprises a sequence 1223 that is complementary to the sequence of RNA molecule 1260 from a cell. In some cases, sequence 1223 comprises a sequence specific for an RNA molecule. In some cases, sequence 1223 comprises a poly-T sequence. In some cases, sequence 1223 comprises a sequence complementary to a region of an immune molecule (such as a constant region of an RNA encoding a TCR, BCR, or antibody molecule). Sequence 1223 hybridizes to RNA molecule 1260, and cDNA molecule 1270 is generated in a reverse transcription reaction, thereby generating a barcoded nucleic acid molecule comprising cellular (e.g., partition specific) barcode sequence 1222 (or the reverse complement thereof) and cDNA sequence 1270 (or a portion thereof). The barcoded nucleic acid molecules can then optionally be processed as described elsewhere herein, for example, to amplify the molecules and/or to append sequencing platform specific sequences to the fragments. See, for example, U.S. patent publication 20180105808, which is hereby incorporated by reference in its entirety. The barcoded nucleic acid molecules or derivatives generated therefrom can then be sequenced on a suitable sequencing platform. In some cases, one or more labeling agents capable of binding to or otherwise coupling to one or more cellular features may be used to characterize cells and/or cellular features as described herein (e.g., to characterize immune receptors or antigen specificity of an immune molecule).
The molecules of the library may have a structure running 5' to 3' from the identification sequence to the coding sequence. For example, the molecules of the library may have the following structure from 5' to 3': (1) a barcode sequence; (2) a unique molecular identifier sequence; (3) a template switching oligonucleotide sequence; (4) an immune molecule variable sequence (e.g., a V(D)J sequence as provided herein); and (5) an immune receptor constant sequence. In some embodiments, one or more adapter sequences (such as sequencing platform specific sequences, such as sequencing primers or primer binding sequences, e.g., Illumina R1 or R2) may be located 5' of, 3' of, or on both sides of the sequence of the molecules of the library.
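The 5' to 3' layout described above can be illustrated with a minimal Python sketch that splits a library molecule into its leading segments. The segment lengths, the template switching oligonucleotide sequence, and the names `parse_molecule` and `LibraryMolecule` are hypothetical values chosen for illustration and are not specified in this disclosure.

```python
from typing import NamedTuple

# Hypothetical parameters for illustration only; not taken from the disclosure.
BARCODE_LEN = 16
UMI_LEN = 10
TSO = "ACGTACGTGGG"  # placeholder template switching oligonucleotide sequence


class LibraryMolecule(NamedTuple):
    barcode: str
    umi: str
    insert: str  # variable (e.g., V(D)J) sequence followed by constant sequence


def parse_molecule(seq: str) -> LibraryMolecule:
    """Split a 5'->3' sequence into barcode, UMI, and insert.

    Raises ValueError if the sequence is too short or the assumed TSO
    does not immediately follow the UMI.
    """
    barcode = seq[:BARCODE_LEN]
    umi = seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    rest = seq[BARCODE_LEN + UMI_LEN:]
    if len(umi) < UMI_LEN or not rest.startswith(TSO):
        raise ValueError("sequence does not match the assumed layout")
    return LibraryMolecule(barcode, umi, rest[len(TSO):])
```

In this sketch the variable and constant sequences are returned together as `insert`, since separating them would require a reference for the constant region that is outside the scope of the example.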
In some cases, the barcoded gene expression library is generated from a plurality of cells (e.g., from a single cell as described herein) comprising an immune molecule such as a TCR, BCR, or antibody. The barcoded library can then be sequenced and analyzed to identify paired immune molecule sequences (e.g., comprising a common barcode sequence) from single cells, such as paired TCRs (e.g., TRA/TRBs), paired BCRs (light chain/heavy chain sequences), and paired antibody sequences (light chain/heavy chain sequences). The immune molecules of interest (e.g., paired light chain/heavy chain antibodies) can then be enriched (e.g., amplified) directly from the barcoded library for subsequent processing and analysis in, for example, an expression vector. In some cases, primers are designed to amplify paired immune molecules (e.g., light and heavy chain antibody sequences) from the library for cloning into one or more suitable expression vectors.
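As an illustration of identifying paired sequences that share a common barcode, the following Python sketch groups sequenced contigs by barcode and keeps barcodes that carry exactly one heavy chain and one light chain. The record format `(barcode, chain, sequence)`, the chain labels, and the function name `pair_chains` are assumptions made for this example only.

```python
from collections import defaultdict


def pair_chains(contigs):
    """Return {barcode: (heavy_seq, light_seq)} for barcodes that have
    exactly one heavy-chain and one light-chain contig.

    `contigs` is an iterable of (barcode, chain, sequence) tuples, where
    `chain` is the string "heavy" or "light" (a hypothetical record format).
    """
    by_barcode = defaultdict(lambda: defaultdict(list))
    for barcode, chain, seq in contigs:
        by_barcode[barcode][chain].append(seq)

    pairs = {}
    for barcode, chains in by_barcode.items():
        # Barcodes with missing or multiple chains are skipped as ambiguous.
        if len(chains["heavy"]) == 1 and len(chains["light"]) == 1:
            pairs[barcode] = (chains["heavy"][0], chains["light"][0])
    return pairs
```

The same grouping could be applied to TRA/TRB contigs by changing the chain labels.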
Enrichment of a nucleic acid sequence of interest from, for example, a barcoded gene expression library may allow for accelerated isolation of the nucleic acid and expression and/or analysis of the amino acid sequence encoded by the nucleic acid. For example, sequences of interest (e.g., V(D)J sequences, such as paired TCR (e.g., TRA/TRB), BCR, or antibody (e.g., heavy/light chain) sequences) can be enriched (e.g., using one or more PCR reactions) and these enriched sequences (such as the light and heavy chain sequences of the antibody) can be cloned directly into an appropriate expression vector to avoid expensive and time-consuming methods (such as gene synthesis) for generating expression vectors configured to express an immune molecule of interest (e.g., antibody). The nucleic acid sequence of interest may be enriched by amplifying it based on an identification sequence (e.g., a barcode or UMI) associated with the nucleic acid sequence of interest, for example, by using a protocol such as that shown in FIG. 9. In some embodiments, the nucleic acid sequence of interest may be further enriched using a nested amplification method.
As part of the enrichment protocol, a nucleic acid primer can be designed that anneals to one or more identification sequences in a molecule having a nucleic acid sequence of interest, such as a barcode sequence or unique molecular identifier. The other primer may be designed to anneal to a sequence downstream of the identification sequence and may be configured such that the nucleic acid sequence may be amplified using the primer, for example, by polymerase chain reaction. A second round of amplification may be performed using a different set of primers to further enrich for the nucleic acid sequence of interest.
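The primer design described above can be sketched in Python as follows. This is a simplified in-silico illustration that assumes exact sequence matching and ignores melting temperature, primer length limits, and secondary structure; the function names and the choice to span both barcode and UMI with the forward primer are assumptions, not requirements of the methods herein.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(barcode: str, umi: str, downstream_site: str):
    """Return (forward, reverse) primers written 5'->3'.

    The forward primer has the sequence of the barcode plus UMI as read on
    the given strand, so it anneals to the complementary strand at the
    identification sequence. The reverse primer is the reverse complement
    of a chosen downstream site (e.g., within a constant region).
    """
    return barcode + umi, reverse_complement(downstream_site)


def would_amplify(template: str, forward: str, reverse: str) -> bool:
    """Check that both primer sites occur on the template in the right order."""
    start = template.find(forward)
    end = template.find(reverse_complement(reverse))
    return start != -1 and end != -1 and end >= start + len(forward)
```

Because the forward primer is specific to one identification sequence, a library molecule carrying a different barcode fails the `would_amplify` check, which mirrors the selectivity the enrichment relies on.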
After enrichment for the nucleic acid sequence of interest, the nucleic acid sequence of interest may be cloned into a vector and subsequently expressed in an expression system. Such cloning and expression can produce the protein to be analyzed. For example, candidate T cell receptors, B cell receptors or antibodies or antigen binding fragments thereof may be expressed in an expression system in which such a nucleic acid sequence of interest is cloned. These proteins may be therapeutic candidates, genes of interest, protein variants of interest or another protein to be analyzed. In some cases, the primers are designed to amplify paired immune molecule sequences (e.g., comprising a common barcode sequence) from a single cell, such as paired TCRs (e.g., TRA/TRBs), paired BCRs (light chain/heavy chain sequences), and paired antibody sequences (light chain/heavy chain sequences). These amplified, paired immune molecule sequences (e.g., paired light and heavy chain antibody sequences) can then optionally be processed for subsequent cloning into expression vectors (e.g., plasmids configured to co-express paired immune molecule subunits such as antibody heavy and light chains) for expression of functional immune molecules.
Nucleic acid molecules
The methods provided herein can include providing a plurality of nucleic acid molecules. The nucleic acid molecules described herein can comprise ribonucleic acids (e.g., RNAs, such as the RNA molecules provided herein) or deoxyribonucleic acids (e.g., DNA or cDNA). The nucleic acid molecule may comprise G, A, T, U, C or bases capable of reliably base pairing with complementary nucleotides. 7-deaza-adenine, 7-deaza-guanine, adenine, guanine, cytosine, thymine, uracil, 2-deaza-2-thio-guanosine, 2-thio-7-deaza-guanosine, 2-thio-adenine, 2-thio-7-deaza-adenine, isoguanine, 7-deaza-guanine, 5,6-dihydrouridine, 5,6-dihydrothymine, xanthine, 7-deaza-xanthine, hypoxanthine, 7-deaza-xanthine, 2,6-diamino-7-deaza-purine, 5-methyl-cytosine, 5-propynyl-uridine, 5-propynyl-cytidine, 2-thio-thymine, and 2-thio-uridine are examples of such bases, but many other bases are known. The nucleic acid molecule may comprise, for example, LNA, PNA, UNA or morpholino oligomers. Nucleic acid molecules as used herein may comprise natural or unnatural nucleotides or linkages.
The nucleic acid molecule may comprise an identification sequence. The identification sequence may identify, for example, the nucleic acid molecule, the source of the nucleic acid sample, or another characteristic of the nucleic acid sample. For example, the nucleic acid molecule may comprise one or more of the following: an adaptor sequence, a primer or primer binding sequence, a sequencing primer or sequencing primer binding sequence (such as R1 or a partial R1 sequence), a unique molecular identifier (UMI), a polynucleotide sequence (such as a poly-A or poly-C sequence), or a sequence configured to bind to a flow cell of a sequencer (such as P5 or P7 or a partial sequence thereof). It will be appreciated that the nucleic acid molecule may also comprise a cellular barcode sequence, such as a partition specific barcode.
In some embodiments, the nucleic acid molecules of the plurality of nucleic acid molecules may comprise two or more of a barcode, a unique molecular identifier sequence, and a template switching oligonucleotide sequence. For example, the nucleic acid molecule may comprise a barcode and a unique molecular identifier sequence, a barcode and a template switching oligonucleotide sequence, or a unique molecular identifier sequence and a template switching oligonucleotide sequence.
An example of such a nucleic acid molecule is included in the top panel of FIG. 10.
In some embodiments, the nucleic acid sequence of interest may be engineered to include a restriction site (e.g., using PCR primers that include a restriction site). In some embodiments, restriction sites may be used for cloning after enrichment of the nucleic acid sequence of interest.
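A minimal Python sketch of adding a restriction site to a PCR primer is shown below. The recognition sequences for EcoRI, BamHI, and NotI are the standard ones; the 5' clamp sequence, the choice of enzymes, and the function names are assumptions for illustration and are not prescribed by the methods herein.

```python
# Standard recognition sequences for three common restriction enzymes.
RESTRICTION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "NotI": "GCGGCCGC",
}


def add_restriction_site(primer: str, enzyme: str, clamp: str = "GCGC") -> str:
    """Prepend a 5' clamp and a restriction site to a primer (5'->3').

    The clamp is a hypothetical short 5' extension; extra bases upstream
    of the site are commonly added so the enzyme can cut near a DNA end.
    """
    return clamp + RESTRICTION_SITES[enzyme] + primer


def has_internal_site(insert: str, enzyme: str) -> bool:
    """Return True if the insert already contains the enzyme's site,
    in which case digestion would also cut within the insert."""
    return RESTRICTION_SITES[enzyme] in insert
```

Checking the sequence of interest with `has_internal_site` before choosing an enzyme helps avoid selecting a site that would fragment the enriched sequence during cloning.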
The nucleic acid molecules of the plurality of nucleic acid molecules may comprise a nucleic acid sequence which may encode an amino acid sequence. In some embodiments, the amino acid sequence may be of a T cell receptor or a B cell receptor. In some embodiments, the amino acid sequence may be the amino acid sequence of an antibody or antigen binding fragment thereof.
In some embodiments, a nucleic acid molecule of a plurality of nucleic acid molecules can comprise a nucleic acid sequence of interest, such as the nucleic acid sequences described herein.
As used herein, the term "antibody" may refer to an immunoglobulin (Ig), polypeptide, or protein having a binding domain that is an antigen binding domain or is homologous to an antigen binding domain. The term may also include "antigen binding fragments" and other interchangeable terms for similar binding fragments as described herein. Native antibodies and native immunoglobulins (Ig) can be heterotetrameric glycoproteins of about 150,000 daltons made up of two identical light chains and two identical heavy chains. The term antibody may also refer to camelid antibodies. In some cases, camelid antibodies are not tetramers. Each light chain may be linked to a heavy chain by one covalent disulfide bond, and the number of disulfide bonds may vary between heavy chains of different immunoglobulin isotypes. Each heavy and light chain may have regularly spaced intrachain disulfide bridges. Each heavy chain may have a variable domain at one end ("VH") followed by multiple constant domains ("CH"). Each light chain may have a variable domain at one end ("VL") and a constant domain at its other end ("CL"); the constant domain of the light chain may be aligned with the first constant domain of the heavy chain, and the light chain variable domain may be aligned with the variable domain of the heavy chain. Specific amino acid residues may form an interface between the light chain variable domain and the heavy chain variable domain.
In some cases, an antibody or antigen-binding fragment thereof comprises an isolated antibody or antigen-binding fragment thereof, a purified antibody or antigen-binding fragment thereof, a recombinant antibody or antigen-binding fragment thereof, a modified antibody or antigen-binding fragment thereof, or a synthetic antibody or antigen-binding fragment thereof.
Antibodies and antigen binding fragments herein may be partially or fully synthetically produced. An antibody or antigen binding fragment may be a polypeptide or protein having a binding domain that may act as an antigen binding domain or may be homologous to an antigen binding domain. In some cases, antibodies or antigen-binding fragments thereof may be produced in a suitable in vivo animal model, which is then isolated and/or purified.
Immunoglobulins (Ig) can be assigned to different classes based on the amino acid sequence of the constant domain of their heavy chain. The main classes of immunoglobulins may include IgA, IgD, IgE, IgG and IgM, and several of these classes can be further divided into subclasses (isotypes), such as IgG1, IgG2, IgG3, IgG4, IgA1 and IgA2. The Ig or portion thereof may in some cases be a human Ig. In some cases, the CH3 domain may be from an immunoglobulin. In some cases, the chain or portion of the antibody or antigen-binding fragment thereof, the modified antibody or antigen-binding fragment thereof, or the binding agent may be from an Ig. In such cases, the Ig may be IgG, IgA, IgD, IgE or IgM. In the case where the Ig is IgG, it may be a subtype of IgG, where the subtype of IgG may include IgG1, IgG2a, IgG2b, IgG3, and IgG4. In some cases, the CH3 domain may be from an immunoglobulin selected from the group consisting of IgG, IgA, IgD, IgE and IgM.
The "light chains" of antibodies (immunoglobulins) from any vertebrate species can be assigned to one of two distinctly different types, called kappa ("κ" or "K") or lambda ("λ"), based on the amino acid sequences of their constant domains.
The "variable region" of an antibody may refer to the variable region of an antibody light chain or the variable region of an antibody heavy chain, alone or in combination. The variable regions of the heavy and light chains may consist of four framework regions (FRs) connected by three complementarity determining regions (CDRs) (also known as hypervariable regions). The CDRs in each chain can be held together in close proximity by the FRs and, together with the CDRs from the other chain, contribute to the formation of the antigen binding site of the antibody. The CDRs may be determined by a method such as: (1) methods based on cross-species sequence variability (i.e., Kabat et al., Sequences of Proteins of Immunological Interest (5th edition, 1991, National Institutes of Health, Bethesda, Md.)); and (2) methods based on crystallographic studies of antigen-antibody complexes (Al-Lazikani et al. (1997) J. Molec. Biol. 273:927-948). As used herein, a CDR may refer to a CDR defined by either method or by a combination of both methods.
In the case of antibodies, the term "variable domain" may refer to the variable domain of an antibody used in the binding and specificity of each particular antibody for its particular antigen. In some cases, the variability is unevenly distributed throughout the variable domains of the antibody. In some cases, it is concentrated in three segments called hypervariable regions (also called CDRs) in the light chain variable domain and the heavy chain variable domain. The more highly conserved portions of the variable domains may be referred to as "framework regions" or "FRs". The variable domains of unmodified heavy and light chains may contain four FRs (FR1, FR2, FR3 and FR4) that mainly take on a β-sheet configuration interspersed with three CDRs, which may form loops connecting, and in some cases forming part of, the β-sheet structure. The CDRs in each chain can be held together in close proximity by the FRs and, together with the CDRs from the other chain, contribute to the formation of the antigen binding site of the antibody (see Kabat et al., Sequences of Proteins of Immunological Interest, 5th edition, Public Health Service, National Institutes of Health, Bethesda, Md. (1991), pages 647-669).
"antibodies" useful in the present disclosure may encompass monoclonal antibodies, polyclonal antibodies, chimeric antibodies, bispecific antibodies, multispecific antibodies, heteroconjugate antibodies, humanized antibodies, human antibodies, deimmunized antibodies, mutants thereof, fusions thereof, immunoconjugates thereof, antigen-binding fragments thereof, and/or any other modified configuration of immunoglobulin molecules comprising an antigen recognition site of a desired specificity, including glycosylated variants of antibodies, amino acid sequence variants of antibodies, and covalently modified antibodies. In some embodiments, the antibody may be a murine antibody.
The antibody may be a human antibody. As used herein, a "human antibody" may be an antibody having an amino acid sequence that corresponds to the amino acid sequence of an antibody produced by a human and/or that has been prepared using any suitable technique for preparing human antibodies. The human antibody may comprise an antibody comprising at least one human heavy chain polypeptide or at least one human light chain polypeptide. One such example is an antibody comprising a murine light chain polypeptide and a human heavy chain polypeptide. In one embodiment, the human antibody is selected from a phage library, wherein the phage library expresses human antibodies (Vaughan et al., 1996, Nature Biotechnology, 14:309-314; Sheets et al., 1998, PNAS USA, 95:6157-6162; Hoogenboom and Winter, 1991, J. Mol. Biol., 227:381; Marks et al., 1991, J. Mol. Biol., 222:581). Human antibodies can also be prepared by introducing human immunoglobulin loci into transgenic animals, such as mice in which endogenous immunoglobulin genes have been partially or fully inactivated. This method is described in U.S. Patent Nos. 5,545,807; 5,545,806; 5,569,825; 5,625,126; 5,633,425; and 5,661,016. Alternatively, human antibodies may be prepared by immortalizing human B lymphocytes that produce antibodies to a target antigen (such B lymphocytes may be recovered from an individual or may have been immunized in vitro). See, e.g., Cole et al., Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, page 77 (1985); Boerner et al., 1991, J. Immunol., 147(1):86-95; U.S. Patent No. 5,750,373.
Any of the antibodies herein may be bispecific. Bispecific antibodies can be antibodies that have binding specificities for at least two different antigens and can be prepared using the antibodies disclosed herein. Exemplary methods for preparing bispecific antibodies have been described (see, e.g., Suresh et al., 1986, Methods in Enzymology 121:210). Recombinant production of bispecific antibodies can be based on co-expression of two immunoglobulin heavy chain-light chain pairs, where the two heavy chains have different specificities (Millstein and Cuello, 1983, Nature, 305, 537-539). Bispecific antibodies can be composed of a hybrid immunoglobulin heavy chain with a first binding specificity in one arm and a hybrid immunoglobulin heavy chain-light chain pair (providing a second binding specificity) in the other arm. This asymmetric structure, with an immunoglobulin light chain in only one half of the bispecific molecule, may facilitate the separation of the desired bispecific compound from undesired immunoglobulin chain combinations. This method is described, for example, in PCT publication No. WO 94/04690.
Functional fragments of any of the antibodies herein are also contemplated. The terms "antigen binding portion of an antibody", "antigen binding fragment", "antigen binding domain", "antibody fragment" or "functional fragment of an antibody" may refer to one or more fragments of an antibody that retain the ability to specifically bind to an antigen. Representative antigen binding fragments include Fab, Fab', F(ab')2, Fv, scFv, dsFv, variable heavy chain domain, variable light chain domain, variable NAR domain, bispecific scFv, bispecific Fab2, trispecific Fab3, minibodies, diabodies, maxibodies, camelid antibodies, VHHs, intrabodies, fusion proteins comprising an antibody moiety (e.g., a domain antibody), and single-chain binding polypeptides.
The "F(ab')2" and "Fab'" portions can be generated by treating Ig with proteases such as pepsin and papain, and include antibody fragments generated by digesting an immunoglobulin near the disulfide bonds present between the hinge regions of the two heavy chains. For example, papain can cleave IgG upstream of the disulfide bonds present between the hinge regions of the two heavy chains to generate two homologous antibody fragments, in each of which a light chain composed of VL and CL (light chain constant region) and a heavy chain fragment composed of VH and CHγ1 (γ1 region in the constant region of the heavy chain) are linked at their C-terminal regions by a disulfide bond. Each of these two homologous antibody fragments may be referred to as Fab'. Pepsin may also cleave IgG downstream of the disulfide bonds present between the hinge regions of the two heavy chains to generate an antibody fragment slightly larger than the fragment in which the two above-mentioned Fab' are joined at the hinge region. This antibody fragment may be referred to as F(ab')2.
The Fab fragment may also contain the constant domain of the light chain and the first constant domain (CH1) of the heavy chain. Fab' fragments can differ from Fab fragments in that they carry additional residues at the carboxy terminus of the heavy chain CH1 domain, including one or more cysteines from the antibody hinge region. Fab'-SH can be a Fab' in which the cysteine residue of the constant domain bears a free thiol group. F(ab')2 antibody fragments may be produced, for example, as pairs of Fab' fragments with hinge cysteines between them. Other chemical couplings of antibody fragments may also be employed.
As used herein, "Fv" may refer to an antibody fragment that contains a complete antigen recognition and antigen binding site. This region may consist of a dimer of one heavy chain variable domain and one light chain variable domain in close, non-covalent or covalent association (disulfide-linked Fvs have been described; see, for example, Reiter et al. (1996) Nature Biotechnology 14:1239-1245). In this configuration, the three CDRs of each variable domain can interact to define an antigen binding site on the surface of the VH-VL dimer. A combination of one or more of the CDRs from each of the VH and VL chains may collectively confer antigen binding specificity to the antibody. For example, CDRH3 and CDRL3 may be sufficient to confer antigen binding specificity to an antibody when transferred to the VH and VL chains of a recipient antibody or antigen-binding fragment thereof, and the binding, specificity, affinity, etc. of the CDR combination may be tested using, for example, the techniques described herein. In some cases, even a single variable domain (or half of an Fv comprising only three CDRs specific for an antigen) may have the ability to recognize and bind antigen, but the specificity or affinity may be lower than when combined with a second variable domain. Furthermore, although the two domains of the Fv fragment (VL and VH) may be encoded by separate genes, they may be joined using recombinant methods, for example by a synthetic linker that enables them to be made as a single protein chain in which the VL and VH regions pair to form monovalent molecules (known as single chain Fv (scFv); Bird et al. (1988) Science 242:423-426; Huston et al. (1988) Proc. Natl. Acad. Sci. USA 85:5879-5883; and Osbourn et al. (1998) Nat. Biotechnol. 16:778). Such scFv may be encompassed within the term "antigen binding portion" of an antibody.
Any VH and VL sequences of a specific scFv may be linked to an Fc region cDNA or genomic sequence to generate an expression vector encoding a complete Ig (e.g., IgG) molecule or another isotype. VH and VL can also be used to generate Fab, Fv or other fragments of Ig using protein chemistry or recombinant DNA techniques.
A "single chain Fv" or "sFv" antibody fragment may comprise the VH and VL domains of an antibody, wherein these domains may be present in a single polypeptide chain. In some embodiments, the Fv polypeptide may further comprise a polypeptide linker between the VH and VL domains that enables the sFv to form the structure needed for antigen binding. For reviews of sFvs, see, e.g., Pluckthun in The Pharmacology of Monoclonal Antibodies, volume 113, Rosenburg and Moore eds., Springer-Verlag, New York, pages 269-315 (1994).
Also contemplated herein is the use of avimers as an antigen or antigen binding portion. The term "avimer" may refer to a class of therapeutic proteins of human origin that may be unrelated to antibodies and antibody fragments, and may be composed of a plurality of modular and reusable binding domains, referred to as A domains (also referred to as class A modules, complement-type repeats, or LDL-receptor class A domains). They can be developed from human extracellular receptor domains by in vitro exon shuffling and phage display (Silverman et al., 2005, Nat. Biotechnol. 23:1493-1494; Silverman et al., 2006, Nat. Biotechnol. 24:220). The resulting protein may contain multiple independent binding domains, and may thus exhibit increased affinity and/or specificity as compared to a single epitope binding protein. Each of the 217 known human A domains may comprise about 35 amino acids (about 4 kDa), and these domains may be separated by linkers that may be five amino acids in length on average. Native A domains may fold rapidly and efficiently into a uniform, stable structure mediated primarily by calcium binding and disulfide formation. This common structure may require a conserved scaffold motif of only 12 amino acids. The end result may be a single protein chain containing multiple domains, each domain representing a separate function. Each domain of the protein may bind independently, and the energy contribution of each domain may be additive.
Antigen binding polypeptides may also include heavy chain dimers, e.g., antibodies from camelids and sharks. Camelid and shark antibodies may comprise a homodimeric pair of two chains of V-like and C-like domains (neither has a light chain). Because the VH region of a heavy chain dimer IgG in a camelid may not have to interact hydrophobically with a light chain, the region of the heavy chain that normally contacts the light chain may be changed to hydrophilic amino acid residues in a camelid. The VH domain of a heavy chain dimer IgG may be referred to as a VHH domain. The shark Ig-NAR may comprise a homodimer of one variable domain (called the V-NAR domain) and five C-like constant domains (C-NAR domains). In camelids, the diversity of the antibody repertoire can be determined by CDRs 1, 2 and 3 in the VH or VHH regions. CDR3 in the camelid VHH region can be characterized by its relatively long length, 16 amino acids on average (Muyldermans et al., 1994, Protein Engineering 7(9):1129). This can be contrasted with the CDR3 regions of antibodies of many other species. For example, the CDR3 of mouse VH may have an average of 9 amino acids. Libraries of camelid-derived antibody variable regions, which can preserve the in vivo diversity of the variable regions of the camelid, can be prepared by methods disclosed, for example, in U.S. Patent Application Serial No. 20050037421.
As used herein, a "maxibody" may refer to a bivalent scFv covalently linked to the Fc region of an immunoglobulin; see, e.g., Fredericks et al., Protein Engineering, Design & Selection, 17:95-106 (2004) and Powers et al., Journal of Immunological Methods, 251:123-135 (2001).
As used herein, a "dsFv" may be an Fv fragment obtained, for example, by introducing a Cys residue at a suitable site in each of the heavy chain variable region and the light chain variable region, and then stabilizing the two regions through a disulfide bond. The site in each chain at which the Cys residue is introduced may be determined based on the conformation predicted by molecular modeling. In the present disclosure, for example, the conformation can be predicted from the amino acid sequences of the heavy chain variable region and the light chain variable region of the above-described antibodies, and DNA encoding each of the heavy chain variable region and the light chain variable region, into which mutations have been introduced based on such predictions, can then be constructed. The DNA construct may then be incorporated into a suitable vector, and the dsFv may be prepared from a transformant obtained by transformation with that vector.
Described herein are single chain variable fragments ("scFv") of antibodies. Single chain variable region fragments can be prepared by linking light chain variable regions and/or heavy chain variable regions using a short linking peptide (Bird et al. (1988) Science 242:423-426). Single-chain variants may be produced recombinantly or synthetically. To synthetically generate scFv, an automated synthesizer may be used. To recombinantly produce scFv, a suitable plasmid comprising a polynucleotide encoding the scFv may be introduced into a suitable host cell, either eukaryotic (such as yeast, plant, insect or mammalian cells) or prokaryotic (such as E. coli). Polynucleotides encoding the scFv of interest can be prepared by conventional manipulations, such as ligation of polynucleotides. The resulting scFv may be isolated using any suitable protein purification technique.
A diabody may be a single chain antibody. Diabodies may be bivalent, bispecific antibodies in which VH and VL domains may be expressed on a single polypeptide chain, but with a linker that is too short to allow pairing between the two domains on the same chain, forcing these domains to pair with the complementary domains of another chain and forming two antigen binding sites (see, e.g., Holliger, P. et al., Proc. Natl. Acad. Sci. USA, 90:6444-6448 (1993), and Poljak, R.J. et al., Structure, 2:1121-1123 (1994)).
As used herein, a "minibody" may refer to an scFv fused to CH3 via a peptide linker (no hinge) or via an IgG hinge, as described in Olafsen et al., Protein Eng Des Sel. 2004 Apr; 17(4):315-23.
As used herein, an "intrabody" (intracellular antibody) refers to a single chain antibody that can display intracellular expression and can manipulate the function of intracellular proteins (Biocca et al., EMBO J. 9:101-108, 1990; Colby et al., Proc. Natl. Acad. Sci. USA 101:17616-21, 2004). Intrabodies may be produced that contain a cell signal sequence that may retain the antibody construct in an intracellular region, as described, for example, by Mhashilkar et al. (EMBO J., 14:1542-51, 1995) and Wheeler et al. (FASEB J. 17:1733-5, 2003). Transbodies (transmembrane antibodies) are cell-permeable antibodies in which a protein transduction domain (PTD) may be fused to a single chain variable fragment (scFv) antibody, as described, for example, in Heng et al. (Med Hypotheses 64:1105-8, 2005).
The antibody or antigen binding fragment may bind to an epitope. An epitope may be a part of an antigen or other macromolecule capable of forming a binding interaction with a variable region binding pocket of an antigen binding molecule, such as an antibody or antigen binding fragment thereof. Such binding interactions may be manifested as intermolecular contacts between the epitope and one or more amino acid residues of one or more complementarity determining regions (CDRs). Antigen binding may involve, for example, a CDR3 or a CDR3 pair or, in some cases, interactions of up to all six CDRs of the VH and VL chains. An epitope may be a linear peptide sequence (i.e., "continuous") or may be composed of non-contiguous amino acid sequences (i.e., "conformational" or "discontinuous"). An antigen binding molecule (such as an antibody or antigen binding fragment thereof) may recognize one or more amino acid sequences; thus, an epitope may define more than one distinct amino acid sequence. Epitopes recognized by antigen binding molecules (such as antibodies or antigen binding fragments thereof) can be determined, for example, by peptide mapping or sequence analysis techniques.
Epitopes can be determined, for example, using one or more epitope mapping techniques. Epitope mapping may include experimentally identifying epitopes on an antigen. Epitope mapping may be performed by any acceptable method, such as X-ray co-crystallography, cryogenic electron microscopy, array-based oligopeptide scanning, site-directed mutagenesis mapping, high throughput shotgun mutagenesis epitope mapping, hydrogen-deuterium exchange, cross-linking coupled mass spectrometry, yeast display, phage display, proteolysis, or combinations thereof.
The antibody or antigen binding fragment thereof may comprise a V(D)J sequence. The variable region of each immunoglobulin heavy or light chain may be encoded by multiple subgenes. These subgenes may include variable (V), diversity (D) and joining (J) segments, and may be combined to produce V(D)J sequences. The heavy chain may comprise V, D and/or J segments, and the light chain may comprise V and/or J segments. There are multiple copies of the V, D and J gene segments, and these copies may be arranged in tandem in the genome of a mammal. In bone marrow, each developing B cell can assemble an immunoglobulin variable region, for example, by randomly selecting and combining one V, one D, and one J gene segment (or one V and one J segment in the light chain). Since there may be multiple copies of each type of gene segment, and different combinations of gene segments may be used to generate each immunoglobulin variable region, the process may generate a large number of antibodies with different paratopes and, in some embodiments, different antigen specificities. Rearrangement of several subgenes of lambda light chain immunoglobulins (e.g., in the V2 family) can be coupled with activation of microRNA miR-650, which can further affect B cell biology.
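The combinatorial assembly described above can be illustrated with a minimal Python sketch. The segment names and pool sizes below are hypothetical and deliberately tiny; they are not taken from any real immunoglobulin locus. The sketch only counts segment combinations (one V, one D and one J for a heavy chain; one V and one J for a light chain) and does not model any other source of diversity.

```python
from itertools import product

# Hypothetical, deliberately tiny segment pools (illustrative names only).
V_SEGMENTS = ["V1", "V2", "V3"]
D_SEGMENTS = ["D1", "D2"]
J_SEGMENTS = ["J1", "J2"]


def heavy_chain_combinations(v_pool, d_pool, j_pool):
    """Every way to pick one V, one D and one J segment."""
    return [f"{v}-{d}-{j}" for v, d, j in product(v_pool, d_pool, j_pool)]


def light_chain_combinations(v_pool, j_pool):
    """Every way to pick one V and one J segment (no D segment)."""
    return [f"{v}-{j}" for v, j in product(v_pool, j_pool)]


heavy = heavy_chain_combinations(V_SEGMENTS, D_SEGMENTS, J_SEGMENTS)
light = light_chain_combinations(V_SEGMENTS, J_SEGMENTS)

# Number of heavy chains, light chains, and heavy/light pairings.
print(len(heavy), len(light), len(heavy) * len(light))
```

Even with these small pools the number of heavy/light pairings is the product of the two counts, which is why larger segment pools can give rise to a large number of distinct variable regions.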
The plurality of nucleic acid molecules may comprise a library of nucleic acid molecules. As one non-limiting example, the library of nucleic acid molecules may comprise complementary deoxyribonucleic acid (cDNA) molecules. In some cases, the library of nucleic acid molecules may comprise a library of cDNA molecules. In some embodiments, the library of nucleic acid molecules may comprise a library of variants of the nucleic acid molecules. Variants of a nucleic acid molecule may comprise variants encoding an amino acid sequence, such as an amino acid sequence of an antibody or antigen binding fragment thereof. In some embodiments, variants of the nucleic acid molecule may comprise a nucleic acid sequence encoding an amino acid sequence of a T cell receptor or a B cell receptor.
For example, a variant of an antibody or antigen binding fragment thereof may comprise a variant in the variable region. In some embodiments, the variant in the variable region may comprise a variant in the V sequence, a variant in the D sequence, a variant in the J sequence, or a combination thereof. Variants of an antibody or antigen-binding fragment thereof may have different specificities (e.g., have a specificity for different antigens) or different affinities (e.g., have different affinities for the same antigen).
Nucleic acid molecules of the plurality of nucleic acid molecules may comprise a nucleic acid sequence encoding a V amino acid sequence, a D amino acid sequence, a J amino acid sequence, or a combination thereof. In some embodiments, a nucleic acid molecule of the plurality of nucleic acid molecules may comprise a nucleic acid sequence encoding a V(D)J amino acid sequence. In some embodiments, the nucleic acid sequence of interest may comprise a nucleic acid sequence encoding a V(D)J amino acid sequence. In some embodiments, the different nucleic acid sequences of the plurality of nucleic acid molecules may comprise nucleic acid sequences encoding different V(D)J amino acid sequences. The different V(D)J amino acid sequences may comprise different V sequences, different D sequences, different J sequences, or combinations thereof.
The plurality of nucleic acid molecules may correspond to a plurality of cell surface proteins from a plurality of cells. In some cases, the plurality of cell surface proteins from a plurality of cells may be different. In some cases, the different cell surface protein may be a variation of the cell surface protein. Examples of cell surface proteins may include T cell receptors (e.g., of T cells), B cell receptors (e.g., of B cells), or antibodies or antigen binding fragments thereof. Cell surface proteins may be naturally occurring or synthetic. In some cases, the cell surface protein may be a modified native protein.
In some embodiments, providing the plurality of nucleic acid molecules may include generating the plurality of nucleic acid molecules. The plurality of nucleic acid molecules generated may comprise a plurality of identification sequences that identify the plurality of nucleic acid molecules.
The generation of a library of nucleic acid molecules can be accomplished, for example, by collecting the nucleic acid molecules and adding an identification sequence (e.g., including a barcode, unique molecular identifier, and/or template switching oligonucleotide as described herein) to the nucleic acid molecules. For example, a nucleic acid molecule encoding a T cell receptor, B cell receptor, or antibody or antigen binding fragment thereof can be isolated from a sample (e.g., a cell) and labeled with an identification sequence using the methods provided herein. Libraries may be generated, for example, as described in U.S. Patent No. 10,550,429, which is incorporated herein in its entirety. Once a library is formed, one or more members of the library may be enriched using primers that are complementary to at least a portion of the identification sequence. In some embodiments, enriched members of the library (i.e., the nucleic acid sequence of interest) can be cloned and expressed, and in some cases, further analysis can be performed on the amino acid products of the nucleic acid sequence of interest (e.g., T cell receptor, B cell receptor, or antibody or antigen binding fragment thereof).
Enrichment
The nucleic acid sequence of interest may be enriched from a library of nucleic acid molecules. The library of nucleic acid molecules can be a cDNA library generated from single cells (e.g., B cells) from an immune repertoire of a subject. Such a library may be generated, for example, by isolating and/or amplifying RNA and reverse transcribing the RNA library to produce a cDNA library. The library may be a barcoded gene expression library generated from cells (e.g., B cells) partitioned with barcoded beads. After lysis or permeabilization of the cells, their RNA can be reverse transcribed, and during reverse transcription an identification sequence (e.g., a barcode sequence or unique molecular identifier sequence) can be added to the resulting cDNA (e.g., to generate a whole-transcriptome barcoded gene expression library). See, for example, FIG. 12B or FIG. 13C. The library may be a sequencing library, depending on the sequences included in the identification sequence. Methods for enriching for a nucleic acid sequence of interest may include amplification reactions. Examples of amplification reactions may include linear amplification, polymerase chain reaction (PCR), and nested PCR. In some embodiments, different amplification reactions may be employed.
PCR may include denaturation, annealing, and extension steps. Denaturation may include exposing the nucleic acid to a temperature capable of melting the nucleic acid. In some cases, denaturation can occur between 94 ℃ and 98 ℃. In some cases, denaturation can occur at 94 ℃, 95 ℃, 96 ℃, 97 ℃, or 98 ℃. Denaturation can last for at least 15 seconds, at least 30 seconds, at least 45 seconds, at least 60 seconds, at least 75 seconds, at least 90 seconds, at least 105 seconds, at least 120 seconds, at least 135 seconds, at least 150 seconds, at least 165 seconds, or at least 180 seconds. Annealing may include exposing the melted nucleic acid to a temperature that may allow the primer to bind to the nucleic acid. In some cases, annealing may occur between 50 ℃ and 75 ℃. In some cases, annealing may occur between 55 ℃ and 70 ℃. In some cases, annealing may occur at 55 ℃, 56 ℃, 57 ℃, 58 ℃, 59 ℃, 60 ℃, 61 ℃, 62 ℃, 63 ℃, 64 ℃, 65 ℃, 66 ℃, 67 ℃, 68 ℃, 69 ℃, or 70 ℃. The anneal may last for at least 15 seconds, at least 30 seconds, at least 45 seconds, at least 60 seconds, at least 75 seconds, at least 90 seconds, at least 105 seconds, at least 120 seconds, at least 135 seconds, at least 150 seconds, at least 165 seconds, or at least 180 seconds. Extension may include exposing the nucleic acid to a temperature at which extension may occur, such that the nucleic acid is amplified, for example, by a polymerase present in the region containing the nucleic acid. The extension may occur between 65 ℃ and 75 ℃. In some cases, the extension may occur at 65 ℃, 66 ℃, 67 ℃, 68 ℃, 69 ℃, 70 ℃, 71 ℃, 72 ℃, 73 ℃, 74 ℃, or 75 ℃. The steps of denaturing, annealing and extending may be repeated for a number of cycles. In some cases, a PCR cycle may be performed for at least 1 cycle. In some cases, a PCR cycle may be performed for at least 5, 10, 15, 20, 25, 30, 35, or 40 cycles. 
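As a minimal Python sketch, the temperature windows and minimum durations listed above can be encoded as a simple check on a thermocycling program. The `PcrProgram` container, the `check_program` helper and the two example programs are hypothetical names and values introduced here for illustration; only the numeric ranges come from the paragraph above.

```python
from dataclasses import dataclass

# Temperature windows (degrees Celsius) quoted in the paragraph above.
DENATURE_C = (94, 98)
ANNEAL_C = (50, 75)
EXTEND_C = (65, 75)
MIN_STEP_SECONDS = 15  # shortest denaturation/annealing duration listed above
MIN_CYCLES = 1


@dataclass
class PcrProgram:
    """Hypothetical container for one thermocycling program."""
    denature_c: float
    denature_s: int
    anneal_c: float
    anneal_s: int
    extend_c: float
    cycles: int


def check_program(program):
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    temperature_checks = [
        ("denaturation", program.denature_c, DENATURE_C),
        ("annealing", program.anneal_c, ANNEAL_C),
        ("extension", program.extend_c, EXTEND_C),
    ]
    for step, temperature, (low, high) in temperature_checks:
        if not low <= temperature <= high:
            problems.append(f"{step} temperature {temperature} C outside {low}-{high} C")
    duration_checks = [
        ("denaturation", program.denature_s),
        ("annealing", program.anneal_s),
    ]
    for step, seconds in duration_checks:
        if seconds < MIN_STEP_SECONDS:
            problems.append(f"{step} time {seconds} s below {MIN_STEP_SECONDS} s")
    if program.cycles < MIN_CYCLES:
        problems.append(f"cycle count {program.cycles} below {MIN_CYCLES}")
    return problems


# Illustrative programs; the specific values are assumptions, not recommendations.
typical = PcrProgram(denature_c=95, denature_s=30, anneal_c=60, anneal_s=30,
                     extend_c=72, cycles=30)
too_cool = PcrProgram(denature_c=90, denature_s=30, anneal_c=60, anneal_s=30,
                      extend_c=72, cycles=30)

print(check_program(typical))
print(check_program(too_cool))
```

The check uses the broadest windows stated above; a narrower embodiment (for example, annealing between 55 ℃ and 70 ℃) could be expressed by changing the corresponding constant.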
In some cases, PCR may be performed for between 1 cycle and 40, 35, 30, 25, 20, 15, 10, or 5 cycles; between 5 cycles and 40, 35, 30, 25, 20, 15, or 10 cycles; between 10 cycles and 40, 35, 30, 25, 20, or 15 cycles; between 15 cycles and 40, 35, 30, 25, or 20 cycles; between 20 cycles and 40, 35, 30, or 25 cycles; between 25 cycles and 40, 35, or 30 cycles; between 30 cycles and 40 or 35 cycles; or between 35 cycles and 40 cycles.
Methods for enriching nucleic acid sequences of interest, such as amplification reactions, may include contacting the nucleic acid sequences of interest with reagents for a PCR reaction. In some cases, reagents for a PCR reaction may include a polymerase, one or more sets of primers, and a dNTP mix. In some cases, for example in an amplification reaction, the polymerase may be a DNA polymerase, an RNA polymerase, or a reverse transcriptase. In some cases, a set of primers may include at least 2 primers that may be complementary to a region of a nucleic acid of interest, such that the region of the nucleic acid of interest may be amplified via PCR using the primer pair. In some cases, a partition may include at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 sets of primer pairs. In some cases, a partition may include no more than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 sets of primer pairs. In some cases, a partition may include one or more probes. The probe may be a DNA binding dye, hydrolysis probe, molecular beacon, dual hybridization probe, Eclipse probe or Amplifluor probe. In some cases, the probe may be a SYBR Green probe, a TaqMan probe, a Scorpions PCR primer, a LUX PCR primer, or a QZyme PCR primer. In some cases, the probe may include a label that may be colored, opaque, radio-opaque, fluorescent, radioactive, or otherwise detectable. In some cases, the partition may include additional reagents, which may include magnesium, salts, glycerol, buffers, dyes, or other reagents. A first set of partitions and a second set of partitions may be obtained. In some cases, each of these partitions can include nucleic acid molecules (e.g., target nucleic acid molecules) that can be amplified and detected. A set of partitions may include multiple partitions.
In some cases, a set of partitions may include at least 1, at least 10, at least 100, at least 1,000, at least 10,000, at least 100,000, at least 1,000,000, or at least 10,000,000 partitions. In some cases, a set of partitions may include a set of droplets, such as aqueous droplets in an emulsion. For example, the first set of partitions may include a first set of droplets and the second set of partitions may include a second set of droplets.
Methods for enriching for a nucleic acid sequence of interest, such as an amplification reaction, may include contacting the nucleic acid sequence of interest with a nucleic acid primer. In some cases, the nucleic acid primer may be an oligonucleotide (e.g., a PCR primer) suitable for a PCR reaction.
The nucleic acid primer may comprise an oligonucleotide. An oligonucleotide may be a molecule comprising a chain of nucleotides. The oligonucleotides described herein may comprise ribonucleic acids. The oligonucleotides described herein may comprise deoxyribonucleic acids. In some cases, the oligonucleotide may have any sequence, including a user-specified sequence.
In some embodiments, the oligonucleotide may comprise G, A, T, U, C, or other bases capable of reliably base pairing with complementary nucleotides. 7-deaza-adenine, 7-deaza-guanine, adenine, guanine, cytosine, thymine, uracil, 2-deaza-2-thio-guanosine, 2-thio-7-deaza-guanosine, 2-thio-adenine, 2-thio-7-deaza-adenine, isoguanine, 7-deaza-guanine, 5,6-dihydrouridine, 5,6-dihydrothymine, xanthine, 7-deaza-xanthine, hypoxanthine, 7-deaza-xanthine, 2,6-diamino-7-deaza-purine, 5-methyl-cytosine, 5-propynyl-uridine, 5-propynyl-cytidine, 2-thio-thymine, and 2-thio-uridine are examples of such bases, but many other bases are known. The oligonucleotide may comprise, for example, LNA, PNA, UNA or morpholino oligomers. An oligonucleotide as used herein may comprise natural or unnatural nucleotides or linkages.
The length of the oligonucleotide may be at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, at least 75, at least 80, at least 85, at least 90, at least 95, or at least 100 nucleotides. In some cases, the length of the oligonucleotide may be between 10-30 nucleotides, between 10-50 nucleotides, between 10-70 nucleotides, between 10-100 nucleotides, between 20-50 nucleotides, between 20-70 nucleotides, between 20-100 nucleotides, between 30-50 nucleotides, between 30-70 nucleotides, between 30-100 nucleotides, between 40-70 nucleotides, between 40-100 nucleotides, between 50-70 nucleotides, between 50-100 nucleotides, between 60-70 nucleotides, between 60-80 nucleotides, between 60-90 nucleotides, or between 60-100 nucleotides. In some cases, the length of the oligonucleotide may be no more than 5, no more than 10, no more than 15, no more than 20, no more than 25, no more than 30, no more than 35, no more than 40, no more than 45, no more than 50, no more than 55, no more than 60, no more than 65, no more than 70, no more than 75, no more than 80, no more than 85, no more than 90, no more than 95, or no more than 100 nucleotides.
In some cases, the oligonucleotide may be fully single stranded. In some cases, the oligonucleotide may be partially double stranded. The partially double stranded region may be located at the 3 'end of the oligonucleotide, the 5' end of the oligonucleotide, or between the 5 'and 3' ends of the oligonucleotide. In some cases, more than one double stranded region may be present.
The method may include using a nucleic acid primer that is complementary to a portion of the nucleic acid sequence of interest (e.g., an identification sequence or a portion thereof). In some embodiments, the nucleic acid primer that is complementary to at least a portion of the identification sequence of the nucleic acid sequence of interest may be complementary to at least a portion of the barcode sequence, at least a portion of the template switching oligonucleotide sequence, at least a portion of the unique molecular identifier sequence, or a combination thereof. In some embodiments, the nucleic acid primer may be complementary to a barcode sequence and a read sequence of a nucleic acid sequence of interest, or a portion thereof. In some embodiments, the nucleic acid primer may be complementary to the barcode sequence and the unique molecular identifier sequence or a portion thereof. This can be performed, for example, to amplify a nucleic acid sequence of interest using an amplification reaction (e.g., PCR or nested PCR) as provided herein.
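The following Python sketch illustrates, under simplifying assumptions, how a primer complementary to an identification sequence can single out one library member. The library, barcode and unique molecular identifier sequences are hypothetical, and annealing is modeled as exact complementarity over the full primer length, which ignores mismatch tolerance and temperature effects.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_for(identification_seq):
    """Primer (5'->3') complementary to the identification sequence."""
    return reverse_complement(identification_seq)


def anneals(primer, molecule):
    """True if the molecule contains a site exactly complementary to the primer."""
    return reverse_complement(primer) in molecule


# Hypothetical library: barcode + unique molecular identifier + insert.
library = {
    "cell_A": "AACCGGTT" + "ACGTAC" + "GATTACAGATTACA",
    "cell_B": "TTGGCCAA" + "CATGCA" + "GATTACAGATTACA",
    "cell_C": "AACCGGTA" + "ACGTAC" + "GGGGCCCCAAAATT",
}

# Identification sequence of the member to enrich (barcode plus identifier).
target_identification = "AACCGGTT" + "ACGTAC"
primer = primer_for(target_identification)

enriched = [name for name, seq in library.items() if anneals(primer, seq)]
print(primer, enriched)
```

In this toy library, `cell_C` differs from the target barcode at a single position and is therefore not selected, which illustrates why the identification sequence, rather than the shared insert, is used to distinguish members.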
In some embodiments, the nucleic acid primer may further comprise a nucleic acid sequence that may be complementary to at least a portion of the coding sequence of the nucleic acid sequence of interest. In some embodiments, the nucleic acid primer may comprise a nucleic acid sequence that may be complementary to a variable region of a nucleic acid sequence of interest (such as a variable region of a T cell receptor, a variable region of a B cell receptor, or a variable region of an antibody or antigen binding fragment thereof). As an example, the nucleic acid primer may comprise a nucleic acid sequence complementary to a V(D)J sequence or a portion thereof, a V sequence of a V(D)J sequence or a portion thereof, a D sequence of a V(D)J sequence or a portion thereof, or a J sequence of a V(D)J sequence or a portion thereof. In some embodiments, the nucleic acid primer may be complementary to a portion of the variable region of the nucleic acid sequence of interest that differs among the different nucleic acid molecules.
In some embodiments, the nucleic acid primer may further comprise a non-binding handle. The non-binding handle may be a nucleic acid sequence on a nucleic acid primer that is not complementary to a fragment of a nucleic acid sequence of interest. In some embodiments, the non-binding handle may not bind any nucleic acid sequences in the plurality of nucleic acid molecules. In some embodiments, non-binding handles can be used to clone into a recipient vector or to achieve pairing of specific heavy/light chain (or TRA/TRB) sequences using overlap extension or similar methods after enrichment of the nucleic acid molecule.
In some embodiments, the methods provided herein may further comprise amplifying the nucleic acid sequence of interest using another nucleic acid primer, wherein the other nucleic acid primer is different from the nucleic acid primer. For example, the method can include using a first nucleic acid primer and a second nucleic acid primer (e.g., a forward primer and a reverse primer for a PCR reaction).
In some embodiments, another nucleic acid primer may comprise a non-binding handle. Such a non-binding handle may be a nucleic acid sequence on a nucleic acid primer that is not complementary to a fragment of a nucleic acid sequence of interest. In some embodiments, the non-binding handle may not bind any nucleic acid sequences in the plurality of nucleic acid molecules. In some embodiments, non-binding handles can be used to clone into a recipient vector or to achieve pairing of specific heavy/light chain (or TRA/TRB) sequences using overlap extension or similar methods after enrichment of the nucleic acid molecule.
In some methods, a nucleic acid primer and another (e.g., a second) nucleic acid primer (e.g., a forward primer and a reverse primer) can be configured to anneal to a sequence flanking at least a portion of the nucleic acid sequence of interest. For example, a nucleic acid primer can be configured to anneal to a sequence upstream of a nucleic acid sequence of interest, and a second nucleic acid primer can be configured to anneal to a complementary sequence of a sequence downstream of the nucleic acid sequence of interest. In some embodiments, two such nucleic acid primers may be configured to produce copies of a nucleic acid sequence of interest after an amplification reaction (such as PCR).
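The flanking arrangement of the two primers can be sketched as a simple in silico PCR. In this hypothetical Python example, the forward primer matches the top strand upstream of the sequence of interest and the reverse primer is the reverse complement of a downstream top-strand site. The function name, the sequences, and the exact-match model of annealing are illustrative assumptions only.

```python
# Minimal sketch: predict the amplicon bounded by a forward and a reverse primer.
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMP)[::-1]


def in_silico_pcr(template, fwd, rev):
    """Return the top-strand amplicon flanked by fwd and rev, or None if either site is absent."""
    start = template.find(fwd)
    if start == -1:
        return None
    rev_site = revcomp(rev)  # how the reverse primer's target reads on the top strand
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


# Hypothetical template: upstream site, sequence of interest, downstream site.
upstream, interest, downstream = "ACGTACGTAC", "GGGCCCAAATTT", "CATGCATGCA"
template = "TTTT" + upstream + interest + downstream + "AAAA"
amplicon = in_silico_pcr(template, upstream, revcomp(downstream))
```

A template lacking either primer site returns None here, corresponding to a molecule that would not be copied in the amplification reaction.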
In some embodiments, the second nucleic acid primer may comprise a nucleic acid sequence that is complementary to a binding sequence on a nucleic acid sequence of interest or a complement thereof. Such binding sequences may be located on the coding segment of the nucleic acid sequence of interest, or upstream or downstream of the coding segment of the nucleic acid sequence of interest. In some embodiments, the second nucleic acid primer may be complementary to at least a portion of a nucleic acid sequence encoding a constant region of an amino acid sequence encoded by a nucleic acid sequence of interest (such as a T cell receptor, B cell receptor, or antibody or antigen binding fragment thereof) or a complement of such a nucleic acid sequence.
The second nucleic acid primer can be at least partially complementary to a variable region of a nucleic acid sequence of interest (e.g., a V(D)J sequence, a V sequence, a D sequence, or a J sequence). For example, the second nucleic acid primer may also be complementary to at least a portion of a nucleic acid sequence encoding a J region of an antibody or antigen binding fragment thereof.
The method may further comprise a second enrichment step, such as a second amplification reaction. The second amplification reaction may comprise linear amplification, PCR or another amplification protocol. In some embodiments, a second PCR reaction may be performed to enrich or further enrich the nucleic acid sequences of the plurality of nucleic acid molecules. As an example, a nested PCR protocol can be utilized to provide enrichment of nucleic acid sequences of interest.
The second round of amplification may include contacting the nucleic acid sequence with a third primer and a fourth primer. The third primer or the fourth primer may comprise an oligonucleotide as provided herein. The third primer and the fourth primer may be configured to specifically enrich the nucleic acid sequence. In some embodiments, the third primer may be different from the first primer, or the fourth primer may be different from the second primer.
The third primer may be complementary to at least a portion of the identification sequence. In some embodiments, the third primer may be complementary to a portion of a barcode of the identification sequence. In some cases, the third primer may be complementary to the 5' end of the barcode of the identification sequence.
In some embodiments, the third primer may be complementary to a portion of the identification sequence upstream of the barcode of the identification sequence. For example, the third primer may be complementary to at least a portion of the read sequence of the nucleic acid molecule. In some embodiments, the third primer may be complementary to at least a portion of a variable sequence, such as a nucleic acid sequence encoding a V(D)J sequence.
The fourth primer may be complementary to a complementary sequence of another fragment of the nucleic acid molecule such that the nucleic acid sequence of interest may be flanked by the third primer and the fourth primer. In some embodiments, the fourth primer may be complementary to a nucleic acid sequence downstream of the coding sequence of the nucleic acid sequence of interest. In some embodiments, the fourth primer may be complementary to at least a portion of the complement of the constant region of the nucleic acid sequence of interest.
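The two rounds of amplification described above can be sketched as a nested PCR over a small library. In this hypothetical Python example, the outer primer pair targets sites shared by several molecules, while the inner pair includes a barcode-directed primer that is specific to one molecule. All names and sequences (READ, BC1, BC2, JREG, CONST) are assumptions introduced for illustration, and annealing is modeled as exact matching.

```python
# Minimal sketch: nested PCR, where round 2 re-amplifies round-1 products with inner primers.
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMP)[::-1]


def amplify(templates, fwd, rev):
    """Return amplicons for templates that contain both primer sites in order."""
    site = revcomp(rev)
    products = []
    for t in templates:
        i = t.find(fwd)
        j = t.find(site, i + len(fwd)) if i != -1 else -1
        if j != -1:
            products.append(t[i:j + len(site)])
    return products


def nested_pcr(library, outer, inner):
    """Run an outer-primer round followed by an inner-primer round."""
    round1 = amplify(library, *outer)
    return amplify(round1, *inner)


# Hypothetical elements: shared read sequence, two barcodes, J region, constant region.
READ, BC1, BC2 = "CTACACGACG", "AACCGGTTAC", "TTGGCCAATG"
JREG, CONST = "CTGGTCACCG", "GCCTCCACCA"
library = [
    READ + BC1 + "ATGGAGTTTGGG" + JREG + CONST,  # molecule of interest
    READ + BC2 + "ATGCAGCTTCAG" + JREG + CONST,  # same flanks, different barcode
    "ACTGACTGACTGACTGACTG",                      # unrelated molecule
]
outer = (READ, revcomp(CONST))
inner = (BC1, revcomp(JREG))
enriched = nested_pcr(library, outer, inner)
```

The first round keeps both molecules that share the outer sites, and only the molecule carrying the targeted barcode survives the second round, which is the sense in which the third and fourth primers may specifically enrich the nucleic acid sequence.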
In some methods, other nucleic acid molecules of the plurality of nucleic acid molecules may not be amplified. For example, a nucleic acid molecule that does not comprise a nucleic acid sequence of interest may not be amplified. In some methods, other nucleic acid molecules of the plurality of nucleic acid molecules can be amplified less than a threshold amount. For example, a nucleic acid sequence of interest may be amplified more than 100-fold, more than 1000-fold, more than 10,000-fold, more than 100,000-fold, more than 1,000,000-fold, or more than 10,000,000-fold as compared to other nucleic acid molecules in the plurality of nucleic acid molecules.
The methods provided herein may further comprise determining the level of enrichment of the nucleic acid sequence of interest. Enrichment can be determined, for example, by fluorescence, gel electrophoresis, sequencing, or another acceptable method for determining enrichment.
The nucleic acid sequence of interest may be enriched by at least 1000-fold, at least 10,000-fold, at least 100,000-fold, at least 1,000,000-fold, or at least 10,000,000-fold. In some embodiments, the nucleic acid sequence of interest may be enriched by a multiple sufficient to clone the nucleic acid sequence of interest. In some embodiments, the nucleic acid sequence of interest may be further enriched by the second amplification step by at least 1000-fold, at least 10,000-fold, at least 100,000-fold, at least 1,000,000-fold, or at least 10,000,000-fold.
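One way to express a level of enrichment, for instance from sequencing read counts, is the ratio of the fraction of reads assigned to the sequence of interest after enrichment to that fraction before enrichment. The Python sketch below assumes this definition; the function names, the default threshold, and the example counts are hypothetical and are not taken from this disclosure.

```python
# Minimal sketch: fold enrichment as the ratio of target fractions after vs. before.
def fold_enrichment(target_before, total_before, target_after, total_after):
    """Return (target_after / total_after) / (target_before / total_before)."""
    if min(target_before, total_before, target_after, total_after) <= 0:
        raise ValueError("all counts must be positive")
    # Rearranged to multiply before dividing, which limits rounding error.
    return (target_after * total_before) / (target_before * total_after)


def meets_threshold(fold, threshold=1000):
    """True if the fold enrichment reaches the chosen threshold (e.g., at least 1000-fold)."""
    return fold >= threshold


# Hypothetical counts: 5 of 1,000,000 reads before; 500,000 of 1,000,000 reads after.
fold = fold_enrichment(5, 1_000_000, 500_000, 1_000_000)
```

With these example counts the target fraction rises from 0.000005 to 0.5, which this definition reports as a 100,000-fold enrichment.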
Also provided herein are methods comprising enriching for a nucleic acid sequence of interest based on at least a portion of the constant region of the nucleic acid sequence of interest. The enrichment may result in an enriched nucleic acid sequence of interest. In some embodiments, the method may further comprise modification of the enriched nucleic acid sequence, thereby producing a modified enriched nucleic acid sequence. A graphical overview of this approach is provided in FIG. 17. The modified enriched nucleic acid sequence may be compatible with the vector. For example, the modified enriched nucleic acid sequence can have a structure (e.g., a nucleic acid sequence) that can be directly incorporated into a vector. The vector may be a vector suitable for cloning or expression of the modified enriched nucleic acid sequence or other uses of the modified enriched nucleic acid sequence. The vector may be any vector described herein. A graphical overview of nucleic acid sequences compatible with the vector (including incorporation of the nucleic acid sequences into the vector) is provided in FIG. 18.
The nucleic acid sequence of interest may be a nucleic acid sequence as described herein. For example, a nucleic acid sequence of interest may encode at least a portion of a cell surface protein of a cell, such as a T cell receptor (or fragment thereof) or a B cell receptor (or fragment thereof). The nucleic acid sequence of interest may comprise a constant region. In some embodiments, a nucleic acid sequence of interest may comprise a sequence encoding a V(D)J sequence or a portion thereof, such as a V sequence (or portion thereof), a D sequence (or portion thereof), or a J sequence (or portion thereof) as described herein. In some embodiments, the constant region of a nucleic acid sequence of interest may comprise a sequence encoding a V(D)J sequence or a portion thereof, such as a V sequence (or portion thereof), a D sequence (or portion thereof), or a J sequence (or portion thereof) as described herein. In some embodiments, the nucleic acid sequence of interest may comprise a barcode (e.g., as provided herein), a UMI (e.g., as provided herein), or a 5' untranslated region (5' UTR) of a gene of interest (e.g., a TCR gene or BCR gene). In some embodiments, the nucleic acid sequence of interest may comprise complementary deoxyribonucleic acid (cDNA) of an RNA transcript of interest (e.g., a TCR or BCR transcript).
Enrichment may be performed using the first nucleic acid primer. The first nucleic acid primer may be complementary to a region of the nucleic acid sequence of interest. For example, the first nucleic acid primer may be at least complementary to a barcode or portion thereof on the nucleic acid sequence of interest. In some embodiments, the first nucleic acid primer may be complementary to a UMI sequence or a portion thereof on the nucleic acid sequence of interest. In some embodiments, the first nucleic acid primer may be complementary to at least a 5' untranslated region (5' UTR) or a portion thereof on the nucleic acid sequence of interest. In some embodiments, the first nucleic acid primer may be a framework region 1 (FWR1) primer.
Enrichment may be performed using a second nucleic acid primer. In some embodiments, the second nucleic acid primer can be used with the first nucleic acid primer to enrich for a nucleic acid sequence of interest. In some embodiments, the second nucleic acid primer may be at least complementary to a constant region on the nucleic acid sequence of interest or a portion thereof. In some embodiments, the second nucleic acid primer may be at least complementary to a V(D)J sequence or portion thereof on the nucleic acid sequence of interest. In some embodiments, the second nucleic acid primer may be at least complementary to a J sequence on the nucleic acid sequence of interest or a portion thereof. In some embodiments, the second nucleic acid primer may be complementary to at least the nucleic acid sequence of the junction region or portion thereof on the nucleic acid sequence of interest.
Enrichment may be performed using hybridization capture. In some embodiments, hybridization capture may be based on hybridization of a nucleic acid probe to a sequence (such as a constant sequence or a junction sequence) on the nucleic acid sequence of interest. For example, the probe may hybridize to a portion of a junction sequence, such as a V(D)J sequence or a portion thereof, such as a V sequence or a portion thereof, a D sequence or a portion thereof, or a J sequence or a portion thereof. In some embodiments, the probe may hybridize to a V sequence and a D sequence (or a portion thereof) or a D sequence and a J sequence (or a portion thereof). The probes can contain functional groups (such as biotin molecules) to effect purification of hybridized target nucleic acid molecules (e.g., using streptavidin-conjugated beads such as magnetic beads). See, for example, FIG. 19.
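Hybridization capture can be sketched as a selection step in which molecules containing the probe's target are pulled down and the remainder is left unbound. In the Python example below, the probe is the reverse complement of a hypothetical junction sequence and the pull-down (e.g., by a biotin tag and streptavidin beads) is represented simply by returning the captured list. The function name, the sequences, and the exact-match model of hybridization are illustrative assumptions.

```python
# Minimal sketch: probe-based capture of molecules that contain the probe's target.
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMP)[::-1]


def capture(library, probe):
    """Split the library into (captured, unbound) by presence of the probe's target."""
    target = revcomp(probe)  # the strand sequence the probe would hybridize to
    captured = [m for m in library if target in m]
    unbound = [m for m in library if target not in m]
    return captured, unbound


# Hypothetical junction sequence of the molecule of interest.
junction = "TGTGCGAGAGATCGG"
probe = revcomp(junction)
library = [
    "AAAC" + junction + "GGTT",
    "AAACTGTGCGAAAGGGCCTGGTT",
    "AAACTGTGCCAGAGATTGGGGTT",
]
captured, unbound = capture(library, probe)
```

Because junction sequences tend to differ between clonotypes, a probe of this kind separates one molecule from others that share the same flanking regions in this toy library.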
The nucleic acid primers used for enrichment may be selected based on Rapid Amplification of cDNA Ends (RACE) sequencing. RACE sequencing may be a technique for obtaining the sequence of a nucleic acid (e.g., an RNA transcript), such as a nucleic acid present in a cell (e.g., by 5' RACE). RACE sequencing can allow for the generation of cDNA copies of a sequence of interest by reverse transcription followed by PCR amplification of the cDNA copies (i.e., RT-PCR). Amplified cDNA copies can be sequenced and can be mapped to unique genomic regions. In some embodiments, the RACE product may be sequenced by next generation sequencing techniques.
In some embodiments, the method can further comprise cloning the modified enriched nucleic acid into a vector (such as a vector with which the modified enriched nucleic acid sequence is compatible). Cloning can be performed using any acceptable method, including the methods provided herein (e.g., in the cloning section).
Nucleic acid primers should not be construed as specific for a particular nucleic acid strand. For example, in some embodiments, the first nucleic acid molecule can be complementary to a complement of an identification sequence as described herein. In some embodiments, the second nucleic acid molecule may be complementary to a binding sequence as described herein.
In further embodiments of enriching for a nucleic acid sequence of interest (wherein the nucleic acid sequence of interest is a BCR or a fragment thereof), the enriching can be performed via a first amplification reaction and a second amplification reaction. The first reaction may be performed using a first primer and a second primer, wherein: (i) the first primer has a sequence complementary to at least a portion of the barcode sequence and/or the UMI sequence; and (ii) the second primer has a sequence complementary to at least a portion of the nucleic acid sequence of interest encoding the joining (J) region and/or the isotype region of the BCR or fragment thereof. The second reaction may be performed using a third primer and a fourth primer, wherein: (i) the third primer comprises a sequence complementary to a leader sequence of the BCR or fragment thereof and/or a nucleotide sequence encoding at least a portion of framework region (FWR) 1 of the BCR or fragment thereof; and (ii) the fourth primer comprises a sequence complementary to at least a portion of the nucleic acid sequence of interest encoding a junction between, or any one or more of, the complementarity-determining region (CDR) 3, FWR4, J region, D region, and/or V region of the BCR or fragment thereof.
In certain embodiments of the method, in the first amplification reaction, the first primer may comprise a sequence complementary to at least a portion of the barcode sequence and the UMI sequence. In certain other embodiments, the first primer may include a sequence complementary to the barcode sequence and the UMI sequence. In some embodiments, the second primer may comprise a sequence complementary to a nucleic acid sequence of interest encoding at least a portion of a J region of a BCR or fragment thereof. In other embodiments, the second primer may comprise a sequence complementary to a nucleic acid sequence of interest encoding at least a portion of an isotype region of a BCR or fragment thereof. In still other embodiments, the second primer may comprise a sequence complementary to a nucleic acid sequence of interest encoding at least a portion of the J region and the isotype region of a BCR or fragment thereof. In certain embodiments of the method, in the first amplification reaction, the first primer may comprise a sequence complementary to at least a portion of the barcode sequence and the UMI sequence, and the second primer may comprise a sequence complementary to a nucleic acid sequence of interest encoding at least a portion of the J region and the isotype region of a BCR or fragment thereof.
In certain embodiments of the method, in the second amplification reaction, the third primer may comprise a sequence complementary to a leader sequence of BCR or a fragment thereof or a nucleotide encoding at least a portion of FWR1 of BCR or a fragment thereof. In other certain embodiments of the method, the third primer may comprise a sequence complementary to a nucleotide encoding at least a portion of FWR1 of BCR or a fragment thereof. In other embodiments of the method, the fourth primer may comprise a sequence complementary to a sequence encoding CDR3 and a complement of at least a portion of a nucleic acid sequence of interest extending into the junction in the J region of the BCR or fragment thereof. In still other embodiments of the method, the fourth primer may comprise a sequence complementary to at least a portion of a nucleic acid sequence of interest encoding the D region and the J region of the BCR or fragment thereof or the junction between the D region and the J region. In yet other embodiments of the method, the fourth primer may comprise a sequence complementary to at least a portion of a nucleic acid sequence of interest encoding the V region and the J region of the BCR or fragment thereof or the junction between the V region and the J region. In still other embodiments of the method, the fourth primer may comprise a sequence complementary to at least a portion of a nucleic acid sequence of interest encoding the V region and the D region of the BCR or fragment thereof or the junction between the V region and the D region. In yet further embodiments of the method, the fourth primer may comprise a sequence complementary to at least a portion of a nucleic acid sequence of interest encoding a V region, a D region, and a J region of a BCR or fragment thereof. 
In certain embodiments of the method, in the second amplification reaction, the third primer may comprise a sequence complementary to a leader sequence of the BCR or fragment thereof or a nucleotide encoding at least a portion of FWR1 of the BCR or fragment thereof, and the fourth primer may comprise a sequence complementary to a sequence encoding CDR3 and at least a portion of a nucleic acid sequence of interest extending into a junction in the J region of the BCR or fragment thereof.
In one embodiment of the method, the first amplification reaction may employ a first primer comprising a sequence complementary to at least a portion of the barcode sequence and the UMI sequence and a second primer comprising a sequence complementary to a nucleic acid sequence of interest encoding at least a portion of the J region and the isotype region of a BCR or fragment thereof. The first amplification may be followed by a second amplification reaction that may employ a third primer comprising a sequence complementary to a leader sequence of the BCR or fragment thereof or a nucleotide sequence encoding at least a portion of FWR1 of the BCR or fragment thereof, and a fourth primer comprising a sequence complementary to a sequence encoding CDR3 and at least a portion of a nucleic acid sequence of interest extending into a junction in the J region of the BCR or fragment thereof.
Modification of enriched nucleic acid sequences of interest
Further modification of the nucleic acid sequence of interest may be performed, for example, after the nucleic acid sequence of interest has been enriched. In some embodiments, modification of the nucleic acid sequence of interest may be performed in order to prepare for analysis of the nucleic acid sequence of interest, analyze the nucleic acid sequence of interest, or prepare the nucleic acid sequence of interest for cloning.
The method may further comprise performing fragmentation of the nucleic acid sequence of interest. Nucleic acid fragmentation (e.g., footprinting), such as by hydroxyl (OH) radicals, can be a tool to probe nucleic acid structures and nucleic acid-protein interactions. Such methods can provide structural information at single base pair resolution. Footprinting may refer to an assay in which binding of a ligand to a specific base sequence or conformation of the nucleic acid inhibits cleavage of the phosphodiester backbone of the nucleic acid polymer. Close interactions between proteins and nucleic acids can be widely examined by footprinting. A prerequisite for such an assay may be the ability to generate and detect high quality nucleic acid fragmentation around the region protected by the protein. Nucleic acid fragmentation can be achieved by using a variety of enzymatic and chemical reagents. Of particular relevance may be the development of chemical hydroxyl radical footprinting using Fenton chemistry and peroxynitrous acid. Hydroxyl radicals can cause cleavage of the phosphodiester backbone in a sequence-nonspecific manner, and thus can be used in footprinting assays. The use of hydroxyl radical methods rather than enzymatic footprinting may be advantageous because it may provide greater sensitivity to nucleic acid structures, such as sequence dependent bending and RNA folding.
The method may further comprise adding an A tail to the nucleic acid sequence of interest. A-tailing may include enzymatic methods for adding a non-templated nucleotide to the 3' end of a blunt-ended, double-stranded DNA molecule. Tailing may be performed to prepare a T-vector for TA cloning or to A-tail PCR products produced by a high fidelity polymerase (e.g., other than Taq) for TA cloning. TA cloning can be a rapid method of cloning PCR products in which the complementary T (thymidine) of the T-vector stabilizes the single base extension (adenosine) produced by Taq polymerase prior to ligation and transformation. This technique may not utilize restriction enzymes and may directly use PCR products without modification. Furthermore, in some embodiments, the PCR primers do not have to be designed with restriction sites, making the process less complex. In some embodiments, TA cloning may be non-directional, meaning that the insert may enter the vector in either orientation.
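The two possible insert orientations in TA cloning can be sketched from the top-strand point of view. In the hypothetical Python example below, the T-vector arm contributes a T before the insert and the A-tailed insert contributes an A after it, and the insert is read either as given or as its reverse complement. The function names and sequences are illustrative assumptions, and the model deliberately ignores the double-stranded details of ligation.

```python
# Minimal sketch: top-strand sequences of the two clones a TA ligation may yield.
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMP)[::-1]


def ta_clone_products(left_arm, right_arm, insert):
    """Return both orientations of an A-tailed insert ligated into a T-vector."""
    return {
        "forward": left_arm + "T" + insert + "A" + right_arm,
        "reverse": left_arm + "T" + revcomp(insert) + "A" + right_arm,
    }


def orientation(clone, left_arm, insert):
    """Report which orientation a sequenced clone carries, or None if neither."""
    if clone.startswith(left_arm + "T" + insert + "A"):
        return "forward"
    if clone.startswith(left_arm + "T" + revcomp(insert) + "A"):
        return "reverse"
    return None


# Hypothetical vector arms and insert.
products = ta_clone_products("GATC", "CTAG", "ATGGCC")
```

Screening clones with a check like the orientation function stands in for the colony PCR or sequencing step that would normally be used to pick clones in the desired orientation.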
The method may further comprise performing a sample-indexed polymerase chain reaction (SI-PCR) on the nucleic acid sequence of interest. SI-PCR can utilize different pairs of index primers on a nucleic acid molecule. In some cases, the index primers may be added to the individual samples after the second thermal cycling step (e.g., after initial amplification of the target). This may allow a number of samples (e.g., up to 96) to be mixed together and sequenced simultaneously. For example, after sequencing on an Illumina MiSeq, the software can identify these indices on each sequence read, in some cases allowing the reads of each different nucleic acid molecule to be separated.
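The separation of pooled reads by index pair can be sketched as a lookup from each read's two index sequences to a sample name. The Python example below is a simplified illustration: the function name, the sample sheet layout, the index sequences, and the exact-match rule (no tolerance for sequencing errors) are assumptions and do not describe the behavior of any particular sequencer software.

```python
# Minimal sketch: demultiplex pooled reads by their pair of sample indices.
def demultiplex(reads, sample_sheet):
    """Group read sequences by sample; unknown index pairs go under 'undetermined'."""
    bins = {name: [] for name in sample_sheet.values()}
    bins["undetermined"] = []
    for index1, index2, seq in reads:
        sample = sample_sheet.get((index1, index2), "undetermined")
        bins[sample].append(seq)
    return bins


# Hypothetical sample sheet mapping index pairs to sample names.
sample_sheet = {
    ("ATCACG", "AGGCTA"): "sample_A",
    ("CGATGT", "GCCAAT"): "sample_B",
}
# Each read is (first index, second index, insert sequence).
reads = [
    ("ATCACG", "AGGCTA", "ACGTACGT"),
    ("CGATGT", "GCCAAT", "TTGGCCAA"),
    ("ATCACG", "AGGCTA", "GGGGAAAA"),
    ("ATCACG", "GCCAAT", "CCCCTTTT"),  # index pair not in the sheet
]
bins = demultiplex(reads, sample_sheet)
```

Requiring both indices to match means that a read carrying an unexpected combination of indices is set aside rather than assigned to either sample.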
The method may further comprise V(D)J enrichment of the nucleic acid sequence of interest. This may be accomplished, for example, using PCR or another amplification method to amplify V(D)J sequences or fragments thereof from the enriched nucleic acid sequence of interest.
Modification of the nucleic acid sequence of interest or the enriched nucleic acid sequence of interest may include adding a Gibson terminus to the amplified nucleic acid sequence. The addition of a Gibson terminus (e.g., for Gibson assembly) may allow cloning or ligation of two nucleic acid sequences without restriction sites. In some cases, the addition of a Gibson terminus may allow ligation of any two fragments without regard to sequence. Gibson assembly can be performed in a manner that does not leave scars between the linked nucleic acid sequences. Multiple (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) fragments may be combined using Gibson assembly. Gibson assembly may be performed, for example, as described in the following document: Gibson DG, Young L, Chuang RY, Venter JC, Hutchison CA 3rd, Smith HO. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods. 2009;6(5):343-345. doi:10.1038/nmeth.1318, which is incorporated herein by reference in its entirety.
Gibson assembly can combine multiple DNA fragments simultaneously, e.g., based on sequence identity. The DNA fragments may contain about 20-40 base pairs of overlap with adjacent DNA fragments. These DNA fragments may be mixed with one or more enzymes (e.g., a mixture of 3 enzymes) and other buffer components. In some embodiments, these enzymes may include exonucleases, DNA polymerases, and DNA ligases.
During Gibson assembly, modification of the nucleic acid sequence of interest or the enriched nucleic acid sequence of interest may include combining a second nucleic acid sequence of interest with the nucleic acid sequence of interest or the enriched nucleic acid sequence of interest. In some embodiments, the second nucleic acid sequence of interest may be enriched. Such combinations may include, for example, ligating a nucleic acid sequence of interest or an enriched nucleic acid sequence of interest to a second nucleic acid sequence of interest using one or more overlap extension primers. In some cases, such combinations can include ligating the second nucleic acid sequence of interest to the nucleic acid sequence of interest or the enriched nucleic acid sequence of interest using a nucleic acid linker.
During Gibson assembly, the second nucleic acid sequence of interest may be a nucleic acid sequence as described herein. For example, the second nucleic acid sequence of interest may encode at least a portion of a cell surface protein of a cell, such as a T cell receptor (or fragment thereof) or a B cell receptor (or fragment thereof). In some cases, the enriched nucleic acid sequence of interest may comprise a first chain of a T cell receptor (or fragment thereof) or a first chain of a B cell receptor (or fragment thereof), while the second nucleic acid sequence of interest may comprise a second chain of a T cell receptor (or fragment thereof) or a B cell receptor (or fragment thereof). The second nucleic acid sequence of interest may comprise a constant region. In some embodiments, the second nucleic acid sequence of interest may comprise a sequence encoding a V(D)J sequence or a portion thereof, such as a V sequence (or portion thereof), a D sequence (or portion thereof), or a J sequence (or portion thereof) as described herein. In some embodiments, the constant region of the second nucleic acid sequence of interest may comprise a sequence encoding a V(D)J sequence or a portion thereof, such as a V sequence (or portion thereof), a D sequence (or portion thereof), or a J sequence (or portion thereof) as described herein. In some embodiments, the second nucleic acid sequence of interest may comprise a barcode (e.g., as provided herein), a UMI (e.g., as provided herein), or a 5' untranslated region (5' UTR). In some embodiments, the second nucleic acid sequence of interest may comprise complementary deoxyribonucleic acid (cDNA).
During Gibson assembly, exonucleases can chew back the DNA from the 5' end and in some cases do not inhibit polymerase activity, allowing the reaction to proceed in a single process. The resulting single stranded regions on adjacent DNA fragments may anneal. The DNA polymerase may incorporate nucleotides to fill any gaps. The DNA ligase may covalently join adjacent fragments of DNA, thereby sealing any nicks in the DNA. Linear or closed circular molecules can be assembled. In some embodiments, the Gibson assembly may be performed using PCR.
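The outcome of an overlap-based assembly can be sketched as joining fragments whose ends share an identical stretch of about 20-40 bases, with the shared stretch appearing once in the product. The Python example below models only this end result and not the exonuclease, polymerase, and ligase steps themselves. The function names, the default overlap bounds, and the sequences are hypothetical.

```python
# Minimal sketch: seamless joining of fragments that share terminal overlaps.
def gibson_join(left, right, min_overlap=20, max_overlap=40):
    """Join two fragments at the longest shared end overlap in range; None if there is none."""
    longest = min(max_overlap, len(left), len(right))
    for n in range(longest, min_overlap - 1, -1):
        if left[-n:] == right[:n]:
            return left + right[n:]  # the overlap is kept once, leaving no scar
    return None


def assemble(fragments, min_overlap=20, max_overlap=40):
    """Join an ordered list of fragments; return None if any adjacent pair lacks an overlap."""
    product = fragments[0]
    for frag in fragments[1:]:
        product = gibson_join(product, frag, min_overlap, max_overlap)
        if product is None:
            return None
    return product


# Hypothetical 20-base overlaps shared by adjacent fragments.
ov1 = "ACGTTGCAAGGCTTACCGGA"
ov2 = "TTGACCGGTAACGTTAGGCC"
frag_a = "ATGGCC" + ov1
frag_b = ov1 + "GGATCC" + ov2
frag_c = ov2 + "TAATGA"
assembled = assemble([frag_a, frag_b, frag_c])
```

Fragments without a sufficiently long shared end are not joined in this sketch, which reflects why homologous ends are added to the fragments before assembly.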
Cloning
Existing antibody cloning methods can be time consuming or difficult and can require considerable plate-based automation and expensive reagents to succeed. Manual labor may be used instead, but this may become impractical when cloning thousands of antibodies, which may be a common procedure, for example, during pandemic antibody discovery work and during antibody discovery programs for pharmaceutical research. The methods provided herein for enriching nucleic acid sequences, including those encoding antibodies or fragments thereof, as well as the methods provided herein for cloning them, can utilize the cDNA and amplified sequences to efficiently recover a target of interest from one or more single cell libraries. For example, using primers and probes as provided herein, specific BCR/antibody nucleic acid sequences of interest can be specifically enriched from a complex, pooled cDNA library. Primers and probes directed to a BCR/antibody target of interest at, for example, a junction extending into or including a J region sequence have been demonstrated to selectively enrich for specific BCRs/antibodies, including from libraries prepared from pooled donor samples containing B cells expressing BCRs/antibodies of many different sequences (e.g., from sequencing libraries prepared from pooled donor samples containing thousands (e.g., 5,000-10,000) of BCR sequences). Thus, using the methods described herein, a user can sequence thousands to tens of thousands of antibodies and target the antibodies for recovery and cloning. This may be particularly powerful when combined with other components provided herein (e.g., barcoding) to screen, for example, for antigen specificity or other multi-omics data.
The methods provided herein may further comprise cloning the nucleic acid sequence of interest into a vector. A vector may be a nucleic acid (e.g., DNA) molecule that serves as a vehicle to artificially carry foreign genetic material into a cell in which the foreign genetic material may be replicated and/or expressed. Examples of vectors may include viral vectors, plasmids, phages, cosmids or artificial chromosomes.
In some embodiments, the vector may be modified by the addition of genetic material encoding a protein. For example, a vector may comprise a nucleic acid sequence that may be combined with a nucleic acid sequence of interest. For example, a vector may comprise a nucleic acid sequence that may be combined with a nucleic acid sequence of interest to produce a protein of interest (such as an antibody or antigen binding fragment thereof, a T cell receptor, or a B cell receptor). For example, the vector may comprise at least a portion of a constant region of a T cell receptor, B cell receptor, or antibody or antigen binding fragment thereof.
In some embodiments, the vector may comprise a promoter. The promoter may be a DNA sequence to which one or more proteins may bind, thereby initiating transcription of a single RNA from the DNA downstream thereof. The RNA may encode a protein (mRNA), or may itself have a function, such as tRNA or rRNA. The promoter may be located near the transcription initiation site of the gene, upstream on the DNA (toward the 5' region of the sense strand). Promoters may be about 100-1000 base pairs in length. Examples of the promoter may include a bacterial promoter or a eukaryotic promoter.
In some embodiments, cloning may include vector restriction digestion (e.g., cleavage of the nucleic acid sequence of the vector at a restriction site or site recognized by a restriction enzyme). Restriction digestion of the vector may include digestion of the vector at restriction sites. The restriction site may be a DNA sequence on the vector, which may contain a specific nucleotide sequence (e.g., 4-8 bases long) that can be recognized by the restriction enzyme. In some embodiments, the restriction site may be a palindromic sequence. In some embodiments, a restriction enzyme (e.g., a restriction enzyme that recognizes a restriction site) can cleave a sequence between two nucleotides within or near the restriction site. An example of a restriction site may be an FspI restriction site that can be recognized, for example, by the FspI restriction enzyme. Non-limiting examples of restriction sites that can be employed are provided in Table 1.
Table 1: restriction enzymes and recognition sequences therefor
Cloning vectors may have features that may allow for convenient insertion of genes into, and removal of genes from, the vector. Examples may include multiple cloning sites (MCS) or polylinkers, which may contain unique restriction sites. The restriction sites in the MCS may first be cut by a restriction enzyme, and a DNA ligase may then be used to ligate a PCR amplified target gene (e.g., a nucleic acid sequence of interest) that has also been digested with the same enzyme into the vector. The target gene may be inserted into the vector in a specific orientation, if so desired. Restriction sites can also be used for subcloning into another vector, if necessary.
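Restriction digestion at a recognition site can be sketched as locating each occurrence of the site in a sequence and cutting at a fixed offset within it. The Python example below uses the well-known EcoRI site (GAATTC, cut after the first G on the top strand) purely as an illustration; it is not drawn from Table 1, and the function names and the example sequence are assumptions. Only the top strand of a linear molecule is modeled.

```python
# Minimal sketch: cut a linear top-strand sequence at every occurrence of a restriction site.
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMP)[::-1]


def is_palindromic(site):
    """True if the site reads the same on both strands (equals its reverse complement)."""
    return site == revcomp(site)


def digest(seq, site, cut_offset):
    """Return fragments after cutting cut_offset bases into each occurrence of site."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


# Illustrative enzyme: EcoRI recognizes GAATTC and cuts the top strand after the first G.
ECORI_SITE, ECORI_OFFSET = "GAATTC", 1
fragments = digest("AAAGAATTCCCCGAATTCTTT", ECORI_SITE, ECORI_OFFSET)
```

Cutting both the vector and the insert with the same enzyme in this way yields matching ends, which is what allows a ligase to join them.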
Other cloning vectors may employ topoisomerase instead of ligase, and cloning may be performed more rapidly without the need for restriction digestion of the vector or insert. In this TOPO cloning method, a linearized vector may be activated by attaching topoisomerase I to its ends, and this "TOPO-activated" vector may then accept a PCR product by ligating to both 5' ends of the PCR product, releasing the topoisomerase and forming a circular vector. Another method of cloning without the use of DNA digestion and ligase may be DNA recombination, for example as used in the Gateway cloning system. Once cloned into a cloning vector, the gene can be conveniently introduced into a variety of expression vectors by recombination.
The vector may comprise a reporter gene. Reporter genes can be used in some cloning vectors to facilitate screening for successful clones, because features of these genes allow successful clones to be identified easily. Such features may include a lacZα fragment for α-complementation in blue-white selection, and/or a marker gene or reporter gene in the same reading frame as, and flanking, the MCS to facilitate the production of a fusion protein. Examples of fusion partners that may be used in the screening may include green fluorescent protein (GFP) and luciferase.
In some embodiments, cloning may include combining two or more nucleic acid sequences. For example, two or more nucleic acid sequences may be linked to produce a coding sequence for an amino acid sequence of interest (e.g., a T cell receptor, a B cell receptor, or an antibody or antigen-binding fragment thereof). The two or more nucleic acid sequences may comprise the nucleic acid sequence of the heavy chain and the nucleic acid sequence of the light chain of an antibody or antigen-binding fragment. The two or more nucleic acid sequences may comprise a nucleic acid sequence of an alpha chain of a T cell receptor and a nucleic acid sequence of a beta chain of a T cell receptor. In this way, the whole antibody or antigen-binding fragment thereof, B cell receptor, T cell receptor, or other amino acid sequence may be cloned into a single vector and expressed as a single nucleic acid sequence or amino acid sequence.
After cloning, the nucleic acid sequence of interest or the amino acid product of the nucleic acid sequence of interest can be expressed. Expression may be performed in any acceptable expression system, including bacterial, yeast, insect, viral, or mammalian cell expression systems. In some embodiments, expression may be performed in a living animal.
Protein products of the nucleic acid sequences of interest can be analyzed. For example, protein products can be analyzed for affinity, specificity, enzymatic activity, solubility, stability, or other properties. Examples of assays may include ELISA, Western blot, enzymatic assay, dot blot, Bradford protein assay, neutralization assay, immunoassay, or another assay.
System and method for compartmentalization of samples
In one aspect, the systems and methods described herein provide for compartmentalization, deposition, or separation of one or more particles (e.g., biological particles, macromolecular components of biological particles, beads, reagents, etc.) into discrete compartments or partitions (interchangeably referred to herein as partitions), wherein each partition maintains its own content separate from the content of the other partitions. The partitions may be droplets in an emulsion. A partition may include one or more other partitions.
A partition may include one or more particles. A partition may include one or more types of particles. For example, the partitions of the present disclosure may include one or more biological particles and/or macromolecular components thereof. The partition may comprise one or more gel beads. A partition may include one or more cell beads. The partition may comprise a single gel bead, a single cell bead, or both a single cell bead and a single gel bead. The partition may include one or more reagents. Alternatively, the partition may be unoccupied. For example, a partition may not include beads. The cell beads may be biological particles and/or one or more of their macromolecular components encapsulated inside a gel or polymer matrix, such as via polymerization of droplets comprising biological particles and precursors capable of being polymerized or gelled. Unique identifiers such as bar codes may be injected into the droplets, such as via microcapsules (e.g., beads), before, after, or simultaneously with droplet generation, as described elsewhere herein. Microfluidic channel networks (e.g., on a chip) may be used to generate partitions, as described herein. Alternative mechanisms may also be employed in the separation of individual biological particles, including porous membranes through which an aqueous mixture of cells is extruded into a non-aqueous fluid.
The partitions may be flowable in the fluid flow. The partition may include, for example, microcapsules having an outer barrier surrounding an inner fluid center or core. In some cases, a partition may include a porous matrix capable of entraining and/or retaining material within its matrix. The partitions may be droplets of a first phase within a second phase, wherein the first phase and the second phase are immiscible. For example, the partitions may be droplets of an aqueous fluid within a non-aqueous continuous phase (e.g., an oil phase). In another example, the partitions may be droplets of a non-aqueous fluid within the aqueous phase. In some examples, the partitions may be provided in the form of a water-in-oil emulsion or an oil-in-water emulsion. A number of different containers are described, for example, in U.S. patent application publication No. 2014/0155295, which is incorporated herein by reference in its entirety for all purposes. Emulsion systems for producing stable droplets in a non-aqueous or oil continuous phase are described, for example, in U.S. patent application publication No. 2010/0105112, which is incorporated herein by reference in its entirety for all purposes.
In the case of droplets in an emulsion, in one non-limiting example, the distribution of individual particles to discrete partitions may be achieved by introducing a flowing stream of particles in an aqueous fluid into a flowing stream of a non-aqueous fluid such that droplets are generated at the junction of the two streams. Fluid properties (e.g., fluid flow rate, fluid viscosity, etc.), particle properties (e.g., volume fraction, particle size, particle concentration, etc.), microfluidic architecture (e.g., channel geometry, etc.), and other parameters may be adjusted to control the occupancy of the resulting partitions (e.g., number of biological particles per partition, number of beads per partition, etc.). For example, partition occupancy may be controlled by providing the aqueous stream at a certain concentration and/or flow rate of particles. To generate single-biological-particle partitions, the relative flow rates of the immiscible fluids may be selected such that the partitions contain on average less than one biological particle per partition, to ensure that those partitions that are occupied are predominantly singly occupied. In some cases, a partition of the plurality of partitions may contain at most one biological particle (e.g., a bead, DNA, cell, or cellular material). In some embodiments, various parameters (e.g., fluid properties, particle properties, microfluidic architecture, etc.) may be selected or adjusted such that a majority of the partitions are occupied, e.g., allowing only a small percentage of unoccupied partitions. The flows and channel architecture may be controlled to ensure a given number of singly occupied partitions, less than a certain level of unoccupied partitions, and/or less than a certain level of multiply occupied partitions.
Fig. 1 shows an example of a microfluidic channel structure 100 for separating individual biological particles. The channel structure 100 may include channel segments 102, 104, 106, and 108 that communicate at a channel junction 110. In operation, a first aqueous fluid 112 comprising suspended biological particles (or cells) 114 may be transported along the channel segment 102 into the junction 110, while a second fluid 116, which is immiscible with the aqueous fluid 112, is delivered from each of the channel segments 104 and 106 to the junction 110 to create discrete droplets 118, 120 of the first aqueous fluid 112 that flow into the channel segment 108 and away from the junction 110. The channel segment 108 may be fluidly coupled to an outlet reservoir in which discrete droplets may be stored and/or harvested. A discrete droplet generated may include an individual biological particle 114 (such as droplet 118). A discrete droplet generated may include more than one individual biological particle 114 (not shown in Fig. 1). A discrete droplet may contain no biological particle 114 (such as droplet 120). Each discrete partition may keep its own contents (e.g., individual biological particle 114) separate from the contents of the other partitions.
The second fluid 116 may comprise an oil, such as a fluorinated oil, that includes a fluorosurfactant for stabilizing the resulting droplets (e.g., inhibiting subsequent coalescence of the resulting droplets 118, 120). Examples of particularly useful partitioning fluids and fluorosurfactants are described, for example, in U.S. patent application publication No. 2010/0105112, which is incorporated herein by reference in its entirety for all purposes.
It should be appreciated that the channel segments described herein may be coupled to any of a variety of different fluid sources or receiving components, including reservoirs, pipes, manifolds, or other system fluid components. It should be understood that the microfluidic channel structure 100 may have other geometries. For example, a microfluidic channel structure may have more than one channel connection. For example, a microfluidic channel structure may have 2, 3, 4, or 5 channel segments each carrying particles (e.g., biological particles, cell beads, and/or gel beads), which meet at a channel junction. Fluid may be directed to flow along one or more channels or reservoirs via one or more fluid flow units. The fluid flow unit may include a compressor (e.g., providing positive pressure), a pump (e.g., providing negative pressure), an actuator, etc., to control the flow of fluid. The fluid may also or alternatively be controlled via an applied pressure differential, centrifugal force, electric pumping, vacuum, capillary or gravity flow, or the like.
The generated droplets may include two subgroups of droplets: (1) occupied droplets 118 comprising one or more biological particles 114, and (2) unoccupied droplets 120 that do not comprise any biological particles 114. Occupied droplets 118 can include singly occupied droplets (with one biological particle) and multiply occupied droplets (with more than one biological particle). As described elsewhere herein, in some cases, the majority of occupied partitions may each include no more than one biological particle, and some generated partitions may be unoccupied (by any biological particle). However, in some cases, some occupied partitions may include more than one biological particle. In some cases, the partitioning process may be controlled such that less than about 25% of the occupied partitions contain more than one biological particle, and in many cases less than about 20% of the occupied partitions have more than one biological particle, and in some cases less than about 10% or even less than about 5% of the occupied partitions include more than one biological particle.
In some cases, it may be desirable to minimize the generation of an excessive number of empty partitions in order to reduce costs and/or increase efficiency. While such minimization may be achieved by providing a sufficient number of biological particles (e.g., biological particles 114) at the partitioning junction 110 to ensure that at least one biological particle is enclosed in a partition, a Poisson distribution may be expected to increase the number of partitions comprising multiple biological particles. Thus, in the case where singly occupied partitions are to be obtained, up to about 95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5% or less of the generated partitions may be unoccupied.
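The trade-off between unoccupied and multiply occupied partitions can be illustrated numerically. The following Python sketch assumes, purely for illustration, that biological particles are loaded randomly and independently so that the number of particles per partition follows a Poisson distribution with a given mean; the function name and the returned keys are hypothetical and do not describe any particular system.

```python
import math


def poisson_occupancy(mean_particles_per_partition: float) -> dict:
    """Expected partition fractions under an assumed Poisson loading model."""
    lam = mean_particles_per_partition
    unoccupied = math.exp(-lam)            # P(0 particles)
    single = lam * math.exp(-lam)          # P(exactly 1 particle)
    multiple = 1.0 - unoccupied - single   # P(2 or more particles)
    occupied = 1.0 - unoccupied
    return {
        "unoccupied": unoccupied,
        "single": single,
        "multiple": multiple,
        # Fraction of occupied partitions that hold more than one particle.
        "multiple_among_occupied": multiple / occupied if occupied > 0 else 0.0,
    }


# Example: a dilute loading with a mean of 0.1 particles per partition.
dilute = poisson_occupancy(0.1)
```

Under this assumed model, a mean of 0.1 particles per partition leaves roughly 90% of partitions unoccupied while keeping multiply occupied partitions below about 5% of the occupied partitions; raising the mean lowers the unoccupied fraction but raises the multiply occupied fraction.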
In some cases, the flow of one or more biological particles (e.g., in channel segment 102) or other fluids directed into the partitioning junction (e.g., in channel segments 104, 106) may be controlled such that, in many cases, no more than about 50% of the generated partitions, no more than about 25% of the generated partitions, or no more than about 10% of the generated partitions are unoccupied. These flows may be controlled so as to present a non-Poisson distribution of singly occupied partitions while providing lower levels of unoccupied partitions. The above ranges of unoccupied partitions can be achieved while still providing any of the single-occupancy rates described above. For example, in many cases, use of the systems and methods described herein can create resulting partitions that have multiple-occupancy rates of less than about 25%, less than about 20%, less than about 15%, less than about 10%, and in many cases less than about 5%, while having unoccupied partitions of less than about 50%, less than about 40%, less than about 30%, less than about 20%, less than about 10%, less than about 5%, or less.
It should be understood that the above occupancy rates also apply to partitions comprising both biological particles and additional reagents, including but not limited to microcapsules or beads (e.g., gel beads) carrying barcoded nucleic acid molecules (e.g., oligonucleotides) (described with respect to Fig. 2). Occupied partitions (e.g., at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 99% of the occupied partitions) can include both a microcapsule (e.g., bead) comprising barcoded nucleic acid molecules and a biological particle.
In another aspect, in addition to or instead of droplet-based separation, the biological particles may be encapsulated within microcapsules comprising a shell, layer, or porous matrix having one or more individual biological particles or small groups of biological particles entrained therein. In some embodiments, biological particles (e.g., cells) are included within (e.g., encapsulated within) a particulate material to form "cell beads". The microcapsules (e.g., cell beads) may include other reagents. Encapsulation of biological particles can be performed by a variety of processes. Such processes may combine an aqueous fluid containing biological particles with a polymer precursor material that may be capable of forming a gel or other solid or semi-solid matrix upon application of a specific stimulus to the polymer precursor. Such stimuli can include, for example, thermal stimuli (e.g., heating or cooling), photo stimuli (e.g., photo-curing), chemical stimuli (e.g., crosslinking or polymerization initiation of the precursor, such as by added initiators), mechanical stimuli, or combinations thereof.
The preparation of microcapsules containing biological particles can be performed by a variety of methods. For example, an air knife droplet or aerosol generator may be used to disperse droplets of a precursor fluid into the gelling solution to form microcapsules comprising individual biological particles or small groups of biological particles. Likewise, a membrane-based encapsulation system may be used to create microcapsules comprising encapsulated biological particles as described herein. Microfluidic systems of the present disclosure, such as that shown in Fig. 1, can be readily used to encapsulate cells as described herein. In particular, and with reference to Fig. 1, an aqueous fluid 112 comprising (i) biological particles 114 and (ii) a polymer precursor material (not shown) flows into the channel junction 110, where it is separated into droplets 118, 120 by the flow of a non-aqueous fluid 116. In the case of encapsulation methods, the non-aqueous fluid 116 may also include an initiator (not shown) to cause polymerization and/or crosslinking of the polymer precursor to form microcapsules comprising entrained biological particles. Examples of polymer precursor/initiator pairs include those described in U.S. patent application publication No. 2014/0378345, which is incorporated herein by reference in its entirety for all purposes.
For example, where the polymer precursor material comprises a linear polymer material such as linear polyacrylamide, PEG, or other linear polymer material, the activator may comprise a cross-linking agent, or a chemical that activates a cross-linking agent within the formed droplets. Likewise, for a polymer precursor including a polymerizable monomer, the activator may include a polymerization initiator. For example, in some cases, when the polymer precursor includes a mixture of acrylamide monomer and N,N'-bis(acryloyl)cystamine (BAC) comonomer, an agent such as tetramethylethylenediamine (TEMED) may be provided within the second fluid stream 116 in the channel segments 104 and 106, which may initiate copolymerization of acrylamide and BAC into a crosslinked polymer network or hydrogel.
As the second fluid stream 116 contacts the first fluid stream 112 at the junction 110 during droplet formation, TEMED may diffuse from the second fluid stream 116 into the aqueous fluid 112 comprising linear polyacrylamide, which will activate cross-linking of the polyacrylamide within the droplets 118, 120, resulting in the formation of gel (e.g., hydrogel) microcapsules in the form of solid or semi-solid beads or particles that entrain the cells 114. Although described in terms of polyacrylamide encapsulation, other "activatable" encapsulation compositions may also be employed in the context of the methods and compositions described herein. For example, formation of alginate droplets followed by exposure to divalent metal ions (e.g., Ca2+ ions) can be used as an encapsulation process using the described processes. Likewise, agarose droplets can be converted into capsules by temperature-based gelation (e.g., upon cooling, etc.).
In some cases, the encapsulated biological particles may be selectively released from the microcapsules, such as through the passage of time or upon application of a particular stimulus, which degrades the microcapsules sufficiently to allow release of the biological particles (e.g., cells) or other contents thereof from the microcapsules, such as into a partition (e.g., a droplet). For example, in the case of the polyacrylamide polymers described above, degradation of the microcapsules can be achieved by the introduction of a suitable reducing agent such as DTT or the like to cleave the disulfide bonds that crosslink the polymer matrix. See, for example, U.S. patent application publication No.2014/0378345, which is incorporated herein by reference in its entirety for all purposes.
The biological particles may be subjected to other conditions sufficient to polymerize or gel the precursor. Conditions sufficient to polymerize or gel the precursor may include exposure to heat, cooling, electromagnetic radiation, and/or light. The conditions sufficient to polymerize or gel the precursor may include any conditions sufficient to polymerize or gel the precursor. After polymerization or gelation, a polymer or gel may be formed around the biological particles. The polymer or gel may be diffusion permeable to chemical or biochemical agents. The polymer or gel may be diffusion impermeable to the macromolecular components of the biological particle. In this way, the polymer or gel may act to subject the biological particles to chemical or biochemical manipulations while spatially confining the macromolecular composition to the region of the droplet defined by the polymer or gel. The polymer or gel may comprise one or more of disulfide-crosslinked polyacrylamide, agarose, alginate, polyvinyl alcohol, polyethylene glycol (PEG) -diacrylate, PEG-acrylate, PEG-thiol, PEG-azide, PEG-alkyne, other acrylate, chitosan, hyaluronic acid, collagen, fibrin, gelatin, or elastin. The polymer or gel may comprise any other polymer or gel.
The polymer or gel may be functionalized to bind to a targeted analyte, such as a nucleic acid, protein, carbohydrate, lipid, or other analyte. The polymer or gel may polymerize or gel via a passive mechanism. The polymer or gel may be stable under alkaline conditions or at elevated temperatures. The polymer or gel may have mechanical properties similar to those of the beads. For example, the polymer or gel may have a similar size as the beads. The polymer or gel may have a mechanical strength (e.g., tensile strength) similar to that of the beads. The polymer or gel may have a density lower than the oil. The density of the polymer or gel may be substantially similar to the density of the buffer. The polymer or gel may have an adjustable pore size. The pore size may be selected, for example, to retain denatured nucleic acids. The pore size may be selected to maintain diffusion permeability to exogenous chemicals, such as sodium hydroxide (NaOH), and/or endogenous chemicals, such as inhibitors. The polymer or gel may be biocompatible. The polymer or gel may maintain or enhance cell viability. The polymer or gel may be biochemically compatible. The polymer or gel may be polymerized and/or depolymerized thermally, chemically, enzymatically, and/or optically.
The polymer may comprise poly(acrylamide-co-acrylic acid) crosslinked by disulfide bonds. The preparation of the polymer may involve a two-step reaction. In the first activation step, the poly(acrylamide-co-acrylic acid) may be exposed to an acylating agent to convert the carboxylic acid to an ester. For example, poly(acrylamide-co-acrylic acid) can be exposed to 4-(4,6-dimethoxy-1,3,5-triazin-2-yl)-4-methylmorpholinium chloride (DMTMM). The poly(acrylamide-co-acrylic acid) may be exposed to other salts of 4-(4,6-dimethoxy-1,3,5-triazin-2-yl)-4-methylmorpholinium. In the second crosslinking step, the ester formed in the first step may be exposed to a disulfide crosslinking agent. For example, the ester may be exposed to cystamine (2,2'-dithiobis(ethylamine)). After these two steps, the biological particles may be surrounded by polyacrylamide chains linked together by disulfide bridges. In this way, the biological particles may be encapsulated within or comprise a gel or matrix (e.g., a polymer matrix) to form "cell beads". The cell beads can comprise biological particles (e.g., cells) or macromolecular components of biological particles (e.g., RNA, DNA, proteins, etc.). The cell beads may comprise a single cell or a plurality of cells, or a derivative of a single cell or a plurality of cells. For example, after lysing and washing the cells, the inhibitory components of the cell lysate may be washed away and the macromolecular components may be bound as cell beads. The systems and methods disclosed herein may be applicable to cell beads (and/or droplets or other partitions) comprising biological particles and cell beads (and/or droplets or other partitions) comprising macromolecular components of biological particles.
Encapsulated biological particles may provide certain potential advantages over droplet-based partitioned biological particles, in that they may be easier to store and more portable. Furthermore, in some cases, it may be desirable to incubate the biological particles for a selected period of time prior to analysis, such as to characterize the change in such biological particles over time in the presence or absence of different stimuli. In such cases, encapsulation may allow for longer incubation times than partitioning in emulsion droplets, although in some cases droplet-partitioned biological particles may also be incubated for different periods of time, such as at least 10 seconds, at least 30 seconds, at least 1 minute, at least 5 minutes, at least 10 minutes, at least 30 minutes, at least 1 hour, at least 2 hours, at least 5 hours, or at least 10 hours or more. Encapsulation of biological particles may itself constitute the partitioning of the biological particles, into which other reagents are co-partitioned. Alternatively or in addition, the encapsulated biological particles can be readily deposited into other partitions (e.g., droplets) as described above.
Beads
The partition may include one or more unique identifiers, such as a barcode. The barcode may be delivered to the partition containing the compartmentalized or separated biological particles beforehand, subsequently, or simultaneously. For example, the barcode may be injected into the droplet before, after, or simultaneously with droplet generation. Delivery of the barcode to a particular partition allows the characteristics of an individual biological particle to be later attributed to that particular partition. The barcode may be delivered to the partition via any suitable mechanism, such as on a nucleic acid molecule (e.g., an oligonucleotide). The barcoded nucleic acid molecules (e.g., oligonucleotides comprising a barcode) can be delivered to the partition via a microcapsule. In some cases, the microcapsules may include beads. The beads are described in further detail below.
In some cases, the barcoded nucleic acid molecules (e.g., oligonucleotides comprising a barcode) may be first associated with the microcapsule and then released from the microcapsule. The release of the barcoded nucleic acid molecules (e.g., oligonucleotides comprising a barcode) may be passive (e.g., by diffusion out of the microcapsule). In addition or alternatively, release from the microcapsules may occur upon application of a stimulus that allows the barcoded nucleic acid molecules (e.g., oligonucleotides comprising a barcode) to dissociate or be released from the microcapsules. Such a stimulus may disrupt the microcapsule, an interaction that couples the barcoded nucleic acid molecules (e.g., the oligonucleotides comprising the barcodes) to or within the microcapsule, or both. Such stimuli may include, for example, thermal stimuli, optical stimuli, chemical stimuli (e.g., pH change or use of a reducing agent), mechanical stimuli, radiation stimuli, biological stimuli (e.g., enzymes), or any combination thereof.
Fig. 2 shows an example of a microfluidic channel structure 200 for delivering barcode-bearing beads to droplets. The channel structure 200 may include channel segments 201, 202, 204, 206, and 208 that communicate at a channel junction 210. In operation, the channel segment 201 can transport an aqueous fluid 212 comprising a plurality of beads 214 (e.g., with nucleic acid molecules, oligonucleotides, molecular tags) along the channel segment 201 into the junction 210. The plurality of beads 214 may be derived from a suspension of beads. For example, channel segment 201 may be connected to a reservoir of an aqueous suspension comprising beads 214. The channel segment 202 may transport an aqueous fluid 212 comprising a plurality of biological particles 216 along the channel segment 202 into the junction 210. The plurality of biological particles 216 may be derived from a suspension of biological particles. For example, the channel segment 202 may be connected to a reservoir of an aqueous suspension containing biological particles 216. In some cases, the aqueous fluid 212 in the first channel segment 201 or the second channel segment 202 or in both segments may include one or more reagents, as described further below. A second fluid 218 (e.g., oil) that is immiscible with the aqueous fluid 212 may be delivered from each of the channel segments 204 and 206 to the junction 210. As the aqueous fluid 212 from each of the channel segments 201 and 202 and the second fluid 218 from each of the channel segments 204 and 206 meet at the channel junction 210, the aqueous fluid 212 may separate into discrete droplets 220 in the second fluid 218 and flow along the channel segment 208 away from the junction 210. The channel segment 208 may deliver the discrete droplets to an outlet reservoir fluidly coupled to the channel segment 208, where the discrete droplets may be harvested.
Alternatively, the channel segments 201 and 202 may meet at another junction upstream of the junction 210. At such a junction, the beads and biological particles may form a mixture that is directed along another channel to junction 210 to create droplets 220. The mixture may provide beads and biological particles in an alternating manner such that, for example, the droplets comprise a single bead and a single biological particle.
Beads, biological particles, and droplets may flow along the channel in a substantially regular flow pattern (e.g., at a regular flow rate). Such regular flow patterns may allow the droplets to include a single bead and a single biological particle. Such regular flow patterns may allow droplets to have an occupancy rate of greater than 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 95% (e.g., droplets with beads and biological particles). Such regular flow patterns and devices that may be used to provide such regular flow patterns are provided, for example, in U.S. patent publication No. 2015/0292988, which is incorporated herein by reference in its entirety.
The second fluid 218 may comprise an oil, such as a fluorinated oil, that includes a fluorosurfactant for stabilizing the resulting droplets (e.g., inhibiting subsequent coalescence of the resulting droplets 220).
A discrete droplet generated may include an individual biological particle 216. A discrete droplet generated may include a bead 214 carrying a barcode or other reagent. A discrete droplet generated may include both an individual biological particle and a barcode-carrying bead, such as droplet 220. In some cases, a discrete droplet may include more than one individual biological particle or no biological particle. In some cases, a discrete droplet may include more than one bead or no bead. A discrete droplet may be unoccupied (e.g., without beads and without biological particles).
Advantageously, a discrete droplet that partitions a biological particle together with a barcode-carrying bead may effectively allow the barcode to be attributed to the macromolecular components of the biological particle within the partition. The contents of a partition may remain discrete from the contents of other partitions.
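As a hedged illustration of how a barcode can tie sequence data back to the partition of origin, the following Python sketch groups reads by a leading barcode. It assumes, for illustration only, that each read is a string beginning with a fixed-length barcode followed by the insert sequence; the function name, the barcode length, and the example reads are hypothetical and do not represent any particular library structure.

```python
from collections import defaultdict


def group_by_barcode(reads: list[str], barcode_length: int) -> dict[str, list[str]]:
    """Group insert sequences by the barcode assumed to prefix each read."""
    groups: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        if len(read) <= barcode_length:
            # Too short to contain both a barcode and an insert; skip it.
            continue
        barcode = read[:barcode_length]
        insert = read[barcode_length:]
        groups[barcode].append(insert)
    return dict(groups)


# Hypothetical reads: a 4-base barcode followed by an insert.
example_reads = ["AAAACCG", "AAAATTG", "GGGGACA", "GGG"]
by_partition = group_by_barcode(example_reads, barcode_length=4)
```

Reads that share a barcode are treated as originating from the same partition, and therefore from the same biological particle when that partition is singly occupied.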
It should be appreciated that the channel segments described herein may be coupled to any of a variety of different fluid sources or receiving components, including reservoirs, pipes, manifolds, or other system fluid components. It should be understood that the microfluidic channel structure 200 may have other geometries. For example, a microfluidic channel structure may have more than one channel connection. For example, a microfluidic channel structure may have 2, 3, 4 or 5 channel segments each carrying beads, which meet at channel junctions. Fluid may be directed to flow along one or more channels or reservoirs via one or more fluid flow units. The fluid flow unit may include a compressor (e.g., providing positive pressure), a pump (e.g., providing negative pressure), an actuator, etc., to control the flow of fluid. The fluid may also or alternatively be controlled via an applied pressure differential, centrifugal force, electric pumping, vacuum, capillary or gravity flow, or the like.
The beads may be porous, non-porous, solid, semi-fluid, and/or any combination thereof. In some cases, the beads may be dissolvable, destructible, or degradable. In some cases, the beads may not be degradable. In some cases, the beads may be gel beads. The gel beads may be hydrogel beads. Gel beads may be formed from molecular precursors (such as polymers or monomeric species). The semi-solid beads may be liposome beads. The solid beads may comprise metals including iron oxide, gold, and silver. In some cases, the beads may be silica beads. In some cases, the beads may be rigid. In other cases, the beads may be flexible and/or compressible.
The beads may have any suitable shape. Examples of bead shapes include, but are not limited to, spherical, non-spherical, oval, oblong, amorphous, circular, cylindrical, and variations thereof.
The beads may be of uniform or non-uniform size. In some cases, the beads may have a diameter of at least about 10 nanometers (nm), 100nm, 500nm, 1 micrometer (μm), 5 μm, 10 μm, 20 μm, 30 μm, 40 μm, 50 μm, 60 μm, 70 μm, 80 μm, 90 μm, 100 μm, 250 μm, 500 μm, 1mm or more. In some cases, the beads may have a diameter of less than about 10nm, 100nm, 500nm, 1 μm, 5 μm, 10 μm, 20 μm, 30 μm, 40 μm, 50 μm, 60 μm, 70 μm, 80 μm, 90 μm, 100 μm, 250 μm, 500 μm, 1mm or less. In some cases, the beads may have a diameter in the range of about 40-75 μm, 30-75 μm, 20-75 μm, 40-85 μm, 40-95 μm, 20-100 μm, 10-100 μm, 1-100 μm, 20-250 μm, or 20-500 μm.
In certain aspects, the beads may be provided in the form of a population of beads or a plurality of beads having a relatively monodisperse size distribution. Where it may be desirable to provide a relatively consistent amount of reagent within a partition, maintaining relatively consistent bead characteristics (such as size) may contribute to overall consistency. In particular, the beads described herein can have a size distribution with a coefficient of variation of the bead cross-sectional dimension of less than 50%, less than 40%, less than 30%, less than 20%, and in some cases less than 15%, less than 10%, less than 5% or less.
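The coefficient of variation referred to above is the standard deviation of a dimension divided by its mean, commonly expressed as a percentage. The following Python sketch computes it for a set of bead diameters. The function name and the example diameters are illustrative assumptions, and the sample standard deviation is used here, which is one of several reasonable conventions.

```python
import statistics


def coefficient_of_variation(diameters_um: list[float]) -> float:
    """Return the coefficient of variation of bead diameters, in percent.

    Uses the sample standard deviation (n - 1 denominator); at least two
    measurements are required.
    """
    mean = statistics.mean(diameters_um)
    return statistics.stdev(diameters_um) / mean * 100.0


# Hypothetical diameter measurements in micrometers.
example_diameters = [48.0, 50.0, 52.0]
cv_percent = coefficient_of_variation(example_diameters)
```

For the hypothetical measurements shown, the mean is 50 μm and the sample standard deviation is 2 μm, giving a coefficient of variation of 4%, which would fall within the "less than 5%" range recited above.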
The beads may comprise natural and/or synthetic materials. For example, the beads may comprise natural polymers, synthetic polymers, or both natural and synthetic polymers. Examples of natural polymers include proteins and sugars such as deoxyribonucleic acid, rubber, cellulose, starch (e.g., amylose, amylopectin), proteins, enzymes, polysaccharides, silk, polyhydroxyalkanoates, chitosan, dextran, collagen, carrageenan, psyllium, gum arabic, agar, gelatin, shellac, karaya, xanthan, corn gum, guar gum, agarose, alginic acid, alginate, or natural polymers thereof. Examples of synthetic polymers include acrylic, nylon, silicone, spandex, viscose rayon, polycarboxylic acid, polyvinyl acetate, polyacrylamide, polyacrylate, polyethylene glycol, polyurethane, polylactic acid, silica, polystyrene, polyacrylonitrile, polybutadiene, polycarbonate, polyethylene terephthalate, poly(chlorotrifluoroethylene), poly(ethylene oxide), poly(ethylene terephthalate), polyethylene, polyisobutylene, poly(methyl methacrylate), poly(formaldehyde), polyoxymethylene, polypropylene, polystyrene, poly(tetrafluoroethylene), poly(vinyl acetate), poly(vinyl alcohol), poly(vinyl chloride), poly(vinylidene fluoride), poly(vinyl fluoride), and/or combinations (e.g., copolymers) thereof. The beads may also be formed from materials other than polymers, including lipids, micelles, ceramics, glass-ceramics, material composites, metals, other inorganic materials, and the like.
In some cases, the beads may contain molecular precursors (e.g., monomers or polymers) that can form a polymer network via polymerization of the molecular precursors. In some cases, the precursor may be an already polymerized species capable of undergoing further polymerization (e.g., via chemical crosslinks). In some cases, the precursor may include one or more of an acrylamide or methacrylamide monomer, oligomer, or polymer. In some cases, the beads may comprise a prepolymer, which is an oligomer capable of further polymerization. For example, polyurethane beads can be prepared using a prepolymer. In some cases, the beads may contain separate polymers that may be further polymerized together. In some cases, the beads may be generated via polymerization of different precursors such that they comprise mixed polymers, copolymers, and/or block copolymers. In some cases, the beads may comprise covalent or ionic bonds between polymer precursors (e.g., monomers, oligomers, linear polymers), nucleic acid molecules (e.g., oligonucleotides), primers, and other entities. In some cases, the covalent bond may be a carbon-carbon bond, thioether bond, or carbon-heteroatom bond.
In some cases, multiple nucleic acid barcode molecules may be attached to a bead. The nucleic acid barcode molecules may be directly or indirectly attached to the bead. In some cases, a nucleic acid barcode molecule may be covalently linked to the bead. In some cases, the nucleic acid barcode molecule is covalently linked to the bead via a linker. In some cases, the linker is a degradable linker. In some cases, the linker comprises a labile bond configured to release the nucleic acid barcode molecule of the plurality of nucleic acid barcode molecules. In some cases, the labile bond comprises a disulfide bond.
Crosslinking may be permanent or reversible, depending on the particular crosslinking agent used. Reversible crosslinking may allow linearization or dissociation of the polymer under appropriate conditions. In some cases, reversible crosslinking may also allow for reversible attachment of materials that bind to the surface of the beads. In some cases, the crosslinker may form disulfide bonds. In some cases, the disulfide-forming chemical cross-linking agent may be cystamine or a modified cystamine.
In some cases, disulfide bonds can be formed between molecular precursor units (e.g., monomers, oligomers, or linear polymers) or precursors and nucleic acid molecules (e.g., oligonucleotides) incorporated into the beads. Cystamine (including modified cystamine) is for example an organic agent comprising disulfide bonds, which can be used as a cross-linking agent between individual monomers or polymer precursors of the beads. Polyacrylamide can be polymerized in the presence of cystamine or cystamine-containing species (e.g., modified cystamine) to produce polyacrylamide gel beads comprising disulfide linkages (e.g., chemically degradable beads comprising a chemically reducible cross-linking agent). Disulfide bonds may allow the beads to be degraded (or dissolved) when the beads are exposed to a reducing agent.
In some cases, chitosan (a linear polysaccharide polymer) can be crosslinked with glutaraldehyde via hydrophilic chains to form beads. Crosslinking of the chitosan polymer may be achieved by chemical reactions initiated by heat, pressure, pH changes and/or radiation.
In some cases, the beads may comprise acrydite moieties, which in some aspects may be used to attach one or more nucleic acid molecules (e.g., barcode sequences, barcoded nucleic acid molecules, barcoded oligonucleotides, primers, or other oligonucleotides) to the beads. In some cases, an acrydite moiety may refer to an acrydite analog generated from the reaction of acrydite with one or more species (such as the reaction of acrydite with other monomers and crosslinkers during a polymerization reaction). The acrydite moiety can be modified to form a chemical bond with a species to be attached, such as a nucleic acid molecule (e.g., a barcode sequence, a barcoded nucleic acid molecule, a barcoded oligonucleotide, a primer, or other oligonucleotide). The acrydite moiety may be modified with a thiol group capable of forming a disulfide bond, or may be modified with a group already containing a disulfide bond. The thiol group or disulfide bond (via disulfide exchange) may be used as an anchor point for the species to be attached, or another part of the acrydite moiety may be used for attachment. In some cases, the attachment may be reversible such that when the disulfide bond is broken (e.g., in the presence of a reducing agent), the attached species is released from the bead. In other cases, the acrydite moiety may contain reactive hydroxyl groups that may be used for attachment.
Functionalization of the beads for attachment of nucleic acid molecules (e.g., oligonucleotides) can be accomplished by a number of different methods, including activation of chemical groups within the polymer, incorporation of active or activatable functional groups in the polymer structure, or attachment at the prepolymer or monomer stage of bead generation.
For example, a precursor (e.g., monomer, crosslinker) that polymerizes to form a bead may comprise acrydite moieties such that when the bead is produced, the bead also comprises acrydite moieties. The acrydite moiety can be linked to a nucleic acid molecule (e.g., an oligonucleotide) that can include a primer sequence (e.g., a primer for amplifying a target nucleic acid, a random primer, a primer sequence for messenger RNA) and/or one or more barcode sequences. The one or more barcode sequences may include sequences that are the same for all nucleic acid molecules coupled to a given bead and/or sequences that differ among the nucleic acid molecules coupled to a given bead. The nucleic acid molecules can be incorporated into beads.
In some cases, the nucleic acid molecule may comprise a functional sequence (e.g., for attachment to a sequencing flow cell), such as a P5 sequence for Illumina sequencing. In some cases, the nucleic acid molecule or derivative thereof (e.g., an oligonucleotide or polynucleotide generated from the nucleic acid molecule) may comprise another functional sequence, such as a P7 sequence for attachment to a sequencing flow cell for Illumina sequencing. In some cases, the nucleic acid molecule may comprise a barcode sequence. In some cases, the primer may also comprise a Unique Molecular Identifier (UMI). In some cases, the primer may comprise an R1 primer sequence for Illumina sequencing. In some cases, the primer may comprise an R2 primer sequence for Illumina sequencing. Examples of such nucleic acid molecules (e.g., oligonucleotides, polynucleotides, etc.) and uses thereof that may be used with the compositions, devices, methods, and systems of the present disclosure are provided in U.S. patent publication nos. 2014/0378345 and 2015/0376609, each of which is incorporated herein by reference in its entirety.
Fig. 8 shows an example of a barcode-carrying bead. Nucleic acid molecules 802 (e.g., nucleic acid barcode molecules, such as oligonucleotides) can be coupled to beads 804 via releasable bonds 806 (e.g., disulfide linkers). The same bead 804 may be coupled (e.g., via a releasable bond) to one or more other nucleic acid molecules 818, 820. The nucleic acid molecule 802 may be or comprise a barcode. As described elsewhere herein, the structure of a barcode may comprise a plurality of sequential elements. The nucleic acid molecule 802 may comprise a functional sequence 808 that may be used in subsequent processing. For example, the functional sequence 808 may include a sequencer-specific flow cell attachment sequence (e.g., a P5 sequence for a sequencing system) and a sequencing primer sequence (e.g., an R1 primer for a sequencing system). The nucleic acid molecule 802 can comprise a barcode sequence 810 for barcoding a sample (e.g., DNA, RNA, protein, etc.). In some cases, the barcode sequence 810 may be bead-specific such that the barcode sequence 810 is common to all nucleic acid molecules (e.g., including the nucleic acid molecule 802) coupled to the same bead 804. Alternatively or in addition, the barcode sequence 810 may be partition-specific such that the barcode sequence 810 is common to all nucleic acid molecules coupled to one or more beads that are partitioned into the same partition. The nucleic acid molecule 802 may comprise a specific primer sequence 812, such as an mRNA-specific primer sequence (e.g., a poly-T sequence), a targeting primer sequence, and/or a random primer sequence. The nucleic acid molecule 802 may include an anchor sequence 814 to ensure hybridization of the specific primer sequence 812 at the sequence end (e.g., of an mRNA). For example, the anchor sequence 814 can comprise a random short sequence of nucleotides, such as a 1-mer, 2-mer, 3-mer, or longer sequence, which can ensure that the poly-T fragment is more likely to hybridize at the sequence end of the poly-A tail of the mRNA.
The nucleic acid molecule 802 may comprise a unique molecular recognition sequence 816 (e.g., a Unique Molecular Identifier (UMI)). In some cases, the unique molecular recognition sequence 816 can comprise about 5 to about 8 nucleotides. Alternatively, the unique molecular recognition sequence 816 may comprise fewer than about 5 or more than about 8 nucleotides. The unique molecular recognition sequence 816 can be a unique sequence that varies between individual nucleic acid molecules (e.g., 802, 818, 820, etc.) coupled to a single bead (e.g., bead 804). In some cases, the unique molecular recognition sequence 816 can be a random sequence (e.g., a random N-mer sequence). For example, the UMI may provide a unique identifier of the captured starting mRNA molecule in order to allow quantification of the amount of RNA initially expressed. It should be appreciated that although Fig. 8 shows three nucleic acid molecules 802, 818, 820 coupled to the surface of bead 804, an individual bead may be coupled to any number of individual nucleic acid molecules, e.g., from one to tens to hundreds of thousands or even millions of individual nucleic acid molecules. The respective barcodes of individual nucleic acid molecules may comprise common sequence fragments or relatively common sequence fragments (e.g., 808, 810, 812, etc.) and variable or unique sequence fragments (e.g., 816) between different individual nucleic acid molecules coupled to the same bead.
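The layout of the nucleic acid molecules described for Fig. 8 can be sketched in code. The following minimal example builds a set of molecules for one bead in which the functional sequence and barcode are shared and the UMI varies per molecule. The 5'-to-3' element order, the sequence lengths, the placeholder functional sequence, and the convention that the first anchor base is not T are all assumptions made for illustration; they are not specified by the disclosure, and the random draws here do not guarantee that every UMI is distinct.

```python
import random


def make_bead_oligos(n_molecules, functional_seq, barcode_len=16,
                     umi_len=8, poly_t_len=30, seed=0):
    """Build hypothetical barcode molecules for a single bead.

    Assumed order: functional sequence (808), barcode (810), UMI (816),
    poly-T primer (812), anchor (814).
    """
    rng = random.Random(seed)
    # One barcode shared by every molecule on the bead (bead-specific).
    barcode = "".join(rng.choice("ACGT") for _ in range(barcode_len))
    oligos = []
    for _ in range(n_molecules):
        # A random N-mer UMI drawn independently for each molecule.
        umi = "".join(rng.choice("ACGT") for _ in range(umi_len))
        # A 2-mer anchor; first base is non-T (an assumed convention).
        anchor = rng.choice("ACG") + rng.choice("ACGT")
        oligos.append(functional_seq + barcode + umi + "T" * poly_t_len + anchor)
    return barcode, oligos


# "ACGTACGT" is a placeholder, not a real flow cell or primer sequence.
barcode, oligos = make_bead_oligos(n_molecules=3, functional_seq="ACGTACGT")
for oligo in oligos:
    print(oligo)
```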
In operation, biological particles (e.g., cells, DNA, RNA, etc.) can be co-partitioned along with the barcoded beads 804. The barcoded nucleic acid molecules 802, 818, 820 may be released from the beads 804 in the partition. For example, in the context of analyzing sample RNA, a poly-T fragment (e.g., 812) of one of the released nucleic acid molecules (e.g., 802) can hybridize to the poly-A tail of an mRNA molecule. Reverse transcription can produce a cDNA transcript of the mRNA, where the transcript includes each of the sequence fragments 808, 810, 816 of the nucleic acid molecule 802. Since the nucleic acid molecule 802 comprises the anchor sequence 814, it is more likely to hybridize to the sequence end of the poly-A tail of the mRNA and initiate reverse transcription there. Within any given partition, all cDNA transcripts of individual mRNA molecules may contain one common barcode sequence fragment 810. However, transcripts made from different mRNA molecules within a given partition may vary at the unique molecular recognition sequence 816 fragment (e.g., UMI fragment). Advantageously, even after any subsequent amplification of the contents of a given partition, the number of different UMIs may be indicative of the amount of mRNA originating from the given partition, and thus the amount of mRNA originating from a biological particle (e.g., a cell). As described above, transcripts can be amplified, purified, and sequenced to identify the sequence of the cDNA transcripts of the mRNA, as well as to sequence the barcode and UMI fragments. Although a poly-T primer sequence is described, other targeting or random primer sequences may be used to initiate a reverse transcription reaction. Also, while described as releasing barcoded oligonucleotides into a partition, in some cases, nucleic acid molecules bound to beads (e.g., gel beads) can be used to hybridize and capture mRNA on the bead solid phase, e.g., to facilitate separation of RNA from other cell contents.
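The quantification principle described above, in which the number of distinct UMIs rather than the number of reads reflects the number of starting mRNA molecules, can be sketched as follows. This is a minimal illustration that assumes reads have already been parsed into (barcode, UMI, gene) tuples; the function name, the tuple layout, and the example reads are hypothetical, and no sequencing-error correction is attempted.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per (barcode, gene).

    reads: iterable of (barcode, umi, gene) tuples.
    Returns {barcode: {gene: number_of_distinct_umis}}.
    """
    umis_seen = defaultdict(set)
    for barcode, umi, gene in reads:
        umis_seen[(barcode, gene)].add(umi)

    counts = defaultdict(dict)
    for (barcode, gene), umis in umis_seen.items():
        counts[barcode][gene] = len(umis)
    return dict(counts)


# Hypothetical reads; the second read is an amplification duplicate of the first.
reads = [
    ("AAAC", "GATTACA", "GENE1"),
    ("AAAC", "GATTACA", "GENE1"),
    ("AAAC", "CCTTGGA", "GENE1"),
    ("AAAC", "TTGACCA", "GENE2"),
    ("GGGT", "GATTACA", "GENE1"),
]

counts = count_molecules(reads)
print(counts)
```

Because duplicates created by amplification carry the same barcode and UMI, they collapse into a single count, whereas the same UMI observed under a different barcode is counted separately.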
In some cases, precursors containing functional groups that are reactive or that can be activated to render them reactive can be polymerized with other precursors to produce gel beads containing activated or activatable functional groups. These functional groups can then be used to attach additional species (e.g., disulfide linkers, primers, other oligonucleotides, etc.) to the gel beads. For example, some precursors comprising carboxylic acid (COOH) groups may be copolymerized with other precursors to form gel beads that also comprise COOH functional groups. In some cases, acrylic acid (a species containing free COOH groups), acrylamide, and bis(acryloyl)cystamine may be copolymerized together to form gel beads containing free COOH groups. The COOH groups of the gel beads may be activated (e.g., via 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC) and N-hydroxysuccinimide (NHS) or 4-(4,6-dimethoxy-1,3,5-triazin-2-yl)-4-methylmorpholine hydrochloride (DMTMM)) so that they are reactive (e.g., reactive to amine functional groups where EDC/NHS or DMTMM is used for activation). The activated COOH groups can then be reacted with an appropriate species comprising the moiety to be attached to the bead (e.g., a species comprising an amine functional group where the carboxylic acid groups are activated to be reactive with amine functional groups).
Beads containing disulfide bonds in their polymer network can be functionalized with additional species by reducing some of the disulfide bonds to free thiol groups. Disulfide bonds can be reduced by, for example, the action of a reducing agent (e.g., DTT, TCEP, etc.) to form free thiol groups without dissolving the beads. The free thiol groups of the beads may then react with a free thiol group of the species or with a species comprising another disulfide bond (e.g., by thiol-disulfide exchange) such that the species may be attached to the bead (e.g., by the disulfide bond generated). In some cases, the free thiol groups of the beads may react with any other suitable group. For example, the free thiol groups of the beads may react with species comprising acrydite moieties. The free thiol groups of the beads can be reacted with acrydite by Michael addition chemistry such that species comprising acrydite are attached to the beads. In some cases, uncontrolled reactions can be prevented by adding thiol capping agents such as N-ethylmaleimide or iodoacetate.
The activation of disulfide bonds within the beads can be controlled such that only a small number of disulfide bonds are activated. Control may be exercised, for example, by controlling the concentration of the reducing agent used to generate free thiol groups and/or the concentration of the reagents used to form disulfide bonds in bead polymerization. In some cases, a low concentration of reducing agent (e.g., a molecular ratio of reducing agent:gel beads of less than or equal to about 1:100,000,000,000, less than or equal to about 1:10,000,000,000, less than or equal to about 1:1,000,000,000, less than or equal to about 1:100,000,000, less than or equal to about 1:10,000,000, less than or equal to about 1:1,000,000, less than or equal to about 1:100,000, less than or equal to about 1:10,000) may be used for the reduction. Controlling the number of disulfide bonds reduced to free thiol groups may be useful to ensure bead structural integrity during functionalization. In some cases, an optically active agent such as a fluorescent dye may be coupled to the beads via free thiol groups of the beads and used to quantify the number of free thiol groups present in the beads and/or to track the beads.
In some cases, it may be advantageous to add moieties to the gel beads after they are formed. For example, the addition of oligonucleotides (e.g., barcoded oligonucleotides) after gel bead formation can avoid loss of the species during chain transfer termination that may occur during polymerization. In addition, smaller precursors (e.g., monomers or crosslinkers that do not contain side chain groups and attached moieties) can be used for polymerization and can be minimally hindered from growing chain ends due to viscous effects. In some cases, functionalization after gel bead synthesis can minimize exposure of the species to be loaded (e.g., oligonucleotides) to potentially damaging agents (e.g., free radicals) and/or chemical environments. In some cases, the resulting gel may have an upper critical solution temperature (UCST) that may allow temperature-driven swelling and collapse of the beads. Such functionality may aid in the infiltration of oligonucleotides (e.g., primers) into the beads during subsequent functionalization of the beads with the oligonucleotides. Post-production functionalization can also be used to control the loading ratio of species in the beads so that, for example, variability in loading ratio is minimized. The loading of the species may also be performed in a batch process, such that multiple beads may be functionalized with the species in a single batch.
Beads injected or otherwise introduced into the partition may contain releasably, cleavably, or reversibly attached barcodes. Beads injected or otherwise introduced into the partition may contain activatable barcodes. The beads injected or otherwise introduced into the partition may be degradable, destructible, or dissolvable beads.
The barcode may be releasably, cleavably, or reversibly attached to the bead such that the barcode may be released or releasable by cleavage of the bond between the barcode molecule and the bead, or by degradation of the underlying bead itself, allowing the barcode to be accessed or accessible by other reagents, or both. In non-limiting examples, cleavage may be achieved by reducing disulfide bonds, using restriction enzymes, photoactivated cleavage, or cleavage and/or reaction via other types of stimuli (e.g., chemical, thermal, pH, enzymatic, etc.), as described elsewhere herein. Releasable barcodes may sometimes be referred to as activatable because they are available for reaction once released. Thus, for example, an activatable barcode may be activated by releasing the barcode from the bead (or other suitable type of partition as described herein). Other activatable configurations are also contemplated in the context of the described methods and systems.
In addition to or instead of cleavable linkages between the bead and associated molecules, such as barcode-containing nucleic acid molecules (e.g., barcoded oligonucleotides), the bead may be degradable, destructible, or dissolvable, either spontaneously or upon exposure to one or more stimuli (e.g., temperature change, pH change, exposure to specific chemical species or chemical phases, exposure to light, reducing agents, etc.). In some cases, the beads may be dissolvable such that the material component of the beads dissolves when exposed to a particular chemical species or environmental change (such as a temperature change or pH change). In some cases, the gel beads may degrade or dissolve under elevated temperature and/or alkaline conditions. In some cases, the beads may be thermally degradable such that when the beads are exposed to an appropriate temperature change (e.g., heat), the beads degrade. Degradation or dissolution of the beads bound to the species (e.g., nucleic acid molecules, such as barcoded oligonucleotides) can result in release of the species from the beads.
It will be appreciated from the above disclosure that degradation of the beads may refer to dissociation of bound or entrained species from the beads, with and without concomitant structural degradation of the physical beads themselves. For example, the degradation of the beads may involve cleavage of cleavable bonds via one or more of the species and/or methods described elsewhere herein. In another example, the entrained species may be released from the beads by, for example, osmotic pressure differences due to chemical environmental changes. For example, changes in bead pore size due to osmotic pressure differences may typically occur without structural degradation of the beads themselves. In some cases, an increase in pore size due to osmotic swelling of the beads may allow release of the species entrained within the beads. In other cases, the osmotic shrinkage of the beads may allow the beads to better retain entrained species due to the reduced pore size.
Degradable beads can be introduced into a partition (such as a droplet of an emulsion or a well) such that when an appropriate stimulus is applied, the beads degrade within the partition and any associated species (e.g., oligonucleotides) are released into the partition. The free species (e.g., oligonucleotides, nucleic acid molecules) may interact with other reagents contained in the partition. For example, polyacrylamide beads containing cystamine and linked to a barcode sequence via disulfide bonds can be combined with a reducing agent within droplets of a water-in-oil emulsion. Within the droplet, the reducing agent can break the various disulfide bonds, resulting in bead degradation and release of the barcode sequence into the aqueous internal environment of the droplet. In another example, heating a droplet containing bead-bound barcode sequences in an alkaline solution can also result in bead degradation and release of the attached barcode sequences into the aqueous internal environment of the droplet.
Any suitable number of molecular tag molecules (e.g., primers, barcoded oligonucleotides) may be associated with the beads such that upon release from the beads, the molecular tag molecules (e.g., primers, e.g., barcoded oligonucleotides) are present in the partitions at a predefined concentration. Such predefined concentrations may be selected to facilitate certain reactions, such as amplification, for generating sequencing libraries within a partition. In some cases, the predefined concentration of the primer may be limited by the process of producing beads with nucleic acid molecules (e.g., oligonucleotides).
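The relationship between the number of molecules carried by a bead and the concentration reached in a partition upon release follows from simple arithmetic: the number of molecules divided by Avogadro's number gives moles, and dividing by the partition volume gives molarity. The sketch below illustrates this under assumed values; the function name, the figure of one million molecules per bead, and the 1 nanoliter partition volume are hypothetical and chosen only for illustration.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def released_concentration_molar(molecules_per_bead, partition_volume_liters,
                                 beads_per_partition=1):
    """Molar concentration if every molecule is released into the partition."""
    moles = molecules_per_bead * beads_per_partition / AVOGADRO
    return moles / partition_volume_liters


# Hypothetical example: 1,000,000 oligonucleotides on one bead, 1 nL partition.
concentration_m = released_concentration_molar(1e6, 1e-9)
concentration_nm = concentration_m * 1e9  # convert mol/L to nmol/L
print(f"{concentration_nm:.2f} nM")
```

Inverting the same relationship gives the number of molecules per bead needed to reach a chosen predefined concentration in a partition of known volume.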
In some cases, the beads may be non-covalently loaded with one or more reagents. The beads may be non-covalently loaded by, for example: subjecting the beads to conditions sufficient to swell the beads, allowing sufficient time for the reagent to diffuse into the interior of the beads, and subjecting the beads to conditions sufficient to deswell the beads. Swelling of the beads may be accomplished, for example, by: placing the beads in a thermodynamically favorable solvent, subjecting the beads to a higher or lower temperature, subjecting the beads to a higher or lower ion concentration, and/or subjecting the beads to an electric field. Swelling of the beads can be accomplished by various swelling methods. Deswelling of the beads can be accomplished, for example, by: transferring the beads to a thermodynamically unfavorable solvent, subjecting the beads to a lower or higher temperature, subjecting the beads to a lower or higher ion concentration, and/or removing the electric field. Deswelling of the beads can be accomplished by various deswelling methods. Transferring the beads may cause the pores in the beads to shrink. The shrinkage may then hinder the reagent within the beads from diffusing out of the interior of the beads. This hindrance may be due to steric interactions between the reagent and the interior of the beads. The transfer may be accomplished by microfluidics. For example, the transfer may be accomplished by moving the beads from one co-flowing solvent stream to a different co-flowing solvent stream. The swellability and/or pore size of the beads can be adjusted by changing the polymer composition of the beads.
In some cases, the acrydite moiety attached to the precursor, another species attached to the precursor, or the precursor itself may contain labile bonds, such as chemically, thermally, or photo-sensitive bonds, e.g., disulfide bonds, UV-sensitive bonds, and the like. Once the acrydite moiety or other moiety comprising a labile bond is incorporated into the bead, the bead may also comprise the labile bond. Labile bonds can be useful, for example, in reversibly linking (covalently linking) species (e.g., barcodes, primers, etc.) to beads. In some cases, the thermally labile bond can include a linkage based on nucleic acid hybridization (e.g., where the oligonucleotide hybridizes to a complementary sequence attached to the bead) such that thermal melting of the hybrid releases the oligonucleotide, e.g., a barcode-containing sequence, from the bead or microcapsule.
Adding multiple types of labile bonds to the gel beads can enable the generation of beads that are capable of responding to different stimuli. Each type of labile bond may be sensitive to an associated stimulus (e.g., chemical stimulus, light, temperature, enzymatic, etc.), such that release of the species attached to the bead via each labile bond may be controlled by application of the appropriate stimulus. Such functionality may be useful for the controlled release of species from gel beads. In some cases, another species comprising a labile bond may be attached to the gel bead after the gel bead is formed via an activated functional group of the gel bead, e.g., as described above. It is understood that barcodes releasably, cleavably, or reversibly attached to the beads described herein include barcodes that are released or releasable by cleavage of the bond between the barcode molecule and the bead, or barcodes that are released by degradation of the underlying bead itself, allowing the barcodes to be accessed or accessible by other reagents, or both.
Releasable barcodes as described herein may sometimes be referred to as activatable in that they are available for reaction once released. Thus, for example, an activatable barcode may be activated by releasing the barcode from the bead (or other suitable type of partition as described herein). Other activatable configurations are also contemplated in the context of the described methods and systems.
In addition to thermally cleavable bonds, disulfide bonds, and UV-sensitive bonds, other non-limiting examples of labile bonds that can be coupled to a precursor or bead include ester bonds (e.g., cleavable with an acid, base, or hydroxylamine), vicinal diol bonds (e.g., cleavable via sodium periodate), Diels-Alder linkages (e.g., cleavable via heat), sulfone bonds (e.g., cleavable via a base), silyl ether bonds (e.g., cleavable via an acid), glycosidic bonds (e.g., cleavable via an amylase), peptide bonds (e.g., cleavable via a protease), or phosphodiester bonds (e.g., cleavable via a nuclease (e.g., DNase)). The bond may be cleaved via other nucleic acid molecule targeting enzymes such as restriction enzymes (e.g., restriction endonucleases), as described further below.
The species may be encapsulated in the beads during bead formation (e.g., during precursor polymerization). Such species may or may not participate in the polymerization. Such species may be incorporated into the polymerization reaction mixture such that the beads produced upon bead formation comprise the species. In some cases, such species may be added to the gel beads after they are formed. Such species may include, for example, nucleic acid molecules (e.g., oligonucleotides), reagents for nucleic acid amplification reactions (e.g., primers, polymerases, dNTPs, cofactors (e.g., ionic cofactors), buffers) (including those described herein), reagents for enzymatic reactions (e.g., enzymes, cofactors, substrates, buffers), reagents for nucleic acid modification reactions (such as polymerization, ligation, or digestion), and/or reagents for template preparation (e.g., tagmentation) for one or more sequencing platforms. Such species may include one or more enzymes described herein, including but not limited to polymerases, reverse transcriptases, restriction enzymes (e.g., endonucleases), transposases, ligases, proteinase K, DNase, and the like. Such species may include one or more agents (e.g., lysing agents, inhibitors, inactivating agents, chelating agents, stimulating agents) described elsewhere herein. The capture of such species may be controlled by the density of the polymer network generated during the precursor polymerization, the control of the ionic charge within the gel beads (e.g., via ionic species attached to the polymeric species), or by the release of other species. The encapsulated species may be released from the beads upon degradation of the beads and/or by application of a stimulus that enables release of the species from the beads. Alternatively or in addition, the species may be partitioned in the partition (e.g., droplet) during or after partition formation. Such species may include, but are not limited to, the above-described species that may also be encapsulated in beads.
The degradable beads may contain one or more species with labile bonds such that when the beads/species are exposed to an appropriate stimulus, the bonds are broken and the beads degrade. The labile bond may be a chemical bond (e.g., covalent bond, ionic bond), or may be another type of physical interaction (e.g., van der Waals interactions, dipole-dipole interactions, etc.). In some cases, the cross-linking agent used to generate the beads may contain labile bonds. Upon exposure to appropriate conditions, the labile bonds may be broken and the beads degraded. For example, when polyacrylamide gel beads containing a cystamine crosslinker are exposed to a reducing agent, the disulfide bonds of cystamine can be broken and the beads degraded.
Degradable beads can be used to release linked species (e.g., nucleic acid molecules, barcode sequences, primers, etc.) from the beads more quickly than non-degradable beads when appropriate stimuli are applied to the beads. For example, for a species bound to the inner surface of a porous bead or in the case of an encapsulated species, the species may have higher mobility and accessibility to other species in solution as the bead degrades. In some cases, the species may also be attached to the degradable beads through degradable linkers (e.g., disulfide linkers). The degradable linker may be responsive to the same stimulus as the degradable bead, or the two degradable species may be responsive to different stimuli. For example, the barcode sequence may be attached to a polyacrylamide bead comprising cystamine via disulfide bonds. Upon exposure of the barcoded beads to the reducing agent, the beads degrade and the barcode sequence is released upon cleavage of disulfide bonds between the barcode sequence and the beads and disulfide bonds of cystamine in the beads.
It will be appreciated from the above disclosure that, although referred to as degradation of the beads, in many of the cases mentioned above this degradation may refer to dissociation of bound or entrained species from the beads, with or without concomitant structural degradation of the physical beads themselves. For example, entrained species may be released from the beads through osmotic pressure differences due to, for example, changes in the chemical environment. Changes in bead pore size due to osmotic pressure differences may typically occur without structural degradation of the beads themselves. In some cases, an increase in pore size due to osmotic swelling of the beads may allow release of the species entrained within the beads. In other cases, osmotic shrinkage of the beads may allow the beads to better retain entrained species due to the reduced pore size.
Where degradable beads are provided, it may be advantageous to avoid exposing such beads to the stimulus or stimuli that cause such degradation before a given time, for example, to avoid premature degradation of the beads and problems caused by such degradation, including, for example, poor flow characteristics and aggregation. For example, where the beads contain reducible crosslinking groups such as disulfide groups, it would be desirable to avoid contacting such beads with a reducing agent (e.g., DTT or other disulfide cleavage reagent). In such cases, treatment of the beads described herein will in some cases be provided in the absence of a reducing agent (such as DTT). Since reducing agents are typically provided in commercial enzyme formulations, it may be desirable to provide an enzyme formulation that is free of reducing agents (or free of DTT) when handling the beads described herein. Examples of such enzyme formulations include, for example, polymerase preparations, reverse transcriptase preparations, ligase preparations, and many others that may be used to treat the beads described herein. The term "reducing agent-free" or "DTT-free" formulation may refer to a formulation containing less than about 1/10, less than about 1/50, or even less than about 1/100 of the lower limit of the range of such materials used in degrading the beads. For example, for DTT, the formulation without reducing agent may have less than about 0.01 millimolar (mM), 0.005mM, 0.001mM, 0.0005mM, or even less than about 0.0001mM DTT. In many cases, the amount of DTT may not be detectable.
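The fractional definition of a "reducing agent-free" formulation can be illustrated with a short calculation. The following is a minimal sketch only; the function names are hypothetical, and the 0.1 mM lower limit used in the example is an assumed input chosen for illustration rather than a prescribed value.

```python
# Illustrative sketch: the lower limit passed in is an assumed input,
# not a value prescribed for any particular bead chemistry.
FRACTIONS = {"1/10": 1 / 10, "1/50": 1 / 50, "1/100": 1 / 100}


def reducing_agent_free_thresholds(lower_limit_mm):
    """Return the ceiling (in mM) implied by each fraction of the lower limit."""
    if lower_limit_mm <= 0:
        raise ValueError("lower_limit_mm must be positive")
    return {label: lower_limit_mm * frac for label, frac in FRACTIONS.items()}


def is_reducing_agent_free(measured_mm, lower_limit_mm, fraction=1 / 10):
    """True if the measured concentration is below the chosen fraction of the lower limit."""
    return measured_mm < lower_limit_mm * fraction


if __name__ == "__main__":
    # Assumed example: beads degraded with at least about 0.1 mM reducing agent.
    for label, ceiling in reducing_agent_free_thresholds(0.1).items():
        print(f"{label} of 0.1 mM -> below about {ceiling:.4f} mM")
```

With an assumed lower limit of 0.1 mM, the 1/10 and 1/100 fractions correspond to ceilings of about 0.01 mM and 0.001 mM, which fall within the example DTT values listed above.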
A number of chemical triggers can be used to trigger the degradation of the beads. Examples of such chemical changes may include, but are not limited to, pH-mediated changes in the integrity of the components within the beads, degradation of the bead components via cleavage of cross-links, and depolymerization of the bead components.
In some embodiments, the beads may be formed from a material comprising a degradable chemical cross-linking agent (such as BAC or cystamine). Degradation of such degradable crosslinkers can be achieved by a variety of mechanisms. In some examples, the beads may be contacted with a chemical degradation agent that induces oxidation, reduction, or other chemical change. For example, the chemical degradation agent may be a reducing agent, such as Dithiothreitol (DTT). Additional examples of reducing agents may include beta-mercaptoethanol, (2S) -2-amino-1, 4-dimercaptobutane (dithiobutylamine or DTBA), tris (2-carboxyethyl) phosphine (TCEP), or combinations thereof. The reducing agent may degrade the disulfide bonds formed between the gel precursors forming the beads, thus degrading the beads. In other cases, a change in the pH of the solution (such as an increase in pH) may trigger degradation of the beads. In other cases, exposure to an aqueous solution (such as water) may trigger hydrolytic degradation, thus degrading the beads. In some cases, any combination of stimuli may trigger degradation of the beads. For example, a change in pH may enable a chemical agent (e.g., DTT) to be an effective reducing agent.
The beads may also be induced to release their contents when a thermal stimulus is applied. The change in temperature can cause a variety of changes in the beads. For example, heat may cause the solid beads to liquefy. The change in heat may cause melting of the beads, degrading a portion of the beads. In other cases, the heat may increase the internal pressure of the bead component, causing the bead to rupture or explode. Heat may also be applied to the heat sensitive polymer used as a material for constructing the beads.
Any suitable agent can degrade the beads. In some embodiments, a change in temperature or pH can be used to degrade heat-sensitive or pH-sensitive bonds within the beads. In some embodiments, chemical degradation agents may be used to degrade chemical bonds within the beads by oxidation, reduction, or other chemical changes. For example, the chemical degradation agent may be a reducing agent, such as DTT, wherein the DTT may degrade disulfide bonds formed between the crosslinking agent and the gel precursor, thereby degrading the beads. In some embodiments, a reducing agent may be added to degrade the beads, which may or may not cause the beads to release their contents. Examples of reducing agents may include Dithiothreitol (DTT), beta-mercaptoethanol, (2S) -2-amino-1, 4-dimercaptobutane (dithiobutylamine or DTBA), tris (2-carboxyethyl) phosphine (TCEP), or combinations thereof. The reducing agent may be present at a concentration of about 0.1mM, 0.5mM, 1mM, 5mM, or 10mM. The reducing agent may be present at a concentration of at least about 0.1mM, 0.5mM, 1mM, 5mM, 10mM, or greater than 10mM. The reducing agent may be present at a concentration of up to about 10mM, 5mM, 1mM, 0.5mM, 0.1mM, or less.
Any suitable number of molecular tag molecules (e.g., primers, barcoded oligonucleotides) may be associated with the beads such that upon release from the beads, the molecular tag molecules (e.g., primers, e.g., barcoded oligonucleotides) are present in the partitions at a predefined concentration. Such predefined concentrations may be selected to facilitate certain reactions, such as amplification, for generating sequencing libraries within a partition. In some cases, the predefined concentration of the primer may be limited by the process of generating the oligonucleotide-bearing bead.
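The relationship between the number of molecules carried by a bead and the resulting concentration in a partition can be sketched as follows. This is an illustrative calculation only: the function name is hypothetical, and the example inputs (10,000,000 molecules released into a 1000 pL partition) are assumptions chosen from within the ranges given in this disclosure, not preferred values.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)


def released_concentration_nm(molecules_per_bead, partition_volume_pl,
                              beads_per_partition=1):
    """Concentration (nanomolar) after all bead-bound molecules are released.

    Assumes complete release and uniform mixing within the partition,
    which is an idealization.
    """
    if partition_volume_pl <= 0:
        raise ValueError("partition_volume_pl must be positive")
    volume_l = partition_volume_pl * 1e-12          # picoliters -> liters
    moles = molecules_per_bead * beads_per_partition / AVOGADRO
    return moles / volume_l * 1e9                   # mol/L -> nmol/L


if __name__ == "__main__":
    # Assumed example values for illustration only.
    conc = released_concentration_nm(10_000_000, 1000)
    print(f"about {conc:.1f} nM")
```

Under these assumptions the released molecules reach a concentration on the order of tens of nanomolar; a smaller partition or a more heavily loaded bead raises the concentration proportionally.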
Although fig. 1 and 2 have been described above in terms of providing substantially singly occupied partitions, in some instances it may be desirable to provide multiply occupied partitions, e.g., containing two, three, four or more cells and/or microcapsules (e.g., beads) comprising barcoded nucleic acid molecules (e.g., oligonucleotides) within a single partition. Thus, as described above, the flow characteristics of the fluid containing the biological particles and/or beads and of the partitioning fluid can be controlled to provide such multiply occupied partitions. In particular, the flow parameters may be controlled to provide a given occupancy rate of greater than about 50% of the partitions, greater than about 75%, and in some cases greater than about 80%, 90%, 95%, or higher.
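The effect of loading on occupancy can be explored with a simple model. The sketch below assumes, purely for illustration, that particles arrive independently so that the number of particles per partition follows a Poisson distribution; this assumption is not made in the text above, real systems may deviate from it, and the function names are hypothetical.

```python
import math


def fraction_occupied(mean_per_partition):
    """Fraction of partitions holding at least one particle (Poisson model)."""
    return 1.0 - math.exp(-mean_per_partition)


def fraction_with_exactly(k, mean_per_partition):
    """Fraction of partitions holding exactly k particles (Poisson model)."""
    return (math.exp(-mean_per_partition) * mean_per_partition ** k
            / math.factorial(k))


def mean_for_occupancy(target_fraction):
    """Mean particles per partition needed to reach a target occupancy."""
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be strictly between 0 and 1")
    return -math.log(1.0 - target_fraction)


if __name__ == "__main__":
    for target in (0.50, 0.75, 0.90, 0.95):
        lam = mean_for_occupancy(target)
        multi = 1.0 - fraction_with_exactly(0, lam) - fraction_with_exactly(1, lam)
        print(f"occupancy {target:.0%}: mean {lam:.2f}, multiply occupied {multi:.1%}")
```

Under this model, raising the occupancy rate also raises the share of multiply occupied partitions, which is the trade-off that control of the flow parameters is meant to manage.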
In some cases, additional microcapsules (e.g., beads) may be used to deliver additional agents to the partition. In such cases, it may be advantageous to introduce different beads from different bead sources (e.g., containing different associated reagents) into a common channel or droplet generation junction (e.g., junction 210) through different channel inlets into such a common channel or droplet generation junction. In such cases, the flow and frequency of different beads into the channel or junction can be controlled to provide a specific ratio of microcapsules from each source, while ensuring that a given pairing and combination of such beads enter a partition along with a given number of biological particles (e.g., one biological particle and one bead per partition).
The partitions described herein may have a small volume, for example, less than about 10 microliters (μl), 5 μl, 1 μl, 900 picoliters (pL), 800pL, 700pL, 600pL, 500pL, 400pL, 300pL, 200pL, 100pL, 50pL, 20pL, 10pL, 1pL, 500 nanoliters (nL), 100nL, 50nL, or less.
For example, in the case of drop-based partitioning, the total volume of the drop may be less than about 1000pL, 900pL, 800pL, 700pL, 600pL, 500pL, 400pL, 300pL, 200pL, 100pL, 50pL, 20pL, 10pL, 1pL or less. In the case of co-partitioning with microcapsules (e.g., beads), it is to be understood that the sample fluid volume within the partition (e.g., including co-partitioned biological particles and/or beads) can be less than about 90% of the above-described volume, less than about 80%, less than about 70%, less than about 60%, less than about 50%, less than about 40%, less than about 30%, less than about 20%, or less than about 10%.
As described elsewhere herein, partitioning the species may generate a population or plurality of partitions. In such cases, any suitable number of partitions may be generated or otherwise provided. For example, at least about 1,000 partitions, at least about 5,000 partitions, at least about 10,000 partitions, at least about 50,000 partitions, at least about 100,000 partitions, at least about 500,000 partitions, at least about 1,000,000 partitions, at least about 5,000,000 partitions, at least about 10,000,000 partitions, at least about 50,000,000 partitions, at least about 100,000,000 partitions, at least about 500,000,000 partitions, at least about 1,000,000,000 partitions, or more partitions may be generated or otherwise provided. Further, the plurality of partitions may include unoccupied partitions (e.g., empty partitions) and occupied partitions.
Reagent(s)
According to certain aspects, the biological particles may be partitioned along with the lysing agent to release the contents of the biological particles within the partition. In such cases, the lysing agent may be contacted with the biological particle suspension at the same time as or immediately prior to introduction of the biological particles into the separation junction/droplet generation zone (e.g., junction 210), such as through one or more additional channels upstream of the channel junction. According to other aspects, additionally or alternatively, the biological particles may be separated along with other reagents, as will be described further below.
Fig. 3 shows an example of a microfluidic channel structure 300 for co-separating biological particles and reagents. Channel structure 300 may include channel segments 301, 302, 304, 306, and 308. The channel segments 301 and 302 communicate at a first channel connection 309. The channel segments 302, 304, 306, and 308 communicate at a second channel connection 310.
In an example operation, the channel segment 301 may transport an aqueous fluid 312 including a plurality of biological particles 314 into the second connection 310 along the channel segment 301. Alternatively or additionally, the channel segment 301 may transport beads (e.g., gel beads). The beads may comprise barcode molecules.
For example, the channel segment 301 may be connected to a reservoir of an aqueous suspension comprising biological particles 314. Upstream of and immediately before reaching the second connection 310, the channel segment 301 may meet the channel segment 302 at a first connection 309. The channel segment 302 may transport a plurality of reagents 315 (e.g., lysing agents) suspended in an aqueous fluid 312 along the channel segment 302 into the first connection 309. For example, channel segment 302 may be connected to a reservoir containing reagent 315. After the first connection 309, the aqueous fluid 312 in the channel segment 301 may bring both the biological particles 314 and the reagent 315 to the second connection 310. In some cases, the aqueous fluid 312 in the channel segment 301 may include one or more reagents, which may be the same or different reagents than the reagent 315. A second fluid 316 (e.g., oil) that is immiscible with the aqueous fluid 312 may be delivered from each of the channel segments 304 and 306 to the second connection 310. As the aqueous fluid 312 from the channel segment 301 and the second fluid 316 from each of the channel segments 304 and 306 meet at the second channel connection 310, the aqueous fluid 312 may separate into discrete droplets 318 in the second fluid 316 and flow along the channel segment 308 away from the second connection 310. The channel segment 308 may deliver the discrete droplets 318 to an outlet reservoir fluidly coupled to the channel segment 308, where the discrete droplets may be harvested.
The second fluid 316 may comprise an oil, such as a fluorinated oil, that includes a fluorosurfactant for stabilizing the resulting droplets (e.g., inhibiting subsequent coalescence of the resulting droplets 318).
The discrete droplets generated may include individual biological particles 314 and/or one or more reagents 315. In some cases, the discrete droplets generated may include beads (not shown) carrying a barcode, such as via other microfluidic structures described elsewhere herein. In some cases, the discrete droplets may be unoccupied (e.g., free of reagents, free of biological particles).
Advantageously, when the lysing agent and the biological particles are co-partitioned, the lysing agent may facilitate release of the contents of the biological particles within the partition. The contents released in a partition may remain discrete from the contents of other partitions.
It should be appreciated that the channel segments described herein may be coupled to any of a variety of different fluid sources or receiving components, including reservoirs, pipes, manifolds, or other system fluid components. It should be understood that the microfluidic channel structure 300 may have other geometries. For example, a microfluidic channel structure may have more than two channel connections. For example, a microfluidic channel structure may have 2, 3, 4, 5 or more channel segments each carrying the same or different types of beads, reagents and/or biological particles, which meet at a channel junction. The fluid flow in each channel segment can be controlled to control the separation of different elements into droplets. Fluid may be directed to flow along one or more channels or reservoirs via one or more fluid flow units. The fluid flow unit may include a compressor (e.g., providing positive pressure), a pump (e.g., providing negative pressure), an actuator, etc., to control the flow of fluid. The fluid may also or alternatively be controlled via an applied pressure differential, centrifugal force, electric pumping, vacuum, capillary or gravity flow, or the like.
Examples of lysing agents include bioactive agents, such as lysis enzymes for lysing different cell types (e.g., gram positive or negative bacteria, plants, yeast, mammals, etc.), such as lysozyme, achromopeptidase, lysostaphin, labiase, kitalase, lyticase, and a variety of other lysis enzymes available from, for example, Sigma-Aldrich, Inc. (St. Louis, MO), as well as other commercially available lysis enzymes. Other lysing agents may additionally or alternatively be co-partitioned with the biological particles to cause the contents of the biological particles to be released into the partition. For example, in some cases, cells may be lysed using surfactant-based lysis solutions, but these solutions may be less desirable for emulsion-based systems, where surfactants may interfere with stable emulsions. In some cases, the lysis solution may contain nonionic surfactants, such as Triton X-100 and Tween 20. In some cases, the lysis solution may contain ionic surfactants, such as sarcosyl and sodium dodecyl sulfate (SDS). Electroporation, thermal, acoustic or mechanical cell disruption may also be used in certain situations, for example in non-emulsion based partitioning, such as encapsulation of biological particles, which may be in addition to or instead of droplet partitioning, wherein any pore size of the encapsulate is sufficiently small to retain a nucleic acid fragment of a given size after cell disruption.
Alternatively or in addition to the lysis agents co-partitioned with the biological particles described above, other reagents may also be co-partitioned with the biological particles, including, for example, DNase and RNase inactivating agents or inhibitors, such as proteinase K, chelating agents such as EDTA, and other reagents for removing or otherwise reducing the negative activity or impact of different cell lysate components on subsequent nucleic acid processing. In addition, in the case of encapsulated biological particles, the biological particles may be exposed to an appropriate stimulus to release the biological particles or their contents from the co-partitioned microcapsules. For example, in some cases, a chemical stimulus may be co-partitioned with the encapsulated biological particles to allow for microcapsule degradation and release of the cells or their contents into the larger partition. In some cases, the stimulus may be the same as the stimulus described elsewhere herein for releasing nucleic acid molecules (e.g., oligonucleotides) from their respective microcapsules (e.g., beads). In alternative aspects, this may be a different and non-overlapping stimulus, so as to allow the encapsulated biological particles to be released into the partition at a different time than the nucleic acid molecules are released into the same partition.
Additional reagents can also be co-partitioned with the biological particles, such as endonucleases to fragment DNA of the biological particles, and DNA polymerase and dNTPs used to amplify nucleic acid fragments of the biological particles and to attach barcode molecular tags to the amplified fragments. Other enzymes may be co-partitioned, including, but not limited to, polymerases, transposases, ligases, proteinase K, DNases, and the like. Additional reagents may also include reverse transcriptases (including enzymes having terminal transferase activity), primers and oligonucleotides, and switch oligonucleotides (also referred to herein as "switch oligos" or "template switching oligonucleotides") that may be used for template switching. In some cases, template switching may be used to increase the length of the cDNA. In some cases, template switching may be used to append a predefined nucleic acid sequence to the cDNA. In an example of template switching, the cDNA may be generated from reverse transcription of a template (e.g., cellular mRNA), where a reverse transcriptase having terminal transferase activity may add additional nucleotides, such as poly-C, to the cDNA in a template-independent manner. The switch oligonucleotide may comprise a sequence complementary to the additional nucleotides, such as poly-G. The additional nucleotides on the cDNA (e.g., poly-C) may hybridize to the additional nucleotides on the switch oligonucleotide (e.g., poly-G), whereby the reverse transcriptase may use the switch oligonucleotide as a template to further extend the cDNA. The template switching oligonucleotide may comprise a hybridization region and a template region. The hybridization region may comprise any sequence capable of hybridizing to a target. In some cases, as previously described, the hybridization region comprises a series of G bases to complement the overhanging C bases at the 3' end of the cDNA molecule. 
The series of G bases can include 1 G base, 2 G bases, 3 G bases, 4 G bases, 5 G bases, or more than 5 G bases. The template sequence may comprise any sequence to be incorporated into the cDNA. In some cases, the template region comprises at least 1 (e.g., at least 2, 3, 4, 5, or more) tag sequences and/or functional sequences. The switch oligonucleotide may comprise deoxyribonucleic acid; ribonucleic acid; modified nucleic acids, including 2-aminopurine, 2,6-diaminopurine (2-amino-dA), inverted dT, 5-methyl dC, 2'-deoxyinosine, Super T (5-hydroxybutynl-2'-deoxyuridine), Super G (8-aza-7-deazaguanosine), locked nucleic acids (LNA), unlocked nucleic acids (UNA, e.g., UNA-A, UNA-U, UNA-C, UNA-G), iso-dG, iso-dC, 2' fluoro bases (e.g., fluoro C, fluoro U, fluoro A, and fluoro G), or any combination thereof.
In some cases, the switch oligonucleotide may have a length of at least about any integer number of nucleotides from 2 to 250 (e.g., at least about 2, 3, 4, 5, 10, 20, 50, 100, 150, 200, 249, or 250 nucleotides), or longer.
In some cases, the switch oligonucleotide may have a length of up to about any integer number of nucleotides from 2 to 250 (e.g., up to about 2, 3, 4, 5, 10, 20, 50, 100, 150, 200, 249, or 250 nucleotides).
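The template switching mechanism described above can be sketched at the sequence level. The following is a simplified illustration, not a model of the enzymology: the function names, the example sequences, and the default tail length of three bases are all hypothetical, and the mRNA is written with DNA letters (T in place of U) for convenience.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def template_switch(mrna_as_dna, tso_template_region, n_tail=3):
    """Return the extended cDNA (5'->3') after template switching.

    Steps modeled:
      1. Reverse transcription copies the mRNA into first-strand cDNA.
      2. Terminal transferase activity adds a non-templated poly-C tail
         to the 3' end of the cDNA.
      3. The poly-G hybridization region at the 3' end of the switch
         oligonucleotide anneals to the poly-C tail.
      4. The reverse transcriptase continues along the switch
         oligonucleotide, appending the complement of its template region.
    """
    first_strand = reverse_complement(mrna_as_dna)      # step 1
    tailed = first_strand + "C" * n_tail                # step 2
    # Steps 3 and 4: extension copies the template region of the oligo.
    return tailed + reverse_complement(tso_template_region)


if __name__ == "__main__":
    # Hypothetical short sequences chosen only to make the result readable.
    print(template_switch("ATGGCC", "AAGCAG"))
```

The resulting cDNA equals the reverse complement of the switch oligonucleotide (template region followed by the G run) joined to the 5' end of the mRNA, which is how a predefined sequence ends up appended to the cDNA.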
Once the contents of the cells are released into their respective partitions, the macromolecular components contained therein (e.g., macromolecular components of the biological particles such as RNA, DNA, or proteins) may be further processed within the partitions. According to the methods and systems described herein, the macromolecular component contents of individual biological particles may be provided with unique identifiers such that, when those macromolecular components are characterized, they may be attributed as having been derived from the same biological particle or particles. The ability to attribute characteristics to an individual biological particle or group of biological particles is provided by the assignment of unique identifiers specifically to the individual biological particle or group of biological particles. Individual biological particles or groups of biological particles may be assigned or associated with a unique identifier, for example in the form of a nucleic acid barcode, in order to tag or label the macromolecular components of the biological particles (and thus their characteristics) with the unique identifier. These unique identifiers can then be used to attribute the components and characteristics of the biological particles to individual biological particles or groups of biological particles.
In some aspects, this is performed by co-partitioning individual biological particles or groups of biological particles with the unique identifiers, such as described above (with reference to fig. 2). In some aspects, the unique identifier is provided in the form of a nucleic acid molecule (e.g., an oligonucleotide) comprising a nucleic acid barcode sequence that may be attached to or otherwise associated with the nucleic acid content of the individual biological particle or with other components of the biological particle, particularly with fragments of such nucleic acids. The nucleic acid molecules are partitioned such that, as between nucleic acid molecules in a given partition, the nucleic acid barcode sequences contained therein are identical, but as between different partitions, the nucleic acid molecules may have, and do have, different barcode sequences, or at least represent a large number of different barcode sequences across all partitions in a given analysis. In some aspects, only one nucleic acid barcode sequence may be associated with a given partition, but in some cases, there may be two or more different barcode sequences.
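Because molecules from one partition share a barcode while different partitions carry different barcodes, sequences can later be attributed to their partition of origin by grouping on the barcode. The sketch below illustrates this idea under assumed conventions: the function name is hypothetical, and the layout in which the barcode occupies the first `barcode_len` bases of each read is an assumption made for illustration, not a format specified in this disclosure.

```python
from collections import defaultdict


def group_by_barcode(reads, barcode_len=16):
    """Group read inserts by their leading barcode.

    Assumes each read is laid out as barcode followed by insert; reads
    too short to contain both a barcode and an insert are skipped.
    """
    groups = defaultdict(list)
    for read in reads:
        if len(read) <= barcode_len:
            continue
        barcode, insert = read[:barcode_len], read[barcode_len:]
        groups[barcode].append(insert)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical reads with a 6-base barcode for readability.
    reads = ["AAAAAACGTACG", "AAAAAATTTGGG", "CCCCCCGATTAC", "GGG"]
    for barcode, inserts in group_by_barcode(reads, barcode_len=6).items():
        print(barcode, inserts)
```

In this toy example the two reads sharing the barcode AAAAAA are attributed to one partition (and hence to the biological particle or particles it contained), while the read carrying CCCCCC is attributed to another.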
The nucleic acid barcode sequence may comprise about 6 to about 20 or more nucleotides within the sequence of a nucleic acid molecule (e.g., an oligonucleotide). The nucleic acid barcode sequence may comprise about 6 to about 20, 30, 40, 50, 60, 70, 80, 90, 100 or more nucleotides. In some cases, the barcode sequence may be about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 nucleotides or more in length. In some cases, the barcode sequence may be at least about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 nucleotides or more in length. In some cases, the barcode sequence may be up to about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 nucleotides or less in length. These nucleotides may be completely contiguous, i.e. in a single stretch of adjacent nucleotides, or they may be divided into two or more separate subsequences separated by 1 or more nucleotides. In some cases, the separate barcode sequences may be about 4 to about 16 nucleotides in length. In some cases, the barcode sequence may be about 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 nucleotides or longer. In some cases, the barcode sequence may be at least about 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 nucleotides or longer. In some cases, the barcode sequence may be up to about 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 nucleotides or less.
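The barcode lengths above determine how many distinct barcodes are theoretically available, and a barcode split into subsequences can be reassembled from its segments. The following sketch illustrates both points; the function names and the segment coordinates in the example are hypothetical and chosen only for illustration.

```python
def barcode_space(length, alphabet_size=4):
    """Number of distinct sequences of the given length over A, C, G, T."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet_size ** length


def extract_split_barcode(sequence, segments):
    """Concatenate barcode subsequences given as (start, length) pairs.

    Positions are 0-based; the nucleotides lying between segments act as
    spacers and are not part of the barcode.
    """
    parts = []
    for start, length in segments:
        if start < 0 or start + length > len(sequence):
            raise ValueError("segment lies outside the sequence")
        parts.append(sequence[start:start + length])
    return "".join(parts)


if __name__ == "__main__":
    for n in (6, 10, 16, 20):
        print(f"{n}-nt barcode: {barcode_space(n):,} possible sequences")
    # Two 4-nt subsequences separated by a 2-nt spacer (hypothetical layout).
    print(extract_split_barcode("AAAACCGGGGTT", [(0, 4), (6, 4)]))
```

A contiguous barcode is simply the special case with a single segment.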
The co-partitioned nucleic acid molecules may also contain other functional sequences useful for processing nucleic acids from the co-partitioned biological particles. These sequences include, for example, targeted or random/universal amplification primer sequences for amplifying genomic DNA from individual biological particles within a partition while attaching the associated barcode sequences, sequencing primers or primer recognition sites, hybridization or probing sequences, e.g., for identifying the presence of the sequences or for pulling down barcoded nucleic acids, or any of a number of other potential functional sequences. Other mechanisms for co-partitioning oligonucleotides may also be employed, including, for example, coalescing two or more droplets (one of which contains the oligonucleotides), or microdispensing the oligonucleotides into partitions (e.g., droplets within a microfluidic system).
In one example, microcapsules (such as beads) are provided that each include a plurality of the above-described barcoded nucleic acid molecules (e.g., barcoded oligonucleotides) releasably attached to the beads, wherein all nucleic acid molecules attached to a particular bead will include the same nucleic acid barcode sequence, but represent a plurality of different barcode sequences in the population of beads used. In some embodiments, for example, hydrogel beads comprising a polyacrylamide polymer matrix are used as solid supports and delivery vehicles for nucleic acid molecules into partitions, as they are capable of carrying large amounts of nucleic acid molecules, and may be configured to release those nucleic acid molecules upon exposure to a specific stimulus, as described elsewhere herein. In some cases, the bead population provides a diverse barcode sequence library comprising at least about 1,000 different barcode sequences, at least about 5,000 different barcode sequences, at least about 10,000 different barcode sequences, at least about 50,000 different barcode sequences, at least about 100,000 different barcode sequences, at least about 1,000,000 different barcode sequences, at least about 5,000,000 different barcode sequences, or at least about 10,000,000 different barcode sequences or more. In some cases, the bead population provides a diverse barcode sequence library comprising about 1,000 to about 10,000 different barcode sequences, about 5,000 to about 50,000 different barcode sequences, about 10,000 to about 100,000 different barcode sequences, about 50,000 to about 1,000,000 different barcode sequences, or about 100,000 to about 10,000,000 different barcode sequences.
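The structure of such a bead population, in which every molecule on a bead shares one barcode while barcodes differ from bead to bead, can be sketched as a simple data structure. This is an illustrative toy model only: the function name, the random assignment of barcodes, and the default parameter values are assumptions and do not describe how barcoded beads are actually manufactured.

```python
import random


def make_bead_population(n_beads, barcode_len=10, oligos_per_bead=1000, seed=0):
    """Build a toy bead population with one unique barcode per bead.

    Each bead is a dict holding its barcode and the number of attached
    molecules, all of which are taken to carry that same barcode.
    """
    if n_beads > 4 ** barcode_len:
        raise ValueError("not enough distinct barcodes of this length")
    rng = random.Random(seed)
    barcodes = set()
    while len(barcodes) < n_beads:
        barcodes.add("".join(rng.choice("ACGT") for _ in range(barcode_len)))
    return [{"barcode": bc, "oligo_count": oligos_per_bead}
            for bc in sorted(barcodes)]


if __name__ == "__main__":
    population = make_bead_population(n_beads=1000)
    diversity = len({bead["barcode"] for bead in population})
    print(f"{len(population)} beads, {diversity} different barcode sequences")
```

Here the diversity of the barcode library equals the number of beads by construction; in practice the library diversity and the number of molecules per bead would fall within the ranges recited above.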
In addition, a large number of nucleic acid (e.g., oligonucleotide) molecules can be attached to each bead. In particular, the number of nucleic acid molecules comprising the barcode sequence on an individual bead can be at least about 1,000 nucleic acid molecules, at least about 5,000 nucleic acid molecules, at least about 10,000 nucleic acid molecules, at least about 50,000 nucleic acid molecules, at least about 100,000 nucleic acid molecules, at least about 500,000 nucleic acid molecules, at least about 1,000,000 nucleic acid molecules, at least about 5,000,000 nucleic acid molecules, at least about 10,000,000 nucleic acid molecules, at least about 50,000,000 nucleic acid molecules, at least about 100,000,000 nucleic acid molecules, at least about 250,000,000 nucleic acid molecules, and in some cases, at least about 1 billion nucleic acid molecules or more. In some embodiments, the number of nucleic acid molecules comprising the barcode sequence on an individual bead is between about 1,000 and about 10,000 nucleic acid molecules, about 5,000 and about 50,000 nucleic acid molecules, about 10,000 and about 100,000 nucleic acid molecules, about 50,000 and about 1,000,000 nucleic acid molecules, about 100,000 and about 10,000,000 nucleic acid molecules, or about 1,000,000 and about 1 billion nucleic acid molecules.
The nucleic acid molecules of a given bead may include identical (or common) barcode sequences, different barcode sequences, or a combination of both. The nucleic acid molecules of a given bead may include multiple sets of nucleic acid molecules. A given set of nucleic acid molecules may include identical barcode sequences. The same barcode sequence may be different from the barcode sequence of another set of nucleic acid molecules.
In addition, when partitioning a population of beads, the resulting partitioned population can also include a diverse barcode library including at least about 1,000 different barcode sequences, at least about 5,000 different barcode sequences, at least about 10,000 different barcode sequences, at least about 50,000 different barcode sequences, at least about 100,000 different barcode sequences, at least about 1,000,000 different barcode sequences, at least about 5,000,000 different barcode sequences, or at least about 10,000,000 different barcode sequences. Further, each partition of the population may include at least about 1,000 nucleic acid molecules, at least about 5,000 nucleic acid molecules, at least about 10,000 nucleic acid molecules, at least about 50,000 nucleic acid molecules, at least about 100,000 nucleic acid molecules, at least about 500,000 nucleic acid molecules, at least about 1,000,000 nucleic acid molecules, at least about 5,000,000 nucleic acid molecules, at least about 10,000,000 nucleic acid molecules, at least about 50,000,000 nucleic acid molecules, at least about 100,000,000 nucleic acid molecules, at least about 250,000,000 nucleic acid molecules, and in some cases, at least about 1 billion nucleic acid molecules.
In some cases, the resulting partitioned population provides a diverse barcode sequence library comprising about 1,000 to about 10,000 different barcode sequences, about 5,000 to about 50,000 different barcode sequences, about 10,000 to about 100,000 different barcode sequences, about 50,000 to about 1,000,000 different barcode sequences, or about 100,000 to about 10,000,000 different barcode sequences. In addition, each partition of the population may include from about 1,000 to about 10,000 nucleic acid barcode molecules, from about 5,000 to about 50,000 nucleic acid barcode molecules, from about 10,000 to about 100,000 nucleic acid barcode molecules, from about 50,000 to about 1,000,000 nucleic acid barcode molecules, from about 100,000 to about 10,000,000 nucleic acid barcode molecules, or from about 1,000,000 to about 1 billion nucleic acid barcode molecules.
In some cases, it may be desirable to incorporate multiple different barcodes within a given partition, attached either to a single bead or to multiple beads within the partition. For example, a mixed but known set of barcode sequences may provide greater assurance of identification in subsequent processing, e.g., by providing a stronger address or attribution of the barcodes to a given partition, serving as a duplicate or independent confirmation of the output from that partition.
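As a minimal sketch of how a mixed but known set of barcodes could serve as independent confirmation of a partition's output, the following Python example maps each barcode to its partition and reports whether every barcode in a partition's set was observed. The barcode sequences, the partition names, and the helper functions are hypothetical and chosen only for illustration; they are not taken from this disclosure.

```python
from collections import Counter

# Hypothetical known barcode sets: each partition carries a mixed but known set.
PARTITION_BARCODE_SETS = {
    "partition_A": {"ACGTACGT", "TTGCAAGC"},
    "partition_B": {"GGATCCAA", "CATGTTAG"},
}

# Reverse lookup from a single barcode to the partition whose set contains it.
BARCODE_TO_PARTITION = {
    barcode: partition
    for partition, barcodes in PARTITION_BARCODE_SETS.items()
    for barcode in barcodes
}

def attribute_reads(read_barcodes):
    """Count, per partition, how many reads carried each of its known barcodes."""
    support = {partition: Counter() for partition in PARTITION_BARCODE_SETS}
    for barcode in read_barcodes:
        partition = BARCODE_TO_PARTITION.get(barcode)
        if partition is not None:  # reads with unknown barcodes are ignored
            support[partition][barcode] += 1
    return support

def independently_confirmed(support, partition):
    """True when every barcode in the partition's known set was seen at least once."""
    return all(support[partition][bc] > 0 for bc in PARTITION_BARCODE_SETS[partition])

reads = ["ACGTACGT", "TTGCAAGC", "ACGTACGT", "GGATCCAA", "NNNNNNNN"]
support = attribute_reads(reads)
for name in PARTITION_BARCODE_SETS:
    print(name, dict(support[name]), independently_confirmed(support, name))
```

In this sketch, a partition counts as confirmed only when all of its barcodes are observed, which is one possible reading of "duplicate or independent confirmation"; other acceptance rules could equally be used.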
Upon application of a specific stimulus to the bead, the nucleic acid molecule (e.g., oligonucleotide) may be released from the bead. In some cases, the stimulus may be a light stimulus, for example by cleavage of a photolabile bond, thereby releasing the nucleic acid molecule. In other cases, a thermal stimulus may be used, wherein an increase in the temperature of the bead environment causes cleavage of a bond or other release of the nucleic acid molecule from the bead. In other cases, a chemical stimulus may be used that cleaves the bond of the nucleic acid molecule to the bead, or otherwise causes release of the nucleic acid molecule from the bead. In one instance, such compositions include the polyacrylamide matrices described above for encapsulating biological particles, and can be degraded by exposure to a reducing agent (such as DTT) to release the linked nucleic acid molecules.
In some aspects, systems and methods for controlled separation are provided. The droplet size may be controlled by adjusting certain geometric features in the channel architecture (e.g., microfluidic channel architecture). For example, the expansion angle, width, and/or length of the channel may be adjusted to control droplet size.
Fig. 4 shows an example of a microfluidic channel structure for controlled separation of beads into discrete droplets. The channel structure 400 may include a channel segment 402 that communicates with a reservoir 404 at a channel connection 406 (or intersection). The reservoir 404 may be a chamber. As used herein, any reference to a "reservoir" may also refer to a "chamber." In operation, the aqueous fluid 408 containing the suspended beads 412 may be transported along the channel segment 402 into the connection 406 to encounter the second fluid 410, which is immiscible with the aqueous fluid 408, in the reservoir 404, thereby producing droplets 416, 418 of the aqueous fluid 408 flowing into the reservoir 404. At the junction 406 where the aqueous fluid 408 and the second fluid 410 meet, droplets may form based on factors such as the hydrodynamic forces at the junction 406, the flow rates of the two fluids 408, 410, the fluid properties, and certain geometric parameters (e.g., w, h₀, α, etc.) of the channel structure 400. By continuously injecting aqueous fluid 408 from channel segment 402 through connection 406, a plurality of droplets may be collected in reservoir 404.
The discrete droplets generated may include beads (e.g., as in occupied droplets 416). Alternatively, the discrete droplets generated may comprise more than one bead. Alternatively, the discrete droplets generated may not include any beads (e.g., as in unoccupied droplets 418). In some cases, the discrete droplets generated may contain one or more biological particles, as described elsewhere herein. In some cases, the discrete droplets generated may contain one or more reagents, as described elsewhere herein.
In some cases, the aqueous fluid 408 may have a substantially uniform concentration or frequency of beads 412. Beads 412 may be introduced into channel segment 402 from a separate channel (not shown in fig. 4). The frequency of the beads 412 in the channel segment 402 may be controlled by controlling the frequency of introduction of the beads 412 into the channel segment 402 and/or the relative flow rates of the fluids in the channel segment 402 and the separate channel. In some cases, beads may be introduced into channel segment 402 from a plurality of different channels, and the frequencies controlled accordingly.
In some cases, the aqueous fluid 408 in the channel segment 402 may contain biological particles (e.g., as described with reference to fig. 1 and 2). In some cases, the aqueous fluid 408 may have a substantially uniform concentration or frequency of biological particles. As with the beads, the biological particles may be introduced into the channel segment 402 from a separate channel. The frequency or concentration of biological particles in the aqueous fluid 408 in the channel segment 402 may be controlled by controlling the frequency of introduction of biological particles into the channel segment 402 and/or the relative flow rates of the fluids in the channel segment 402 and the separate channel. In some cases, biological particles may be introduced into channel segment 402 from a plurality of different channels, and the frequencies controlled accordingly. In some cases, a first separate channel may introduce beads into channel segment 402, and a second separate channel may introduce biological particles into the channel segment. The first separate channel into which the beads are introduced may be upstream or downstream of the second separate channel into which the biological particles are introduced.
The second fluid 410 may comprise an oil, such as a fluorinated oil, that includes a fluorosurfactant for stabilizing the resulting droplets (e.g., inhibiting subsequent coalescence of the resulting droplets).
In some cases, the second fluid 410 may not experience and/or be directed to any flow into or out of the reservoir 404. For example, the second fluid 410 may be substantially stationary in the reservoir 404. In some cases, the second fluid 410 may be subject to flow within the reservoir 404, but not flow into or out of the reservoir 404, such as by applying pressure to the reservoir 404 and/or being affected by the incoming flow of aqueous fluid 408 at the connection 406. Alternatively, the second fluid 410 may be subjected to and/or directed to flow into or out of the reservoir 404. For example, reservoir 404 may be a channel that directs second fluid 410 from upstream to downstream, transporting the generated droplets.
The channel structure 400 at or near the connection 406 may have certain geometric features that at least partially determine the size of the droplets formed by the channel structure 400. The channel segment 402 may have a height h₀ and a width w at or near the connection 406. For example, the channel segment 402 may have a rectangular cross-section that leads to a reservoir 404 having a wider cross-section (such as in width or diameter). Alternatively, the cross-section of the channel segment 402 may be other shapes, such as a circular shape, a trapezoidal shape, a polygonal shape, or any other shape. The top and bottom walls of the reservoir 404 at or near the connection 406 may be inclined at an expansion angle α. The expansion angle α allows the tongue (the portion of the aqueous fluid 408 that exits the channel segment 402 at the junction 406 and enters the reservoir 404 prior to droplet formation) to increase in depth and facilitates a decrease in the curvature of the intermediately formed droplet. The droplet size may decrease with increasing expansion angle. The resulting droplet radius R_d may be predicted from the geometric parameters h₀, w, and α by the following equation:

$$R_d \approx 0.44\left(1 + 2.2\sqrt{\tan\alpha}\,\frac{w}{h_0}\right)\frac{h_0}{\sqrt{\tan\alpha}}$$
For example, for a channel structure with w = 21 μm, h₀ = 21 μm, and α = 3°, the predicted droplet size is 121 μm. In another example, for a channel structure with w = 25 μm, h₀ = 25 μm, and α = 5°, the predicted droplet size is 123 μm. In another example, for a channel structure with w = 28 μm, h₀ = 28 μm, and α = 7°, the predicted droplet size is 124 μm.
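The three worked examples can be reproduced numerically. The following Python sketch assumes that the predicted radius takes the form R_d ≈ 0.44 (1 + 2.2 √(tan α) w/h₀) h₀/√(tan α), and further assumes that the quoted "droplet size" denotes the droplet diameter 2R_d; both are assumptions of this sketch. The function names are illustrative.

```python
import math

def predicted_droplet_radius_um(w_um, h0_um, alpha_deg):
    """Assumed form: R_d ~ 0.44 * (1 + 2.2*sqrt(tan a)*w/h0) * h0/sqrt(tan a)."""
    root_tan = math.sqrt(math.tan(math.radians(alpha_deg)))
    return 0.44 * (1.0 + 2.2 * root_tan * w_um / h0_um) * h0_um / root_tan

def predicted_droplet_size_um(w_um, h0_um, alpha_deg):
    # Assumption: "droplet size" in the examples means diameter, i.e. 2 * R_d.
    return 2.0 * predicted_droplet_radius_um(w_um, h0_um, alpha_deg)

# (w in um, h0 in um, alpha in degrees, size quoted in the text in um)
EXAMPLES = [(21, 21, 3, 121), (25, 25, 5, 123), (28, 28, 7, 124)]

for w, h0, alpha, quoted in EXAMPLES:
    size = predicted_droplet_size_um(w, h0, alpha)
    print(f"w={w} h0={h0} alpha={alpha} deg -> {size:.1f} um (quoted: {quoted} um)")
```

Under these assumptions each computed value falls within 1 μm of the quoted size, and for fixed w and h₀ the computed size decreases as α increases, in line with the stated trend.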
In some cases, the expansion angle α may be in the range of about 0.5° to about 4°, about 0.1° to about 10°, or about 0° to about 90°. For example, the expansion angle may be at least about 0.01°, 0.1°, 0.2°, 0.3°, 0.4°, 0.5°, 0.6°, 0.7°, 0.8°, 0.9°, 1°, 2°, 3°, 4°, 5°, 6°, 7°, 8°, 9°, 10°, 15°, 20°, 25°, 30°, 35°, 40°, 45°, 50°, 55°, 60°, 65°, 70°, 75°, 80°, 85°, or higher. In some cases, the expansion angle may be up to about 89°, 88°, 87°, 86°, 85°, 84°, 83°, 82°, 81°, 80°, 75°, 70°, 65°, 60°, 55°, 50°, 45°, 40°, 35°, 30°, 25°, 20°, 15°, 10°, 9°, 8°, 7°, 6°, 5°, 4°, 3°, 2°, 1°, 0.1°, 0.01°, or less. In some cases, the width w may be in the range of about 100 micrometers (μm) to about 500 μm. In some cases, the width w may be in the range of about 10 μm to about 200 μm. Alternatively, the width may be less than about 10 μm. Alternatively, the width may be greater than about 500 μm. In some cases, the flow rate of the aqueous fluid 408 entering the connection 406 may be between about 0.04 microliters (μL)/minute (min) and about 40 μL/min. In some cases, the flow rate of the aqueous fluid 408 entering the connection 406 may be between about 0.01 μL/min and about 100 μL/min. Alternatively, the flow rate of the aqueous fluid 408 entering the connection 406 may be less than about 0.01 μL/min. Alternatively, the flow rate of the aqueous fluid 408 entering the connection 406 may be greater than about 40 μL/min, such as 45 μL/min, 50 μL/min, 55 μL/min, 60 μL/min, 65 μL/min, 70 μL/min, 75 μL/min, 80 μL/min, 85 μL/min, 90 μL/min, 95 μL/min, 100 μL/min, 110 μL/min, 120 μL/min, 130 μL/min, 140 μL/min, 150 μL/min, or more. At lower flow rates (such as flow rates less than or equal to about 10 μL/min), the droplet radius may not depend on the flow rate of the aqueous fluid 408 entering the junction 406.
In some cases, at least about 50% of the droplets generated may have a uniform size. In some cases, at least about 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or more of the droplets generated may have uniform size. Alternatively, less than about 50% of the droplets generated may have a uniform size.
The throughput of droplet generation may be increased by increasing the number of points of generation, for example, by increasing the number of connections (e.g., connections 406) between channel segments carrying the aqueous fluid 408 (e.g., channel segment 402) and the reservoir 404. Alternatively or in addition, the throughput of droplet generation may be increased by increasing the flow rate of the aqueous fluid 408 in the channel segment 402.
Fig. 5 shows an example of a microfluidic channel structure for achieving increased droplet generation throughput. The microfluidic channel structure 500 can include a plurality of channel segments 502 and a reservoir 504. Each of the plurality of channel segments 502 may be in fluid communication with the reservoir 504. The channel structure 500 may include a plurality of channel connections 506 between the plurality of channel segments 502 and the reservoir 504. Each channel connection may be a point of droplet generation. The channel segment 402 from the channel structure 400 in fig. 4 and any description of its components may correspond to a given channel segment of the plurality of channel segments 502 in the channel structure 500 and any description of its corresponding components. The reservoir 404 from the channel structure 400 and any description of its components may correspond to the reservoir 504 from the channel structure 500 and any description of its corresponding components.
Each of the plurality of channel segments 502 may include an aqueous fluid 508 containing suspended beads 512. The reservoir 504 may include a second fluid 510 that is immiscible with the aqueous fluid 508. In some cases, the second fluid 510 may not experience and/or be directed to any flow into or out of the reservoir 504. For example, the second fluid 510 may be substantially stationary in the reservoir 504. In some cases, the second fluid 510 may be subject to flow within the reservoir 504, but not flow into or out of the reservoir 504, such as by applying pressure to the reservoir 504 and/or being affected by an incoming flow of aqueous fluid 508 at the connection. Alternatively, the second fluid 510 may be subjected to and/or directed to flow into or out of the reservoir 504. For example, reservoir 504 may be a channel that directs second fluid 510 from upstream to downstream, transporting the generated droplets.
In operation, an aqueous fluid 508 containing suspended beads 512 may be transported along the plurality of channel segments 502 into the plurality of connections 506 to encounter a second fluid 510 in the reservoir 504 to produce droplets 516, 518. Droplets may be formed from each channel segment at each corresponding connection with reservoir 504. At each junction where the aqueous fluid 508 and the second fluid 510 meet, droplets may form based on factors such as the hydrodynamic forces at the junction, the flow rates of the two fluids 508, 510, the fluid properties, and certain geometric parameters (e.g., w, h₀, α, etc.) of the channel structure 500, as described elsewhere herein. By continuously injecting aqueous fluid 508 from the plurality of channel segments 502 through the plurality of connections 506, a plurality of droplets may be collected in the reservoir 504. The throughput may increase significantly with the parallel channel configuration of the channel structure 500. For example, a channel structure having five inlet channel segments containing aqueous fluid 508 may generate droplets at a frequency five times that of a channel structure having one inlet channel segment, provided that the fluid flow rates in the channel segments are substantially the same. The fluid flow rates in the different inlet channel segments may or may not be substantially the same. The channel structure may have as many parallel channel segments as is practical and as the size of the reservoir allows. For example, the channel structure may have at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000, 1500, 5000, or more parallel or substantially parallel channel segments.
For each of the plurality of channel segments 502, the geometric parameters w, h₀, and α may or may not be uniform. For example, each channel segment may have the same or different widths at or near its respective channel connection with the reservoir 504. For example, each channel segment may have the same or different heights at or near its respective channel connection with the reservoir 504. In another example, the reservoir 504 may have the same or different expansion angles at the different channel connections with the plurality of channel segments 502. When the geometric parameters are uniform, the droplet size can advantageously be controlled to be uniform even as the throughput is increased. In some cases, when it is desired to have different droplet size distributions, the geometric parameters of the plurality of channel segments 502 may be changed accordingly.
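The five-fold scaling noted above follows from simple volume conservation. The following Python sketch assumes, purely for illustration, that all aqueous fluid entering the reservoir is converted into spherical droplets of one uniform diameter; the function names, the 1 μL/min flow rate, and the 121 μm diameter are illustrative choices rather than values prescribed for the parallel structure.

```python
import math

def droplet_volume_ul(droplet_diameter_um):
    """Volume of a spherical droplet in microliters (1 uL = 1e9 cubic micrometers)."""
    radius_um = droplet_diameter_um / 2.0
    return (4.0 / 3.0) * math.pi * radius_um ** 3 * 1e-9

def generation_frequency_hz(flow_rates_ul_per_min, droplet_diameter_um):
    """Total droplets per second, summed over all inlet channel segments."""
    total_flow_ul_per_s = sum(flow_rates_ul_per_min) / 60.0
    return total_flow_ul_per_s / droplet_volume_ul(droplet_diameter_um)

# One inlet versus five inlets, each carrying the same assumed flow rate.
single = generation_frequency_hz([1.0], 121.0)
five = generation_frequency_hz([1.0] * 5, 121.0)
print(f"one inlet: {single:.1f} Hz, five inlets: {five:.1f} Hz, ratio: {five / single:.1f}")
```

Because the function sums per-channel flow rates, it also covers the case in which the flow rates in different inlet channel segments are not the same.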
In some cases, at least about 50% of the droplets generated may have a uniform size. In some cases, at least about 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or more of the droplets generated may have uniform size. Alternatively, less than about 50% of the droplets generated may have a uniform size.
Fig. 6 shows another example of a microfluidic channel structure for achieving increased droplet generation throughput. The microfluidic channel structure 600 may include a plurality of channel segments 602 arranged generally circularly around the perimeter of a reservoir 604. Each of the plurality of channel segments 602 may be in fluid communication with the reservoir 604. The channel structure 600 may include a plurality of channel connections 606 between the plurality of channel segments 602 and the reservoir 604. Each channel connection may be a point of droplet generation. The channel segment 402 from the channel structure 400 in fig. 4 and any description of its components may correspond to a given channel segment of the plurality of channel segments 602 in the channel structure 600 and any description of its corresponding components. The reservoir 404 from the channel structure 400 and any description of its components may correspond to the reservoir 604 from the channel structure 600 and any description of its corresponding components.
Each of the plurality of channel segments 602 may include an aqueous fluid 608 containing suspended beads 612. The reservoir 604 may include a second fluid 610 that is immiscible with the aqueous fluid 608. In some cases, the second fluid 610 may not experience and/or be directed to any flow into or out of the reservoir 604. For example, the second fluid 610 may be substantially stationary in the reservoir 604. In some cases, the second fluid 610 may be subject to flow within the reservoir 604, but not flow into or out of the reservoir 604, such as by applying pressure to the reservoir 604 and/or being affected by an incoming flow of aqueous fluid 608 at the connection. Alternatively, the second fluid 610 may be subjected to and/or directed to flow into or out of the reservoir 604. For example, reservoir 604 may be a channel that directs second fluid 610 from upstream to downstream, transporting the generated droplets.
In operation, an aqueous fluid 608 containing suspended beads 612 may be transported along the plurality of channel segments 602 into the plurality of connections 606 to encounter the second fluid 610 in the reservoir 604 to produce a plurality of droplets 616. Droplets may be formed from each channel segment at each corresponding connection with reservoir 604. At the junction where the aqueous fluid 608 and the second fluid 610 meet, droplets may form based on factors such as the hydrodynamic forces at the junction, the flow rates of the two fluids 608, 610, the fluid properties, and certain geometric parameters of the channel structure 600 (e.g., the width and height of the channel segments 602, the expansion angle of the reservoir 604, etc.), as described elsewhere herein. By continuously injecting aqueous fluid 608 from the plurality of channel segments 602 through the plurality of connections 606, a plurality of droplets may be collected in the reservoir 604. The throughput may increase significantly with the substantially parallel channel configuration of the channel structure 600. The channel structure may have as many substantially parallel channel segments as is practical and as the size of the reservoir allows. For example, the channel structure may have at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000, 1500, 5000, or more parallel or substantially parallel channel segments. The plurality of channel segments may be substantially evenly spaced, for example, around the rim or perimeter of the reservoir. Alternatively, the spacing of the plurality of channel segments may be non-uniform.
The reservoir 604 may have an expansion angle α (not shown in fig. 6) at or near each channel connection. Each channel segment of the plurality of channel segments 602 may have a width w and a height h₀ at or near the channel connection. For each of the plurality of channel segments 602, the geometric parameters w, h₀, and α may or may not be uniform. For example, each channel segment may have the same or different widths at or near its respective channel connection with the reservoir 604. For example, each channel segment may have the same or different heights at or near its respective channel connection with the reservoir 604.
The reservoir 604 may have the same or different expansion angles at the different channel connections with the plurality of channel segments 602. For example, a circular reservoir (as shown in fig. 6) may have a conical, domed, or hemispherical ceiling (e.g., a top wall) to provide each channel segment 602 at or near the plurality of channel connections 606 with the same or substantially the same expansion angle. When the geometric parameters are uniform, the resulting droplet size can advantageously be controlled to be uniform even as the throughput is increased. In some cases, when it is desired to have different droplet size distributions, the geometric parameters of the plurality of channel segments 602 may be changed accordingly.
In some cases, at least about 50% of the droplets generated may have a uniform size. In some cases, at least about 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or more of the droplets generated may have uniform size. Alternatively, less than about 50% of the droplets generated may have a uniform size. The beads and/or biological particles injected into the droplets may or may not have a uniform size.
Fig. 7A shows a cross-sectional view of another example of a microfluidic channel structure with geometric features for controlled separation. The channel structure 700 may include a channel segment 702 that communicates with a reservoir 704 at a channel connection 706 (or intersection). In some cases, the channel structure 700 and one or more components thereof may correspond to the channel structure 100 and one or more components thereof. Fig. 7B shows a perspective view of the channel structure 700 of fig. 7A.
An aqueous fluid 712 comprising a plurality of particles 716 may be transported along the channel segment 702 into the connection 706 to encounter a second fluid 714 (e.g., oil, etc.), which is immiscible with the aqueous fluid 712, in the reservoir 704, thereby producing droplets 720 of the aqueous fluid 712 flowing into the reservoir 704. At the junction 706 where the aqueous fluid 712 and the second fluid 714 meet, droplets may form based on factors such as the hydrodynamic forces at the junction 706, the relative flow rates of the two fluids 712, 714, the fluid properties, and certain geometric parameters (e.g., Δh, etc.) of the channel structure 700. By continuously injecting aqueous fluid 712 from channel segment 702 at connection 706, a plurality of droplets may be collected in reservoir 704.
The discrete droplets generated may comprise one or more particles of the plurality of particles 716. As described elsewhere herein, the particle may be any particle, such as a bead, a cell bead, a gel bead, a biological particle, a macromolecular component of a biological particle, or other particle. Alternatively, the discrete droplets generated may not contain any particles.
In some cases, the aqueous fluid 712 may have a substantially uniform concentration or frequency of particles 716. Particles 716 (e.g., beads) may be introduced into channel segment 702 from a separate channel (not shown in fig. 7) as described elsewhere herein (e.g., with reference to fig. 4). The frequency of particles 716 in channel segment 702 may be controlled by controlling the frequency of introduction of particles 716 into channel segment 702 and/or the relative flow rates of the fluids in channel segment 702 and the individual channels. In some cases, particles 716 may be introduced into channel segment 702 from a plurality of different channels, and the frequency controlled accordingly. In some cases, different particles may be introduced via separate channels. For example, a first individual channel may introduce beads into channel segment 702, and a second individual channel may introduce biological particles into the channel segment. The first separate channel into which the beads are introduced may be upstream or downstream of the second separate channel into which the biological particles are introduced.
In some cases, the second fluid 714 may not undergo and/or be directed to any flow into or out of the reservoir 704. For example, the second fluid 714 may be substantially stationary in the reservoir 704. In some cases, the second fluid 714 may be subject to flow within the reservoir 704, but not into or out of the reservoir 704, such as by applying pressure to the reservoir 704 and/or being affected by an incoming flow of aqueous fluid 712 at the connection 706. Alternatively, the second fluid 714 may be subjected to and/or directed to flow into or out of the reservoir 704. For example, reservoir 704 may be a channel that directs the second fluid 714 from upstream to downstream, transporting the generated droplets.
The channel structure 700 at or near the connection 706 may have certain geometric features that at least partially determine the size and/or shape of the droplets formed by the channel structure 700. The channel segment 702 may have a first cross-sectional height h₁, and the reservoir 704 may have a second cross-sectional height h₂. The first cross-sectional height h₁ and the second cross-sectional height h₂ may differ, such that there is a height difference Δh at the connection 706. The second cross-sectional height h₂ may be greater than the first cross-sectional height h₁. In some cases, the cross-sectional height of the reservoir may thereafter gradually increase, for example, with increasing distance from the connection 706. In some cases, the cross-sectional height of the reservoir may increase according to the expansion angle β at or near the connection 706. The height difference Δh and/or expansion angle β may allow the tongue (the portion of the aqueous fluid 712 that exits the channel segment 702 at the connection 706 and enters the reservoir 704 prior to droplet formation) to increase in depth and facilitate a decrease in the curvature of the intermediately formed droplet. For example, the droplet size may decrease with increasing height difference and/or increasing expansion angle.
The height difference Δh may be at least about 1 μm. Alternatively, the height difference may be at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500 μm or more. Alternatively, the height difference may be up to about 500, 400, 300, 200, 100, 90, 80, 70, 60, 50, 45, 40, 35, 30, 25, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 μm or less. In some cases, the expansion angle β may be in the range of about 0.5° to about 4°, about 0.1° to about 10°, or about 0° to about 90°. For example, the expansion angle may be at least about 0.01°, 0.1°, 0.2°, 0.3°, 0.4°, 0.5°, 0.6°, 0.7°, 0.8°, 0.9°, 1°, 2°, 3°, 4°, 5°, 6°, 7°, 8°, 9°, 10°, 15°, 20°, 25°, 30°, 35°, 40°, 45°, 50°, 55°, 60°, 65°, 70°, 75°, 80°, 85°, or higher. In some cases, the expansion angle may be up to about 89°, 88°, 87°, 86°, 85°, 84°, 83°, 82°, 81°, 80°, 75°, 70°, 65°, 60°, 55°, 50°, 45°, 40°, 35°, 30°, 25°, 20°, 15°, 10°, 9°, 8°, 7°, 6°, 5°, 4°, 3°, 2°, 1°, 0.1°, 0.01°, or less.
In some cases, the flow rate of the aqueous fluid 712 entering the connection 706 may be between about 0.04 microliters (μL)/minute (min) and about 40 μL/min. In some cases, the flow rate of the aqueous fluid 712 entering the connection 706 may be between about 0.01 μL/min and about 100 μL/min. Alternatively, the flow rate of the aqueous fluid 712 entering the connection 706 may be less than about 0.01 μL/min. Alternatively, the flow rate of the aqueous fluid 712 entering the connection 706 may be greater than about 40 μL/min, such as 45 μL/min, 50 μL/min, 55 μL/min, 60 μL/min, 65 μL/min, 70 μL/min, 75 μL/min, 80 μL/min, 85 μL/min, 90 μL/min, 95 μL/min, 100 μL/min, 110 μL/min, 120 μL/min, 130 μL/min, 140 μL/min, 150 μL/min, or more. At lower flow rates (such as flow rates less than or equal to about 10 μL/min), the droplet radius may not depend on the flow rate of the aqueous fluid 712 entering the connection 706. The second fluid 714 may be stationary or substantially stationary in the reservoir 704. Alternatively, the second fluid 714 may flow at a flow rate such as described above for the aqueous fluid 712.
In some cases, at least about 50% of the droplets generated may have a uniform size. In some cases, at least about 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or more of the droplets generated may have uniform size. Alternatively, less than about 50% of the droplets generated may have a uniform size.
Although fig. 7A and 7B illustrate the height difference Δh as changing abruptly (e.g., as a step increase) at the connection 706, the height difference may instead increase gradually (e.g., from about 0 μm to the maximum height difference). Alternatively, the height difference may decrease gradually (e.g., taper) from the maximum height difference. As used herein, a gradual increase or decrease in the height difference may refer to a continuous increase or decrease in the height difference, wherein the angle between any one micro-segment of the height profile and the immediately adjacent micro-segment of the height profile is greater than 90°. For example, at the connection 706, the bottom wall of the channel and the bottom wall of the reservoir may meet at an angle greater than 90°. Alternatively or in addition, the top wall (e.g., ceiling) of the channel and the top wall (e.g., ceiling) of the reservoir may meet at an angle greater than 90°. The gradual increase or decrease may be linear or non-linear (e.g., exponential, sinusoidal, etc.). Alternatively or in addition, the height difference may variably increase and/or decrease, linearly or non-linearly. While fig. 7A and 7B illustrate the expanding cross-sectional height of the reservoir as linear (e.g., a constant expansion angle β), the cross-sectional height may expand non-linearly. For example, the reservoir may be defined at least in part by a dome-like (e.g., hemispherical) shape having a variable expansion angle. The cross-sectional height may expand in any shape.
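The angle criterion for a gradual height change can be checked numerically on a sampled wall profile. The following Python sketch treats the height profile as a polyline of (position, height) points and tests whether every pair of adjacent segments meets at an angle greater than 90°. The polyline representation, the function names, and the example coordinates are assumptions made only for illustration; consecutive points are assumed to be distinct.

```python
import math

def interior_angle_deg(a, b, c):
    """Angle at vertex b between segments b->a and b->c, in degrees."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

def is_gradual(profile):
    """True if every pair of adjacent segments meets at an angle greater than 90 degrees."""
    return all(
        interior_angle_deg(profile[i - 1], profile[i], profile[i + 1]) > 90.0
        for i in range(1, len(profile) - 1)
    )

step_profile = [(0, 0), (10, 0), (10, 5), (20, 5)]  # abrupt step at position 10
ramp_profile = [(0, 0), (10, 0), (20, 5), (30, 5)]  # sloped transition over 10 units
print(is_gradual(step_profile), is_gradual(ramp_profile))
```

A perfectly straight wall gives 180° at every vertex and passes the check, while a right-angle step gives exactly 90° and fails it.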
The network of channels (e.g., as described above or elsewhere herein) may be fluidly coupled to appropriate fluidic components. For example, the inlet channel segments are fluidly coupled to appropriate sources of the materials they are to deliver to a channel connection. These sources may include any of a variety of different fluidic components, from simple reservoirs defined in or connected to the body structure of the microfluidic device, to fluid conduits that deliver fluids from sources external to the device, manifolds, fluid flow units (e.g., actuators, pumps, compressors), and the like. Likewise, the outlet channel segment (e.g., channel segment 208, reservoir 604, etc.) may be fluidly coupled to a receiving vessel or conduit for the partitioned cells for subsequent processing. Again, this may be a reservoir defined in the body of the microfluidic device, or it may be a fluid conduit for delivering the partitioned cells to a subsequent processing operation, instrument, or component.
The methods and systems described herein may be used to greatly increase the efficiency of single-cell applications and/or other applications that receive droplet-based input. For example, subsequent operations that may be performed after sorting occupied cells and/or appropriately sized cells may include generation of amplification products, purification (e.g., via solid phase reversible immobilization (SPRI)), and further processing (e.g., shearing, ligation of functional sequences, and subsequent amplification (e.g., via PCR)). These operations may occur in bulk (e.g., outside the partition). In the case where the partition is a droplet in an emulsion, the emulsion may be broken and the contents of the droplets pooled for additional operations. Additional reagents that may be co-partitioned with the barcoded beads may include oligonucleotides for blocking ribosomal RNA (rRNA) and nucleases for digesting genomic DNA from cells. Alternatively, rRNA removal agents may be applied during additional processing operations. The configuration of the constructs generated by this method can help minimize (or avoid) sequencing of the poly-T sequence during sequencing and/or enable sequencing of the 5' end of a polynucleotide sequence. The amplification products (e.g., the first amplification product and/or the second amplification product) can be sequenced for sequence analysis. In some cases, amplification may be performed using the Partial Hairpin Amplification for Sequencing (PHASE) method.
A variety of applications require assessment of the presence and quantification of different biological particles or organism types within a population of biological particles, including, for example, microbiome analysis and characterization, environmental testing, food safety testing, epidemiological analysis (e.g., in tracing contamination), and the like.
In the methods and systems described herein, one or more labeling agents capable of binding to or otherwise coupling to one or more cellular features may be used to characterize a cell and/or a cellular feature. In some cases, the cellular feature comprises a cell surface feature. Cell surface features may include, but are not limited to, receptors, antigens, surface proteins, transmembrane proteins, cluster of differentiation proteins, protein channels, protein pumps, carrier proteins, phospholipids, glycoproteins, glycolipids, cell-cell interaction protein complexes, antigen presenting complexes, major histocompatibility complexes, engineered T cell receptors, B cell receptors, chimeric antigen receptors, gap junctions and adherens junctions, or any combination thereof. In some cases, the cellular features may include intracellular analytes, such as proteins, protein modifications (e.g., phosphorylation states or other post-translational modifications), nuclear proteins, nuclear membrane proteins, or any combination thereof.
The labeling agent can include, but is not limited to, a protein (e.g., antigen), peptide, antibody (or epitope-binding fragment thereof), lipophilic moiety (such as cholesterol), cell surface receptor binding molecule, receptor ligand, small molecule, bispecific antibody, bispecific T cell engager, T cell receptor engager, B cell receptor engager, pro-body, aptamer, monobody, Affimer, DARPin, and protein scaffold, or any combination thereof. The labeling agent may include (e.g., be linked to) a reporter oligonucleotide that is indicative of the cell surface feature to which the labeling agent binds. For example, the reporter oligonucleotide may comprise a barcode sequence that allows for the identification of the labeling agent. For example, a labeling agent specific for one type of cell feature (e.g., a first cell surface feature) may have a first reporter oligonucleotide coupled thereto, while a labeling agent specific for a different cell feature (e.g., a second cell surface feature) may have a different reporter oligonucleotide coupled thereto. For a description of exemplary labeling agents, reporter oligonucleotides and methods of use, see, e.g., U.S. patent 10,550,429; U.S. patent publication 20190177800; and U.S. patent publication 20190367969, which are incorporated by reference herein in their entirety.
In particular examples, a library of potential cellular feature labeling agents associated with nucleic acid reporters may be provided, for example, wherein a different reporter oligonucleotide sequence is associated with each labeling agent that is capable of binding to a specific cellular feature. In some aspects, different members of the library may be characterized by the presence of different oligonucleotide sequence labels, e.g., an antibody capable of binding to a first type of protein may have a first known reporter oligonucleotide sequence associated therewith, while an antibody capable of binding to a second protein (i.e., different from the first protein) may have a different known reporter oligonucleotide sequence associated therewith. Prior to partitioning, cells can be incubated with a library of labeling agents, which can represent labeling agents for a wide variety of different cellular features (e.g., receptors (e.g., BCR, TCR), proteins, etc.), and which include their associated reporter oligonucleotides. Unbound labeling agent can be washed from the cells, and the cells can then be co-partitioned (e.g., co-partitioned into droplets or wells) with partition-specific barcode oligonucleotides (e.g., attached to beads, such as gel beads), as described elsewhere herein. Thus, a partition may include one or more cells as well as bound labeling agents and their known associated reporter oligonucleotides.
In other cases, for example to facilitate sample multiplexing, a labeling agent specific for a particular cellular feature may be provided as a first plurality of the labeling agent (e.g., an antibody or lipophilic moiety) coupled to a first reporter oligonucleotide and a second plurality of the labeling agent coupled to a second reporter oligonucleotide. In this way, different samples or groups may be processed independently and then combined together for pooled analysis (e.g., partition-based barcoding as described elsewhere herein). See, for example, U.S. patent publication 20190323088, which is hereby incorporated by reference in its entirety.
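As a rough illustration of the sample multiplexing described above, the following Python sketch assigns a cell to a sample from the reporter counts observed for that cell. The function name `assign_sample`, the thresholds `min_count` and `min_ratio`, and the sample labels are assumptions introduced for illustration only; they are not taken from the disclosure.

```python
from collections import Counter
from typing import Optional


def assign_sample(reporter_counts: Counter,
                  min_count: int = 10,
                  min_ratio: float = 3.0) -> Optional[str]:
    """Assign a cell to the sample whose reporter oligonucleotide dominates.

    reporter_counts maps a sample label (one reporter per sample) to the
    number of reporter molecules observed for this cell. The thresholds are
    hypothetical and would need tuning on real data.
    """
    ranked = reporter_counts.most_common(2)
    if not ranked or ranked[0][1] < min_count:
        return None  # too little signal to call a sample
    if len(ranked) == 2 and ranked[0][1] < min_ratio * ranked[1][1]:
        return None  # two reporters at similar levels: ambiguous cell
    return ranked[0][0]
```

A cell with counts such as 50 for one sample reporter and 2 for another would be assigned to the first sample, while a cell with similar counts for both would be left unassigned.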
In some aspects, these reporter oligonucleotides may comprise a nucleic acid barcode sequence that allows identification of the labeling agent to which the reporter oligonucleotide is coupled. The choice of oligonucleotides as reporters may provide the following advantages: they can offer significant diversity in sequence, can be readily attached to most biomolecules (e.g., antibodies, etc.), and are easy to detect (e.g., using sequencing or array techniques).
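The identification of a labeling agent from its reporter barcode can be pictured as a lookup against a table of known barcodes. The sketch below is a minimal Python illustration under stated assumptions: the barcode sequences, the agent names, and the one-mismatch tolerance are all hypothetical and are not specified in the disclosure.

```python
from typing import Optional

# Hypothetical reporter barcodes and labeling agents, for illustration only.
REPORTER_LIBRARY = {
    "ACGTACGTAC": "anti-CD3 antibody",
    "TTGACCATGA": "anti-CD19 antibody",
    "GGCATTCAGT": "cholesterol hashtag 1",
}


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def identify_labeling_agent(barcode: str,
                            max_mismatches: int = 1) -> Optional[str]:
    """Return the labeling agent whose reporter barcode uniquely matches.

    A barcode that matches no entry, or more than one entry, within the
    allowed number of mismatches is reported as None.
    """
    hits = [agent for seq, agent in REPORTER_LIBRARY.items()
            if len(seq) == len(barcode)
            and hamming(seq, barcode) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None
```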
The attachment (coupling) of the reporter oligonucleotide to the labeling agent may be accomplished by any of a variety of direct or indirect, covalent or non-covalent associations or linkages. For example, the oligonucleotide may be covalently linked to a portion of a labeling agent (such as a protein, e.g., an antibody or antibody fragment) using chemical conjugation techniques (e.g., Lightning-Link antibody labeling kits available from Innova Biosciences). Non-covalent attachment mechanisms may also be used, for example using biotinylated antibodies and oligonucleotides (or beads comprising one or more biotinylated linkers coupled to the oligonucleotides) with avidin or streptavidin linkers. Antibody and oligonucleotide biotinylation techniques are available. See, e.g., Fang et al., "Fluoride-Cleavable Biotinylation Phosphoramidite for 5'-end-Labelling and Affinity Purification of Synthetic Oligonucleotides," Nucleic Acids Res. 2003 Jan 15; 31:708-715, which is incorporated by reference herein in its entirety for all purposes. Also, protein and peptide biotinylation techniques have been developed and are readily available. See, for example, U.S. patent No. 6,265,552, which is incorporated by reference herein in its entirety for all purposes. In addition, click chemistry such as methyltetrazine-PEG5-NHS ester reaction, TCO-PEG4-NHS ester reaction, and the like can be used to couple the reporter oligonucleotide to the labeling agent. Commercially available kits (such as those from Thunder-Link and Abcam) may be used to couple the reporter oligonucleotide to the labeling agent as appropriate. In another example, the labeling agent is coupled indirectly (e.g., via hybridization) to a reporter oligonucleotide that comprises a barcode sequence that identifies the labeling agent.
For example, the labeling agent can be directly coupled (e.g., covalently bound) to a hybridization oligonucleotide comprising a sequence that hybridizes to a sequence of the reporter oligonucleotide. Hybridization of the hybridization oligonucleotide to the reporter oligonucleotide couples the labeling agent to the reporter oligonucleotide. In some embodiments, the reporter oligonucleotide may be released from the labeling agent, such as upon application of a stimulus. For example, the reporter oligonucleotide may be linked to the labeling agent by a labile bond (e.g., chemically labile, photolabile, thermally labile, etc.), as generally described elsewhere herein for release of molecules from the support. In some cases, the reporter oligonucleotides described herein may include one or more functional sequences useful for subsequent processing, such as an adapter sequence, a Unique Molecular Identifier (UMI) sequence, a sequencer-specific flow cell attachment sequence (such as a P5, P7 or partial P5 or P7 sequence), a primer or primer binding sequence, or a sequencing primer or primer binding sequence (such as R1, R2 or partial R1 or R2 sequences).
In some cases, the labeling agent may comprise a reporter oligonucleotide and a label. The label may be a fluorophore, a radioisotope, a molecule capable of undergoing a colorimetric reaction, a magnetic particle, or any other suitable molecule or compound capable of detection. The label may be conjugated directly or indirectly to a labeling agent (or reporter oligonucleotide) (or the label may be conjugated to a molecule that can bind to a labeling agent or reporter oligonucleotide). In some cases, the label is conjugated to a first oligonucleotide that is complementary (e.g., hybridizes) to the sequence of the reporter oligonucleotide.
FIG. 11 depicts exemplary labeling agents (1110, 1120, 1130), each comprising a reporter oligonucleotide (1140) attached thereto. The labeling agent 1110 (e.g., any of the labeling agents described herein) is attached (either directly (e.g., covalently) or indirectly) to the reporter oligonucleotide 1140. Reporter oligonucleotide 1140 may comprise barcode sequence 1142 identifying labeling agent 1110. Reporter oligonucleotide 1140 may also comprise one or more functional sequences useful for subsequent processing, such as an adapter sequence, a Unique Molecular Identifier (UMI) sequence, a sequencer-specific flow cell attachment sequence (such as P5, P7 or a portion of P5 or P7 sequence), a primer or primer binding sequence, or a sequencing primer or primer binding sequence (such as R1, R2 or a portion of R1 or R2 sequence).
Referring to FIG. 11, in some cases, reporter oligonucleotide 1140 conjugated to a labeling agent (e.g., 1110, 1120, 1130) comprises a primer sequence 1141, a barcode sequence 1142 identifying the labeling agent (e.g., 1110, 1120, 1130), and a functional sequence 1143. Functional sequence 1143 may be configured to hybridize to complementary sequences, such as those present on nucleic acid barcode molecule 1190 (not shown), such as those described elsewhere herein. In some cases, nucleic acid barcode molecules 1190 are attached to a support (e.g., a bead, such as a gel bead), such as those described elsewhere herein. For example, nucleic acid barcode molecule 1190 may be attached to a support via releasable bonds (e.g., including labile bonds), such as those described elsewhere herein. In some cases, reporter oligonucleotide 1140 comprises one or more additional functional sequences, such as those described above.
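The three-part layout of the reporter oligonucleotide (primer sequence, barcode sequence, functional sequence) can be represented as a simple data structure. The following Python sketch is illustrative only: the class `ReporterOligo`, the helper `parse_reporter`, and the assumption that the primer and functional sequences are known exactly and flank the barcode without gaps are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReporterOligo:
    primer_sequence: str      # corresponds to element 1141
    barcode_sequence: str     # corresponds to element 1142
    functional_sequence: str  # corresponds to element 1143

    def assemble(self) -> str:
        """Concatenate the three segments in 5' to 3' order."""
        return (self.primer_sequence
                + self.barcode_sequence
                + self.functional_sequence)


def parse_reporter(seq: str, primer: str,
                   functional: str) -> Optional[ReporterOligo]:
    """Recover the barcode lying between known primer and functional sequences.

    Returns None when the flanking sequences are absent or leave no room
    for a barcode. Exact matching is assumed for simplicity.
    """
    if not (seq.startswith(primer) and seq.endswith(functional)):
        return None
    barcode = seq[len(primer):len(seq) - len(functional)]
    if not barcode:
        return None
    return ReporterOligo(primer, barcode, functional)
```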
In some cases, the labeling agent 1110 is a protein or polypeptide (e.g., an antigen or prospective antigen) comprising a reporter oligonucleotide 1140. Reporter oligonucleotide 1140 comprises a barcode sequence 1142 that identifies polypeptide 1110 and can be used to infer, for example, the presence of a binding partner for polypeptide 1110 (i.e., a molecule or compound to which the polypeptide binds). In some cases, the labeling agent 1110 is a lipophilic moiety (e.g., cholesterol) comprising the reporter oligonucleotide 1140, wherein the lipophilic moiety is selected such that the labeling agent 1110 is integrated into a membrane of a cell or nucleus. Reporter oligonucleotide 1140 comprises a barcode sequence 1142 that identifies lipophilic moiety 1110, which in some cases is used to label cells (e.g., cell populations, cell samples, etc.) for multiplex analysis, as described elsewhere herein. In some cases, the labeling agent is an antibody 1120 (or epitope-binding fragment thereof) comprising a reporter oligonucleotide 1140. Reporter oligonucleotide 1140 comprises a barcode sequence 1142 that identifies antibody 1120 and can be used to infer, for example, the presence of a target of antibody 1120 (i.e., a molecule or compound to which antibody 1120 binds). In other embodiments, the labeling agent 1130 comprises an MHC molecule 1131 with a peptide 1132 and a reporter oligonucleotide 1140 that identifies the peptide 1132. In some cases, MHC molecules are coupled to support 1133. In some cases, the support 1133 is streptavidin (e.g., MHC molecules 1131 may comprise biotin). In other embodiments, the support 1133 is a polysaccharide, such as dextran. In some cases, reporter oligonucleotide 1140 may be coupled directly or indirectly to MHC labeling agent 1130 in any suitable manner, such as to MHC molecule 1131, support 1133, or peptide 1132.
In some embodiments, the labeling agent 1130 comprises a plurality of MHC molecules that can be coupled to a support (e.g., 1133), i.e., is an MHC multimer. There are many possible configurations of class I and/or class II MHC multimers that can be used with the compositions, methods, and systems disclosed herein, e.g., MHC tetramers, MHC pentamers (MHC assembled via coiled-coil domains, e.g., Pro5 MHC Class I Pentamers (ProImmune, Ltd.)), MHC octamers, MHC dodecamers, MHC-decorated dextran molecules (e.g., MHC Dextramer (Immudex)), and the like. For a description of exemplary labeling agents (including antibody and MHC-based labeling agents), reporter oligonucleotides, and methods of use, see, e.g., U.S. patent No. 10,550,429, U.S. patent No. 10,954,562, U.S. patent publication No. 20190367969, and U.S. patent application serial No. 63/135,514 filed on January 8, 2021, all of which are incorporated herein by reference in their entirety.
In some cases, the analysis of the one or more analytes (e.g., using the labeling agents described herein) includes a workflow as generally depicted in FIG. 12A. For example, in some embodiments, the cells are contacted with a labeling agent 1210 (e.g., a polypeptide (e.g., an antigen), an antibody, or a pMHC molecule or complex) conjugated to one or more reporter oligonucleotides 1220 and optionally further processed prior to barcoding. The optional processing steps may include one or more washing and/or cell sorting steps. In some cases, a cell bound to a labeling agent 1210 (e.g., a polypeptide, antibody, or pMHC molecule or complex) conjugated to an oligonucleotide 1220 and a support 1230 (e.g., a bead, such as a gel bead) comprising a nucleic acid barcode molecule 1290 are partitioned into a partition (e.g., a droplet of a droplet emulsion or a well of a microwell/nanowell array) among a plurality of partitions. In some cases, the partition comprises at most a single cell bound to the labeling agent 1210. In some embodiments, nucleic acid barcode molecule 1290 is attached to support 1230 via releasable bond 1240 (e.g., including labile bonds) as described elsewhere herein.
With continued reference to FIG. 12A, in some cases, a reporter oligonucleotide 1220 conjugated to a labeling agent 1210 (e.g., a polypeptide, an antibody, a pMHC molecule such as an MHC multimer, etc.) comprises a first adaptor sequence 1211 (e.g., a primer sequence), a barcode sequence 1212 identifying the labeling agent 1210 (e.g., the polypeptide, the antibody, or the peptide of a pMHC molecule or complex), and an adaptor sequence 1213. The adaptor sequence 1213 may be configured to hybridize to a complementary sequence, such as complementary sequence 1223 present on a nucleic acid barcode molecule 1290, such as those described elsewhere herein. In some cases, nucleic acid barcode molecules 1290 are attached to a support 1230 (e.g., a bead, such as a gel bead), such as those described elsewhere herein. For example, nucleic acid barcode molecules 1290 can be attached to support 1230 via releasable bonds 1240 (e.g., including labile bonds), such as those described elsewhere herein. In some cases, oligonucleotide 1220 comprises one or more additional functional sequences, such as those described above.
In some cases, the analysis of multiple analytes (e.g., RNA and one or more analytes using the labeling agents described herein) includes a workflow as generally depicted in FIGS. 12A-C. The cells are contacted with a labeling agent and processed as generally described above and depicted in FIG. 12A. For example, sequence 1213 can then be hybridized with complementary sequence 1223 to generate (e.g., via a nucleic acid reaction, such as nucleic acid extension or ligation) a barcoded nucleic acid molecule comprising cellular (e.g., partition-specific) barcode sequence 1222 (or its reverse complement) and reporter barcode sequence 1212 (or its reverse complement). Referring to FIGS. 12B-C, in some cases, nucleic acid molecules (such as RNA molecules) derived from cells can be similarly processed to append the cell (e.g., partition-specific) barcode sequence 1222 to these molecules or derivatives thereof (e.g., cDNA molecules). For example, referring to FIG. 12B, in some embodiments, primer 1250 comprises a sequence that is complementary to a sequence of RNA molecule 1260 (such as an RNA encoding a BCR sequence) from a cell. In some cases, primer 1250 comprises one or more adaptor sequences 1251 that are not complementary to RNA molecule 1260. In some cases, primer 1250 comprises a poly-T sequence. In some cases, primer 1250 comprises a sequence that is complementary to a target sequence in an RNA molecule. In some cases, primer 1250 comprises a sequence that is complementary to a region of an immune molecule (such as a constant region of a TCR or BCR sequence). Primer 1250 hybridizes to RNA molecule 1260, and cDNA molecule 1270 is generated in a reverse transcription reaction. In some cases, the reverse transcriptase is selected such that several non-templated bases 1280 (e.g., a poly-C sequence) are added to the cDNA.
Nucleic acid barcode molecule 1290 comprises a sequence 1224 complementary to the non-templated bases, and the reverse transcriptase performs a template switching reaction onto nucleic acid barcode molecule 1290 to generate a barcoded nucleic acid molecule comprising cellular (e.g., partition-specific) barcode sequence 1222 (or reverse complement thereof) and cDNA sequence 1270 (or a portion thereof). In another example, referring to FIG. 12C, in some embodiments, nucleic acid barcode molecule 1290 comprises a sequence 1223 that is complementary to the sequence of RNA molecule 1260 from a cell. In some cases, sequence 1223 comprises a sequence specific for an RNA molecule. In some cases, sequence 1223 comprises a poly-T sequence. In some cases, sequence 1223 comprises a sequence complementary to a region of an immune molecule (such as a constant region of a TCR or BCR sequence). Sequence 1223 hybridizes to RNA molecule 1260, and cDNA molecule 1270 is generated in a reverse transcription reaction, thereby generating a barcoded nucleic acid molecule comprising cellular (e.g., partition-specific) barcode sequence 1222 (or reverse complement thereof) and cDNA sequence 1270 (or a portion thereof). The barcoded nucleic acid molecules can then optionally be processed as described elsewhere herein, for example, to amplify the molecules and/or to append sequencing platform specific sequences to the fragments. See, for example, U.S. patent publication 20180105808, which is hereby incorporated by reference in its entirety. The barcoded nucleic acid molecules or derivatives generated therefrom can then be sequenced on a suitable sequencing platform.
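The template switching reaction described above can be followed at the sequence level with a small Python model. This is a simplified sketch under stated assumptions, not the disclosed method: the RNA is written in DNA letters, the whole transcript is copied, exactly three non-templated C bases are added, and the barcode molecule ends in three G bases that pair with them. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def template_switched_cdna(rna: str, primer_adaptor: str,
                           barcode_molecule: str) -> str:
    """Model the first-strand cDNA (5' to 3') after template switching.

    rna:              transcript in DNA letters, 5' to 3'
    primer_adaptor:   non-complementary 5' adaptor of the RT primer
    barcode_molecule: barcode oligonucleotide, 5' to 3', assumed to end in
                      GGG so that it pairs with the non-templated C bases
    """
    if not barcode_molecule.endswith("GGG"):
        raise ValueError("barcode molecule is assumed to end in GGG")
    # Reverse transcription copies the RNA into its reverse complement.
    first_strand = primer_adaptor + reverse_complement(rna)
    # Copying the barcode molecule appends its reverse complement, which
    # begins with the non-templated C bases.
    return first_strand + reverse_complement(barcode_molecule)
```

In this model the cell barcode appears in the cDNA as its reverse complement, which is why the text refers to the barcode sequence "or reverse complement thereof".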
In some embodiments, the analysis of multiple analytes (e.g., RNA and one or more analytes using a labeling agent described herein) includes a workflow as generally depicted in FIGS. 13A-C. For example, in some embodiments, the cells are contacted with a labeling agent 1210 (e.g., a polypeptide, antibody, or pMHC molecule or complex) conjugated to one or more reporter oligonucleotides 1220 and optionally further processed prior to barcoding. The optional processing steps may include one or more washing and/or cell sorting steps. In some cases, a cell bound to a labeling agent 1210 (e.g., a polypeptide (e.g., an antigen), an antibody, or a pMHC molecule or complex) conjugated to an oligonucleotide 1220 and a support 1330 (e.g., a bead, such as a gel bead) comprising nucleic acid barcode molecules 1310 and 1320 having a common barcode sequence 1312 are partitioned into a partition (e.g., a droplet of a droplet emulsion or a well of a microwell/nanowell array) among a plurality of partitions. In some cases, the partition comprises at most a single cell bound to the labeling agent 1210. In some embodiments, nucleic acid barcode molecules 1310 and 1320 are attached to support 1330 via releasable bonds 1340 (e.g., including labile bonds) as described elsewhere herein. The nucleic acid barcode molecule 1310 may comprise an adaptor sequence 1311, a barcode sequence 1312, and an adaptor sequence 1313. The nucleic acid barcode molecule 1320 may comprise an adaptor sequence 1321, a barcode sequence 1312, and an adaptor sequence 1323, wherein the adaptor sequence 1323 comprises a different sequence than the adaptor sequence 1313. In some cases, the adaptor 1311 and the adaptor 1321 comprise the same sequence. In some cases, the adaptor 1311 and the adaptor 1321 comprise different sequences.
Although support 1330 is shown to contain nucleic acid barcode molecules 1310 and 1320, any suitable number of barcode molecules that contain a common barcode sequence 1312 are contemplated herein. For example, in some embodiments, support 1330 further comprises a nucleic acid barcode molecule 1350. The nucleic acid barcode molecule 1350 can comprise an adaptor sequence 1351, a barcode sequence 1312, and an adaptor sequence 1353, wherein the adaptor sequence 1353 comprises a different sequence than the adaptor sequences 1313 and 1323. In some cases, the nucleic acid barcode molecule (e.g., 1310, 1320, 1350) comprises one or more additional functional sequences, such as UMI or other sequences described herein.
After partitioning, referring to FIG. 13B, in some embodiments, sequence 1213 is hybridized with a complementary sequence 1313 of nucleic acid barcode molecule 1310 to generate (e.g., via a nucleic acid reaction, such as nucleic acid extension or ligation) a barcoded nucleic acid molecule comprising cellular (e.g., partition-specific) barcode sequence 1312 (or its reverse complement) and reporter barcode sequence 1212 (or its reverse complement). Nucleic acid molecules (such as RNA molecules) derived from cells can be similarly processed to append the cell (e.g., partition-specific) barcode sequence 1312 to these molecules or derivatives thereof (e.g., cDNA molecules). For example, referring to FIG. 13C, in some embodiments, nucleic acid barcode molecule 1320 comprises sequence 1323 that is complementary to the sequence of RNA molecule 1260 from the cell. In some cases, sequence 1323 comprises a poly-T sequence. In other cases, sequence 1323 comprises a sequence that is complementary to a target sequence in an RNA molecule. In some cases, sequence 1323 comprises a sequence that is complementary to a region of an immune molecule (such as a constant region of a TCR or BCR sequence). Sequence 1323 hybridizes to RNA molecule 1260, and a barcoded cDNA molecule is generated in a reverse transcription reaction, the barcoded cDNA molecule comprising cellular (e.g., partition-specific) barcode sequence 1312 (or its reverse complement) and a cDNA sequence corresponding to RNA molecule 1260 (or a portion thereof). The barcoded nucleic acid molecules can then optionally be processed as described elsewhere herein, for example, to amplify the molecules and/or to append sequencing platform specific sequences to the fragments. See, for example, U.S. patent publication 20180105808, which is hereby incorporated by reference in its entirety. The barcoded nucleic acid molecules or derivatives generated therefrom can then be sequenced on a suitable sequencing platform.
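Because reporter-derived and RNA-derived molecules from the same partition carry the same barcode sequence, sequencing records can be linked per cell by grouping on that barcode. The Python sketch below illustrates the idea; the record layout `(cell_barcode, analyte_type, payload)`, the analyte labels "reporter" and "cdna", and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, str]  # (cell_barcode, analyte_type, payload)


def group_by_cell(records: Iterable[Record]) -> Dict[str, Dict[str, List[str]]]:
    """Group reporter and cDNA records that share a cell barcode.

    analyte_type is either "reporter" (payload names the labeling agent)
    or "cdna" (payload names the transcript); other values are rejected.
    """
    cells: Dict[str, Dict[str, List[str]]] = defaultdict(
        lambda: {"reporter": [], "cdna": []})
    for cell_barcode, analyte_type, payload in records:
        if analyte_type not in ("reporter", "cdna"):
            raise ValueError(f"unknown analyte type: {analyte_type}")
        cells[cell_barcode][analyte_type].append(payload)
    return dict(cells)
```

With records grouped this way, an antigen reporter detected in a cell can be read alongside the BCR or TCR transcripts recovered from the same cell.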
Nucleic acid sequences of interest can be identified from the sequence data. Such nucleic acid sequences of interest may be enriched from barcoded nucleic acid molecules or derivatives generated therefrom according to the methods disclosed herein.
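Enrichment with a primer directed to an identification sequence can be pictured as an in-silico filter over the pooled library. The following Python sketch is a simplified illustration that assumes exact matching, a single strand per library molecule, and no consideration of melting temperature or primer specificity; the function names and sequences are hypothetical.

```python
from typing import List, Sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_primer(identification_sequence: str) -> str:
    """Return a primer that anneals to a strand carrying the given sequence.

    The primer is simply the reverse complement of the identification
    sequence; real primer design would also weigh length, GC content,
    and off-target binding.
    """
    return reverse_complement(identification_sequence)


def enrich(library: Sequence[str], primer: str) -> List[str]:
    """Keep library molecules that contain the primer's binding site."""
    binding_site = reverse_complement(primer)
    return [molecule for molecule in library if binding_site in molecule]
```

Molecules lacking the identification sequence are left behind, which mirrors the selective amplification of the nucleic acid sequence of interest.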
Computer system
The present disclosure provides a computer system programmed to implement the methods of the present disclosure. FIG. 14 illustrates a computer system 1401 that is programmed or otherwise configured to (i) design nucleic acid primers as described herein, (ii) control amplification reactions as provided herein, (iii) perform cloning and/or expression of a nucleic acid sequence of interest and/or a protein product of a nucleic acid sequence of interest as provided herein, or (iv) analyze a protein product of a nucleic acid sequence of interest as provided herein. The computer system 1401 can regulate various aspects of the disclosure, such as the amount of primers, buffers, nucleic acid, or other reagents added to the amplification reaction, the thermal cycling of the amplification reaction, conditions for introducing the enriched nucleic acid sequence of interest into a vector, conditions for expressing a protein product of the nucleic acid sequence of interest, and/or the provision of reagents for experiments and/or the adjustment of conditions to analyze a protein product of the nucleic acid sequence of interest. The computer system 1401 may be a user's electronic device or a computer system located at a remote location relative to the electronic device. The electronic device may be a mobile electronic device.
The computer system 1401 includes a central processing unit (CPU, also referred to herein as a "processor" and a "computer processor") 1405, which may be a single-core processor or a multi-core processor, or a plurality of processors for parallel processing. The computer system 1401 also includes memory or memory locations 1410 (e.g., random access memory, read only memory, flash memory), an electronic storage unit 1415 (e.g., a hard disk), a communication interface 1420 (e.g., a network adapter) for communicating with one or more other systems, and peripheral devices 1425 such as cache, other memory, data storage, and/or electronic display adapters. The memory 1410, the storage unit 1415, the interface 1420, and the peripheral devices 1425 communicate with the CPU 1405 through a communication bus (solid lines) such as a motherboard. The storage unit 1415 may be a data storage unit (or data repository) for storing data. The computer system 1401 may be operatively coupled to a computer network ("network") 1430 by means of the communication interface 1420. The network 1430 may be the Internet, an intranet and/or an extranet, or an intranet and/or an extranet in communication with the Internet. Network 1430 is in some cases a telecommunications network and/or a data network. Network 1430 may include one or more computer servers that may support distributed computing, such as cloud computing. The network 1430 may in some cases, with the aid of the computer system 1401, implement a peer-to-peer network, which may enable devices coupled to the computer system 1401 to function as clients or servers.
The CPU 1405 may execute a sequence of machine-readable instructions, which may be embodied in a program or software. The instructions may be stored in a memory location, such as memory 1410. Instructions may be directed to CPU 1405, which may then program or otherwise configure CPU 1405 to implement the methods of the present disclosure. Examples of operations performed by CPU 1405 may include fetching, decoding, executing, and writing back.
CPU 1405 may be part of a circuit such as an integrated circuit. One or more other components of system 1401 may be included in the circuit. In some cases, the circuit is an Application Specific Integrated Circuit (ASIC).
The storage unit 1415 may store files such as drivers, libraries, and saved programs. The storage unit 1415 may store user data such as user preferences and user programs. In some cases, the computer system 1401 may include one or more additional data storage units located outside the computer system 1401, such as on a remote server in communication with the computer system 1401 via an intranet or the Internet.
The computer system 1401 may communicate with one or more remote computer systems over a network 1430. For example, the computer system 1401 may communicate with a remote computer system of a user (e.g., an operator). Examples of remote computer systems include personal computers (e.g., portable PCs), slate or tablet PCs (e.g., iPad, Galaxy Tab), telephones, smart phones (e.g., iPhone, Android-enabled device), or personal digital assistants. A user may access the computer system 1401 via the network 1430.
The methods as described herein may be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 1401, such as on the memory 1410 or the electronic storage unit 1415. The machine executable code or machine readable code may be provided in the form of software. During use, the code may be executed by the processor 1405. In some cases, the code may be retrieved from the storage unit 1415 and stored on the memory 1410 for ready access by the processor 1405. In some cases, electronic storage 1415 may be eliminated and machine executable instructions stored on memory 1410.
The code may be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or may be compiled during runtime. The code may be provided in a programming language that is selectable to enable execution of the code in a pre-compiled or compiled manner.
Aspects of the systems and methods provided herein, such as the computer system 1401, may be embodied in programming. Aspects of the technology may be considered "products" or "articles of manufacture", typically in the form of machine-executable code and/or associated data carried on or embodied in a type of machine-readable medium. The machine-executable code may be stored on an electronic storage unit, such as a memory (e.g., read only memory, random access memory, flash memory) or a hard disk. A "storage" medium may include any or all of the tangible memory of a computer, processor, etc., or its associated modules, such as various semiconductor memories, tape drives, disk drives, etc., which may provide non-transitory storage for software programming at any time. All or part of the software may at times be communicated over the Internet or various other telecommunications networks. Such communication may enable, for example, software to be loaded from one computer or processor into another computer or processor, for example, from a management server or host into a computer platform of an application server. Thus, another type of medium that may carry software elements includes optical, electrical, and electromagnetic waves, such as those used across physical interfaces between local devices, through wired and optical landline networks, and through various air links. Physical elements carrying such waves, such as wired or wireless links, optical links, etc., may also be considered as media carrying software. As used herein, unless restricted to non-transitory, tangible "storage" media, terms such as computer or machine "readable medium" refer to any medium that participates in providing instructions to a processor for execution.
Thus, a machine-readable medium, such as one carrying computer-executable code, may take many forms, including but not limited to a tangible storage medium, a carrier wave medium, or a physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer or the like, which may be used to implement the databases and the like shown in the drawings. Volatile storage media include dynamic memory, such as the main memory of such a computer platform. Tangible transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier wave transmission media can take the form of electrical or electromagnetic signals, or acoustic or light waves, such as those generated during Radio Frequency (RF) and Infrared (IR) data communications. Thus, common forms of computer-readable media include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, a cable or link transporting such a carrier wave, or any other medium from which a computer can read programming code and/or data. Many of these forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
The computer system 1401 may include or be in communication with an electronic display 1435 that includes a User Interface (UI) 1440 for providing, for example, an enrichment output, an analysis of protein products of a nucleic acid sequence of interest, and the like. Examples of UIs include, but are not limited to, graphical User Interfaces (GUIs) and Web-based user interfaces.
The methods and systems of the present disclosure may be implemented by one or more algorithms. The algorithm may be implemented by software when executed by the central processing unit 1405. The algorithm may, for example, control enrichment of the nucleic acid sequence of interest, control cloning of the nucleic acid sequence of interest, and/or access or analysis of the protein product of the nucleic acid sequence of interest.
The devices, systems, compositions, and methods of the present disclosure can be used in a variety of applications, such as processing a single analyte (e.g., RNA, DNA, or protein) or multiple analytes (e.g., DNA and RNA, DNA and protein, RNA and protein, or RNA, DNA and protein) from a single cell. For example, a biological particle (e.g., a cell or cell bead) is partitioned in a partition (e.g., a droplet) and multiple analytes from the biological particle are processed for subsequent processing. The plurality of analytes may be from a single cell. This can allow, for example, simultaneous proteomics, transcriptomics and genomic analysis of cells.
Examples
Example 1: generation of cDNA library of antibodies
Libraries of cDNA molecules encoding a variety of antibodies can be created. Multiple B cells expressing an antibody may be sequestered, for example, in partitions, where each partition has no more than one cell. The cells can be lysed or permeabilized, and the nucleic acid (e.g., RNA) encoding the antibody can be isolated and labeled with an identification sequence comprising a barcode, a template switching oligonucleotide, and a unique molecular identifier as described, for example, in fig. 12B or fig. 12C. The identification sequence may be used to identify the sample from which the given nucleic acid is derived. The RNA may be reverse transcribed to produce cDNA, and the cDNA may be pooled to produce a cDNA library.
Example 2: amplification of target cDNA of cDNA library
The cDNA of a cDNA library (e.g., a full transcriptome barcoded gene expression library) can be identified as the cDNA of interest. For example, the cDNA may be identified as corresponding to an antibody having the desired activity, or may specifically bind to or neutralize the antigen (e.g., using a labeling agent as described elsewhere herein).
The cDNA can be enriched using a PCR protocol. For example, the cDNA library can be incubated with a primer pair, a polymerase, nucleotides, and a buffer. The primer pair may comprise a first primer that is at least partially complementary to the barcode of one of the cDNA molecules of the library, and a second primer that is at least partially complementary to the complement of a sequence downstream of the barcode (e.g., in a constant region). The reaction may be thermally cycled for between 15 and 40 cycles, until the cDNA is enriched.
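The primer arrangement in Example 2 can be illustrated with a minimal in-silico sketch. It is not the protocol itself: it assumes exact string matching stands in for annealing, that the forward primer carries the barcode sequence as written on the sense strand, and that the reverse primer is the reverse complement of a downstream constant-region segment. All sequences and function names are made up for illustration.

```python
# Toy model of barcode-directed enrichment; every sequence here is hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_primer_pair(barcode: str, downstream_site: str) -> tuple[str, str]:
    """Forward primer matches the barcode; reverse primer anneals downstream."""
    forward = barcode
    reverse = reverse_complement(downstream_site)
    return forward, reverse


def is_amplifiable(molecule: str, forward: str, reverse: str) -> bool:
    """True if the forward site is followed by the reverse primer's binding site."""
    start = molecule.find(forward)
    if start < 0:
        return False
    site = reverse_complement(reverse)
    return molecule.find(site, start + len(forward)) >= 0


def enrich(library: list[str], forward: str, reverse: str) -> list[str]:
    """Keep only the library members that both primers can amplify."""
    return [m for m in library if is_amplifiable(m, forward, reverse)]


CONSTANT = "GCCTCCACCAAGGGCCCATC"      # toy constant-region segment
VDJ = "CAGGTGCAGCTGGTGCAGTCTGG"        # toy V(D)J segment shared by both molecules
BARCODE_A = "AAACCTGAGAAACCAT"         # barcode of the cDNA of interest
BARCODE_B = "TTTGGTCAGTCCGGAT"         # barcode of an unrelated cDNA

library = [
    BARCODE_A + "ACGTTGCAAC" + VDJ + CONSTANT,
    BARCODE_B + "GGATCCTTAA" + VDJ + CONSTANT,
]
fwd, rev = design_primer_pair(BARCODE_A, CONSTANT)
enriched = enrich(library, fwd, rev)   # only the BARCODE_A molecule remains
```

In this toy library both molecules share the same V(D)J and constant segments, so the barcode-matched forward primer is the only thing that distinguishes the molecule of interest.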
Example 3: further enrichment of target cDNA
The cDNA may be further enriched from the library, for example, to increase the abundance of the sequence of interest compared to other cDNA molecules in the library and/or the enriched product of example 2.
Further enrichment may be performed using a second PCR reaction. For example, the enriched cDNA from example 2 can be incubated with a primer pair, a polymerase, nucleotides, and a buffer. The primer pair may comprise a first primer that is at least partially complementary to a V (D) J region of one of the cDNA molecules of the library, and a second primer that is at least partially complementary to the complement of a sequence downstream of the V (D) J sequence (e.g., in a constant region). The reaction may be thermally cycled for between 15 and 40 cycles, until the cDNA is enriched.
Example 4: nested PCR
A dual (e.g., nested) PCR strategy can be employed to enrich for nucleic acid sequences of interest. An example of a nested PCR protocol is shown in FIG. 10. In this example, a cDNA molecule of interest is shown that comprises a first read sequence, a barcode sequence (identified as a "10X barcode sequence," which may be a partition-specific barcode), a unique molecular identifier (UMI) sequence, a template switching oligonucleotide (TSO) sequence, a V sequence, a D sequence, a J sequence, a constant (C) sequence, and a second read sequence. Primers can be designed to enrich for the sequences of antibodies (i.e., the V, D, J and C sequences) using a dual PCR strategy. This dual PCR strategy can employ a first enrichment step and a second enrichment step. The outer F (forward) primer and the outer R (reverse) primer may be used in the first PCR enrichment step to enrich, from the plurality of cDNA molecules, a cDNA molecule comprising the barcode. The outer F primer may comprise a sequence complementary to the barcode, and the outer R primer may comprise a sequence complementary to the second read sequence. The inner F (forward) primer and the inner R (reverse) primer may be used in the second enrichment step to further enrich the product of the first enrichment step. The inner F primer may be complementary to the V sequence and, optionally, to part of the TSO sequence. The inner R primer may be complementary to the C sequence and the J sequence. In some cases, the inner F primer and the inner R primer may include non-binding handles, which may allow cloning into a vector or pairing of sequences (e.g., using overlap extension).
An example of a primer design scheme for the first enrichment step and the second enrichment step is provided in FIG. 15. Here, the primer shown for the first enrichment step may be used for the first PCR reaction, and the primer shown for the second enrichment step may be used for the second PCR reaction.
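The two-step logic of Example 4 can be sketched in silico as follows. This is a simplified illustration under stated assumptions: exact substring matching stands in for primer annealing, the segment sequences and the `pcr_product` and `nested_pcr` helpers are hypothetical, and the molecule layout (read 1, barcode, UMI, TSO, V, D, J, C, read 2) follows the description of FIG. 10.

```python
# Toy nested PCR: outer primers select by barcode, inner primers trim to V-D-J-C.
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def pcr_product(template: str, fwd: str, rev: str) -> Optional[str]:
    """Return the amplicon bounded by both primer sites, or None if either is absent."""
    start = template.find(fwd)
    if start < 0:
        return None
    site = reverse_complement(rev)              # where the reverse primer anneals
    pos = template.find(site, start + len(fwd))
    if pos < 0:
        return None
    return template[start:pos + len(site)]


def nested_pcr(template: str, outer: tuple, inner: tuple) -> Optional[str]:
    """Run the outer reaction, then the inner reaction on its product."""
    first = pcr_product(template, *outer)
    return None if first is None else pcr_product(first, *inner)


# Hypothetical segments, for illustration only.
READ1, READ2 = "CTACACGACGCTCTTCCGATCT", "AGATCGGAAGAGCACACGTCTG"
UMI, TSO = "ACGTTGCAAC", "TTTCTTATATGGG"
V, D, J = "CAGGTGCAGCTGGTGCAGTCTGG", "GGTATAGCAGC", "TGGGGCCAGGGAACC"
C = "GCCTCCACCAAGGGCCCATC"
BARCODE_A, BARCODE_B = "AAACCTGAGAAACCAT", "TTTGGTCAGTCCGGAT"


def molecule(barcode: str) -> str:
    return READ1 + barcode + UMI + TSO + V + D + J + C + READ2


mol_a, mol_b = molecule(BARCODE_A), molecule(BARCODE_B)

outer = (BARCODE_A, reverse_complement(READ2))        # barcode / second read
inner = (V, reverse_complement(J[-8:] + C[:12]))      # V / J-C boundary

product_a = nested_pcr(mol_a, outer, inner)   # V + D + J + first 12 nt of C
product_b = nested_pcr(mol_b, outer, inner)   # None: wrong barcode, lost in round 1
inner_only_b = pcr_product(mol_b, *inner)     # not None: inner primers alone cannot tell
```

The last line shows why the barcode-directed outer step matters in this toy model: the inner primers alone would also amplify a molecule from a different partition that happens to share the same V and J/C sequences.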
Example 5: generation of a clonable sequence
The sequence of the nucleic acid sequence of interest can be extracted to produce a clonable sequence. For example, the primers can be designed to generate a clonable sequence (e.g., a sequence encoding an amino acid fragment) from an enriched cDNA (e.g., from the enriched library of example 4). An example of a primer design that can produce a clonable sequence is provided in FIG. 16. This can be accomplished, for example, by using a forward primer specific for the V gene (e.g., specific for the V sequence) and a reverse primer specific for the constant sequence. The resulting nucleic acid molecules shown in the bottom panel may be cloned into vectors for expression or analysis. The expression vector may be configured to comprise a constant region sequence (or a portion thereof) such that when cloned into the expression vector, the enriched V (D) J molecules, such as paired antibody light and heavy chains, may be expressed as a fully functional immune molecule (e.g., comprising an intact constant region).
Example 6: cloning of the enriched nucleic acid sequence
B cells (e.g., single B cells) can be captured (e.g., separated together with barcoded beads), for example, using the techniques provided herein. The interior of the cell may be accessed, for example, by lysing or permeabilizing the cell, and the RNA of the cell may be reverse transcribed to generate barcoded cDNA from the RNA sequence. See, for example, fig. 12B or fig. 13C and accompanying text. This may be performed, for example, by 2 rounds of targeted amplification; the first amplified or second amplified or non-fragmented full-length cDNA can be used in the following steps. In some embodiments, the partition may comprise a cell barcode and a TSO sequence. In some embodiments, the partition may comprise a cell barcode, a UMI sequence as provided herein, and a TSO sequence as provided herein.
The resulting nucleic acid sequence (e.g., full length or a fragment thereof) can be sequenced. Sequencing can produce one or more paired heavy and light chain sequences (e.g., heavy and light chain sequence pairs) associated with a specific cell barcode. Some of the input cDNA subjected to targeted amplification may be saved for later use (e.g., for capture of specific input cDNAs or other uses).
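As an illustration of how heavy and light chain sequences might be associated through a shared cell barcode, the following sketch groups sequenced contigs by barcode and keeps barcodes with exactly one heavy and one light chain. The record layout `(barcode, chain, sequence)`, the one-of-each pairing rule and the function name are assumptions made for the example, not part of the described method.

```python
# Hypothetical pairing of heavy and light chains by cell barcode.
from collections import defaultdict


def pair_chains(contigs):
    """Map each barcode to a (heavy, light) tuple when exactly one of each is present."""
    by_barcode = defaultdict(lambda: {"heavy": [], "light": []})
    for barcode, chain, sequence in contigs:
        if chain not in ("heavy", "light"):
            raise ValueError(f"unexpected chain type: {chain!r}")
        by_barcode[barcode][chain].append(sequence)

    pairs = {}
    for barcode, chains in by_barcode.items():
        # Barcodes with missing or multiple chains are skipped in this sketch.
        if len(chains["heavy"]) == 1 and len(chains["light"]) == 1:
            pairs[barcode] = (chains["heavy"][0], chains["light"][0])
    return pairs


contigs = [
    ("AAACCTGAGAAACCAT", "heavy", "CAGGTGCAGCTG"),
    ("AAACCTGAGAAACCAT", "light", "GACATCCAGATG"),
    ("TTTGGTCAGTCCGGAT", "heavy", "GAGGTGCAGCTG"),   # no light chain recovered
]
pairs = pair_chains(contigs)
```

Only the first barcode yields a pair here; the second is left out because no light chain was observed for it.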
One or more probes can be designed to target one or more V (D) J junction regions, each of which can comprise a highly unique nucleotide sequence 60-150 base pairs in length. See, for example, FIG. 19. Similarly, one or more probes may be designed to target a corresponding cell barcode, or a cell barcode and a selected UMI sequence. These probes can be captured, for example, using a streptavidin/biotinylation method, in which the probes are annealed to the cDNA and fragments that are not annealed to a probe are washed away. In some embodiments, other suitable capture techniques may also be employed. In some cases, the probe may be fluorescent, which may enable droplet sorting. In some such cases, adding probe reagents to droplets and annealing them to the nucleic acids already present can enable selection of droplets of interest for further amplification or cloning. In some cases, a hydrogel may be selectively formed in the droplet containing the probe of interest. Such hydrogels can be used as part of the enrichment step. In addition to, or instead of, the junction-specific probes, probes may be used to target specific V genes or J genes.
Using the retained DNA of interest (containing the sequences to which the probes successfully annealed), specific heavy and light chains can be amplified, for example, by one or more rounds of PCR or linear amplification. Amplification may include using a forward primer that targets one of the following: the cell barcode, UMI, 5' UTR, and leader sequence; the cell barcode, UMI, and 5' UTR; the cell barcode and UMI; the cell barcode and 5' UTR; or a region of the V gene (such as a framework region); and using a reverse primer that targets the constant region of the targeted antibody; or a combination thereof. Primers may contain overlap-extension linkers to physically link the targeted heavy and light chains, or may introduce restriction sites or Gibson assembly sites to facilitate cloning.
In some cases, for example if the number of antibodies targeted is large, a unique set of overlapping extension or linker molecules can be designed in a plate-based reaction. Such overlapping extension or linker molecules may be used to introduce clone-specific molecular tags.
Example 7: specific enrichment of BCR sequences from pooled cDNA libraries
Sample selection: a library of BCR-enriched products from cells selected by PE+/APC+ gating was used as template in the nested PCR reaction. Two negative controls were included to verify product specificity: BCR enriched product in antigen negative cells from the same donor as the sample used to generate the target clone, and BCR enriched product from purified B cells from a different donor.
Clone selection for amplification:
Nested PCR reactions were performed on the BCR enrichment products (and negative controls) to enrich the library for sequences, e.g., antibody sequences, of four antigen-specific clonotypes. The antigen-specific clonotypes were selected to fall into one of four categories: (1) an expanded clonotype with multiple distinct subclones (clone A); (2) an expanded clonotype with a single unique subclone (clone B); (3) a single-cell clonotype with many effective UMIs (clone C); and (4) a single-cell clonotype with few effective UMIs (clone D).
Primer design:
Commercially available software (Geneious Prime, Primer3) was used to generate primer sequences free of typical sequence drawbacks, such as high hairpin Tm, self-dimer Tm and pair-dimer Tm. Primers for the nested PCR reaction were designed to target: (1) in the external reaction, the cell barcode and UMI (forward primer) and the isotype and J region (reverse primer); and (2) in the internal reaction, the leader peptide or FWR1 (forward primer) and the CDR3/junction (potentially extending into the J region; reverse primer), if necessary. The primer pairs were selected based on the compatibility of the inner pair and the outer pair.
The default settings of Geneious Prime 2021.1.1 were used, with Primer3 Tm settings as described by SantaLucia (1998) and salt correction settings as described by Owczarzy et al. (2004). Monovalent salt, divalent salt, oligonucleotide and dNTP concentrations were set to 50 mM, 1.5 mM, 50 nM and 0.6 mM, respectively. Each primer was allowed a minimum size of 18 nucleotides, a maximum size of 27 nucleotides, and an optimal length of 20 nucleotides. The minimum, optimum and maximum Tm of each primer were set to 57 ℃, 60 ℃ and 63 ℃, respectively. The minimum, maximum and optimum allowed GC% content were set to 20%, 80% and 50%, respectively. The maximum allowable dimer Tm was 47 ℃. The maximum allowable Tm difference was 100 ℃.
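The size, Tm and GC% bounds listed above can be expressed as a simple candidate filter. The sketch below is illustrative only and does not reproduce Geneious Prime or Primer3: it reads the three Tm values as minimum 57 ℃, optimum 60 ℃ and maximum 63 ℃ (the only ordering in which the optimum lies between the bounds), takes the Tm as an input computed elsewhere rather than implementing a nearest-neighbor model, and uses hypothetical names throughout.

```python
# Hypothetical filter applying the stated primer bounds; Tm is supplied by the caller.
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    minimum: float
    optimum: float
    maximum: float

    def contains(self, value: float) -> bool:
        return self.minimum <= value <= self.maximum


SIZE_NT = Bounds(18, 20, 27)       # primer length in nucleotides
TM_CELSIUS = Bounds(57, 60, 63)    # assumed ordering: min, optimum, max
GC_PERCENT = Bounds(20, 50, 80)


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def violations(seq: str, tm_celsius: float) -> list[str]:
    """List which of the three bounds a candidate primer fails."""
    problems = []
    if not SIZE_NT.contains(len(seq)):
        problems.append("size")
    if not GC_PERCENT.contains(gc_percent(seq)):
        problems.append("gc")
    if not TM_CELSIUS.contains(tm_celsius):
        problems.append("tm")
    return problems


ok = violations("ACGTACGTACGTACGTACGT", tm_celsius=60.0)       # 20 nt, 50% GC
bad = violations("ATATATATATATATATAT", tm_celsius=45.0)        # 18 nt, 0% GC
```

A fuller implementation would also score distance from each optimum and check hairpin and dimer Tm, which are left out here.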
Amplification reaction:
The external primer sequences targeting the cell barcode + UMI and the framework 4 region were used for the first round of PCR for each of the 4 clones described above. Each amplification reaction contained, in a total volume of 100 uL, 10 nM of each primer, 1 uL of BCR-enriched product, 25 uM betaine (to increase polymerase processivity) and 50 uL of 2X hot-start high-fidelity PCR master mix. The reactions were amplified for a total of 10 cycles, with an annealing temperature appropriate for the primer pair used (51-54 ℃) and a 1-minute extension at 72 ℃. Each reaction was cleaned up with 0.6X SPRIselect, and the entire volume was carried into the second PCR. The second reaction contained, in a total volume of 100 uL, 10 nM of each inner primer (targeting the leader and framework 4/constant region), the product amplified in the first PCR, 25 uM betaine and 50 uL of 2X hot-start high-fidelity PCR master mix. The reactions were amplified for a total of 10 cycles, with an annealing temperature appropriate for the primer pair used (54 ℃) and a 1-minute extension at 72 ℃. Each reaction was cleaned up with 0.6X SPRIselect.
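The quantities in the reaction setup can be checked with a short calculation. The sketch below works out the primer amount per reaction, the final master mix strength, and an upper bound on amplification for two rounds of 10 cycles. The fold values assume ideal doubling in every cycle and no loss during cleanup, which real reactions will not achieve; the function names are hypothetical.

```python
# Back-of-the-envelope arithmetic for the two 10-cycle reactions.

def primer_amount_fmol(conc_nM: float, volume_uL: float) -> float:
    """nM x uL = (1e-9 mol/L) x (1e-6 L) = 1e-15 mol, i.e. femtomoles."""
    return conc_nM * volume_uL


def final_strength(stock_x: float, stock_volume_uL: float, total_volume_uL: float) -> float:
    """Final concentration factor after diluting a concentrated stock."""
    return stock_x * stock_volume_uL / total_volume_uL


def ideal_fold(cycles: int, efficiency: float = 1.0) -> float:
    """Fold amplification after `cycles` cycles; efficiency 1.0 means perfect doubling."""
    return (1.0 + efficiency) ** cycles


primer_fmol = primer_amount_fmol(10, 100)        # 1000 fmol = 1 pmol of each primer
mix_strength = final_strength(2, 50, 100)        # 1.0, i.e. 1X master mix
fold_per_round = ideal_fold(10)                  # 1024
fold_two_rounds = fold_per_round ** 2            # 1048576, about a million-fold
fold_realistic = ideal_fold(10, efficiency=0.9)  # lower than 1024 at 90% efficiency
```

Under these idealized assumptions, the two rounds together give roughly a million-fold amplification of any template that both primer pairs match.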
Nested PCR products were run on a BioA and/or LabChip instrument to assess product size and specificity. This confirmed specific products, present only in antigen-positive B cells, for clones B (FIG. 21B), C (FIG. 21C) and D (FIG. 21D). The product of clone C varied more in size and also appeared in the negative control, indicating more non-specific amplification for this clone. The results also show several products for clone A (FIG. 21A), as expected given that clonotype A is associated with multiple distinct subclones.
In addition, a single-step PCR, or only the second step of the nested PCR, was performed using the same conditions and primers and analyzed as before. Significantly more non-specific products were observed for all clones tested, confirming the advantage of the nested approach in terms of improved specificity. See, for example, FIGS. 20A and 20B.
Sequencing:
Finally, full-length enriched clone sequences were converted to sequencing libraries using a Prism library preparation kit commercially available from IDT and sequenced on a paired-end 300 bp MiSeq run to determine sequence purity.
FIG. 22 shows the sequencing results for the enriched product after nested amplification of a nucleic acid sequence of interest (e.g., a target nucleic acid sequence encoding a fragment of the BCR produced by clone A (an expanded clonotype with multiple subclones)) from a pooled barcoded cDNA library when the forward external primer lacks sufficient specificity. The consensus region from positions 254-284 depicts the cell barcode + UMI region (indicated by circling) targeted by the forward external primer. As shown, the consensus sequence of the cell barcode + UMI region has several variant positions, indicating poor specificity of the forward external primer for the selected barcode/UMI combination. The results indicate recovery of off-target sequences due to off-target binding to cDNA library members with multiple cell barcode/UMI combinations.
FIG. 23 shows the sequencing results for the enriched product after nested amplification of the nucleic acid sequence of interest (e.g., a target nucleic acid sequence encoding a fragment of the BCR produced by clone C (a single-cell clonotype with many effective UMIs)) from a pooled barcoded cDNA library when the forward external primer lacks sufficient specificity. As shown (circled in the consensus sequence), the cell barcode + UMI region largely lacks a consensus sequence, indicating poor specificity of the forward external primer for the selected barcode/UMI combination. The consensus sequence of the BCR sequence of interest has two variant positions in the CDR3 region, which indicates recovery of off-target sequences due to binding to cDNA library members with multiple cell barcode/UMI combinations.
Similarly, sequencing results for the enriched product generated for clone D (a single-cell clonotype with few effective UMIs) after nested amplification from the pooled barcoded cDNA library indicated that the forward external primer lacked specificity for the cell barcode + UMI combination, and that off-target sequences were recovered (data not shown).
FIG. 24 shows the sequencing results for the enriched product after nested amplification of the nucleic acid sequence of interest (e.g., a target nucleic acid sequence encoding a fragment of the BCR produced by clone B (an expanded clonotype with a single unique subclone)) from a pooled barcoded cDNA library when the forward external primer binds to the cell barcode and UMI with sufficient specificity. As shown (circled in the consensus sequence), the consensus sequence of the cell barcode + UMI region has a single variant position, confirming that the forward external primer binds to the cell barcode and UMI with sufficient specificity. Also as shown, the consensus sequence of the BCR fragment has no variant positions, indicating successful recovery of the full sequence of interest from the barcoded cDNA library using the nested amplification method when the forward external primer binds to the cell barcode and UMI with sufficient specificity.
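The notion of "variant positions" in a consensus sequence, used above to judge primer specificity, can be illustrated with a small sketch. It assumes reads that are already aligned to equal length and flags a column as variant when the most common base accounts for less than a chosen fraction of reads. The 0.9 threshold, the toy reads and the function name are assumptions for illustration and do not describe how the figures were generated.

```python
# Hypothetical consensus caller over pre-aligned, equal-length reads.
from collections import Counter


def consensus_and_variants(reads, min_fraction=0.9):
    """Return the per-column majority sequence and the indices of variant columns."""
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be aligned to the same length")

    consensus = []
    variant_positions = []
    for i in range(length):
        counts = Counter(read[i] for read in reads)
        base, count = counts.most_common(1)[0]
        consensus.append(base)
        # A column is "variant" when the majority base is not dominant enough.
        if count / len(reads) < min_fraction:
            variant_positions.append(i)
    return "".join(consensus), variant_positions


# Toy data: a clean region versus a region with one mixed column.
clean_reads = ["ACGT", "ACGT", "ACGT", "ACGT"]
mixed_reads = ["ACGT", "ACGT", "ACGT", "ACTT"]

clean_consensus, clean_variants = consensus_and_variants(clean_reads)
mixed_consensus, mixed_variants = consensus_and_variants(mixed_reads)
```

In this sketch a region with no variant columns corresponds to a single recovered sequence, while variant columns suggest that more than one template contributed to the product.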
While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited to the specific examples provided within the specification. While the invention has been described with reference to the foregoing specification, the descriptions and illustrations of the embodiments herein are not intended to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it should be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the present invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims (125)

1. A method for enriching a nucleic acid sequence of interest, comprising:
(a) Providing a plurality of nucleic acid molecules comprising a plurality of identification sequences, wherein a nucleic acid molecule of the plurality of nucleic acid molecules comprises:
(i) An identification sequence of said plurality of identification sequences identifying said nucleic acid molecule,
wherein the identification sequence comprises a barcode sequence, a Template Switching Oligonucleotide (TSO) sequence, and/or a Unique Molecular Identifier (UMI) sequence; and
(ii) The nucleic acid sequence of interest,
wherein the nucleic acid sequence of interest comprises a nucleic acid sequence encoding a T Cell Receptor (TCR), a B Cell Receptor (BCR), or a fragment thereof;
(b) Amplifying the nucleic acid sequence of interest in a first round of amplification using a first nucleic acid primer complementary to at least a portion of the identification sequence;
(c) Further amplifying the nucleic acid sequence of interest in a second round of amplification using a second nucleic acid primer complementary to at least a portion of the V (D) J sequence.
2. The method of claim 1, wherein the identification sequence comprises the barcode sequence.
3. The method of claim 1 or 2, wherein the identification sequence comprises the UMI sequence.
4. The method of claim 1 or claim 3, wherein the first nucleic acid primer comprises a sequence complementary to at least a portion of the UMI sequence.
5. The method of any preceding claim, wherein the first nucleic acid primer comprises a sequence complementary to at least a portion of the barcode sequence.
6. The method of any preceding claim, wherein the first round of amplification further comprises using a third primer comprising a sequence complementary to at least a portion of the complement of a nucleic acid sequence encoding a constant region of the TCR, BCR, or fragment thereof.
7. The method of any preceding claim, wherein the first round of amplification further comprises using a third primer comprising a sequence complementary to at least a portion of a sequence complementary to a nucleic acid sequence encoding a constant region of the BCR or fragment thereof.
8. The method of any preceding claim, wherein the second round of amplification further comprises using a fourth primer comprising a sequence complementary to at least a portion of the complement of a nucleic acid sequence encoding a sequence downstream of the V (D) J sequence.
9. The method of any preceding claim, wherein the second primer comprises a non-binding handle.
10. The method of claim 8 or claim 9, wherein the fourth primer comprises a non-binding handle.
11. The method of any preceding claim, wherein the plurality of provided nucleic acid molecules comprises complementary deoxyribonucleic acid (cDNA) molecules.
12. The method of any preceding claim, wherein the nucleic acid sequence of interest encodes the BCR or fragment thereof.
13. A method for enriching a nucleic acid sequence of interest, comprising:
(a) Providing a plurality of nucleic acid molecules comprising a plurality of identification sequences, wherein a nucleic acid molecule of the plurality of nucleic acid molecules comprises (i) an identification sequence of the plurality of identification sequences that identifies the nucleic acid molecule and (ii) the nucleic acid sequence of interest; and
(b) Amplifying the nucleic acid sequence of interest using a nucleic acid primer complementary to the identification sequence, thereby enriching the nucleic acid sequence of interest.
14. The method of claim 13, wherein the providing comprises generating the plurality of nucleic acid molecules comprising the plurality of identification sequences that identify the plurality of nucleic acid molecules.
15. The method of claim 13, wherein the plurality of nucleic acid molecules corresponds to a plurality of cell surface proteins from the plurality of cells.
16. The method of claim 13, wherein the plurality of cells comprises a plurality of T cells.
17. The method of claim 15, wherein the plurality of cell surface proteins comprises a plurality of T cell receptors.
18. The method of claim 15, wherein the plurality of cells comprises a plurality of B cells.
19. The method of claim 18, wherein the plurality of cell surface proteins comprises a plurality of B cell receptors.
20. The method of claim 13, wherein the plurality of nucleic acid molecules comprises a library of nucleic acid molecules encoding a plurality of variants of an amino acid sequence.
21. The method of claim 13, wherein the plurality of variants of the amino acid sequence are variants of a T cell receptor.
22. The method of claim 13, wherein the plurality of variants of the amino acid sequence are variants of an antibody or antigen binding fragment thereof.
23. The method of claim 13, wherein the plurality of nucleic acid molecules comprises complementary deoxyribonucleic acid (cDNA) molecules.
24. The method of claim 13, wherein the nucleic acid sequence of interest comprises a nucleic acid sequence encoding a T cell receptor or fragment thereof.
25. The method of claim 13, wherein the nucleic acid sequence of interest comprises a nucleic acid sequence encoding an antibody or antigen-binding fragment thereof.
26. The method of claim 13 or claim 25, wherein the nucleic acid sequence of interest comprises a nucleic acid sequence encoding a V (D) J sequence.
27. The method of claim 13, wherein the identification sequence comprises a barcode sequence.
28. The method of claim 27, wherein the nucleic acid sequence complementary to the identification sequence is complementary to at least a portion of the barcode sequence.
29. The method of claim 27, wherein the nucleic acid sequence complementary to the identification sequence is complementary to the barcode sequence and a read sequence of the nucleic acid sequence of interest.
30. The method of claim 13, wherein the identification sequence comprises a template switching oligonucleotide sequence.
31. The method of claim 30, wherein the nucleic acid sequence complementary to the identification sequence is complementary to at least a portion of the template switch oligonucleotide sequence.
32. The method of claim 13, wherein the identification sequence comprises a unique molecular identifier sequence.
33. The method of claim 32, wherein the nucleic acid sequence complementary to the identification sequence is complementary to at least a portion of the unique molecular identifier sequence.
34. The method of claim 13, wherein the nucleic acid primer further comprises a nucleic acid sequence complementary to a portion of the coding sequence.
35. The method of claim 26, wherein the nucleic acid primer further comprises a nucleic acid sequence complementary to at least a portion of the V (D) J sequence.
36. The method of claim 35, wherein the nucleic acid sequence complementary to at least a portion of the V (D) J sequence is complementary to a V sequence of the V (D) J sequence.
37. The method of claim 13, wherein the nucleic acid primer further comprises a non-binding handle.
38. The method of claim 13 or claim 37, further comprising amplifying the nucleic acid sequence of interest using another nucleic acid primer, wherein the other nucleic acid primer is different from the nucleic acid primer.
39. The method of claim 38, wherein the another nucleic acid primer comprises a non-binding handle.
40. The method of claim 38, wherein the nucleic acid primer and the another nucleic acid primer are configured to anneal to sequences flanking at least a portion of the nucleic acid sequence of interest.
41. The method of claim 13, further comprising a polymerase chain reaction to further enrich the nucleic acid sequences of the plurality of nucleic acid molecules.
42. The method of claim 13, wherein no other nucleic acid molecules of the plurality of nucleic acid molecules are amplified.
43. The method of claim 13, wherein the nucleic acid sequence of interest is enriched at least 1000-fold, at least 10,000-fold, or at least 100,000-fold.
44. The method of claim 13, further comprising contacting the plurality of nucleic acid sequences with a second nucleic acid primer, wherein the nucleic acid primer comprises a nucleic acid sequence that is complementary to a binding sequence on a complementary sequence of the nucleic acid sequence of interest.
45. The method of claim 44, wherein the second nucleic acid primer further comprises a non-binding handle.
46. The method of claim 44, wherein the nucleic acid sequence of interest comprises a sequence encoding an antibody or antigen-binding fragment thereof.
47. The method of claim 46, wherein the second nucleic acid primer is complementary to at least a portion of the complement of a nucleic acid sequence encoding a V (D) J sequence of the antibody or antigen binding fragment thereof.
48. The method of claim 46, wherein the second nucleic acid primer is complementary to at least a portion of the complement of a nucleic acid sequence encoding the constant region of the antibody or antigen binding fragment thereof.
49. The method of claim 48, wherein the second nucleic acid primer is further complementary to at least a portion of the complement of a nucleic acid sequence encoding the J region of the antibody or antigen binding fragment thereof.
50. The method of claim 13, further comprising cloning the nucleic acid sequence of interest into a vector.
51. The method of claim 50, wherein the vector is selected from the group consisting of viral vectors, plasmids, phages, cosmids, and artificial chromosomes.
52. The method of claim 50, wherein the cloning comprises combining two or more nucleic acid sequences.
53. The method of claim 52, wherein the two or more nucleic acid sequences comprise a nucleic acid sequence of a heavy chain and a nucleic acid sequence of a light chain of an antibody or antigen binding fragment.
54. The method of claim 52, wherein the two or more nucleic acid sequences comprise a nucleic acid sequence of an alpha chain of a T cell receptor and a nucleic acid sequence of a beta chain of a T cell receptor.
55. The method of claim 50, wherein the vector comprises at least a portion of a constant region of a T cell receptor.
56. The method of claim 50, wherein the vector comprises at least a portion of the constant region of an antibody or antigen binding fragment thereof.
57. The method of claim 50, wherein the vector comprises a promoter.
58. The method of claim 13, further comprising determining the level of enrichment of the nucleic acid sequence of interest.
59. The method of claim 58, further comprising performing a second round of amplification to further enrich the nucleic acid sequence of interest.
60. The method of claim 59, comprising contacting the nucleic acid sequence of interest with a third primer and a fourth primer.
61. The method of claim 60, wherein the third primer is complementary to a portion of a barcode of the identification sequence.
62. The method of claim 61, wherein the third primer is complementary to the 5' end of the barcode of the identification sequence.
63. The method of claim 62, wherein the third primer is further complementary to a read sequence.
64. The method of claim 60, wherein the third primer is complementary to a portion of the identification sequence upstream of the barcode of the identification sequence.
65. The method of claim 60, wherein the fourth primer is complementary to at least a portion of the complementary sequence of the constant region of the nucleic acid sequence of interest.
66. The method of claim 13, further comprising performing fragmentation of the nucleic acid sequence of interest.
67. The method of claim 13, further comprising adding an A-tail to the nucleic acid sequence of interest.
68. The method of claim 13, further comprising performing SI-PCR on the nucleic acid sequence of interest.
69. The method of claim 13, further comprising V (D) J enrichment of the nucleic acid sequence of interest.
70. The method of claim 13, wherein the nucleic acid sequence of interest comprises a restriction site.
71. A method, comprising:
enriching a nucleic acid sequence of interest based at least on at least a portion of a constant region of the nucleic acid sequence of interest, thereby producing an enriched nucleic acid sequence of interest; and
modifying the enriched nucleic acid sequence to produce a modified enriched nucleic acid sequence compatible with the vector.
72. The method of claim 71, wherein the enriching is performed using a first nucleic acid primer and a second nucleic acid primer.
73. The method of claim 72, wherein one of the first nucleic acid primer and the second nucleic acid primer is a framework region 1 (FWR1) primer.
74. The method of claim 72, wherein the first nucleic acid primer is at least complementary to a barcode or portion thereof on the nucleic acid sequence of interest.
75. The method of claim 72, wherein the first nucleic acid primer is complementary to at least a unique molecular identifier (UMI) sequence on the nucleic acid sequence of interest or a portion thereof.
76. The method of claim 72, wherein the first nucleic acid primer is complementary to at least a 5' untranslated region (5' UTR) or a portion thereof on the nucleic acid sequence of interest.
77. The method of claim 72, wherein the second nucleic acid primer is complementary to at least a constant region or portion thereof on the nucleic acid sequence of interest.
78. The method of claim 72, wherein the second nucleic acid primer is at least complementary to a J sequence or portion thereof on the nucleic acid sequence of interest.
79. The method of claim 72, wherein the second nucleic acid primer is complementary to at least a nucleic acid sequence of a junction region or portion thereof on the nucleic acid sequence of interest.
80. The method of claim 71, wherein the enriching is performed using hybridization capture.
81. The method of claim 80, wherein the hybridization capture is based on hybridization of a primer to a junction sequence on the nucleic acid sequence of interest.
82. The method of claim 81, wherein the junction sequence is a V(D)J sequence or a portion thereof.
83. The method of claim 72, wherein the nucleic acid primers are selected based on Rapid Amplification of cDNA Ends (RACE) sequencing.
84. The method of claim 71, wherein the nucleic acid sequence of interest comprises a complementary deoxyribonucleic acid (cDNA) molecule.
85. The method of claim 71, wherein the nucleic acid sequence of interest further comprises a barcode.
86. The method of claim 71, wherein the nucleic acid sequence of interest encodes at least a portion of a cell surface protein of a cell.
87. The method of claim 86, wherein the cell surface protein is a T cell receptor or fragment thereof.
88. The method of claim 86, wherein the cell surface protein is a B cell receptor or fragment thereof.
89. The method of claim 86, wherein the cell is a T cell.
90. The method of claim 86, wherein the cell is a B cell.
91. The method of claim 71, wherein the constant region of the nucleic acid sequence of interest comprises a nucleic acid sequence encoding a V(D)J sequence or a portion thereof.
92. The method of claim 91, wherein the portion is a V sequence.
93. The method of claim 91, wherein the portion is a J sequence.
94. The method of claim 71, wherein the modification comprises adding Gibson ends to the amplified nucleic acid sequence.
95. The method of claim 94, wherein the adding of the Gibson ends is performed using Polymerase Chain Reaction (PCR).
96. The method of claim 71, wherein the modification comprises combining a second nucleic acid of interest with the enriched nucleic acid of interest.
97. The method of claim 96, wherein the combining comprises ligating the enriched nucleic acid sequence of interest to the second nucleic acid sequence of interest using overlapping extension primers.
98. The method of claim 96, wherein the combining comprises ligating the second nucleic acid sequence of interest to the enriched nucleic acid sequence of interest using a nucleic acid linker.
99. The method of claim 96, wherein the second nucleic acid sequence of interest is enriched.
100. The method of claim 96, wherein the second nucleic acid sequence of interest encodes at least a portion of a cell surface protein of a cell.
101. The method of claim 100, wherein the cell surface protein is a T cell receptor or fragment thereof.
102. The method of claim 100, wherein the cell surface protein is a B cell receptor or fragment thereof.
103. The method of claim 100, wherein the cell is a T cell.
104. The method of claim 71, further comprising cloning the modified enriched nucleic acid sequence into the vector.
105. The method of claim 104, wherein the cloning comprises a vector restriction digest.
106. The method of claim 105, wherein the vector restriction digestion comprises digestion at an FspI restriction site.
107. The method of claim 71, wherein the vector comprises a native leader sequence.
108. A method for enriching a nucleic acid sequence of interest, comprising:
(a) Providing a plurality of nucleic acid molecules comprising a plurality of identification sequences, wherein a nucleic acid molecule of the plurality of nucleic acid molecules comprises:
(i) An identification sequence of said plurality of identification sequences identifying said nucleic acid molecule,
wherein the identification sequence comprises a barcode sequence and a Unique Molecular Identifier (UMI) sequence; and
(ii) The nucleic acid sequence of interest,
wherein the nucleic acid sequence of interest comprises a nucleic acid sequence encoding a B Cell Receptor (BCR) or a fragment thereof;
(b) Performing a first amplification reaction using a first set of primers comprising:
a first primer comprising a sequence complementary to at least a portion of the barcode sequence and/or the UMI sequence, and
a second primer comprising a sequence complementary to at least a portion of the nucleic acid sequence of interest encoding a joining (J) region and/or an isoform region of the BCR or fragment thereof;
(c) Performing a second amplification reaction using a second set of primers comprising:
a third primer comprising a sequence complementary to the leader sequence of the BCR or fragment thereof and/or a nucleotide encoding at least a portion of the framework region (FWR) 1 of the BCR or fragment thereof; and
a fourth primer comprising a sequence complementary to a sequence that is complementary to at least a portion of the nucleic acid sequence of interest encoding a complementarity determining region (CDR) 3, FWR4, J region, D region, and/or V region of the BCR or fragment thereof or a junction between any one or more thereof.
109. The method of claim 108, wherein the first primer comprises a sequence complementary to at least a portion of the barcode sequence and/or the UMI sequence.
110. The method of claim 108 or 109, wherein the first primer comprises a sequence complementary to the barcode sequence and/or the UMI sequence.
111. The method of any one of claims 108-110, wherein said second primer comprises a sequence complementary to said complementary sequence of said nucleic acid sequence of interest encoding at least a portion of said J region of said BCR or fragment thereof.
112. The method of any one of claims 108 to 110, wherein said second primer comprises a sequence complementary to said complementary sequence of said nucleic acid sequence of interest encoding at least a portion of said isoform region of said BCR or fragment thereof.
113. The method of any one of claims 108-112, wherein said second primer comprises a sequence complementary to said complementary sequence of said nucleic acid sequence of interest encoding said J region and at least a portion of said isoform region of said BCR or fragment thereof.
114. The method of any one of claims 108 to 113, wherein said third primer comprises a sequence complementary to a nucleotide of at least a portion of said leader sequence of said BCR or fragment thereof.
115. The method of any one of claims 108 to 113 wherein the third primer comprises a sequence complementary to a nucleotide encoding at least a portion of the FWR1 of the BCR or fragment thereof.
116. The method of any one of claims 108-115, wherein the fourth primer comprises a sequence complementary to the complementary sequence encoding the CDR3 and at least a portion of the nucleic acid sequence of interest extending into the junction in the J region of the BCR or fragment thereof.
117. The method of any one of claims 108-115, wherein said fourth primer comprises a sequence complementary to said complementary sequence of at least a portion of said nucleic acid sequence of interest encoding said D-region and said J-region of said BCR or fragment thereof or a junction between said D-region and said J-region.
118. The method of any one of claims 108-115, wherein said fourth primer comprises a sequence complementary to said complementary sequence of at least a portion of said nucleic acid sequence of interest encoding said V region and said J region or a junction between said V region and said J region of said BCR or fragment thereof.
119. The method of any one of claims 108-115, wherein said fourth primer comprises a sequence complementary to said complementary sequence of at least a portion of said nucleic acid sequence of interest encoding said V-region and said D-region of said BCR or fragment thereof or a junction between said V-region and said D-region.
120. The method of any one of claims 108-115, wherein said fourth primer comprises a sequence complementary to said complementary sequence of at least a portion of said nucleic acid sequence of interest encoding said V region, said D region and said J region of said BCR or fragment thereof.
121. The method of any one of claims 108-120, wherein the third primer and the fourth primer comprise non-binding handles.
122. The method of any one of claims 108-121, further comprising cloning the enriched nucleic acid sequence of interest in a vector.
123. The method of any preceding claim, wherein the plurality of nucleic acid molecules are prepared from a cell sample of one or more donors.
124. The method of claim 123, wherein the one or more donors have been exposed to a target antigen and wherein the plurality of nucleic acid molecules comprising the nucleic acid sequence of interest comprise a nucleic acid sequence encoding a selected B Cell Receptor (BCR) or fragment thereof that binds the target antigen.
125. The method of claim 124, wherein preparing the plurality of nucleic acid molecules comprising the nucleic acid sequence encoding the selected B Cell Receptor (BCR) or fragment thereof that binds the target antigen from the cell sample of the one or more donors comprises the steps of:
(a) Partitioning a reaction mixture into a plurality of partitions, wherein the reaction mixture comprises:
(i) A plurality of cells of the cell sample, and
(ii) The target antigen,
wherein the target antigen is coupled to a reporter oligonucleotide,
wherein the reaction mixture comprises cells that bind to the target antigen,
wherein the partitioning provides a partition, the partition comprising:
(i) Said isolated cells that bind to said target antigen, and
(ii) A plurality of nucleic acid barcode molecules comprising a partition-specific barcode sequence,
(b) In the partition, generating barcoded nucleic acid molecules, wherein the barcoded nucleic acid molecules comprise:
(i) A first barcoded nucleic acid molecule comprising the sequence of the reporter oligonucleotide or its reverse complement and the partition specific barcode sequence or its reverse complement, and
(ii) A second barcoded nucleic acid molecule, which, if the first barcoded nucleic acid molecule is detected, comprises a nucleic acid molecule of interest to be included in the plurality of nucleic acid molecules.
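The two-round amplification recited in claim 108 can be illustrated with a minimal in-silico sketch. This is not the applicant's protocol: the toy sequences, the helper names (`reverse_complement`, `amplify`, `enrich`), and the exact-match model of primer annealing are all assumptions made for illustration. Molecules are modeled as single top-strand strings, primers are written 5' to 3', the forward primer is given in the same sense as the template string, and the reverse primer is the reverse complement of its binding site.

```python
from typing import List, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplify(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the amplicon bounded by two exact-match primers, or None.

    Assumption: annealing is modeled as exact substring matching.
    """
    start = template.find(forward)
    if start == -1:
        return None
    site = reverse_complement(reverse)  # reverse primer site on the top strand
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


def enrich(library: List[str],
           round1: Tuple[str, str],
           round2: Tuple[str, str]) -> List[str]:
    """Apply two successive (nested) amplification rounds to every molecule."""
    products = []
    for molecule in library:
        first = amplify(molecule, *round1)
        if first is None:
            continue  # molecule lacks the targeted barcode or J region
        second = amplify(first, *round2)
        if second is not None:
            products.append(second)
    return products


# Hypothetical toy segments (not real immunoglobulin sequences).
BARCODE_A, UMI_A = "ACGTACGTAC", "TTGACC"
BARCODE_B, UMI_B = "GGGTTTAAAC", "CATCAT"
LEADER, V, CDR3 = "ATGGAGTTTG", "GGCTGAGCTG", "TGTGCGAGAG"
J, CONST = "ACTACTGGGG", "CCTCCACCAA"

body = LEADER + V + CDR3 + J + CONST
library = [BARCODE_A + UMI_A + body, BARCODE_B + UMI_B + body]

# Round 1: barcode-directed primer plus a J-region primer.
round1 = (BARCODE_A, reverse_complement(J))
# Round 2: leader-directed primer plus a primer spanning CDR3 into the J region.
round2 = (LEADER, reverse_complement(CDR3 + J[:4]))

products = enrich(library, round1, round2)
```

In this sketch only the molecule carrying `BARCODE_A` survives the first round, and the second round trims it to the leader-through-junction segment. A real design would also need to account for mismatches, melting temperature, and any non-binding handles on the primers.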
CN202180059186.6A 2020-06-02 2021-06-01 Enrichment of nucleic acid sequences Pending CN116829734A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/033,787 2020-06-02
US202063051257P 2020-07-13 2020-07-13
US63/051,257 2020-07-13
PCT/US2021/035311 WO2021247618A1 (en) 2020-06-02 2021-06-01 Enrichment of nucleic acid sequences

Publications (1)

Publication Number Publication Date
CN116829734A true CN116829734A (en) 2023-09-29

Family

ID=88141432

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination