CN117693582A - Reverse transcriptase variants for enhanced performance - Google Patents


Info

Publication number: CN117693582A
Authority: CN (China)
Legal status: Pending
Application number: CN202280044014.6A
Inventors: 德里克·亨特·瓦列霍, 钱玉峰, 贾夫林·C·迟
Current assignee: 10X Genomics Inc
Original assignee: 10X Genomics Inc
Priority claimed from: PCT/US2022/033199 (WO2022265965A1)
Landscapes: Enzymes And Modification Thereof (AREA)

Abstract

The present disclosure provides engineered reverse transcriptases that have been modified to enhance their enzymatic activity, increase their processivity, template switching efficiency, binding affinity, and/or transcription efficiency, and reduce their RNase H activity. The disclosure also provides compositions and kits comprising the engineered reverse transcriptases, and methods of using these reverse transcriptases to produce, amplify, or sequence nucleic acid molecules.

Description

Reverse transcriptase variants for enhanced performance
Cross-reference to related patent applications
The present application claims priority to U.S. Provisional Patent Application No. 63/210,143, filed on June 14, 2021, and U.S. Provisional Patent Application No. 63/290,329, filed on December 16, 2021, the contents of both of which are incorporated herein by reference in their entirety.
Technical Field
The present invention relates to the fields of protein engineering and enzymology, in particular the development of reverse transcriptase variants.
Background
The discovery of reverse transcriptase (RT) in the 1970s fundamentally changed the understanding of eukaryotes by demonstrating that genetic information does not flow in only one direction, from DNA to RNA to protein; it can also flow back from RNA to DNA. The ability to convert mature mRNA back into cDNA, which lacks the introns present in genomic DNA, is critical for obtaining information in a variety of biomedical contexts, including diagnostics, prognosis, biotechnology, and forensic biology. Since their discovery, RT enzymes have become ubiquitous tools in molecular biology, driving technologies such as next-generation RNA sequencing; Maxam-Gilbert sequencing and chain-termination methods; de novo sequencing methods, including shotgun sequencing and bridge PCR; and next-generation methods, including polony sequencing, 454 pyrosequencing, Illumina sequencing, SOLiD sequencing, Ion Torrent semiconductor sequencing, HeliScope single-molecule sequencing, Sequencing.
RT was originally discovered in retroviruses such as Moloney murine leukemia virus (MMLV). RT is now known to be present in other microorganisms and genetic elements, including transposable elements, where it converts the RNA genome into DNA to facilitate integration into the host chromosome. All known natural RTs originate from a common ancestor. In general, RTs are mesophilic enzymes that function optimally at moderate temperatures in the range of 20 ℃ to 45 ℃. This mesophilic nature is problematic for in vitro amplification reactions because RNA tends to adopt stable secondary structures at lower temperatures, making reverse transcription inefficient at these low to moderate temperatures. In addition to RNA secondary structure, RT and amplification reactions can also fail because the biological samples from which nucleic acids are extracted often contain compounds that inhibit reverse transcription and/or amplification. This inhibition is particularly problematic when the reaction volume is very small (e.g., nanoliters), as in single cell profiling and other methods in which small reaction volumes are preferred.
Thus, there is a need for reverse transcriptases with improved properties, such as greater efficiency, processivity, thermal reactivity, and/or thermostability. The present disclosure addresses this need.
Disclosure of Invention
One aspect of the present disclosure provides an engineered reverse transcriptase comprising the amino acid sequence of SEQ ID NO: 15 and further comprising a combination of mutations selected from the group consisting of: (a) E69K, L139P, E302R, T306K, W313F, T330P, and N454K, and one or more of M39V, P47L, M66L, F155Y, D200N, D200E, H204R, G429S, L435G, L435K, P448A, D449G, H503V, D524N, T542D, E545G, D583N, H594Q, L603W, L603F, E607K, E607G, P627S, H634Y, H638G, A644V, D653H, K658R, and L671P; or (b) E69K, L139P, D200N, E302R, T306K, W313F, T330P, L435G, P448A, D449G, N454K, D524N, L603W, and E607K, and one or more of M39V, P47L, M66L, F155Y, G429S, T542D, E545G, D583N, H594Q, P627S, H638G, A644V, D653H, K658R, and L671P.
In some embodiments, the engineered reverse transcriptase comprises an amino acid sequence that is at least 90% identical to an amino acid sequence selected from the group consisting of: SEQ ID NO. 2, SEQ ID NO. 3, SEQ ID NO. 4, SEQ ID NO. 5, SEQ ID NO. 6, SEQ ID NO. 7, SEQ ID NO. 8, SEQ ID NO. 9, SEQ ID NO. 10, SEQ ID NO. 11, SEQ ID NO. 12, SEQ ID NO. 13, SEQ ID NO. 14, SEQ ID NO. 22, SEQ ID NO. 23, SEQ ID NO. 24, SEQ ID NO. 25, SEQ ID NO. 26, SEQ ID NO. 27, SEQ ID NO. 28, SEQ ID NO. 29, SEQ ID NO. 30, SEQ ID NO. 31, SEQ ID NO. 32, SEQ ID NO. 33, SEQ ID NO. 34, SEQ ID NO. 35, SEQ ID NO. 36 and SEQ ID NO. 37.
In some embodiments, the engineered reverse transcriptase exhibits enhanced reverse transcriptase activity compared to a reverse transcriptase having the amino acid sequence shown in SEQ ID NO: 1. In some embodiments, the enhanced reverse transcriptase activity is selected from the group consisting of processivity, template switching efficiency, binding affinity, and transcription efficiency.
In some embodiments, the enhanced reverse transcriptase activity is enhanced template switching (TS) efficiency compared to that of a reverse transcriptase having the amino acid sequence shown in SEQ ID NO: 1.
In some embodiments, the enhanced reverse transcriptase activity is enhanced transcription efficiency compared to that of a reverse transcriptase having the amino acid sequence shown in SEQ ID NO: 1.
In some embodiments, the enhanced reverse transcriptase activity is enhanced transcription efficiency and template switching efficiency compared to a reverse transcriptase having the amino acid sequence shown in SEQ ID NO: 1.
In some embodiments, the enhanced reverse transcriptase activity is increased binding affinity compared to that of a reverse transcriptase having the amino acid sequence shown in SEQ ID NO: 1.
In some embodiments, the enhanced reverse transcriptase activity is increased binding affinity and template switching efficiency compared to a reverse transcriptase having the amino acid sequence shown in SEQ ID NO: 1.
In some embodiments, the enhanced reverse transcriptase activity is enhanced processivity compared to that of a reverse transcriptase having the amino acid sequence shown in SEQ ID NO: 1.
In some embodiments, the enhanced reverse transcriptase activity is an enhanced ability to produce mitochondrial UMI counts compared to a reverse transcriptase having the amino acid sequence shown in SEQ ID NO: 1.
In some embodiments, the enhanced reverse transcriptase activity is an enhanced ability to produce ribosomal UMI counts compared to a reverse transcriptase having the amino acid sequence shown in SEQ ID NO: 1.
In some embodiments of the engineered reverse transcriptase described herein, the amino acid sequence of the engineered reverse transcriptase comprises E69K, L139P, D200N, E302R, T306K, W313F, T330P, N454K, H503V, D524N, L603W, E607K, and H634Y. In those embodiments, the amino acid sequence of the engineered reverse transcriptase further comprises a combination of mutations selected from the group consisting of: (a) M66L and L435G; (b) M39V, M66L, and L435K; (c) M39V and L435K; (d) M66L, L435G, P448A, and D449G; (e) M39V, M66L, L435G, P448A, and D449G; and (f) M66L.
In some embodiments of the engineered reverse transcriptase described herein, the amino acid sequence of the engineered reverse transcriptase comprises E69K, L139P, D200N, E302R, T306K, W313F, T330P, L435G, P448A, D449G, N454K, D524N, L603W, and E607K, and further comprises a combination of mutations selected from the group consisting of: (a) M66L; (b) M66L and H503V; (c) M66L and H634Y; and (d) M66L, H503V, and H634Y.
In some embodiments of the engineered reverse transcriptase described herein, the amino acid sequence of the engineered reverse transcriptase comprises M39V, E69K, L139P, a D200 mutation, E302R, T306K, W313F, T330P, G429S, L435K, P448A, a D449 mutation, N454K, an L603 mutation, an E607 mutation, and L671P, and further comprises a second combination of mutations selected from the group consisting of: (a) D524N, T542D, P627S, A644V, D653H, and K658R, wherein the D200 mutation is D200N, the D449 mutation is D449G, the L603 mutation is L603W, and the E607 mutation is E607G; (b) D524N, T542D, A644V, D653H, R H, and K658R, wherein the D200 mutation is D200N, the D449 mutation is D449E, the L603 mutation is L603W, and the E607 mutation is E607G; (c) E545G, D583N, and H594Q, wherein the D200 mutation is D200N, the D449 mutation is D449G, the L603 mutation is L603F, and the E607 mutation is E607K; (d) D524N, T542D, A644V, D653H, and K658R, wherein the D200 mutation is D200N, the D449 mutation is D449E, the L603 mutation is L603W, and the E607 mutation is E607G; (e) H204R, D524N, T542D, P627S, D583N, A644V, D653H, and K658R, wherein the D200 mutation is D200E, the D449 mutation is D449G, the L603 mutation is L603W, and the E607 mutation is E607G; (f) H204R, E545G, D583N, and H594Q, wherein the D200 mutation is D200E, the D449 mutation is D449G, the L603 mutation is L603F, and the E607 mutation is E607K; and (g) P47L, D524N, T542D, D583N, P627S, A644V, D653H, and K658R, wherein the D200 mutation is D200N, the D449 mutation is D449G, the L603 mutation is L603W, and the E607 mutation is E607G.
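The embodiments above are specified as sets of single-residue substitutions written as wild-type residue, position, and replacement residue (for example, E69K), with positions numbered on wild-type MMLV RT (SEQ ID NO: 15). The following minimal Python sketch shows one way such a combination could be applied to a reference sequence while checking each stated wild-type residue. The helper names `parse_mutation` and `apply_mutations` are hypothetical, and a four-residue toy sequence stands in for SEQ ID NO: 15, which is not reproduced in this section.

```python
import re

# Hypothetical helpers for illustration only.
# Notation: wild-type residue, 1-based position, replacement residue (e.g. "E69K").
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(code: str) -> tuple[str, int, str]:
    """Split a substitution such as 'E69K' into ('E', 69, 'K')."""
    match = MUTATION_RE.match(code.strip())
    if match is None:
        raise ValueError(f"not a single-residue substitution: {code!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_mutations(reference: str, codes: list[str]) -> str:
    """Return the reference sequence with every substitution in `codes` applied.

    Raises ValueError if a position is out of range, if one position is
    substituted twice (e.g. D200N together with D200E), or if the stated
    wild-type residue does not match the reference.
    """
    residues = list(reference)
    used: set[int] = set()
    for code in codes:
        wild_type, position, replacement = parse_mutation(code)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{code}: position outside the sequence")
        if position in used:
            raise ValueError(f"{code}: position {position} already substituted")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{code}: reference has {residues[position - 1]} at {position}"
            )
        residues[position - 1] = replacement
        used.add(position)
    return "".join(residues)


if __name__ == "__main__":
    toy_reference = "MEAK"  # toy stand-in, not SEQ ID NO: 15
    print(apply_mutations(toy_reference, ["E2K", "K4R"]))  # MKAR
```

Checking the stated wild-type residue at every position is a cheap guard against numbering offsets between reference sequences, and rejecting a second substitution at the same position mirrors the either/or choices above (for example, D200N or D200E).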
Another aspect of the disclosure provides an engineered reverse transcriptase comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, and SEQ ID NO: 37.
In some embodiments, the engineered reverse transcriptase exhibits enhanced reverse transcriptase activity compared to a reverse transcriptase having the amino acid sequence shown in SEQ ID NO. 1.
In some embodiments, the enhanced reverse transcriptase activity is selected from reverse transcriptase-related activities, including processivity, template switching efficiency, binding affinity, and transcription efficiency.
Another aspect of the disclosure provides an engineered reverse transcriptase comprising the amino acid sequence of SEQ ID NO:15, and further comprising a combination of mutations selected from the group consisting of: T542D, D583N, E607G, A644V, D653H, K658R, E545G, D583N, H594Q and L603F.
In some embodiments, the engineered reverse transcriptase comprises: (a) An amino acid sequence that is at least about 90%, at least about 92%, at least about 95%, at least about 97%, at least about 98%, or at least about 99% identical to an amino acid sequence selected from the group consisting of SEQ ID NO. 8, SEQ ID NO. 9, SEQ ID NO. 10, SEQ ID NO. 11, SEQ ID NO. 12, SEQ ID NO. 13, and SEQ ID NO. 14; or (b) an amino acid sequence selected from the group consisting of SEQ ID NO. 8, SEQ ID NO. 9, SEQ ID NO. 10, SEQ ID NO. 11, SEQ ID NO. 12, SEQ ID NO. 13 and SEQ ID NO. 14.
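The identity thresholds above presuppose a defined way of computing percent identity, which this section does not specify. As a minimal sketch, assuming the two sequences have already been aligned (for example, with an aligner such as Clustal Omega, as used for FIG. 1) and padded with '-' gaps to equal length, identity can be computed over the aligned length and compared with the recited thresholds. The function names and the gap convention are assumptions; other conventions (for example, dividing by the length of the shorter ungapped sequence) give different values.

```python
# Identity thresholds recited above, in percent.
THRESHOLDS = (90, 92, 95, 97, 98, 99)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over an alignment of two equal-length strings.

    Gap characters ('-') never count as matches; the denominator is the
    full aligned length (one of several conventions in use).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def thresholds_met(aligned_a: str, aligned_b: str) -> list[int]:
    """Return the recited thresholds that the pair meets or exceeds."""
    identity = percent_identity(aligned_a, aligned_b)
    return [t for t in THRESHOLDS if identity >= t]


if __name__ == "__main__":
    # Toy 10-residue alignment with a single mismatch at the last position.
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKM"))  # 90.0
    print(thresholds_met("ACDEFGHIKL", "ACDEFGHIKM"))    # [90]
```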
In some embodiments, the engineered reverse transcriptase exhibits enhanced reverse transcriptase activity compared to a reverse transcriptase having the amino acid sequence shown in SEQ ID NO. 1.
In some embodiments, the enhanced reverse transcriptase activity is selected from reverse transcriptase-related activities, including RNase H activity, processivity, template switching efficiency, binding affinity, and transcription efficiency.
Another aspect of the disclosure provides an engineered reverse transcriptase comprising an amino acid sequence that is at least 90% identical to the amino acid sequence of SEQ ID No. 2.
Another aspect of the disclosure provides an engineered reverse transcriptase comprising an amino acid sequence that is at least 90% identical to the amino acid sequence of SEQ ID No. 24.
In some embodiments, the engineered reverse transcriptase has one or more of the following characteristics when compared to a wild-type reverse transcriptase or a reverse transcriptase comprising the amino acid sequence of SEQ ID NO: 1: (a) increased thermostability; (b) increased thermal reactivity; (c) increased resistance to reverse transcriptase inhibitors; (d) increased ability to reverse transcribe difficult templates; (e) increased speed; (f) increased processivity; (g) increased specificity; (h) enhanced polymerization activity; or (i) increased sensitivity.
In some such embodiments, the engineered reverse transcriptase has: (a) an increase in thermal reactivity, resistance to reverse transcriptase inhibitors, ability to reverse transcribe difficult templates, speed, processivity, specificity, or sensitivity of about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, or about 100% compared to a wild-type reverse transcriptase or a reverse transcriptase comprising the amino acid sequence of SEQ ID NO: 1; or (b) an increase in polymerization activity of about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, or about 100% compared to a wild-type reverse transcriptase or a reverse transcriptase comprising the amino acid sequence of SEQ ID NO: 1.
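The percentages above are relative increases over a reference enzyme. Assuming that a higher measured value means better performance for the property in question, the arithmetic can be sketched as below; the function name and the use of replicate means are illustrative assumptions, not a protocol stated in this disclosure.

```python
from statistics import mean


def percent_increase(variant: list[float], reference: list[float]) -> float:
    """Relative increase (%) of the variant's mean over the reference's mean.

    Assumes higher values indicate better performance for the property
    measured (e.g. processivity or polymerization activity).
    """
    ref = mean(reference)
    if ref == 0:
        raise ValueError("reference mean must be non-zero")
    return 100.0 * (mean(variant) - ref) / ref


if __name__ == "__main__":
    # Hypothetical replicate measurements in arbitrary units.
    print(percent_increase([3.0, 3.0], [2.0, 2.0]))  # 50.0
```

Under these assumptions, a result of 50.0 would correspond to the "about 50%" level recited above.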
One aspect of the disclosure provides an isolated nucleic acid molecule encoding an engineered reverse transcriptase described herein.
Another aspect of the present disclosure provides an expression vector comprising an isolated nucleic acid as described herein.
Another aspect of the present disclosure provides a host cell transfected with an expression vector as described herein.
One aspect of the present disclosure provides a method of using an engineered reverse transcriptase described herein, the method comprising contacting the engineered reverse transcriptase with a nucleic acid template under suitable conditions to produce a polymerized nucleic acid product, wherein the nucleic acid template is RNA, DNA, or a nucleic acid comprising non-natural nucleotides.
One aspect of the present disclosure provides a method of nucleic acid extension, the method comprising: (a) Contacting a target nucleic acid molecule with an engineered reverse transcriptase and a plurality of barcoded nucleic acid molecules comprising a barcode sequence, and (b) incubating the target nucleic acid, the engineered reverse transcriptase, and the barcoded molecules under conditions in which the barcoded molecules are extended by the engineered reverse transcriptase, wherein the engineered reverse transcriptase comprises an amino acid sequence of the engineered reverse transcriptase described herein.
Drawings
FIGS. 1A-1H show CLUSTAL O (1.2.4) multiple sequence alignment reports of wild-type (WT) and engineered Moloney murine leukemia virus reverse transcriptase (MMLV RT) variants disclosed herein. FIG. 1A shows an alignment illustrating the differences between the engineered MMLV RT variant (SEQ ID NO: 1) and WT MMLV RT (SEQ ID NO: 15; GenBank Seq ID NP_955591.1 p80RT) (ebi.ac.uk/Tools/msa/clustalo/). The MMLV RT variant of SEQ ID NO: 1 is the RT enzyme found in enzyme mix C (EMC) and was used as a control in the examples disclosed herein. FIGS. 1B-1H show the similarities and differences among WT MMLV RT, the MMLV RT variant of SEQ ID NO: 1, and the novel MMLV RT variants disclosed herein (Table 2).
FIG. 2 shows an exemplary schematic of the capillary electrophoresis (CE) validation assay disclosed herein. Specifically, a 5'-end-labeled DNA primer is first hybridized to the RNA template at room temperature (about 25 ℃), and a multimeric rG-labeled template switching oligonucleotide (rG-TSO) is then added to the reaction mixture. The temperature is raised to 53 ℃ to initiate first-strand cDNA synthesis and addition of the poly-C tail, after which the rG-TSO hybridizes to the tail and the cDNA is extended across the TSO. Finally, the extended samples are transferred to a SeqStudio™ Genetic Analyzer for analysis.
FIG. 3 shows an exemplary trace of the CE assay output. Synthetic size controls corresponding to the primer alone, the full-length primer extension product, and the full-length extension product plus the template switching oligonucleotide were used to calibrate product sizes. Product length is indicated on the x-axis and signal intensity on the y-axis.
FIGS. 4A-4B show exemplary CE assay traces for the enzyme mix C control (FIG. 4B), which contains an engineered reverse transcriptase, and for an engineered reverse transcriptase that is transcription-positive but template-switching-null (listed as AR; FIG. 4A). Product length is indicated on the x-axis and signal intensity on the y-axis. Peaks corresponding to the full-length product, the full-length product plus tail, and the TSO product are indicated.
FIG. 5 shows an exemplary CE assay trace for enzyme mix C, as described for FIG. 1, annotated with the length parameters associated with the various reaction products. These length parameters are used to calculate transcription efficiency and template switching efficiency.
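The size controls in FIG. 3 correspond to products whose lengths follow from the assay design in FIG. 2. The sketch below computes those expected lengths under simplifying assumptions: the primer's 5' end pairs `uncopied_3p` nucleotides from the template's 3' end (0 means the primer anneals at the very 3' end), the RT copies to the template's 5' end, and the non-templated C tail pairs with an equal number of 3' rG residues on the TSO, so that the switched product ends at the TSO's 5' end. All parameter names and example lengths are hypothetical.

```python
def expected_ce_sizes(
    template_len: int,
    primer_len: int,
    tso_len: int,
    tail_len: int = 3,
    uncopied_3p: int = 0,
) -> dict[str, int]:
    """Expected product lengths (nt) for a CE template-switching assay.

    Simplifying assumptions (illustrative only):
    - the primer's 5' end pairs `uncopied_3p` nt from the template 3' end;
    - reverse transcription runs to the template 5' end;
    - the `tail_len` non-templated C residues pair with the same number of
      3'-terminal rG residues on the TSO, so the switched product ends at the
      TSO 5' end and is `tso_len` nt longer than the full-length product.
    """
    full_length = template_len - uncopied_3p
    if full_length < primer_len:
        raise ValueError("primer does not fit on the template")
    return {
        "primer": primer_len,
        "full_length": full_length,
        "full_length_plus_tail": full_length + tail_len,
        "template_switched": full_length + tso_len,
    }


if __name__ == "__main__":
    # Hypothetical design: 100-nt template, 20-nt primer, 30-nt TSO.
    print(expected_ce_sizes(template_len=100, primer_len=20, tso_len=30))
```

With these toy values the sketch yields 20, 100, 103, and 130 nt for the primer, full-length, tailed, and template-switched products, respectively.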
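This section states that length parameters are used to calculate transcription efficiency and template switching efficiency, but it does not give the formulas. One plausible approach, sketched below as an assumption rather than the method actually used for FIGS. 5-8, is to assign each CE peak to the nearest expected product size, then take transcription efficiency as the share of total signal in full-length products (with or without tail or TSO) and template switching efficiency as the share of full-length signal that carries the TSO. The function name, tolerance, and peak values are hypothetical.

```python
def ce_efficiencies(
    peaks: list[tuple[float, float]],
    expected: dict[str, int],
    tolerance: float = 1.0,
) -> dict[str, float]:
    """Transcription and template switching efficiencies (%) from CE peaks.

    peaks: (size_nt, area) pairs. expected: sizes keyed by "primer",
    "full_length", "full_length_plus_tail", and "template_switched".
    Each peak is assigned to the nearest expected size within `tolerance`;
    unassigned peaks (e.g. truncated products) still count toward the total.
    """
    area = {name: 0.0 for name in expected}
    total = 0.0
    for size, peak_area in peaks:
        total += peak_area
        name, target = min(expected.items(), key=lambda kv: abs(kv[1] - size))
        if abs(target - size) <= tolerance:
            area[name] += peak_area
    full = area["full_length"] + area["full_length_plus_tail"]
    switched = area["template_switched"]
    if full + switched == 0:
        raise ValueError("no full-length signal to evaluate")
    return {
        "transcription_efficiency": 100.0 * (full + switched) / total,
        "template_switching_efficiency": 100.0 * switched / (full + switched),
    }


if __name__ == "__main__":
    sizes = {"primer": 20, "full_length": 100,
             "full_length_plus_tail": 103, "template_switched": 130}
    # Hypothetical peaks, including a truncated product at 60 nt.
    peaks = [(20, 10.0), (60, 10.0), (100, 30.0), (103, 10.0), (130, 40.0)]
    print(ce_efficiencies(peaks, sizes))
```

For the toy peaks, 80% of the signal is in full-length products and half of that carries the TSO, so the sketch reports 80.0 and 50.0.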
FIGS. 6A-6B show bar graphs summarizing the results obtained from CE analysis of various reverse transcriptase variants compared to variant MMLV RT of SEQ ID NO. 1. Variants are indicated on the x-axis of each graph. The y-axis indicates the percentage of full-length product (FIG. 6A) and the percentage of template switch product (FIG. 6B) when the listed RT variants were used for reverse transcription and template switch oligonucleotide assays, respectively.
FIG. 7 shows an exemplary bar graph comparing the transcription efficiencies and template switching efficiencies (TSO efficiencies) of various engineered reverse transcriptases disclosed herein in a CE assay using a GAPDH RNA sequence (SEQ ID NO: 18) as the reverse transcription template. For each enzyme tested, the bar indicating transcription efficiency is shown on the left (dark grey) and the bar indicating template switching efficiency on the right (light grey). Percent product is indicated on the y-axis; the enzymes tested are indicated on the x-axis.
FIG. 8 shows an exemplary table of transcription efficiencies, template switching efficiencies, and product percentages (plus TSO) for various engineered reverse transcriptase variants (SEQ ID NOs: 22, 23, 21, 4, 3, 5, 24, 2, and 7) compared with the control (SEQ ID NO: 1) in a CE assay using a GAPDH RNA template (SEQ ID NO: 18). The variants include different combinations of mutation sites, numbered according to WT MMLV (SEQ ID NO: 15), as listed under "MMLV positions".
FIG. 9 shows an exemplary bar graph summarizing cDNA yields obtained in single cell experiments with the control engineered reverse transcriptase (MMLV RT; SEQ ID NO: 1) and with the variant MMLV RTs disclosed herein (SEQ ID NOs: 22, 24, 2, 3, and 7). Single cell experiments were performed using either a 3' (sc-3', left) or a 5' (sc-5', right) experimental design.
FIGS. 10A-10C show exemplary tables summarizing metrics from single cell gene expression experiments performed with the control RT (SEQ ID NO: 1; a known MMLV RT variant) and the engineered MMLV RT variants disclosed herein (SEQ ID NOs: 22, 24, 2, 3, and 7). FIG. 10A shows median genes and median UMIs per cell at 20K reads, FIG. 10B shows the same metrics at 50K reads, and FIG. 10C shows reads mapped to the transcriptome in single cells. Percentages indicate the percent change relative to the control (SEQ ID NO: 1).
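The per-cell metrics in FIGS. 10A-10C are reported at fixed sequencing depths (20K and 50K reads). The exact pipeline is not described here, so the sketch below assumes reads have already been grouped by cell barcode and tagged with a UMI and a gene, treats each distinct (UMI, gene) pair as one molecule, and subsamples each cell's reads to a fixed depth before taking medians. The function names and the per-cell subsampling scheme are illustrative assumptions; production pipelines may instead subsample to a mean depth across the whole library.

```python
import random
from statistics import median

Read = tuple[str, str]  # (umi, gene); hypothetical read representation


def downsample(
    reads_by_cell: dict[str, list[Read]], reads_per_cell: int, seed: int = 0
) -> dict[str, list[Read]]:
    """Randomly keep at most `reads_per_cell` reads for each cell."""
    rng = random.Random(seed)
    return {
        cell: rng.sample(reads, min(reads_per_cell, len(reads)))
        for cell, reads in reads_by_cell.items()
    }


def median_metrics(reads_by_cell: dict[str, list[Read]]) -> dict[str, float]:
    """Median UMIs per cell and median genes per cell."""
    umis, genes = [], []
    for reads in reads_by_cell.values():
        molecules = set(reads)  # one distinct (umi, gene) pair = one molecule
        umis.append(len(molecules))
        genes.append(len({gene for _, gene in molecules}))
    return {
        "median_umis_per_cell": median(umis),
        "median_genes_per_cell": median(genes),
    }


if __name__ == "__main__":
    toy = {
        "cell1": [("u1", "GAPDH"), ("u1", "GAPDH"), ("u2", "ACTB")],
        "cell2": [("u3", "GAPDH")],
        "cell3": [("u4", "A"), ("u5", "B"), ("u6", "C")],
    }
    print(median_metrics(toy))  # {'median_umis_per_cell': 2, 'median_genes_per_cell': 2}
    print(median_metrics(downsample(toy, reads_per_cell=1)))
```

Fixing the random seed keeps the subsampled metrics reproducible when several enzymes are compared at the same nominal depth.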
Fig. 11A-11B show exemplary tables of metrics related to results obtained from the engineered MMLV RT variants disclosed herein in the 3' single cell experiments from fig. 10A-10C.
Fig. 12 shows an exemplary table summarizing metrics for 5' single cell experiments including 20K read metrics, 50K read metrics, and reads mapped to transcriptomes using the same controls and engineered MMLV RT variants as in fig. 10A-10C. The percentages indicate the percentage change relative to the control SEQ ID NO: 1.
Fig. 13A-13B show exemplary tables summarizing metrics related to the single cell 5' experiment of fig. 12.
FIGS. 14A-14B show exemplary tables reporting gene expression (GEX) metrics obtained in different single cell types with the control engineered MMLV RT (SEQ ID NO: 1) and with the engineered MMLV RT variants disclosed herein (SEQ ID NOs: 2, 25, 24, or 7).
FIGS. 15A-15C show exemplary scatter plots (FIGS. 15A-15B) and t-distributed stochastic neighbor embedding (t-SNE) plots (FIG. 15C) of single cell gene expression results obtained using 5' single cell chemistry in human PBMCs and mouse PBMCs (C57BL/6), comparing two engineered MMLV RT variants disclosed herein (SEQ ID NOs: 2 and 7).
FIG. 16 shows an exemplary table summarizing immune profiling results from experiments comparing the enzyme mix C control (control engineered MMLV RT; SEQ ID NO: 1) with three engineered MMLV RT variants disclosed herein (SEQ ID NOs: 2, 25, and 24). Percent change is relative to the control.
FIG. 17 shows a schematic representation of a generalized capture probe for use in spatial transcriptomics and single cell transcriptomics assays, which are exemplary applications other than general reverse transcription reactions, wherein the engineered thermostable reverse transcriptases of the present invention can be used to extend the capture probe using the captured target nucleic acid as a template, thereby producing cDNA products.
Detailed Description
It is to be understood that certain aspects, modes, embodiments, variations and features of the inventive method are described below in varying levels of detail in order to provide a basic understanding of the inventive technology.
While various embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.
I. Summary of the invention
A challenge in cDNA synthesis reactions is interference from RNA secondary structure. Although higher reaction temperatures can remove secondary structure from the template RNA, elevated temperatures typically reduce reverse transcriptase (RT) activity unless the enzyme is inherently thermostable. In addition, RT activity can be reduced by inhibitors, such as those found in cell lysates and related reagents. Wild-type (WT) Moloney murine leukemia virus (MMLV) reverse transcriptase is typically inactivated at higher temperatures. Several commercially available mutant MMLV RT enzymes have been generated that exhibit improved thermostability, fidelity, or substrate affinity and/or reduced terminal deoxynucleotidyl transferase activity. For example, substitutions such as M39V, M L, E69K, E302R, T306K, W313F, L435G/K, and N454K in wild-type MMLV RT (SEQ ID NO: 15) have been shown to improve the thermostability of wild-type MMLV RT. See, e.g., Arezi et al., Nucleic Acids Res. 37(2): 473-481 (2009); U.S. Pat. No. 7,078,208; Baranauskas et al., Protein Eng. Des. Sel. 25(10): 657-668 (2012); and FIG. 1A.
While these MMLV RT variants may perform well in conventional amplification reactions, they are not optimal for reverse transcription of mRNA in high-throughput amplification reaction assays, such as spatial array and single cell transcriptomics assays, because such assays typically use reaction volumes of less than about 1 nanoliter. In addition, sample processing chemicals can negatively affect the function and activity of wild-type and available MMLV variants.
Accordingly, the present disclosure provides an engineered reverse transcriptase comprising the amino acid sequence of SEQ ID NO: 15 and further comprising a combination of mutations selected from the group consisting of: E69K, L139P, E302R, T306K, W313F, T330P, and N454K, and one or more of M39V, P47L, M66L, F155Y, D200N, D200E, H204R, G429S, L435G, L435K, P448A, D449G, H503V, D524N, T542D, E545G, D583N, H594Q, L603W, L603F, E607K, E607G, P627S, H634Y, H638G, A644V, D653H, K658R, and L671P; or further comprising a combination of mutations selected from the group consisting of: E69K, L139P, D200N, E302R, T306K, W313F, T330P, L435G, P448A, D449G, N454K, D524N, L603W, and E607K, and one or more of M39V, P47L, M66L, F155Y, G429S, H503V, T542D, E545G, D583N, H594Q, P627S, H634Y, H638G, D653H, K658R, and L671P. In some embodiments, the engineered reverse transcriptase exhibits enhanced reverse transcriptase activity compared to a reverse transcriptase having the amino acid sequence shown in SEQ ID NO: 1 or 15.
In another aspect, the present disclosure provides an engineered reverse transcriptase comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, and SEQ ID NO: 37.
In another aspect, the present disclosure provides an engineered reverse transcriptase comprising the amino acid sequence of SEQ ID NO:15, and further comprising a combination of mutations selected from the group consisting of: T542D, D583N, E607G, A644V, D653H, K658R, E545G, D583N, H594Q and L603F. In another aspect, the present disclosure provides an engineered reverse transcriptase comprising an amino acid sequence that is at least 90% identical to the amino acid sequence of SEQ ID NO. 2 or 24.
The disclosure also provides isolated nucleic acid molecules encoding the novel engineered reverse transcriptases disclosed herein, expression vectors comprising these isolated nucleic acid molecules, and host cells transfected with the isolated nucleic acid molecules or expression vectors. In another aspect, the present disclosure provides methods of using the engineered reverse transcriptases disclosed herein, including nucleic acid extension methods.
A. Summary of experimental results
As described above, reverse transcription of mRNA from single cells can be inhibited when the reaction volume is less than about 1 nL. In addition, sample processing chemicals can negatively affect the function and activity of wild-type and available MMLV variants. Overcoming the inhibitory effects of small reaction volumes and of processing chemicals is therefore a challenge for efficient mRNA-based single cell profiling. As shown in the examples below, the novel MMLV variants disclosed herein overcome these challenges.
The novel class of MMLV variants described herein (FIGS. 1B-1H) exhibits a combination of reverse transcriptase activity and high thermostability in both conventional RT-PCR amplification and high-throughput amplification reaction assays, such as mRNA-based single cell profiling. As shown in FIGS. 6 and 7, all MMLV variants disclosed herein show significantly enhanced transcription efficiency and template switching compared with wild-type MMLV RT or the MMLV RT variant comprising the amino acid sequence of SEQ ID NO: 1. In particular, variants comprising the M66L, H503V, or H634Y mutations, alone or in combination, showed excellent transcription efficiency and template switching in either the wild-type (SEQ ID NO: 15) or the variant (SEQ ID NO: 1) background. See FIGS. 6-9. As shown in FIGS. 9-14, these novel variants also showed enhanced performance across all test parameters measured by single cell profiling. Furthermore, the novel MMLV RT variants disclosed herein showed significantly enhanced gene expression (GEX) sensitivity and mapping in human and mouse peripheral blood mononuclear cells (FIGS. 15 and 16).
B. Exemplary benefits of novel MMLV RT variants
Thus, the combination of mutations in each variant disclosed herein is unexpectedly sufficient to overcome the inhibitory effects of the following factors on the function and activity of wild-type and/or available MMLV variants: (1) the low reaction volumes of high-throughput amplification reactions (i.e., less than about 1 nanoliter), which can lead to (2) chemically crowded reaction conditions, and (3) sample processing chemicals. Many of these substitutions are surprising and unexpected. For example, the P448A and D449G substitutions present in SEQ ID NO: 1 are reverted to the wild-type residues in most of the novel MMLV variants disclosed herein, because further experiments demonstrated that these two mutations were not as advantageous as originally expected. In addition, residues that are already mutated in SEQ ID NO: 1 are further mutated to produce novel variants with improved transcriptional activity in high-throughput amplification reaction assays. For example, as shown in FIGS. 1B-1H, D200N is changed to D200E in some variants, L435G to L435K, L603W to L603F, and E607K to E607G.
Furthermore, the engineered reverse transcriptase variants described herein unexpectedly show higher resistance to inhibition by cell lysate than enzymes having the amino acid sequence of SEQ ID NO: 1 or 15. Finally, the engineered reverse transcriptase variants of the present disclosure unexpectedly show a greater ability to capture full-length transcripts in a paired T cell receptor transcriptional profiling assay than enzymes having the amino acid sequence of SEQ ID NO: 1 or 15.
The engineered reverse transcriptase variants described herein can be used in any application requiring RNA amplification. Various cell processing and analysis methods and systems are known in the art, with applications including the analysis of specific individual cells, the analysis of different cell types within mixed cell populations, and the analysis and characterization of large cell populations for environmental, human health, epidemiological, or any of a variety of other applications.
Engineered reverse transcriptases
Reverse transcriptase (RT) enzymes are RNA-dependent DNA polymerases that are typically used to copy RNA sequences into complementary DNA (cDNA) molecules. Reverse transcription is initiated by hybridization of a primer sequence to an RNA molecule; the primer is then extended by the reverse transcriptase in a template-directed manner, producing a cDNA molecule. Some reverse transcriptases, such as MMLV RT, can also add multiple non-templated nucleotides to the 3' end of the nascent cDNA strand. The resulting cDNA can then be separated from the template RNA molecule in a number of ways known in the art.
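The template-directed step can be illustrated with a short Python sketch that treats first-strand synthesis as an idealized, error-free, full-length copy: because the template is read 3'->5' while the cDNA grows 5'->3', the cDNA is the reverse complement of the RNA, written in the DNA alphabet. The function name, the toy template, and the three-C tail standing in for non-templated addition are illustrative assumptions; primer structure and incomplete extension are ignored.

```python
# Base incorporated into DNA opposite each RNA template base (Watson-Crick pairing).
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna_template: str, non_templated_tail: str = "") -> str:
    """Return an idealized full-length first-strand cDNA, written 5'->3'.

    The RNA template is given 5'->3'. It is read in reverse (3'->5'), each base
    is paired with its DNA complement, and an optional non-templated tail is
    appended at the 3' end of the cDNA.
    """
    templated = "".join(
        RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna_template.upper())
    )
    return templated + non_templated_tail


mrna = "AUGGCCUUA"  # hypothetical 9-nt template, 5'->3'
print(first_strand_cdna(mrna))         # TAAGGCCAT
print(first_strand_cdna(mrna, "CCC"))  # TAAGGCCATCCC
```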
A. Novel variants
In one aspect, the present disclosure provides an engineered reverse transcriptase comprising the amino acid sequence of SEQ ID NO: 15 and further comprising either (a) the combination of mutations E69K, L139P, E302R, T306K, W313F, T330P and N454K, together with one or more of M39V, P47L, M66L, F155Y, D200N, D200E, H204R, G429S, L435G, L435K, P448A, D449G, H503V, D524N, T542D, E545G, D583N, H594Q, L603W, L603F, E607K, E607G, P627S, H634Y, H638G, A644V, D653H, K658R and L671P; or (b) the combination of mutations E69K, L139P, D200N, E302R, T306K, W313F, T330P, L435G, P448A, D449G, N454K, D524N, L603W and E607K, together with one or more of M39V, P47L, M66L, F155Y, G429S, H503V, T542D, E545G, D583N, H594Q, P627S, H634Y, H638G, D653H, K658R and L671P.
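Each alternative in this aspect has the structure "all of a required set, plus one or more of an optional set." A minimal Python sketch of that membership test follows, using the first alternative written as sets of substitution codes indexed to SEQ ID NO: 15. The function name is hypothetical, and the sketch does not check that at most one substitution is chosen per position (for example, D200N versus D200E).

```python
# First alternative of the aspect above, as sets of substitution codes
# indexed to SEQ ID NO: 15.
REQUIRED = {"E69K", "L139P", "E302R", "T306K", "W313F", "T330P", "N454K"}
OPTIONAL = {
    "M39V", "P47L", "M66L", "F155Y", "D200N", "D200E", "H204R", "G429S",
    "L435G", "L435K", "P448A", "D449G", "H503V", "D524N", "T542D", "E545G",
    "D583N", "H594Q", "L603W", "L603F", "E607K", "E607G", "P627S", "H634Y",
    "H638G", "A644V", "D653H", "K658R", "L671P",
}


def meets_combination(variant: set[str], required: set[str], optional: set[str]) -> bool:
    """True if `variant` carries every required mutation plus at least one optional one."""
    return required <= variant and not optional.isdisjoint(variant)


# One of the embodiments described below.
example_variant = {
    "E69K", "L139P", "D200N", "E302R", "T306K", "W313F", "T330P",
    "N454K", "H503V", "D524N", "L603W", "E607K", "H634Y",
}
print(meets_combination(example_variant, REQUIRED, OPTIONAL))  # True
print(meets_combination(REQUIRED, REQUIRED, OPTIONAL))         # False: no optional mutation
```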
The engineered reverse transcriptases of the present disclosure are variant Moloney Murine Leukemia Virus (MMLV) reverse transcriptases having one or more mutations. In particular, the novel engineered reverse transcriptases described herein comprise a combination of mutations in the amino acid sequence of either wild-type MMLV RT (SEQ ID NO: 15) or a known MMLV variant (SEQ ID NO: 1). As used herein, "mutation" refers to a change in a parent or wild-type DNA sequence that alters the amino acid sequence encoded by the DNA, including but not limited to substitutions, insertions, deletions, point mutations, mutations of multiple nucleotides or amino acids, transpositions, inversions, frameshifts, nonsense mutations, truncations, or other forms of aberration that distinguish a polynucleotide or protein sequence from the wild-type sequence of a gene or gene product. Consequences of a mutation include, but are not limited to, the creation of a new feature, characteristic, function or trait not found in the protein encoded by the parent DNA, including, but not limited to, an N-terminal truncation, a C-terminal truncation, or a chemical modification. "Mutation" also includes N-terminal or C-terminal extensions. In some embodiments, the mutations disclosed herein are substitutions.
In particular, the disclosure relates to mutant or modified reverse transcriptases comprising one or more (e.g., one, two, three, four, five, ten, twelve, fifteen, twenty, etc.) amino acid changes. These amino acid changes make the reverse transcriptase more efficient than the unmutated or unmodified reverse transcriptase for nucleic acid synthesis in very small volumes (e.g., in single cell profiling assays). As will be appreciated by those of skill in the art, one or more of the identified amino acids may be deleted and/or substituted with one or more amino acid residues. In a preferred aspect, any one or more amino acids may be substituted with any one or more amino acid residues, such as Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr and/or Val.
In some embodiments, the amino acid sequence of the engineered reverse transcriptase comprises E69K, L139P, D200N, E302R, T306K, W313F, T330P, N454K, H503V, D524N, L603W, E607K and H634Y in SEQ ID NO: 15. In some embodiments, the amino acid sequence of the engineered reverse transcriptase comprises M66L, E69K, L139P, D200N, E302R, T306K, W313F, T330P, N454K, H503V, D524N, L603W, E607K, L435G and H634Y of SEQ ID NO: 15. In some embodiments, the amino acid sequence of the engineered reverse transcriptase comprises E69K, L139P, D200N, E302R, T306K, W313F, T330P, N454K, H503V, D524N, L603W, E607K, H634Y, M39V, M66L and L435K of SEQ ID NO: 15. In some embodiments, the amino acid sequence of the engineered reverse transcriptase comprises M39V, E69K, L435K, L139P, D200N, E302R, T306K, W313F, T330P, N454K, H503V, D524N, L603W, E607K and H634Y in SEQ ID NO: 15.
In some embodiments, the amino acid sequence of the engineered reverse transcriptase comprises E69K, L139P, D200N, E302R, T306K, W313F, T330P, N454K, H503V, D524N, L603W, E607K, H634Y, M66L, L435G, P448A and D449G of SEQ ID NO: 15. In some embodiments, the amino acid sequence of the engineered reverse transcriptase comprises E69K, L139P, D200N, E302R, T306K, W313F, T330P, N454K, H503V, D524N, L603W, E607K, M39V, M66L, L435G, P448A, D449G and H634Y in SEQ ID NO: 15. In some embodiments, the amino acid sequence of the engineered reverse transcriptase comprises M66L, E69K, L139P, D200N, E302R, T306K, W313F, T330P, N454K, H503V, D524N, L603W, E607K and H634Y of SEQ ID NO: 15.
In some embodiments, the amino acid sequence of the engineered reverse transcriptase comprises E69K, L139P, D200N, E302R, T306K, W313F, T330P, L435G, P448A, D449G, N454K, D524N, L603W and E607K of SEQ ID NO: 15. In some embodiments, the amino acid sequence of the engineered reverse transcriptase comprises M66L, E69K, L139P, D200N, E302R, T306K, W313F, T330P, L435G, P448A, D449G, N454K, D524N, L603W and E607K of SEQ ID NO: 15.
In some embodiments, the amino acid sequence of the engineered reverse transcriptase comprises E69K, L139P, D200N, E302R, T306K, W313F, T330P, L435G, P448A, D449G, N454K, D524N, L603W, E607K, M66L and H503V of SEQ ID NO: 15. In some embodiments, the amino acid sequence of the engineered reverse transcriptase comprises E69K, L139P, D200N, E302R, T306K, W313F, T330P, L435G, P448A, D449G, N454K, D524N, L603W, E607K, M66L and H634Y of SEQ ID NO: 15. In some embodiments, the amino acid sequence of the engineered reverse transcriptase comprises E69K, L139P, D200N, E302R, T306K, W313F, T330P, L435G, P448A, D449G, N454K, D524N, L603W, E607K, M66L, H503V and H634Y of SEQ ID NO: 15.
In some embodiments, the amino acid sequence of the engineered reverse transcriptase comprises M39V, E69K, L139P, a D200 mutation, E302R, T306K, W313F, T330P, G429S, P448A, a D449 mutation, L435K, N454K, an L603 mutation, an E607 mutation, and L671P of SEQ ID NO: 15.
In some embodiments, the amino acid sequence of the engineered reverse transcriptase comprises M39V, E69K, L139P, a D200 mutation, E302R, T306K, W313F, T330P, G429S, P448A, a D449 mutation, L435K, N454K, an L603 mutation, an E607 mutation, L671P, D524N, T542D, P627S, A644V, D653H and K658R of SEQ ID NO: 15. In this embodiment, the D200 mutation is a D200N mutation, the D449 mutation is a D449G mutation, the L603 mutation is an L603W mutation, and the E607 mutation is an E607G mutation.
In some embodiments, the amino acid sequence of the engineered reverse transcriptase comprises M39V, E69K, L139P, a D200 mutation, E302R, T306K, W313F, T330P, G429S, P448A, a D449 mutation, L435K, N454K, an L603 mutation, an E607 mutation, L671P, D524N, T542D, A644V, D653H, R650H and K658R of SEQ ID NO: 15. In this embodiment, the D200 mutation is a D200N mutation, the D449 mutation is a D449E mutation, the L603 mutation is an L603W mutation, and the E607 mutation is an E607G mutation.
In some embodiments, the amino acid sequence of the engineered reverse transcriptase comprises M39V, E69K, L139P, a D200 mutation, E302R, T306K, W313F, T330P, G429S, P448A, a D449 mutation, L435K, N454K, an L603 mutation, an E607 mutation, L671P, E545G, D583N and H594Q of SEQ ID NO: 15. In this embodiment, the D200 mutation is a D200N mutation, the D449 mutation is a D449G mutation, the L603 mutation is an L603F mutation, and the E607 mutation is an E607K mutation.
In some embodiments, the amino acid sequence of the engineered reverse transcriptase comprises M39V, E69K, L139P, a D200 mutation, E302R, T306K, W313F, T330P, G429S, P448A, a D449 mutation, L435K, N454K, an L603 mutation, an E607 mutation, L671P, D524N, T542D, A644V, D653H and K658R of SEQ ID NO: 15. In this embodiment, the D200 mutation is a D200N mutation, the D449 mutation is a D449E mutation, the L603 mutation is an L603W mutation, and the E607 mutation is an E607G mutation.
In some embodiments, the amino acid sequence of the engineered reverse transcriptase comprises M39V, E69K, L139P, a D200 mutation, E302R, T306K, W313F, T330P, G429S, P448A, a D449 mutation, L435K, N454K, an L603 mutation, an E607 mutation, L671P, H204R, D524N, T542D, P627S, D583N, A644V, D653H and K658R of SEQ ID NO: 15. In this embodiment, the D200 mutation is a D200E mutation, the D449 mutation is a D449G mutation, the L603 mutation is an L603W mutation, and the E607 mutation is an E607G mutation.
In some embodiments, the amino acid sequence of the engineered reverse transcriptase comprises M39V, E69K, L139P, a D200 mutation, E302R, T306K, W313F, T330P, G429S, P448A, a D449 mutation, L435K, N454K, an L603 mutation, an E607 mutation, L671P, H204R, E545G, D583N and H594Q of SEQ ID NO: 15. In this embodiment, the D200 mutation is a D200E mutation, the D449 mutation is a D449G mutation, the L603 mutation is an L603F mutation, and the E607 mutation is an E607K mutation.
In some embodiments, the amino acid sequence of the engineered reverse transcriptase comprises M39V, E69K, L139P, a D200 mutation, E302R, T306K, W313F, T330P, G429S, P448A, a D449 mutation, L435K, N454K, an L603 mutation, an E607 mutation, L671P, P47L, D524N, T542D, D583N, P627S, A644V, D653H and K658R of SEQ ID NO: 15. In this embodiment, the D200 mutation is a D200N mutation, the D449 mutation is a D449G mutation, the L603 mutation is an L603W mutation, and the E607 mutation is an E607G mutation.
Some variants share a common combination of changes: T542D, D583N, E607G, A644V, D653H and K658R (all relative to SEQ ID NO: 15). Other variants share the combination E545G, D583N, H594Q and L603F (all relative to SEQ ID NO: 15). These variants may also comprise additional alterations that may affect one or more reverse transcriptase-related activities.
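Whether a variant carries one of these shared combinations is a subset test on its set of mutations. The Python sketch below applies it to the distinguishing mutations of the two embodiments above that carry E545G, D583N and H594Q (one with D200N, one with D200E and H204R); the set names and the helper are hypothetical.

```python
# Mutations that distinguish two embodiments described above, beyond their shared
# backbone (M39V, E69K, L139P, E302R, T306K, W313F, T330P, G429S, P448A, L435K,
# N454K and L671P); all positions are indexed to SEQ ID NO: 15.
variant_d200n = {"D200N", "D449G", "L603F", "E607K", "E545G", "D583N", "H594Q"}
variant_d200e = {"D200E", "D449G", "L603F", "E607K", "H204R", "E545G", "D583N", "H594Q"}

shared_combination = {"E545G", "D583N", "H594Q", "L603F"}


def carries(variant: set[str], combination: set[str]) -> bool:
    """True if every mutation in `combination` is present in `variant`."""
    return combination <= variant


common = variant_d200n & variant_d200e
print(sorted(common))  # ['D449G', 'D583N', 'E545G', 'E607K', 'H594Q', 'L603F']
print(carries(variant_d200n, shared_combination))  # True
print(carries(variant_d200e, shared_combination))  # True
```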
One aspect of the disclosure provides an engineered reverse transcriptase comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 2-14 and 22-37.
In the context of two or more nucleic acid or polypeptide sequences, percent sequence identity refers to the percentage of residues or bases that are identical in a given alignment of the sequences. Sequences share a specified percentage of identical nucleotides or amino acid residues when compared and aligned for maximum correspondence, as measured using one of the sequence comparison algorithms described below (or other algorithms available to the skilled artisan) or by visual inspection.
Typically, amino acid additions, substitutions and deletions relative to the aligned reference sequence are all differences that may reduce percent identity, depending on the parameters used to evaluate it. Often, additions, substitutions and deletions in the aligned reference sequence are weighted equally. In some cases, a difference in length between two sequences, such that one sequence has bases or residues extending beyond the N- or C-terminus (or the 5' or 3' end) of the other, is disregarded in the alignment: the aligned region is defined by the ends of the shorter (earlier-terminating) sequence, and residues of the longer sequence that extend beyond those ends have no effect on the percent identity score for the region. For example, by one calculation method, an alignment of a 105-amino-acid polypeptide to a 100-amino-acid reference sequence will have a 100% identity score if the reference sequence is contained entirely, as a contiguous, uninterrupted segment, in the longer polypeptide without amino acid differences. By the same evaluation, a single amino acid difference (addition, deletion or substitution) between the two sequences would mean that they are 99% identical over the 100-amino-acid span of the aligned reference sequence.
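The calculation method described above can be made concrete with a small Python function that scores an existing pairwise alignment: overhangs beyond the ends of the reference are ignored, and each addition, deletion or substitution inside the reference span counts as one difference. The function name and the toy 100-residue reference are hypothetical; the sketch assumes the alignment has already been computed, and it reproduces the 105-versus-100-residue example.

```python
def percent_identity_over_reference(
    aligned_ref: str, aligned_test: str, gap: str = "-"
) -> float:
    """Percent identity over the span of the aligned reference sequence.

    Both inputs are rows of an existing pairwise alignment (equal length, gaps
    shown as `gap`). Overhanging columns before the first or after the last
    reference residue are ignored; inside that span every addition, deletion
    or substitution counts as one difference, expressed per reference residue.
    """
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned rows must have equal length")
    ref_columns = [i for i, residue in enumerate(aligned_ref) if residue != gap]
    if not ref_columns:
        raise ValueError("reference row contains no residues")
    start, end = ref_columns[0], ref_columns[-1] + 1
    differences = sum(
        1 for r, t in zip(aligned_ref[start:end], aligned_test[start:end]) if r != t
    )
    return 100.0 * (len(ref_columns) - differences) / len(ref_columns)


reference = "ACDEFGHIKLMNPQRSTVWY" * 5  # hypothetical 100-residue reference
longer = "MKLVA" + reference             # 105 residues containing it intact
print(percent_identity_over_reference("-" * 5 + reference, longer))  # 100.0

# One substitution (L -> W at reference position 10) inside the reference span.
substituted = "MKLVA" + reference[:9] + "W" + reference[10:]
print(percent_identity_over_reference("-" * 5 + reference, substituted))  # 99.0
```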
In the context of two nucleic acids or polypeptides (e.g., DNA encoding a polymerase, or the amino acid sequence of a polymerase), "substantially identical" refers to two or more sequences or subsequences that have at least about 60%, at least about 80%, at least about 90%-95%, at least about 98%, at least about 99% or more nucleotide or amino acid residue identity when compared and aligned for maximum correspondence, as measured using a sequence comparison algorithm or by visual inspection. Such "substantially identical" sequences are typically considered "homologous," without reference to actual ancestry. "Substantial identity" exists over a region of the sequences that is at least about 50 residues in length, at least about 100 residues, at least about 150 residues, or over the full length of the two sequences being compared.
Proteins and/or protein sequences are "homologous" when they are naturally or artificially derived from a common ancestral protein or protein sequence. Similarly, nucleic acids and/or nucleic acid sequences are homologous when they are naturally or artificially derived from a common ancestral nucleic acid or nucleic acid sequence. Homology is typically inferred from sequence similarity between two or more nucleic acids or proteins (or sequences thereof). The exact percentage of sequence similarity that can be used to establish homology varies with the nucleic acid and protein in question, but sequence similarity as little as 25% across about 50, about 100, about 150 or more residues is typically used to establish homology. Higher levels of sequence similarity, such as at least about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 95%, or about 99% or more, may also be used to establish homology.
Methods for determining percent sequence similarity (e.g., protein BLAST (BLASTP) and nucleotide BLAST (BLASTN) using default parameters) are described herein and are generally available. For sequence comparison and homology determination, typically one sequence serves as a reference sequence to which the test sequence is compared. When using a sequence comparison algorithm, the test sequence and reference sequence may be entered into a computer, subsequence coordinates specified if necessary, and sequence algorithm program parameters specified. The sequence comparison algorithm then calculates the percent sequence identity of the test sequence relative to the reference sequence based on the specified program parameters. Methods for optimally aligning sequences for comparison are known to those skilled in the art.
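BLAST itself is a heuristic local-alignment tool and is not reproduced here. As a self-contained illustration of the general workflow (align a test sequence to a reference, then score the alignment), the Python sketch below substitutes a textbook Needleman-Wunsch global alignment with assumed match, mismatch and gap scores, and reports identity as identical columns over all alignment columns, which is only one of several denominators in use. The function names and score values are illustrative assumptions, not BLAST defaults.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with linear gap penalties.

    Returns the two aligned rows (gaps as '-'). The scores are simple
    illustrative constants, not a substitution matrix such as BLOSUM62.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + pair:
                row_a.append(a[i - 1])
                row_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b))


def percent_identity(row_a: str, row_b: str) -> float:
    """Identical, non-gap columns as a percentage of all alignment columns."""
    identical = sum(1 for x, y in zip(row_a, row_b) if x == y and x != "-")
    return 100.0 * identical / len(row_a)


row_a, row_b = global_align("MDLEWK", "MDLWK")
print(row_a)  # MDLEWK
print(row_b)  # MDL-WK
print(round(percent_identity(row_a, row_b), 1))  # 83.3
```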
In some embodiments, the engineered reverse transcriptase comprises an amino acid sequence that is at least about, or about, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-15 and 22-37.
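An identity threshold can be translated into a maximum number of differences when, as in the convention described above, each substitution, addition or deletion counts as one difference over a reference of length L: a sequence meets p% identity when (L - d)/L ≥ p/100, that is, d ≤ L(100 - p)/100. The Python sketch below evaluates this bound with integer arithmetic; the 680-residue length is a hypothetical value chosen for illustration, not the length of any sequence listed here.

```python
def max_differences(reference_length: int, min_identity_percent: int) -> int:
    """Largest number of differing positions compatible with an identity threshold.

    identity = (L - d) / L >= p / 100  is equivalent to  d <= L * (100 - p) / 100,
    evaluated here with integer floor division to avoid floating-point rounding.
    """
    return reference_length * (100 - min_identity_percent) // 100


hypothetical_length = 680  # illustrative length only; not the length of any SEQ ID NO
for threshold in (90, 95, 99):
    print(threshold, max_differences(hypothetical_length, threshold))
# 90 68
# 95 34
# 99 6
```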
Another aspect of the disclosure provides an engineered reverse transcriptase comprising the amino acid sequence of SEQ ID NO: 15 and further comprising a combination of mutations selected from the group consisting of (a) T542D, D583N, E607G, A644V, D653H and K658R; and (b) E545G, D583N, H594Q and L603F. In some embodiments, the engineered reverse transcriptase comprises an amino acid sequence that is at least about, or about, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to an amino acid sequence selected from the group consisting of SEQ ID NOs: 8-14. In some embodiments, the engineered reverse transcriptase comprises an amino acid sequence selected from the group consisting of SEQ ID NOs: 8-14.
One aspect of the disclosure provides an engineered reverse transcriptase comprising an amino acid sequence that is at least 90% identical to the amino acid sequence of SEQ ID NO. 2. Another aspect of the disclosure provides an engineered reverse transcriptase comprising an amino acid sequence that is at least 90% identical to the amino acid sequence of SEQ ID No. 24.
In some embodiments, the engineered reverse transcriptase comprises an amino acid sequence that is at least 95% identical to a reverse transcriptase having the amino acid sequence of SEQ ID NO: 1. In some embodiments, the engineered reverse transcriptase comprises an amino acid sequence that is at least 95% identical to SEQ ID NO: 1 and has at least one mutation selected from the group consisting of: an M39V mutation, a P47L mutation, an M66L mutation, an E69K mutation, an L139P mutation, a D200N mutation, an H204R mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435G mutation, a G429S mutation, an L435K mutation, a P448A mutation, a D449G mutation, an N454K mutation, an H503V mutation, a D524N mutation, a T542D mutation, an E545G mutation, a D583N mutation, an H594Q mutation, an L603W mutation, an E607K mutation, a P627S mutation, an H634Y mutation, an A644V mutation, an R650H mutation, a D653H mutation, a K658R mutation and an L671P mutation; and the engineered reverse transcriptase exhibits altered reverse transcriptase-related activity.
In some embodiments, the present application provides an engineered reverse transcriptase comprising an amino acid sequence that is at least 95% identical to SEQ ID NO: 1, wherein the amino acid sequence of the engineered reverse transcriptase comprises a combination of mutations, indexed to SEQ ID NO: 15, selected from the group consisting of: (a) an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an N454K mutation, an H503V mutation, a D524N mutation, an L603W mutation, an E607K mutation, and an H634Y mutation; (b) an M66L mutation, an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an N454K mutation, a D524N mutation, an H503V mutation, an L603W mutation, an E607K mutation, and an H634Y mutation, and at least one mutation selected from the group consisting of an L435G mutation, an L435K mutation, an M39V mutation, a P448A mutation, and a D449G mutation; (c) an M39V mutation, an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an N454K mutation, an H503V mutation, a D524N mutation, an L603W mutation, an E607K mutation, and an H634Y mutation; and (d) an M39V mutation, an E69K mutation, an L139P mutation, a D200 mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an L435K mutation, a G429S mutation, a P448A mutation, a D449 mutation, an N454K mutation, an L603 mutation, an E607 mutation, and an L671P mutation, wherein the D200 mutation is selected from D200N and D200E, the D449 mutation is selected from D449G and D449E, the L603 mutation is selected from L603W and L603F, and the E607 mutation is selected from E607G and E607K, and wherein the amino acid sequence of the engineered reverse transcriptase further comprises at least one mutation selected from P47L, H204R, D524N, T542D, E545G, D583N, H594Q, P627S, A644V, R650H, D653H, K658R, L671P and S679P.
In some embodiments, the engineered reverse transcriptase of the present application has an amino acid sequence that is at least 95% identical to SEQ ID NO: 1 and that comprises, indexed to SEQ ID NO: 15, a first combination of mutations: an E69K mutation, an L139P mutation, a D200N mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, an N454K mutation, an H503V mutation, a D524N mutation, an L603W mutation, an E607K mutation, and an H634Y mutation, and further comprises a second combination of mutations selected from the group consisting of: (a) an M66L mutation and an L435G mutation; (b) an M39V mutation, an M66L mutation and an L435K mutation; (c) an M39V mutation and an L435K mutation; (d) an M66L mutation, an L435G mutation, a P448A mutation and a D449G mutation; and (e) an M39V mutation, an M66L mutation, an L435G mutation, a P448A mutation and a D449G mutation.
In some embodiments, the engineered reverse transcriptase of the present application has an amino acid sequence that is at least 95% identical to SEQ ID NO: 1 and that comprises a first combination of mutations: an M39V mutation, an E69K mutation, an L139P mutation, a D200 mutation, an E302R mutation, a T306K mutation, a W313F mutation, a T330P mutation, a G429S mutation, a P448A mutation, a D449 mutation, an L435K mutation, an N454K mutation, an L603 mutation, an E607 mutation, and an L671P mutation, and further comprises a second combination of mutations selected from the group consisting of: (a) a D524N mutation, a T542D mutation, a P627S mutation, an A644V mutation, a D653H mutation, a K658R mutation, and an S679P mutation, wherein the D200 mutation is a D200N mutation, the D449 mutation is a D449G mutation, the L603 mutation is an L603W mutation, and the E607 mutation is an E607G mutation; (b) a D524N mutation, a T542D mutation, an A644V mutation, an R650H mutation, a D653H mutation, and a K658R mutation, wherein the D200 mutation is a D200N mutation, the D449 mutation is a D449E mutation, the L603 mutation is an L603W mutation, and the E607 mutation is an E607G mutation; (c) an E545G mutation, a D583N mutation, an H594Q mutation, and an S679P mutation, wherein the D200 mutation is a D200N mutation, the D449 mutation is a D449G mutation, the L603 mutation is an L603F mutation, and the E607 mutation is an E607K mutation; (d) a D524N mutation, a T542D mutation, an A644V mutation, a D653H mutation, and a K658R mutation, wherein the D200 mutation is a D200N mutation, the D449 mutation is a D449E mutation, the L603 mutation is an L603W mutation, and the E607 mutation is an E607G mutation; (e) an H204R mutation, a D524N mutation, a T542D mutation, a D583N mutation, a P627S mutation, an A644V mutation, a D653H mutation, and a K658R mutation, wherein the D200 mutation is a D200E mutation, the D449 mutation is a D449G mutation, the L603 mutation is an L603W mutation, and the E607 mutation is an E607G mutation; (f) an H204R mutation, an E545G mutation, a D583N mutation, an H594Q mutation, and an S679P mutation, wherein the D200 mutation is a D200E mutation, the D449 mutation is a D449G mutation, the L603 mutation is an L603F mutation, and the E607 mutation is an E607K mutation; and (g) a P47L mutation, a D524N mutation, a T542D mutation, a D583N mutation, a P627S mutation, an A644V mutation, a D653H mutation, a K658R mutation, and an S679P mutation, wherein the D200 mutation is a D200N mutation, the D449 mutation is a D449G mutation, the L603 mutation is an L603W mutation, and the E607 mutation is an E607G mutation. Variants may comprise a combination of mutations or alterations, and may also comprise a second combination of mutations.
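In the embodiments above, several positions (D200, D449, L603 and E607) are specified only as a mutation at that position, with the permitted replacement residues listed separately. One way to check a variant against such language is to map each required position to its set of allowed residues, as in the hypothetical Python sketch below; the backbone and variant (c) are transcribed from the text, and the helper names are illustrative.

```python
# Positions of the first combination above, each mapped to the residues the text
# allows there (positions indexed to SEQ ID NO: 15).
BACKBONE = {
    39: {"V"}, 69: {"K"}, 139: {"P"}, 200: {"N", "E"}, 302: {"R"}, 306: {"K"},
    313: {"F"}, 330: {"P"}, 429: {"S"}, 435: {"K"}, 448: {"A"}, 449: {"G", "E"},
    454: {"K"}, 603: {"W", "F"}, 607: {"G", "K"}, 671: {"P"},
}


def to_position_map(codes: list[str]) -> dict[int, str]:
    """Turn codes such as 'D200N' into {200: 'N'} (one substitution per position)."""
    return {int(code[1:-1]): code[-1] for code in codes}


def satisfies(variant: dict[int, str], requirements: dict[int, set[str]]) -> bool:
    """True if, at every required position, the variant carries an allowed residue."""
    return all(variant.get(pos) in allowed for pos, allowed in requirements.items())


# Second combination (c): backbone with D200N, D449G, L603F and E607K,
# plus E545G, D583N, H594Q and S679P.
variant_c = to_position_map([
    "M39V", "E69K", "L139P", "D200N", "E302R", "T306K", "W313F", "T330P",
    "G429S", "L435K", "P448A", "D449G", "N454K", "L603F", "E607K", "L671P",
    "E545G", "D583N", "H594Q", "S679P",
])
print(satisfies(variant_c, BACKBONE))  # True
```

A missing position returns `None` from `dict.get`, which is never in an allowed set, so an absent backbone mutation correctly fails the check.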
In some embodiments, the engineered reverse transcriptase of the present application has an amino acid sequence selected from the group consisting of SEQ ID NOs: 2-14.
In some embodiments, the engineered reverse transcriptase of the present application comprises the amino acid sequences shown in table 2.
1. Tags
One aspect of the disclosure provides an engineered reverse transcriptase comprising a tag protein. Tags used in the practice of the present invention may serve a number of purposes, and multiple tags may be added to impart one or more different functions to the engineered reverse transcriptases of the present disclosure and/or derivatives thereof. For example, a tag may (1) facilitate protein-protein interactions within the protein and with other protein molecules; (2) adapt the protein to a particular purification method; (3) enable identification of whether the protein is present in a composition; or (4) confer other functional characteristics on the protein.
In some embodiments, the engineered reverse transcriptase described herein further comprises a tag protein selected from an affinity tag, a fluorescent tag, or an expression and/or solubility enhancing tag. In some embodiments, the tag protein is selected from a hexahistidine tag (His-tag), a Fasciola hepatica 8-kDa antigen tag (Fh8), a glutathione-S-transferase (GST) tag, a maltose binding protein tag (MBP), a FLAG tag peptide (FLAG tag), a streptavidin binding peptide tag (Strep-II), a calmodulin binding protein tag (CBP), a mutant dehalogenase tag (HaloTag), staphylococcal protein A (protein A), intein-mediated chitin binding domain purification (IMPACT (CBD)), a cellulose binding module (CBM), a dockerin domain tag of Clostridium josui, a fungal avidin-like protein (Tamavidin), a small ubiquitin-like modifier tag (SUMO), a streptococcal tag, a thioredoxin (Trx) tag, a VariFlex C-terminal solubility enhancing tag, a short peptide C-terminal tag, a solubility enhancing peptide sequence (SET) tag, an IgG domain B1 of protein G (GB1) tag, an IgG repeat domain ZZ of protein A (ZZ) tag, a solubility enhancing ubiquitous tag (SNUT tag), a 17-kilodalton protein (Skp) tag, a phage T7 protein kinase (T7PK) tag, an Escherichia coli (E. coli) secreted protein A (EspA) tag, a monomeric phage T7 0.3 protein (Ocr) (Mocr) tag, an E. coli trypsin inhibitor (Ecotin) tag, a calbindin (CaBP) tag, a stress-responsive arsenate reductase (ArsC) tag, an N-terminal fragment of translation initiation factor IF2 (IF2-domain I) tag, an N-terminal fragment of translation initiation factor IF2 (expressivity) tag, a stress-responsive protein tag (e.g., RpoA tag, SlyD tag, Tsf tag, RpoS tag, PotD tag, or Crr tag), and an E. coli acidic protein tag (e.g., msyB tag, yigD tag, and rpoD tag).
Additional affinity tags and solubility enhancing tags are known to those skilled in the art. See Costa et al., Front. Microbiol. 5:63 (2014); Esposito and Chatterjee, Curr. Opin. Biotechnol. 17:353-358 (2006); Malhotra, A., "Tagging for protein expression," in Guide to Protein Purification, 2nd edition, R. R. Burgess and M. P. Deutscher, eds. (San Diego, CA: Elsevier), Methods Enzymol. 463:239-258 (2009).
In some embodiments, the tag is selected from the group consisting of a hexahistidine tag (His-tag), a small ubiquitin-like modifier tag (SUMO), a VariFlex C-terminal solubility enhancing tag, a short peptide C-terminal tag, a thioredoxin (Trx) tag, a solubility enhancing peptide sequence (SET) tag, an IgG domain B1 of protein G (GB1) tag, an IgG repeat domain ZZ of protein A (ZZ) tag, a solubility enhancing ubiquitous tag (SNUT tag), a 17-kilodalton protein (Skp tag), a bacteriophage T7 protein kinase (T7PK) tag, an E. coli secreted protein A (EspA) tag, a monomeric bacteriophage T7 0.3 protein (Ocr) (Mocr) tag, an E. coli trypsin inhibitor (Ecotin) tag, a calbindin (CaBP) tag, a stress-responsive arsenate reductase (ArsC) tag, an N-terminal fragment of translation initiation factor IF2 (IF2-domain I) tag, an N-terminal fragment of translation initiation factor IF2 (expressivity) tag, a Fasciola hepatica 8-kDa antigen tag (Fh8), a glutathione-S-transferase (GST) tag, a maltose binding protein tag (MBP), a FLAG tag peptide (FLAG tag), a streptavidin binding peptide tag (Strep-II), a calmodulin binding protein tag (CBP), a mutant dehalogenase tag (HaloTag), staphylococcal protein A (protein A), intein-mediated chitin binding domain purification (IMPACT (CBD)), a cellulose binding module (CBM), a dockerin domain tag (Dock) of Clostridium josui, or a fungal avidin-like protein (Tamavidin).
In one embodiment, the tag is an affinity tag selected from the group consisting of: a histidine tag, such as a hexahistidine tag (His-tag or 6His-tag), a Fasciola hepatica 8-kDa antigen tag (Fh8), a glutathione-S-transferase (GST) tag, a maltose binding protein tag (MBP), a FLAG tag peptide (FLAG), a streptavidin binding peptide tag (Strep-II), a calmodulin binding protein tag (CBP), a mutant dehalogenase tag (HaloTag), staphylococcal protein A (protein A), intein-mediated chitin binding domain purification (IMPACT (CBD)), a dockerin domain tag (Dock) of Clostridium josui, or a fungal avidin-like protein (Tamavidin). In one embodiment, the tag is a hexahistidine tag.
In some embodiments, the tag is selected from the group consisting of a small ubiquitin-like modifier tag (SUMO), a VariFlex C-terminal solubility enhancing tag, a short peptide C-terminal tag, a thioredoxin (Trx) tag, a solubility enhancing peptide sequence (SET) tag, an IgG domain B1 of protein G (GB1) tag, an IgG repeat domain ZZ of protein A (ZZ) tag, a solubility enhancing ubiquitous tag (SNUT) tag, a 17-kilodalton protein (Skp) tag, a phage T7 protein kinase (T7PK) tag, an E. coli secreted protein A (EspA) tag, a monomeric phage T7 0.3 protein (Ocr) (Mocr) tag, an E. coli trypsin inhibitor (Ecotin) tag, a calcium binding protein (CaBP) tag, a stress-responsive arsenate reductase (ArsC) tag, an N-terminal fragment of translation initiation factor IF2 (IF2-domain I) tag, an N-terminal fragment of translation initiation factor IF2 (expressivity) tag, a Fasciola hepatica 8-kDa antigen tag (Fh8), a glutathione-S-transferase (GST) tag, a maltose binding protein tag (MBP), a FLAG tag peptide (FLAG tag), a streptavidin binding peptide tag (Strep-II), a calmodulin binding protein tag (CBP), a mutant dehalogenase tag (HaloTag), staphylococcal protein A (protein A), intein-mediated chitin binding domain purification (IMPACT (CBD)), a cellulose binding module (CBM), a dockerin domain tag (Dock) of Clostridium josui, and a fungal avidin-like protein (Tamavidin).
In some embodiments, the solubility enhancing tag is selected from the group consisting of a SUMO tag, a GST tag, a Trx tag, a VariFlex C-terminal solubility enhancing tag, a short peptide C-terminal tag, an Fh8 tag, an MBP tag, a SET tag, a GB1 tag, a ZZ tag, a HaloTag, a SNUT tag, a Skp tag, a T7PK tag, an EspA tag, a Mocr tag, an Ecotin tag, a CaBP tag, an ArsC tag, an IF2-domain I tag, an expressivity tag, an RpoA tag, a SlyD tag, a Tsf tag, an RpoS tag, a PotD tag, a Crr tag, an msyB tag, a yigD tag, and an rpoD tag.
In some embodiments, the tag is an affinity tag. In one embodiment, the tag is an affinity tag and comprises a histidine purification tag. In one embodiment, the tag is a hexahistidine tag (His-tag). In one embodiment, the tag comprises the amino acid sequence HHHHH (SEQ ID NO: 38). In one embodiment, the tag is a solubility enhancing tag. In one embodiment, the solubility enhancing tag is a short peptide C-terminal tag. In one embodiment, the solubility enhancing tag comprises the amino acid sequence SEEDEEKEEDG (SEQ ID NO: 39) or an amino acid sequence having at least 90% sequence identity to the amino acid sequence of SEQ ID NO: 39.
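The 90% identity criterion for SEQ ID NO: 39 can be illustrated with a minimal sketch. It assumes an ungapped, position-by-position comparison of equal-length sequences; sequence identity is normally computed from a pairwise alignment, which this sketch does not perform. The variant sequence is hypothetical.

```python
def percent_identity(reference: str, query: str) -> float:
    """Percent identity between two equal-length, ungapped protein sequences."""
    if len(reference) != len(query) or not reference:
        raise ValueError("sketch assumes non-empty, equal-length, ungapped sequences")
    matches = sum(a == b for a, b in zip(reference, query))
    return 100.0 * matches / len(reference)


SEQ_ID_NO_39 = "SEEDEEKEEDG"  # short peptide C-terminal tag (from the text)
variant = "SEEDEEKEEDA"  # hypothetical variant with one substitution (G11A)

identity = percent_identity(SEQ_ID_NO_39, variant)
print(f"{identity:.1f}% identity, meets 90% threshold: {identity >= 90.0}")
# prints "90.9% identity, meets 90% threshold: True"
```

For an 11-residue peptide, one substitution gives 10/11 (about 90.9%) identity and two give about 81.8%, so under this ungapped model at most one substitution satisfies the 90% criterion.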
In some embodiments, the engineered reverse transcriptase or derivative thereof comprises an affinity tag at the N-terminus or C-terminus of the amino acid sequence. In some embodiments, affinity tags include, but are not limited to, albumin binding protein (ABP), AU1 epitope, AU5 epitope, T7 tag, V5 tag, B tag, chloramphenicol acetyltransferase (CAT), dihydrofolate reductase (DHFR), AviTag, calmodulin tag, polyglutamic acid tag, E tag, FLAG tag, HA tag, Myc tag, NE tag, S tag, SBP tag, Softag 1, Softag 3, Spot tag, tetracysteine (TC) tag, Ty tag, VSV tag, Xpress tag, biotin carboxyl carrier protein (BCCP), green fluorescent protein tag, HaloTag, Nus tag, thioredoxin tag, Fc tag, cellulose binding domain, chitin binding protein (CBP), choline binding domain, galactose binding domain, maltose binding protein (MBP), horseradish peroxidase (HRP), Strep tag, HSV epitope, ketosteroid isomerase (KSI), KT3 epitope, LacZ, luciferase, PDZ domain, PDZ ligand, polyarginine (Arg tag), polyaspartic acid (Asp tag), polycysteine (Cys tag), polyphenylalanine (Phe tag), Profinity eXact, protein C, S1 tag, staphylococcal protein A (Protein A), streptococcal protein G (Protein G), small ubiquitin-like modifier (SUMO), tandem affinity purification (TAP), TrpE, ubiquitin, Universal, glutathione-S-transferase (GST), and poly(His) tag. In some cases, the affinity tag is at least 5 histidine residues.
In some embodiments, the engineered reverse transcriptase may comprise an affinity tag at the N-terminus or C-terminus of the amino acid sequence. In some embodiments, the affinity tag is cleaved from the reverse transcriptase prior to use; alternatively, it may remain on the reverse transcriptase, provided that it does not significantly alter the activity of the reverse transcriptase. In some cases, the affinity tag may be any of the affinity tags listed in the preceding paragraph. In some cases, the affinity tag is at least 5 histidine residues (SEQ ID NO: 16).
In some embodiments, the tag further comprises an endoprotease cleavage site selected from ENLYFQ/G (SEQ ID NO: 40), DDDDK/(SEQ ID NO: 41), IEGR/(SEQ ID NO: 42), LVPR/GS (SEQ ID NO: 43) or LEVLFQ/GP (SEQ ID NO: 44).
The skilled person will appreciate that the polymerases of the present invention may be further modified without decreasing their biological activity. Modifications may be made to facilitate cloning, expression, or incorporation of a domain into a fusion protein. Such modifications are well known to those skilled in the art and include, for example, the addition of codons at either end of the polynucleotide encoding the binding domain to provide, for example, a methionine at the amino terminus as an initiation site, or additional amino acids at either end to create conveniently located restriction sites, stop codons, or purification sequences.
One or more domains may also be modified to facilitate linkage of the variant reverse transcriptase to another molecule. Engineered reverse transcriptases modified in this way are also part of the invention. For example, codons for cysteine residues may be placed at either end of the reverse transcriptase so that the reverse transcriptase can be linked through, for example, disulfide linkages. Modifications may be made using recombinant or chemical methods (see, e.g., the Pierce Chemical Co. catalog, Rockford, IL).
2. Protease cleavage sequence
In some embodiments, the engineered reverse transcriptase or derivative thereof further comprises a protease cleavage sequence. In some embodiments, cleavage of the protease cleavage sequence by the protease removes the affinity tag from the engineered reverse transcriptase or derivative thereof. In some cases, the protease cleavage sequence or site is recognized by a protease including, but not limited to, alanine carboxypeptidase, Armillaria mellea astacin, bacterial leucyl aminopeptidase, cancer procoagulant, cathepsin B, clostripain, cytosol alanyl aminopeptidase, elastase, endoproteinase Arg-C, enterokinase (EntK), gastricsin, gelatinase, Gly-X carboxypeptidase, glycyl endopeptidase, human rhinovirus 3C protease, hypodermin C, IgA-specific serine endopeptidase, leucyl aminopeptidase, leucyl endopeptidase, LysC, lysosomal pro-X carboxypeptidase, lysyl aminopeptidase, methionyl aminopeptidase, myxobacter, nardilysin, pancreatic endopeptidase E, picornain 2A, picornain 3C, proendopeptidase, prolyl aminopeptidase, proprotein convertase I, proprotein convertase II, russellysin, saccharopepsin, semenogelase, T-plasminogen activator, thrombin, tissue kallikrein, tobacco etch virus (TEV) protease, togavirin, tryptophanyl aminopeptidase, U-plasminogen activator, V8, venombin A, venombin AB, and Xaa-Pro aminopeptidase. In some embodiments, the protease cleavage sequence is a thrombin cleavage sequence.
In some embodiments, the engineered reverse transcriptase disclosed herein, or a derivative thereof, comprises the protease cleavage sequence ENLYFQ/G (SEQ ID NO: 40), DDDDK/ (SEQ ID NO: 41), IEGR/ (SEQ ID NO: 42), LVPR/GS (SEQ ID NO: 43), or LEVLFQ/GP (SEQ ID NO: 44), where "/" marks the cleaved peptide bond.
In some embodiments, the tag is cleaved or removed from the engineered reverse transcriptase or derivative thereof via a cleavage site. In one embodiment, the tag is cleaved or removed using an endoprotease selected from tobacco etch virus protease (TEV), enterokinase (EntK), factor Xa (Xa), thrombin (Thr), a genetically engineered derivative of human rhinovirus 3C protease (PreScission), and the Ulp1 catalytic core (SUMO protease). In one embodiment, TEV protease is used to cleave the tag at ENLYFQ/G (SEQ ID NO: 40). In another embodiment, enterokinase (EntK) is used to cleave the tag at DDDDK/ (SEQ ID NO: 41). In another embodiment, factor Xa (Xa) is used to cleave the tag at IEGR/ (SEQ ID NO: 42). In another embodiment, thrombin (Thr) is used to cleave the tag at LVPR/GS (SEQ ID NO: 43). In another embodiment, the genetically engineered derivative of human rhinovirus 3C protease is used to cleave the tag at LEVLFQ/GP (SEQ ID NO: 44). In another embodiment, the tag is cleaved with the Ulp1 catalytic core (SUMO protease), which recognizes the tertiary structure of SUMO rather than a linear sequence motif and cleaves at the C-terminus of the Gly-Gly sequence conserved in SUMO.
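The linear recognition sequences above can be illustrated with a small sequence-processing sketch that locates each site in a construct and cuts at the "/". It is a string model only: it assumes exact motif matches and ignores context effects on real protease activity, and it omits the SUMO protease, which recognizes a fold rather than a motif. The construct and the function name are hypothetical.

```python
# Recognition sequences from the text; "/" marks the cleaved peptide bond.
CLEAVAGE_SITES = {
    "TEV protease": "ENLYFQ/G",
    "enterokinase": "DDDDK/",
    "factor Xa": "IEGR/",
    "thrombin": "LVPR/GS",
    "HRV 3C derivative (PreScission)": "LEVLFQ/GP",
}


def cleave(sequence: str, site: str) -> list[str]:
    """Split `sequence` at every exact occurrence of `site`, cutting at the '/'."""
    left, right = site.split("/")
    motif = left + right
    fragments, start = [], 0
    pos = sequence.find(motif)
    while pos != -1:
        cut = pos + len(left)  # bond between the left and right halves of the site
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(motif, pos + 1)
    fragments.append(sequence[start:])
    return fragments


# Hypothetical construct: Met + 6xHis + TEV site + placeholder enzyme residues
construct = "MHHHHHH" + "ENLYFQG" + "TLNIEDE"
tag, enzyme = cleave(construct, CLEAVAGE_SITES["TEV protease"])
print(tag, enzyme)  # MHHHHHHENLYFQ GTLNIEDE
```

Modeling the cut at the slash, rather than at the end of the motif, reflects that TEV and thrombin sites leave residues (G or GS) on the released protein, whereas enterokinase and factor Xa sites leave none.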
In some embodiments, the engineered reverse transcriptase of the present disclosure further comprises a protease cleavage sequence, wherein cleavage of the protease cleavage sequence by the protease removes the affinity tag from the engineered reverse transcriptase. In some cases, the protease cleavage sequence is recognized by any of the proteases listed above, including, but not limited to, thrombin, tobacco etch virus (TEV) protease, enterokinase, and human rhinovirus 3C protease. In some cases, the protease cleavage sequence is a thrombin cleavage sequence.
B. Enhanced reverse transcriptase activity
The engineered reverse transcriptase of the present disclosure is a variant Moloney murine leukemia virus (MMLV) reverse transcriptase having increased or enhanced reverse transcriptase activity. The term "increased" reverse transcriptase activity refers to a higher level of reverse transcriptase activity in a variant (e.g., a mutant reverse transcriptase, such as an MMLV variant disclosed herein) compared with its wild-type form (e.g., wild-type MMLV or MMLV having the amino acid sequence of SEQ ID NO: 15) or a known variant (e.g., MMLV having the amino acid sequence of SEQ ID NO: 1).
The reverse transcriptase of the present invention includes any reverse transcriptase having one or a combination of the properties described herein. Such characteristics include, but are not limited to, enhanced stability, enhanced thermostability, reduced or eliminated RNase H activity, reduced terminal deoxynucleotidyl transferase activity, increased accuracy, increased processivity, increased specificity, and/or increased fidelity.
The engineered reverse transcriptase may exhibit one or more reverse transcriptase-related activities including, but not limited to, RNA-dependent DNA polymerase activity, RNase H activity, DNA-dependent DNA polymerase activity, RNA binding activity, DNA binding activity, polymerase activity, primer extension activity, strand displacement activity, helicase activity, strand transfer activity, template binding activity, and template switching activity. A given change may increase, decrease, or have no effect on each of these reverse transcriptase-related activities, and a change in one activity may alter various properties of the reverse transcriptase. When multiple characteristics are affected, they may be altered similarly or differently. Methods for assessing reverse transcriptase-related activities are known in the art.
In some embodiments, the engineered reverse transcriptase has one or more of the following characteristics when compared to a wild-type reverse transcriptase or a reverse transcriptase comprising the amino acid sequence of SEQ ID NO: 1: increased thermostability; increased thermoreactivity; increased resistance to reverse transcriptase inhibitors; increased ability to reverse transcribe difficult templates; increased speed; increased processivity; increased specificity; enhanced polymerization activity; or increased sensitivity.
In some embodiments, the thermoreactivity, resistance to reverse transcriptase inhibitors, ability to reverse transcribe difficult templates, speed, processivity, specificity, or sensitivity of the engineered reverse transcriptase is increased by about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, or about 100% as compared to a wild-type reverse transcriptase or a reverse transcriptase comprising the amino acid sequence of SEQ ID NO: 1. In some embodiments, the polymerization activity of the engineered reverse transcriptase is enhanced by about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, or about 100% as compared to a wild-type reverse transcriptase or a reverse transcriptase comprising the amino acid sequence of SEQ ID NO: 1.
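The percentages above express a relative increase over the reference enzyme (wild type or SEQ ID NO: 1). A minimal sketch of that arithmetic, assuming the property is reported as a single positive number measured for both enzymes under identical conditions; the readout values are hypothetical.

```python
def percent_increase(engineered: float, reference: float) -> float:
    """Relative increase (%) of an engineered RT's measured value over a reference RT."""
    if reference <= 0:
        raise ValueError("reference value must be positive")
    return 100.0 * (engineered - reference) / reference


# Hypothetical assay readouts (arbitrary units) measured under identical conditions
print(percent_increase(engineered=150, reference=120))  # 25.0
print(percent_increase(engineered=200, reference=100))  # 100.0, i.e. a doubling
```

On this definition, an increase of "about 100%" corresponds to roughly twice the reference value.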
In some embodiments, the engineered reverse transcriptase exhibits enhanced reverse transcriptase activity compared to a reverse transcriptase having the amino acid sequence shown in SEQ ID NO: 1 or 15. In some embodiments, the enhanced reverse transcriptase activity is selected from the group consisting of processivity, template switching efficiency, binding affinity, and transcription efficiency. In some embodiments, the enhanced reverse transcriptase activity is enhanced template switching (TS) efficiency compared to the template switching efficiency of a reverse transcriptase having the amino acid sequence shown in SEQ ID NO: 1 or 15.
In some embodiments, the enhanced reverse transcriptase-related activity is enhanced transcription efficiency compared to the transcription efficiency of a reverse transcriptase having the amino acid sequence shown in SEQ ID NO: 1 or 15. In some embodiments, the enhanced reverse transcriptase activity is enhanced transcription efficiency and template switching efficiency compared to a reverse transcriptase having the amino acid sequence shown in SEQ ID NO: 1 or 15.
In some embodiments, the enhanced reverse transcriptase activity is increased binding affinity compared to the binding affinity of a reverse transcriptase having the amino acid sequence shown in SEQ ID NO: 1 or 15.
1. Thermal stability
As used herein, the term "thermostable" generally refers to an enzyme, such as a reverse transcriptase ("thermostable reverse transcriptase"), that retains a greater percentage or amount of its activity after heat treatment than the same enzyme having wild-type thermostability retains after the same treatment. Thus, a reverse transcriptase having increased or enhanced thermostability may be defined as a reverse transcriptase having any increase in thermostability, preferably from about 1.2-fold to about 10,000-fold, from about 1.5-fold to about 10,000-fold, from about 2-fold to about 5,000-fold, or from about 2-fold to about 2,000-fold, or any value in between, that retains activity after a heat treatment sufficient to reduce the activity of a reverse transcriptase having wild-type thermostability. In other aspects of the disclosure, the increase in thermostability may be greater than about 5-fold, greater than about 10-fold, greater than about 50-fold, greater than about 100-fold, greater than about 500-fold, or greater than about 1000-fold.
To determine the thermostability of the engineered reverse transcriptase of the present disclosure, the engineered reverse transcriptase can be compared to the corresponding wild-type MMLV or a variant thereof (e.g., SEQ ID NO: 1) to determine the relative increase in thermostability. For example, after 5 minutes of heat treatment at 60 ℃, the engineered reverse transcriptase may retain about 90% of the activity present before heat treatment, while the wild-type MMLV or MMLV variant (e.g., SEQ ID NO: 1) may retain 10% of its original activity. Likewise, after 15 minutes of heat treatment at 60 ℃, the engineered reverse transcriptase may retain about 80% of its original activity, while the wild-type MMLV or MMLV variant may have no measurable activity. Similarly, after 15 minutes of heat treatment at 60 ℃, the engineered reverse transcriptase may retain about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, or about 95% of its original activity, while the wild-type MMLV or MMLV variant may have no measurable activity or may retain 20%, 15%, 10%, or none of its original activity. In the first case (i.e., after 5 minutes of heat treatment at 60 ℃), the thermostability of the engineered reverse transcriptase is 9 times (90% versus 10%) that of the wild-type reverse transcriptase. Examples of conditions that may be used to measure the thermostability of an enzyme, such as a reverse transcriptase, are described in more detail below and in the examples.
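The 9-fold figure in the first example is the ratio of residual activities after the same heat treatment. A minimal sketch of that calculation, assuming residual activities are expressed as percentages of pre-heat activity; the function name is hypothetical.

```python
def thermostability_fold(engineered_residual_pct: float, reference_residual_pct: float) -> float:
    """Fold increase in thermostability after the same heat treatment.

    Both arguments are residual activities as a percentage of pre-heat activity.
    """
    if reference_residual_pct <= 0:
        # e.g., the reference has no measurable activity after 15 min at 60 °C
        raise ValueError("fold increase is undefined when the reference retains no activity")
    return engineered_residual_pct / reference_residual_pct


# Values from the 5-minute, 60 °C example in the text
print(thermostability_fold(90, 10))  # 9.0
```

The guard reflects the 15-minute case in the text: when the reference retains no measurable activity, a ratio cannot be formed and the comparison has to be reported qualitatively.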
The thermostability of a reverse transcriptase (e.g., an engineered reverse transcriptase as described herein) can be determined, for example, by comparing the residual activity of a reverse transcriptase that has been subjected to heat treatment (e.g., incubation at 60 ℃ for a given time, such as five minutes) with that of a control sample of the same reverse transcriptase incubated at room temperature for the same length of time. One way to measure residual activity is by the incorporation of radiolabeled deoxyribonucleotides into an oligodeoxyribonucleotide primer annealed to a complementary oligoribonucleotide template. For example, the residual activity of a reverse transcriptase can be determined by measuring its ability to incorporate [α-32P]dGTP into an oligo(dG) primer using a poly(rC) template. Methods for measuring the residual activity of reverse transcriptases and polymerases are known to those skilled in the art. See, for example, Nikiforov, T.T., Anal. Biochem., 2011, 412(2):229-36, which is hereby incorporated by reference.
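Residual activity from such an incorporation assay is the heat-treated signal expressed as a fraction of the room-temperature control. A minimal sketch, assuming incorporation is read out as counts per minute and that a no-enzyme background (a hypothetical control, not described in the text) is subtracted from both samples; all values are hypothetical.

```python
def residual_activity_pct(heated_cpm: float, control_cpm: float, background_cpm: float = 0.0) -> float:
    """Residual activity (%) of a heat-treated RT relative to its room-temperature control.

    Counts are [alpha-32P]dGTP incorporation signals; background_cpm is a hypothetical
    no-enzyme control subtracted from both samples.
    """
    control_signal = control_cpm - background_cpm
    if control_signal <= 0:
        raise ValueError("control signal must exceed background")
    heated_signal = max(heated_cpm - background_cpm, 0.0)  # clamp noise below background to zero
    return 100.0 * heated_signal / control_signal


# Hypothetical counts for one enzyme: heated 5 min at 60 °C vs. held at room temperature
print(residual_activity_pct(heated_cpm=9200, control_cpm=10100, background_cpm=100))  # 91.0
```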
In some embodiments, the engineered reverse transcriptase of the present disclosure is thermophilic. In one embodiment, the engineered reverse transcriptase is resistant to heat inactivation as compared to the wild type polymerase. In another embodiment, the engineered reverse transcriptase is resistant to heat inactivation at the following temperatures: about 53 ℃ to about 75 ℃; about 55 ℃ to about 75 ℃; about 60 ℃ to about 75 ℃; about 53 ℃ to about 68 ℃; about 55 ℃ to about 68 ℃; about 45 ℃ to about 68 ℃; or about 50 ℃ to about 68 ℃. In yet another embodiment, the engineered reverse transcriptase is resistant to heat inactivation at a temperature of about 68 ℃.
In another embodiment, the thermostability of the engineered reverse transcriptase is determined by measuring the half life of the engineered reverse transcriptase. Such half-lives can be compared to a control or wild-type enzyme to determine differences (or delta) in half-life.
2. Half-life
In some embodiments, the engineered reverse transcriptase has an enhanced half life at the following temperatures when compared to the wild type polymerase and/or wild type reverse transcriptase: about 53 ℃ to about 75 ℃; about 55 ℃ to about 75 ℃; about 60 ℃ to about 75 ℃; about 53 ℃ to about 68 ℃; about 55 ℃ to about 68 ℃; about 45 ℃ to about 68 ℃; or about 50 ℃ to about 68 ℃.
The half-life of the engineered reverse transcriptase of the present disclosure is preferably measured at an elevated temperature (e.g., above 37 ℃), preferably at a temperature in the range of 40 ℃ to 80 ℃, or in the range of 45 ℃ to 75 ℃, 50 ℃ to 70 ℃, 55 ℃ to 65 ℃, or 58 ℃ to 62 ℃. Depending on the temperature used, the preferred half-life of the engineered reverse transcriptase of the present disclosure can range from about 4 minutes to about 10 hours, from about 4 minutes to about 7.5 hours, from about 4 minutes to about 5 hours, from about 4 minutes to about 2.5 hours, or from about 4 minutes to about 2 hours. For example, the reverse transcriptase activity of the engineered reverse transcriptase of the present disclosure can have a half-life of at least about 4 minutes, at least about 5 minutes, at least about 6 minutes, at least about 7 minutes, at least about 8 minutes, at least about 9 minutes, at least about 10 minutes, at least about 11 minutes, at least about 12 minutes, at least about 13 minutes, at least about 14 minutes, at least about 15 minutes, at least about 20 minutes, at least about 25 minutes, at least about 30 minutes, at least about 40 minutes, at least about 50 minutes, at least about 60 minutes, at least about 70 minutes, at least about 80 minutes, at least about 90 minutes, at least about 100 minutes, at least about 115 minutes, at least about 125 minutes, at least about 150 minutes, at least about 175 minutes, at least about 200 minutes, at least about 225 minutes, at least about 250 minutes, at least about 275 minutes, at least about 300 minutes, at least about 400 minutes, at least about 500 minutes, or any period of time between these values, at a temperature of about 48 ℃, about 50 ℃, about 52 ℃, about 54 ℃, about 56 ℃, about 58 ℃, or about 60 ℃.
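The text does not specify how half-life is derived from activity measurements. A common approach, assumed here, is to treat heat inactivation as first order, so residual activity decays as exp(-k t) and the half-life is ln(2)/k. The sketch fits k by least squares on ln(residual fraction) through the origin; the function name and data are hypothetical.

```python
import math


def half_life(times, residual_fractions):
    """Half-life under an assumed first-order (single-exponential) heat inactivation.

    Fits ln(residual fraction) = -k * t through the origin by least squares
    (activity at t = 0 is taken as 1.0), then returns ln(2) / k in the time units given.
    """
    points = [(t, f) for t, f in zip(times, residual_fractions) if t > 0 and f > 0]
    if not points:
        raise ValueError("need at least one time point > 0 with measurable activity")
    k = -sum(t * math.log(f) for t, f in points) / sum(t * t for t, _ in points)
    if k <= 0:
        raise ValueError("no decay detected; half-life cannot be estimated")
    return math.log(2) / k


# Hypothetical residual fractions at 60 °C, generated for a 20-minute half-life
minutes = [5, 10, 20, 40]
fractions = [0.5 ** (t / 20) for t in minutes]
print(round(half_life(minutes, fractions), 6))  # 20.0
```

If inactivation is biphasic, a single-exponential fit will misestimate the half-life, so the model choice should be checked against the measured decay curve.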
In some embodiments, the increased thermostability of the engineered reverse transcriptase extends its half-life.
3. Processivity
In some embodiments, the engineered reverse transcriptase has one or more of the following characteristics when compared to the wild-type polymerase and/or reverse transcriptase: increased thermostability; increased thermoreactivity; increased resistance to reverse transcriptase inhibitors; increased ability to reverse transcribe difficult templates; increased speed; increased processivity; increased specificity; enhanced polymerization activity; increased sensitivity; or any combination thereof.
Processivity is defined as the ability of a polymerase or reverse transcriptase to carry out continuous nucleic acid synthesis on a template nucleic acid without frequent dissociation. Processivity can be measured as the average number of nucleotides incorporated by the polymerase in a single association/dissociation event. A DNA polymerase or reverse transcriptase acting alone produces a short DNA product strand for each binding event; most DNA polymerases and reverse transcriptases are intrinsically enzymes of low processivity. The low processivity of a DNA polymerase or reverse transcriptase acting alone is not sufficient to replicate large genomes in a timely manner.
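The definition above, average nucleotides incorporated per binding event, can be computed from a product-length distribution. This sketch assumes single-hit conditions (for example, a trap that prevents rebinding, so each extended product reflects one binding event) and a 5'-end-labeled primer, so each product carries one label and peak areas are proportional to molar amounts. The function name and data are hypothetical.

```python
def mean_nt_per_binding_event(product_lengths, abundances, primer_length):
    """Average nucleotides incorporated per association/dissociation event.

    Assumes single-hit conditions (each extended product arises from one binding
    event) and that abundances (e.g., peak areas or molar amounts) are comparable.
    Unextended primer is excluded.
    """
    extended = [(n, a) for n, a in zip(product_lengths, abundances) if n > primer_length]
    total = sum(a for _, a in extended)
    if total <= 0:
        raise ValueError("no extended products")
    return sum((n - primer_length) * a for n, a in extended) / total


# Hypothetical single-hit product distribution from a 20-nt primer
lengths = [20, 30, 60, 120]
amounts = [500, 50, 30, 20]  # the 20-nt entry is unextended primer
print(mean_nt_per_binding_event(lengths, amounts, primer_length=20))  # 37.0
```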
In some embodiments, the polymerization activity of an engineered reverse transcriptase as described herein is enhanced by about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, or about 100% as compared to a wild type reverse transcriptase.
In some embodiments, the engineered reverse transcriptase reverse transcribes an RNA molecule having at least about 100, at least about 200, at least about 300, at least about 400, at least about 500, at least about 600, at least about 700, at least about 800, at least about 900, or at least about 1000 nucleotides. In another embodiment, the engineered reverse transcriptase reverse transcribes an RNA molecule of at least about 1kb, at least about 2kb, at least about 3kb, at least about 4kb, at least about 5kb, at least about 6kb, at least about 7kb, at least about 8kb, at least about 9kb, at least about 10kb, at least about 11kb, at least about 12kb, at least about 13kb, at least about 14kb, or at least about 15 kb. In another embodiment, the engineered reverse transcriptase reverse transcribes an RNA molecule of at least about 7kb or at least about 8 kb.
In some embodiments, the thermoreactivity, resistance to reverse transcriptase inhibitors, ability to reverse transcribe difficult templates, speed, processivity, specificity, or sensitivity of an engineered reverse transcriptase as described herein is increased by about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, or about 100% compared to a wild-type polymerase.
In some embodiments, the enhanced reverse transcriptase activity is increased binding affinity and template switching efficiency compared to a reverse transcriptase having the amino acid sequence shown in SEQ ID NO: 1. In some embodiments, the enhanced reverse transcriptase activity is enhanced processivity compared to the processivity of a reverse transcriptase having the amino acid sequence shown in SEQ ID NO: 1.
4. Transcription efficiency
In some embodiments, the engineered reverse transcriptases disclosed herein exhibit enhanced transcription efficiency compared to the transcription efficiency of a reverse transcriptase having the amino acid sequence set forth in SEQ ID NO: 1 or SEQ ID NO: 15. As described herein, reverse transcriptase-mediated conversion of mRNA into cDNA is an essential step in single-cell profiling and gene expression analysis. However, for the reasons disclosed herein, reverse transcription catalyzed by an unmodified reverse transcriptase is inefficient. The engineered reverse transcriptase of the present disclosure is preferably modified or mutated such that its transcription efficiency is increased or enhanced.
Furthermore, the engineered reverse transcriptase variants described herein may exhibit unexpectedly higher resistance to (i.e., less inhibition by) cell lysate than an enzyme having the amino acid sequence shown in SEQ ID NO: 1. Finally, the engineered reverse transcriptase variants of the present disclosure may have an unexpectedly greater ability to capture full-length transcripts (e.g., in a paired T cell receptor transcript profiling assay) than an enzyme having the amino acid sequence set forth in SEQ ID NO: 1.
It is recognized that mutation of one or more residues may alter a first reverse transcriptase activity differently than a second reverse transcriptase activity. In addition, different combinations of mutations, such as changes at different sites or to different residues, may alter reverse transcriptase activity similarly or differently. Variants that can undergo template switching in the 5' assay share the following changes relative to SEQ ID NO: 15: E69K, E302R, T306K, W313F, K435G and N454K. These variants may also comprise additional changes that may affect one or more reverse transcriptase-related activities. The M39V and M66L mutations may improve template switching relative to SEQ ID NO: 15. Without being bound by mechanism, variants comprising the M39V or M66L mutation that do not show altered performance in a 5' GEM single-cell assay may exhibit altered processivity, an altered kd, or both. Mutations at K435 may improve thermostability in the presence of a primer-template relative to SEQ ID NO: 15; in the absence of a primer-template, K435 variants may exhibit thermal denaturation profiles similar to that of the wild-type protein. With respect to SEQ ID NO: 15, K435, P448 and D449 are residues in the connection domain, and it was found that altering these residues may increase conformational flexibility. The connection domain is also thought to affect the conformational flexibility of the RNase H domain. With respect to SEQ ID NO: 15, H503 and H634 lie within the RNase H domain. The H503V and H634Y variants may affect primer-template contact, processivity, or both.
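Substitutions such as E69K name the reference residue, its 1-based position in SEQ ID NO: 15, and the replacement residue. The sketch below applies a list of such substitutions to a sequence and checks each reference residue before changing it. The toy sequence is hypothetical and is not SEQ ID NO: 15; with the actual SEQ ID NO: 15 loaded, the shared changes E69K, E302R, T306K, W313F, K435G and N454K could be applied the same way.

```python
import re

SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitutions(sequence: str, substitutions: list[str]) -> str:
    """Apply substitutions like 'E69K' (1-based positions), checking each reference residue."""
    residues = list(sequence)
    for sub in substitutions:
        match = SUBSTITUTION.fullmatch(sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub}")
        ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{sub}: position outside sequence of length {len(residues)}")
        if residues[pos - 1] != ref:
            raise ValueError(f"{sub}: expected {ref} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = alt
    return "".join(residues)


# Toy sequence (hypothetical, not SEQ ID NO: 15)
toy = "MKTAYIAKQR"
print(apply_substitutions(toy, ["K2E", "A7V"]))  # METAYIVKQR
```

Checking the reference residue catches numbering errors, such as positions counted from a construct that includes an N-terminal tag rather than from SEQ ID NO: 15.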
Variants comprising the combination of T542D, D583N, E607G, A644V, D653H and K658R, and variants comprising the combination of E545G, D583N, H594Q and L603F, may exhibit altered RNase H activity.
The engineered reverse transcriptase variants of the present disclosure unexpectedly provide altered reverse transcriptase activities such as, but not limited to, improved thermostability, processivity of reverse transcription, non-templated base addition, binding affinity, and template switching capacity.
5. Template switching oligonucleotides
The transcription efficiency of a reverse transcriptase can be calculated as the ratio of the sum of the areas under the curve of the extension and tailing (2), incomplete template switch (incomplete TSO) (3), and complete template switch (TSO) (4) regions to the total area under the curve of all products (FIG. 5). Transcription efficiency thus reflects all products that successfully completed transcription. Template switching (TSO) efficiency can be calculated as the ratio of the area under the curve of the complete template switch region (4) to the total area under the curve of all products, including extension and tailing (2), incomplete TSO (3), and complete TSO (4) (FIG. 5). The engineered reverse transcriptase may have increased transcription efficiency, increased TSO efficiency, or both.
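The two ratios above can be written as a short calculation over the region areas. This sketch assumes both denominators are the total area of all four regions (1 to 4); the function name and area values are hypothetical.

```python
def rt_efficiencies(region_areas: dict[int, float]) -> tuple[float, float]:
    """Transcription efficiency and TSO efficiency from CE peak areas.

    Keys: 1 = incomplete, 2 = extension and tailing, 3 = incomplete TSO,
    4 = complete TSO. Both ratios use the total area of regions 1-4 as the
    denominator (an assumption).
    """
    area = {r: region_areas.get(r, 0.0) for r in (1, 2, 3, 4)}
    total = sum(area.values())
    if total <= 0:
        raise ValueError("total area must be positive")
    transcription_eff = (area[2] + area[3] + area[4]) / total
    tso_eff = area[4] / total
    return transcription_eff, tso_eff


# Hypothetical areas under the curve for one enzyme
te, tso = rt_efficiencies({1: 200.0, 2: 300.0, 3: 100.0, 4: 400.0})
print(f"transcription efficiency {te:.2f}, TSO efficiency {tso:.2f}")
# prints "transcription efficiency 0.80, TSO efficiency 0.40"
```

The text lists regions 2 to 4 explicitly for the TSO-efficiency denominator; if region 1 is meant to be excluded there, the denominator for `tso_eff` would be the sum of regions 2 to 4 instead.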
For both transcription efficiency and template switching efficiency, products shorter than 45 nucleotides are considered incomplete (1). Products from full length up to full length plus tail are assigned to the extension and tailing region (2). Products longer than full length plus tail but shorter than full length plus tail plus the template switch sequence are considered incomplete template switch products (incomplete TSO, 3). Products with the size of full length plus tail plus the template switch sequence are considered complete template switch products (TSO, 4).
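The length rules above can be expressed as a binning function that assigns each capillary electrophoresis peak to a region and sums areas per region. The 45-nt full length comes from the text; the tail and template-switch lengths are hypothetical defaults, and treating the region boundaries as inclusive or exclusive exactly as shown is an assumption.

```python
from collections import defaultdict


def classify_product(length_nt: float, full_length: int, tail_nt: int, tso_nt: int) -> int:
    """Assign a CE product length to region 1-4 as defined in the text."""
    if length_nt < full_length:
        return 1  # incomplete
    if length_nt <= full_length + tail_nt:
        return 2  # extension and tailing: full length up to full length plus tail
    if length_nt < full_length + tail_nt + tso_nt:
        return 3  # incomplete TSO
    return 4  # complete TSO: full length plus tail plus template switch


def region_areas(peaks, full_length=45, tail_nt=3, tso_nt=27):
    """Sum peak areas per region; the tail_nt and tso_nt defaults are hypothetical."""
    totals = defaultdict(float)
    for length_nt, area in peaks:
        totals[classify_product(length_nt, full_length, tail_nt, tso_nt)] += area
    return dict(totals)


# Hypothetical (length in nt, area) peaks
peaks = [(30, 200.0), (46, 300.0), (60, 100.0), (75, 400.0)]
print(region_areas(peaks))  # {1: 200.0, 2: 300.0, 3: 100.0, 4: 400.0}
```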
Template switching oligonucleotides (also referred to herein as "switch oligonucleotides") may be used for template switching. In some cases, template switching may be used to increase the length of the cDNA. In some cases, template switching may be used to append a predefined nucleic acid sequence to the cDNA. In an example of template switching, cDNA may be produced by reverse transcription of a template (e.g., cellular mRNA), where a reverse transcriptase having terminal transferase activity may add additional nucleotides, such as poly-C, to the cDNA in a template-independent manner. The switch oligonucleotide may comprise a sequence complementary to the additional nucleotides, such as poly-G. The additional nucleotides on the cDNA (e.g., poly-C) may hybridize to the complementary nucleotides on the switch oligonucleotide (e.g., poly-G), whereby the reverse transcriptase may use the switch oligonucleotide as a template to further extend the cDNA. The template switching oligonucleotide may comprise a hybridization region and a template region. The hybridization region may comprise any sequence capable of hybridizing to a target. In some cases, as previously described, the hybridization region comprises a series of G bases to complement the overhanging C bases at the 3' end of the cDNA molecule. The series of G bases can include 1 G base, 2 G bases, 3 G bases, 4 G bases, 5 G bases, or more than 5 G bases. The template region may comprise any sequence to be incorporated into the cDNA. In some cases, the template region comprises at least 1 (e.g., at least 2, 3, 4, 5, or more) tag sequences and/or functional sequences.
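The template switching steps described above (first-strand synthesis, template-independent C addition, pairing of the C overhang with the switch oligonucleotide's 3' G run, and extension across the switch oligonucleotide's template region) can be traced with a toy string model. It is a sketch under simplifying assumptions: it ignores hybridization energetics and the ribonucleotide G bases often used in real switch oligonucleotides, assumes the RNA is copied to its 5' end, and uses hypothetical function names and sequences.

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement_dna(seq: str) -> str:
    """DNA reverse complement; RNA U bases are paired with A."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def template_switch(rna: str, n_untemplated_c: int, tso: str) -> str:
    """Toy model of first-strand synthesis with template switching.

    rna: RNA template written 5'->3'; assumed to be copied to its 5' end.
    n_untemplated_c: C bases added template-independently to the cDNA 3' end.
    tso: switch oligonucleotide written 5'->3', with a 3' run of G bases.
    Returns the extended first-strand cDNA, 5'->3'.
    """
    if n_untemplated_c < 1:
        raise ValueError("template switching here requires at least one added C")
    g_run = len(tso) - len(tso.rstrip("G"))
    if g_run < n_untemplated_c:
        raise ValueError("TSO 3' G run is too short to pair with the C overhang")
    first_strand = reverse_complement_dna(rna)  # full-length copy of the RNA
    tailed = first_strand + "C" * n_untemplated_c  # template-independent addition
    template_region = tso[: len(tso) - n_untemplated_c]  # TSO bases 5' of the paired Gs
    return tailed + reverse_complement_dna(template_region)  # RT copies the TSO


cdna = template_switch(rna="ACGUAAAA", n_untemplated_c=3, tso="AGTCGGG")
print(cdna)  # TTTTACGTCCCGACT
```

The extended cDNA ends in the reverse complement of the switch oligonucleotide's template region, which is how a predefined tag or functional sequence ends up appended to every template-switched cDNA.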
The switch oligonucleotide may comprise deoxyribonucleic acid; ribonucleic acid; modified nucleic acids, including 2-aminopurine, 2,6-diaminopurine (2-amino-dA), inverted dT, 5-methyl dC, 2'-deoxyinosine, Super T (5-hydroxybutynyl-2'-deoxyuridine), Super G (8-aza-7-deazaguanosine), locked nucleic acids (LNA), unlocked nucleic acids (UNA, e.g., UNA-A, UNA-U, UNA-C, UNA-G), iso-dG, iso-dC, and 2'-fluoro bases (e.g., fluoro C, fluoro U, fluoro A, and fluoro G); or any combination thereof. Suitable lengths for switch oligonucleotides are known in the art. See, for example, U.S. Patent Application Ser. No. 15/975516, which is incorporated herein by reference in its entirety.
An overview of template switching is shown in FIG. 2. A primer can be hybridized to an RNA template and extended by a reverse transcriptase, producing a first-strand cDNA molecule. A poly-C sequence may be added to the cDNA by terminal transferase activity. A template switching oligonucleotide comprising a poly-G sequence complementary to the poly-C sequence added to the first-strand cDNA is added to the reaction; the poly-G TSO hybridizes to the poly-C sequence, and the reverse transcriptase can further extend the cDNA using the TSO sequence as a template. In one embodiment, template switching efficiency is measured on a capillary electrophoresis (CE) system such as the SeqStudio CE analyzer (Thermo Fisher). Results of a CE assay using fluorescently labeled polynucleotides are illustrated in FIG. 3, with fluorescence on the Y axis and nucleotide length on the X axis: the FAM-labeled primer appears at 5 nt, the FAM-labeled first-strand cDNA product at 45 nt, and the TSO-extended first-strand cDNA at about 75 nt. FIG. 3 compares an RT enzyme known to extend a poly-G TSO (enzyme C, or SEQ ID NO: 1) with an RT that is not expected to extend a poly-G TSO (AR). In the top capillary electrophoresis plot, the full-length cDNA product and the tailed full-length cDNA product differ by only about 1 nt, and no poly-G TSO extension product is formed. In contrast, enzyme mix C, comprising the SEQ ID NO: 1 RT, produced both full-length cDNA products and full-length cDNA products with efficient poly-G TSO extension.
The engineered reverse transcriptases of the present application can exhibit altered base-biased template switching activity, such as increased base-biased template switching activity, decreased base-biased template switching activity, or an altered base bias of template switching activity. Engineered reverse transcriptase variants may exhibit enhanced template switching when the substrate carries a 5' G cap.
RNase H activity
In some embodiments, the engineered reverse transcriptase described herein is engineered to have reduced and/or eliminated RNase H activity. In some such embodiments, the engineered reverse transcriptase comprises a mutation analogous to the D561 mutation of the MMLV reverse transcriptase of SEQ ID NO: 1 (corresponding to D583 in SEQ ID NO: 15).
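Carrying a mutation such as D561 of SEQ ID NO: 1 over to its counterpart in another sequence (D583 of SEQ ID NO: 15) amounts to mapping residue numbers through a pairwise alignment. The sketch below assumes the gapped alignment has already been produced by some pairwise aligner; the toy sequences are hypothetical stand-ins, not the actual SEQ ID NO sequences.

```python
from typing import Optional


def map_residue(aln_ref: str, aln_query: str, ref_pos: int) -> Optional[int]:
    """Map a 1-based residue number in the reference onto the query.

    aln_ref / aln_query: equal-length rows of a precomputed pairwise
    alignment, with '-' marking gaps. Returns None when the reference
    residue is aligned to a gap in the query.
    """
    if len(aln_ref) != len(aln_query):
        raise ValueError("aligned sequences must have equal length")
    r = q = 0  # residue counters (gaps are not counted)
    for a, b in zip(aln_ref, aln_query):
        if a != "-":
            r += 1
        if b != "-":
            q += 1
        if a != "-" and r == ref_pos:
            return q if b != "-" else None
    raise IndexError("ref_pos is beyond the end of the reference sequence")


# Toy alignment: the query carries a one-residue insertion before the D.
ref_row, query_row = "MK-DLE", "MKADLE"
print(map_residue(ref_row, query_row, 3))  # 4, i.e. D3 in the reference is D4 in the query
```

Checking that the residue letter is the same on both sides (D in both here) is a useful sanity check before introducing the analogous mutation.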
RNase H activity refers to the endoribonucleolytic degradation of the RNA strand of an RNA-DNA hybrid, producing oligonucleotides 2-9 bases in length with 5'-phosphate termini. RNase H activity does not degrade single-stranded nucleic acids, double-stranded DNA, or double-stranded RNA. Removing the RNase H activity of a reverse transcriptase eliminates degradation of the RNA template and improves the efficiency of reverse transcription.
In some embodiments, the reverse transcriptase of the present disclosure can have reduced or substantially reduced RNase H activity. Reducing, substantially reducing, or completely removing the RNase H activity of a reverse transcriptase (e.g., MMLV) can prevent degradation of the RNA template before the RT reaction begins, thereby improving the efficiency of reverse transcription. See, e.g., Gerard et al., FOCUS 11(4):60 (1989); Gerard et al., FOCUS 14(3):91 (1992).
In some embodiments, the reverse transcriptase of the present disclosure substantially lacks RNase H activity. In such embodiments, the reverse transcriptase has less than 10%, 5%, 1%, 0.5%, or 0.1% of the RNase H activity of the wild-type enzyme or of a variant comprising the amino acid sequence of SEQ ID NO: 1. In some embodiments, the reverse transcriptase of the present disclosure lacks RNase H activity. In such embodiments, the reverse transcriptase has undetectable RNase H activity, or less than about 1%, 0.5%, or 0.1% of the RNase H activity of the wild-type enzyme or of a variant comprising the amino acid sequence of SEQ ID NO: 1.
As used herein, the term "reduced RNase H activity" means that the enzyme has less than 50%, e.g., less than 40%, 30%, 25%, or 20%, more preferably less than 15%, 10%, or 7.5%, and most preferably less than 5% or 2%, of the RNase H activity of the corresponding wild-type enzyme or of a variant comprising the amino acid sequence of SEQ ID NO: 1. The RNase H activity of an enzyme can be determined by a variety of assays, such as those described in U.S. Patent Nos. 5,405,776; 6,063,608; 5,244,797; and 5,668,005; in Kotewicz, M.L. et al., Nucleic Acids Res. 16:265 (1988); and in Gerard, G.F. et al., FOCUS 14(5):91 (1992), the disclosures of all of which are incorporated herein by reference in their entirety.
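The percentage definitions above reduce to comparing a variant's assay readout with that of the reference enzyme measured the same way. This sketch assumes both readouts are in the same arbitrary units from one RNase H assay, and it applies only the broadest threshold from each definition (below 50% for "reduced", below 10% for "substantially lacks"); the narrower preferred cutoffs could be substituted.

```python
def relative_activity(variant: float, reference: float) -> float:
    """RNase H activity of a variant as a percentage of the reference enzyme."""
    if reference <= 0:
        raise ValueError("reference activity must be positive")
    return 100.0 * variant / reference


def classify_rnase_h(percent: float) -> str:
    """Label a relative activity using the broadest thresholds (10% and 50%)."""
    if percent < 10.0:
        return "substantially lacks"
    if percent < 50.0:
        return "reduced"
    return "not reduced"


# Hypothetical readouts: reference enzyme 60 units, variant 3 units.
print(classify_rnase_h(relative_activity(3, 60)))  # 5% -> "substantially lacks"
```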
III. Nucleic acids and expression vectors
One aspect of the present disclosure provides an isolated nucleic acid molecule encoding an engineered reverse transcriptase or derivative thereof as described herein. In some embodiments, the engineered reverse transcriptase is encoded by a nucleic acid set forth herein, or by a nucleic acid readily deduced from the polypeptide sequences provided herein (e.g., SEQ ID NOS: 1-15 and 22-37) using methods known in the art. The engineered reverse transcriptase need not be encoded by any particular nucleic acid exemplified herein. For example, redundancy in the genetic code allows the nucleotide codon sequence encoding a given amino acid to vary. Thus, the engineered polymerases of the present disclosure can be produced from nucleic acid sequences other than those shown herein, e.g., sequences codon-optimized for a particular expression system. Codon optimization may be performed, for example, as described in Athey et al., BMC Bioinformatics 18:391-401 (2017).
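Deducing a coding sequence from a polypeptide (back-translation) shows why many nucleic acids can encode the same enzyme. The codon table below is an illustrative assumption with one valid standard-code codon per amino acid; it is not a measured codon-usage table, and a practical codon-optimization tool would instead draw codons from a host-specific usage table.

```python
# Illustrative one-codon-per-residue table (assumed). Every entry is a valid
# standard genetic code codon for its amino acid; '*' is a stop codon.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def back_translate(protein: str, table: dict = PREFERRED_CODON) -> str:
    """Return one DNA coding sequence for a protein using the given codon table."""
    return "".join(table[aa] for aa in protein.upper())


# Two different DNA sequences that encode the same peptide, M-K-L-stop.
print(back_translate("MKL*"))                                   # ATGAAACTGTAA
print(back_translate("MKL*", {**PREFERRED_CODON, "L": "TTA"}))  # ATGAAATTATAA
```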
Nucleic acids encoding wild-type polymerases can be isolated from natural sources and used as starting material for producing novel polymerases. In general, the terminology and laboratory procedures of recombinant DNA technology described below are well known and commonly used in the art. Standard techniques for cloning, DNA and RNA isolation, amplification, and purification are known. Enzymatic reactions involving DNA ligases, DNA polymerases, restriction endonucleases, and the like are generally performed according to the manufacturer's instructions. These and various other techniques are generally performed according to Sambrook and Russell, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1989) or Ausubel et al., Current Protocols in Molecular Biology, Volumes 1-3, John Wiley & Sons, Inc. (1994-1998).
Polymerase nucleic acids can be isolated by a variety of techniques. The polymerase nucleic acids of the invention can be produced from wild-type sequences, which are altered to produce modified sequences. The wild-type polymerase may be modified using methods well known in the art to produce the polymerases claimed in this application. Exemplary modification methods include site-directed mutagenesis, point mismatch repair, and oligonucleotide-directed mutagenesis.
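Before ordering mutagenic oligonucleotides, a planned point mutation can be checked in silico by replacing one codon and re-translating. The sketch builds the standard genetic code from the conventional TCAG-ordered string; the toy coding sequence and the D3N substitution are hypothetical examples, not mutations described in the disclosure.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an upper-case DNA coding sequence codon by codon."""
    return "".join(CODE[cds[i:i + 3]] for i in range(0, len(cds) - len(cds) % 3, 3))


def point_mutate(cds: str, residue: int, expected_wt: str, new_codon: str) -> str:
    """Replace the codon of a 1-based residue after confirming the wild-type residue."""
    start = 3 * (residue - 1)
    old = cds[start:start + 3]
    if CODE.get(old) != expected_wt:
        raise ValueError(f"residue {residue} is {CODE.get(old)}, not {expected_wt}")
    if new_codon not in CODE:
        raise ValueError("new_codon must be a valid DNA codon")
    return cds[:start] + new_codon + cds[start + 3:]


wild_type = "ATGAAAGATCTG"                      # hypothetical CDS encoding M-K-D-L
mutant = point_mutate(wild_type, 3, "D", "AAC")  # hypothetical D3N substitution
print(translate(wild_type), translate(mutant))   # MKDL MKNL
```

Verifying the expected wild-type residue first guards against off-by-one numbering errors, which are easy to make when numbering differs between related sequences.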
Another aspect of the disclosure provides an expression vector comprising an isolated nucleic acid encoding an engineered reverse transcriptase or derivative thereof as described herein. A "vector" refers to a polynucleotide capable of replicating in a host organism independently of the host chromosome. Preferred vectors include plasmids, which typically have an origin of replication. Vectors may include, for example, transcription and translation terminators, transcription and translation initiation sequences, and promoters useful for regulating expression of a particular nucleic acid. The polymerases of the present disclosure can be expressed in a variety of host cells, including E. coli, other bacterial hosts, yeast, filamentous fungi, and various higher eukaryotic cells, such as COS, CHO, and HeLa cell lines and myeloma cell lines. Techniques for gene expression in microorganisms are described, for example, in Smith, Gene Expression in Recombinant Microorganisms (Bioprocess Technology, Vol. 22), Marcel Dekker, 1994. Examples of bacteria that may be used for expression include, but are not limited to, Escherichia, Enterobacter, Azotobacter, Erwinia, Bacillus, Pseudomonas, Klebsiella, Proteus, Salmonella, Serratia, Shigella, Rhizobia, Vitreoscilla, and Paracoccus. Filamentous fungi that can be used as expression hosts include, for example, the following genera: Aspergillus, Trichoderma, Neurospora, Penicillium, Cephalosporium, Achlya, Podospora, Mucor, Cochliobolus, and Pyricularia. See, e.g., U.S. Pat. No. 5,679,543 and Stahl and Tudzynski, eds., Molecular Biology in Filamentous Fungi, John Wiley & Sons, 1992.
The synthesis of heterologous proteins in yeast is well known and described in the literature. Sherman, F. et al., Methods in Yeast Genetics, Cold Spring Harbor Laboratory (1982), is a well-recognized work describing various methods that can be used to produce enzymes in yeast. Many expression systems for producing the polymerase polypeptides of the present invention are known to those of ordinary skill in the art. See Fernandez and Hoeffler, eds., Gene Expression Systems, Academic Press, 1999; Sambrook and Russell, supra; and Ausubel et al., Current Protocols in Molecular Biology, Volumes 1-3, John Wiley & Sons, Inc. (1994-1998).
Another aspect of the present disclosure provides a host cell transfected with an expression vector comprising an isolated nucleic acid encoding an engineered reverse transcriptase as described herein. Eukaryotic expression systems for mammalian cells, yeast, and insect cells are well known in the art and are commercially available. In yeast, vectors include yeast integrating plasmids (e.g., YIp5), yeast replicating plasmids (the YRp series), and pGPD-2. Eukaryotic expression vectors commonly contain regulatory elements from eukaryotic viruses; examples include SV40 vectors, papilloma virus vectors, and vectors derived from Epstein-Barr virus. Other exemplary eukaryotic vectors include pMSG, pAV009/A+, pMTO10/A+, pMAMneo-5, baculovirus pDSVE, and any other vector allowing expression of proteins under the direction of the CMV promoter, SV40 early promoter, SV40 late promoter, metallothionein promoter, murine mammary tumor virus promoter, Rous sarcoma virus promoter, polyhedrin promoter, or other promoters shown to be effective for expression in eukaryotic cells.
Once expressed, the engineered reverse transcriptase or derivative thereof can be purified according to standard procedures in the art, including ammonium sulfate precipitation, affinity columns, column chromatography, gel electrophoresis, and the like (see generally R. Scopes, Protein Purification, Springer-Verlag, N.Y. (1982); Deutscher, Methods in Enzymology, Vol. 182: Guide to Protein Purification, Academic Press, Inc., N.Y. (1990)). Substantially pure compositions of at least about 90% to about 95% homogeneity are preferred, and about 98% to about 99% or greater homogeneity is most preferred. Once purified, partially or to the desired homogeneity, the polypeptide can be used (e.g., as an immunogen for antibody production).
To facilitate purification, the nucleic acid encoding the engineered reverse transcriptase or derivative thereof may also include a coding sequence for an epitope or "tag" for which an affinity binding reagent is available. Examples of suitable epitopes include the myc and V5 reporter genes; expression vectors useful for recombinant production of fusion polypeptides having these epitopes are commercially available (e.g., the Invitrogen (Carlsbad, Calif.) vectors pcDNA3.1/Myc-His and pcDNA3.1/V5-His are suitable for expression in mammalian cells). Other expression vectors suitable for attaching a tag to the fusion proteins of the invention, and the corresponding detection systems, are known to those of skill in the art, and several suitable tags are commercially available (e.g., FLAG (Kodak, Rochester, N.Y.)). Another example of a suitable tag is a polyhistidine sequence, which is capable of binding to a metal-chelating affinity ligand.
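At the polypeptide level, adding a tag means fusing a short peptide to one terminus of the enzyme. The tag peptide sequences below are the standard His6, FLAG, myc, and V5 epitopes; the Gly-Ser linker, the terminus choice, and the toy protein are assumptions for illustration, and in practice the matching coding sequence would be added to the expression construct.

```python
# Standard epitope / affinity tag peptides.
TAGS = {
    "His6": "HHHHHH",
    "FLAG": "DYKDDDDK",
    "myc": "EQKLISEEDL",
    "V5": "GKPIPNPLLGLDST",
}


def add_tag(protein: str, tag: str, terminus: str = "C", linker: str = "GS") -> str:
    """Return the fusion polypeptide with the named tag at the N or C terminus.

    The default 'GS' linker is an assumed spacer, not a requirement.
    """
    peptide = TAGS[tag]
    if terminus == "C":
        return protein + linker + peptide
    if terminus == "N":
        # Keep a single initiator Met at the very start of the fusion.
        body = protein[1:] if protein.startswith("M") else protein
        return "M" + peptide + linker + body
    raise ValueError("terminus must be 'N' or 'C'")


print(add_tag("MKDL", "His6"))      # MKDLGSHHHHHH
print(add_tag("MKDL", "FLAG", "N"))  # MDYKDDDDKGSKDL
```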
Those skilled in the art will recognize that, after biological expression or purification, the engineered reverse transcriptase or derivative thereof may have a conformation substantially different from the native conformation of the constituent polypeptides. In such cases, it may be necessary or desirable to denature and reduce the engineered reverse transcriptase or derivative thereof and then refold it into a preferred conformation. Methods for reducing and denaturing proteins and inducing refolding are well known to those skilled in the art (see Debinski et al. (1993) J. Biol. Chem. 268:14065-14070; Kreitman and Pastan (1993) Bioconjug. Chem. 4:581-585; and Buchner et al. (1992) Anal. Biochem. 205:263-270). For example, Debinski et al. describe the denaturation and reduction of inclusion body proteins in guanidine-DTE; the protein is then refolded in a redox buffer containing oxidized glutathione and L-arginine.
IV. Compositions and reaction mixtures
The present disclosure also provides compositions comprising, in various combinations, the components required for nucleic acid amplification. In some embodiments of the present disclosure, the compositions are formulated by mixing one or more engineered reverse transcriptases of the present disclosure, or derivatives thereof, in a buffered saline solution. One or more DNA polymerases and/or one or more nucleotides and/or one or more primers can optionally be added to produce the compositions of the present invention. These compositions can be used in the methods disclosed herein to generate, analyze, quantify, and otherwise manipulate nucleic acid molecules (e.g., using reverse transcription or one-step RT-PCR procedures).
In some embodiments, the engineered reverse transcriptase disclosed herein is provided at a working concentration (e.g., 1×) in a stable buffered saline solution. The terms "stable" and "stability" as used herein generally mean that the composition (such as an enzyme composition) retains at least 70%, preferably at least 80%, and most preferably at least 90% of its original enzyme activity (in units) after the enzyme or enzyme-containing composition has been stored at about 4 °C for about one week, at about -20 °C for about 2 to 6 months, and at about -80 °C for about 6 months or more. As used herein, the term "working concentration" means an enzyme concentration in solution at or near the optimal concentration for performing a particular function, such as reverse transcription of a nucleic acid.
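The stability definition is a retained-activity ratio checked against a threshold (70%, 80%, or 90%). This sketch reads the definition as requiring every listed storage condition to pass, which is an interpretation; the condition labels and unit values are hypothetical, not measured data.

```python
def meets_stability(initial_units: float, after_storage: dict, threshold: float = 0.70) -> bool:
    """True if every storage condition retains at least `threshold` of the initial activity.

    after_storage maps a storage-condition label to the activity (units)
    measured after that storage period.
    """
    if initial_units <= 0:
        raise ValueError("initial activity must be positive")
    return all(units / initial_units >= threshold for units in after_storage.values())


# Hypothetical activity measurements (units) for one enzyme lot.
stored = {"4 C, 1 week": 850.0, "-20 C, 6 months": 780.0, "-80 C, 6 months": 920.0}
print(meets_stability(1000.0, stored))                  # True at the 70% level
print(meets_stability(1000.0, stored, threshold=0.80))  # False: 78% at -20 C
```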
Such compositions may also be formulated as concentrated stock solutions (e.g., 2×, 3×, 4×, 5×, 6×, 10×, etc.). In some embodiments, formulating the composition as a concentrated (e.g., 5×) stock solution allows a greater amount of nucleic acid sample to be added (such as when the composition is used for nucleic acid synthesis). The water used to form the compositions of the present invention is preferably distilled, deionized, and sterile filtered (through a 0.1-0.2 micron filter), and free of DNase and RNase contamination. Such water is commercially available, for example from Life Technologies (Carlsbad, Calif.), or can be prepared as needed according to methods well known to those skilled in the art.
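The reason a concentrated stock leaves room for more sample follows from the dilution relation C1·V1 = C2·V2. The sketch solves for the stock volume; the 20 µL reaction volume is an assumed example.

```python
def stock_volume(final_volume_ul: float, stock_x: float, final_x: float = 1.0) -> float:
    """Volume of an N-x stock needed to reach final_x in final_volume_ul (C1*V1 = C2*V2)."""
    if stock_x < final_x:
        raise ValueError("stock must be at least as concentrated as the final mix")
    return final_volume_ul * final_x / stock_x


reaction_ul = 20.0  # assumed reaction volume
for stock in (2.0, 5.0):
    v = stock_volume(reaction_ul, stock)
    print(f"{stock:g}x stock: {v} uL, leaving {reaction_ul - v} uL for sample and other components")
# 2x stock: 10.0 uL, leaving 10.0 uL ...
# 5x stock: 4.0 uL, leaving 16.0 uL ...
```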
V. Methods of using engineered reverse transcriptases
A. Amplification method
One aspect of the present disclosure provides a method of using an engineered reverse transcriptase described herein, the method comprising contacting the engineered reverse transcriptase with a nucleic acid template under suitable conditions to produce a polymerized nucleic acid product. In some embodiments, the nucleic acid template is RNA, or a nucleic acid comprising non-natural nucleotides. The engineered reverse transcriptases of the present disclosure can be used in any application requiring a reverse transcriptase with the altered activities indicated herein. Methods of using reverse transcriptases are known in the art, and one of skill in the art can select any of the engineered reverse transcriptases disclosed herein for such methods.
An engineered reverse transcriptase or derivative thereof as described herein can be used to prepare nucleic acid molecules from one or more templates. Such methods can include mixing one or more nucleic acid templates (e.g., RNAs, such as non-coding RNA (ncRNA), messenger RNA (mRNA), microRNA (miRNA), and small interfering RNA (siRNA) molecules) with one or more reverse transcriptases of the disclosure and incubating the mixture under conditions sufficient to produce one or more nucleic acid molecules complementary to all or a portion of the one or more nucleic acid templates. Other methods of cDNA synthesis that may be advantageously used with the present disclosure will be apparent to those of ordinary skill in the art.
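Whether the product covers all or only a portion of the template depends on where the primer anneals. This string-level sketch assumes perfect base pairing, chooses the 3'-most priming site when several exist, and assumes the enzyme extends all the way to the template 5' end; the sequences are hypothetical.

```python
RNA_TO_CDNA = str.maketrans("ACGU", "TGCA")      # RNA base -> pairing DNA base
DNA_TO_RNA_PAIR = str.maketrans("ACGT", "UGCA")  # DNA base -> pairing RNA base


def prime_and_extend(rna: str, primer: str) -> str:
    """Anneal a DNA primer to an RNA template and extend it to the template 5' end.

    Both sequences are written 5'->3'; the returned cDNA is also 5'->3'.
    """
    rna, primer = rna.upper(), primer.upper()
    site = primer.translate(DNA_TO_RNA_PAIR)[::-1]  # RNA sequence the primer pairs with
    idx = rna.rfind(site)                            # 3'-most perfectly matching site
    if idx < 0:
        raise ValueError("primer has no perfectly complementary site on the template")
    upstream = rna[:idx]                             # template 5' of the priming site
    return primer + upstream.translate(RNA_TO_CDNA)[::-1]


template = "GGCAUGAAAA"
print(prime_and_extend(template, "TTTT"))  # TTTTCATGCC: copy of the whole template
print(prime_and_extend(template, "CATG"))  # CATGCC: copy of a portion of the template
```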
In some embodiments, methods of using an engineered reverse transcriptase or derivative thereof as described herein include amplifying one or more nucleic acid molecules by mixing one or more nucleic acid templates with one or more of the engineered reverse transcriptases or derivatives thereof of the present disclosure, and incubating the mixture under conditions sufficient to amplify one or more nucleic acid molecules complementary to all or a portion of the one or more nucleic acid templates. In one embodiment, the method may further comprise the use of one or more DNA polymerases, and may be used in standard reverse transcription-polymerase chain reaction (RT-PCR).
In some embodiments, a method of using an engineered reverse transcriptase or derivative thereof as described herein can be a one-step (e.g., one-step RT-PCR) or a two-step (e.g., two-step RT-PCR) reaction. In one embodiment, a one-step RT-PCR type reaction may be carried out in a single tube, reducing the likelihood of contamination. Such one-step reactions include (a) mixing a nucleic acid template (e.g., mRNA) with one or more engineered reverse transcriptases or derivatives thereof of the present disclosure and one or more DNA polymerases, and (b) incubating the mixture under conditions sufficient to amplify nucleic acid molecules complementary to all or a portion of the template.
In another embodiment, a two-step RT-PCR reaction may be carried out in two separate steps. Such methods comprise (a) mixing a nucleic acid template (e.g., mRNA) with an engineered reverse transcriptase of the present disclosure or a derivative thereof, (b) incubating the mixture under conditions sufficient to produce a nucleic acid molecule (e.g., a DNA molecule) complementary to all or part of the template, (c) mixing the nucleic acid molecule with one or more DNA polymerases, and (d) incubating the mixture of step (c) under conditions sufficient to amplify the nucleic acid molecule. For amplification of long nucleic acid molecules (i.e., greater than about 3-5 kb in length), a combination of a DNA polymerase and an engineered reverse transcriptase of the present disclosure, or a derivative thereof, may be used.
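The yield of the two-step workflow can be estimated with the idealized exponential PCR model N = N0 × (1 + E)^n, where N0 is the cDNA produced in steps (a)-(b). This is a back-of-envelope sketch: the fixed reverse transcription conversion fraction and the constant per-cycle efficiency E are assumptions, and the model ignores the plateau phase of real reactions.

```python
def cdna_copies(rna_copies: float, rt_efficiency: float) -> float:
    """Steps (a)-(b): cDNA copies, assuming a fixed fraction of template is converted."""
    if not 0.0 <= rt_efficiency <= 1.0:
        raise ValueError("rt_efficiency must lie in [0, 1]")
    return rna_copies * rt_efficiency


def pcr_copies(initial: float, cycles: int, efficiency: float = 1.0) -> float:
    """Steps (c)-(d): idealized exponential amplification, N = N0 * (1 + E) ** n."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie in [0, 1]")
    return initial * (1.0 + efficiency) ** cycles


# Hypothetical input: 1000 RNA copies, 50% RT conversion, 10 perfect PCR cycles.
print(pcr_copies(cdna_copies(1000, 0.5), 10))  # 512000.0
```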
Amplification methods (using one or more engineered reverse transcriptases or derivatives thereof of the present disclosure) that may be used according to the present invention include PCR, isothermal amplification, strand displacement amplification (SDA), and nucleic acid sequence-based amplification (NASBA), as well as more complex PCR-based nucleic acid fingerprinting techniques such as random amplified polymorphic DNA (RAPD) analysis, arbitrarily primed PCR (AP-PCR), DNA amplification fingerprinting (DAF), microsatellite PCR, directed amplification of minisatellite-region DNA (DAMD), droplet digital PCR (ddPCR), and amplified fragment length polymorphism (AFLP) analysis. See, e.g., EP 0 534 858; Vos, P. et al., Nucl. Acids Res. 23(21):4407-4414 (1995); Lin, J.J. and Kuo, J., FOCUS 17(2):66-70 (1995); U.S. Pat. Nos. 4,683,195 and 4,683,202; PCT Publication No. WO 2006/081222; U.S. Pat. No. 5,455,166; EP 0 684 315; U.S. Pat. No. 5,409,818; EP 0 329 822; Williams, J.G.K. et al., Nucl. Acids Res. 18(22):6531-6535 (1990); Welsh, J. and McClelland, M., Nucl. Acids Res. 18(24):7213-7218 (1990); Caetano-Anolles et al., Bio/Technology 9:553-557 (1991); Heath, D.D. et al., Nucl. Acids Res. 21(24):5782-5785 (1993). Nucleic acid sequencing techniques that can employ the compositions of the present invention include dideoxy sequencing methods, such as those disclosed in U.S. Pat. Nos. 4,962,022 and 5,498,523. In some embodiments, the engineered reverse transcriptase disclosed herein can be used in a method of amplifying or sequencing a nucleic acid molecule comprising one or more polymerase chain reactions (PCR), such as any of the PCR-based methods described above.
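Among the listed methods, ddPCR quantifies template by counting positive droplets and applying a Poisson correction: the mean number of copies per droplet is λ = -ln(1 - p), where p is the fraction of positive droplets. The sketch assumes a droplet volume of 0.85 nL purely for illustration; the actual volume depends on the instrument, and the droplet counts are hypothetical.

```python
import math


def ddpcr_concentration(positive: int, total: int, droplet_volume_nl: float) -> float:
    """Copies per microliter of reaction from droplet counts (Poisson correction)."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total (at least one negative droplet)")
    lam = -math.log(1.0 - positive / total)  # mean copies per droplet
    return lam / (droplet_volume_nl * 1e-3)  # convert nL to uL


# Hypothetical run: 2,000 of 10,000 droplets positive, assumed 0.85 nL droplets.
print(round(ddpcr_concentration(2000, 10000, 0.85), 1))  # about 262.5 copies/uL
```

The correction matters because a positive droplet may contain more than one copy; simply dividing positives by total volume would underestimate the concentration.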
Methods of producing the engineered reverse transcriptases of the present disclosure, or derivatives thereof, are known to those of skill in molecular biology or molecular genetics. For example, nucleic acids encoding a wild-type polymerase or a nucleic acid binding domain can be produced using conventional techniques of recombinant genetics. Basic texts disclosing general methods of use in this invention include Sambrook and Russell, Molecular Cloning: A Laboratory Manual (3rd ed. 2001); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); Current Protocols in Molecular Biology (Ausubel et al., 1994-1999); Berger, Sambrook and Ausubel; Mullis et al. (1987) U.S. Pat. No. 4,683,202; PCR Protocols: A Guide to Methods and Applications (Innis et al.) Academic Press Inc., San Diego, Calif. (1990) (Innis); Arnheim and Levinson (October 1, 1990) C&EN 36-47; The Journal of NIH Research (1991) 3:81-94; Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86:1173; Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87:1874; Lomeli et al. (1989) Clin. Chem. 35:1826; Landegren et al. (1988) Science 241:1077-1080; Van Brunt (1990) Biotechnology 8:291-294; Wu and Wallace (1989) Genomics 4:560; and Barringer et al. (1990) Gene 89:117.
B. Nucleic acid sample processing
One aspect of the present disclosure provides a method of nucleic acid extension, the method comprising: contacting the target nucleic acid molecule with an engineered reverse transcriptase and a plurality of barcoded nucleic acid molecules comprising a barcode sequence, and incubating the target nucleic acid, the engineered reverse transcriptase, and the barcoded molecules under conditions in which the barcoded molecules are extended by the engineered reverse transcriptase. In some embodiments, the engineered reverse transcriptase comprises the amino acid sequences of the engineered reverse transcriptase or derivatives thereof described herein. The target nucleic acid hybridizes to one of the plurality of barcoded molecules, and the hybridized barcoded molecule is extended by the engineered reverse transcriptase described herein.
RNA template
In some embodiments, the nucleic acid is a ribonucleic acid (RNA) molecule, and the RNA molecule is reverse transcribed with the engineered reverse transcriptase to produce first-strand cDNA. In some embodiments, the reverse transcription reaction introduces a barcode. For example, in some embodiments, the barcode is introduced during a reverse transcription amplification reaction that produces complementary deoxyribonucleic acid (cDNA) molecules by reverse transcription of ribonucleic acid (RNA) molecules of the cell. In some embodiments, the RNA molecule is released from the cell. In some embodiments, the RNA molecule is released from the cell by permeabilizing or lysing the cell. In some embodiments, the RNA molecule is messenger RNA (mRNA).
In some embodiments, the reverse transcription reaction of the engineered reverse transcriptase of the present disclosure is initiated upon hybridization of the capture sequence to the RNA molecule, and the capture probe is extended by the engineered reverse transcriptase of the present disclosure in a template-directed manner using the hybridized mRNA as a template. In some embodiments, the reverse transcription reaction produces single stranded cDNA molecules, each having a molecular tag and a barcode associated with the cDNA, and the cDNA is subsequently amplified to produce double stranded cDNA comprising the sequence of the barcoded molecule.
In some embodiments, the plurality of barcoded nucleic acid molecules comprises an oligo (dT) sequence. In this embodiment, the engineered reverse transcriptase uses mRNA hybridized to the oligo (dT) sequence of the barcoded nucleic acid molecule as a template, reverse transcribes the mRNA molecule into a complementary DNA molecule, and the nucleic acid binding domain binds and stabilizes the mRNA-oligo (dT) hybrid during reverse transcription. Following reverse transcription, the engineered reverse transcriptase as described herein further amplifies complementary DNA molecules comprising a barcode sequence, thereby producing amplified DNA products comprising the barcode sequence, molecular tag sequence, or the complement thereof.
In some embodiments of the nucleic acid extension methods described herein, the method further comprises providing a second nucleic acid molecule comprising an oligo (dT) sequence. In this embodiment, the plurality of barcoded nucleic acid molecules further comprises an oligo (dT) sequence; the nucleic acid binding domain of the engineered reverse transcriptase binds and stabilizes the mRNA-oligo (dT) hybrid, while the polymerase domain of the engineered reverse transcriptase reverse transcribes the mRNA molecule using the second nucleic acid molecule comprising an oligo (dT) sequence, thereby producing a complementary DNA molecule. In this embodiment, the engineered reverse transcriptase further amplifies the complementary DNA molecule, thereby producing an amplified DNA product comprising a barcode sequence.
In some embodiments, the nucleic acid extension method further comprises providing a cell, cell population, or tissue, and the template nucleic acid molecule is from the cell, cell population, or tissue.
In some embodiments, the barcode is coupled to a primer sequence, and the barcoding reaction is initiated by hybridization of the primer sequence to the RNA molecule. In some embodiments, each primer sequence comprises a random N-mer sequence. In some embodiments, the random N-mer sequence is complementary to a 3' sequence of a ribonucleic acid molecule in the cell. In some embodiments, the random N-mer sequence of the primer sequence comprises a poly dT sequence at least 5 bases in length. In some embodiments, the random N-mer sequence comprises a poly dT sequence at least 10 bases in length (SEQ ID NO: 17). In some embodiments, the barcode is introduced by extending the primer sequence in a template-directed manner using reagents for reverse transcription. In some embodiments, a molecular tag comprising the barcode plus additional functional sequences, or comprising only the additional functional sequences, is also included in the cDNA molecule generated during the reverse transcription reaction. In some embodiments, the reagents for reverse transcription include a reverse transcriptase, a buffer, and a nucleotide mixture. In some embodiments, the reverse transcriptase adds a plurality of non-templated nucleotides when reverse transcribing ribonucleic acid molecules from the nucleic acid molecules. In some embodiments, the reverse transcriptase is an engineered reverse transcriptase as disclosed herein.
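A barcoded capture primer of the kind described above can be modeled as a shared barcode, a random N-mer, and a poly(dT) stretch. The 5'->3' ordering and the lengths used here (12-nt N-mer, 30-nt poly(dT)) are assumptions chosen for illustration, not a layout specified in the disclosure, and the barcode is hypothetical.

```python
import random
from typing import Optional


def random_nmer(n: int, rng: random.Random) -> str:
    """Random DNA N-mer, e.g. a unique molecular identifier."""
    return "".join(rng.choice("ACGT") for _ in range(n))


def barcoded_dt_primer(barcode: str, umi_len: int = 12, dt_len: int = 30,
                       rng: Optional[random.Random] = None) -> str:
    """Assemble an assumed 5'-[barcode][random N-mer][poly(dT)]-3' capture primer."""
    rng = rng or random.Random()
    return barcode + random_nmer(umi_len, rng) + "T" * dt_len


rng = random.Random(0)  # seeded so the example is reproducible
barcode = "ACGTACGTACGTACGT"  # hypothetical 16-nt barcode shared within a partition
primers = [barcoded_dt_primer(barcode, rng=rng) for _ in range(3)]
for p in primers:
    print(p[:16], p[16:28], p[28:])  # barcode | N-mer | poly(dT)
```

Every primer from one partition shares the barcode, while the random N-mer differs from molecule to molecule; that difference is what later allows individual mRNA molecules to be counted.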
In some embodiments, the barcoding reaction produces single-stranded complementary deoxyribonucleic acid (cDNA) molecules, each molecule having a barcode at its 5' end, followed by amplification of the cDNA to produce double-stranded cDNA having a barcode at the 5' end and a molecular tag at the 3' end of the double-stranded cDNA that may or may not contain a barcode.
In one aspect, the invention provides methods for nucleic acid sample processing using the engineered reverse transcriptases described herein. In one embodiment, the method comprises contacting a template ribonucleic acid (RNA) molecule with an engineered reverse transcriptase to reverse transcribe the RNA molecule into a complementary DNA (cDNA) molecule. The contacting step may be performed in the presence of a plurality of nucleic acid barcode molecules, wherein each nucleic acid barcode molecule comprises a barcode sequence. The nucleic acid barcode molecule may also comprise a sequence configured to couple to the template RNA molecule. Suitable sequences include, but are not limited to, oligo (dT) sequences, random N-mer primers, and target-specific primers. The nucleic acid barcode molecule may also comprise a template switching sequence. In some embodiments, the RNA molecule is a messenger RNA (mRNA) molecule. In one embodiment, the contacting step provides conditions that allow the engineered reverse transcriptase to (i) reverse transcribe an mRNA molecule into a cDNA molecule comprising an oligo (dT) sequence and/or (ii) perform a template switching reaction, thereby producing a cDNA molecule comprising a barcode sequence or a derivative thereof. In another embodiment, the contacting step can occur (i) in a partition having a reaction volume (as further described herein; see, e.g., U.S. Patent Nos. 10,400,280 and 10,323,278, each of which is incorporated herein by reference in its entirety), (ii) in a bulk reaction of the reaction components (e.g., template RNA and engineered reverse transcriptase) in solution, or (iii) on a nucleic acid array (see, e.g., U.S. Patent Nos. 10,480,022 and 10,030,261 and WO 2020/047005 and WO 2020/047010, each of which is incorporated herein by reference in its entirety).
In addition, reverse transcription reactions can occur in tissue (in situ reverse transcription); on templates associated with sequences on a substrate, as practiced in spatial transcriptomics; or in RT-PCR or other in vitro reverse transcription reactions of purified, partially purified, or unpurified targets, such as those found in cell lysates.
Examples of assays involving nucleic acid sample processing include, but are not limited to, single cell transcriptional profiling, single cell sequencing, single T cell and B cell immune profiling, single cell chromatin accessibility analysis (e.g., ATAC sequencing), single cell processing and analysis, paired single cell TCR sequencing, and paired TCRα and TCRβ sequencing. These exemplary assays can be performed using commercially available systems for encapsulating biological samples, gel beads, barcodes, and/or other compounds or materials in droplets, such as the Chromium system (10x Genomics, Pleasanton, CA, USA). The engineered reverse transcriptase may be used in methods of assaying T cell receptors (TCRs), such as those described in U.S. Provisional Application No. 62/902,178, which is incorporated herein by reference in its entirety.
In various embodiments, the poly(dT) sequence may be extended in a reverse transcription reaction using the mRNA as a template to produce a cDNA transcript that is complementary to the mRNA and that also includes the sequence of the barcode oligonucleotide. The terminal transferase activity of the reverse transcriptase can add additional, non-templated bases (e.g., poly(C)) to the 3' end of the cDNA transcript. A switch oligonucleotide can then hybridize to these additional bases and facilitate template switching. A sequence complementary to the switch oligonucleotide is then incorporated into the cDNA transcript by extending the transcript using the switch oligonucleotide as a template. Within any given partition, all cDNA transcripts of individual mRNA molecules contain a common barcode sequence. However, because a unique random N-mer sequence is included, transcripts made from different mRNA molecules within a given partition differ in that unique sequence. As described elsewhere herein, this provides a quantitative feature that remains identifiable even after any subsequent amplification of the contents of a given partition; for example, the number of unique segments associated with a common barcode can indicate the amount of mRNA derived from a single partition (and thus from a single cell). The cDNA transcripts may then be amplified using PCR primers, and the amplified product may be purified (e.g., by solid phase reversible immobilization (SPRI)). The amplified product may then be ligated to additional functional sequences and further amplified (e.g., by PCR). Functional sequences may include sequencer-specific flow cell attachment sequences (e.g., the P7 sequence for Illumina sequencing systems), sequencing primer binding sites (e.g., the R2 primer site for Illumina sequencing systems), and sample indices (e.g., the i7 sample index sequence for Illumina sequencing systems).
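The oligo(dT) priming and template switching steps described above can be illustrated with a minimal, string-level sketch. It is not an implementation of any particular workflow: the function names, the 16-nt barcode, the 10-nt UMI, the 30-nt poly(dT) stretch, and the template switch oligonucleotide (TSO) sequence are all hypothetical, and the sketch assumes exactly three non-templated C's are added while ignoring enzyme kinetics, mispriming, and the RNA bases typically present in a TSO.

```python
"""Toy sketch of oligo(dT)-primed reverse transcription plus template switching.

All sequences and names are hypothetical; sequences are written 5'->3'.
"""

# Complement table; U (RNA) pairs with A, so it is complemented to A.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def revcomp_dna(seq: str) -> str:
    """Reverse complement of a DNA or RNA string, returned as DNA."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def first_strand(mrna: str, primer: str) -> str:
    """Extend a barcode primer whose 3' poly(dT) stretch is annealed to the poly(A) tail.

    The cDNA is the primer followed by the reverse complement of the transcript
    body (everything 5' of the poly(A) tail).
    """
    body = mrna.upper().rstrip("A")  # toy assumption: the body does not end in A
    return primer + revcomp_dna(body)


def template_switch(cdna: str, tso: str, n_c: int = 3) -> str:
    """Add n_c non-templated C's, then copy the TSO whose 3' G's pair with them."""
    if n_c < 1 or not tso.upper().endswith("G" * n_c):
        raise ValueError("toy TSO must end in as many G's as C's are added")
    tso_upstream = tso[: len(tso) - n_c]  # TSO sequence 5' of its terminal G's
    return cdna + "C" * n_c + revcomp_dna(tso_upstream)


if __name__ == "__main__":
    barcode = "ACGTACGTACGTACGT"  # hypothetical 16-nt cell barcode
    umi = "TTGCAAGCTT"  # hypothetical 10-nt UMI
    primer = barcode + umi + "T" * 30
    mrna = "GGAUCCAUGGCUAGCUGC" + "A" * 40  # toy transcript body + poly(A) tail
    tso = "AAGCAGTGGTATCAACGCAGAGTACATGGG"  # hypothetical TSO ending in GGG
    print(template_switch(first_strand(mrna, primer), tso))
```

The resulting string reads, 5' to 3': barcode, UMI, poly(dT), the reverse complement of the transcript body, the added C's, and the reverse complement of the TSO. Every transcript primed by oligonucleotides from the same partition would share the barcode, while the UMI differs between captured molecules.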
Although described in terms of specific sequence references for certain sequencing systems (e.g., Illumina systems), it should be understood that these references are for illustration purposes only and that the methods described herein may be configured for other sequencing systems that incorporate specific priming, attachment, indexing, or other operational sequences, such as those available from Ion Torrent, Oxford Nanopore, Genia, Pacific Biosciences, Complete Genomics, and the like.
2. Reaction volume
As described herein, wild-type and variant MMLV RTs are not optimal for reverse transcription of mRNA in high-throughput amplification reaction assays (e.g., spatial array and single cell transcriptome assays). This is because such high-throughput assays typically require reaction volumes of less than about 1 nanoliter. Accordingly, the present disclosure provides novel engineered reverse transcriptases that function effectively in high-throughput amplification reaction assays requiring reaction volumes of less than about 1 nanoliter.
In some embodiments, the method comprises providing a reaction volume comprising an engineered reverse transcriptase and a template ribonucleic acid (RNA) molecule; such a reaction is referred to as a "low-volume reaction." In another embodiment, the contacting occurs in a reaction volume (i.e., a low-volume reaction) of less than 1 nanoliter, less than 750 picoliters, or less than 500 picoliters. In other embodiments, the reaction volume is present in a partition, such as a droplet or a well (including a microwell or nanowell).
In some embodiments, an engineered reverse transcriptase or derivative thereof as described herein is used with a reaction volume of less than about 1 nanoliter (nL). In some embodiments, an engineered reverse transcriptase or derivative thereof as described herein is used with a reaction volume of less than about 500 picoliters (pL). In some embodiments, the reaction volume is contained within a partition. In some embodiments, the reaction volume is contained within a droplet. In some embodiments, the reaction volume is contained within a droplet in an emulsion. In some embodiments, the reaction volume is contained within a droplet of an emulsion, the droplet having a volume of less than about 1 nL. In some embodiments, the reaction volume is contained within a droplet of an emulsion, the droplet having a volume of less than about 500 pL.
In some embodiments, the reaction volume is contained within a well. In some embodiments, the reaction volume is contained within a well having a volume of less than about 1 nL. In some embodiments, the reaction volume is contained within a well having a volume of less than about 500 pL. In some embodiments, the reaction volume is contained within a well of a well array having extracted nucleic acid molecules, and the template nucleic acid molecules are the extracted nucleic acid molecules. In some embodiments, the reaction volume is contained within a well of a well array having cells comprising the template nucleic acid molecule, and the template nucleic acid molecule is released from the cells.
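To relate the reaction volumes above to partition dimensions, a droplet can be approximated as a sphere, for which V = (π/6)d³. The sketch below assumes a perfectly spherical droplet (wells and deformed droplets will differ) and uses the unit identity 1 pL = 1,000 µm³; the function names are illustrative only.

```python
import math

UM3_PER_PL = 1_000.0  # 1 pL = 1e-15 m^3 = 1,000 cubic micrometres


def sphere_diameter_um(volume_pl: float) -> float:
    """Diameter (um) of a sphere holding volume_pl picoliters.

    Inverts V = (pi / 6) * d**3, i.e. d = (6 V / pi) ** (1/3).
    """
    volume_um3 = volume_pl * UM3_PER_PL
    return (6.0 * volume_um3 / math.pi) ** (1.0 / 3.0)


def sphere_volume_pl(diameter_um: float) -> float:
    """Volume (pL) of a sphere with the given diameter in micrometres."""
    return math.pi / 6.0 * diameter_um ** 3 / UM3_PER_PL


if __name__ == "__main__":
    for v in (1000.0, 750.0, 500.0):  # 1 nL, 750 pL, 500 pL
        print(f"{v:6.0f} pL -> {sphere_diameter_um(v):6.1f} um diameter")
```

Under this spherical approximation, a 1 nL droplet is roughly 124 µm in diameter and a 500 pL droplet roughly 98 µm.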
3. Unique Molecular Identifier (UMI)
In some embodiments, the molecular tag, which may or may not include a barcode, also includes a functional sequence, such as a Unique Molecular Identifier (UMI). In some embodiments, the molecular tag is coupled to a primer sequence. In some embodiments, each of the primer sequences comprises a random N-mer sequence. In some embodiments, the random N-mer sequence is complementary to the 3' sequence of the RNA molecule. In some embodiments, the primer sequence comprises a poly dT sequence of at least 5 bases in length. In some embodiments, the primer sequence comprises a poly dT sequence (SEQ ID NO: 17) that is at least 10 bases in length. In some embodiments, the primer sequence comprises a poly dT sequence of at least 5 bases, at least 6 bases, at least 7 bases, at least 8 bases, at least 9 bases, at least 10 bases in length (SEQ ID NO: 17).
Individual cells or cell populations may be assigned or associated with Unique Molecular Identifiers (UMIs) to tag or label components of the cells (and thus their characteristics) with these unique identifiers. These unique molecular identifiers can be used to attribute components and features of cells to individual cells or groups of cells.
In some aspects, the unique molecular identifiers are provided in the form of nucleic acid molecules (e.g., oligonucleotides) that comprise nucleic acid barcode sequences that can be attached to or otherwise associated with the nucleic acid content of individual cells or with other components of the cells, in particular with fragments of such nucleic acids. The nucleic acid molecules are partitioned such that the nucleic acid molecules within a given partition share the same UMI sequence, whereas nucleic acid molecules in different partitions can, and do, have different UMI sequences, or at least collectively represent a large number of different UMI sequences across all partitions in a given analysis. In some aspects, only one nucleic acid barcode or UMI sequence may be associated with a given partition, but in some cases, two or more different barcodes or UMI sequences may be present.
The nucleic acid UMI or barcode sequence may comprise about 6 to about 20 or more nucleotides within the sequence of a nucleic acid molecule (e.g., an oligonucleotide). The nucleic acid UMI or barcode sequence may comprise about 6 to about 20, 30, 40, 50, 60, 70, 80, 90, 100 or more nucleotides. In some cases, the UMI or barcode sequence may be about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 nucleotides or more in length. In some cases, the UMI or barcode sequences may be at least about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 nucleotides or more in length. In some cases, the UMI or barcode sequences may be at most about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 nucleotides in length. These nucleotides may be completely contiguous, i.e., in a single stretch of adjacent nucleotides, or they may be divided into two or more separate subsequences separated by 1 or more nucleotides. In some cases, the separated UMI or barcode subsequences may be about 4 to about 16 nucleotides in length. In some cases, the UMI or barcode subsequence may be about 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 nucleotides or longer. In some cases, the UMI or barcode subsequence may be at least about 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 nucleotides or more. In some cases, the UMI or barcode subsequence may be at most about 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 nucleotides.
In addition, when a population of barcodes or UMIs is partitioned, the resulting partitioned population may also include a diverse barcode or UMI library that may include at least about 1,000 different barcodes or UMI sequences, at least about 5,000 different barcodes or UMI sequences, at least about 10,000 different barcodes or UMI sequences, at least about 50,000 different barcodes or UMI sequences, at least about 100,000 different barcodes or UMI sequences, at least about 1,000,000 different barcodes or UMI sequences, at least about 5,000,000 different barcodes or UMI sequences, or at least about 10,000,000 different barcodes or UMI sequences. Further, each partition of the population may include at least about 1,000 nucleic acid molecules, at least about 5,000 nucleic acid molecules, at least about 10,000 nucleic acid molecules, at least about 50,000 nucleic acid molecules, at least about 100,000 nucleic acid molecules, at least about 500,000 nucleic acid molecules, at least about 1,000,000 nucleic acid molecules, at least about 5,000,000 nucleic acid molecules, at least about 10,000,000 nucleic acid molecules, at least about 50,000,000 nucleic acid molecules, at least about 100,000,000 nucleic acid molecules, at least about 250,000,000 nucleic acid molecules, and in some cases, at least about 1 billion nucleic acid molecules.
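The sequence lengths and library sizes above are linked by simple combinatorics: a sequence of N nucleotides can take 4^N distinct values, and if UMIs are drawn uniformly at random (an idealization; real oligonucleotide synthesis is not perfectly uniform), the expected number of molecule pairs sharing a UMI follows the birthday-problem estimate n(n - 1)/(2 * 4^N). The sketch below is illustrative only, and its function names are hypothetical.

```python
import math


def sequence_space(length_nt: int) -> int:
    """Number of distinct DNA sequences of the given length (4 ** length)."""
    return 4 ** length_nt


def expected_colliding_pairs(n_molecules: int, length_nt: int) -> float:
    """Expected number of molecule pairs sharing a UMI, assuming uniform random UMIs."""
    m = sequence_space(length_nt)
    return n_molecules * (n_molecules - 1) / (2 * m)


def prob_any_collision(n_molecules: int, length_nt: int) -> float:
    """Birthday-problem approximation of P(at least two molecules share a UMI)."""
    return 1.0 - math.exp(-expected_colliding_pairs(n_molecules, length_nt))


if __name__ == "__main__":
    for n_nt in (8, 10, 12):
        print(n_nt, sequence_space(n_nt), round(prob_any_collision(1000, n_nt), 3))
```

For example, a 10-nt sequence has 1,048,576 possible values and a 12-nt sequence has 16,777,216, which already exceeds the largest library size listed above. Under the uniformity assumption, 1,000 molecules tagged with 10-nt UMIs would be expected to contain about 0.48 colliding pairs, and lengthening the UMI reduces that number fourfold per added nucleotide.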
In some embodiments, the enhanced reverse transcriptase activity of the engineered reverse transcriptase disclosed herein is an enhanced ability to produce increased mitochondrial UMI counts compared to a reverse transcriptase having the amino acid sequence set forth in SEQ ID NO: 1 or 15. In some embodiments, the enhanced reverse transcriptase activity is an enhanced ability to produce increased ribosomal UMI counts compared to a reverse transcriptase having the amino acid sequence set forth in SEQ ID NO: 1 or 15. Read counts and UMI counts are the primary gene expression quantification schemes used in single cell RNA sequencing (scRNA-seq) analysis; thus, as ribosomal UMI counts increase, the sensitivity and accuracy of the scRNA-seq assay in determining the transcriptome profile of any given cell, group of cells, or tissue increase. A number of metrics can be used for quality control of scRNA-seq, including the percentage of reads mapped to ribosomal genes, the percentage of reads mapped to mitochondrial genes, the total number of UMIs detected, and the number of features accounting for 50% of reads.
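The quality control metrics listed above can be computed per cell from a gene-to-UMI-count table. In this sketch, which is an illustration rather than any specific pipeline, mitochondrial genes are identified by an "MT-" prefix and ribosomal protein genes by "RPS"/"RPL" prefixes (a human gene-naming convention assumed here), UMI counts stand in for reads, and "the number of features accounting for 50% of reads" is interpreted as the smallest number of most-abundant genes whose counts reach half of the total. The function name is hypothetical.

```python
from typing import Dict, Mapping, Tuple


def qc_metrics(counts: Mapping[str, int],
               mito_prefix: str = "MT-",
               ribo_prefixes: Tuple[str, ...] = ("RPS", "RPL")) -> Dict[str, float]:
    """Per-cell QC metrics from a {gene name: UMI count} mapping.

    Gene-name prefixes follow a human naming convention (an assumption).
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cell has no UMIs")
    mito = sum(n for g, n in counts.items()
               if g.upper().startswith(mito_prefix.upper()))
    ribo = sum(n for g, n in counts.items()
               if g.upper().startswith(tuple(p.upper() for p in ribo_prefixes)))
    # Smallest number of genes, taken from most to least abundant, whose
    # counts together reach at least half of the cell's total.
    running = 0
    n_half = 0
    for n in sorted(counts.values(), reverse=True):
        running += n
        n_half += 1
        if 2 * running >= total:
            break
    return {
        "total_umis": float(total),
        "pct_mito": 100.0 * mito / total,
        "pct_ribo": 100.0 * ribo / total,
        "n_features_50pct": float(n_half),
    }


if __name__ == "__main__":
    cell = {"MT-CO1": 20, "RPL13": 30, "RPS6": 10, "ACTB": 25, "GAPDH": 15}
    print(qc_metrics(cell))
```

Comparing these metrics between cells processed with an engineered enzyme and with a reference enzyme is one way the UMI-count differences described above could be tabulated.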
Advantageously, even after any subsequent amplification of the contents of a given partition, the number of different UMIs may be indicative of the amount of mRNA originating from the given partition, and thus the amount of mRNA originating from the cell. As described above, transcripts can be amplified, purified and sequenced to identify the sequence of cDNA transcripts of mRNA, as well as to sequence barcode and UMI segments. Although a poly dT primer sequence is described, other targeting or random primer sequences may be used to initiate a reverse transcription reaction. Also, while described as releasing barcoded oligonucleotides into a partition, in some cases, nucleic acid molecules that bind to beads (e.g., gel beads) can be used to hybridize and capture mRNA on a bead solid phase, e.g., to facilitate separation of RNA from other cellular content.
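Because amplification copies molecules together with their UMIs, collapsing reads that share a cell barcode, UMI, and gene leaves one count per captured mRNA molecule. A minimal sketch, assuming each read has already been parsed into a hypothetical (barcode, UMI, gene) tuple and omitting the sequencing-error correction of barcodes and UMIs that practical pipelines typically perform:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical parsed-read layout: (cell barcode, UMI, gene).
Read = Tuple[str, str, str]


def count_umis(reads: Iterable[Read]) -> Dict[Tuple[str, str], int]:
    """Number of distinct UMIs per (cell barcode, gene).

    PCR duplicates share barcode, UMI and gene, so they collapse to one count.
    Exact matching only; no error correction of barcodes or UMIs.
    """
    seen: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for barcode, umi, gene in reads:
        seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


if __name__ == "__main__":
    reads = (
        [("AAACCTGA", "ACGTACGTAC", "ACTB")] * 5    # five PCR copies of one molecule
        + [("AAACCTGA", "GGCATTCAGA", "ACTB")]      # a second ACTB molecule
        + [("TTTGGTCA", "ACGTACGTAC", "ACTB")] * 3  # same UMI, different cell
    )
    print(count_umis(reads))  # {('AAACCTGA', 'ACTB'): 2, ('TTTGGTCA', 'ACTB'): 1}
```

The same UMI in two different cells is counted separately, because the count is keyed on the barcode that marks the partition of origin.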
It has been recognized that certain reverse transcriptases can increase the UMI reads obtained from genes of a desired length or length range of interest. The gene length range may be selected from the group consisting of less than 500 nucleotides, between 500 and 1000 nucleotides, between 1000 and 1500 nucleotides, and greater than 1500 nucleotides. It has been recognized that a reverse transcriptase may preferentially increase the likelihood of producing more UMI reads from genes within a particular length range. It has also been recognized that the performance of an engineered reverse transcriptase in a 3'-reverse transcription assay and in a 5'-reverse transcription assay may be similar, comparable, or different. It has similarly been recognized that an engineered reverse transcriptase may preferentially increase the likelihood of producing more UMI reads from genes of a given length in a 3'-reverse transcription assay compared with a 5'-reverse transcription assay.
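One way to examine a length-dependent effect is to sum UMI counts within the four gene-length bins above and compare an engineered enzyme with a reference enzyme bin by bin. The sketch below is a hypothetical illustration: the function names and example counts are invented, and the half-open bin boundaries are an assumption, since the text does not say which bin a gene of exactly 500, 1000, or 1500 nucleotides belongs to.

```python
from bisect import bisect_right
from typing import Dict, Mapping

# Length-bin edges in nucleotides. Half-open bins [lo, hi) are an assumption:
# the text does not say where a gene of exactly 500, 1000 or 1500 nt belongs.
EDGES = (500, 1000, 1500)
LABELS = ("<500", "500-1000", "1000-1500", ">=1500")


def length_bin(length_nt: int) -> str:
    """Return the label of the length bin containing length_nt."""
    return LABELS[bisect_right(EDGES, length_nt)]


def umis_by_length_bin(umi_counts: Mapping[str, int],
                       gene_lengths: Mapping[str, int]) -> Dict[str, int]:
    """Sum per-gene UMI counts within each gene-length bin."""
    totals = {label: 0 for label in LABELS}
    for gene, n in umi_counts.items():
        totals[length_bin(gene_lengths[gene])] += n
    return totals


def fold_change_by_bin(variant: Mapping[str, int],
                       reference: Mapping[str, int]) -> Dict[str, float]:
    """Ratio of binned UMI totals, e.g. engineered RT vs. a reference RT."""
    return {b: variant[b] / reference[b] for b in LABELS if reference[b] > 0}


if __name__ == "__main__":
    lengths = {"g1": 300, "g2": 800, "g3": 1200, "g4": 2000}  # hypothetical genes
    ref = umis_by_length_bin({"g1": 10, "g2": 10, "g3": 10, "g4": 10}, lengths)
    var = umis_by_length_bin({"g1": 10, "g2": 12, "g3": 15, "g4": 30}, lengths)
    print(fold_change_by_bin(var, ref))
```

Running the same comparison separately on 3' and 5' assay data would show whether a length preference differs between the two assay formats.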
4. Gel beads
The engineered reverse transcriptase of the present application is applicable to methods in which cells are co-partitioned with beads carrying barcodes and/or UMIs. The barcoded nucleic acid molecules may be released from the beads in the partition. For example, in the context of analyzing sample RNA, the poly(dT) segment (also known as oligo(dT)) of one of the released nucleic acid molecules may hybridize to the poly(A) tail of an mRNA molecule. Reverse transcription can produce a cDNA transcript of the mRNA, and the transcript also includes each sequence segment of the nucleic acid molecule. Without being bound by any mechanism, because the nucleic acid molecule comprises an anchor sequence, it is more likely to hybridize to, and initiate reverse transcription at, the sequence end of the poly(A) tail of the mRNA (i.e., the end adjacent to the transcribed sequence). Within any given partition, substantially all cDNA transcripts of individual mRNA molecules may comprise a common barcode sequence segment. However, transcripts made from different mRNA molecules within a given partition may differ at the unique molecular identifier sequence segment (e.g., the UMI segment).
In some embodiments of the nucleic acid extension methods described herein, a plurality of barcoded nucleic acid molecules are attached to a support (e.g., a particle, a slide, a chip, a bead, etc.). In one embodiment, the support is selected from the group consisting of an array, a bead, a gel bead, a microparticle, and a polymer. In some embodiments, the barcoded nucleic acid molecules attached to the support comprise a molecular tag (UMI), a primer sequence, a capture sequence, a cleavage sequence, or an additional functional sequence. In some embodiments, the support is a gel bead. In this embodiment, the barcoded nucleic acid molecules are releasably attached to the gel beads. In some embodiments, the gel beads comprise a polyacrylamide polymer.
In some embodiments, the gel beads have a cross-section of less than about 100 μm. In some embodiments, the gel beads have a cross-section of less than about 60 μm. In some embodiments, the gel beads have a cross-section of less than about 50 μm. In some embodiments, the gel beads have a cross-section of less than about 40 μm. In some embodiments, the gel beads have a cross-section of less than about 100 μm, less than about 99 μm, less than about 98 μm, less than about 97 μm, less than about 96 μm, less than about 95 μm, less than about 94 μm, less than about 93 μm, less than about 92 μm, less than about 91 μm, less than about 90 μm, less than about 89 μm, less than about 88 μm, less than about 87 μm, less than about 86 μm, less than about 85 μm, less than about 84 μm, less than about 83 μm, less than about 82 μm, less than about 81 μm, less than about 80 μm, less than about 79 μm, less than about 78 μm, less than about 77 μm, less than about 76 μm, less than about 75 μm, less than about 74 μm, less than about 73 μm, less than about 72 μm, less than about 71 μm, less than about 70 μm, less than about 69 μm, less than about 68 μm, less than about 67 μm, less than about 66 μm, less than about 65 μm, less than about 64 μm, less than about 63 μm, less than about 62 μm, less than about 61 μm, or less than about 60 μm.
Functionalization of the beads for attachment of nucleic acid molecules (e.g., oligonucleotides) can be accomplished by a number of different methods, including activation of chemical groups within the polymer, incorporation of active or activatable functional groups in the polymer structure, or attachment at the prepolymer or monomer stage of bead generation.
For example, a precursor (e.g., a monomer or crosslinker) that is polymerized to form a bead may comprise acrydite moieties, such that the resulting bead also comprises acrydite moieties. The acrydite moiety can be linked to a nucleic acid molecule (e.g., an oligonucleotide) that can include a primer sequence (e.g., a primer for amplifying a target nucleic acid, a random primer, or a primer for messenger RNA) and/or one or more barcode sequences. The one or more barcode sequences may include sequences that are the same for all nucleic acid molecules coupled to a given bead and/or sequences that differ among the nucleic acid molecules coupled to a given bead. The nucleic acid molecules can be incorporated into beads.
In some cases, the nucleic acid molecule may comprise a functional sequence (e.g., for attachment to a sequencing flow cell), such as a P5 sequence for Illumina sequencing. In some cases, the nucleic acid molecule or derivative thereof (e.g., an oligonucleotide or polynucleotide generated from the nucleic acid molecule) may comprise another functional sequence, such as a P7 sequence for attachment to a sequencing flow cell for Illumina sequencing. In some cases, the nucleic acid molecule may comprise a barcode sequence. In some cases, the primer may also comprise a Unique Molecular Identifier (UMI). In some cases, the primer may comprise an R1 sequence for an Illumina sequencing workflow. In some cases, the primer may comprise an R2 sequence for an Illumina sequencing workflow. Examples of such nucleic acid molecules (e.g., oligonucleotides, polynucleotides, etc.) and uses thereof that can be used with the compositions, devices, methods, and systems of the present disclosure are provided in U.S. Patent Publication Nos. 2014/0378345 and 2015/0376609, each of which is incorporated herein by reference in its entirety. However, the invention is not limited to any particular combination of nucleic acid molecules or derivatives thereof, or to any particular sequencing platform, and these characterizations are merely examples of sequences useful in reverse transcription workflows.
In operation, cells may be co-partitioned with the barcoded beads. The barcoded nucleic acid molecules attached to the beads may be released from the beads in the partition. For example, in the context of analyzing sample RNA, the poly(dT) segment (also known as oligo(dT)) of one of the released nucleic acid molecules may hybridize to (e.g., capture) the poly(A) tail of an mRNA molecule. Reverse transcription can produce a cDNA transcript of the mRNA, and the cDNA transcript also includes each sequence segment of the nucleic acid molecule. Because the nucleic acid molecule comprises additional functional sequences (e.g., capture domains, primer domains, UMIs, barcodes, etc.), it can hybridize to the mRNA and initiate reverse transcription using the hybridized mRNA as a template. Within any given partition, all cDNA transcripts of individual mRNA molecules may contain a common barcode sequence. However, transcripts produced from different mRNA molecules within a given partition may differ in their unique molecular identifier sequences (e.g., UMIs). Advantageously, even after any subsequent amplification of the contents of a given partition, the number of different UMIs may indicate the amount of mRNA originating from the given partition, and thus from the cell. As described above, transcripts can be amplified and sequenced to identify the sequence of the original mRNA capture template, as well as the sequences of the associated barcodes and UMIs. Although a poly(dT) capture sequence is described, other targeting or random capture sequences may be used to capture or hybridize to the template to initiate a reverse transcription reaction.
C. Processing TCRs
In some embodiments, the engineered reverse transcriptase is used in methods including, but not limited to, processing TCRs from individual T cells or T cell populations, determining the nucleotide sequences of T cell TCRs, and obtaining a TCR repertoire profile. In some methods, a nucleic acid barcode sequence is appended to a nucleic acid molecule encoding a TCR (e.g., a molecule derived from a T cell that contains a nucleic acid sequence encoding a TCR, such as a TCRα and/or TCRβ mRNA), resulting in a barcoded nucleic acid molecule comprising a sequence corresponding to the TCR nucleic acid sequence (e.g., comprising the V(D)J region of the TCR gene or its reverse complement) and a sequence corresponding to the barcode sequence (in some cases, the reverse complement of the barcode sequence present in the nucleic acid barcode molecule). The barcoded nucleic acid molecules can serve as templates, such as template polynucleotides, which can be further processed (e.g., amplified) and sequenced to obtain target nucleic acid sequences, for example, the nucleic acid sequence of the mRNA.
TCR is a molecule found on the surface of T cells. In general, binding of antigen molecules to TCRs results in cell activation and response. TCRs are heterodimers composed of two distinct protein chains. In many T cells, these two proteins are the alpha (α) chain and the beta (β) chain. In a smaller percentage of T cells, the two proteins are gamma (γ) and delta (δ) chains. The ratio of TCRs consisting of alpha/beta and gamma/delta chains may change during a disease state, such as cancer, tumor, infectious disease, inflammatory disease or autoimmune disease. Engagement of TCRs with peptide-MHC activates T cells through a series of biochemical events mediated by related enzymes, co-receptors, specialized adapter molecules and activated or released transcription factors.
Each of the two chains of the TCR is encoded by gene segments drawn from multiple copies: variable ("V") gene segments, diversity ("D") gene segments, and joining ("J") gene segments. The TCR alpha chain is produced by recombination of V and J segments, whereas the beta chain is produced by recombination of V, D, and J segments. Similarly, production of the TCR gamma chain involves recombination of V and J segments, and the TCR delta chain is produced by recombination of V, D, and J gene segments. The junctions of these regions (V and J for the alpha or gamma chain; V, D, and J for the beta or delta chain) correspond to the CDR3 region, which is involved in antigen-MHC recognition. Complementarity determining regions (e.g., CDR1, CDR2, and CDR3), or hypervariable regions, are sequences in the variable regions of antigen receptors (e.g., T cell receptors and immunoglobulins) that are complementary to an antigen. Most of the CDR diversity, which is generated by somatic recombination events during T lymphocyte development, is found in CDR3. CDR3, which is encoded by the junctional region between the V and J genes or the D and J genes, is highly variable. CDR3 is often used as the region of interest for determining a T cell clonotype (a unique nucleotide sequence that arises during gene rearrangement), because it is highly unlikely that two T cells will express the same CDR3 nucleotide sequence unless they are derived from the same clonally expanded T cell. Because an active TCR consists of two chains paired within a single T cell, determining the native chain pairing requires sequencing at the level of individual T cells.
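The use of paired CDR3 sequences to define clonotypes can be sketched as grouping cells by their exact (TRA CDR3, TRB CDR3) nucleotide pair. This is a simplification assumed for illustration, not a description of any particular analysis pipeline: it ignores cells carrying more than one alpha or beta chain, tolerates no sequencing errors, and the cell barcodes and CDR3 sequences in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Optional, Tuple

# (TRA CDR3 nucleotide sequence, TRB CDR3 nucleotide sequence); None if not detected.
ChainPair = Tuple[Optional[str], Optional[str]]


def group_clonotypes(cells: Mapping[str, ChainPair]) -> Dict[Tuple[str, str], List[str]]:
    """Group cell barcodes by identical paired alpha/beta CDR3 nucleotide sequences.

    Cells lacking either chain are skipped (a simplification). Clonotypes are
    returned largest first; ties keep their order of first appearance.
    """
    clones: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for barcode, (tra, trb) in cells.items():
        if tra is None or trb is None:
            continue
        clones[(tra, trb)].append(barcode)
    return dict(sorted(clones.items(), key=lambda item: len(item[1]), reverse=True))


if __name__ == "__main__":
    cells = {
        "BC1": ("TGTGCTGTGAGAGAT", "TGCGCCAGCAGCTTA"),
        "BC2": ("TGTGCTGTGAGAGAT", "TGCGCCAGCAGCTTA"),
        "BC3": ("TGTGCAGCAAGCGGA", "TGCGCCAGCAGCTTA"),
        "BC4": (None, "TGCGCCAGCAGCTTA"),
    }
    for pair, members in group_clonotypes(cells).items():
        print(pair, members)
```

In the example, BC3 shares its beta-chain CDR3 with BC1 and BC2 but has a different alpha-chain CDR3, so it forms a separate clonotype; resolving this requires knowing which chains were paired in each individual cell.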
The TCR gene sequences may include, but are not limited to, the sequences of various T cell receptor alpha variable genes (TRAV genes), T cell receptor alpha joining genes (TRAJ genes), T cell receptor alpha constant genes (TRAC genes), T cell receptor beta variable genes (TRBV genes), T cell receptor beta diversity genes (TRBD genes), T cell receptor beta joining genes (TRBJ genes), T cell receptor gamma variable genes (TRGV genes), T cell receptor gamma joining genes (TRGJ genes), T cell receptor gamma constant genes (TRGC genes), T cell receptor delta variable genes (TRDV genes), T cell receptor delta diversity genes (TRDD genes), T cell receptor delta joining genes (TRDJ genes), and T cell receptor delta constant genes (TRDC genes).
VII. Kits
In one aspect, the invention provides a kit comprising an engineered reverse transcriptase or a derivative thereof as described herein. In some embodiments, the kit further comprises one or more of a carrier, a nucleotide, a buffer, a salt, and/or instructions. In another embodiment, the kit may comprise an engineered reverse transcriptase or derivative thereof for reverse transcription or amplification of nucleic acid molecules. In another embodiment, the kit may be used for single cell profiling of transcriptomes. In another embodiment, the kit may be used in spatial transcriptomics methods and assays. In another embodiment, the kit can be used in in situ methods and assays.
The kit may include suitable reaction buffers, dNTPs, one or more primers, one or more control reagents, or any other reagents disclosed herein for practicing the methods of the disclosure. The engineered reverse transcriptase or derivative thereof, the reaction buffer, and the dNTPs may be provided separately or together in a master mix solution. When the engineered reverse transcriptase or derivative thereof, reaction buffer, and dNTPs are provided as a master mix, the master mix is present at a concentration at least twice the working concentration indicated in the instructions for use in the extension reaction. In other cases, the master mix may be present at a concentration at least three times, at least four times, at least five times, at least six times, at least seven times, at least eight times, at least nine times, or at least ten times the indicated working concentration. The primers in the kit may be poly(dT) primers, random N-mer primers, or target-specific primers.
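The relationship between a concentrated master mix and its working concentration is a simple dilution (C1 × V1 = C2 × V2). In the sketch below, the 20 µL reaction volume is a hypothetical example rather than a volume specified in this disclosure, and the function names are illustrative.

```python
def master_mix_volume(reaction_volume_ul: float, fold: float) -> float:
    """Volume (uL) of an N-fold (Nx) master mix giving 1x in the final reaction.

    From C1 * V1 = C2 * V2 with C1 = fold * working and C2 = working,
    V1 = V2 / fold.
    """
    if fold < 1:
        raise ValueError("a master mix below 1x cannot reach working concentration")
    return reaction_volume_ul / fold


def free_volume(reaction_volume_ul: float, fold: float) -> float:
    """Volume left for template RNA, primers, water, and other components."""
    return reaction_volume_ul - master_mix_volume(reaction_volume_ul, fold)


if __name__ == "__main__":
    for fold in (2, 5, 10):
        print(f"{fold}x: {master_mix_volume(20, fold)} uL mix, "
              f"{free_volume(20, fold)} uL free in a 20 uL reaction")
```

A more concentrated master mix occupies a smaller share of the reaction: in this hypothetical 20 µL reaction, a 2x mix takes 10 µL, whereas a 10x mix takes 2 µL, leaving more volume for the template and other components.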
The kit may further comprise one, two, three, four, five, or more, up to all, of the following: a partitioning fluid, including aqueous buffers and non-aqueous partitioning fluids or oils; nucleic acid barcode capture probes releasably associated with beads as described herein; a microfluidic device; reagents for disrupting cells; reagents for amplifying nucleic acids; and instructions for using any of the foregoing materials in the methods described herein.
Instructions for using any method are typically recorded on a suitable recording medium (e.g., printed on a substrate such as paper or plastic) or made available in digital format. Thus, the instructions may be present in the kit as a package insert or in the labeling of the container of the kit or of a component thereof (i.e., associated with the packaging or sub-packaging). In some cases, the instructions may reside as an electronic data file stored on a suitable computer-readable storage medium. In other cases, the actual instructions may not be present in the kit, but means may be provided for obtaining the instructions from a remote source (e.g., via the internet). For example, the kit may include a website address at which the instructions can be viewed and/or from which they can be downloaded. As with the instructions themselves, this means of obtaining the instructions is recorded on a suitable substrate.
Kits according to this aspect of the disclosure include a carrier device, e.g., a box, carton, tube, etc., having enclosed therein one or more container devices, such as vials, tubes, ampoules, bottles, etc., wherein a first container device contains one or more of the engineered reverse transcriptase of the disclosure or derivatives thereof having reverse transcriptase activity. When more than one polypeptide having reverse transcriptase activity is used, they may be present in a single container as a mixture of two or more engineered reverse transcriptases or derivatives thereof, or in separate containers. The kits of the present disclosure may further comprise (in the same or separate containers) one or more DNA polymerases, a suitable buffer, one or more nucleotides, and/or one or more primers.
Kits of the present disclosure may also comprise one or more hosts or cells, including those capable of taking up nucleic acids (e.g., DNA molecules, including vectors). Preferred hosts may include chemically competent or electrocompetent bacteria, such as E. coli (including DH5, DH5α, DH10B, HB101, Top10, and other K-12 strains, as well as E. coli B and E. coli W strains).
In a particular aspect of the disclosure, a kit of the disclosure (e.g., a reverse transcription and amplification kit) can include one or more components (in mixed or separate form), including one or more engineered reverse transcriptases or derivatives thereof having reverse transcriptase activity of the disclosure, one or more nucleotides for nucleic acid molecule synthesis (where one or more of the nucleotides can be labeled, e.g., fluorescently labeled), and/or one or more primers (e.g., oligo(dT) for reverse transcription, random primers for extension reactions, etc.). Such kits may also comprise one or more DNA polymerases.
VIII. Definitions
Unless defined otherwise, all technical and scientific terms used herein generally have the same meaning as commonly understood by one of ordinary skill in the art to which this technology belongs. As used in this specification and the appended claims, the singular forms "a," "an," and "the" include plural referents unless the content clearly dictates otherwise. "A and/or B" is used herein to include all of the following alternatives: "A", "B", "A or B" and "A and B". For example, reference to "a cell" includes a combination of two or more cells, and the like. Generally, the nomenclature used herein and the laboratory procedures in cell culture, molecular genetics, organic chemistry, analytical chemistry, and nucleic acid chemistry and hybridization described below are those well known and commonly employed in the art.
Where values are described as ranges, it is understood that such disclosure includes disclosure of all possible sub-ranges within such ranges, as well as specific values falling within such ranges, whether or not the specific values or sub-ranges are explicitly stated.
Whenever the term "at least," "greater than," or "greater than or equal to" precedes the first numerical value in a series of two or more numerical values, the term applies to each of the numerical values in that series. For example, greater than or equal to 1, 2, or 3 is equivalent to greater than or equal to 1, greater than or equal to 2, or greater than or equal to 3.
Whenever the term "no more than," "less than," or "less than or equal to" precedes the first numerical value in a series of two or more numerical values, the term applies to each of the numerical values in that series. For example, less than or equal to 3, 2, or 1 is equivalent to less than or equal to 3, less than or equal to 2, or less than or equal to 1.
Certain ranges of values are provided herein preceded by the term "about." The term "about" is used herein to provide literal support for the exact number that it precedes, as well as for numbers that are near or approximately the number that it precedes. In determining whether a number is near or approximately a specifically recited number, the near or approximating unrecited number may be a number that, in the context in which it is presented, provides the substantial equivalent of the specifically recited number. If the degree of approximation is not otherwise clear from the context, "about" means either within plus or minus 10% of the provided value or rounded to the nearest significant figure, in all cases inclusive of the provided value. In some embodiments, the term "about" means the specified value ± up to 10%, ± up to 5%, or ± up to 1%. Numerical ranges include the numbers defining the range. The term "about" is used herein to mean plus or minus ten percent (10%) of a value. For example, "about 100" refers to any number between 90 and 110.
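The default ±10% reading of "about" can be expressed as a simple inclusive-range predicate. This is a sketch for illustration only: it does not capture the context-dependent readings described above (substantial equivalence or rounding to the nearest significant figure), and the function name is hypothetical.

```python
def within_about(value: float, stated: float, rel_tol: float = 0.10) -> bool:
    """True if value lies within +/- rel_tol of stated, endpoints included.

    rel_tol defaults to 0.10 (10%); 0.05 or 0.01 model the narrower
    +/-5% and +/-1% embodiments.
    """
    margin = abs(stated) * rel_tol
    return stated - margin <= value <= stated + margin


if __name__ == "__main__":
    # "about 100" under the default reading spans 90 to 110 inclusive.
    print([v for v in (89, 90, 100, 110, 111) if within_about(v, 100)])
```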
Headings (e.g., (a), (b), (i), etc.) are presented merely for convenience in reading the specification and claims. The use of headings in the specification or claims does not require that the steps or elements be performed in alphabetical or numerical order or the order in which they are presented.
Use of ordinal terms such as "first," "second," "third," etc., in the claims does not by itself connote any priority, precedence, or order of one claim element over another, or the temporal order in which acts of a method are performed; such terms are used merely as labels to distinguish one claim element having a certain name from another element having the same name (but for use of the ordinal term). Similarly, the use of these terms in the description does not itself imply any required priority, precedence, or order.
As used herein, the term "analyte" refers to a biological molecule. Analytes include, but are not limited to, DNA analytes, RNA analytes, oligonucleotides, reporter molecules configured to be coupled directly to proteins, reporter molecules configured to be coupled indirectly to proteins, reporter molecules configured to be coupled directly to metabolites, and reporter molecules configured to be coupled indirectly to metabolites.
The terms "adaptor," "adapter," and "tag" may be used synonymously. An adaptor or tag may be coupled to a polynucleotide sequence to be "tagged" by any method, including ligation, hybridization, or other methods.
As used herein, the term "barcoded nucleic acid molecule" generally refers to a nucleic acid molecule that results from, for example, the processing of a nucleic acid barcode molecule with a nucleic acid sequence (e.g., a nucleic acid sequence complementary to a nucleic acid primer sequence comprised by the nucleic acid barcode molecule). The nucleic acid sequence may be a targeted sequence or a non-targeted sequence. The nucleic acid barcode molecule may be coupled or attached to the nucleic acid molecule comprising the nucleic acid sequence. For example, a nucleic acid barcode molecule described herein can hybridize to an analyte (e.g., a messenger RNA (mRNA) molecule) of a cell. Reverse transcription can then produce a barcoded nucleic acid molecule whose sequence corresponds to the nucleic acid sequence of the mRNA and to the barcode sequence (or its reverse complement). Processing the nucleic acid molecule comprising the nucleic acid sequence, the nucleic acid barcode molecule, or both may include a nucleic acid reaction, such as, in non-limiting examples, reverse transcription, nucleic acid extension, ligation, and the like. The nucleic acid reaction may be performed before, during, or after barcoding of the nucleic acid sequence to produce the barcoded nucleic acid molecule. For example, a nucleic acid molecule comprising the nucleic acid sequence may be subjected to reverse transcription and then attached to a nucleic acid barcode molecule to produce a barcoded nucleic acid molecule, or it may be attached to a nucleic acid barcode molecule and then subjected to a nucleic acid reaction (e.g., extension, ligation) to produce a barcoded nucleic acid molecule. The barcoded nucleic acid molecules can serve as templates, such as template polynucleotides, which can be further processed (e.g., amplified) and sequenced to obtain target nucleic acid sequences.
For example, in the methods and systems described herein, the barcoded nucleic acid molecules can be further processed (e.g., amplified) and sequenced to obtain the nucleic acid sequence (e.g., mRNA) of the nucleic acid molecule.
Nucleic acid barcode molecules of a plurality of nucleic acid barcode molecules may be used to generate barcoded nucleic acid molecules. In some cases, the barcoded nucleic acid molecules comprise a distinct reporter barcode sequence that identifies a second analyte. Different reporter barcode sequences, or analyte-specific barcode sequences, can identify proteins, lipids, metabolites, or other second analytes.
A barcoded nucleic acid molecule can be generated from the construct depicted in FIG. 17 (e.g., via a nucleic acid reaction, such as nucleic acid extension or ligation). For example, the handle sequence can be hybridized with a complementary sequence, such as capture sequence 1723, to generate (e.g., via nucleic acid extension or ligation) a barcoded nucleic acid molecule comprising cellular (e.g., partition-specific) barcode sequence 1722 (or its reverse complement) and reporter barcode sequence 1722 (or its reverse complement). In some embodiments, capture handle sequence 1723 comprises a sequence that is complementary to a template switch oligonucleotide on capture sequence 1723. In some embodiments, the barcoded nucleic acid molecule 1790 (e.g., a partition-specific barcoded molecule) further comprises a UMI (not shown). The barcoded nucleic acid molecules can then optionally be processed as described elsewhere herein, for example, to amplify the molecules and/or to append sequencing platform-specific sequences to the fragments. See, for example, U.S. Patent Publication No. 2018/0105808, which is hereby incorporated by reference in its entirety for all purposes. The barcoded nucleic acid molecules, or derivatives generated therefrom, can then be sequenced on a suitable sequencing platform.
In some embodiments, multiple analytes (e.g., a nucleic acid and one or more other analytes, using a labeling agent as described herein) can be analyzed. In some cases, analysis of analytes (e.g., nucleic acids, polypeptides, carbohydrates, lipids, glycans, glycan motifs, metabolites, proteins, etc.) includes the workflow generally depicted in FIG. 17. The barcoded nucleic acid molecule 1790 (e.g., a partition-specific barcoded molecule) can be co-partitioned with one or more analytes. In some cases, the barcoded nucleic acid molecules 1790 are attached to a support 1730 (e.g., a bead, such as a gel bead), such as those described elsewhere herein. For example, the barcoded nucleic acid molecule 1790 can be attached to the support 1730 via a releasable bond 1740 (e.g., comprising a labile bond), such as those described elsewhere herein. The barcoded nucleic acid molecule 1790 can comprise a functional sequence 1721 and, optionally, additional sequences, such as a barcode sequence 1722 (e.g., a common barcode, a partition-specific barcode, or another functional sequence described elsewhere herein) and/or a UMI sequence (not shown). The barcoded nucleic acid molecule 1790 may also comprise a capture sequence 1723 that is complementary to another nucleic acid sequence, such that it can hybridize to a particular sequence, such as capture handle sequence 1723.
For example, capture sequence 1723 may comprise a poly-T sequence and may be used to hybridize to mRNA. Referring to fig. 17, in some embodiments, the barcoded nucleic acid molecule 1790 comprises a capture sequence 1723 that is complementary to the sequence of the RNA molecule 1760 from the cell. In some cases, capture sequence 1723 comprises a sequence specific for an RNA molecule. The capture sequence 1723 may comprise a known sequence or a targeting sequence, or a random sequence. In some cases, a nucleic acid extension reaction can be performed to produce a barcoded nucleic acid product comprising capture sequence 1723, functional sequence 1721, barcode sequence 1722, any other functional sequence, and a sequence corresponding to RNA molecule 1760.
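The composition of the extension product described above can be sketched as a toy model in Python, assuming a barcode oligonucleotide whose 3' end is a poly(dT) capture sequence and an mRNA with a 3' poly(A) tail. All sequences, function names, and the 8-base minimum run length are hypothetical; the model ignores secondary structure, mispriming, incomplete extension, and template switching.

```python
# Base-pairing map for writing the DNA complement of RNA or DNA (U pairs with A).
_COMPLEMENT = str.maketrans({"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"})


def reverse_complement_dna(seq: str) -> str:
    """Return the reverse complement of an RNA or DNA sequence as DNA, 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_poly_t_capture(oligo: str, min_len: int = 8) -> bool:
    """True if the oligo ends (3') in a poly(dT) run of at least min_len bases."""
    return oligo.upper().endswith("T" * min_len)


def extend_on_mrna(barcode_oligo: str, mrna: str, min_len: int = 8) -> str:
    """Toy first-strand product: the oligo's 3' poly(dT) anneals to the mRNA
    poly(A) tail and is extended across the rest of the transcript.

    Simplification: rstrip("A") also trims any A residues that happen to sit
    immediately 5' of the tail in the transcript body.
    """
    if not has_poly_t_capture(barcode_oligo, min_len):
        raise ValueError("oligo has no 3' poly(dT) capture sequence")
    if not mrna.upper().endswith("A" * min_len):
        raise ValueError("mRNA has no poly(A) tail long enough to hybridize")
    body = mrna.upper().rstrip("A")  # transcript upstream of the poly(A) tail
    return barcode_oligo.upper() + reverse_complement_dna(body)


# Hypothetical oligo (barcode/functional region + poly(dT)) and transcript.
oligo = "ACGTACGT" + "T" * 8
transcript = "GGCAUCU" + "A" * 10
print(extend_on_mrna(oligo, transcript))
```

The product carries the oligo's barcode and functional sequences followed by the cDNA copy of the transcript, mirroring the product composition listed in the paragraph above.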
In another example, the capture sequence 1723 may be complementary to an overhang sequence or an adaptor sequence that has been added to the analyte. Any suitable agent or stimulus can degrade the beads. Suitable agents or stimuli may include, but are not limited to, temperature changes, pH changes, reduction, oxidation, and exposure to water or other aqueous solutions.
In some cases, a cell bound to a labeling agent conjugated to an oligonucleotide, together with a support (e.g., a bead, such as a gel bead) comprising a barcoded nucleic acid molecule 1790, is partitioned into one of a plurality of partitions (e.g., a droplet of a droplet emulsion or a well of a microwell array).
As used herein, the term "bead" generally refers to a particle. The beads may be solid or semi-solid particles. The beads may be gel beads. The gel beads may include a polymer matrix (e.g., a matrix formed by polymerization or cross-linking). The polymer matrix may include one or more polymers (e.g., polymers having different functional groups or repeating units). The polymers in the polymer matrix may be randomly arranged, for example in a random copolymer, and/or have an ordered structure, for example in a block copolymer. Crosslinking may be achieved via covalent, ionic or induced interactions or physical entanglement. The beads may be macromolecules. Beads may be formed from nucleic acid molecules that are bound together. Beads may be formed via covalent or non-covalent assembly of molecules (e.g., macromolecules) such as monomers or polymers. Such polymers or monomers may be natural or synthetic. Such polymers or monomers may be or include, for example, nucleic acid molecules (e.g., DNA or RNA). The beads may be formed of a polymeric material. The beads may be magnetic or non-magnetic. The beads may be rigid. The beads may be flexible and/or compressible. The beads may be destructible or dissolvable. The beads may be solid particles (e.g., metal-based particles including, but not limited to, iron oxide, gold, or silver) covered with a coating comprising one or more polymers. Such coatings may be destructible or dissolvable.
As used herein, the term "efficiency" in the context of the nucleic acid modifying enzyme of the present invention refers to the ability of the enzyme to perform its catalytic function under specific reaction conditions. In general, "efficiency" as defined herein is expressed by the amount of product produced under a given reaction condition.
As used herein, the term "enhancing" in the context of an enzyme refers to increasing the activity of the enzyme, i.e. increasing the amount of product per unit time per unit enzyme.
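The "product per unit time per unit enzyme" definition of enhancement amounts to a simple ratio. A minimal Python sketch, assuming product amount, time, and enzyme amount are measured in consistent (arbitrary) units; the function names and numbers are hypothetical.

```python
def activity_per_unit_enzyme(product_amount: float, minutes: float, enzyme_amount: float) -> float:
    """Amount of product formed per unit time per unit enzyme."""
    if minutes <= 0 or enzyme_amount <= 0:
        raise ValueError("reaction time and enzyme amount must be positive")
    return product_amount / (minutes * enzyme_amount)


def fold_enhancement(variant_activity: float, reference_activity: float) -> float:
    """Ratio of variant to reference activity; values above 1.0 indicate enhancement."""
    if reference_activity <= 0:
        raise ValueError("reference activity must be positive")
    return variant_activity / reference_activity


# Hypothetical numbers: both enzymes used at 2 units for 45 minutes.
variant = activity_per_unit_enzyme(90.0, 45.0, 2.0)    # 1.0 product unit/min/enzyme unit
reference = activity_per_unit_enzyme(45.0, 45.0, 2.0)  # 0.5
print(fold_enhancement(variant, reference))            # 2.0
```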
As used herein, the term "fidelity" refers to the accuracy of polymerization, or the ability of a reverse transcriptase to discriminate between correct and incorrect substrates (e.g., nucleotides) when synthesizing a nucleic acid molecule that is complementary to a template. The higher the fidelity of a reverse transcriptase, the fewer nucleotides are misincorporated into the growing strand during nucleic acid synthesis; that is, an increase or enhancement in fidelity results in a more faithful reverse transcriptase with a reduced error rate or reduced misincorporation.
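Fidelity comparisons reduce to comparing error rates. The following Python sketch is one simple way to express this, assuming only substitution errors and a synthesized strand already aligned position by position to the expected sequence; the function names and sequences are hypothetical.

```python
def misincorporation_rate(expected: str, synthesized: str) -> float:
    """Fraction of positions at which the synthesized strand differs from the
    expected (template-complementary) sequence.

    Simplifying assumption: only substitutions are counted, so both sequences
    must be non-empty, equal in length, and aligned position by position.
    """
    if len(expected) != len(synthesized) or not expected:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(1 for e, s in zip(expected.upper(), synthesized.upper()) if e != s)
    return mismatches / len(expected)


def higher_fidelity(rate_a: float, rate_b: float) -> str:
    """Return "A" or "B" for the enzyme with the lower error rate, or "equal"."""
    if rate_a == rate_b:
        return "equal"
    return "A" if rate_a < rate_b else "B"


expected = "ACGTACGTAC"
rate_a = misincorporation_rate(expected, "ACGTACGTAC")  # no errors
rate_b = misincorporation_rate(expected, "ACGTACCTAC")  # one substitution in 10 bases
print(rate_a, rate_b, higher_fidelity(rate_a, rate_b))
```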
As used herein, the term "percent homology", which is used interchangeably with the term "percent identity", refers to the level of nucleic acid or amino acid sequence identity between nucleic acid sequences encoding a polypeptide of the invention or any of the amino acid sequences of the polypeptide of the invention when aligned using a sequence alignment program.
As used herein, the term "identical" in the context of two nucleic acid or polypeptide sequences means that the residues in the two sequences are identical when aligned for maximum correspondence using a sequence comparison algorithm. Sequence comparison algorithms are known to those skilled in the art. See, e.g., ebi.ac.uk/Tools/msa/clustalo/.
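Once two sequences have been aligned (for example, by a sequence comparison algorithm such as Clustal Omega), percent identity follows from counting identical positions. This Python sketch assumes a pairwise alignment with gaps written as "-" and counts gap columns in the denominator; that is only one of several conventions, so results may differ from those reported by specific alignment programs. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Assumes the sequences are already aligned (equal length, gaps as '-').
    Gap columns count toward the denominator in this convention.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(
        1 for a, b in zip(aligned_a.upper(), aligned_b.upper()) if a == b and a != gap
    )
    return 100.0 * identical / len(aligned_a)


print(percent_identity("ACDEFGHIKL", "ACDEFGHIKM"))  # 90.0
print(percent_identity("ACGT-", "ACGTA"))            # 80.0
```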
As used herein, the term "inhibitor resistance" refers to the ability of a reverse transcriptase to carry out reverse transcription in the presence of compounds, chemicals, proteins, buffers, etc. that generally inhibit (i.e., prevent or reduce) reverse transcriptase activity.
As used herein, the term "small volume reaction" means a reaction volume of less than 1 nanoliter, less than 750 picoliters, or less than 500 picoliters.
As used herein, the term "molecular tag" generally refers to a molecule capable of binding to a macromolecular component. Molecular tags can bind to macromolecular components with high affinity. Molecular tags can bind to macromolecular components with high specificity. The molecular tag may comprise a nucleotide sequence. The molecular tag may comprise a nucleic acid sequence. The nucleic acid sequence may be at least a portion or all of a molecular tag. The molecular tag may be a nucleic acid molecule or may be part of a nucleic acid molecule. The molecular tag may be an oligonucleotide or a polypeptide. The molecular tag may comprise a DNA aptamer. The molecular tag may be or comprise a primer. The molecular tag may be or comprise a protein. The molecular tag may comprise a polypeptide. The molecular tag may be a barcode.
As used herein, the term "mutation" or "mutant" or "variant" refers to one or more changes introduced in a wild-type DNA sequence or wild-type amino acid sequence. Examples of mutations and variants include, but are not limited to, substitutions, insertions, deletions, and point mutations. Mutations can be made at the nucleic acid level or at the amino acid level.
As used herein, the terms "operably linked," "conjugation," and "fusion" mean, with respect to a recombinant thermostable polymerase sequence, that one or more sequences are present at the N- or C-terminus that, when transcribed and translated, produce additional polypeptides associated with the enzyme amino acid sequence, thereby producing a conjugate or fusion of one or more polypeptides from one expression vector.
As used herein, the term "partition" generally refers to a space or volume that may be suitable for containing one or more species or carrying out one or more reactions. A partition may be a physical compartment, such as a droplet or a well. A partition may isolate a space or volume from another space or volume. A droplet may be a first phase (e.g., an aqueous phase) in a second phase (e.g., oil) that is immiscible with the first phase. A droplet may also be a first phase in a second phase that does not phase separate from the first phase, such as a capsule or liposome in an aqueous phase. A partition may include one or more other (internal) partitions. In some cases, a partition may be a virtual compartment that can be defined and identified by an index (e.g., an indexed library) across multiple and/or remote physical compartments. For example, a physical compartment may include a plurality of virtual compartments.
As used herein, the term "partitioning" is intended to encompass dividing, partitioning, depositing, separating, or compartmentalizing into one or more partitions. Systems and methods for separating one or more particles (such as, but not limited to, biological particles, macromolecular components of biological particles, beads, reagents, etc.) into discrete compartments or partitions (interchangeably referred to herein as partitions), wherein each partition keeps its own contents separate from the contents of other partitions, are known in the art. See, for example, US2020/0032335, which is incorporated herein by reference in its entirety. The partitions may be droplets in an emulsion. A partition may include one or more other partitions.
A "plurality of barcoded nucleic acid molecules" may include at least about 500, at least about 1,000, at least about 5,000, at least about 10,000, at least about 50,000, at least about 100,000, at least about 500,000, at least about 1,000,000, at least about 5,000,000, at least about 10,000,000, or at least about 100,000,000 barcoded nucleic acid molecules. In some cases, the plurality of barcoded nucleic acid molecules comprises a partition-specific barcode sequence.
Each of the plurality of barcoded nucleic acid molecules may comprise an identifier sequence separate from the partition-specific barcode sequence, wherein the identifier sequence is different for each of the plurality of barcoded nucleic acid molecules. In some cases, such identifier sequences are unique molecular identifiers (UMIs), as described elsewhere herein. As described elsewhere herein, a UMI sequence can uniquely identify a particular barcoded nucleic acid molecule, allowing the particular nucleic acid molecule being analyzed to be identified, counted, etc. Furthermore, in some cases, each of the plurality of barcoded nucleic acid molecules may comprise a partition-specific barcode sequence, and the bead may be one of a plurality of beads, such as a population of barcoded beads. Each partition-specific barcode sequence may differ from the partition-specific barcode sequences of the barcoded nucleic acid molecules of the other beads of the plurality of beads. In this case, a population of barcoded beads can be analyzed in which each bead carries a different partition-specific barcode sequence.
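How partition-specific barcodes and UMIs together allow molecules to be counted can be sketched in Python. This minimal version assumes each sequencing read has already been reduced to a (partition barcode, UMI, gene) tuple, and it treats reads sharing all three as copies of one original molecule. The function name and sequences are hypothetical, and barcode/UMI error correction, which practical pipelines typically perform, is not modeled.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per (partition barcode, gene).

    reads: iterable of (partition_barcode, umi, gene) tuples, one per read.
    Returns {barcode: {gene: number_of_distinct_umis}}.
    """
    umis_by_key = defaultdict(set)
    for barcode, umi, gene in reads:
        umis_by_key[(barcode, gene)].add(umi)

    counts: dict[str, dict[str, int]] = {}
    for (barcode, gene), umis in umis_by_key.items():
        counts.setdefault(barcode, {})[gene] = len(umis)
    return counts


# Hypothetical reads: two reads from one molecule plus a second molecule in
# partition AAAC, and one molecule in partition CCGT.
reads = [
    ("AAAC", "ACGTAC", "GAPDH"),
    ("AAAC", "ACGTAC", "GAPDH"),
    ("AAAC", "TTGCAA", "GAPDH"),
    ("CCGT", "ACGTAC", "GAPDH"),
]
print(count_molecules(reads))
```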
As used herein, the term "processivity" refers to the ability of a reverse transcriptase to extend a primer continuously without dissociating from the nucleic acid template. The processivity of a reverse transcriptase or polymerase can also be described by the length of template that it can replicate. In some embodiments, "processivity" refers to the ability of a polymerase to remain bound to a template or substrate while performing DNA synthesis. Processivity is measured by the number of catalytic events that occur per binding event.
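Because processivity is measured as catalytic events per binding event, it can be summarized as the mean number of nucleotides added per binding event. This Python sketch assumes each input value already represents the nucleotides added during a single binding event (for example, product length minus primer length in an assay designed so that each primer is extended by one binding event); how that condition is achieved is not specified here. The function name and values are hypothetical.

```python
from statistics import mean


def mean_processivity(nucleotides_added: list[int]) -> float:
    """Average number of nucleotides incorporated per binding event.

    Assumes each value is the number of nucleotides added to one primer during
    a single enzyme-binding event, i.e., the catalytic events for that event.
    """
    if not nucleotides_added:
        raise ValueError("no extension events supplied")
    if any(n < 0 for n in nucleotides_added):
        raise ValueError("nucleotide counts cannot be negative")
    return float(mean(nucleotides_added))


# Hypothetical extension lengths (product length minus primer length), in nucleotides.
print(mean_processivity([120, 300, 450, 90]))  # 240.0
```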
As used herein, the term "purified" means that the molecule is present in the sample at a concentration of at least 95 wt.% or at least 98 wt.% of the sample comprising the molecule.
As used herein, the terms "reverse transcriptase activity," "reverse transcription activity," and "reverse transcription" indicate the ability of an enzyme to synthesize a DNA strand (i.e., complementary DNA or cDNA) using RNA as a template. Reverse transcriptase activity can be measured by incubating the enzyme with an RNA template and deoxynucleotides in a suitable buffer under suitable conditions, for example, as described in the examples below. Methods for measuring RT activity are provided in the examples below and are also well known in the art. See, e.g., Bosworth et al., Nature 1989, 341:167-168.
As used herein, the term "reverse transcriptase (RT)" is used in its broadest sense to mean any enzyme that exhibits reverse transcription activity as measured by the methods disclosed herein or known in the art. Thus, "reverse transcriptases" of the present invention include reverse transcriptases from retroviruses and other viruses, as well as DNA polymerases exhibiting reverse transcriptase activity, such as Tth DNA polymerase, Taq DNA polymerase, Tne DNA polymerase, Tma DNA polymerase, and the like. RTs from retroviruses include, but are not limited to, Moloney murine leukemia virus (M-MLV) RT, human immunodeficiency virus (HIV) RT, avian sarcoma-leukosis virus (ASLV) RT, Rous sarcoma virus (RSV) RT, avian myeloblastosis virus (AMV) RT, avian erythroblastosis virus (AEV) helper virus MCAV RT, avian myelocytomatosis virus MC29 helper virus MCAV RT, avian reticuloendotheliosis virus (REV-T) helper virus REV-A RT, avian sarcoma virus UR2 helper virus UR2AV RT, avian sarcoma virus Y73 helper virus YAV RT, Rous-associated virus (RAV) RT, and myeloblastosis-associated virus (MAV) RT, and are described in U.S. Patent Application Publication No. 2003/0198944 (hereby incorporated by reference in its entirety). For reviews, see, e.g., Levin, 1997, Cell 88:5-8; Brosius et al., 1995, Virus Genes 11:163-79. Known reverse transcriptases from viruses require a primer to synthesize a DNA transcript from an RNA template. Reverse transcriptases have been used primarily to transcribe RNA into cDNA, which can then be cloned into a vector for further manipulation or used in various amplification methods, such as the polymerase chain reaction (PCR), nucleic acid sequence-based amplification (NASBA), transcription-mediated amplification (TMA), or self-sustained sequence replication (3SR).
As used herein, the term "sample" generally refers to a biological sample of a subject. The biological sample may comprise any number of macromolecules, such as cellular macromolecules. The sample may be a cell sample. The sample may be a cell line or a cell culture sample. The sample may comprise one or more cells. The sample may comprise one or more microorganisms. The biological sample may be a nucleic acid sample or a protein sample. The biological sample may also be a carbohydrate sample or a lipid sample. The biological sample may be derived from another sample. The sample may be a tissue sample, such as a biopsy sample, core needle biopsy sample, needle aspirate, or fine needle aspirate. The sample may be a fluid sample, such as a blood sample, a urine sample, or a saliva sample. The sample may be a skin sample. The sample may be a cheek swab. The sample may be a plasma or serum sample. The sample may be a cell-free sample. A cell-free sample may comprise extracellular polynucleotides. The extracellular polynucleotides may be isolated from a bodily sample, which may be selected from the group consisting of blood, plasma, serum, urine, saliva, mucosal secretions, sputum, stool, and tears.
As used herein, the term "subject" generally refers to an animal such as a mammal (e.g., a human) or an avian (e.g., a bird), or other organism such as a plant. For example, the subject can be a vertebrate, mammal, rodent (e.g., mouse), primate, ape, or human. Animals may include, but are not limited to, farm animals, sports animals, and pets. The subject may be a healthy or asymptomatic individual, an individual who has or is suspected of having a disease (e.g., cancer) or is susceptible to the disease, and/or an individual in need of treatment or suspected of being in need of treatment. The subject may be a patient. The subject may be a microorganism or microbe (e.g., bacteria, fungi, archaea, viruses).
As used herein, the term "sequencing" generally refers to methods and techniques for determining the sequence of nucleotide bases in one or more polynucleotides. The polynucleotides may be, for example, nucleic acid molecules such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), including variants or derivatives thereof (e.g., single-stranded DNA). Sequencing may be performed by various systems currently available, such as, but not limited to, sequencing systems produced by Pacific Biosciences (PacBio), Oxford Nanopore, or Life Technologies (Ion Torrent). Alternatively or in addition, sequencing may be performed using nucleic acid amplification, the polymerase chain reaction (PCR) (e.g., digital PCR, quantitative PCR, or real-time PCR), or isothermal amplification. Such systems can provide a plurality of raw genetic data corresponding to the genetic information of a subject (e.g., a human), as generated by the systems from a sample provided by the subject. In some examples, such systems provide sequencing reads (also referred to herein as "reads"). A read may include a sequence of nucleobases corresponding to the sequence of a nucleic acid molecule that has been sequenced. In some cases, the systems and methods provided herein may be used with proteomic information.
As used herein, the term "thermoreactive" refers to the ability of a reverse transcriptase to exhibit enzymatic activity at elevated temperatures.
As used herein, the term "thermostable" refers to the ability of a reverse transcriptase to withstand exposure to high temperatures without necessarily exhibiting activity at such temperatures. In some embodiments, a thermostable reverse transcriptase or polymerase refers to any enzyme that catalyzes polynucleotide synthesis by adding nucleotide units to a nucleotide chain using DNA or RNA as a template and has optimal activity at temperatures above 53 ℃.
As used herein, the terms "unique molecular identifier", "unique molecular identification sequence", "UMI" and "UMI sequence" are used synonymously. Individual barcoded molecules may include a common barcode sequence, such as a partition specific sequence or a spatial array, where each capture probe has a unique barcode sequence.
"binding sequence" refers to a nucleic acid sequence capable of binding to an analyte.
As used herein, the term "variant" refers to a protein derived from a precursor protein (such as the wt MMLV protein shown in SEQ ID NO: 15) by adding one or more amino acids at either or both of the C- and N-termini or at one or more positions in the amino acid sequence, substituting one or more amino acids at one or more different positions in the amino acid sequence, or deleting one or more amino acids at either or both ends of the protein or at one or more positions in the amino acid sequence. Unless otherwise indicated, SEQ ID NO: 1 is a variant of MMLV and is commonly used as a control enzyme. Enzyme variants are preferably prepared by modifying a DNA sequence encoding the wild-type protein, transforming the modified DNA sequence into a suitable host, and expressing it to form the derivative enzyme. It is well recognized that enzyme variants can also be prepared by modifying a DNA sequence encoding a variant of the wild-type protein, transforming the DNA sequence into a suitable host, and expressing the modified DNA sequence to form a derivative enzyme. Variant reverse transcriptases of the invention include proteins comprising an altered amino acid sequence compared to the amino acid sequence of the precursor enzyme, wherein the variant reverse transcriptase retains the characteristic enzymatic properties of the precursor enzyme but may have altered properties in some particular aspects. For example, an engineered reverse transcriptase variant may have an altered pH optimum or increased temperature stability but retain its characteristic reverse transcriptase activity.
When optimally aligned for comparison, a "variant" may have at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 88%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 99.5% sequence identity to a reference amino acid sequence. Unless otherwise indicated, variant residue positions are described with respect to, and amino acid positions are indexed to, the wild-type amino acid sequence shown in SEQ ID NO: 15.
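Substitutions such as M66L or L435G, named in the examples, use the common notation of wild-type residue, position, and substituted residue. The following Python sketch, assuming 1-based positions indexed to a reference sequence, parses this notation and applies the substitutions, rejecting any code whose wild-type residue does not match the reference. SEQ ID NO: 15 is not reproduced here, so the example uses a short hypothetical reference; the function names are illustrative only.

```python
import re

# One-letter codes for the 20 standard amino acids.
_AA = "ACDEFGHIKLMNPQRSTVWY"
_SUBSTITUTION = re.compile(rf"^([{_AA}])(\d+)([{_AA}])$")


def parse_substitution(code: str) -> tuple[str, int, str]:
    """Split a code such as "L435G" into ("L", 435, "G")."""
    match = _SUBSTITUTION.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a single-residue substitution: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitutions(reference: str, codes: list[str]) -> str:
    """Apply substitutions to a reference protein sequence (1-based positions).

    Each code's wild-type residue is checked against the reference, so a code
    indexed to a different numbering scheme is rejected, not silently applied.
    """
    ref = reference.upper()
    residues = list(ref)
    for code in codes:
        wild_type, position, substitute = parse_substitution(code)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{code}: position outside the reference sequence")
        if ref[position - 1] != wild_type:
            raise ValueError(f"{code}: reference has {ref[position - 1]} at {position}")
        residues[position - 1] = substitute
    return "".join(residues)


# Hypothetical 10-residue reference; this is not SEQ ID NO: 15.
print(parse_substitution("L435G"))
print(apply_substitutions("MKTAYIAKQR", ["K2R", "Q9E"]))
```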
A protein having a given percentage (e.g., at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%) of sequence identity with another sequence means that, when the two sequences are aligned, that percentage of bases or amino acid residues are the same in the comparison of the two sequences. Such alignment and percent homology or identity may be determined using any suitable software program known in the art, such as those described in CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (Ausubel et al., 1987, Supplement 30, Section 7.7.18). Representative programs include Vector NTI Advance™ 9.0 (Invitrogen Corp., Carlsbad, CA), GCG Pileup, FASTA (Pearson et al. (1988) Proc. Natl. Acad. Sci. USA 85:2444-2448), and BLAST (BLAST Manual, Altschul et al., Nat'l Cent. Biotechnol. Inf., Nat'l Lib. Med. (NCBI NLM NIH), Bethesda, Md.; and Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402). Another typical alignment program is ALIGN Plus (Scientific and Educational Software, PA), which is typically used with default parameters. Other useful sequence alignment software programs include the TFASTA Data Searching Program available in the Sequence Software Package Version 6.0 (Genetics Computer Group, University of Wisconsin, Madison, WI) and CLC Main Workbench version 20.0 (Qiagen).
As used herein, the term "wild-type" or "wt" refers to a gene or gene product that has the characteristics of that gene or gene product when isolated from a naturally occurring source. The amino acid sequence shown in SEQ ID NO: 15 is the wild-type Moloney murine leukemia virus (MMLV) RT sequence (GenBank NP_955591.1, p80 RT).
It should be understood that the following examples are for illustrative purposes only and are not intended to limit the scope of the claims. Each aspect, embodiment, or feature of the invention may be combined with any other aspect, embodiment, or feature of the invention, unless clearly indicated to the contrary. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
Examples
EXAMPLE 1: Capillary electrophoresis analysis of RT variants
Reverse transcription and sequencing reactions.
The reaction volume was 50 µl; each reaction contained a 5'-end-labeled GAPDH primer, GEM-U reagent (Chromium Single Cell 5' assay, 10X Genomics), an RNA template (GAPDH template), template switch oligonucleotide 1 (TSO1), and the indicated engineered reverse transcriptase. The stock and final concentrations of each component in the reaction are shown in Table 1. For single-turnover conditions, the reaction contained stoichiometrically equivalent amounts of enzyme and template. Reactions were incubated at 53 ℃ for 45 minutes and then diluted 1:20 in Hi-Di™ Formamide (Thermo Fisher). The formamide mixture was heated to 95 ℃ for 5 minutes and then cooled on ice for 2 minutes. Samples were loaded onto a SeqStudio™ Genetic Analyzer for capillary electrophoresis (Thermo Fisher), with the DS-33 matrix standard dye set G5 (Thermo Fisher) selected and the GeneScan™ 1200 LIZ™ size standard (Thermo Fisher) used for long-fragment analysis. When the contents of the Z1 and Z2 channels are mixed, the GEM-U reagent approximates the formulation of the actual reagent mixture in the Chromium Single Cell 5' GEM assay.
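Table 1 lists the stock and final concentrations, which are not reproduced here. The pipetting volume for each component of a 50 µl reaction follows from C1V1 = C2V2. This Python sketch uses hypothetical concentrations, not the Table 1 values, and assumes the remaining volume is made up with water; the enzyme entry is set equimolar to the template to mirror the single-turnover condition described above. Function names are illustrative.

```python
def stock_volume_ul(stock_conc: float, final_conc: float, total_volume_ul: float) -> float:
    """Solve C1*V1 = C2*V2 for V1 (stock and final concentrations in the same units)."""
    if stock_conc <= 0 or final_conc < 0:
        raise ValueError("stock must be positive and final concentration non-negative")
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_conc * total_volume_ul / stock_conc


def plan_reaction(components: dict[str, tuple[float, float]],
                  total_volume_ul: float = 50.0) -> dict[str, float]:
    """Map each component's (stock_conc, final_conc) to a volume in µl and fill
    the remainder with water (an assumption of this sketch)."""
    volumes = {
        name: stock_volume_ul(stock, final, total_volume_ul)
        for name, (stock, final) in components.items()
    }
    water = total_volume_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("component volumes exceed the reaction volume")
    volumes["water"] = water
    return volumes


# Hypothetical (stock, final) concentrations in µM; not the Table 1 values.
plan = plan_reaction({
    "labeled primer": (10.0, 0.2),
    "RNA template": (1.0, 0.1),
    "TSO": (100.0, 1.0),
    "enzyme": (2.0, 0.1),  # equimolar with template, as for single turnover
})
for name, volume in plan.items():
    print(f"{name}: {volume:.2f} µl")
```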
TABLE 1: Reagents for the capillary electrophoresis assay, and template, primer, and TSO sequences (SEQ ID NOS: 18-20, respectively, in order of appearance)
Experimental results.
FIGS. 6A-6B provide exemplary results demonstrating the transcription efficiency (FIG. 6A) and template switching efficiency (FIG. 6B) of eight different engineered MMLV RT variants compared to a control MMLV RT comprising the amino acid sequence of SEQ ID NO: 1. All of the variants shown in the bar graphs exhibit higher efficiency than the control, to varying degrees. The amino acid sequences of variants 1, 2, 3, 4, 5, 6, and 8 are shown in SEQ ID NOS: 8, 9, 10, 11, 12, 13, and 14, respectively. The template switching efficiency of the variants having the amino acid sequences shown in SEQ ID NOS: 8, 9, 10, 11, 12, 13, and 14 is higher than that of the control (SEQ ID NO: 1). The amount of full-length product (an indicator of transcription efficiency) obtained from the variants having the amino acid sequences shown in SEQ ID NOS: 8, 9, 10, 12, 13, and 14 is also higher than that obtained with the control (SEQ ID NO: 1).
FIG. 7 provides additional exemplary results showing the transcription efficiency (left bar of each group; dark gray) and template switching efficiency (right bar of each group; light gray) of further engineered MMLV RT variants compared with the control MMLV RT enzyme of SEQ ID NO: 1. The variants tested include those comprising the amino acid sequences of SEQ ID NOS: 2, 5, 4, 6, and 7. All MMLV RT variants except AB and AM exhibited a transcription efficiency reaching or exceeding about 40% of that of the control MMLV RT of SEQ ID NO: 1. Overall, all MMLV RT variants except AM showed higher transcription efficiency than the control MMLV RT of SEQ ID NO: 1. MMLV variant AB and the variants of SEQ ID NO: 2, SEQ ID NO: 6, and SEQ ID NO: 7 exhibited a template switching efficiency above 70% of that of the control MMLV RT of SEQ ID NO: 1. The variants of SEQ ID NO: 2, SEQ ID NO: 6, and SEQ ID NO: 7 showed increases in both transcription efficiency and template switching efficiency compared with the control of SEQ ID NO: 1.
FIG. 8 shows that additional MMLV variants (SEQ ID NOS: 2, 3, 4, 5, 7, 21, 22, 23, and 24) exhibit similar levels of full-length product formation, an indicator of transcription efficiency. However, SEQ ID NO: 24 and SEQ ID NO: 2 show increased transcription efficiency over the control of SEQ ID NO: 1. Variants comprising the L435G or M66L mutation (positions numbered according to wild-type MMLV RT, SEQ ID NO: 15) showed improved template switching efficiency and target product formation. When these mutations were combined, the improvement increased slightly. The M39V mutation appears to improve template switching (variants having the amino acid sequences of SEQ ID NO: 4 and SEQ ID NO: 5) but adds little improvement when combined with M66L; compare the results obtained with the variant pairs SEQ ID NO: 21 and SEQ ID NO: 3, SEQ ID NO: 2 and SEQ ID NO: 7, and SEQ ID NO: 22 and SEQ ID NO: 23. Variants having one or more of the mutations P448A, D449G, H503V, and H634Y appear to be neutral in this context.
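Mutations in this document use single-letter substitution notation (for example, M66L: the methionine at position 66 is replaced by leucine), with positions numbered on wild-type MMLV RT (SEQ ID NO: 15). The following is a minimal sketch of parsing that notation and applying a mutation combination to a reference sequence while checking each wild-type residue. The function names are hypothetical, and the toy reference stands in for SEQ ID NO: 15, whose sequence is not reproduced here.

```python
import re

# One-letter residue, 1-based position, one-letter replacement (e.g. "M66L").
_POINT = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def parse_mutation(token):
    """Split a substitution such as 'L435G' into ('L', 435, 'G')."""
    match = _POINT.match(token.strip())
    if match is None:
        raise ValueError(f"not a point substitution: {token!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_mutations(reference, tokens):
    """Apply substitutions to a reference protein sequence.

    Each wild-type residue is checked against the reference, and two
    substitutions at the same position (e.g. D200N with D200E) are rejected.
    """
    residues = list(reference)
    seen = set()
    for token in tokens:
        wt, pos, new = parse_mutation(token)
        if not 1 <= pos <= len(reference):
            raise ValueError(f"{token}: position outside the reference")
        if reference[pos - 1] != wt:
            raise ValueError(f"{token}: reference has {reference[pos - 1]} at {pos}")
        if pos in seen:
            raise ValueError(f"{token}: position {pos} already mutated")
        seen.add(pos)
        residues[pos - 1] = new
    return "".join(residues)


toy_reference = "MAAMAL"  # hypothetical stand-in, not SEQ ID NO: 15
variant = apply_mutations(toy_reference, ["M1V", "M4L", "L6G"])
```

Checking the wild-type residue catches numbering offsets, for example when a sequence with an N-terminal tag is mistaken for the wild-type numbering frame.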
EXAMPLE 2 Single-cell 3' and 5' cDNA yield
Various engineered reverse transcriptases were evaluated in single-cell experiments on peripheral blood mononuclear cells (PBMCs), with a load of 1,000 cells, using the 3' and 5' configurations (Chromium Single Cell 3' assay or Chromium Single Cell 5' assay, 10X Genomics). Emulsion droplets contain gel beads carrying either a barcoded poly(dT) primer sequence (3' configuration) or a barcoded template switching oligonucleotide sequence (5' configuration), either of which also includes a UMI and the Illumina Read 1 sequence. When a cell is lysed within a droplet, the poly(dT) primer hybridizes to the poly(A) tail of the cellular mRNA and is extended by the reverse transcriptase. On reaching the end of the template, the reverse transcriptase uses its terminal transferase activity to add an overhang of three non-templated deoxycytidines (CCC) to the 3' end of the synthesized cDNA. The CCC overhang hybridizes to the three riboguanosines (rGrGrG) at the 3' end of the template switch oligonucleotide, allowing the reverse transcriptase to "switch" templates and continue synthesis to the 5' end of the template switch oligonucleotide. Depending on the gel bead configuration used (3' or 5'), the barcode and UMI allow identification of the 3' or 5' end of the mRNA molecule in the final sequencing library. After reverse transcription at 53 °C for 45 min and incubation at 85 °C for 5 min, the droplets were broken and the cDNA was purified. The cDNA was amplified by PCR, purified with 0.6x SPRI, and quantified on an Agilent Bioanalyzer using a High Sensitivity DNA kit, and the cDNA yield (ng) was determined.
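The sequence logic of template switching described above can be traced in a few lines. The following is a minimal sketch, assuming DNA-alphabet strings written 5'→3', a TSO whose 3' end is three riboguanosines (written as G), exactly three non-templated C residues, and perfect pairing. `primer_handle` is a hypothetical stand-in for the bead-derived barcode, UMI, and Read 1 sequence, and all sequences in the example are hypothetical. The sketch shows that the completed first strand is the reverse complement of the mRNA followed by the reverse complement of the TSO.

```python
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def revcomp(seq):
    """Reverse complement; U is treated like T."""
    return seq.translate(_COMPLEMENT)[::-1]


def first_strand_with_switch(mrna, tso, primer_handle="", n_c=3):
    """Model first-strand cDNA (5'->3') from poly(dT) priming through template switching.

    mrna: sense strand 5'->3', including the poly(A) tail.
    tso: template switch oligo 5'->3', ending in n_c (r)G residues.
    primer_handle: sequence 5' of the poly(dT) stretch on the primer.
    """
    if not tso.upper().endswith("G" * n_c):
        raise ValueError("TSO must end in the G run that pairs with the C overhang")
    cdna = primer_handle + revcomp(mrna)  # poly(dT)-primed copy of the whole mRNA
    cdna += "C" * n_c                     # non-templated dC overhang
    cdna += revcomp(tso[:-n_c])           # after switching, copy the rest of the TSO
    return cdna


mrna = "GATTACAAAAA"  # hypothetical transcript ending in a short poly(A) tail
tso = "AAGCGGG"       # hypothetical TSO ending in rGrGrG
cdna = first_strand_with_switch(mrna, tso)
```

Because the CCC overhang is the complement of the TSO's terminal GGG, the result equals `revcomp(mrna) + revcomp(tso)`, which is why sequence placed on the TSO (such as a barcode in the 5' configuration) ends up on the 3' end of the cDNA.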
FIG. 9 summarizes cDNA yields from a series of experiments with engineered reverse transcriptases having the amino acid sequences of SEQ ID NOS: 1 (control), 22, 24, 2, 3, and 7 (n = 2). For each enzyme, results from the 3' configuration are shown as the left bar and results from the 5' configuration as the right bar. In the 3' experiments, the cDNA yields of variants carrying the M66L mutation (SEQ ID NOS: 2, 3, 7, and 22) and/or the M39V mutation (SEQ ID NOS: 3 and 7) exceeded that of the control of SEQ ID NO: 1 (SEQ ID NO: 24 carries no mutation at M39 or M66). These results are consistent with the total product yields measured with the GAPDH mRNA template. Surprisingly, cDNA yields in the single-cell 5' configuration differed from those expected from the total product yields measured with the GAPDH mRNA template. For example, as shown in FIG. 9, all variants, regardless of mutation status, exceeded the cDNA yield of the control of SEQ ID NO: 1.
EXAMPLE 3 Single-cell 3' quality metrics
In single-cell experiments using peripheral blood mononuclear cells (PBMCs), various variant reverse transcriptases were evaluated under 3' and 5' reaction conditions. Amplified cDNA (3' condition) or 20 µl containing up to 50 ng of amplified cDNA (5' condition) was fragmented and A-tailed, purified by double-sided SPRI (0.6x/0.8x), ligated to adaptors carrying the Illumina Read 2 sequence, purified with 0.8x SPRI, and further amplified with sample index primers that include the P5 and P7 priming sites and the i5 and i7 sample indices. The amplified products were purified by double-sided SPRI (0.6x/0.8x), and their average size was determined on an Agilent Bioanalyzer using a High Sensitivity DNA kit. The purified products were quantified by qPCR and pooled for next-generation sequencing on an Illumina NovaSeq™, targeting a depth of at least 50,000 reads per cell, with the following run parameters: Read 1, 28 cycles; i7 index, 10 cycles; i5 index, 10 cycles; Read 2, 90 cycles. Data were collected, demultiplexed, and processed, and standard quality metrics were obtained.
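The double-sided SPRI (0.6x/0.8x) steps above are a two-cut size selection. The following is a minimal sketch of the bead volumes involved, assuming both ratios are expressed relative to the original sample volume (a common convention) and ignoring the small volume lost when the supernatant is transferred. The function name is hypothetical.

```python
def double_sided_spri(sample_ul, first_ratio=0.6, second_ratio=0.8):
    """Bead volumes (µl) for a double-sided SPRI size selection.

    Step 1: add first_ratio * sample of beads; large fragments bind, so the
            beads are discarded and the supernatant is kept.
    Step 2: add beads to the supernatant to raise the cumulative ratio to
            second_ratio; the fragments of interest now bind and are eluted.
    """
    if sample_ul <= 0 or not 0 < first_ratio < second_ratio:
        raise ValueError("need a positive volume and 0 < first_ratio < second_ratio")
    first_ul = first_ratio * sample_ul
    second_ul = (second_ratio - first_ratio) * sample_ul
    return first_ul, second_ul


first_ul, second_ul = double_sided_spri(50.0)  # about 30 µl, then about 10 µl
```

The same function covers the 0.5x/0.8x selection used in the V(D)J workflow of Example 5, via `double_sided_spri(volume, 0.5, 0.8)`.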
Single-cell 5' reactions use less enzyme and TSO than single-cell 3' reactions. The 5' TSO is also twice as long as the 3' TSO and has a different sequence context owing to the presence of the UMI and barcode. Single-cell 5' reaction conditions are generally considered a more stringent performance test than single-cell 3' reaction conditions. Results from this series of experiments under 3' reaction conditions are summarized in FIGS. 10 and 11, and results under 5' reaction conditions are summarized in FIGS. 12 and 13.
As shown in FIG. 10, under 3' reaction conditions all variants with the M66L mutation showed improved sensitivity at 50k reads per cell, although the extent of improvement depended on context. This trend correlates well with the capillary electrophoresis data, in which the engineered variant reverse transcriptase of SEQ ID NO: 24 performed poorly relative to the other variants. Surprisingly, only the variant of SEQ ID NO: 2 showed a significant improvement at 20k reads per cell. The variant reverse transcriptase of SEQ ID NO: 2 lacks the M39V mutation present in SEQ ID NO: 3 and SEQ ID NO: 7. Although the M39V mutation improved in vitro template switching efficiency, it does not appear to provide significant additional benefit when combined with M66L. Furthermore, the engineered variant reverse transcriptase of SEQ ID NO: 2 lacks the P448A and D449G mutations present in SEQ ID NOS: 1, 22, and 7. Surprisingly, SEQ ID NOS: 22 and 7 have similar sensitivities; in this context, the P448A and D449G mutations do not appear to alter sensitivity. Unexpectedly, engineered reverse transcriptases with the M66L, P448A, D449G, and/or M39V changes showed a loss of reads mapped to the transcriptome, with the exception of the engineered reverse transcriptase of SEQ ID NO: 2.
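Comparing sensitivity at 20k or 50k reads per cell requires normalizing libraries to a common depth. The following is a minimal sketch, assuming sensitivity is summarized as the median number of UMIs and genes per cell after randomly subsampling reads to a fixed mean depth. It illustrates read downsampling in general and is not the 10x Genomics analysis pipeline. The input layout (a list of (barcode, UMI, gene) tuples for transcriptome-mapped reads) and the function name are hypothetical.

```python
import random
from statistics import median


def sensitivity_at_depth(reads, cell_barcodes, reads_per_cell, seed=0):
    """Median UMIs and genes per cell after downsampling to a mean depth.

    reads: list of (barcode, umi, gene) tuples, one per transcriptome-mapped read.
    cell_barcodes: barcodes called as cells; the depth target is
        len(cell_barcodes) * reads_per_cell reads drawn from all reads.
    """
    target = len(cell_barcodes) * reads_per_cell
    if target > len(reads):
        raise ValueError("library is shallower than the requested depth")
    subsample = random.Random(seed).sample(reads, target)

    umis = {bc: set() for bc in cell_barcodes}
    genes = {bc: set() for bc in cell_barcodes}
    for barcode, umi, gene in subsample:
        if barcode in umis:  # reads from non-cell barcodes count toward depth only
            umis[barcode].add((umi, gene))
            genes[barcode].add(gene)

    median_umis = median(len(s) for s in umis.values())
    median_genes = median(len(s) for s in genes.values())
    return median_umis, median_genes
```

Fixing the random seed makes repeated comparisons between enzymes reproducible; averaging over several seeds reduces subsampling noise.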
FIG. 11 shows that, under 3' reaction conditions, most variants yield equivalent metrics for valid UMIs, valid barcodes, ribosomal UMIs, mitochondrial UMIs, transcript coverage, reads with any poly(A) sequence, reads with any switch oligonucleotide sequence, and reads with primer or homopolymer sequences. However, libraries generated by some variants carrying the M66L mutation in combination with P448A, D449G, and/or M39V had fewer reads mapped to the transcriptome. Surprisingly, the variant of SEQ ID NO: 2, which comprises M66L, showed improved template switching efficiency while maintaining a level of transcriptome-mapped reads similar to that of the control RT of SEQ ID NO: 1.
FIG. 12 shows that, under 5' reaction conditions, the engineered reverse transcriptase variant having the amino acid sequence of SEQ ID NO: 2 shows a significant improvement in sensitivity. Engineered reverse transcriptases with M66L, P448A, D449G, and/or M39V substitutions showed a loss of reads mapped to the transcriptome.
FIG. 13 shows that, under 5' reaction conditions, most variants yield equivalent metrics for valid UMIs, valid barcodes, ribosomal UMIs, mitochondrial UMIs, transcript coverage, reads with any poly(A) sequence, reads with any switch oligonucleotide sequence, and reads with primer or homopolymer sequences. However, libraries generated by most variants carrying the M66L mutation in combination with P448A, D449G, and/or M39V had fewer reads mapped to the transcriptome. Surprisingly, the M66L-containing variant having the amino acid sequence of SEQ ID NO: 2 showed improved template switching efficiency, and its level of transcriptome-mapped reads was less affected than with the other engineered reverse transcriptases.
EXAMPLE 4 Single-cell sensitivity and mapping
Various engineered reverse transcriptases (SEQ ID NOS: 2, 7, 24, and 25) were evaluated under the 3' and 5' reaction conditions described above in single-cell experiments with human peripheral blood mononuclear cells (PBMCs) and mouse (C57BL/6) peripheral blood mononuclear cells. Sensitivity and mapping were evaluated, and the results for the engineered reverse transcriptases were compared with those for a commercially available engineered MMLV reverse transcriptase. The results of this series of experiments are summarized in FIG. 14.
In FIGS. 14A-14B, the engineered reverse transcriptase variants were evaluated with both 5' and 3' chemistry using human and mouse peripheral blood mononuclear cells. Percent changes are relative to the commercially available MMLV reverse transcriptase used as the control RT. Changes in median genes and median UMIs per cell at 20k reads per cell are shown in FIG. 14A, together with changes in reads mapped to the transcriptome and reads mapped to exons (FIG. 14B). The amino acid sequences of the engineered reverse transcriptases are shown in SEQ ID NO: 2, SEQ ID NO: 7, SEQ ID NO: 24, and SEQ ID NO: 25. As shown in FIGS. 14A-14B, the improvement with both 5' and 3' chemistry was more pronounced in mouse PBMCs than in human PBMCs. Notably, the engineered reverse transcriptase having the amino acid sequence of SEQ ID NO: 2 improved gene expression (GEX) sensitivity. However, reads obtained with SEQ ID NO: 2 that mapped to the transcriptome or to exons were reduced compared with the control.
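FIGS. 14A-14B report each metric as a percent change relative to the control RT. The following is a minimal sketch of that arithmetic, assuming per-cell values (for example, genes or UMIs per cell at a fixed depth) are summarized by their median before comparison. The function names are hypothetical.

```python
from statistics import median


def percent_change(variant_value, control_value):
    """Percent change of a variant metric relative to the control."""
    if control_value == 0:
        raise ValueError("control value is zero")
    return 100.0 * (variant_value - control_value) / control_value


def median_percent_change(variant_per_cell, control_per_cell):
    """Percent change between the per-cell medians of a variant and the control."""
    return percent_change(median(variant_per_cell), median(control_per_cell))
```

A negative value, such as for the reduced transcriptome-mapped reads noted for SEQ ID NO: 2, indicates a decrease relative to the control.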
In addition, t-distributed stochastic neighbor embedding (t-SNE) plots and scatter plots were used to assess the homogeneity of the cell populations obtained with the engineered reverse transcriptase variants having the amino acid sequences of SEQ ID NO: 2 and SEQ ID NO: 7, compared with SEQ ID NO: 1 (control). The results of the t-SNE analysis and scatter plots are shown in FIGS. 15A-15C.
As the scatter plots for each variant show (FIGS. 15A-15B), the engineered reverse transcriptase having the amino acid sequence of SEQ ID NO: 2 shows a close correlation in both human and mouse samples. At least in human PBMC samples, the variant of SEQ ID NO: 2 may show a better correlation than SEQ ID NO: 7 (FIG. 15A). The engineered reverse transcriptase having the amino acid sequence of SEQ ID NO: 7 shows a tighter correlation with both 5' and 3' chemistry in mouse cells than in human cells (3' data not shown). As shown in FIG. 15C, overlaid t-SNE plots show that the engineered reverse transcriptases having the amino acid sequences of SEQ ID NO: 2 and SEQ ID NO: 1 (control) yield homogeneous cell populations, compared with the engineered reverse transcriptase having the amino acid sequence of SEQ ID NO: 7.
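The scatter plots compare expression measured with two enzymes. The following is a minimal sketch of one common way to quantify such agreement. It assumes each gene is summarized as log(1 + mean UMI count per cell) for each enzyme and that the two profiles are compared with a Pearson correlation. The figures do not state which summary was plotted, so this choice, the function names, and the input layout are all assumptions.

```python
import math


def log_mean_expression(cells, genes):
    """log1p of the mean UMI count per cell for each gene (missing genes count as 0)."""
    n = len(cells)
    if n == 0:
        raise ValueError("no cells")
    return [math.log1p(sum(cell.get(g, 0) for cell in cells) / n) for g in genes]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant profile; correlation undefined")
    return sxy / math.sqrt(sxx * syy)


def expression_correlation(cells_a, cells_b):
    """Correlate pseudo-bulk profiles from two enzymes over the union of genes.

    cells_a / cells_b: lists of {gene: UMI count} dicts, one per cell.
    """
    genes = sorted({g for cell in cells_a + cells_b for g in cell})
    return pearson(log_mean_expression(cells_a, genes),
                   log_mean_expression(cells_b, genes))
```

The log transform keeps a few highly expressed genes from dominating the correlation; rank-based (Spearman) correlation is a reasonable alternative.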
EXAMPLE 5 Immune profiling and TCR improvement
Immune profiling is an extension of 5' chemistry for analyzing specific T-cell and/or B-cell receptor genes in mRNA libraries. Immune profiling methods are known in the art and typically involve additional rounds of PCR on the cDNA with pools of sequence-specific primers to achieve targeted enrichment of T-cell and/or B-cell receptor genes. Immune profiling can also detect UMIs of the B-cell receptor genes IGH (immunoglobulin heavy chain), IGK (kappa light chain), and IGL (lambda light chain). Immune profiling data inform immunological studies and extend standard gene expression analysis. Immune profiling methods include, but are not limited to, the Chromium Next Gen Single Cell™ Kit (10X Genomics, Pleasanton, CA).
To determine the efficiency of the novel MMLV RT variants disclosed herein (Table 2) in single-cell V(D)J assays, amplified cDNA (2 µl) from the 5'-configuration reverse transcription reactions was subjected to two additional rounds of PCR enrichment for TCR immune profiling, with double-sided (0.5x/0.8x) SPRI purification between the first and second rounds of thermocycling. The amplified product was then purified by a further double-sided (0.5x/0.8x) SPRI, fragmented and A-tailed, ligated to adaptors carrying the Illumina Read 2 sequence, purified with 0.8x SPRI, and further amplified with sample index primers that include the P5 and P7 priming sites and the i5 and i7 sample indices. The amplified products were purified with 0.8x SPRI, and their average size was determined on an Agilent Bioanalyzer using a High Sensitivity DNA kit. The material was then quantified by qPCR and pooled for next-generation sequencing on an Illumina NovaSeq, targeting a depth of at least 5,000 reads per cell, with the following run parameters: Read 1, 28 cycles; i7 index, 10 cycles; i5 index, 10 cycles; Read 2, 90 cycles. Data were collected, demultiplexed, and subjected to single-cell V(D)J analysis.
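Examples 3 and 5 target at least 50,000 reads per cell for gene expression libraries and at least 5,000 reads per cell for V(D)J libraries. The following is a minimal sketch of converting such targets into read requirements and pooling fractions. It assumes libraries are pooled in proportion to the reads each needs and cluster equally well. The cell counts in the example are hypothetical, and `read_budget` is an illustrative name.

```python
def read_budget(libraries):
    """Reads required per library and each library's share of the sequencing run.

    libraries: {name: (n_cells, target_reads_per_cell)}.
    Returns {name: (reads_required, fraction_of_run)}.
    """
    required = {name: n_cells * per_cell for name, (n_cells, per_cell) in libraries.items()}
    total = sum(required.values())
    if total <= 0:
        raise ValueError("no reads requested")
    return {name: (reads, reads / total) for name, reads in required.items()}


# Hypothetical pool: one gene expression and one V(D)J library from 1,000 cells each.
budget = read_budget({"GEX": (1000, 50_000), "VDJ": (1000, 5_000)})
```

In this hypothetical pool, the V(D)J library needs about one eleventh of the run. In practice, recovered cell counts come from the data rather than from the number of cells loaded.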
The results obtained with the engineered reverse transcriptases were compared with those obtained with the control of SEQ ID NO: 1. The percent changes in median TRA UMIs and median TRB UMIs are shown in FIG. 16, which also shows the percent changes in median IGH, IGK, and IGL UMIs for mouse PBMCs. In both human and mouse PBMCs, the median TRA UMIs and median TRB UMIs obtained with the engineered reverse transcriptase having the amino acid sequence of SEQ ID NO: 2 were greater than those obtained with SEQ ID NO: 1. Engineered reverse transcriptase variants previously shown to exhibit IG sensitivity exhibited equivalent or improved IG sensitivity (compared with previous ATP results). In mouse PBMCs, the median IGH, IGK, and IGL UMIs obtained with the enzymes having the amino acid sequences of SEQ ID NO: 2, SEQ ID NO: 25, or SEQ ID NO: 24 were greater than those obtained with SEQ ID NO: 1 (right panel). The results obtained with the engineered reverse transcriptase having the amino acid sequence of SEQ ID NO: 2 were significantly higher than those obtained with the engineered reverse transcriptases having the amino acid sequences of SEQ ID NO: 25 or SEQ ID NO: 24. The improvement seen with mouse PBMCs was similar to that observed for gene expression (GEX; FIG. 14).
Incorporated by reference
All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference in their entirety to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

Claims (32)

1. An engineered reverse transcriptase comprising the amino acid sequence of SEQ ID No. 15 and further comprising a combination of mutations selected from the group consisting of:
(a) E69K, L139P, E302R, T306K, W313F, T330P, and N454K; and one or more of M39V, P47L, M66L, F155Y, D200N, D200E, H204R, G429S, L435G, L435K, P448A, D449G, H503V, D524N, T542D, E545G, D583N, H594Q, L603W, L603F, E607K, E607G, P627S, H634Y, H638G, A644V, D653H, K658R, and L671P; or
(b) E69K, L139P, D200N, E302R, T306K, W313F, T330P, L435G, P448A, D449G, N454K, D524N, L603W, and E607K; and one or more of M39V, P47L, M66L, F155Y, H429Y, H503Y, H542 545Y, H583 594Y, H627Y, H634Y, H638Y, H653Y, H658R, and L671P.
2. The engineered reverse transcriptase of claim 1, wherein said engineered reverse transcriptase comprises an amino acid sequence that is at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, and SEQ ID NO: 37.
3. The engineered reverse transcriptase of claim 1 or claim 2, wherein said engineered reverse transcriptase exhibits enhanced reverse transcriptase activity compared to a reverse transcriptase having the amino acid sequence shown in SEQ ID No. 1.
4. The engineered reverse transcriptase of claim 3, wherein said enhanced reverse transcriptase activity is selected from the group consisting of processivity, template switching efficiency, binding affinity, and transcription efficiency.
5. The engineered reverse transcriptase of claim 3 or claim 4, wherein said enhanced reverse transcriptase activity is enhanced Template Switching (TS) efficiency compared to said template switching efficiency of a reverse transcriptase having said amino acid sequence shown in SEQ ID No. 1.
6. The engineered reverse transcriptase of any one of claims 3-5, wherein said enhanced reverse transcriptase activity is enhanced transcription efficiency compared to said transcription efficiency of a reverse transcriptase having said amino acid sequence shown in SEQ ID No. 1.
7. The engineered reverse transcriptase of any one of claims 3-6, wherein said enhanced reverse transcriptase activity is said enhanced transcription efficiency and template switching efficiency compared to a reverse transcriptase having said amino acid sequence shown in SEQ ID No. 1.
8. The engineered reverse transcriptase of any one of claims 3-7, wherein said enhanced reverse transcriptase activity is said increased binding affinity compared to said binding affinity of a reverse transcriptase having said amino acid sequence shown in SEQ ID No. 1.
9. The engineered reverse transcriptase of any one of claims 3-8, wherein said enhanced reverse transcriptase activity is increased binding affinity and template switching efficiency compared to a reverse transcriptase having said amino acid sequence shown in SEQ ID No. 1.
10. The engineered reverse transcriptase of any one of claims 3 to 9, wherein said enhanced reverse transcriptase activity is enhanced processivity compared to said processivity of a reverse transcriptase having said amino acid sequence shown in SEQ ID No. 1.
11. The engineered reverse transcriptase of any one of claims 3-10, wherein said enhanced reverse transcriptase activity is enhanced ability to produce mitochondrial UMI counts compared to a reverse transcriptase having said amino acid sequence shown in SEQ ID No. 1.
12. The engineered reverse transcriptase of any one of claims 3 to 11, wherein said enhanced reverse transcriptase activity is enhanced ability to produce ribosomal UMI counts compared to a reverse transcriptase having said amino acid sequence set forth in SEQ ID No. 1.
13. The engineered reverse transcriptase of any one of claims 1 to 12, wherein the amino acid sequence of said engineered reverse transcriptase comprises E69K, L139P, D200N, E302R, T306K, W313F, T330P, N454K, H503V, D524N, L603W, E607K, and H634Y.
14. The engineered reverse transcriptase of claim 13, wherein the amino acid sequence of said engineered reverse transcriptase further comprises a combination of mutations selected from the group consisting of:
(a) M66L and L435G;
(b) M39V, M66L and L435K;
(c) M39V and L435K;
(d) M66L, L435G, P448A and D449G;
(e) M39V, M66L, L435G, P448A and D449G; and
(f) M66L.
15. The engineered reverse transcriptase of any one of claims 1-14, wherein the amino acid sequence of said engineered reverse transcriptase comprises E69K, L139P, D200N, E302R, T306K, W313F, T330P, L435G, P448A, D449G, N454K, D524N, L603W, and E607K; and further comprises a combination of mutations selected from the group consisting of:
(a)M66L;
(b) M66L and H503V;
(c) M66L and H634Y; and
(d) M66L, H503V and H634Y.
16. The engineered reverse transcriptase of any one of claims 1-15, wherein the amino acid sequence of said engineered reverse transcriptase comprises M39V, E69K, L139P, a D200 mutation, E302R, T306K, W313F, T330P, G429S, P448A, a D449 mutation, L435K, N454K, an L603 mutation, an E607 mutation, and L671P, and further comprises a second combination of mutations selected from the group consisting of:
(a) D524N, T542D, P627S, A644V, D653H, K658R mutation, and wherein said D200 mutation is a D200N mutation, said D449 mutation is D449G, said L603 mutation is L603W, and said E607 mutation is an E607G mutation;
(b) D524N, T542D, A644V, D653H, R H and K658R, and wherein said D200 mutation is a D200N mutation, said D449 mutation is a D449E mutation, said L603 mutation is an L603W mutation, and said E607 mutation is an E607G mutation;
(c) E545G, D583N and H594Q, and wherein the D200 mutation is a D200N mutation, the D449 mutation is a D449G mutation, the L603 mutation is an L603F mutation, and the E607 mutation is an E607K mutation;
(d) D524N, T542D, A644V, D653H and K658R, wherein said D200 mutation is a D200N mutation, said D449 mutation is a D449E mutation, said L603 mutation is an L603W mutation, and said E607 mutation is an E607G mutation;
(e) H204R, D524N, T542D, P627S, D583N, A644V, D653H and K658R, wherein said D200 mutation is a D200E mutation, said D449 mutation is a D449G mutation, said L603 mutation is an L603W mutation, said E607 mutation is an E607G mutation;
(f) H204R, E545G, D583N and H594Q, wherein said D200 mutation is a D200E mutation, said D449 mutation is a D449G mutation, said L603 mutation is an L603F mutation, and said E607 mutation is an E607K mutation; and
(g) P47L, D524N, T542D, D583N, P627S, A644V, D653H and K658R, wherein said D200 mutation is a D200N mutation, said D449 mutation is a D449G mutation, said L603 mutation is an L603W mutation, and said E607 mutation is an E607G mutation.
17. An engineered reverse transcriptase comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, and SEQ ID NO: 37.
18. The engineered reverse transcriptase of claim 17, wherein said engineered reverse transcriptase exhibits enhanced reverse transcriptase activity compared to a reverse transcriptase having the amino acid sequence shown in SEQ ID No. 1.
19. The engineered reverse transcriptase of claim 18, wherein said enhanced reverse transcriptase activity is selected from the group consisting of reverse transcriptase-related activities comprising processivity, template switching efficiency, binding affinity, and transcription efficiency.
20. An engineered reverse transcriptase comprising the amino acid sequence of SEQ ID No. 15 and further comprising a combination of mutations selected from the group consisting of: T542D, D583N, E607G, A644V, D653H, K658R, E545G, D583N, H594Q and L603F.
21. The engineered reverse transcriptase of claim 1 or claim 20, wherein said engineered reverse transcriptase comprises:
(a) An amino acid sequence that is at least about 90%, at least about 92%, at least about 95%, at least about 97%, at least about 98%, or at least about 99% identical to an amino acid sequence selected from the group consisting of SEQ ID NO. 8, SEQ ID NO. 9, SEQ ID NO. 10, SEQ ID NO. 11, SEQ ID NO. 12, SEQ ID NO. 13, and SEQ ID NO. 14; or
(b) Amino acid sequence selected from the group consisting of SEQ ID NO. 8, SEQ ID NO. 9, SEQ ID NO. 10, SEQ ID NO. 11, SEQ ID NO. 12, SEQ ID NO. 13 and SEQ ID NO. 14.
22. The engineered reverse transcriptase of claim 20 or claim 21, wherein said engineered reverse transcriptase exhibits enhanced reverse transcriptase activity compared to a reverse transcriptase having the amino acid sequence shown in SEQ ID No. 1.
23. The engineered reverse transcriptase of claim 17, wherein said enhanced reverse transcriptase activity is selected from the group consisting of reverse transcriptase-related activities comprising RNase H activity, processivity, template switching efficiency, binding affinity, and transcription efficiency.
24. An engineered reverse transcriptase comprising an amino acid sequence that is at least 90% identical to the amino acid sequence of SEQ ID No. 2.
25. An engineered reverse transcriptase comprising an amino acid sequence that is at least 90% identical to the amino acid sequence of SEQ ID No. 24.
26. The engineered reverse transcriptase of any one of claims 1 to 25, wherein said engineered reverse transcriptase has one or more of the following characteristics when compared to a wild-type reverse transcriptase or a reverse transcriptase comprising the amino acid sequence of SEQ ID NO: 1:
(a) increased thermal stability;
(b) increased thermal reactivity;
(c) increased resistance to reverse transcriptase inhibitors;
(d) increased ability to reverse transcribe difficult templates;
(e) increased speed;
(f) increased processivity;
(g) increased specificity;
(h) enhanced polymerization activity; or
(i) increased sensitivity.
27. The engineered reverse transcriptase of claim 26, wherein:
(a) An increase in thermal reactivity, resistance to reverse transcriptase inhibitors, ability to reverse transcribe difficult templates, speed, processivity, specificity, or sensitivity of about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, or about 100% compared to a wild-type reverse transcriptase or a reverse transcriptase comprising the amino acid sequence of SEQ ID NO: 1; or
(b) The polymerization activity is enhanced by about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, or about 100% as compared to a wild-type reverse transcriptase or reverse transcriptase comprising the amino acid of SEQ ID NO. 1.
28. An isolated nucleic acid molecule encoding the engineered reverse transcriptase of any one of claims 1-27.
29. An expression vector comprising the isolated nucleic acid of claim 28.
30. A host cell transfected with the expression vector of claim 29.
31. A method of using the engineered reverse transcriptase of any one of claims 1 to 27, said method comprising contacting said engineered reverse transcriptase with a nucleic acid template under suitable conditions to produce a polymerized nucleic acid product, wherein said nucleic acid template is RNA, DNA, or a nucleic acid comprising non-natural nucleotides.
32. A method of nucleic acid extension, the method comprising:
(a) Contacting a target nucleic acid molecule with an engineered reverse transcriptase and a plurality of barcoded nucleic acid molecules comprising a barcode sequence; and
(b) Incubating the target nucleic acid, the engineered reverse transcriptase, and the barcoded molecule under conditions in which the barcoded molecule is extended by the engineered reverse transcriptase, wherein the engineered reverse transcriptase comprises an amino acid sequence of the engineered reverse transcriptase of any one of claims 1 to 27.
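Several claims recite sequences that are at least about 90% identical to a reference. Percent identity is normally computed over a sequence alignment (for example, with BLAST or a global alignment). For variants that differ from the reference only by point substitutions, as in the mutation combinations recited above, an ungapped position-by-position comparison is a reasonable approximation. The following is a minimal sketch under that equal-length, substitution-only assumption. The function names are hypothetical, and the 671-residue length in the test is illustrative, chosen because the claims recite positions up to 671.

```python
import math
from fractions import Fraction


def percent_identity(a, b):
    """Percent identity of two equal-length sequences compared position by position."""
    if len(a) != len(b) or not a:
        raise ValueError("ungapped comparison needs equal-length, non-empty sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def max_substitutions(length, min_identity=90):
    """Largest number of substitutions that keeps identity at or above min_identity."""
    # Exact rational arithmetic avoids floating-point error at the threshold.
    allowed = Fraction(length) * (100 - Fraction(str(min_identity))) / 100
    return math.floor(allowed)
```

Under these assumptions, a 671-residue sequence stays at or above 90% identity with up to 67 substitutions.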
CN202280044014.6A 2021-06-14 2022-06-13 Reverse transcriptase variants for enhanced performance Pending CN117693582A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/210,143 2021-06-14
US202163290329P 2021-12-16 2021-12-16
US63/290,329 2021-12-16
PCT/US2022/033199 WO2022265965A1 (en) 2021-06-14 2022-06-13 Reverse transcriptase variants for improved performance

Publications (1)

Publication Number Publication Date
CN117693582A true CN117693582A (en) 2024-03-12

Family

ID=90133879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280044014.6A Pending CN117693582A (en) 2021-06-14 2022-06-13 Reverse transcriptase variants for enhanced performance

Country Status (1)

Country Link
CN (1) CN117693582A (en)


Legal Events

Date Code Title Description
PB01 Publication