WO2010077288A2 - Methods for identifying differences in alternative splicing between two RNA samples

Publication number
WO2010077288A2
Authority
WO
WIPO (PCT)
Prior art keywords
rnas
population
subtractive
rna
cdnas
Application number
PCT/US2009/006450
Other languages
French (fr)
Other versions
WO2010077288A3 (en)
Inventor
Gene Yeo
Jonathan Scolnick
Fred H. Gage
Original Assignee
The Salk Institute For Biological Studies
Application filed by The Salk Institute For Biological Studies
Publication of WO2010077288A2
Publication of WO2010077288A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1072 Differential gene expression library synthesis, e.g. subtracted libraries, differential screening
    • C12N15/1096 Processes for the isolation, preparation or purification of DNA or RNA; cDNA synthesis; subtracted cDNA library construction, e.g. RT, RT-PCR
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6809 Methods for determination or identification of nucleic acids involving differential detection
    • C12Q1/6844 Nucleic acid amplification reactions
    • C12Q1/6853 Nucleic acid amplification reactions using modified primers or templates

Definitions

  • the invention relates to molecular methods that can be used in highly parallel genome-wide analyses to identify alternative, e.g., unique and/or previously uncharacterized, splice isoform(s) of one or more genes of interest.
  • AS Alternative splicing
  • probe designs and labeling protocols used for microarray experiments tend to be biased towards the 3' end of the gene, and unless multiple probes are designed to match each specific exon-exon junction that might be spliced together in an alternative splicing event, certain splice isoforms can remain undetected and uncharacterized.
  • EST databases can be used to compare, e.g., mRNA splice variants across tissues, but splice events in which introns are retained in the mature mRNA are difficult to distinguish from database artifacts that comprise, e.g., pre-mRNA or genomic sequences.
  • compositions and highly parallel methods that can be used for the genome-wide detection of splice isoforms of one or more target genes that are present in, e.g., a tissue of interest at a developmental stage of interest, an organism of interest diagnosed with a disease of interest, and the like.
  • the invention described herein fulfills these and other needs, as will be apparent upon review of the following.
  • the present invention provides methods and related compositions useful for identifying one or more additional splice isoforms of, e.g., a target gene that is present in a sample population of RNAs but not in a subtractive population of RNAs.
  • mRNA species common to both the subtractive and sample RNA populations are removed, leaving mRNA splice isoform(s), e.g., cell type-specific splice isoforms, tissue-specific splice isoforms, disease-specific splice isoforms, and the like, that are unique to the sample population for further manipulation and analysis, e.g., sequencing, e.g., using an automated high-throughput sequencing system.
  • the methods and compositions provided by the invention can advantageously permit the identification of splice isoforms that would be otherwise difficult to isolate using methods that entail, e.g., designing probes specific to all possible exon-exon junctions, a priori knowledge of coding regions in a pre-mRNA, a sequenced genome, etc.
  • the invention provides methods of determining whether a sample population of RNAs includes one or more additional splice isoforms that are not present in a subtractive population of RNAs.
  • the invention provides methods of separating one or more alternate isoform of a target gene from a sample population of RNAs.
  • the methods include providing a subtractive population of RNAs that comprises a first isoform of the target gene and a sample population of RNAs that comprises at least one or more alternate isoform of the target gene.
  • the methods include reverse transcribing the subtractive RNAs to generate a population of subtractive cDNAs, fragmenting the sample RNAs, e.g., via enzymatic digestion, sonication, mechanical shearing, electrochemical cleavage, chemical cleavage, and/or nebulization, and hybridizing the resulting RNA fragments to the subtractive cDNAs.
  • RNA fragments in the sample population that comprise the first isoform of the target gene hybridize to the subtractive cDNAs to produce a subpopulation of RNA: cDNA duplexes, and RNA fragments comprising one or more alternate splice isoform(s) of the target gene comprise a subpopulation of unhybridized RNAs.
  • the methods include removing the RNA: cDNA duplexes from unhybridized RNA fragments, thereby separating the one or more alternate splice isoform of the target gene from the sample population of RNAs.
  • the methods optionally include reverse transcribing the unhybridized RNA fragments, sequencing the resulting cDNAs, e.g., using an automated high-throughput sequencing system, and comparing the sequences of the cDNAs to a sequence of the target gene, e.g., to characterize the additional isoforms of the target gene.
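The comparison step described above (matching sequenced cDNAs against the target gene to characterize an isoform) can be illustrated with a minimal sketch. It assumes the exon sequences of the target gene are already known and simply asks whether a read crosses the join between two exons. The function name `find_junction` and the toy exon sequences are hypothetical and do not come from the patent; a real analysis would use a read aligner rather than exact string matching.

```python
def find_junction(read, exons):
    """Return 1-based exon indices (i, j) if `read` spans the join between
    exon i and a downstream exon j; return None otherwise.

    Assumption: exact matching only, and exons keep their genomic order.
    """
    for i in range(len(exons)):
        for j in range(i + 1, len(exons)):
            joined = exons[i] + exons[j]
            boundary = len(exons[i])  # position where exon i ends and exon j begins
            start = joined.find(read)
            while start != -1:
                # The read must cover bases on both sides of the boundary.
                if start < boundary < start + len(read):
                    return (i + 1, j + 1)
                start = joined.find(read, start + 1)
    return None


# Hypothetical cDNA exon sequences, for illustration only.
EXONS = ["ATGGCT", "CCGATA", "GGATCC"]

skip_read = find_junction("GCTGGA", EXONS)       # crosses an exon 1 / exon 3 join
canonical_read = find_junction("GCTCCG", EXONS)  # crosses an exon 1 / exon 2 join
internal_read = find_junction("ATGG", EXONS)     # lies entirely inside exon 1
```

A read reported as spanning exons 1 and 3 would suggest an isoform in which exon 2 is skipped, which is the kind of characterization the comparison step is meant to support.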
  • Reverse transcribing the unhybridized RNA fragments can optionally include ligating linkers, e.g., linkers that optionally comprise, e.g., a primer hybridization sequence and, e.g., any one or more of the moieties described below, to first ends of the fragments, annealing DNA primers to the linkers, and extending the primers with a reverse transcriptase to produce the cDNAs.
  • the subtractive and sample populations of RNAs can be derived from any of a variety of cell types in any of a variety of organisms, e.g., mammals.
  • the sample population of RNAs can optionally be derived from a first cell type, e.g., a non-disease cell, from an organism, and the subtractive population of RNAs can be derived from a second cell type, e.g., a disease cell, from the same organism.
  • the sample population can be derived from a particular cell type in an organism and the subtractive population of RNAs can be derived from the same cell type in a second organism of the same species.
  • a sample population of RNAs can optionally be derived from a cell type in an organism at a first developmental stage, and the subtractive population of RNAs can be derived from the same cell type in the same organism at a second developmental stage.
  • the sample population of RNAs can be derived from a first cell type from a first organism, which cell type has been exposed to a first treatment, and the subtractive population of RNAs can be derived from the same cell type that has been exposed to a second treatment.
  • the sample population of RNAs can be derived from a cell, and the subtractive RNAs can be derived from a synthetic source, e.g., a population of oligos comprising a defined set of known splice isoforms. This particular embodiment can be of beneficial use in diagnostic and prognostic assays.
  • Reverse transcribing the subtractive population of RNAs can optionally include annealing tagged DNA primers to 3' ends of the RNAs in the subtractive population and extending the tagged primers with a reverse transcriptase.
  • the tags on the DNA primers can optionally include any of a variety of moieties, e.g., one or more ligand, fluorescent label, blocking group, phosphorylated nucleotide, nucleotide analog, fluorinated nucleotide, nucleotide comprising a heavy atom, biotinylated nucleotide, methylated nucleotide, uracil, sequence capable of forming hairpin secondary structure, oligonucleotide hybridization site, restriction site, DNA promoter, protein binding site, sample or library identification sequence, a thiol linker, a phosphorothioated nucleotide, an amine-reactive nucleotide, and/or cis regulatory sequence.
  • Removing the subpopulation of RNA: cDNA duplexes from the unhybridized RNAs, e.g., to isolate the alternate isoform(s) of a target gene, can optionally include, e.g., electrophoresis or digesting the RNA: cDNA duplexes with RNAseH and a DNAse.
  • the methods can further include annealing tagged DNA primers to 3' ends of the RNAs in the subtractive population and extending the tagged primers with a reverse transcriptase to produce tagged cDNAs.
  • RNA fragments can be hybridized to the population of tagged cDNAs to produce a subpopulation of RNA: tagged cDNA duplexes and a subpopulation of unhybridized RNAs; and the RNA: tagged cDNA duplexes can be separated from the unhybridized RNA fragments via affinity purification.
  • the subtractive and sample populations of RNAs can be reversed.
  • cDNAs which are produced by reverse transcribing sample RNAs, e.g., using any of the methods described above, can be hybridized to RNA fragments produced from a subtractive population of RNAs, e.g., using any of the methods described above.
  • the RNA: cDNA duplexes can then be removed from unhybridized RNA fragments to identify one or more splice isoform, e.g., that is present in the subtractive population of RNAs and absent from the sample population of RNAs.
  • These RNA fragments can optionally be reverse transcribed and analyzed, as described above. Accordingly, the methods provided by the invention can be used to determine whether there are any additional splice isoforms of a target gene in the subtractive population of RNAs not present in the sample population of RNAs.
  • compositions that include a subpopulation of RNA: cDNA duplexes and a subpopulation of unhybridized RNA fragments that have been produced by the methods described above. It will be apparent to those of skill in the art that the methods and compositions provided by the invention can optionally be used alone or in combination.
  • Kits are also a feature of the invention.
  • the present invention provides kits that include useful reagents, e.g., tagged DNA primers, affinity columns, and/or one or more enzymes that are used in the methods, e.g., a reverse transcriptase, a DNA polymerase, etc.
  • Such reagents are most preferably packaged in a fashion to enable their use.
  • kits of the invention optionally include additional reagents, such as control target nucleic acids, buffer solutions and/or salt solutions, including, e.g., divalent metal ions, i.e., Mg++, Mn++ and/or Fe++, and nucleic acid adapter tags, e.g., to prepare unhybridized RNA fragments for sequencing, e.g., using a currently available or future automated high-throughput sequencing system.
  • kits also typically include a container to hold the kit components, instructions for use of the compositions, e.g., to practice the methods, and other reagents in accordance with the desired application methods, e.g., identifying exon-exon junctions, or other characteristics of alternately spliced mRNA isoforms of a target gene.
  • derived from is used to refer to the original source organism, tissue, cells, etc. from which, e.g., a population of RNAs to be used with the methods of the invention, was obtained.
  • populations of RNAs can be derived from, e.g., a cell line or a eukaryotic organism, including, but not limited to, mammals, nematodes, insects, etc.
  • Linker As used herein, a linker is a short, single-stranded nucleic acid 2-20 nucleotides in length that can be attached to a single stranded nucleic acid, e.g., an RNA, via ligation or by extending the linker, e.g., with a reverse transcriptase.
  • a nucleic acid linker can include any one or more of an oligonucleotide hybridization site, a restriction site, a DNA promoter, a protein binding site, a sample or library identification sequence, a thiol linker, a phosphorothioated nucleotide, an amine-reactive nucleotide, a cis regulatory sequence, modified nucleotide or nucleotide analog, and/or the like.
  • Subtractive cDNAs are the cDNAs that are produced from the reverse transcription of a subtractive population of RNAs.
  • subtractive cDNAs are hybridized to a sample population of RNAs, and, thus, species common to both the subtractive and sample RNA populations form RNA: cDNA duplexes.
  • these RNA: cDNA duplexes are removed from or "subtracted" from the hybridization reaction.
  • Subtractive cDNAs can optionally comprise a tag that includes, e.g., any one or more of the moieties described herein, and such tags can permit the removal/separation of the resulting RNA: cDNA hybrids from unhybridized RNA fragments, e.g., fragments derived from sample RNAs that comprise additional isoform(s) of one or more target gene.
  • Subtractive RNAs As used herein, "subtractive RNAs" (or, alternately, a "subtractive population of RNAs") are a reference population of RNAs that are used to remove mRNA species that are common to both a subtractive RNA population and the sample RNA population to which the subtractive population is being compared.
  • Subtractive cDNAs are produced from subtractive RNAs, e.g., via reverse transcription.
  • Subtractive RNAs can be derived from any of a number of sources, e.g., cells, tissues, organisms, etc., and typically represent a control population of RNAs to which, e.g., a sample population of RNAs is compared to, e.g., identify one or more additional splice isoforms of a target gene that are present in the sample population of RNAs and absent from the subtractive population of RNAs.
  • Tag refers to a moiety linked to a molecule of interest that can be used as a molecular label to detect the molecule of interest in a population and/or as a means by which to separate the molecule of interest from the population.
  • tags can be hybridized to the ends of the nucleic acid fragments and extended with a polymerase to produce tagged fragments or ligated to the ends of the nucleic acid fragments with a ligase.
  • Tags can comprise any one or more moieties that include, e.g., a ligand, a fluorescent label, a blocking group, a phosphorylated nucleotide, a nucleotide analog, a fluorinated nucleotide, a nucleotide comprising a heavy atom, a biotinylated nucleotide, a methylated nucleotide, a uracil, a sequence capable of forming hairpin secondary structure, an oligonucleotide hybridization site, a restriction site, a DNA promoter, a protein binding site, a sample or library identification sequence, a thiol linker, a phosphorothioated nucleotide, an amine-reactive nucleotide, and/or a cis regulatory sequence.
  • a tag can comprise a linker, described above.
  • treatment refers to a defined set of experimental conditions to which, e.g., a source organism, a source tissue, a source cell line, etc. was exposed prior to the collection of its RNA.
  • a treatment can include, e.g., exposing a clonal population of undifferentiated T cells to a defined set of cytokines for a defined length of time; exposing a disease cell line to a drug; etc.
  • Figure 1 provides a schematic depiction of a subtractive population of RNAs and a sample population of RNAs.
  • Figure 2 provides a schematic that depicts how methods of the invention can be used to isolate an isoform of a target gene that is found only in the sample population.
  • Figure 2 also illustrates embodiments of the compositions that are provided by the invention.
  • DETAILED DESCRIPTION OF THE INVENTION
  • compositions provided herein can be used in combination for the reliable identification of splice variants of a target gene, e.g., variants that are present in a sample population of RNAs but absent from a second population of RNAs to which the sample population is being compared.
  • the methods are also useful in determining whether a sample population of RNAs includes one or more unique splice isoforms that are not present in the second population of RNAs.
  • the methods of the invention provide several advantages over currently available technologies, e.g., cDNA library screening, northern blotting, RT-PCR, 5' and 3' RACE, cloning, EST sequencing, and microarray technology.
  • conventional cDNA library screening and Northern blotting are often not sensitive enough to detect low abundance alternate splice isoforms, e.g., isoforms that are present in a population in a single copy or very few copies.
  • Molecular techniques such as RT-PCR, cloning, EST sequencing, and 5' or 3' RACE (Rapid Amplification of cDNA Ends), are laborious, and these methods can become impracticable, and costly, if scaled to the degree necessary to perform genome-wide analyses.
  • microarray technologies permit highly parallel analyses of splice isoforms in RNA populations
  • microarray platforms are expensive. Additionally, the ability to identify differential splice isoforms of a target gene using a microarray depends on whether each exon-exon junction that can be produced by a splicing event is represented by a sufficient number of probes in the array. Furthermore, splice isoforms comprising five or fewer exons can be difficult to detect (Robinson et al. (2009) "Differential splicing using whole-transcript microarrays.” BMC Bioinformatics 10: 156).
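To make the probe-coverage problem concrete, the short sketch below enumerates the exon-exon junctions that an array would have to represent for a single gene. It assumes a simplified model in which exons keep their genomic order and any exon can be joined to any downstream exon, giving n(n-1)/2 junctions for n exons; alternative 5' or 3' splice sites and retained introns would add more. The function name `possible_junctions` is hypothetical.

```python
def possible_junctions(n_exons):
    """List every (upstream, downstream) exon pair, using 1-based exon numbers.

    Assumption: exon order is preserved, so exon i can only join an exon j > i.
    """
    return [
        (i, j)
        for i in range(1, n_exons + 1)
        for j in range(i + 1, n_exons + 1)
    ]


three_exon_gene = possible_junctions(3)
ten_exon_gene_count = len(possible_junctions(10))
```

Under this model the junction count grows quadratically with exon number, which is one way to see why full junction coverage on an array becomes costly for genes with many exons.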
  • the methods provided herein are cost-effective, highly parallel, and can be used to efficiently identify alternative splice isoforms present in an RNA sample, e.g., regardless of their copy number or the number of exons they comprise.
  • the detailed description is organized to first elaborate the methods and compositions provided by the invention for isolating, e.g., unique or previously uncharacterized RNA splice isoforms of one or more target genes of interest that are present in a sample population of RNAs and absent from a subtractive population of RNAs. Next, details regarding alternative splicing are described. Details regarding sequencing reactions and high-throughput sequencing systems are then provided. Kits, systems, and broadly applicable molecular biological techniques that can be used to perform any of the methods are described thereafter.
  • the methods provided herein can be used to determine whether one population of RNAs includes one or more different RNA splice isoforms that are not present in a second population. Accordingly, the methods can be performed to determine whether the first population of RNAs includes a particular RNA splice isoform, or any additional RNA splice isoforms, not present in a second population of RNAs.
  • two populations of RNA are provided: subtractive population 100 and sample population 125 (see Figure 1).
  • Subtractive RNA population 100, which comprises splice isoform 101 of a target gene, is reverse transcribed to produce subtractive population of cDNAs 115.
  • the cDNAs in population 115 are synthesized by annealing tagged primers to subtractive RNAs 100 and extending the primers, e.g., with a reverse transcriptase, such that each cDNA 110 in population 115 comprises tag 105.
  • Sample RNA population 125, which comprises splice isoforms 101 and 102 of the target gene, is fragmented, e.g., using any one or more of the methods described herein, to produce population of RNA fragments 130.
  • RNA populations 100 and 125 can be derived from any of a variety of sources, e.g., cells, tissues, organs, etc., in which differences in AS patterns may be of interest to the practitioner. For example, where differences in splicing patterns between cell types are to be examined, subtractive RNA population 100 can be derived from a first cell type in an organism, and sample RNA population 125 can be derived from a second cell type in the same organism. Alternately, subtractive RNA population 100 can be derived from a first cell type in a first organism, and sample RNA population 125 can be derived from a first cell type in a second organism of the same species as the first organism.
  • RNA populations 100 and 125 can be obtained from the same tissue at different developmental stages or, e.g., from a disease cell and a non-disease cell.
  • subtractive RNA population 100 can be derived from a cell
  • sample RNA population 125 can be derived from a synthetic source, e.g., a population of oligos comprising a defined set of known splice isoforms, e.g., isoforms 101 and 102.
  • RNA fragments 130 can be produced from RNA population 100, and cDNAs 115 can be derived from RNA population 125.
  • composition 200, which is provided by the invention, includes a subpopulation of RNA: cDNA duplexes 210, which subpopulation comprises cDNAs from population 115 and the complementary RNA fragments from population 130 to which they hybridize. Also included in composition 200 are unhybridized RNA fragments 215, e.g., RNA fragments from population 130 for which there are no complementary cDNA sequences in population 115.
  • an RNA: cDNA duplex 210 can comprise cDNA 250 which includes, e.g., tag 105, and encodes the reverse transcript of splice isoform 101, and RNA fragments 245, which comprise subsequences of splice isoforms 101 and 102 that can hybridize to cDNA 250.
  • RNA fragments 215 can comprise subsequences 240 of splice isoform 102, e.g., RNA subsequences that do not hybridize to any cDNA sequences in population 115.
  • the invention also provides methods of determining whether a particular splice isoform, e.g., isoform 101 and/or 102, is present in a population of RNAs, e.g., population 100.
  • RNA: cDNA duplexes 210 are then removed from population 200, e.g., via enzymatic digestion, electrophoresis, affinity purification, or the like (see Figure 2B).
  • DNAse and RNAse H can be added to population 200 to digest the cDNAs and the RNA fragments hybridized to the cDNAs, respectively, in duplexes 210.
  • population 200 can be run over an affinity column that comprises a moiety that binds tag 105. Only unhybridized RNA fragments 215 will be eluted from the column.
  • the RNA: cDNA duplexes (e.g., duplexes 210), which typically have a higher molecular weight than the unhybridized RNA fragments, e.g., fragments 215, can be separated from the unhybridized RNA fragments via electrophoresis.
  • the methods facilitate the removal of mRNA species common to both subtractive RNA population 100 and sample RNA population 125, while mRNA fragments 215, e.g., cell type-specific splice isoforms, tissue-specific splice isoforms, disease-specific splice isoforms, and/or the like, that are unique to sample population 125, remain and can be further characterized.
  • Such analyses can include, e.g., ligating linkers to the RNA fragments in preparation for reverse transcription. The reverse transcribed fragments can then be sequenced, optionally using any of a variety of high-throughput sequencing platforms described below.
  • the methods of the invention can also be used to determine whether the sample population of RNAs comprises any splice isoforms of a target gene that are not present in the subtractive population, and/or vice versa.
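The workflow walked through above (populations 100 and 125, cDNAs 115, fragments 130, duplexes 210 and unhybridized fragments 215) can be mimicked in silico as a rough sketch. Several simplifications are assumed here and are not part of the patent: fragmentation cuts at fixed, non-overlapping positions, a fragment "hybridizes" only if its perfect complement occurs in a subtractive cDNA, and the exon sequences are invented for illustration. All function names are hypothetical.

```python
def reverse_transcribe(rna):
    """Return the cDNA of an RNA: the DNA reverse complement of the RNA string."""
    pairs = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna))


def fragment(rna, size):
    """Cut an RNA into consecutive, non-overlapping pieces of `size` nucleotides."""
    return [rna[i:i + size] for i in range(0, len(rna), size)]


def subtract(subtractive_rnas, sample_rnas, size=6):
    """Split sample fragments into (duplexed, unhybridized) lists."""
    cdnas = [reverse_transcribe(r) for r in subtractive_rnas]        # population 115
    fragments = [f for r in sample_rnas for f in fragment(r, size)]  # population 130
    duplexed, unhybridized = [], []
    for frag in fragments:
        # Toy hybridization rule: the fragment's perfect complement is in some cDNA.
        if any(reverse_transcribe(frag) in cdna for cdna in cdnas):
            duplexed.append(frag)      # would end up in duplexes 210
        else:
            unhybridized.append(frag)  # would remain as fragments 215
    return duplexed, unhybridized


# Hypothetical exon sequences; isoform 102 carries one extra exon.
E1, E2, E3 = "AUGGCU", "CCGAUA", "GGAUCC"
ISOFORM_101 = E1 + E3
ISOFORM_102 = E1 + E2 + E3

duplexed, unhybridized = subtract([ISOFORM_101], [ISOFORM_101, ISOFORM_102])
```

In this toy run only the fragment derived from the extra exon of isoform 102 is left unhybridized. With real, randomly fragmented RNA, fragments that straddle an exon boundary would hybridize only partially, so an actual protocol depends on hybridization stringency in a way this exact-match rule does not capture.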
  • AS pre-mRNA splicing
  • AS is a precisely regulated process in which a pre-mRNA's exons are separated and reconnected in different combinations to produce alternative mature mRNA species that encode multiple protein isoforms.
  • alternative splicing can alter the function of proteins by removing or adding specific domains, e.g., nuclear localization signals, transcription activation domains, DNA or RNA binding domains, trans-membrane domains, phosphorylation sites, and/or post-translation modification sites.
  • alternative splicing can cause substantial changes in protein structure by altering even just a few residues at a splice site (Davletov, et al.
  • AS can also generate variability within untranslated regions of mRNAs which affect gene expression by adding or removing mRNA elements that, e.g., regulate translation efficiency, mRNA stability, or intracellular localization.
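As a toy illustration of how joining a pre-mRNA's exons in different combinations yields several mature mRNAs, the sketch below enumerates isoforms for a gene whose internal exons may each be skipped. It models only cassette-exon skipping under the assumption that exon order is preserved; other event types (alternative 5' or 3' splice sites, intron retention) are not modeled. The function name `cassette_isoforms` and the exon labels are hypothetical.

```python
from itertools import combinations


def cassette_isoforms(exons, constitutive):
    """Enumerate isoforms in which any subset of non-constitutive exons is skipped.

    `exons` is an ordered list of exon labels; `constitutive` is a set of
    0-based indices of exons that are always retained.
    """
    optional = [i for i in range(len(exons)) if i not in constitutive]
    isoforms = []
    for n_skipped in range(len(optional) + 1):
        for skipped in combinations(optional, n_skipped):
            kept = [exon for i, exon in enumerate(exons) if i not in skipped]
            isoforms.append("-".join(kept))
    return isoforms


# Four exons; the first and last are treated as constitutive.
isoforms = cassette_isoforms(["E1", "E2", "E3", "E4"], constitutive={0, 3})
```

With two skippable exons this yields four isoforms, and each additional independent cassette exon doubles the count under this simplified model.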
  • AS has been observed in nearly all multicellular organisms.
  • bioinformatic analyses based on EST sequences and exon-exon junction microarray studies estimate that 59%-74% of human genes are alternatively spliced (Johnson, et al. (2003) "Genome-wide array of human alternative pre-mRNA splicing with exon junction microarrays.” Science 302: 2142-2144; Kan, et al. (2001) "Gene structure prediction and alternative splicing analysis using genomically aligned ESTs.” Genome Res 11: 889-900), indicating that AS is one major source of protein diversity in humans.
  • the methods and compositions provided by the invention can permit the identification of splice isoforms that would be otherwise difficult to isolate using currently available methods that entail, e.g., designing probes specific to all possible exon-exon junctions, having a priori knowledge of coding regions in a pre-mRNA, knowing the complete sequence of, e.g., a large mammalian genome, etc.
  • AS events can undergo regulation in which splicing pathways are modulated according to, e.g., cell type (see, e.g., Cooper, T. A. (2005) "Alternative splicing regulation impacts heart development." Cell 120: 59-72), developmental stage (see, e.g., Barberan-Soler, et al. (2008) "Alternative Splicing Regulation During C. elegans Development: Splicing Factors as Regulated Targets." PLoS Genet 4: e1000001), gender (see, e.g., Chang, et al.
  • spliceosome a large complex comprising over 100 core proteins and 5 small nuclear RNAs (snRNAs) (described in, e.g., Smith, et al.
  • RNA splicing is dependent upon the identity of nucleotide sequences, or "core splicing signals" at the 5' splice site, the 3' splice site, and the branch point.
  • ESEs exonic splice enhancers
  • ESSs exonic splice silencers
  • ISEs intronic splice enhancers
  • ISSs intronic splice silencers
  • the methods and compositions of the invention can be beneficially used to determine which splice isoforms are present in, e.g., a cell or tissue, e.g., at specific developmental stages or in response to particular environmental stimuli, and such data can inform drug discovery and diagnostics efforts.
  • protein-rich target genes of interest whose splice variants can be further characterized using the invention include, e.g., cadherins, which play roles in cell adhesion.
  • Cadherins are involved in morphogenesis of tissues such as the neural tube, and their misexpression has been implicated in human malignancies (Wheelock et al.
  • AS and its disruption can also influence the susceptibility of an individual to a disease and/or the disease's severity (Wang, et al. (2007) "Splicing in disease: disruption of the splicing code and the decoding machinery." Nat Rev Genet 8: 749-761; Srebrow, et al. (2006) "The connection between splicing and cancer." J Cell Sci 119: 2635-2641; Faustino, et al. (2003) "Pre-mRNA splicing and human disease." Genes Dev 17: 419-437).
  • splicing defects have been identified as the cause of numerous diseases including β-thalassemia, cystic fibrosis, and premature aging (Faustino, et al. (2003) "Pre-mRNA splicing and human disease." Genes Dev 17: 419-437). Determining how the splicing of, e.g., a target gene, is altered in, e.g., a disease cell, can inform strategies directed at reversing or circumventing misregulated splicing events.
  • the methods provided herein can be used to detect whether alternate RNA splice isoforms are present in a population of RNAs derived from a patient, e.g., as compared to a subtractive population of RNAs, to make a diagnosis, predict a prognosis, or inform a drug regimen.
  • Further details regarding spliceosome proteins and splicing mechanism can be found in, e.g., Jurica (2008) "Detailed close-ups and the big picture of spliceosomes.” Curr Opin Struct Biol 18: 315-20; Schellenberg, et al.
  • RNA fragments that are enriched following the removal of cDNA:RNA hybrids can comprise additional splice isoforms of a target gene, e.g., that are present in a sample population of RNAs and absent from the subtractive population of RNAs or vice versa (see Figure 2 and corresponding description).
  • RNA fragments can optionally be reverse transcribed, according to methods described elsewhere herein, and sequenced using, e.g., any of a variety of high-throughput DNA sequencing systems (reviewed in, e.g., Chan, et al.
  • Affymetrix and Complete Genomics, Inc. rely on indirect methods of determining a DNA's sequence, e.g., sequencing by hybridization (SBH), in which a sequence of a DNA is assembled based on experimental data obtained from hybridization experiments performed to determine the oligonucleotide content of the DNA chain.
  • SBH sequencing by hybridization
  • SBH typically employs an array comprising a known arrangement of short oligonucleotides of known sequence, e.g., oligonucleotides representing all possible sequences of a given length.
  • SOLiD, a commercial sequencing system available from Applied Biosystems, is based on "sequencing by ligation" (SBL), in which the mismatch sensitivity of a DNA ligase enzyme is used to determine the underlying sequence of the target nucleic acid molecule.
  • SBL sequencing by ligation
  • one or more sets of encoded adaptors is ligated to the terminus of a target polynucleotide, e.g., a single-stranded DNA of unknown sequence.
  • Encoded adaptors whose protruding strands form perfectly matched duplexes with the complementary protruding strands of the target polynucleotide are ligated, and the identity of the nucleotides in the protruding strands is determined by an oligonucleotide tag carried by the encoded adaptor. Such determination, or "decoding” is carried out by specifically hybridizing a labeled tag complementary to its corresponding tag on the ligated adaptor.
  • SBS sequencing by synthesis
  • 454 Sequencing, a technology available from 454 Life Sciences, is a massively parallelized, multiplex pyrosequencing system (Nyren (2007) "The History of Pyrosequencing." Methods Mol Biol 373: 1-14; Ronaghi (2001) "Pyrosequencing sheds light on DNA sequencing." Genome Res 11: 3-11; and Wheeler, et al. (2008) "The complete genome of an individual by massively parallel DNA sequencing." Nature 452: 872-876) that relies on fixing nebulized, adapter-ligated single-stranded DNA fragments to small DNA-capture beads.
  • Single molecule real-time (SMRT) sequencing is another massively parallel sequencing technology that can be compatible with the high-throughput resequencing of target nucleic acids isolated from a sample, e.g., by using capture probes synthesized according to any of the methods described previously.
  • SMRT technology relies on arrays of multiplexed zero-mode waveguides (ZMWs) in which, e.g., thousands of sequencing reactions can take place simultaneously.
  • ZMWs multiplexed zero-mode waveguides
  • the ZMW is a structure that creates an illuminated observation volume that is small enough to observe, e.g., the template-dependent synthesis of a single single-stranded DNA molecule by a single DNA polymerase (See, e.g., Levene, et al. (2003) "Zero Mode Waveguides for Single Molecule Analysis at High Concentrations," Science 299: 682-686).
  • cDNAs derived from unhybridized RNA fragments can be sequenced using systems that include bridge amplification technologies, e.g., in which primers bound to a solid phase are used in the extension and amplification of solution phase target nucleic acids prior to SBS.
  • RNA fragments e.g., derived from a sample population of RNAs, that encode one or more splice isoform of one or more target gene, e.g., that is present in a sample population of RNAs but not in a subtractive population of RNAs (see Figure 1 and corresponding description).
  • the principle of this approach relies on the removal of mRNA species common to both the subtractive and sample RNA populations, leaving behind RNA fragments that comprise splice isoform(s) unique to the sample population of RNAs.
  • Such splice isoforms can optionally include, e.g., cell type-specific splice isoforms, tissue- specific splice isoforms, disease-specific splice isoforms, etc.
  • the alternative splice isoforms are thus isolated from the sample RNA population and can be subject to further analysis, e.g., sequencing using an automated high-throughput sequencing system.
  • the methods of the invention can also optionally be used to identify mRNA species that are present in the sample population of RNAs in higher abundance relative to the subtractive population of RNAs.
  • the subtractive and sample populations of RNAs can be reversed.
  • cDNAs that are produced by reverse transcribing a sample population of RNAs can be hybridized to RNA fragments derived from a subtractive population of RNAs.
  • the RNA: cDNA duplexes can then be removed from unhybridized RNA fragments, e.g., fragments derived from subtractive RNAs, to identify one or more splice isoform that is both present in the subtractive population of RNAs and absent from the sample population of RNAs.
  • These RNA fragments can optionally be reverse transcribed and further characterized, as described above.
  • Subtractive cDNAs e.g., derived from reverse transcription of the subtractive RNAs
  • RNA fragments e.g., produced from a sample population of RNAs
  • the results of subtractive hybridization are validated using additional techniques that are well known in the art, e.g., northern blot, in situ hybridization, RT-PCR, and the like. These techniques are described in detail in, e.g., Sambrook et al., Molecular Cloning - A Laboratory Manual (3rd Ed.), Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 2000 ("Sambrook"); and Current Protocols in Molecular Biology, F.M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., (supplemented through 2007) ("Ausubel").
  • The methods described herein include providing two distinct populations of RNAs, e.g., a subtractive population of RNAs and a sample population of RNAs.
  • the subtractive RNAs are used to generate subtractive cDNAs, and the sample RNAs are fragmented and hybridized to the subtractive cDNAs.
  • Hybridization produces a subpopulation of unhybridized RNA fragments and a subpopulation of RNA:cDNA duplexes, which are then removed from the unhybridized RNA fragments.
  • Determining the sequences of the unhybridized RNA fragments can be useful, e.g., in identifying one or more splice variants of one or more target gene or in comparing the differential expression of, e.g., splice isoforms of a target gene, e.g., between different tissue types, between different treatments to the same tissue type, or between different developmental stages of the same tissue type.
  • mRNA can typically be isolated from almost any source using protocols and methods described in, e.g., Sambrook and Ausubel.
  • the yield and quality of the isolated mRNA can depend on how a tissue is stored prior to RNA extraction, the means by which the tissue is disrupted during RNA extraction, or on the type of tissue from which the RNA is extracted, and RNA isolation protocols can be optimized accordingly.
  • mRNA isolation kits are commercially available, e.g., the mRNA-ONLY™ Prokaryotic mRNA Isolation Kit and the mRNA-ONLY™ Eukaryotic mRNA Isolation Kit (Epicentre Biotechnologies), the FastTrack 2.0 mRNA Isolation Kit (Invitrogen), and the Easy-mRNA Kit (BioChain).
  • mRNA from various sources e.g., bovine, mouse, and human
  • tissues e.g. brain, blood, and heart
  • BioChain Hayward, CA
  • Ambion Austin, TX
  • Clontech Mountain View, CA
  • reverse transcriptase is used to generate cDNAs from the mRNA templates.
  • Methods and protocols for the production of cDNA from mRNAs, e.g., harvested from prokaryotes as well as eukaryotes, are elaborated in cDNA Library Protocols, I. G. Cowell, et al., eds., Humana Press, New Jersey, 1997, Sambrook, and Ausubel.
  • kits are commercially available for the preparation of cDNA, including the Cells-to-cDNA™ II Kit (Ambion), the RETROscript™ Kit (Ambion), the CloneMiner™ cDNA Library Construction Kit (Invitrogen), and the Universal RiboClone® cDNA Synthesis System (Promega).
  • Many companies e.g., Agencourt Bioscience and Clontech, offer cDNA synthesis services.
  • RNA fragments are generated from a sample population of RNAs, i.e., in preparation for hybridization to subtractive cDNAs derived from the subtractive population of RNAs.
  • There exist a plethora of ways of producing such RNA fragments. These include, but are not limited to, mechanical methods, such as sonication, mechanical shearing, nebulization, hydroshearing, and the like; enzymatic methods, such as exonuclease digestion, endonuclease digestion, and the like; chemical cleavage; and electrochemical cleavage. These methods are further explicated in Sambrook and Ausubel. In preferred embodiments, chemical cleavage is used to fragment RNAs, as detailed in the example below.
  • tagged subtractive cDNAs are produced from the subtractive population of RNAs via reverse transcription.
  • the tags can permit the detection of RNA: cDNA duplexes, e.g., in a population of nucleic acids that comprises a subpopulation of RNA: cDNA duplexes and a subpopulation of unhybridized RNA fragments, e.g., following hybridization of subtractive cDNAs to a population of RNA fragments derived from a sample population of RNAs.
  • the tags permit the RNA: cDNA duplexes to be separated, e.g., via affinity purification or the like, from the subpopulation of unhybridized RNA fragments, e.g., RNA fragments that represent splice isoforms of one or more target gene that are present in a sample population of RNAs but not present in the subtractive population of RNAs.
  • Nucleic acid tags e.g., such as those optionally present on the subtractive cDNAs, can comprise any of a plethora of ligands, such as high-affinity DNA-binding proteins; modified nucleotides, such as methylated, biotinylated, or fluorinated nucleotides; and nucleotide analogs, such as dye-labeled nucleotides, non-hydrolysable nucleotides, or nucleotides comprising heavy atoms.
  • tags can optionally comprise one or more fluorescent label, blocking group, phosphorylated nucleotide, thiol linker, phosphorothioated nucleotide, amine-reactive nucleotide, uracil, and/or the like.
  • reagents are widely available from a variety of vendors, including Perkin Elmer, Jena Bioscience and Sigma-Aldrich.
  • Nucleic acid tags can also include oligonucleotides that comprise specific sequences, such as restriction sites, cis regulatory sites, nucleotide hybridization sites, protein binding sites, sequences capable of forming hairpin secondary structures, DNA promoters, sample or library identification sequences, and the like. Such sequences can be of advantageous use in, e.g., sequencing cDNAs derived from unhybridized sample RNA fragments that have been reverse transcribed using tagged primers. Linkers that are attached to unhybridized RNA fragments in preparation for reverse transcription can also beneficially include any one or more of the sequences listed above.
  • Oligonucleotide tags can be custom synthesized by commercial suppliers such as Operon (Huntsville, AL), IDT (Coralville, IA) and Bioneer (Alameda, CA). Any of a number of methods that are well known in the art can be used to join tags to nucleic acids of interest, including chemical linkage, ligation, and extension of a primer comprising a tag by a polymerase or reverse transcriptase. Further details regarding nucleic acid tags and the methods by which they are attached to nucleic acids of interest are elaborated in Sambrook and Ausubel.
  • RNA hybrids comprise additional mRNA splice isoforms and not, e.g., RNA species that are expressed at higher levels in the sample population.
  • a variety of nucleic acid amplification and/or copying methods are known in the art and can be implemented to, e.g., amplify subtractive cDNAs and/or cDNAs derived from the reverse transcription of unhybridized RNA fragments, e.g., RNA fragments from a sample population of RNAs.
  • the most widely used in vitro technique among these methods is polymerase chain reaction (PCR), which requires the addition of nucleotides, oligonucleotide primers, buffer, and an appropriate polymerase to the amplification reaction mix.
  • PCR polymerase chain reaction
  • SDA strand displacement amplification
  • RCA rolling-circle amplification
  • MDA multiple-displacement amplification
  • Kits are also a feature of the invention.
  • the present invention provides kits that include useful reagents, e.g., tagged DNA primers, affinity columns, and/or one or more enzymes that are used in the methods, e.g., a reverse transcriptase, a DNA polymerase, etc.
  • Such reagents are most preferably packaged in a fashion to enable their use.
  • kits of the invention optionally include additional reagents, such as control target nucleic acids, buffer solutions and/or salt solutions, including, e.g., divalent metal ions, i.e., Mg++, Mn++ and/or Fe++, nucleic acid adapter tags, e.g., to prepare unhybridized RNA fragments for sequencing, e.g., using a currently available or future automated high-throughput sequencing system.
  • kits also typically include a container to hold the kit components, instructions for use of the compositions, e.g., to practice the methods, and other reagents in accordance with the desired application methods, e.g., identifying exon-exon junctions, or other characteristics of alternately spliced mRNA isoforms of a target gene.
  • the methods and compositions provided by the invention can advantageously be integrated with systems that can, e.g., automate and/or multiplex the steps of the methods described herein, e.g., methods for separating one or more alternate isoform of a target gene from a sample population of RNAs.
  • Systems of the invention can include one or more modules, e.g., that automate a method herein, e.g., for high-throughput sequencing applications.
  • Such systems can include fluid-handling elements and controllers that move reaction components into contact with one another, signal detectors, and system software/instructions.
  • Systems of the invention can optionally include modules that provide for detection or tracking of products, e.g., unhybridized RNAs that comprise sequences that correspond to a splice isoform of a target gene that is present in a sample population of RNAs but not in a subtractive population of RNAs. Additionally or alternatively, the systems can monitor the synthesis of cDNAs from such unhybridized RNAs and/or detect the nucleotide sequence of such cDNAs, e.g., produced during a sequencing reaction. Detectors can include spectrophotometers, epifluorescent detectors, CCD arrays, CMOS arrays, microscopes, cameras, or the like.
  • Optical labeling is particularly useful because of the sensitivity and ease of detection of these labels, as well as their relative handling safety, and the ease of integration with available detection systems (e.g., using microscopes, cameras, photomultipliers, CCD arrays, CMOS arrays and/or combinations thereof).
  • High-throughput analysis systems using optical labels include DNA sequencers, array readout systems, cell analysis and sorting systems, and the like.
  • fluorescent products and technologies see, e.g., Sullivan (ed) (2007) Fluorescent Proteins, Volume 85, Second Edition (Methods in Cell Biology) ISBN-10: 0123725585; Hof et al.
  • System software, e.g., instructions running on a computer, can be used to track and inventory reactants or products, and/or for controlling robotics/fluid handlers to achieve transfer between system stations/modules.
  • the overall system can optionally be integrated into a single apparatus, or can consist of multiple apparatus with overall system software/instructions providing an operable linkage between modules.
  • RNAs are prepared from cells and enriched for polyadenylated mRNAs using methods well known to one of skill in the art.
  • EDTA is then added to a final concentration of 60 mM to quench the reactions, and the fragmented mRNAs are run on a 15% acrylamide/7M urea gel.
  • a fragment of the gel that corresponds to the position on the gel at which 25-50 base pair-long fragments migrate is excised, and the gel slice is placed in a 0.5 ml microfuge tube that has had holes punched through the bottom with a 21 gauge syringe needle.
  • the 0.5 ml tube is placed in a 1.5 ml microfuge tube and spun briefly until pieces of the gel slice are pushed into the 1.5 ml tube through the holes in the 0.5 ml tube.
  • Following the incubation, the 1.5 ml microfuge tube is spun for 15 minutes at 7500, and the supernatant is transferred to a new tube. The supernatant is filtered through a Spin-X centrifuge filter (available from Corning) to remove any gel pieces that may have been transferred. After all gel pieces have been removed, the RNA fragments are ethanol precipitated and pelleted, a technique well known by those of skill in the art. The preceding steps are performed in RNAse-free tubes with RNAse-free reagents.
  • Subtractive RNAs are prepared by harvesting total RNA from cells and purifying polyadenylated mRNAs from the total RNA using any of a variety of methods known to those of skill in the art. The polyadenylated RNAs are then reverse transcribed into cDNA with biotin-conjugated oligo dT primers. The resulting RNA:DNA duplexes are treated with RNAse H to hydrolyze the RNA strands of the duplexes. It is assumed that cDNAs are produced from the RNAs at a 1:1 ratio.
  • RNA:cDNA duplexes are purified from the population of unhybridized RNA with excess streptavidin beads (1 mg beads can hold 100 pmol of biotin). Beads are added to the hybridization mix and incubated at 4°C for 5 hours. The tube containing the beads and hybridization mix is spun, pelleting the subtractive biotin-conjugated cDNAs and the RNAs to which they have hybridized. The supernatant, which contains the unhybridized RNAs, e.g., RNAs that are unique to the sample population, is transferred to a new 1.5 ml microfuge tube, and the RNAs are ethanol precipitated and pelleted, according to methods well known in the art.
  • RNA fragments are then resolved on a 15% acrylamide/7M urea gel, and a fragment of the gel that corresponds to the position on the gel at which 25-50 base pair-long fragments migrate is excised.
  • the RNA fragments are eluted from the gel slice, as described above.
  • RNA fragments for sequencing [0070]
  • Solexa adaptors are ligated to the RNA fragments, which are then gel purified as described above. 5' linkers are ligated to the purified fragment, and the ligation products are subject to a second round of gel purification.
  • the fragments to which the Solexa adaptors and 5' linkers have been attached are treated with DNAse and extracted with phenol/chloroform.
  • the adaptor-ligated fragments are then precipitated with 1:1 ethanol:isopropanol and pelleted. The pellet is resuspended, and a reverse transcriptase reaction is performed with Solexa 3' primers to produce cDNAs from the RNAs.
  • the cDNAs are amplified via PCR, and the PCR products run on a gel. DNA fragments between 60 and 100 base pairs in size are extracted from the gel.
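
The streptavidin capture step in the example above gives a bead capacity (1 mg of beads per 100 pmol of biotin) and assumes cDNAs are produced from the RNAs at a 1:1 ratio. The arithmetic for choosing an excess of beads can be sketched as follows. This is an illustrative calculation only: the function name, the fold-excess parameter, and the assumption of one biotin per cDNA (one biotin-conjugated oligo dT primer per molecule) are hypothetical and are not taken from the source.

```python
# Stated in the example: 1 mg of streptavidin beads can hold 100 pmol of biotin.
BEAD_CAPACITY_PMOL_PER_MG = 100.0


def beads_needed_mg(cdna_pmol, excess_fold=2.0):
    """Estimate the bead mass (mg) needed to capture biotinylated cDNA.

    Assumptions (hypothetical, for illustration):
      - each cDNA carries exactly one biotin from its oligo dT primer;
      - excess_fold is an arbitrary safety margin, not a value from the source.
    """
    if cdna_pmol < 0 or excess_fold <= 0:
        raise ValueError("cdna_pmol must be >= 0 and excess_fold must be > 0")
    return cdna_pmol * excess_fold / BEAD_CAPACITY_PMOL_PER_MG


# Example: 50 pmol of subtractive cDNA with a two-fold bead excess.
mass_mg = beads_needed_mg(50.0, excess_fold=2.0)
print(f"{mass_mg:.2f} mg of beads")
```

Under the 1:1 assumption, the picomoles of cDNA can be taken as equal to the picomoles of polyadenylated RNA used for reverse transcription.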

Abstract

Provided are methods of separating one or more alternate isoform of a target gene, which alternate isoform is present in a sample population of RNAs but not in a subtractive population of RNAs. Compositions comprising a subpopulation of RNA:cDNA duplexes and a subpopulation of unhybridized RNA fragments are also provided.

Description

METHODS FOR IDENTIFYING DIFFERENCES IN ALTERNATIVE SPLICING
BETWEEN TWO RNA SAMPLES
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to and benefit of U.S. Provisional Patent
Application 61/201,372, entitled, "Methods for Identifying Differences in Alternative Splicing Between Two RNA Samples," by Yeo, Scolnick, and Gage, filed December 9, 2008, the disclosure of which is incorporated herein in its entirety for all purposes.
FIELD OF THE INVENTION
[0002] The invention relates to molecular methods that can be used in highly parallel genome-wide analyses to identify alternative, e.g., unique and/or previously uncharacterized, splice isoform(s) of one or more genes of interest.
BACKGROUND OF THE INVENTION
[0003] Alternative splicing (AS) is a major source of protein diversity in higher eukaryotic organisms, and this process is frequently regulated in a developmental stage-specific or tissue-specific manner (Romero, et al. (2006) "Alternative splicing in concert with protein intrinsic disorder enables increased functional diversity in multicellular organisms." Proc Natl Acad Sci USA 103: 8390-8395; Yeo, et al. (2004) "Variation in alternative splicing across human tissues." Genome Biol 5: R74). Recent estimates suggest that 50% - 75% of multi-exon genes in the human genome undergo AS (Modrek, et al. (2001) "Genome-wide detection of alternative splicing in expressed sequences of human genes." Nucl Acids Res 29: 2850-2859), generating multiple mRNA isoforms that can produce proteins with distinct properties and different (even antagonistic) functions. In addition, a number of genetic mutations involved in human disease have been mapped to changes in splicing signals or sequences that regulate splicing (Faustino, et al. (2003) "Pre-mRNA splicing and human disease." Genes Dev 17: 419-437). Thus, determining what kinds of splice isoforms are present in a population of RNAs can be critical to a comprehensive understanding of biological regulation and disease.
[0004] Strategies that have previously been used to identify alternate splice isoforms of a target gene include, e.g., cDNA library screening, northern blotting, RT-PCR, 5' and 3' RACE, cloning, and EST sequencing. However, these conventional techniques can be laborious, and they are often not sensitive enough to detect low abundance transcripts. Microarray technology can be used for highly parallel, genome-wide studies of alternative splicing. However, probe designs and labeling protocols used for microarray experiments tend to be biased towards the 3' end of the gene, and unless multiple probes are designed to match each specific exon-exon junction that might be spliced together in an alternative splicing event, certain splice isoforms can remain undetected and uncharacterized. EST databases can be used to compare, e.g., mRNA splice variants across tissues, but splice events in which introns are retained in the mature mRNA are difficult to distinguish from database artifacts that comprise, e.g., pre-mRNA or genomic sequences.
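
To illustrate why probing every exon-exon junction becomes burdensome, the number of candidate junctions can be counted under a simplifying assumption that is not part of the source: any upstream exon may be joined to any downstream exon, exon order is preserved, and alternative 5' or 3' splice sites within an exon are ignored. Under that assumption a gene with n exons has n(n-1)/2 possible junctions. The function name below is hypothetical.

```python
from math import comb


def possible_junctions(n_exons):
    """Count exon-exon junctions when any upstream exon can join any
    downstream exon (simplifying assumption; ignores alternative splice
    sites within exons and intron retention)."""
    if n_exons < 0:
        raise ValueError("n_exons must be non-negative")
    return comb(n_exons, 2)


# Junction probes required under this assumption for genes of various sizes.
for n in (5, 10, 20):
    print(n, "exons:", possible_junctions(n), "candidate junctions")
```

The count grows quadratically with exon number, whereas only n-1 junctions are present in the isoform that includes every exon.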
[0005] What is needed in the art are new compositions and highly parallel methods that can be used for the genome-wide detection of splice isoforms of one or more target gene that are present in, e.g., a tissue of interest at a developmental stage of interest, an organism of interest diagnosed with a disease of interest, and the like. The invention described herein fulfills these and other needs, as will be apparent upon review of the following.
SUMMARY OF THE INVENTION
[0006] The present invention provides methods and related compositions useful for identifying one or more additional splice isoforms of, e.g., a target gene that is present in a sample population of RNAs but not in a subtractive population of RNAs. In the methods, mRNA species common to both the subtractive and sample RNA populations are removed, leaving mRNA splice isoform(s), e.g., cell type-specific splice isoforms, tissue-specific splice isoforms, disease-specific splice isoforms, and the like, that are unique to the sample population for further manipulation and analysis, e.g., sequencing, e.g., using an automated high-throughput sequencing system. The methods and compositions provided by the invention can advantageously permit the identification of splice isoforms that would be otherwise difficult to isolate using methods that entail, e.g., designing probes specific to all possible exon-exon junctions, a priori knowledge of coding regions in a pre-mRNA, a sequenced genome, etc. Correspondingly, the invention provides methods of determining whether a sample population of RNAs includes one or more additional splice isoforms that are not present in a subtractive population of RNAs.
[0007] Thus, in a first aspect, the invention provides methods of separating one or more alternate isoform of a target gene from a sample population of RNAs. The methods include providing a subtractive population of RNAs that comprises a first isoform of the target gene and a sample population of RNAs that comprises at least one or more alternate isoform of the target gene. The methods include reverse transcribing the subtractive RNAs to generate a population of subtractive cDNAs, fragmenting the sample RNAs, e.g., via enzymatic digestion, sonication, mechanical shearing, electrochemical cleavage, chemical cleavage, and/or nebulization, and hybridizing the resulting RNA fragments to the subtractive cDNAs. Following hybridization, RNA fragments in the sample population that comprise the first isoform of the target gene hybridize to the subtractive cDNAs to produce a subpopulation of RNA: cDNA duplexes, and RNA fragments comprising one or more alternate splice isoform(s) of the target gene comprise a subpopulation of unhybridized RNAs. The methods include removing the RNA: cDNA duplexes from unhybridized RNA fragments, thereby separating the one or more alternate splice isoform of the target gene from the sample population of RNAs.
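
The logic of the first aspect can be sketched in silico on toy sequences. This is only a conceptual model, not the laboratory method: hybridization is approximated as an exact match between a fragment and a subtractive cDNA, the fragment size and step are arbitrary, the exon sequences are made up, and all function names are hypothetical.

```python
# Conceptual model only: "hybridization" is an exact-match test on toy sequences.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def reverse_transcribe(rna):
    """Return the cDNA (DNA, written 5'->3') complementary to an RNA."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def fragment(rna, size, step):
    """Cut an RNA into overlapping fixed-size fragments (hypothetical scheme)."""
    return [rna[i:i + size] for i in range(0, len(rna) - size + 1, step)]


def subtract(sample_rnas, subtractive_rnas, size=8, step=2):
    """Return sample fragments with no perfect complement among the
    subtractive cDNAs, i.e., the 'unhybridized' subpopulation."""
    cdnas = [reverse_transcribe(rna) for rna in subtractive_rnas]
    unhybridized = []
    for rna in sample_rnas:
        for frag in fragment(rna, size, step):
            partner = reverse_transcribe(frag)  # sequence a cDNA must contain to pair
            if not any(partner in cdna for cdna in cdnas):
                unhybridized.append(frag)
    return unhybridized


# Made-up exons: the first isoform includes all three; the alternate skips exon 2.
EXON1, EXON2, EXON3 = "AUGGCCAUUG", "CCGGAAUUCC", "GGAUCCUAAG"
first_isoform = EXON1 + EXON2 + EXON3
alternate_isoform = EXON1 + EXON3

leftover = subtract([first_isoform, alternate_isoform], [first_isoform])
print(leftover)
```

In this toy case the only fragments that survive subtraction are those spanning the exon 1-exon 3 junction, since every other fragment of the sample has a complement among the subtractive cDNAs.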
[0008] The methods optionally include reverse transcribing the unhybridized RNA fragments, sequencing the resulting cDNAs, e.g., using an automated high-throughput sequencing system, and comparing the sequences of the cDNAs to a sequence of the target gene, e.g., to characterize the additional isoforms of the target gene. Reverse transcribing the unhybridized RNA fragments can optionally include ligating linkers, e.g., linkers that optionally comprise, e.g., a primer hybridization sequence and, e.g., any one or more of the moieties described below, to first ends of the fragments, annealing DNA primers to the linkers, and extending the primers with a reverse transcriptase to produce the cDNAs.
[0009] As will be appreciated by one of skill in the art, the subtractive and sample populations of RNAs can be derived from any of a variety of cell types in any of a variety of organisms, e.g., mammals. For example, the sample population of RNAs can optionally be derived from a first cell type, e.g., a non-disease cell, from an organism, and the subtractive population of RNAs can be derived from a second cell type, e.g., a disease cell, from the same organism. Optionally, the sample population can be derived from a particular cell type in an organism and the subtractive population of RNAs can be derived from the same cell type in a second organism of the same species. Similarly, a sample population of RNAs can optionally be derived from a cell type in an organism at a first developmental stage, and the subtractive population of RNAs can be derived from the same cell type in the same organism at a second developmental stage. Optionally, the sample population of RNAs can be derived from a first cell type from a first organism, which cell type has been exposed to a first treatment, and the subtractive population of RNAs can be derived from the same cell type that has been exposed to a second treatment.
Optionally, the sample population of RNAs can be derived from a cell, and the subtractive RNAs can be derived from a synthetic source, e.g., a population of oligos comprising a defined set of known splice isoforms. This particular embodiment can be of beneficial use in diagnostic and prognostic assays.
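
The optional step of comparing the sequenced cDNAs to a sequence of the target gene, described in paragraph [0008], could be approached computationally as sketched below. This is a minimal illustration under stated assumptions: the exon sequences of the target gene are already known, reads are matched exactly, and a junction is recognized by a short flank from each side. The exon sequences, the flank length, and the function names are hypothetical.

```python
def junction_library(exons, flank):
    """Map each (upstream, downstream) exon pair, numbered from 1, to the
    short sequence expected across that junction."""
    library = {}
    for i in range(len(exons)):
        for j in range(i + 1, len(exons)):
            library[(i + 1, j + 1)] = exons[i][-flank:] + exons[j][:flank]
    return library


def spanned_junctions(read, exons, flank=4):
    """Return the exon pairs whose junction sequence occurs exactly in a read."""
    return [pair for pair, seq in junction_library(exons, flank).items()
            if seq in read]


# Made-up exon sequences of a target gene, written as DNA to match cDNA reads.
EXONS = ["ATGGCCATTG", "CCGGAATTCC", "GGATCCTAAG"]

skipping_read = "CCATTGGGATCC"   # joins exon 1 directly to exon 3
canonical_read = "GAATTCCGGATC"  # joins exon 2 to exon 3

print(spanned_junctions(skipping_read, EXONS))
print(spanned_junctions(canonical_read, EXONS))
```

A read that reports a non-adjacent exon pair, such as exon 1 joined to exon 3, would indicate an exon-skipping isoform. Real reads would call for an alignment tool that tolerates mismatches rather than exact substring matching.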
[0010] Reverse transcribing the subtractive population of RNAs can optionally include annealing tagged DNA primers to 3' ends of the RNAs in the subtractive population and extending the tagged primers with a reverse transcriptase. The tags on the DNA primers can optionally include any of a variety of moieties, e.g., one or more ligand, fluorescent label, blocking group, phosphorylated nucleotide, nucleotide analog, fluorinated nucleotide, nucleotide comprising a heavy atom, biotinylated nucleotide, methylated nucleotide, uracil, sequence capable of forming hairpin secondary structure, oligonucleotide hybridization site, restriction site, DNA promoter, protein binding site, sample or library identification sequence, a thiol linker, a phosphorothioated nucleotide, an amine-reactive nucleotide, and/or cis regulatory sequence.
[0011] Removing the subpopulation of RNA:cDNA duplexes from the unhybridized RNAs, e.g., to isolate the alternate isoform(s) of a target gene, can optionally include, e.g., electrophoresis or digesting the RNA:cDNA duplexes with RNAseH and a DNAse. Optionally, the methods can further include annealing tagged DNA primers to 3' ends of the RNAs in the subtractive population and extending the tagged primers with a reverse transcriptase to produce tagged cDNAs. RNA fragments can be hybridized to the population of tagged cDNAs to produce a subpopulation of RNA:tagged cDNA duplexes and a subpopulation of unhybridized RNAs; and the RNA:tagged cDNA duplexes can be separated from the unhybridized RNA fragments via affinity purification.
[0012] In alternative embodiments of the methods, the subtractive and sample populations of RNAs can be reversed. For example, cDNAs which are produced by reverse transcribing sample RNAs, e.g., using any of the methods described above, can be hybridized to RNA fragments produced from a subtractive population of RNAs, e.g., using any of the methods described above. The RNA:cDNA duplexes can then be removed from unhybridized RNA fragments to identify one or more splice isoform, e.g., that is present in the subtractive population of RNAs and absent from the sample population of RNAs. These RNA fragments can optionally be reverse transcribed and analyzed, as described above. Accordingly, the methods provided by the invention can be used to determine whether there are any additional splice isoforms of a target gene in the subtractive population of RNAs not present in the sample population of RNAs.
[0013] Relatedly, the invention provides compositions that include a subpopulation of RNA: cDNA duplexes and a subpopulation of unhybridized RNA fragments that have been produced by the methods described above. It will be apparent to those of skill in the art that the methods and compositions provided by the invention can optionally be used alone or in combination.
[0014] Kits are also a feature of the invention. The present invention provides kits that include useful reagents, e.g., tagged DNA primers, affinity columns, and/or one or more enzymes that are used in the methods, e.g., a reverse transcriptase, a DNA polymerase, etc. Such reagents are most preferably packaged in a fashion to enable their use. The kits of the invention optionally include additional reagents, such as control target nucleic acids, buffer solutions and/or salt solutions, including, e.g., divalent metal ions, i.e., Mg++, Mn++ and/or Fe++, nucleic acid adapter tags, e.g., to prepare unhybridized RNA fragments for sequencing, e.g., using a currently available or future automated high-throughput sequencing system. Such kits also typically include a container to hold the kit components, instructions for use of the compositions, e.g., to practice the methods, and other reagents in accordance with the desired application methods, e.g., identifying exon-exon junctions, or other characteristics of alternately spliced mRNA isoforms of a target gene.
DEFINITIONS
[0015] Before describing the present invention in detail, it is to be understood that this invention is not limited to particular devices or biological systems, which can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. As used in this specification and the appended claims, the singular forms "a", "an" and "the" include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to "an isoform of a target gene" includes a combination of two or more isoforms; reference to "an RNA" optionally includes multiple copies of the RNA, and the like.
[0016] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred materials and methods are described herein. In describing and claiming the present invention, the following terminology will be used in accordance with the definitions set out below.
[0017] Derived from: As used herein, "derived from" is used to refer to the original source organism, tissue, cells, etc. from which, e.g., a population of RNAs to be used with the methods of the invention, was obtained. For example, populations of RNAs can be derived from, e.g., a cell line or a eukaryotic organism, including, but not limited to, mammals, nematodes, insects, etc.
[0018] Linker: As used herein, a linker is a short, single-stranded nucleic acid 2-20 nucleotides in length that can be attached to a single-stranded nucleic acid, e.g., an RNA, via ligation or by extending the linker, e.g., with a reverse transcriptase. A nucleic acid linker can include any one or more of an oligonucleotide hybridization site, a restriction site, a DNA promoter, a protein binding site, a sample or library identification sequence, a thiol linker, a phosphorothioated nucleotide, an amine-reactive nucleotide, a cis regulatory sequence, a modified nucleotide or nucleotide analog, and/or the like.
[0019] Subtractive cDNA: As used herein, "subtractive cDNAs" are the cDNAs that are produced from the reverse transcription of a subtractive population of RNAs. In particular embodiments of the methods, subtractive cDNAs are hybridized to a sample population of RNAs, and, thus, species common to both the subtractive and sample RNA populations form RNA: cDNA duplexes. Using the methods herein, these RNA: cDNA duplexes are removed from or "subtracted" from the hybridization reaction. Subtractive cDNAs can optionally comprise a tag that includes, e.g., any one or more of the moieties described herein, and such tags can permit the removal/separation of the resulting RNA: cDNA hybrids from unhybridized RNA fragments, e.g., fragments derived from sample RNAs that comprise additional isoform(s) of one or more target gene.
[0020] Subtractive RNAs: As used herein, "subtractive RNAs" (or, alternately, a "subtractive population of RNAs") are a reference population of RNAs that are used to remove mRNA species that are common to both a subtractive RNA population and the sample RNA population to which the subtractive population is being compared. Subtractive cDNAs are produced from subtractive RNAs, e.g., via reverse transcription. Subtractive RNAs can be derived from any of a number of sources, e.g., cells, tissues, organisms, etc., and typically represent a control population of RNAs to which, e.g., a sample population of RNAs is compared to, e.g., identify one or more additional splice isoforms of a target gene that are present in the sample population of RNAs and absent from the subtractive population of RNAs.
[0021] Tags: As used herein, a "tag" refers to a moiety linked to a molecule of interest that can be used as a molecular label to detect the molecule of interest in a population and/or as a means by which to separate the molecule of interest from the population. For example, tags can be hybridized to the ends of the nucleic acid fragments and extended with a polymerase to produce tagged fragments or ligated to the ends of the nucleic acid fragments with a ligase. Tags can comprise any one or more moieties that include, e.g., a ligand, a fluorescent label, a blocking group, a phosphorylated nucleotide, a nucleotide analog, a fluorinated nucleotide, a nucleotide comprising a heavy atom, a biotinylated nucleotide, a methylated nucleotide, a uracil, a sequence capable of forming hairpin secondary structure, an oligonucleotide hybridization site, a restriction site, a DNA promoter, a protein binding site, a sample or library identification sequence, a thiol linker, a phosphorothioated nucleotide, an amine-reactive nucleotide, and/or a cis regulatory sequence. Optionally, a tag can comprise a linker, described above.
[0022] Treatment: As used herein, "treatment" refers to a defined set of experimental conditions to which, e.g., a source organism, a source tissue, a source cell line, etc. was exposed prior to the collection of its RNA. For example, a treatment can include, e.g., exposing a clonal population of undifferentiated T cells to a defined set of cytokines for a defined length of time; exposing a disease cell line to a drug; etc.
BRIEF DESCRIPTION OF THE DRAWINGS
[0023] Figure 1 provides a schematic depiction of a subtractive population of RNAs and a sample population of RNAs.
[0024] Figure 2 provides a schematic that depicts how methods of the invention can be used to isolate an isoform of a target gene that is found only in the sample population. Figure 2 also illustrates embodiments of the compositions that are provided by the invention.
DETAILED DESCRIPTION OF THE INVENTION
OVERVIEW
[0025] The impact of alternative pre-mRNA splicing (AS) on protein function and expression is largely uncharacterized. Determining which splice isoforms, e.g., unique splice isoforms, are present in, e.g., a cell or tissue, e.g., at a specific developmental stage, in a disease state, or in response to a particular environmental stimulus, can provide insight into regulatory pathways, and can thus be advantageously useful in drug discovery and/or diagnostic efforts. The methods and compositions provided herein can be used in combination for the reliable identification of splice variants of a target gene, e.g., variants that are present in a sample population of RNAs but absent from a second population of RNAs to which the sample population is being compared. The methods are also useful in determining whether a sample population of RNAs includes one or more unique splice isoforms that are not present in the second population of RNAs.
[0026] The methods of the invention provide several advantages over currently available technologies, e.g., cDNA library screening, northern blotting, RT-PCR, 5' and 3' RACE, cloning, EST sequencing, and microarray technology. First, conventional cDNA library screening and Northern blotting are often not sensitive enough to detect low abundance alternate splice isoforms, e.g., isoforms that are present in a population in a single copy or very few copies. Molecular techniques such as RT-PCR, cloning, EST sequencing, and 5' or 3' RACE (Rapid Amplification of cDNA Ends), are laborious, and these methods can become impracticable, and costly, if scaled to the degree necessary to perform genome-wide analyses. While microarray technologies permit highly parallel analyses of splice isoforms in RNA populations, microarray platforms are expensive. Additionally, the ability to identify differential splice isoforms of a target gene using a microarray depends on whether each exon-exon junction that can be produced by a splicing event is represented by a sufficient number of probes in the array. Furthermore, splice isoforms comprising five or fewer exons can be difficult to detect (Robinson et al. (2009) "Differential splicing using whole-transcript microarrays." BMC Bioinformatics 10: 156). In contrast, the methods provided herein are cost-effective, highly parallel, and can be used to efficiently identify alternative splice isoforms present in an RNA sample, e.g., regardless of their copy number or the number of exons they comprise.
[0027] The detailed description is organized to first elaborate the methods and compositions provided by the invention for isolating, e.g., unique or previously uncharacterized RNA splice isoforms of one or more target genes of interest that are present in a sample population of RNAs and absent from a subtractive population of RNAs. Next, details regarding alternative splicing are described. Details regarding sequencing reactions and high-throughput sequencing systems are then provided. Kits, systems, and broadly applicable molecular biological techniques that can be used to perform any of the methods are described thereafter.
METHODS AND COMPOSITIONS FOR IDENTIFYING DIFFERENCES IN ALTERNATIVE SPLICE ISOFORMS BETWEEN TWO RNA SAMPLES
[0028] As described above, the methods provided herein can be used to determine whether one population of RNAs includes one or more different RNA splice isoforms that are not present in a second population. Accordingly, the methods can be performed to determine whether the first population of RNAs includes a particular RNA splice isoform, or any additional RNA splice isoforms, not present in a second population of RNAs. To perform the methods of the invention, two populations of RNA are provided: subtractive population 100 and sample population 125 (see Figure 1). Subtractive RNA population 100, which comprises splice isoform 101 of a target gene, is reverse transcribed to produce subtractive population of cDNAs 115. In preferred embodiments, the cDNAs in population 115 are synthesized by annealing tagged primers to subtractive RNAs 100 and extending the primers, e.g., with a reverse transcriptase, such that each cDNA 110 in population 115 comprises tag 105. Sample RNA population 125, which comprises splice isoforms 101 and 102 of the target gene, is fragmented, e.g., using any one or more of the methods described herein, to produce population of RNA fragments 130.
[0029] RNA populations 100 and 125 can be derived from any of a variety of sources, e.g., cells, tissues, organs, etc, in which differences in AS patterns may be of interest to the practitioner. For example, where differences in splicing patterns between cell types are to be examined, subtractive RNA population 100 can be derived from a first cell type in an organism, and sample RNA population 125 can be derived from a second cell type in the same organism. Alternately, subtractive RNA population 100 can be derived from a first cell type in a first organism, and sample RNA population 125 can be derived from a first cell type in a second organism of the same species as the first organism. Similarly, RNA populations 100 and 125 can be obtained from the same tissue at different developmental stages or, e.g., from a disease cell and a non-disease cell. Optionally, subtractive RNA population 100 can be derived from a cell, and sample RNA population 125 can be derived from a synthetic source, e.g., a population of oligos comprising a defined set of known splice isoforms, e.g., isoforms 101 and 102.
[0030] In alternate embodiments of the methods, RNA fragments 130 can be produced from RNA population 100, and cDNAs 115 can be derived from RNA population 125.
[0031] In the methods, RNA fragments 130 are then hybridized to subtractive cDNAs 115 to produce composition 200 (see Figure 2A). Composition 200, which is provided by the invention, includes a subpopulation of RNA: cDNA duplexes 210, which subpopulation comprises cDNAs from population 115 and the complementary RNA fragments from population 130 to which they hybridize. Also included in composition 200 are unhybridized RNA fragments 215, e.g., RNA fragments from population 130 for which there are no complementary cDNA sequences in population 115. For example, an RNA: cDNA duplex 210, e.g., that is present in composition 200, can comprise cDNA 250 which includes, e.g., tag 105, and encodes the reverse transcript of splice isoform 101, and RNA fragments 245, which comprise subsequences of splice isoforms 101 and 102 that can hybridize to cDNA 250. Similarly, RNA fragments 215 can comprise subsequences 240 of splice isoform 102, e.g., RNA subsequences that do not hybridize to any cDNA sequences in population 115. Thus, the invention also provides methods of determining whether a particular splice isoform, e.g., isoform 101 and/or 102, is present in a population of RNAs, e.g., population 100.
[0032] RNA: cDNA duplexes 210 are then removed from population 200, e.g., via enzymatic digestion, electrophoresis, affinity purification, or the like (see Figure 2B). For example, DNase and RNase H can be added to population 200 to digest the cDNAs and the RNA fragments hybridized to the cDNAs, respectively, in duplexes 210. In alternate embodiments of the methods in which the subtractive cDNAs comprise a tag (e.g., tag 105), population 200 can be run over an affinity column that comprises a moiety that binds tag 105. Only unhybridized RNA fragments 215 will be eluted from the column. In other embodiments, the RNA: cDNA duplexes (e.g., duplexes 210), which are typically of a higher molecular weight than the unhybridized RNA fragments, e.g., fragments 215, can be separated from the unhybridized RNA fragments via electrophoresis.
[0033] As described above, the methods facilitate the removal of mRNA species common to both subtractive RNA population 100 and sample RNA population 125, while mRNA fragments 215, e.g., cell type-specific splice isoforms, tissue-specific splice isoforms, disease-specific splice isoforms, and/or the like, that are unique to sample population 125, remain and can be further characterized. Such analyses can include, e.g., ligating linkers to the RNA fragments in preparation for reverse transcription. The reverse transcribed fragments can then be sequenced, optionally using any of a variety of high-throughput sequencing platforms described below.
[0034] Accordingly, the methods of the invention can also be used to determine whether the sample population of RNAs comprises any splice isoforms of a target gene that are not present in the subtractive population, and/or vice versa.
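By way of illustration only, the subtraction described above can be sketched in silico. The following Python sketch is not part of the disclosed methods; it assumes perfect-match hybridization, fixed-size overlapping fragments in place of random fragmentation, and three hypothetical exon sequences (E1, E2, E3) chosen solely for the example. Isoform 101 is modeled as E1-E3 and isoform 102 as E1-E2-E3, and the function names are likewise hypothetical.

```python
# Toy in silico model of subtractive hybridization (illustrative assumptions only).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}  # RNA base -> cDNA base


def reverse_transcribe(rna):
    """Return the cDNA (written 5'->3') complementary to an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def fragment(rna, size, step):
    """Cut an RNA into overlapping windows, a stand-in for random fragmentation."""
    return [rna[i:i + size] for i in range(0, len(rna) - size + 1, step)]


def subtract(sample_rnas, subtractive_rnas, size=8, step=4):
    """Partition sample fragments into duplex-forming and unhybridized sets."""
    cdnas = [reverse_transcribe(r) for r in subtractive_rnas]
    duplexes, unhybridized = [], []
    for rna in sample_rnas:
        for frag in fragment(rna, size, step):
            # A fragment pairs with a cDNA only if the cDNA contains its
            # exact reverse complement (perfect-match assumption).
            probe = reverse_transcribe(frag)
            if any(probe in cdna for cdna in cdnas):
                duplexes.append(frag)
            else:
                unhybridized.append(frag)
    return duplexes, unhybridized


# Hypothetical exon sequences, invented for this example.
E1, E2, E3 = "AUGGCCAAGU", "CCGUUAGGCA", "GGAUCCUUGA"
isoform_101 = E1 + E3        # exon 2 skipped
isoform_102 = E1 + E2 + E3   # exon 2 included

duplexes, unhybridized = subtract(
    sample_rnas=[isoform_101, isoform_102],
    subtractive_rnas=[isoform_101],
)
print(unhybridized)
```

Under these assumptions, the only fragments left unhybridized are those that overlap exon 2 or one of its two junctions, i.e., the sequences that distinguish isoform 102 from isoform 101.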
FURTHER DETAILS REGARDING ALTERNATIVE SPLICING
[0035] Alternative pre-mRNA splicing (AS) is a precisely regulated process in which a pre-mRNA's exons are separated and reconnected in different combinations to produce alternative mature mRNA species that encode multiple protein isoforms. For example, alternative splicing can alter the function of proteins by removing or adding specific domains, e.g., nuclear localization signals, transcription activation domains, DNA or RNA binding domains, trans-membrane domains, phosphorylation sites, and/or post-translational modification sites. Additionally or alternatively, alternative splicing can cause substantial changes in protein structure by altering even just a few residues at a splice site (Davletov, et al. (2004) "Sculpting a domain by splicing." Nat Struct Biol 11: 4-5). AS can also generate variability within untranslated regions of mRNAs which affect gene expression by adding or removing mRNA elements that, e.g., regulate translation efficiency, mRNA stability, or intracellular localization.
[0036] AS has been observed in nearly all multicellular organisms. For example, bioinformatic analyses based on EST sequences and exon-exon junction microarray studies estimate that 59%-74% of human genes are alternatively spliced (Johnson, et al. (2003) "Genome-wide array of human alternative pre-mRNA splicing with exon junction microarrays." Science 302: 2142-2144; Kan, et al. (2001) "Gene structure prediction and alternative splicing analysis using genomically aligned ESTs." Genome Res 11: 889-900), indicating that AS is one major source of protein diversity in humans. The methods and compositions provided by the invention can permit the identification of splice isoforms that would be otherwise difficult to isolate using currently available methods that entail, e.g., designing probes specific to all possible exon-exon junctions, having a priori knowledge of coding regions in a pre-mRNA, knowing the complete sequence of, e.g., a large mammalian genome, etc.
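To illustrate why designing probes for all possible exon-exon junctions scales poorly, the following Python sketch enumerates the candidate junctions of a single gene. It is a simplified model offered as an assumption, not a statement of the invention: exons are taken to retain their genomic order, so a junction joins exon i to a downstream exon j, and alternative 5' or 3' splice sites and intron retention are ignored. The function name is hypothetical.

```python
from itertools import combinations


def possible_junctions(n_exons):
    """List every (upstream, downstream) exon pair that splicing could join."""
    return list(combinations(range(1, n_exons + 1), 2))


# A 4-exon gene has 6 candidate junctions; a 20-exon gene has 190.
print(possible_junctions(4))
print(len(possible_junctions(20)))
```

Under this model a gene with n exons has n(n-1)/2 candidate junctions, so the number of junction probes required grows quadratically with exon count.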
[0037] AS events can undergo regulation in which splicing pathways are modulated according to, e.g., cell type (see, e.g., Cooper, T. A. (2005) "Alternative splicing regulation impacts heart development." Cell 120: 59-72), developmental stage (see, e.g., Barberan-Soler, et al. (2008) "Alternative Splicing Regulation During C. elegans Development: Splicing Factors as Regulated Targets." PLoS 4: e1000001), gender (see, e.g., Chang, et al. (2005) "Age and gender-dependent alternative splicing of P/Q-type calcium channel EF-hand." Neuroscience 145: 1026-1036), external stimuli (see, e.g., Keller, et al. (2007) "Extracellular Matrix Gene Alternative Splicing by Trabecular Meshwork Cells in Response to Mechanical Stretching." Investigative Ophthalmology and Visual Science 48: 1164-1172), and/or other factors. Splicing is carried out by the spliceosome, a large complex comprising over 100 core proteins and 5 small nuclear RNAs (snRNAs) (described in, e.g., Smith, et al. (2008) "'Naught may endure but mutability': spliceosome dynamics and the regulation of splicing." Mol Cell 30: 657-66; Matlin, et al. (2007) "Spliceosome assembly and composition." Adv Exp Med Biol 623: 14-35; Valadkhan, S. (2007) "The spliceosome: caught in a web of shifting interactions." Curr Opin Struct Biol 17: 310-5). RNA splicing is dependent upon the identity of nucleotide sequences, or "core splicing signals" at the 5' splice site, the 3' splice site, and the branch point. Each of these cis elements is recognized multiple times by spliceosome proteins during spliceosome assembly. In addition to these elements, additional sites, e.g., exonic splice enhancers (ESEs), exonic splice silencers (ESSs), intronic splice enhancers (ISEs), and intronic splice silencers (ISSs), can recruit additional trans-acting splicing factors that activate or suppress, e.g., splice site recognition, spliceosome assembly, etc. Such cis-acting elements are described in further detail in, e.g., Matlin (2005) "Understanding alternative splicing: Toward a cellular code." Nat Rev Mol Cell Biol 6: 386-398; Chasin, L. (2007) "Searching for Splicing Motifs." In Alternative splicing in the postgenomic era (Eds. B. J. Blencowe and B. R. Graveley), pp. 85-106. Landes Biosciences: Austin TX; Wang, et al. (2008) "Splicing regulation: From a parts list of regulatory elements to an integrated splicing code." RNA doi: 10.1261/rna.876308; House, et al. (2008) "Regulation of alternative splicing: more than just the ABCs." Journ Biol Chem 283: 1217-1221; and others.
[0038] The methods and compositions of the invention can be beneficially used to determine which splice isoforms are present in, e.g., a cell or tissue, e.g., at specific developmental stages or in response to particular environmental stimuli, and such data can inform drug discovery and diagnostics efforts. In humans, protein-rich target genes of interest whose splice variants can be further characterized using the invention include, e.g., cadherins, which play roles in cell adhesion. Cadherins are involved in morphogenesis of tissues such as the neural tube, and their misexpression has been implicated in human malignancies (Wheelock et al. (2003) "Cadherins as Modulators of Cellular Phenotype." Annu Rev Cell Dev Biol 19: 207-235). Neurexins, which play roles in presynaptic and postsynaptic development (Chih et al. (2006) "Alternative Splicing Controls Selective Trans-Synaptic Interactions of the Neuroligin-Neurexin Complex." Neuron 51: 171-178), and calcium-activated potassium channels, which play roles in neuronal excitation (Jurkat-Rott et al. (2004) "The impact of splice isoforms on voltage-gated calcium channel alpha1 subunits." J Physiol 554: 609-19), are other genes whose alternative splice isoforms can be advantageously identified using the methods described herein.
[0039] AS and its disruption can also influence the susceptibility of an individual to a disease and/or the disease's severity (Wang, et al. (2007) "Splicing in disease: disruption of the splicing code and the decoding machinery." Nat Rev Genet 8: 749-761; Srebrow, et al. (2006) "The connection between splicing and cancer." Journ Cell Sci 119: 2635-2641; Faustino, et al. (2003) "Pre-mRNA splicing and human disease." Genes Dev 17: 419-437). For example, alternative splicing defects have been identified as the cause of numerous diseases including β-thalassemia, cystic fibrosis, and premature aging (Faustino, et al. (2003) "Pre-mRNA splicing and human disease." Genes Dev 17: 419-437). Determining how the splicing of, e.g., a target gene, is altered in, e.g., a disease cell, can inform strategies directed at reversing or circumventing misregulated splicing events. Thus, the methods provided herein can be used to detect whether alternate RNA splice isoforms are present in a population of RNAs derived from a patient, e.g., as compared to a subtractive population of RNAs, to make a diagnosis, predict a prognosis, or inform a drug regimen.
[0040] Further details regarding spliceosome proteins and splicing mechanism can be found in, e.g., Jurica (2008) "Detailed close-ups and the big picture of spliceosomes." Curr Opin Struct Biol 18: 315-20; Schellenberg, et al. (2008) "Pre-mRNA splicing: a complex picture in higher definition." TIBS 33: 243-6; Neubauer (2005) "The analysis of multiprotein complexes: the yeast and the human spliceosome as case studies." Methods Enzymol 405: 236-63; Konarska, et al. (2005) "Insights into the mechanisms of splicing: more lessons from the ribosome." Genes Dev 9: 2255-60; and others. Details regarding AS regulation are elaborated in, e.g., (2007) Alternative splicing in the postgenomic era. Eds. B. J. Blencowe and B. R. Graveley, 2007. Landes Biosciences: Austin TX; Lareau, et al. (2007) "The coupling of alternative splicing and nonsense-mediated mRNA decay." Adv Exp Med Biol 623: 190-211; Blaustein, et al. (2007) "Signals, pathways and splicing regulation." Int J Biochem Cell Biol 39: 2031-48; Hagiwara (2005) "Alternative splicing: a new drug target of the post-genome era." Biochim Biophys Acta 1754: 324-31; Schwerk, et al. (2005) "Regulation of apoptosis by alternative pre-mRNA splicing." Mol Cell 19: 1-13; and others. Additional information regarding AS and disease is reviewed in, e.g., Hartmann, et al. (2008) "Diagnostics of pathogenic splicing mutations: does bioinformatics cover all bases?" Front Biosci 13: 3252-72; Orengo, et al. (2007) "Alternative splicing in disease." Adv Exp Med Biol 623: 212-23; Moore, et al. (2008) "Global analysis of mRNA splicing." RNA 14: 197-203; Solis, et al. (2008) "Splicing fidelity, enhancers, and disease." Front Biosci 13: 1926-42; and others.
FURTHER DETAILS REGARDING HIGH THROUGHPUT SEQUENCING SYSTEMS
[0041] The unhybridized RNA fragments that are enriched following the removal of cDNA:RNA hybrids can comprise additional splice isoforms of a target gene, e.g., that are present in a sample population of RNAs and absent from the subtractive population of RNAs or vice versa (see Figure 2 and corresponding description). These RNA fragments can optionally be reverse transcribed, according to methods described elsewhere herein, and sequenced using, e.g., any of a variety of high-throughput DNA sequencing systems (reviewed in, e.g., Chan, et al. (2005) "Advances in Sequencing Technology" (Review) Mutation Research 573: 13-40). See, e.g., Hodges, et al. (2007) "Genome-wide in situ exon capture for selective resequencing." Nat Genet 39: 1522-1527; Olson M (2007) "Enrichment of super-sized resequencing targets from the human genome." Nat Methods 4: 891-892; Porreca, et al. (2007) "Multiplex amplification of large sets of human exons." Nat Methods 4: 931-936.
[0042] One subset of commercial sequencing systems, e.g., those available from Affymetrix and Complete Genomics, Inc., rely on indirect methods of determining a DNA's sequence, e.g., sequencing by hybridization (SBH), in which a sequence of a DNA is assembled based on experimental data obtained from hybridization experiments performed to determine the oligonucleotide content of the DNA chain. See, e.g., Drmanac, et al. (2002) "Sequencing by hybridization (SBH): advantages, achievements, and opportunities." Adv Biochem Eng Biotechnol 11: 75-101. SBH typically employs an array comprising a known arrangement of short oligonucleotides of known sequence, e.g., oligonucleotides representing all possible sequences of a given length. An unknown sequence of, e.g., fluorescently labeled DNA, is fragmented, and the resulting fragments are then hybridized to the oligonucleotide probes in the array. Because the hybridization of a nucleic acid to a short complementary sequence can be sensitive to even single-base mismatches, the hybridization intensity of the labeled nucleic acid fragments to individual probes in the array is computationally assessed to determine the sequences of the fragments. Additional computational approaches are then used to assemble the sequence fragments to determine the entire sequence of the nucleic acid whose fragments were hybridized to the array.
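The computational assembly step of SBH can be illustrated with a minimal Python sketch. It is offered only as a simplified example under stated assumptions and does not describe any commercial system: hybridization is treated as error-free, so the array reports exactly the set of k-mers present in the target, and every (k-1)-mer in the target is assumed to occur once, so each extension step is unambiguous. The function names and the example sequence are hypothetical.

```python
def spectrum(seq, k):
    """Return the set of k-mers an ideal array would detect in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstruct(kmers):
    """Rebuild a sequence from its k-mer spectrum by overlap extension."""
    kmers = set(kmers)
    k = len(next(iter(kmers)))
    suffixes = {kmer[1:] for kmer in kmers}
    # The first k-mer is the only one whose prefix is not another k-mer's suffix.
    starts = [kmer for kmer in kmers if kmer[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("spectrum does not have a unique starting k-mer")
    seq = starts[0]
    remaining = kmers - {seq}
    while remaining:
        tail = seq[-(k - 1):]
        candidates = [kmer for kmer in remaining if kmer[:-1] == tail]
        if len(candidates) != 1:
            raise ValueError("ambiguous or broken extension at " + tail)
        seq += candidates[0][-1]
        remaining.discard(candidates[0])
    return seq


target = "ATGGCGTGCA"  # hypothetical target sequence
print(reconstruct(spectrum(target, 4)))
```

Repeated (k-1)-mers make the extension ambiguous, which is one reason practical SBH assembly requires the additional computational approaches noted above.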
[0043] SOLiD, a commercial sequencing system available from Applied Biosystems, is based on "sequencing by ligation" (SBL), in which the mismatch sensitivity of a DNA ligase enzyme is used to determine the underlying sequence of the target nucleic acid molecule. Briefly, one or more sets of encoded adaptors are ligated to the terminus of a target polynucleotide, e.g., a single-stranded DNA of unknown sequence. Encoded adaptors whose protruding strands form perfectly matched duplexes with the complementary protruding strands of the target polynucleotide are ligated, and the identity of the nucleotides in the protruding strands is determined by an oligonucleotide tag carried by the encoded adaptor. Such determination, or "decoding," is carried out by specifically hybridizing a labeled tag complementary to its corresponding tag on the ligated adaptor.
[0044] Other commercial high-throughput sequencing systems, e.g., those available from 454 Life Sciences, Illumina, and Pacific Biosciences, are based on multiplexed direct sequencing methods, e.g., "sequencing by synthesis" (SBS), in which each base position in a single-stranded DNA template is determined individually during the synthesis of a complementary strand. 454 Sequencing, a technology available from 454 Life Sciences, is a massively parallelized, multiplex pyrosequencing system (Nyren (2007) "The History of Pyrosequencing." Methods Mol Biol 373: 1-14; Ronaghi (2001) "Pyrosequencing sheds light on DNA sequencing." Genome Res 11: 3-11; and Wheeler, et al. (2008) "The complete genome of an individual by massively parallel DNA sequencing." Nature 452: 872-876) that relies on fixing nebulized, adapter-ligated single-stranded DNA fragments to small DNA-capture beads.
[0045] Single molecule real-time sequencing (SMRT) is another massively parallel sequencing technology that can be compatible with the high-throughput resequencing of target nucleic acids isolated from a sample, e.g., by using capture probes synthesized according to any of the methods described previously. Developed and commercialized by Pacific Biosciences, SMRT technology relies on arrays of multiplexed zero-mode waveguides (ZMWs) in which, e.g., thousands of sequencing reactions can take place simultaneously. The ZMW is a structure that creates an illuminated observation volume that is small enough to observe, e.g., the template-dependent synthesis of a single single-stranded DNA molecule by a single DNA polymerase (See, e.g., Levene, et al. (2003) "Zero Mode Waveguides for Single Molecule Analysis at High Concentrations," Science 299: 682-686).
[0046] For example, cDNAs derived from unhybridized RNA fragments, e.g., fragments comprising splice isoforms present in a sample population of RNAs but not in a subtractive population of RNAs or vice versa, can be sequenced using systems that include bridge amplification technologies, e.g., in which primers bound to a solid phase are used in the extension and amplification of solution phase target nucleic acids prior to SBS. (See, e.g., Mercier, et al. (2005) "Solid Phase DNA Amplification: A Brownian Dynamics Study of Crowding Effects." Biophysical Journal 89: 32-42; Bing, et al. (1996) "Bridge Amplification: A Solid Phase PCR System for the Amplification and Detection of Allelic Differences in Single Copy Genes." Proceedings of the Seventh International Symposium on Human Identification, Promega Corporation Madison, WI.) Solexa sequencing, available from Illumina, is one such sequencing system.
FURTHER DETAILS REGARDING MOLECULAR BIOLOGY TECHNIQUES
Subtractive Hybridization
[0047] In the methods provided by the invention, subtractive hybridization is used to identify RNA fragments, e.g., derived from a sample population of RNAs, that encode one or more splice isoforms of one or more target genes, e.g., that are present in a sample population of RNAs but not in a subtractive population of RNAs (see Figure 1 and corresponding description). The principle of this approach relies on the removal of mRNA species common to both the subtractive and sample RNA populations, leaving behind RNA fragments that comprise splice isoform(s) unique to the sample population of RNAs. Such splice isoforms can optionally include, e.g., cell type-specific splice isoforms, tissue-specific splice isoforms, disease-specific splice isoforms, etc. The alternative splice isoforms are thus isolated from the sample RNA population and can be subject to further analysis, e.g., sequencing using an automated high-throughput sequencing system. The methods of the invention can also optionally be used to identify mRNA species that are present in the sample population of RNAs in higher abundance relative to the subtractive population of RNAs.
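The abundance-based use noted above can be pictured with a simple counting model. The following Python sketch is an assumption-laden illustration, not a description of reaction kinetics: it supposes that hybridization goes to completion and that each subtractive cDNA copy removes exactly one matching fragment copy from the sample. The fragment labels and copy numbers are hypothetical.

```python
from collections import Counter


def enrich(sample_fragments, subtractive_targets):
    """Return the copies of each sample fragment left after 1:1 depletion."""
    sample = Counter(sample_fragments)
    captured = Counter(subtractive_targets)
    # Counter subtraction drops any species whose count falls to zero or below.
    return sample - captured


# Hypothetical copy numbers: the exon 1-3 junction is equally abundant in both
# populations, while the exon 2 junctions are more abundant in the sample.
sample = ["junction_1_3"] * 2 + ["junction_1_2"] * 5 + ["junction_2_3"] * 5
subtractive = ["junction_1_3"] * 2 + ["junction_1_2"] + ["junction_2_3"]

remaining = enrich(sample, subtractive)
print(dict(remaining))
```

In this model the equally abundant species is removed entirely, while species that are more abundant in the sample population remain in proportion to their excess.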
[0048] In alternative embodiments of the methods, the subtractive and sample populations of RNAs can be reversed. For example, cDNAs that are produced by reverse transcribing a sample population of RNAs can be hybridized to RNA fragments derived from a subtractive population of RNAs. The RNA: cDNA duplexes can then be removed from unhybridized RNA fragments, e.g., fragments derived from subtractive RNAs, to identify one or more splice isoforms that are both present in the subtractive population of RNAs and absent from the sample population of RNAs. These RNA fragments can optionally be reverse transcribed and further characterized, as described above.
[0049] Subtractive cDNAs, e.g., derived from reverse transcription of the subtractive RNAs, and RNA fragments, e.g., produced from a sample population of RNAs, "hybridize" when they associate, typically in solution. Nucleic acids hybridize due to a variety of well-characterized physico-chemical forces, such as hydrogen bonding, solvent exclusion, base stacking and the like. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes part I chapter 2, "Overview of principles of hybridization and the strategy of nucleic acid probe assays," (Elsevier, New York), as well as in Current Protocols in Molecular Biology, Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., (supplemented through 2004) ("Ausubel"); Hames and Higgins (1995) Gene Probes 1 IRL Press at Oxford University Press, Oxford, England, (Hames and Higgins 1) and Hames and Higgins (1995) Gene Probes 2 IRL Press at Oxford University Press, Oxford, England (Hames and Higgins 2).
[0050] Typically, the results of subtractive hybridization are validated using additional techniques that are well known in the art, e.g., northern blot, in situ hybridization, RT-PCR, and the like. These techniques are described in detail in, e.g., Sambrook et al., Molecular Cloning - A Laboratory Manual (3rd Ed.), Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 2000 ("Sambrook"); and Current Protocols in Molecular Biology, F.M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., (supplemented through 2007) ("Ausubel").
[0051] Further details regarding subtractive hybridization are elaborated in, e.g., Aasheim, et al. (1996) "Subtractive hybridization for the isolation of differentially expressed genes using magnetic beads." Meth Mol Biol 69: 115-128; Wink. "The Investigation of Transcriptional Activity." An Introduction to Molecular Biotechnology. Ed. Michael Wink. Germany: Wiley-VCH, 2006. 334-340; Blumberg, et al. "Subtractive Hybridization and Construction of cDNA Libraries." Methods in Molecular Biology, Vol. 97. Ed. Sharpe and Mason. Totowa, NJ: Humana Press, Inc, 1999. 119-129; Røsok, et al. "Discovery of differentially expressed genes: technical considerations." Methods in Molecular Biology, Vol. 360. Ed. Sharpe and Mason. Totowa, NJ: Humana Press, Inc, 2007. 115-129; Darwin (2005) "Genome-wide screens to identify genes of human pathogenic Yersinia species that are expressed during host infection." Curr Issues Mol Biol 7: 135-49; and others. In addition, subtractive hybridization kits are commercially available, including PCR-Select™ cDNA Subtraction Kit (Clontech).
Preparing RNAs and cDNAs
[0052] The methods described herein include providing two distinct populations of RNAs, e.g., a subtractive population of RNAs and a sample population of RNAs. The subtractive RNAs are used to generate subtractive cDNAs, and the sample RNAs are fragmented and hybridized to the subtractive cDNAs. Hybridization produces a subpopulation of unhybridized RNA fragments and a subpopulation of RNA:cDNA duplexes, which are then removed from the unhybridized RNA fragments. Determining the sequences of the unhybridized RNA fragments can be useful, e.g., in identifying one or more splice variants of one or more target gene or in comparing the differential expression of, e.g., splice isoforms of a target gene, e.g., between different tissue types, between different treatments to the same tissue type, or between different developmental stages of the same tissue type.
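As an illustration of how sequences of unhybridized fragments might be related to splice variants, the following sketch tests whether a sequenced fragment spans an exon-exon junction of a target gene. It is a simplified example under stated assumptions, not the disclosed analysis: the exon sequences are hypothetical, matching is exact, RNA is written in the DNA alphabet, and `min_overhang` is an illustrative parameter. A practical analysis would more likely use a splice-aware aligner.

```python
from itertools import combinations


def spanning_junctions(read: str, exons: list[str], min_overhang: int = 4) -> list[tuple[int, int]]:
    """Return exon index pairs (i, j), i < j, whose junction the read crosses.

    The read must cover at least `min_overhang` bases on each side of the junction.
    """
    hits = []
    for i, j in combinations(range(len(exons)), 2):
        joined = exons[i] + exons[j]      # candidate spliced product of exon i and exon j
        boundary = len(exons[i])          # index of the first base of exon j
        start = joined.find(read)
        while start != -1:
            end = start + len(read)
            if start <= boundary - min_overhang and end >= boundary + min_overhang:
                hits.append((i, j))
                break
            start = joined.find(read, start + 1)
    return hits


# Hypothetical exons of a target gene (indices 0, 1, 2).
exons = ["ATGGCCAAGT", "CCTGAATTCG", "GGATCCTTAA"]

inclusion_read = "AAGTCCTG"   # end of exon 0 joined to start of exon 1
skipping_read = "AAGTGGAT"    # end of exon 0 joined to start of exon 2
internal_read = "ATGGCCAA"    # lies wholly inside exon 0
```

A read that returns `(0, 2)` here would be consistent with an isoform that skips the middle exon, whereas a read that returns an empty list carries no junction information in this model.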
[0053] mRNA can typically be isolated from almost any source using protocols and methods described in, e.g., Sambrook and Ausubel. The yield and quality of the isolated mRNA can depend on how a tissue is stored prior to RNA extraction, the means by which the tissue is disrupted during RNA extraction, or on the type of tissue from which the RNA is extracted, and RNA isolation protocols can be optimized accordingly. Many mRNA isolation kits are commercially available, e.g., the mRNA-ONLY™ Prokaryotic mRNA Isolation Kit and the mRNA-ONLY™ Eukaryotic mRNA Isolation Kit (Epicentre Biotechnologies), the FastTrack 2.0 mRNA Isolation Kit (Invitrogen), and the Easy-mRNA Kit (BioChain). In addition, mRNA from various sources, e.g., bovine, mouse, and human, and tissues, e.g., brain, blood, and heart, is commercially available from, e.g., BioChain (Hayward, CA), Ambion (Austin, TX), and Clontech (Mountain View, CA).
[0054] Once the purified mRNA is recovered, reverse transcriptase is used to generate cDNAs from the mRNA templates. Methods and protocols for the production of cDNA from mRNAs, e.g., harvested from prokaryotes as well as eukaryotes, are elaborated in cDNA Library Protocols, I. G. Cowell, et al., eds., Humana Press, New Jersey, 1997, Sambrook, and Ausubel. In addition, many kits are commercially available for the preparation of cDNA, including the Cells-to-cDNA™ II Kit (Ambion), the RETROscript™ Kit (Ambion), the CloneMiner™ cDNA Library Construction Kit (Invitrogen), and the Universal RiboClone® cDNA Synthesis System (Promega). Many companies, e.g., Agencourt Bioscience and Clontech, offer cDNA synthesis services.
Generating Nucleic Acid Fragments
[0055] In the methods described herein, RNA fragments are generated from a sample population of RNAs, i.e., in preparation for hybridization to subtractive cDNAs derived from the subtractive population of RNAs. There exist a plethora of ways of producing such RNA fragments. These include, but are not limited to, mechanical methods, such as sonication, mechanical shearing, nebulization, hydroshearing, and the like; enzymatic methods, such as exonuclease digestion, endonuclease digestion, and the like; chemical cleavage, and electrochemical cleavage. These methods are further explicated in Sambrook and Ausubel. In preferred embodiments, chemical cleavage is used to fragment RNAs, as detailed in the example below.
Nucleic Acid Tags
[0056] In the methods provided by this invention, tagged subtractive cDNAs are produced from the subtractive population of RNAs via reverse transcription. The tags can permit the detection of RNA:cDNA duplexes, e.g., in a population of nucleic acids that comprises a subpopulation of RNA:cDNA duplexes and a subpopulation of unhybridized RNA fragments, e.g., following hybridization of subtractive cDNAs to a population of RNA fragments derived from a sample population of RNAs. In addition, the tags permit the RNA:cDNA duplexes to be separated, e.g., via affinity purification or the like, from the subpopulation of unhybridized RNA fragments, e.g., RNA fragments that represent splice isoforms of one or more target gene that are present in a sample population of RNAs but not present in the subtractive population of RNAs. Nucleic acid tags, e.g., such as those optionally present on the subtractive cDNAs, can comprise any of a plethora of ligands, such as high-affinity DNA-binding proteins; modified nucleotides, such as methylated, biotinylated, or fluorinated nucleotides; and nucleotide analogs, such as dye-labeled nucleotides, non-hydrolysable nucleotides, or nucleotides comprising heavy atoms. For example, tags can optionally comprise one or more fluorescent label, blocking group, phosphorylated nucleotide, thiol linker, phosphorothioated nucleotide, amine-reactive nucleotide, uracil, and/or the like. Such reagents are widely available from a variety of vendors, including Perkin Elmer, Jena Bioscience and Sigma-Aldrich.
[0057] Nucleic acid tags can also include oligonucleotides that comprise specific sequences, such as restriction sites, cis regulatory sites, nucleotide hybridization sites, protein binding sites, sequences capable of forming hairpin secondary structures, DNA promoters, sample or library identification sequences, and the like. Such sequences can be of advantageous use in, e.g., sequencing cDNAs derived from unhybridized sample RNA fragments that have been reverse transcribed using tagged primers. Linkers that are attached to unhybridized RNA fragments in preparation for reverse transcription can also beneficially include any one or more of the sequences listed above. Oligonucleotide tags can be custom synthesized by commercial suppliers such as Operon (Huntsville, AL), IDT (Coralville, IA) and Bioneer (Alameda, CA). Any of a number of methods that are well known in the art can be used to join tags to nucleic acids of interest, including chemical linkage, ligation, and extension of a primer comprising a tag by a polymerase or reverse transcriptase. Further details regarding nucleic acid tags and the methods by which they are attached to nucleic acids of interest are elaborated in Sambrook and Ausubel.
Amplifying and/or Copying Nucleic Acids
[0058] Certain transcripts can be present in a sample population of RNAs in greater abundance relative to a subtractive population of RNAs. Thus, in certain embodiments, it can be beneficial to amplify subtractive cDNAs to ensure that the unhybridized RNA fragments that remain following the removal of RNA:cDNA hybrids comprise additional mRNA splice isoforms and not, e.g., RNA species that are expressed at higher levels in the sample population.
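The effect of amplifying the subtractive cDNAs can be illustrated with simple bookkeeping. The sketch below assumes, purely for illustration, that hybridization goes to completion and that each cDNA copy removes exactly one matching RNA fragment; the transcript names, copy numbers, and the `amplification` factor are hypothetical.

```python
def leftover(sample: dict[str, int], subtractive: dict[str, int], amplification: int = 1) -> dict[str, int]:
    """Copies of each sample species left unhybridized after subtraction.

    Assumes 1:1 removal: every cDNA copy captures one matching RNA fragment.
    """
    return {
        species: max(0, copies - amplification * subtractive.get(species, 0))
        for species, copies in sample.items()
    }


# Hypothetical copy numbers per species.
sample = {"shared": 100, "upregulated": 500, "sample_only_isoform": 20}
subtractive = {"shared": 100, "upregulated": 50}

without_amplification = leftover(sample, subtractive)
with_amplification = leftover(sample, subtractive, amplification=10)
```

Under these assumptions, the merely more abundant species survives subtraction only when the cDNAs are not amplified, whereas the isoform absent from the subtractive population survives in both cases.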
[0059] A variety of nucleic acid amplification and/or copying methods are known in the art and can be implemented to, e.g., amplify subtractive cDNAs and/or cDNAs derived from the reverse transcription of unhybridized RNA fragments, e.g., RNA fragments from a sample population of RNAs. The most widely used in vitro technique among these methods is polymerase chain reaction (PCR), which requires the addition of nucleotides, oligonucleotide primers, buffer, and an appropriate polymerase to the amplification reaction mix. Additional methods that can be used to amplify, or copy, nucleic acids include strand displacement amplification (SDA), rolling-circle amplification (RCA) and multiple-displacement amplification (MDA). Each of these techniques is further described in Sambrook et al., Molecular Cloning - A Laboratory Manual (3rd Ed.), Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 2000 ("Sambrook"); and Current Protocols in Molecular Biology, F.M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., (supplemented through 2007) ("Ausubel"), and DNA Amplification: Current Technologies and Applications, V. V. Demidov et al., eds., (1st Ed.), Taylor and Francis, 2004.
KITS
[0060] Kits are also a feature of the invention. The present invention provides kits that include useful reagents, e.g., tagged DNA primers, affinity columns, and/or one or more enzymes that are used in the methods, e.g., a reverse transcriptase, a DNA polymerase, etc. Such reagents are most preferably packaged in a fashion to enable their use. The kits of the invention optionally include additional reagents, such as control target nucleic acids, buffer solutions and/or salt solutions, including, e.g., divalent metal ions, i.e., Mg++, Mn++ and/or Fe++, nucleic acid adapter tags, e.g., to prepare unhybridized RNA fragments for sequencing, e.g., using a currently available or future automated high-throughput sequencing system. Such kits also typically include a container to hold the kit components, instructions for use of the compositions, e.g., to practice the methods, and other reagents in accordance with the desired application methods, e.g., identifying exon-exon junctions, or other characteristics of alternately spliced mRNA isoforms of a target gene.
SYSTEMS
[0061] The methods and compositions provided by the invention can advantageously be integrated with systems that can, e.g., automate and/or multiplex the steps of the methods described herein, e.g., methods for separating one or more alternate isoform of a target gene from a sample population of RNAs. Systems of the invention can include one or more modules, e.g., that automate a method herein, e.g., for high-throughput sequencing applications. Such systems can include fluid-handling elements and controllers that move reaction components into contact with one another, signal detectors, and system software/instructions.
[0062] Systems of the invention can optionally include modules that provide for detection or tracking of products, e.g., unhybridized RNAs that comprise sequences that correspond to a splice isoform of a target gene that is present in a sample population of RNAs but not in a subtractive population of RNAs. Additionally or alternatively, the systems can monitor the synthesis of cDNAs from such unhybridized RNAs and/or detect the nucleotide sequence of such cDNAs, e.g., produced during a sequencing reaction. Detectors can include spectrophotometers, epifluorescent detectors, CCD arrays, CMOS arrays, microscopes, cameras, or the like. Optical labeling is particularly useful because of the sensitivity and ease of detection of these labels, as well as their relative handling safety, and the ease of integration with available detection systems (e.g., using microscopes, cameras, photomultipliers, CCD arrays, CMOS arrays and/or combinations thereof). High-throughput analysis systems using optical labels include DNA sequencers, array readout systems, cell analysis and sorting systems, and the like. For a brief overview of fluorescent products and technologies see, e.g., Sullivan (ed) (2007) Fluorescent Proteins, Volume 85, Second Edition (Methods in Cell Biology) ISBN-10: 0123725585; Hof et al. (eds) (2005) Fluorescence Spectroscopy in Biology: Advanced Methods and their Applications to Membranes, Proteins, DNA, and Cells (Springer Series on Fluorescence) ISBN-10: 354022338X; Haugland (2005) Handbook of Fluorescent Probes and Research Products, 10th Edition (Invitrogen, Inc./ Molecular Probes); BioProbes Handbook, (2002) from Molecular Probes, Inc.; and Valeur (2001) Molecular Fluorescence: Principles and Applications Wiley ISBN-10: 352729919X.
System software, e.g., instructions running on a computer, can be used to track and inventory reactants or products, and/or for controlling robotics/fluid handlers to achieve transfer between system stations/modules. The overall system can optionally be integrated into a single apparatus, or can consist of multiple apparatus with overall system software/instructions providing an operable linkage between modules.
EXAMPLES
[0063] The following examples are offered to illustrate, but not to limit the claimed invention. One of skill will immediately recognize a variety of non-critical parameters that can be modified to achieve essentially similar results.
[0064] It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims.
Preparing a sample population of RNAs
[0065] To prepare a sample population of RNAs for subtractive hybridization, total RNAs are prepared from cells and enriched for polyadenylated mRNAs using methods well known to one of skill in the art. The polyadenylated mRNAs are then added to diluted 5X fragmentation buffer (200 mM Tris-acetate pH=8.1, 500 mM potassium acetate, 150 mM magnesium acetate) and incubated at 95°C for 30 minutes. EDTA is then added to a final concentration of 60 mM to quench the reactions, and the fragmented mRNAs are run on a 15% acrylamide/7M urea gel. A fragment of the gel that corresponds to the position on the gel at which 25-50 base pair-long fragments migrate is excised, and the gel slice is placed in a 0.5 ml microfuge tube that has had holes punched through the bottom with a 21 gauge syringe needle. The 0.5 ml tube is placed in a 1.5 ml microfuge tube and spun briefly until pieces of the gel slice are pushed into the 1.5 ml tube through the holes in the 0.5 ml tube. The volume of the gel pieces is estimated and 5 volumes of elution buffer (50 mM NaCl, 10 mM Tris-HCl pH=8.0) are added to the tube, which is then rotated at 4°C overnight. Following the incubation, the 1.5 ml microfuge tube is spun for 15 minutes at 7500, and the supernatant is transferred to a new tube. The supernatant is filtered through a Spin-X centrifuge filter (available from Corning) to remove any gel pieces that may have been transferred. After all gel pieces have been removed, the RNA fragments are ethanol precipitated and pelleted, a technique well known by those of skill in the art. The preceding steps are performed in RNAse-free tubes with RNAse-free reagents.
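The buffer dilution and EDTA quench in this example reduce to C1·V1 = C2·V2 arithmetic, sketched below. The reaction volume (20 µl) and the EDTA stock concentration (500 mM) are assumptions made only for the illustration and are not stated in the example; the function names are likewise hypothetical.

```python
def stock_volume_ul(final_volume_ul: float, stock_x: float = 5.0, final_x: float = 1.0) -> float:
    """Volume of concentrated buffer (e.g., 5X) needed to reach the working strength (e.g., 1X)."""
    return final_volume_ul * final_x / stock_x


def quench_volume_ul(reaction_volume_ul: float, stock_mM: float, final_mM: float) -> float:
    """Volume of quench stock to add so its final concentration is final_mM.

    Accounts for the added volume: stock_mM * v = final_mM * (reaction_volume + v).
    """
    if stock_mM <= final_mM:
        raise ValueError("stock must be more concentrated than the target final concentration")
    return final_mM * reaction_volume_ul / (stock_mM - final_mM)


# Assumed 20 ul fragmentation reaction and an assumed 500 mM EDTA stock.
buffer_ul = stock_volume_ul(20.0)                       # volume of 5X fragmentation buffer
edta_ul = quench_volume_ul(20.0, stock_mM=500.0, final_mM=60.0)
```

With these assumed values, one fifth of the reaction volume is 5X buffer, and the EDTA volume is slightly larger than a naive 60/500 fraction of the reaction because the addition itself dilutes the stock.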
Preparing a subtractive population of cDNAs
[0066] Subtractive RNAs are prepared by harvesting total RNA from cells and purifying polyadenylated mRNAs from the total RNA using any of a variety of methods known to those of skill in the art. The polyadenylated RNAs are then reverse transcribed into cDNA with biotin-conjugated oligo dT primers. The resulting RNA:cDNA duplexes are treated with RNAse H to hydrolyze the RNA strands of the duplexes. It is assumed that cDNAs are produced from the RNAs at a 1:1 ratio.
Subtractive hybridization of the sample population of RNAs to the subtractive population of cDNAs
[0067] Assuming a 1:1 RNA:cDNA conversion ratio during the reverse transcription reaction described above, a 1:2 ratio of subtractive cDNAs:sample RNAs is added to diluted 5X hybridization buffer (200 mM HEPES pH=7.3, 2.5M NaCl, 5 mM EDTA) and incubated at 80°C for 5 minutes. The cDNAs and RNAs are then further incubated at 65°C overnight.
[0068] RNA:cDNA duplexes are purified from the population of unhybridized RNA with excess streptavidin beads (1 mg beads can hold 100 pmol of biotin). Beads are added to the hybridization mix and incubated at 4°C for 5 hours. The tube containing the beads and hybridization mix is spun, pelleting the subtractive biotin-conjugated cDNAs and the RNAs to which they have hybridized. The supernatant, which contains the unhybridized RNAs, e.g., RNAs that are unique to the sample population, is transferred to a new 1.5 ml microfuge tube, and the RNAs are ethanol precipitated and pelleted, according to methods well known in the art.
[0069] The unhybridized RNA fragments are then resolved on a 15% acrylamide/7M urea gel, and a fragment of the gel that corresponds to the position on the gel at which 25-50 base pair-long fragments migrate is excised. The RNA fragments are eluted from the gel slice, as described above.
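The bead capacity quoted in paragraph [0068] (1 mg of beads per 100 pmol of biotin) allows a rough estimate of how much bead material constitutes an excess. The sketch below assumes one biotin per cDNA molecule, which follows from priming with a biotin-conjugated oligo dT primer but is still a simplification, and it uses a hypothetical two-fold `excess` factor that the example does not specify.

```python
BEAD_CAPACITY_PMOL_PER_MG = 100.0  # capacity stated in the example: 1 mg beads holds 100 pmol biotin


def beads_needed_mg(biotin_cdna_pmol: float, excess: float = 2.0) -> float:
    """Bead mass (mg) that binds the given amount of biotinylated cDNA with a safety margin.

    Assumes one biotin per cDNA molecule; `excess` is an illustrative margin, not a disclosed value.
    """
    if biotin_cdna_pmol < 0 or excess <= 0:
        raise ValueError("amount must be non-negative and excess must be positive")
    return biotin_cdna_pmol * excess / BEAD_CAPACITY_PMOL_PER_MG


# Hypothetical input: 50 pmol of biotinylated subtractive cDNA.
mg_beads = beads_needed_mg(50.0)
```

Under these assumptions, 50 pmol of biotinylated cDNA would call for about 1 mg of beads at a two-fold margin.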
Preparing RNA fragments for sequencing
[0070] Solexa adaptors are ligated to the RNA fragments, which are then gel purified as described above. 5' linkers are ligated to the purified fragments, and the ligation products are subject to a second round of gel purification. The fragments to which the Solexa adaptors and 5' linkers have been attached are treated with DNAse and extracted with phenol/chloroform. The adaptor-ligated fragments are then precipitated with 1:1 ethanol:isopropanol and pelleted. The pellet is resuspended, and a reverse transcriptase reaction is performed with Solexa 3' primers to produce cDNAs from the RNAs. The cDNAs are amplified via PCR, and the PCR products are run on a gel. DNA fragments between 60 and 100 base pairs in size are extracted from the gel.
[0071] While the foregoing invention has been described in some detail for purposes of clarity and understanding, it will be clear to one skilled in the art from a reading of this disclosure that various changes in form and detail can be made without departing from the true scope of the invention. For example, all the techniques and apparatus described above can be used in various combinations. All publications, patents, patent applications, and/or other documents cited in this application are incorporated by reference in their entirety for all purposes to the same extent as if each individual publication, patent, patent application, and/or other document were individually and separately indicated to be incorporated by reference for all purposes.

Claims

CLAIMS
WHAT IS CLAIMED IS:
1. A method of separating one or more alternate splice isoform of a target gene from a sample population of RNAs, the method comprising: a) providing a subtractive population of RNAs, which population comprises a first isoform of the target gene; b) reverse transcribing the subtractive population of RNAs to produce a population of cDNAs; c) providing a sample population of RNAs, which population comprises the one or more alternate splice isoform of the target gene; d) fragmenting the sample population of RNAs to produce RNA fragments; e) hybridizing the RNA fragments to the population of cDNAs to produce a subpopulation of RNA:cDNA duplexes and a subpopulation of unhybridized RNAs, wherein the subpopulation of RNA:cDNA duplexes comprises the RNA fragments that comprise the first isoform of the target gene, and wherein the subpopulation of unhybridized RNAs comprises the RNA fragments that comprise the one or more alternate splice isoform of the target gene; and, f) removing the RNA:cDNA duplexes from unhybridized RNA fragments, thereby separating the one or more alternate splice isoform of the target gene from the sample population of RNAs.
2. The method of claim 1, wherein the subtractive population of RNAs is derived from a mammal.
3. The method of claim 1, wherein reverse transcribing the subtractive population of RNAs comprises annealing tagged DNA primers to 3' ends of RNAs in the subtractive population and extending the tagged primers with a reverse transcriptase.
4. The method of claim 3, wherein tags on the tagged DNA primers comprise one or more moiety selected from: a ligand, a fluorescent label, a blocking group, a phosphorylated nucleotide, a nucleotide analog, a fluorinated nucleotide, a nucleotide comprising a heavy atom, a biotinylated nucleotide, a methylated nucleotide, a uracil, a sequence capable of forming hairpin secondary structure, an oligonucleotide hybridization site, a restriction site, a DNA promoter, a protein binding site, a sample or library identification sequence, a thiol linker, a phosphorothioated nucleotide, an amine-reactive nucleotide and a cis regulatory sequence.
5. The method of claim 1, wherein the sample population of RNAs is derived from a first cell type and the subtractive population of RNAs is derived from a second cell type, wherein the first and second cell types are derived from a first organism.
6. The method of claim 5, wherein the first cell type is a non-disease cell and the second cell type is a disease cell.
7. The method of claim 1, wherein the sample population of RNAs is derived from a first cell type in a first organism and the subtractive population of RNAs is derived from the first cell type in a second organism, wherein the first and second organisms are members of a first species.
8. The method of claim 1, wherein the sample population of RNAs is derived from a first cell type in a first organism at a first developmental stage and the subtractive population of RNAs is derived from the first cell type in the first organism at a second developmental stage.
9. The method of claim 1, wherein the sample population of RNAs is derived from a first cell type of a first organism that has been exposed to a first treatment and the subtractive population of RNAs is derived from the first cell type of the first organism that has been exposed to a second treatment.
10. The method of claim 1, wherein fragmenting the sample population of RNAs to produce the RNA fragments comprises one or more method selected from: enzymatic digestion, sonication, mechanical shearing, electrochemical cleavage, chemical cleavage and nebulization.
11. The method of claim 1, wherein removing the RNA:cDNA duplexes comprises digesting the RNA:cDNA duplexes with an RNAseH and a DNAse.
12. The method of claim 1, further comprising: annealing tagged DNA primers to 3' ends of the RNAs in the subtractive population and extending the tagged primers with a reverse transcriptase to produce tagged cDNAs; and, wherein removing the RNA:cDNA duplexes comprises separating the RNA:tagged cDNA duplexes from the unhybridized RNA fragments via affinity purification.
13. The method of claim 1, wherein removing the RNA:cDNA duplexes comprises separating the RNA:cDNA duplexes from the unhybridized RNAs via electrophoresis.
14. The method of claim 1, wherein the method comprises: a) reverse transcribing the unhybridized RNA fragments to produce product cDNAs; b) sequencing the product cDNAs; and, c) comparing the sequences of the product cDNAs to a sequence of the target gene.
15. The method of claim 14, wherein reverse transcribing the unhybridized RNA fragments comprises attaching linkers to first ends of the fragments, annealing DNA primers to the linkers, and extending the primers with a reverse transcriptase to produce the cDNAs.
16. The method of claim 14, wherein the cDNAs are sequenced by an automated high-throughput sequencing system.
17. A composition, comprising: a subpopulation of RNA:cDNA duplexes and a subpopulation of unhybridized RNA fragments, which composition has been produced by: a) providing a subtractive population of RNAs, which population comprises a first isoform of the target gene; b) reverse transcribing the subtractive population of RNAs to produce a population of cDNAs; c) providing a sample population of RNAs, which population comprises the one or more alternate splice isoform of the target gene; d) fragmenting the sample population of RNAs to produce RNA fragments; e) hybridizing the RNA fragments to the population of cDNAs to produce a subpopulation of RNA:cDNA duplexes and a subpopulation of unhybridized RNAs, wherein the subpopulation of RNA:cDNA duplexes comprises the RNA fragments that comprise the first isoform of the target gene, and wherein the subpopulation of unhybridized RNAs comprises the RNA fragments that comprise the one or more alternate splice isoform of the target gene.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US20137208P 2008-12-09 2008-12-09
US61/201,372 2008-12-09

Publications (2)

Publication Number Publication Date
WO2010077288A2 true WO2010077288A2 (en) 2010-07-08
WO2010077288A3 WO2010077288A3 (en) 2010-11-04

Family

ID=42310447

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2009/006450 WO2010077288A2 (en) 2008-12-09 2009-12-08 Methods for identifying differences in alternative splicing between two rna samples

Country Status (1)

Country Link
WO (1) WO2010077288A2 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012116248A1 (en) * 2011-02-24 2012-08-30 Massachusetts Institute Of Technology ALTERNATIVELY SPLICED mRNA ISOFORMS AS PROGNOSTIC INDICATORS FOR METASTATIC CANCER
CN105063074A (en) * 2015-06-16 2015-11-18 青岛科技大学 Method for artificially reforming functional protein
US20150337364A1 (en) * 2014-01-27 2015-11-26 ArcherDX, Inc. Isothermal Methods and Related Compositions for Preparing Nucleic Acids
US9487828B2 (en) 2012-05-10 2016-11-08 The General Hospital Corporation Methods for determining a nucleotide sequence contiguous to a known target nucleotide sequence
US10450597B2 (en) 2014-01-27 2019-10-22 The General Hospital Corporation Methods of preparing nucleic acids for sequencing
US11390905B2 (en) 2016-09-15 2022-07-19 Archerdx, Llc Methods of nucleic acid sample preparation for analysis of DNA
US11795492B2 (en) 2016-09-15 2023-10-24 ArcherDX, LLC. Methods of nucleic acid sample preparation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030165931A1 (en) * 1998-03-11 2003-09-04 Bruno Tocque Qualitative differential screening
US20030219805A1 (en) * 2002-03-13 2003-11-27 Zvi Kelman Detection of alternative and aberrant mRNA splicing
US20070003929A1 (en) * 2002-12-12 2007-01-04 Yoshihide Hayashizaki Method for identifying, analyzing and/or cloning nucleic acid isoforms
US20080153086A1 (en) * 2003-08-08 2008-06-26 Wong Albert J Method For Rapid Identification of Alternative Splicing
WO2008119767A2 (en) * 2007-03-30 2008-10-09 Oryzon Genomics, S. A. Method of nucleic acid analysis.

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9926601B2 (en) 2011-02-24 2018-03-27 Massachusetts Institute Of Technology Alternatively spliced mRNA isoforms as prognostic indicators for metastatic cancer
WO2012116248A1 (en) * 2011-02-24 2012-08-30 Massachusetts Institute Of Technology ALTERNATIVELY SPLICED mRNA ISOFORMS AS PROGNOSTIC INDICATORS FOR METASTATIC CANCER
US10017810B2 (en) 2012-05-10 2018-07-10 The General Hospital Corporation Methods for determining a nucleotide sequence contiguous to a known target nucleotide sequence
US9487828B2 (en) 2012-05-10 2016-11-08 The General Hospital Corporation Methods for determining a nucleotide sequence contiguous to a known target nucleotide sequence
US10718009B2 (en) 2012-05-10 2020-07-21 The General Hospital Corporation Methods for determining a nucleotide sequence contiguous to a known target nucleotide sequence
US11781179B2 (en) 2012-05-10 2023-10-10 The General Hospital Corporation Methods for determining a nucleotide sequence contiguous to a known target nucleotide sequence
CN107075566A (en) * 2014-01-27 2017-08-18 阿谢尔德克斯有限公司 For preparing the isothermal method of nucleic acid and compositions related
US20150337364A1 (en) * 2014-01-27 2015-11-26 ArcherDX, Inc. Isothermal Methods and Related Compositions for Preparing Nucleic Acids
US10450597B2 (en) 2014-01-27 2019-10-22 The General Hospital Corporation Methods of preparing nucleic acids for sequencing
CN107075566B (en) * 2014-01-27 2021-06-15 阿谢尔德克斯有限责任公司 Isothermal methods for preparing nucleic acids and related compositions
US11807897B2 (en) 2014-01-27 2023-11-07 The General Hospital Corporation Methods of preparing nucleic acids for sequencing
CN105063074A (en) * 2015-06-16 2015-11-18 青岛科技大学 Method for artificially reforming functional protein
CN105063074B (en) * 2015-06-16 2019-03-05 青岛耐德生物技术有限公司 A kind of method of artificial reconstructed functional protein
US11390905B2 (en) 2016-09-15 2022-07-19 Archerdx, Llc Methods of nucleic acid sample preparation for analysis of DNA
US11795492B2 (en) 2016-09-15 2023-10-24 ArcherDX, LLC. Methods of nucleic acid sample preparation

Also Published As

Publication number Publication date
WO2010077288A3 (en) 2010-11-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09836485

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09836485

Country of ref document: EP

Kind code of ref document: A2