WO2011009089A1 - SMALL NON-CODING REGULATORY RNAs AND METHODS FOR THEIR USE

SMALL NON-CODING REGULATORY RNAs AND METHODS FOR THEIR USE

Info

Publication number
WO2011009089A1
Authority
WO
WIPO (PCT)
Prior art keywords
allele
expression
disease
rna molecule
rna
Prior art date
Application number
PCT/US2010/042346
Other languages
French (fr)
Inventor
Gennadi V. Glinsky
Original Assignee
Ordway Research Institute, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ordway Research Institute, Inc.
Priority to US13/261,142 (US20120316218A1)
Publication of WO2011009089A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/11 Antisense
    • C12N2310/111 Antisense spanning the whole gene, or a large part of it
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/178 Oligonucleotides characterized by their use miRNA, siRNA or ncRNA

Definitions

  • the invention relates to small, non-coding RNA molecules having gene regulatory activity, compositions comprising same, and methods for their use.
  • siRNAs small interfering RNA molecules
  • shRNAs short hairpin RNA molecules
  • IiRNAs interfering antisense non-coding RNAs
  • miRNAs microRNAs
  • siRNAs involved in gene silencing have been described in various organisms including S. pombe, T. thermophila, A. thaliana, D. melanogaster and C. elegans.
  • the human proteins involved in chromatin remodeling include methyltransferases such as the de novo DNA methyltransferase Dnmt3A, histone deacetylase 1 ("HDAC1"), and the histone lysine methyltransferase KMT6, also known as EZH2.
  • IiRNAs have been identified in mammalian cells acting to silence particular chromosomal regions, such as the HOX family of genes in eukaryotes and the X chromosome in mice and humans. 231 IiRNAs were identified as transcribed from the intergenic regions of the HOX loci. The majority of these were antisense compared to the HOX genes. At least one IiRNA was identified (HOTAIR) that negatively regulates a gene (HOXD) distant from its site of transcription. The mechanism apparently involves recruiting proteins of the Polycomb complex to the promoter region and thereby increasing the amount of repressive H3K27me3.
  • the Polycomb (PcG) proteins are transcriptional repressors which act as genome-wide regulators of expression during development.
  • the PcG proteins alter the epigenetic state of chromatin, for example, by increasing histone methylation or ubiquitination. It is not clear how the PcG complex is targeted to a specific promoter region, but recruitment of the complex and the subsequent formation of heterochromatin are believed to underlie PcG-mediated gene silencing.
  • an IiRNA was identified in humans and mice that mediates silencing. Although the mechanism of action is not known in human cells, in the mouse it appears to involve recruitment of a PcG complex to the promoter region through direct interaction between the IiRNA and a subunit of the complex.
  • IiRNAs are also involved in genomic imprinting of autosomal genes.
  • Imprinting is a mono-allelic mechanism of gene silencing based on the parent-of-origin.
  • the IiRNAs silence large domains of the genome through their interaction with chromatin, specifically by recruiting methyltransferases and PcG complexes to the loci of the silenced genes.
  • RNA molecules function in combination with PcG proteins and perhaps other, unidentified proteins, to silence the expression of particular genes in cancer cells, such as tumor suppressor genes, analogous to their putative role during development.
  • the complex role of these molecules in transcriptional silencing during normal development and in diseases such as cancer remains to be established.
  • miRNAs are a class of small (20-30 nucleotides in length) non-coding regulatory RNAs that perfectly match the 3' untranslated regions (3' UTR) of target messenger RNAs. Binding of the miRNA to its target sequence results in degradation of the messenger RNA.
  • GWAS genome-wide association studies
  • SNPs small nucleotide polymorphisms
  • miRNAs and also within miRNA target sites in messenger RNAs.
  • the present inventors demonstrated that many disease-linked SNPs are located far from protein-coding genes but in transcriptionally active regions of the genome.
  • the invention is based upon the discovery of a novel class of non-coding RNAs transcribed from these intergenic regions containing disease-linked SNPs.
  • the present invention is based upon the discovery that genomic regions containing disease-associated small nucleotide polymorphisms (SNPs) are actively transcribed to produce small non-coding SNP-bearing RNA molecules having biological activity. These RNA molecules are referred to herein as "snpRNAs".
  • the small non-coding SNP-bearing RNA molecules of the invention have biological activity.
  • specific RNA molecules of the invention are demonstrated to modulate the expression of other non-coding RNA molecules as well as protein-coding genes.
  • the small non-coding SNP-bearing RNA molecules of the invention modulate the activity of the innate immunity/inflammasome pathway by modulating the expression of particular genes in that pathway.
  • an snpRNA molecule of the invention modulates the expression of a gene selected from NLRP3, NLRP1, HMGA1, and MYB.
  • an snpRNA molecule of the invention facilitates hormone-independent growth of a hormone-dependent cell or cell line.
  • the hormone-dependent cell is a prostate cell.
  • the cell is a prostate cancer cell.
  • the snpRNAs regulate the expression of genes distant from their site of transcription, and thus may also be referred to as "transRNAs."
  • the invention provides the sequences of specific cDNA molecules corresponding to the snpRNAs described herein, methods and reagents for their detection in a biological sample from a subject, and methods for their use in diagnostic and prognostic assays.
  • An snpRNA molecule of the invention contains a disease-associated SNP which is located within a loop structure of the RNA molecule.
  • this loop structure containing the SNP also contains a binding site for a microRNA (“miRNA”) molecule.
  • miRNA microRNA
  • the SNP is located within a binding site for one or more of the following proteins: H3K27Me3, CBP/CREB, Ezh2, and POL2.
  • the binding sites overlap.
  • the SNP is within the binding site for a nuclear lamina protein.
  • the SNP is located within 200 basepairs of a binding site for a lamin B1 protein.
  • the invention provides isolated, purified cDNA molecules corresponding to the snpRNA molecules described herein.
  • the cDNA molecules are useful to express the snpRNA molecules of the invention in heterologous cells and to detect the presence of the snpRNA molecules in a biological sample from a subject.
  • the cDNA molecules are useful as probes to detect the snpRNA molecules in the sample, e.g., in hybridization based assays.
  • the cDNA molecules are used as positive controls for the detection of the snpRNA molecules in a biological sample from a subject.
  • the invention provides an isolated small non-coding RNA molecule transcribed from an intergenic region of the human genome, wherein the RNA molecule is less than 500, less than 400, less than 300, less than 200, less than 150, less than 100, or less than 75 nucleotides and the intergenic region contains at least one small nucleotide polymorphism (SNP) associated with one or more human diseases or disorders.
  • the intergenic region contains only one SNP.
  • the snpRNA molecule is contiguous.
  • the cDNA form of the RNA molecule comprises a sequence selected from the group consisting of SEQ ID NOs: 1-101, 332, and 333. In another embodiment, the cDNA form of the RNA molecule comprises a sequence selected from the group consisting of SEQ ID NOs: 1, 4, 6, 7, 9-18, 39, 88-90, 332, and 333. In another embodiment, the cDNA form of the RNA molecule comprises a sequence selected from the group consisting of SEQ ID NOs: 1, 7, 332, and 333.
  • the SNP is selected from the group consisting of rs2670660, rs6596075, rs6983561, rs16901979, rs13281615, rs10505477, rs10808556, rs6983267, rs7014346, rs7000448, rs1447295, rs2820037, rs889312, rs1937506, rs13387042, rs7716600, rs11249433, and rs3803662.
  • the SNP is selected from the group consisting of rs9469220, rs9270986, rs6457617, rs615672, rs7837688, rs6997709, rs16892766, rs2670660, and rs2542151.
  • the invention also provides a vector comprising a polynucleotide encoding an
  • the vector comprises the cDNA form of an RNA molecule described herein.
  • the invention further provides a cell comprising said vector.
  • the cell is ex vivo or in vitro.
  • the invention also provides a kit comprising, in one or more containers, a vector comprising a polynucleotide encoding an RNA molecule of the invention.
  • the vector comprises the cDNA form of an RNA molecule described herein and instructions for expressing the RNA molecule from the vector.
  • the kit further comprises one or more polynucleotide primers for amplifying an RNA or a cDNA molecule of the invention.
  • the one or more primers comprises a sequence selected from the group consisting of SEQ ID NOs: 102-331.
  • the one or more primers comprises a sequence selected from the group consisting of SEQ ID
  • the one or more primers comprises a sequence selected from the group consisting of SEQ ID NOs: 102, 103, 114, 115, 326, and 327.
  • the invention also provides a kit comprising, in one or more containers, a cell comprising said vector and instructions for expressing the RNA molecule in the cell.
  • the invention also provides a method for detecting the small non-coding RNA molecules described herein in a sample from a subject, the method comprising detecting the
  • step of detecting the RNA molecules comprises the step of detecting the cDNA form of the RNA molecule in the sample.
  • the cDNA form is detected by a method comprising reverse transcription and polymerase chain reaction (RT-PCR) technology.
  • RT-PCR reverse transcription and polymerase chain reaction
  • the cDNA form is detected by a method comprising nucleic acid hybridization technology.
  • the method further comprises the steps of isolating the small RNA fraction from the sample and converting the RNA into cDNA prior to the step of detecting the cDNA in the sample.
  • the invention also provides a method for evaluating the risk that a human subject will develop a disease or condition associated with a specific allele of an SNP ("the pathological allele") by detecting the presence of an RNA molecule of claim 1 in a sample from the subject, wherein the RNA molecule is transcribed from the pathological allele, and wherein detection of said RNA molecule indicates that the subject has an increased risk for developing the disease or condition and the failure to detect said RNA molecule indicates that the subject has a decreased risk for developing the disease or condition.
  • the pathological allele an SNP
  • the method further comprises detecting the expression level of the RNA molecule transcribed from the pathological allele relative to its expression in a population of healthy subjects, wherein an increased or decreased level of expression relative to the population of healthy subjects indicates that the subject has an increased risk for developing the disease or condition.
  • the step of detecting the presence of an RNA molecule transcribed from the pathological allele is performed indirectly, by detecting the expression of one or more genes whose expression is regulated by the RNA molecule.
  • the invention also provides a method for diagnosing a disease or condition associated with a specific allele of an SNP ("the pathological allele") in a human subject, the method comprising detecting the presence of an RNA molecule of claim 1 in a sample from the subject, wherein the RNA molecule is transcribed from the pathological allele, and wherein the disease or condition is positively diagnosed if the RNA molecule is detected in the sample.
  • the invention also provides a method for treating, preventing, or ameliorating a disease or condition associated with a specific allele of an SNP ("the pathological allele") in a subject in need thereof, the method comprising administering one or more therapeutic agents that act to suppress the expression or antagonize the activity of an RNA molecule of claim 1, wherein the RNA molecule is transcribed from the pathological allele.
  • the presence of the G-allele snpRNA of rs2670660 is detected in a sample from the subject, wherein the presence of the G-allele snpRNA indicates that the subject is at an increased risk for developing a disease or disorder selected from vitiligo, Crohn's disease, rheumatoid arthritis, Huntington's disease, Alzheimer's disease, breast cancer, metastatic breast cancer, prostate cancer, metastatic prostate cancer, autism, and obesity.
  • the presence of the A-allele snpRNA of rs16901979 is detected in a sample from the subject, wherein the presence of the A-allele snpRNA indicates that the subject is at an increased risk for developing a cancer of epithelial origin.
  • the cancer is selected from breast cancer, metastatic breast cancer, prostate cancer, and metastatic prostate cancer.
  • the subject is human.
  • the sample is a blood, tissue, or cell sample.
  • the disease or condition is selected from the group consisting of vitiligo, Crohn's disease, rheumatoid arthritis, Huntington's disease, Alzheimer's disease, breast cancer, metastatic breast cancer, prostate cancer, metastatic prostate cancer, autism, and obesity.
  • the disease or condition is selected from the group consisting of autism, Alzheimer's disease, schizophrenia and bipolar disorder.
  • the disease or condition is an autoimmune disease or disorder.
  • the disease or condition is selected from the group consisting of vitiligo, ankylosing spondylitis, rheumatoid arthritis, multiple sclerosis, systemic lupus erythematosus and autoimmune thyroid disease.
  • the disease or condition is selected from the group consisting of ulcerative colitis and Crohn's disease.
  • the disease or condition is selected from the group consisting of breast cancer, colorectal cancer, lung cancer, ovarian cancer, and prostate cancer.
  • the disease or condition is selected from the group consisting of coronary artery disease, hypertension, type 1 diabetes, type 2 diabetes, and obesity.
  • the invention also provides an apparatus for evaluating a disease or condition, or evaluating the risk of developing a disease or condition, in a subject, the apparatus comprising a model configured to evaluate a dataset for the subject to thereby evaluate the risk of disease in the subject, wherein the model is based upon determining the similarity in the expression profile of a defined set of genes in a sample from the subject and the expression profile for that set of genes in one or more reference sets of the model, wherein a reference set comprises one or more of a population of healthy subjects and a population of subjects suffering from the disease, wherein the set of genes is a set of genes whose expression is regulated by a small RNA molecule of claim 1.
  • Figure 1 Identification of 12 small RNAs encoded by intergenic disease-associated SNPs using reverse-transcription PCR-based screening. Small RNA fractions were isolated from various human cell lines and subjected to the RT-PCR based screen. PCR products of expected size were purified, subjected to the nested PCR analysis and gel electrophoresis. Molecular identities of identified RNA molecules were validated by sequencing of primary PCR and nested PCR products.
  • the 12 RNAs identified by this method are designated A3, A6, A9, A16, A21-26, A28, and A29. The sequences are given in Table 1.
  • the primers used to amplify the sequences are given in Table 3.
  • Figure 15 shows the identification of other RNAs from the "A" set in different cell lines.
  • Figure 2 (A) Genomic coordinates of the endogenous small RNAs described in Figure 1 and corresponding disease-associated SNPs. Abbreviations used: Crohn's disease (CD), rheumatoid arthritis (RA), type 1 diabetes (T1D), autoimmune disorders (AID), hypertension (HT), prostate cancer (PC), breast cancer (BC), ovarian cancer (OC), colorectal cancer (CRC).
  • CD Crohn's disease
  • RA rheumatoid arthritis
  • T1D type 1 diabetes
  • AID autoimmune disorders
  • HT hypertension
  • PC prostate cancer
  • BC breast cancer
  • OC ovarian cancer
  • CRC colorectal cancer
  • Bottom right panel shows alignments of the miRNA target sites in RNA A21, which is transcribed from a region containing the prostate cancer susceptibility SNP rs7837688.
  • Individual human miRNAs are aligned along the A21 RNA sequence according to the positions of respective target sites.
  • Single vertical bar marks the position of the prostate cancer-predisposition SNP. Note that a vast majority of microRNA target sites segregates to the A21 transRNA segment around the SNP and includes SNP nucleotides.
  • Sense and anti-sense variants of a 52 nucleotide ("nt") rs2670660 sequence (shown in a shaded box, SEQ ID NO:1) were chemically synthesized, cloned into GFP-expressing lentiviral vectors, and utilized in biological and mechanistic experiments.
  • Sequences of PCR products were confirmed by direct sequencing. Nested PCR of the 152 nt product with primer set 1 using small RNA fractions (containing RNA of less than 200 nt in length) from various cell lines as template. Product of the expected size (110 nt) is shown. Sequences of PCR products were confirmed by direct sequencing.
  • RNA molecules 52 nt (top right) RNA molecules, and position of the miRNA-target sites along the 152 nt transRNA sequence (bottom right).
  • c, d) miRNAs which are differentially regulated in BJ1 cells expressing distinct allelic variants of the NALP1-locus transRNAs share multiple sequence identity segments of at least 11 nucleotides in length with sequences of MEG3 (c) and MALAT1 (d) long non-coding RNAs.
  • Figure 4 Expression of a small RNA transcribed from the G-allele of rs2670660 inhibits cell growth and results in G1 arrest. The following notation is used to designate the 4 small RNAs transcribed from the A-allele, the G-allele, and their antisense counterparts: A, G, asA, and asG. These 4 RNAs are also referred to collectively as "the '660 RNAs." Transfected BJ1 cells were sorted by GFP expression and an enriched population (>90% GFP positive) was used in monolayer and clonal growth assays.
  • RNAs from the G-allele (rs2670660_G) or the A-allele (rs2670660_A) of the SNP rs2670660 were cultured for five days; cells were counted every 24 hours. Top line in graph is A; middle line is GFP only; bottom line is G.
  • FACS Flow cytometric analysis
  • GFP empty vector
  • as sense and anti-sense variants of the A- and G-allele RNAs.
  • Representative FACS plots are shown above the bar graphs which represent the number of cells in each phase of the cell cycle (G1, S, G2/M), normalized to the vector control. Average values of three independent biological replicates are shown.
  • Figure 5 Representative results of clonogenic growth experiments of BJ1 cells expressing sense and anti-sense allele small RNAs encoded by rs2670660.
  • Figure 6 Constitutive expression of distinct allelic variants of NALP1-locus transRNAs exerts allele-specific effects on phenotypes of human cells.
  • THP-induced monocyte/macrophage differentiation THP-1 cells expressing control vector or allele-specific sense and anti-sense variants of rs2670660-encoded RNAs were treated with TPA for 4 days to induce differentiation into macrophages. Left panels (top to bottom) show light microscopy images of control, A-allele, and G-allele transfected cells. Right panels show fluorescence images of the same. The cells expressing the G-allele variant failed to differentiate and retained a non-differentiated state.
  • THP-1 cells expressing the G-allele of the rs2670660-encoded RNA undergo massive apoptosis and produce ~5-fold fewer macrophages, which are half as potent in the sheep erythrocyte phagocytosis assay as macrophages derived from THP-1 cells expressing A-allele RNAs.
  • RNAs encoded by the rs2670660 sequence were analyzed using Affymetrix HG-U133A Plus 2.0 microarrays.
  • Panels A-D each show two (A, C) or three (B, D) rows of paired bars representing the expression of representative genes in cells expressing, from left to right, G, A, asG, asA, or GFP only (unlabeled, 5th set of bars for each gene).
  • Panel A shows the expression data for 4 particular genes, Panel B for 9 genes, Panel C for 4 genes, and Panel D for 9 genes.
  • Panels E-M show the same relationships for large sets of genes using linear regression analysis to demonstrate the concordant and discordant patterns of gene expression under the various allele-specific conditions.
  • the y-axis is mRNA expression and the x-axis represents individual genes. Thus, each dot on the graph represents the mRNA expression level of a particular gene.
  • Panel K (top) shows the discordant expression profile for these genes in asA-allele RNA-expressing cells compared to asG-allele RNA-expressing cells.
  • the lower panel shows the discordant expression of a subset of 352 genes whose expression was differentially regulated by at least 4-fold.
  • Panel K shows the discordant expression profile for these genes in asG-allele RNA-expressing cells compared to asA-allele RNA-expressing cells.
  • the lower panel shows the discordant expression of a subset of 342 genes whose expression was differentially regulated by at least 4-fold.
  • RNAs induces mRNA expression changes of the inflammasome regulatory genes (NLRP1, NLRP3, HMGA1, Myb).
  • top right panel genes in BJ1 cells expressing the A- or G-alleles of the rs2670660-encoded RNAs.
  • Bottom panels show the ratios of NLRP3 to NLRP1 (bottom left) and HMGA1 to Myb (bottom right).
  • Left panels (top and bottom) show the expression in unstimulated cells.
  • Right panels show expression in LPS-stimulated cells.
  • Bottom panels show NLRP3/NLRP1 expression ratios in unstimulated (bottom left) and LPS-stimulated cells (bottom right).
  • FIG. 9 Microarray analysis of gene expression signatures (GES) associated with expression of rs2670660-encoded small RNAs identifies human cells with experimentally-induced activation of the inflammasome pathway. Expression profiles of G-allele concordant and G-allele discordant signatures in individual experimental and control samples of each data set were evaluated by calculating Pearson correlation coefficients (signature scores) using log10-transformed fold expression changes of G-allele-specific GES in BJ1 cells as a multidimensional standard vector. Shaded area identifies the range defined by the average ± 2 STDEV values of the signature scores in the control set of samples.
  • GES gene expression signatures
  • LPS bronchoscopic endotoxin
  • FIG. 10 microRNA-signatures induced by expression of rs2670660-encoded transRNAs and associated mRNA GES recapitulating miRNA expression patterns. miRNAs differentially-regulated by rs2670660-allele-specific sense and anti-sense 52 nt small RNAs in BJ1 cells were identified using the quantitative PCR protocol for detection of 365 human miRNAs in a 384-well-format TaqMan Low Density Arrays (TaqMan Human MicroRNA Array v1.0; Applied Biosystems).
  • miRNAs the abundance levels of which in human cells are induced (miR-302a; miR-629; miR-548d; miR-200a; miR-627; miR-770-5p) or repressed (miR-133a; miR-20b; miR-205; let-7b) by forced expression of pathology-linked G-allele snpRNAs compared to ancestral A-allele-expressing cells. Insert bars show the results of Q-PCR analysis of expression of corresponding
  • FIG. 11 rs2670660-encoded RNAs alter expression of the PluriNet network transcripts and Polycomb pathway genes.
  • Gene expression signatures (GES) associated with expression of rs2670660-encoded sense and anti-sense allele-specific 52 nt small RNAs in BJ1 cells were independently identified for each experimental setting using t-statistics and 155 differentially-regulated transcripts of the PluriNet network and Polycomb pathway were selected for visualization.
  • GES Gene expression signatures
  • G-allele-specific rs2670660-encoded transRNAs induce concomitant upregulation of the Polycomb Repressive Complex 2 (PRC2) genes Ezh2, Suz12, and EED. Individual measurements of the mRNA expression levels of corresponding genes derived from two independent biological replicate experiments are shown. Note that in contrast to the PRC2 genes, the expression level of the BMI1 gene, a key component of the PRC1 complex, is decreased in BJ1 cells expressing G-allele-specific rs2670660-encoded transRNAs compared to A-allele-specific transRNA-expressing cells.
  • PRC2 Polycomb Repressive Complex 2
  • FIG. 12 Microarray analysis of gene expression signatures (GES) associated with expression of rs2670660-encoded small RNAs discriminates peripheral blood mononuclear cells (PBMC) from patients with multiple common human disorders and control subjects.
  • G Expression profiles (bars) and linear regression analysis (scatter) of a 377 gene G-allele discordant signature in PBMC of patients with symptomatic Huntington's disease (left set of bars), asymptomatic Huntington's disease (middle set of bars), and control subjects (right set of bars). Note distinct expression profiles of G-allele concordant signatures in PBMC from patients and control individuals.
  • FIG. 13 Microarray analysis of gene expression signatures (GES) associated with expression of rs2670660-encoded small RNAs discriminates normal and pathological tissue samples from patients with multiple common human disorders and control subjects.
  • GES gene expression signatures
  • FIG. 14 Microarray analysis of gene expression signatures (GES) associated with expression of rs2670660-encoded small RNAs discriminates normal and pathological tissue samples from patients with autism and control subjects (A) as well as lean and obese subjects (B,C). GES associated with expression of G-allele-specific 52 nt small RNAs in BJ1 cells was identified using t-statistics and screened for concordant and discordant features in corresponding clinical settings to segregate G-allele concordant and G-allele discordant signatures.
  • GES gene expression signatures
  • Figure 15 Intergenic trans-regulatory RNAs represent a most prevalent class of transcripts containing SNP variants associated with common human disorders (A) and display cell-type specific patterns of expression in human cells (B; C).
  • small transRNAs A10, A11, A18 are expressed exclusively in human cells of epithelial origin (RWPE1); transRNA A9 is expressed in cells of mesenchymal (BJ1) and lymphoid (U937) origins, but not in epithelial RWPE1 cells; transRNA A18 is expressed in epithelial RWPE1 cells and mesenchymal BJ1 cells, but not in lymphoid U937 cells;
  • transRNA A21 is expressed in epithelial RWPE1 cells and lymphoid U937 cells, but not in mesenchymal BJ1 cells. Nearly ubiquitous patterns of expression of long noncoding RNAs containing the corresponding SNP sequences suggest a model of cell type-specific biogenesis of small transRNAs based on differentiation-associated processing of long non-coding RNAs. Small transRNAs and long noncoding RNAs containing identical SNP variants are aligned in columns designated A5, A6, A9, A10, A11, A13, A14, A18, A19, A20, and A21.
  • Figure 16 (A) Expression of RNA A6 (SEQ ID NO:7) facilitates androgen-independent growth of the androgen-dependent human prostate cancer cell line LNCap and the highly metastatic cell line LNCapLN3. (B) Expression of RNA A6 enhances the colony-formation ability of LNCap cells in soft agar.
  • Figure 17 Concordance analysis of 3299 and 1561 rs2670660 G-allele RNA-regulated transcripts.
  • Figure 18 Concordance analysis of 3268 and 1636 rs2670660 G-allele RNA-regulated transcripts.
  • the present invention is based upon the discovery of small SNP sequence-bearing RNA molecules having gene regulatory activity.
  • the small non-coding RNA molecules of the present invention are distinct from the non-coding RNA molecules of the prior art, which include, e.g., small and large interfering RNA molecules, hairpin RNA molecules, and microRNA molecules. See background, infra.
  • the term "non-coding" means that the RNA molecule is not translated into an amino acid sequence. Thus, the RNA molecules of the invention do not encode proteins.
  • the small RNA molecules of the invention are transcribed from intergenic or intronic regions of the human genome containing at least one disease-linked SNP.
  • snpRNAs small non-coding RNA molecules.
  • the snpRNA molecules of the invention are able to regulate the expression of genes distant from the genomic site of their transcription. Accordingly, they may also be referred to as “transRNA” molecules. As used herein, the terms “snpRNAs” and “transRNAs” are synonymous.
  • the snpRNA molecules of the invention, and their corresponding DNA and cDNA molecules, are isolated and preferably purified.
  • isolated in the context of a polynucleotide molecule of the invention, refers to a polynucleotide molecule that has been isolated from a cell.
  • An isolated polynucleotide may contain various impurities which are removed by subsequent purification. Methods for purifying polynucleotides from various cellular contaminants are known in the art.
  • purified in the context of a polynucleotide molecule of the invention, refers to a polynucleotide molecule that is substantially free of cellular material or contaminating proteins from the cell or tissue source from which it is isolated or
  • a purified polynucleotide of the invention has less than about 30%, 20%, 10%, or 5% (by dry weight) of heterologous protein, polypeptide, peptide, or antibody (also referred to as a "contaminating protein").
  • the purified polynucleotide is 60%, preferably 65%, 70%, 75%, 80%, 85%, 90%, or 99% free of contaminating proteins, cellular material, chemical agents, and precursors.
  • the snpRNA molecules of the invention are non-coding RNA molecules transcribed from a genomic sequence containing a disease-linked SNP.
  • the SNP-containing genomic sequence is an intergenic sequence.
  • An intergenic sequence is one that is distant from a protein coding region of the genome.
  • An SNP refers to a particular kind of DNA sequence variation occurring in a population, preferably a human population, in which a single nucleotide (denoted A, T, C, or G, in accordance with the convention in the art) in the genome differs between members of a species at a particular location in the genome, also referred to as a genetic locus.
  • the differences are referred to as alleles based on the identity of the possible single nucleotide differences.
  • the nucleotide at the variant position is either C or T
  • these variants are referred to as the C-allele and the T-allele, respectively.
  • the SNP has only two alleles. Since an individual has paired sets of chromosomes, an individual is said to be homozygous or heterozygous for a particular allele depending on whether both chromosomes contain the same or different alleles, respectively.
  • SNPs can be assigned an allele frequency which refers to the frequency of a particular allele at a given genetic locus within the population.
  • allelic frequency is based upon a geographical population or an ethnic population.
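The zygosity and allele-frequency definitions above can be illustrated with a minimal Python sketch. The function names and the four genotypes are hypothetical and chosen only for illustration; they are not data from this disclosure.

```python
from collections import Counter


def zygosity(genotype):
    """Classify a two-allele genotype such as ("C", "T")."""
    first, second = genotype
    return "homozygous" if first == second else "heterozygous"


def allele_frequencies(genotypes):
    """Return each allele's share of all chromosomes sampled at one locus."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical genotypes of four individuals at a single C/T SNP.
population = [("C", "C"), ("C", "T"), ("T", "T"), ("C", "T")]
freqs = allele_frequencies(population)
calls = [zygosity(g) for g in population]
```

Restricting `population` to a geographical or ethnic subgroup before calling `allele_frequencies` would give the population-specific frequency mentioned above.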
  • containing at least one disease-linked SNP it is meant that the snpRNA is transcribed from an SNP-bearing allele of a DNA molecule.
  • the snpRNA is transcribed from one or both alleles of the DNA molecule bearing the SNP.
  • the allele of the SNP that is associated with a disease or disorder is referred to as the
  • disease-linked or “disease-associated” and synonymous terms, when used in the context of an SNP, refer to an SNP that has been associated with one or more diseases or disorders in a population of subjects, preferably human subjects, using methods known in the art. Such methods include, for example, genome-wide association studies of SNP variations. For example, a particular SNP may be associated with an increased incidence of the disease or disorder, meaning that individuals containing a particular allele at the site of the SNP are statistically more likely to have the disease or disorder. The statistical methods used to establish the association between SNPs and diseases or disorders are well known by those skilled in the art.
  • the SNP is selected from the group consisting of rs2670660, rs6596075, rs6983561, rs16901979, rs13281615, rs10505477, rs10808556, rs6983267, rs7014346, rs7000448, rs1447295, rs2820037, rs889312, rs1937506, rs13387042, rs7716600, rs11249433, and rs3803662.
  • the SNP is selected from the group consisting of rs9469220, rs9270986, rs6457617, rs615672, rs7837688, rs6997709, rs16892766, rs2670660, and rs2542151.
  • an isolated small non-coding RNA molecule is meant to refer to one or more isolated small non-coding RNA molecules.
  • the invention provides an isolated small non-coding RNA molecule transcribed from an intergenic region of the human genome, wherein the RNA molecule is less than 1000, less than 800, less than 500, less than 400, less than 200, less than 150, less than 100, or less than 75 nucleotides and the intergenic region contains at least one SNP associated with one or more human diseases or disorders.
  • the intergenic region contains only one SNP.
  • An intergenic region is a region of a genome, preferably the human genome, located between clusters of genes. It is substantially devoid of protein-coding genes.
  • RNA molecules of the present invention are depicted as their cDNA forms.
  • the cDNA form of the RNA molecule comprises a sequence selected from the group consisting of SEQ ID NOs: 1-101, 332, and 333.
  • the cDNA form of the RNA molecule comprises a sequence selected from the group consisting of SEQ ID NOs: 1, 4, 7, 10, 17, 22-28, 32-34, 332, and 333.
  • the cDNA form of the RNA molecule comprises a sequence selected from the group consisting of SEQ ID NOs: 1, 7, 332, and 333.
  • the invention also provides a vector comprising a polynucleotide molecule of the invention.
  • the vector comprises the cDNA form of an RNA molecule described herein.
  • vector in this context refers to a cloning vector or an expression vector, or both (i.e., the same vector may be designed for cloning and expression). The terms are used consistent with their common meaning in the art.
  • a cloning vector refers to a DNA molecule, typically a plasmid molecule, into which a foreign DNA fragment can be inserted, e.g., by restriction digest and ligation.
  • Non-limiting examples of cloning vectors include genetically engineered plasmids and
  • an expression vector is typically engineered to contain regulatory sequences that act as enhancer and promoter regions and lead to efficient transcription of the foreign DNA.
  • the vector is a viral vector.
  • the vector is an expression vector.
  • the vector is a cloning vector.
  • the invention further provides a cell comprising said vector.
  • the cell is a mammalian cell and most preferably a human cell.
  • the cell stably expresses the vector.
  • the invention also provides a kit comprising, in one or more containers, a vector comprising a polynucleotide molecule of the invention.
  • the kit comprises an RNA molecule described herein and instructions for expressing the RNA molecule from the vector.
  • the kit comprises the cDNA form of an RNA molecule described herein and instructions for expressing the RNA molecule from the vector.
  • the kit further comprises one or more polynucleotide primers for amplifying the cDNA molecule.
  • the one or more primers comprises a sequence selected from the group consisting of SEQ ID NOs: 102-331.
  • the one or more primers comprises a sequence selected from the group consisting of SEQ ID NOs: 102-161.
  • the one or more primers comprises a sequence selected from the group consisting of SEQ ID NOs: 102, 103, 114, 115, 326, and 327.
  • the invention also provides a kit comprising, in one or more containers, a cell comprising said vector and instructions for expressing the RNA molecule in the cell.
  • the invention also provides a method for detecting the small non-coding RNA molecules described herein in a sample from a subject, the method comprising detecting the RNA molecules in the sample.
  • the step of detecting the RNA molecules comprises the step of detecting the cDNA form of the RNA molecule in the sample.
  • the cDNA form is detected by a method comprising reverse transcription and polymerase chain reaction (RT-PCR) technology.
  • the method comprises the technique of nested PCR.
  • RT-PCR refers to a PCR technique in which reverse transcriptase is first used to reverse transcribe RNA into its complementary DNA, also referred to as cDNA. The cDNA is then amplified by PCR.
  • PCR is a well known technique used to amplify a particular DNA molecule of interest, typically from a mixture containing a high background of non-specific DNA molecules. Nested PCR employs two sets of primers in two successive PCR reactions to achieve increased specificity.
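The two-round logic of nested PCR described above can be sketched in silico. This is a simplified illustration that assumes exact primer matches on a single-stranded template string; the template and primer sequences are invented for the example and are not the primers of SEQ ID NOs: 102-331.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def amplify(template, forward, reverse):
    """Return the amplicon bounded by an exactly matching primer pair, else None."""
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so its binding
    # site on the template reads as its reverse complement.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


def nested_pcr(template, outer, inner):
    """Run an outer amplification, then an inner one on the first product."""
    first_round = amplify(template, *outer)
    if first_round is None:
        return None
    return amplify(first_round, *inner)


# Hypothetical template assembled from labelled pieces for readability.
template = ("GGATCCAAG" + "CTT" + "ACGTTGCAG" + "GCTAACCGGTTAGC"
            + "ATCGATTTG" + "GCC" + "AGAATTCGT")
outer = ("GGATCCAAG", reverse_complement("AGAATTCGT"))
inner = ("ACGTTGCAG", reverse_complement("ATCGATTTG"))
product = nested_pcr(template, outer, inner)
```

Because the inner primers can only bind within the first-round product, a second-round amplicon of the predicted size is stronger evidence of the intended target than a single-round product.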
  • the method further comprises the steps of isolating the small RNA fraction from the sample and converting the RNA into cDNA prior to the step of detecting the cDNA in the sample.
  • the cDNA form of the RNA molecules is detected by a method comprising nucleic acid hybridization technology.
  • the invention also provides a method for evaluating the risk that a human subject will develop a disease or condition associated with a specific allele of an SNP ("the pathological allele") by detecting the presence of an RNA molecule of claim 1 in a sample from the subject, wherein the RNA molecule is transcribed from the pathological allele, and wherein detection of said RNA molecule indicates that the subject has an increased risk for developing the disease or condition and the failure to detect said RNA molecule indicates that the subject has a decreased risk for developing the disease or condition.
  • the method further comprises detecting the expression level of the RNA molecule transcribed from the pathological allele relative to its expression in a population of healthy subjects, wherein an increased or decreased level of expression relative to the population of healthy subjects indicates that the subject has an increased risk for developing the disease or condition.
  • the step of detecting the presence of an RNA molecule transcribed from the pathological allele is performed indirectly, by detecting the expression of one or more genes whose expression is regulated by the RNA molecule.
  • the invention also provides a method for diagnosing a disease or condition associated with a specific allele of an SNP ("the pathological allele") in a human subject, the method comprising detecting the presence of an RNA molecule of claim 1 in a sample from the subject, wherein the RNA molecule is transcribed from the pathological allele, and wherein the disease or condition is positively diagnosed if the RNA molecule is detected in the sample.
  • the invention also provides a method for treating, preventing, or ameliorating a disease or condition associated with a specific allele of an SNP ("the pathological allele") in a subject in need thereof, the method comprising administering one or more therapeutic agents that act to suppress the expression or antagonize the activity of an RNA molecule of claim 1, wherein the RNA molecule is transcribed from the pathological allele.
  • the term "subject" refers to an animal, preferably a mammal including a non-primate (e.g., a cow, pig, horse, cat, dog, rat, and mouse) and a primate (e.g., a chimpanzee, a monkey such as a cynomolgus monkey and a human), and more preferably a human.
  • the subject is human.
  • the sample is a blood, tissue, or cell sample.
  • the disease or condition is selected from the group consisting of Crohn's disease, rheumatoid arthritis, Huntington's disease, Alzheimer's disease, breast cancer, prostate cancer, autism, and obesity.
  • the invention also provides an apparatus for evaluating a disease or condition, or evaluating the risk of developing a disease or condition, in a subject, the apparatus comprising a model configured to evaluate a dataset for the subject to thereby evaluate the risk of disease in the subject, wherein the model is based upon determining the similarity in the expression profile of a defined set of genes in a sample from the subject and the expression profile for that set of genes in one or more reference sets of the model, wherein a reference set comprises one or more of a population of healthy subjects and a population of subjects suffering from the disease, wherein the set of genes is a set of genes whose expression is regulated by a small RNA molecule of claim 1.
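One way such a similarity-based model could be sketched is shown below. The disclosure does not specify the similarity measure, so Pearson correlation against the mean profile of each reference set is an assumption made here for illustration, and all expression values and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def centroid(profiles):
    """Average the expression of each gene across a reference set."""
    return [sum(column) / len(column) for column in zip(*profiles)]


def classify(subject, reference_sets):
    """Return the best-matching reference label and all similarity scores."""
    scores = {label: pearson(subject, centroid(profiles))
              for label, profiles in reference_sets.items()}
    return max(scores, key=scores.get), scores


# Hypothetical expression values for a four-gene signature.
reference_sets = {
    "healthy": [[1.0, 2.0, 3.0, 4.0], [1.2, 1.8, 3.1, 3.9]],
    "disease": [[4.0, 3.0, 2.0, 1.0], [3.8, 3.2, 1.9, 1.1]],
}
subject = [3.9, 2.9, 2.2, 0.8]
label, scores = classify(subject, reference_sets)
```

In practice the gene set would be the set of genes regulated by the snpRNA of interest, and the reference sets would be far larger than the two profiles per class used here.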
  • the disease or disorder is selected from Crohn's disease, rheumatoid arthritis, bipolar disorder, Alzheimer's disease, vitiligo, ulcerative colitis, type 1 diabetes, type 2 diabetes, autoimmune thyroid disease, coronary artery diseases, hypertension, multiple sclerosis, obesity, and epithelial cancers.
  • the epithelial malignancy is selected from prostate, breast, ovarian, and colorectal cancer.
  • the snpRNA molecules of the invention are a novel class of non-coding RNA molecule transcribed from intergenic SNP-containing regions of the human genome.
  • This class of RNA molecule is defined by the following structural features.
  • the RNA molecules of the invention each contain a disease-associated SNP.
  • the disease-associated SNP is located within a loop structure of the RNA molecule.
  • this loop structure containing the SNP also contains a binding site for an miRNA molecule.
  • the SNP is located within a binding site for one or more of the following proteins: H3K27Me3, CBP/CREB, Ezh2, and POL2.
  • the binding sites overlap.
  • the SNP is within the binding site for a nuclear lamina protein.
  • the SNP is located within 200 base pairs of a binding site for a lamin B1 protein.
  • the invention provides isolated snpRNA molecules, their cDNA counterparts, and primers for their detection in a biological sample using, e.g., reverse-transcription polymerase chain reaction (RT-PCR) technology.
  • the isolated snpRNA molecules are purified.
  • the snpRNA molecules are in the form of their cDNA counterparts.
  • the snpRNA molecules of the invention are polynucleotide sequences comprising the bases adenine (A), guanine (G), cytosine (C), and uracil (U).
  • the counterpart cDNA molecules are polynucleotide sequences comprising the bases adenine (A), guanine (G), cytosine (C), and thymine (T).
  • the sequences are denoted as strings of these bases, in accordance with the common practice in the art.
  • the sequences of the present invention are denoted as cDNA sequences of the corresponding RNA molecules.
  • the corresponding RNA molecule is easily envisioned from the cDNA sequences depicted here using methods routine in the art.
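As a minimal sketch of how the RNA sequence may be derived from a listed cDNA sequence, the function below substitutes uracil for thymine. It assumes the cDNA is written in the same 5'-to-3' sense as the RNA; if a given listing is instead the complementary strand, a reverse-complement step would be needed first. The function name and example sequence are hypothetical.

```python
def cdna_to_rna(cdna):
    """Convert a cDNA string (A, C, G, T) to the corresponding RNA string (A, C, G, U)."""
    seq = cdna.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters in cDNA: {sorted(unexpected)}")
    return seq.replace("T", "U")


# Hypothetical short sequence, not taken from Table 1.
rna = cdna_to_rna("atgcttGA")
```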
  • the snpRNA is an allelic variant.
  • An "allelic variant" of an snpRNA molecule of the invention refers to the allele of the SNP from which the snpRNA is transcribed.
  • the snpRNA corresponds to the pathological allele of the SNP.
  • the snpRNA corresponds to the ancestral allele.
  • the snpRNA is an A-allele RNA, a G-allele RNA, a C-allele RNA, or a T-allele RNA, wherein the reference to the particular allele is in the context of the SNP which encodes the RNA.
  • the snpRNA molecule of the invention is an SNP-containing fragment of a larger RNA molecule.
  • an snpRNA molecule of the invention is a processing variant of a longer non-coding RNA molecule.
  • the snpRNA molecules of the invention are molecules of 50 to 300 nucleotides in length, each containing at least one disease-linked SNP.
  • an snpRNA molecule of the invention is about 25, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 225, 250, 275, or 300 nucleotides in length.
  • the snpRNA molecule is between 50-100, 50-75, or 50-60 nucleotides in length.
  • the snpRNA molecule is about 50 nucleotides in length.
  • the snpRNA molecules comprise about 50, 60, 70, 80, 90, 100, 125 or 150 nucleotides flanking a disease-associated SNP.
  • an snpRNA molecule of the invention comprises 50, 60, 70, 80 or 90 nucleotides flanking the SNP.
  • the snpRNA molecule is contiguous.
  • the term "contiguous" in the context of an snpRNA molecule means that the snpRNA molecule is a single sequence, uninterrupted by any intervening sequence or sequences.
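The notion of a contiguous sequence comprising nucleotides flanking an SNP can be sketched as a simple window extraction. The text does not say whether the flank counts apply to each side or to both sides combined; the sketch assumes the same number of nucleotides on each side, clipped at the ends of the available sequence. The function name and the sequence are hypothetical.

```python
def snp_window(genomic, snp_index, flank):
    """Return the contiguous slice holding the SNP plus up to `flank` bases per side."""
    if not 0 <= snp_index < len(genomic):
        raise IndexError("SNP position lies outside the sequence")
    start = max(0, snp_index - flank)
    end = min(len(genomic), snp_index + flank + 1)
    return genomic[start:end]


# Hypothetical ten-base sequence with the SNP at zero-based index 5.
window = snp_window("ACGTACGTAC", snp_index=5, flank=2)
```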
  • the snpRNA molecule of the invention acts as a transcriptional suppressor on one or more genes encoding proteins selected from the
  • Polycomb group refers to a family of chromatin remodeling proteins that function in the epigenetic silencing of genes.
  • NALPl and NALP3 refer to proteins that assemble into complexes called “inflammasomes” which activate caspase-1, resulting in the processing of pro-inflammatory cytokines and triggering an innate immune response.
  • PluriNet refers to a protein network common to pluripotent cells which enables them to differentiate into multiple cell types. See e.g., Müller, F.J. et al., Regulatory networks define phenotypic classes of human stem cell lines, Nature 455:401-405 (18 September 2008).
  • the invention provides isolated snpRNA molecules and the cDNA counterparts of the RNA molecules.
  • the following tables give the cDNA sequences of the snpRNA molecules of the invention. Each sequence in the table below represents two sequences, one for each allelic variant of the SNP. The two sequences for each allelic variant are identical except for a single nucleotide at the position indicated in the sequence as variable. The variable position is denoted in the sequence as, e.g., "[G/A]" which indicates that one allele contains a "G" at that position in the sequence and the other allele contains an "A" at that position in the sequence.
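The variable-position notation described above can be expanded mechanically into the two allelic sequences. The following is a minimal sketch; the function name and the seven-base example are hypothetical, and the sketch assumes exactly one bracketed position per sequence.

```python
import re

# Matches a two-allele variable position such as [G/A].
_VARIANT = re.compile(r"\[([ACGT])/([ACGT])\]")


def expand_alleles(sequence):
    """Return {allele: full sequence} for a sequence with one [X/Y] position."""
    # Stray spaces (for example "[G/ A]") are dropped before matching.
    cleaned = sequence.replace(" ", "").upper()
    matches = _VARIANT.findall(cleaned)
    if len(matches) != 1:
        raise ValueError("expected exactly one variable position")
    first, second = matches[0]
    return {
        first: _VARIANT.sub(first, cleaned),
        second: _VARIANT.sub(second, cleaned),
    }


# Hypothetical short sequence, not taken from Table 1.
variants = expand_alleles("ACC[G/ A]TTG")
```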
  • the sequences below are referred to as "cDNA" sequences because they are the DNA sequence complementary to the RNA molecules transcribed from the genomic DNA.
  • the intergenic RNA molecules of the invention are represented by their respective cDNA sequences in Table 1. Additional RNA molecules identified or predicted to be encoded by intronic sequences are represented by their respective cDNA sequences in Table 2. Primers which can be used to amplify the RNA molecules of the invention using reverse transcription followed by a polymerase chain reaction are shown in Table 3.
  • Table 1 cDNA sequences of small snpRNA molecules transcribed from intergenic SNPs.
  • the invention provides methods and reagents for the detection of specific snpRNAs in a biological sample from a subject.
  • the invention provides primers that can be used in an RT-PCR-based assay to identify the presence of one or more snpRNAs in a sample.
  • the invention also provides probes, in the form of cDNA molecules of the snpRNAs, for use in detecting the snpRNAs in a sample, and allelic variants thereof.
  • the invention also provides diagnostic and prognostic methods based on the detection of the snpRNAs.
  • the presence of a particular allelic variant of the snpRNA is detected according to the methods of the invention.
  • the allelic variant is the A-allele, the G-allele, the C-allele, or the T-allele, denoted with respect to the SNP sequence.
  • the allele is the pathological allele of the SNP. In another embodiment the allele is the ancestral allele of the SNP.
  • the pathological allele is selected from the G-allele of rs2670660 or the A-allele of rs16901979.
  • An snpRNA molecule of the invention is an RNA molecule transcribed from a genomic sequence containing a disease-linked SNP.
  • the snpRNA can be transcribed from either allele, or from both alleles, of the SNP-bearing genomic sequence.
  • the detection of an snpRNA molecule transcribed from the pathological allele of the SNP indicates an increased risk for the disease or disorder linked to the SNP. The risk is based upon the risk associated with the specific allele of the SNP.
  • the presence of an snpRNA transcribed from a pathological allele translates to an increased risk of developing the disease or disorder or an increased risk of having a more severe or refractory form of the disease or disorder.
  • the failure to detect an snpRNA transcribed from a pathological allele, or the detection of an snpRNA transcribed from an ancestral allele indicates a decreased risk for the disease or disorder.
  • the term "refractory” describes patients treated with a currently available therapy for a disease or disorder, wherein the treatment with the currently available therapy is not clinically adequate either (i) to relieve one or more symptoms associated with the disease or disorder, (ii) to stop or adequately slow the progression of the disease or disorder, or (iii) to resolve the pathological effects of the disease or disorder.
  • the methods of the present invention, because they are based upon the detection of snpRNA molecules and allelic variants thereof, offer an improvement over methods based on the detection of the SNPs themselves. This is because, according to the present invention, the SNP itself is not functional and its mere presence, like that of a gene, does not necessarily have a biological consequence. Rather, the biological consequence results from its transcription, in this case into a non-coding regulatory RNA molecule.
  • the invention provides methods for detecting an snpRNA molecule in a sample.
  • the sample comprises the fraction of small RNA molecules from a cell or tissue.
  • the fraction of small RNA molecules is substantially free of contaminating DNA molecules and protein.
  • the method comprises contacting the sample with one or more short (10-30 base pairs) oligonucleotides under conditions permitting the hybridization of the one or more short oligonucleotides with the snpRNA molecule or a corresponding cDNA thereof.
  • the method further comprises one or more rounds of a polymerase chain reaction ("PCR") after the contacting step.
  • PCR polymerase chain reaction
  • a step of reverse transcription precedes the contacting step.
  • the PCR reaction is a nested PCR reaction.
  • the method further comprises the step of visualizing the PCR products of the PCR reaction using gel
  • the snpRNA molecule is detected in the sample if a PCR product of the predicted size is amplified in the PCR reaction.
  • the oligonucleotides are labeled with a detectable label.
  • the method comprises contacting the sample with one or more longer oligonucleotides (50-300 base pairs) under conditions permitting the hybridization of the oligonucleotides with the snpRNA molecule or a corresponding cDNA thereof.
  • the oligonucleotides are labeled with a detectable label.
  • the sample is bound to a solid support.
  • the solid support is a bead or a membrane support.
  • the snpRNA molecule is detected in the sample if the oligonucleotide selectively hybridizes with a molecule of the predicted size. Selective hybridization is determined using methods routine in the art of nucleic acid hybridization assays. For example, decreasing the salt content of the wash buffers and increasing the number, length, and temperature of the washing steps increases the specificity of binding.
  • the invention provides methods for determining the likelihood that a human subject will develop a disease or condition linked to an SNP by detecting the presence of an SNP sequence-bearing RNA molecule in a sample from the subject.
  • the subject has an increased likelihood of developing the disease or condition where an snpRNA transcribed from a pathological allele of the SNP is detected in a sample from the subject.
  • the subject has a decreased likelihood of developing the disease or condition where either no snpRNA is detected in the sample or an snpRNA transcribed from an ancestral allele is detected in the sample.
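The decision rule stated in the preceding paragraphs can be summarized in a short sketch. The function name and the representation of assay results as a set of detected alleles are assumptions made for illustration; the rule itself follows the text, under which detection of the pathological-allele snpRNA indicates increased likelihood, and detection of no snpRNA or of only the ancestral-allele snpRNA indicates decreased likelihood.

```python
def likelihood_call(detected_alleles, pathological_allele):
    """Map the alleles whose snpRNA was detected to a likelihood call.

    detected_alleles: set of allele letters detected in the sample
                      (an empty set means no snpRNA was detected).
    """
    if pathological_allele in detected_alleles:
        return "increased"
    # Either nothing was detected or only a non-pathological
    # (ancestral) allele snpRNA was detected.
    return "decreased"


# Hypothetical assay outcomes for an SNP whose pathological allele is G.
calls = {
    "pathological only": likelihood_call({"G"}, "G"),
    "ancestral only": likelihood_call({"A"}, "G"),
    "both alleles": likelihood_call({"A", "G"}, "G"),
    "none detected": likelihood_call(set(), "G"),
}
```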
  • the invention provides a method for determining the risk to a subject of developing a particular disease or disorder, wherein a risk of developing the disease or disorder has been associated with an SNP, the method comprising detecting a small RNA containing the SNP in a sample from the subject by (1) obtaining a biological sample from the subject; (2) extracting the population of small RNAs from the sample; and (3) performing a reverse transcription polymerase chain reaction (RT-PCR) on the extract of small RNA from the sample, wherein the PCR is performed with a set of primers designed to amplify a complementary DNA fragment (cDNA) corresponding to the genomic region containing the SNP.
  • RT-PCR reverse transcription polymerase chain reaction
  • the primers are designed to amplify a cDNA fragment that is either sense or antisense with respect to the genomic DNA containing the SNP.
  • more than one set of primers is used to amplify the cDNA, wherein the more than one set of primers includes a set of nested PCR primers.
  • the more than one set of primers includes a set of primers to amplify the antisense cDNA fragment and the sense cDNA fragment.
  • the sample is a cell or tissue sample, a tumor tissue sample, a blood sample, or the sample comprises or is enriched for peripheral blood mononuclear cells (PBMC).
  • PBMC peripheral blood mononuclear cells
  • the embodiment in which the sample is "a cell" includes a plurality of cells.
  • the cells are a line of immortalized cells.
  • the cells are primary cells which have been cultured for a period of time to increase their cell number.
  • "a cell” or a plurality of cells refers to cells which are outside of a body, i.e., cells in vitro.
  • the presence of the G-allele snpRNA of rs2670660 is detected in a sample from the subject, wherein the presence of the G-allele snpRNA indicates that the subject is at an increased risk for developing an autoimmune disorder.
  • the autoimmune disorder is selected from the group consisting of vitiligo, ankylosing spondylitis, rheumatoid arthritis, multiple sclerosis, systemic lupus erythematosus and autoimmune thyroid disease.
  • the presence of the A-allele snpRNA of rs16901979 is detected in a sample from the subject, wherein the presence of the A-allele snpRNA indicates that the subject is at an increased risk for developing a cancer of epithelial origin.
  • the cancer is selected from breast cancer, metastatic breast cancer, prostate cancer, and metastatic prostate cancer.
  • the cancer is prostate cancer or metastatic prostate cancer.
  • the presence of the C-allele snpRNA of rs6596075 is detected in a sample from the subject, wherein the presence of the C-allele snpRNA indicates that the subject is at an increased risk for developing Crohn's disease.
  • the presence of the C-allele snpRNA of rs6983561 is detected in a sample from the subject, wherein the presence of the C-allele snpRNA indicates that the subject is at an increased risk for developing prostate cancer.
  • the presence of the G-allele snpRNA of rs13281615 is detected in a sample from the subject, wherein the presence of the G-allele snpRNA indicates that the subject is at an increased risk for developing breast cancer.
  • the presence of the T-allele snpRNA of rs10505477 is detected in a sample from the subject, wherein the presence of the T-allele snpRNA indicates that the subject is at an increased risk for developing colorectal or prostate cancer.
  • the presence of the C-allele snpRNA of rs10808556 is detected in a sample from the subject, wherein the presence of the C-allele snpRNA indicates that the subject is at an increased risk for developing colorectal or prostate cancer.
  • the presence of the G-allele snpRNA of rs6983267 is detected in a sample from the subject, wherein the presence of the G-allele snpRNA indicates that the subject is at an increased risk for developing colorectal or prostate cancer.
  • the presence of the A-allele snpRNA of rs7014346 is detected in a sample from the subject, wherein the presence of the A-allele snpRNA indicates that the subject is at an increased risk for developing colorectal cancer.
  • the presence of the T-allele snpRNA of rs7000448 is detected in a sample from the subject, wherein the presence of the T-allele snpRNA indicates that the subject is at an increased risk for developing prostate cancer.
  • the presence of the A-allele snpRNA of rs1447295 is detected in a sample from the subject, wherein the presence of the A-allele snpRNA indicates that the subject is at an increased risk for developing prostate cancer.
  • the presence of the T-allele snpRNA of rs2820037 is detected in a sample from the subject, wherein the presence of the T-allele snpRNA indicates that the subject is at an increased risk for developing hypertension.
  • the presence of the C-allele snpRNA of rs889312 is detected in a sample from the subject, wherein the presence of the C-allele snpRNA indicates that the subject is at an increased risk for developing breast cancer.
  • the presence of the A-allele snpRNA of rs1937506 is detected in a sample from the subject, wherein the presence of the A-allele snpRNA indicates that the subject is at an increased risk for developing
  • the presence of the A-allele snpRNA of rs13387042 is detected in a sample from the subject, wherein the presence of the A-allele snpRNA indicates that the subject is at an increased risk for developing breast cancer.
  • the presence of the A-allele snpRNA of rs7716600 is detected in a sample from the subject, wherein the presence of the A-allele snpRNA indicates that the subject is at an increased risk for developing breast cancer.
  • the presence of the C-allele snpRNA of rs11249433 is detected in a sample from the subject, wherein the presence of the C-allele snpRNA indicates that the subject is at an increased risk for developing breast cancer.
  • the presence of the T-allele snpRNA of rs3803662 is detected in a sample from the subject, wherein the presence of the T-allele snpRNA indicates that the subject is at an increased risk for developing breast cancer.
  • Table 3 Selected examples of pathological alleles and the associated disease or disorder
  • snpRNAs small non-coding RNAs of the invention
  • Chromatin-state maps based on H3K4me3-H3K36me3 signatures show that many intergenic disease-linked SNPs are located within the boundaries of the K4-K36 domains indicating that these intergenic SNP-harboring genomic regions are transcribed, even though none are located within the boundaries of exons of genomic sequences encoding long non-coding RNAs identified to date.
  • the following data demonstrate that these SNP-containing intergenic regions are in fact transcribed to produce non-coding RNA molecules having gene regulatory activity.
  • Table 4 SNP classes defined by analysis of genomic coordinates of disease-linked SNPs identified in genome-wide association studies of 22 common human disorders. Five intergenic SNPs are associated with multiple diseases (3 with 3; and 2 with 2); 4 intronic SNPs are
  • RNAs identified in the initial screen using human cells of mesenchymal (BJ1) and lymphoid (U937) origin are shown in Figures 1 and 2. The sequences of these RNA molecules are represented by their respective cDNA sequences in Table 1, supra.
  • the RT-PCR based screening protocol comprised the following steps:
  • Analysis of the predicted secondary structures of these RNA molecules revealed the presence of loop sequences containing SNP-bearing segments of 8-11 nucleotides in length which are identical to primary sequences of microRNAs (Fig. 2B). The loop structures of the allelic variants also are predicted to have distinct secondary structures.
  • the RNA molecules contain multiple potential target sites for microRNAs which are often clustered around SNP nucleotides. These data suggested an epigenetic regulatory cross-talk between the intergenic RNAs and microRNAs.
  • microarray expression profiling of human cell lines stably expressing distinct allelic variants of the NALP1-locus SNP rs2670660 RNAs identified microRNAs whose expression was differentially regulated by the '660 RNAs in an allele-specific manner.
  • NALP1 regulatory region
  • One of the NALP1-associated SNPs, rs2670660, is of particular interest because it occurs within a segment of the genome that is remarkably conserved among species, including human, chimpanzee, macaque, bush baby, cow, mouse, and rat.
  • Four sets of primers were designed to detect the predicted RNA molecules encoded by the rs2670660 sequences. The primer sequences (5' to 3') are as follows:
  • the expected size of the PCR product generated by each primer set is as follows: Set 1: 110 base pairs (bp); Set 2: 152 bp; Set 3: 205 bp; Set 4: 225 bp.
  • the primers' specificity was validated by PCR of the genomic sequences. Only primer set 2 consistently amplified products of the expected size (152 nt) in RT-PCR of the small RNA fraction (≤200 nt) isolated from various cells. Nested PCR of the 152 nt sequence using primer set 1 also generated products of the expected size (110 nt). The purified PCR products were confirmed by direct sequencing. The sequences of the 152 and 110 nt PCR products are shown below.
  • a short 52 nucleotide subsequence around the rs2670660 SNP was selected for further analysis.
  • the sequence of the 52 nucleotide rs2670660 subsequence used in the biological experiments is SEQ ID NO:1 (see Table 1, infra).
  • this minimal SNP-containing sequence was biologically active. Without being bound by any particular theory, it is suggested that the minimal 52 nucleotide sequence represents a biologically active splice variant of the longer endogenous RNA sequence and that this small SNP-containing variant is the active species catalyzing the changes in gene transcription that underlie the observed effects of the SNP on disease association.
  • RNAs transcribed from the A-allele of rs2670660, the G-allele of rs2670660, and their antisense counterparts: "A-allele RNA", "G-allele RNA", "asA-allele RNA", and "asG-allele RNA". These 4 RNAs are also referred to collectively as "the '660 RNAs" or the "rs2670660-encoded small RNAs." These RNAs may also be referred to herein as NALP1-locus RNAs or NALP1-locus transRNAs.
  • '660 RNAs may physically interact with certain miRNAs.
  • the set of miRNAs analyzed was one of those whose expression was found to be modulated by ectopic expression of the '660 RNAs (see below).
  • 36 miRNAs had at least one potential target site within the 152 nt '660 RNA sequence (Fig. 3G).
  • Many miRNA target sites showed allele-associated changes in the minimal free energy (mfe) of hybridization (between the '660 RNA allelic variant and the miRNA).
  • the miRNAs also share multiple sequence identity segments of at least 11 nucleotides in length with the MEG3 and MALAT1 long non-coding RNAs (Fig. 3G).
  • a panel of GFP-tagged lentiviral vectors containing allele-specific variants of the rs2670660 sequence under the constitutive expression of the CMV promoter was constructed.
  • the 52 nt allele-specific variants of the rs2670660 sequence were chemically synthesized in sense and anti-sense orientations and cloned into the lentiviral vectors. The sequences were confirmed by restriction mapping and direct sequencing. Preliminary experiments established that hTERT-immortalized BJ1 cells consistently produced the highest transfection efficiency (> 90% of GFP-expressing cells by flow cytometry (FACS) analysis). These cells were used for subsequent experiments.
  • FACS Fluorescence-activated cell sorting
  • THP-1 cells undergo differentiation from monocytes to macrophages in response to TPA. Differentiated cells are easily recognized by their morphological appearance.
  • THP-1 cells expressing the rs2670660-encoded RNAs were identified and sorted by flow cytometry so that cells used for analysis were more than 90% GFP-positive.
  • Cells containing either vector alone (control), A-allele, or G-allele RNAs were exposed to TPA for 4 days.
  • Figure 6A shows light microscopy (left 3 panels) and fluorescence (right 3 panels) images of cells transfected with vector alone (top 2 panels), A-allele RNA (middle panels), or G-allele RNA (bottom panels). Both the vector-transfected and A-allele expressing cells show a high proportion of cells exhibiting the morphology of the differentiated phenotype.
  • G-allele expressing cells failed to differentiate in response to TPA. Instead, the G-allele expressing cells underwent apoptosis during TPA-induced differentiation and as a consequence generated 5-fold fewer macrophages compared to cells expressing the A-allele (Fig. 6B). In contrast, A-allele expressing cells produced nearly 2-fold more macrophages than control cells expressing only GFP. These cells also exhibited more potent phagocytic activity compared to controls or G-allele expressing cells (Fig. 6B, inset). These phenotypic changes were not the result of generally diminished cellular function in the G-allele expressing cells because cells expressing the G-allele showed a sustained long-term viability and increased motility (Fig. 6E).
  • G-allele expressing cells had pleiotropic deficiencies within the inflammasome/innate immunity pathways.
  • G-allele-associated molecular defects included a concomitant decrease in expression of the NLRP1, CASP1, and IL1-beta genes. These genes are key linear components of an essential functional axis within
  • NALP1-locus transRNAs containing a disease-associated G-allele may cause a significant functional deficiency of the immune system. Markedly enhanced apoptosis during differentiation would reduce the production of specialized immune cells, including effector cells and cells with critical immuno-regulatory functions. Significantly diminished expression of NLRP1, CASP1, and IL1-beta genes would likely severely limit the functional potency of the
  • Microarray analysis revealed allele-specific changes in the global gene expression profiles of cells expressing the A- and G-allele RNAs of rs2670660 compared to cells expressing the vector alone. Analysis of individual genes showed that expression of the asA- or asG-allele RNA specifically antagonized the expression pattern observed with the corresponding sense allele (Fig. 7A-D).
  • Microarray analyses revealed genome-wide allele specific concordant and discordant expression profiles in BJ1 cells expressing the rs2670660 RNAs (Fig. 7E-L). Linear regression analysis of the gene expression data was used to graphically illustrate concordant (E-H) and discordant (I-L) expression patterns.
  • Table 5 a set of 3299 genes whose expression was differentially regulated in cells expressing the G-allele RNA of rs2670660 compared to vector controls was defined by t-statistics. The expression of these 3299 genes was then evaluated in cells expressing the G-allele RNA and in cells expressing the A-allele RNA of rs2670660. Regression analysis shows highly concordant expression of this set of genes in cells expressing the G- and A-allele RNA of rs2670660. 87% of the 3299 genes were concordantly expressed (1562 up- and 1732 down-regulated). See also Fig. 7E.
  • Figures 17 and 18 show the complete set of genes identified in the
  • RNAs included the NLRP1, NLRP3, HMGA1, and Myb genes, which are regulators of inflammation and innate immunity (Figure 8A, top panels). These changes in gene expression are further illustrated by the ratios of the functionally-related transcripts,
  • NLRP3/NLRP1 Figure 8A, bottom left panel
  • HMGA1/Myb Figure 8A, bottom right panel
  • the following tables show the total numbers of genes whose expression changed (either up or down) under various experimental conditions modeling activation of the innate immunity/inflammasome pathways in cells expressing the G-allele RNA of rs2670660 and in control cells expressing only GFP. As shown in the tables, a statistically significant subset of genes regulated by the G-allele RNA of rs2670660 is also differentially regulated when the innate immunity/inflammasome pathways are activated.
  • Table 7 rs2670660-associated gene expression signatures in transdifferentiating human monocytes
  • Table 8 rs2670660-associated gene expression signatures in LPS-challenged human leukocytes
  • The genome-wide effects of rs2670660-encoded RNAs on gene expression described above indicate that the specific targets of these RNAs are either transcription factors or miRNAs, both of which control the expression of multiple genes. As discussed above, the predicted secondary structures for many of the identified intergenic small non-coding RNAs also indicated some interaction with miRNAs. Indeed, as demonstrated by the following experiments, the rs2670660 RNAs affect the expression of hundreds of miRNAs and miRNA-targeted proteins.
  • miR-20b is one of the up-regulated miRNAs shown in Fig. 10A and mRNAs comprising the 59-gene signature are a sub-set of mRNAs comprising the 140-gene signature shown in Fig. 10C.
  • RNAs differentially regulated in BJ1 cells expressing distinct allelic variants of the rs2670660-encoded RNAs (Fig. 10H, I). These represent distinct classes of non-coding RNAs including snoRNAs and snoRNA-host genes (SNORD113; SNHG1; SNHG3; SNHG8); long non-coding RNAs (MEG3, tncRNA, and MALAT1); microRNAs, microRNA-precursors, and protein-coding microRNA-host genes (ATAD2; KIAA1199).
  • 18 of 36 (50%) of these miRNAs are derived from a single miRNA cluster in an approximately 200 kb continuous region of the 14q32 band of chromosome 14, which suggests that the 14q32 cluster miRNAs may be a primary molecular target of the rs2670660-encoded RNAs.
  • let-7 miRNA release from complexes with Argonaute proteins and subsequent degradation can both be blocked by addition of miRNA target RNA, which results in increased levels of let-7 miRNA (Chatterjee et al., Nature 461:546-9, 2009).
  • Computer modeling experiments demonstrated that let-7b miRNA follows the pattern of allele-associated mfe changes characteristic of miRNAs whose expression levels are lower in G-allele expressing cells (Fig. 10J(d)). If the let-7 bioactivity model is valid for the snpRNA-mediated effects on miRNAs, then let-7b expression and activity should be higher in A-allele expressing cells.
  • rs2670660-associated GES are enriched for genes with an established role in controlling the transition from pluripotency to a differentiated state during development such.
  • rs2670660-associated GES are enriched for genes of loci containing bivalent chromatin domains and PluriNet network genes (Figure 11A, Table 12).
  • Microarray analysis revealed that expression of rs2670660-encoded RNAs triggers concomitant allele-specific activation of the Polycomb pathway genes (PcG) comprising the Polycomb repressive complex 2 (PRC2).
  • PcG Polycomb pathway genes
  • PRC2 Polycomb repressive complex 2
  • the PRC2 complex catalyzes histone H3 lysine 27 trimethylation (H3K27me3), induces a chromatin silencing state, and mediates transcriptional repression (Figure 11B).
  • the table below shows the genes whose expression was regulated by all 4 alleles at a statistical significance of p ≤ 0.05. The log-transformed expression values are shown. Positive numbers indicate increased expression, negative numbers indicate decreased expression. Also shown is the primer probe set used in the microarray analysis for each gene.
  • Table 12 Patient samples analyzed by microarray gene expression profiling. Abbreviations: PBMC, peripheral blood mononuclear cells. List of GEO accession numbers and original references for microarray analyses and associated clinical information can be found in references listed in Materials and Methods.
  • Table 14 rs2670660-associated rheumatoid arthritis (RA) gene expression signatures
  • Table 20 Expression signatures of hESC bivalent domain genes (BDG) in rs2670660 G-allele-associated gene expression models of human diseases
  • the sets of genes whose expression was altered in cells expressing the small RNAs of rs2670660 are referred to as rs2670660-associated allele-specific GES.
  • there are four rs2670660-associated allele-specific GES, namely, the signatures of the A-allele, the G-allele, the antisense-A allele, and the antisense-G allele.
  • PBMC peripheral blood mononuclear cells
  • rs2670660-associated allele-specific GES were detected with a level of statistical significance that markedly exceeded the probability of random co-occurrence by chance alone in clinical samples from patients diagnosed with Crohn's disease, rheumatoid arthritis, Huntington's disease, and Alzheimer's disease (Fig. 12).
  • GES associated with the expression of the G-allele-specific 52 nt small RNAs in BJ1 cells was identified in clinical samples using t-statistics and screened for concordant and discordant features in
  • rs2670660-associated allele-specific GES in these clinical samples indicates that the GES are detectable in about 80-100% of samples from patients diagnosed with one of several common diseases manifested by activation of the innate immunity/inflammasome pathways. These data indicate that assays for rs2670660-associated GES may be useful diagnostic and prognostic tools for diseases and disorders characterized by activation of these pathways.
  • RNAs to discriminate normal and pathological tissue samples was further validated in a set of patients with Alzheimer's disease, prostate cancer, and breast cancer (Fig. 13).
  • the set of genes whose expression was differentially regulated by ectopic expression of the rs2670660 G-allele RNA was identified in BJ1 cells using t-statistics. This set of genes was then screened for concordant and discordant expression in clinical samples and matched controls (see Table 13, supra). Expression profiles of G-allele concordant and G-allele discordant signatures in individual samples of each data set were evaluated by calculating Pearson correlation coefficients (signature scores) using the log10-transformed fold expression changes of G-allele-specific GES in BJ1 cells as a multidimensional standard vector.
  • Figure 13A shows the expression profiles of G-allele concordant (left panel) and discordant (right panel) genes in hippocampal tissue from Alzheimer's patients and normal subjects.
  • Each bar represents the G-allele-specific GES for a particular subject calculated as described above.
  • the group of 9 bars on the far left shows the GES from tissue in each of 9 control subjects.
  • the next three groups of bars in each panel represent the GES of tissue from Alzheimer's patients segregated based on the clinically-defined severity of the disease, left to right: incipient (7 subjects), moderate (8 subjects), and severe (7 subjects), for a total of 22 subjects.
  • the data show distinct expression profiles in the tissues from
  • Figure 13B shows the expression profiles of G-allele concordant (left panel) and discordant (right panel) genes in normal and prostate cancer tissues.
  • Each bar represents the G-allele-specific GES for a particular subject calculated as described above.
  • the group of 18 bars on the far left shows the GES from normal prostate tissue in each of 18 control subjects.
  • the next three groups of bars in each panel represent the GES of prostate cancer tissues segregated based on histological examination (left to right):
  • Figure 13C shows the expression profiles of G-allele concordant (left panel) and discordant (right panel) genes in normal and breast cancer tissues. Each bar represents the G-allele-specific GES for a particular subject calculated as described above. In each panel, the group of 8 bars on the far left shows the GES from normal breast tissue.
  • the next five groups of bars in each panel represent the GES of breast cancer tissues segregated based on histological examination as follows (left to right): morphologically normal breast tissues adjacent to tumor (8 samples); primary breast tumors from patients without metastatic disease; primary breast tumors from patients with metastatic disease (99 total for primary tumors); lymph nodes from patients with metastatic disease (26); metastatic breast tumors in distant organs (12).
  • the data show distinct expression profiles, particularly for the metastatic tumors, compared to controls and morphologically normal tissues adjacent to tumor tissue.
  • RNAs transcribed from disease-linked SNPs (such as the rs2670660-encoded RNAs) in epigenetic reprogramming during development, clonal specialization, and differentiation, as well as during disease progression.
  • Table 21 Small non-coding RNAs and associated long non-coding RNAs containing SNP sequences expressed in human cells. Molecular identities of listed non-coding small RNAs were validated by sequencing of the purified PCR products.
  • Table 22 Classification of SNPs associated with common human disorders.
  • Haiman CA et al. A common genetic risk factor for colorectal and prostate cancer. Nat Genet 2007 39: 954-6.
  • chromosome 15ql3.3 influence colorectal cancer risk. Nat Genet. 2008 40: 26-8.
  • Sense and anti-sense variants of the 52 nt rs2670660 sequence were chemically synthesized, cloned into GFP-expressing lentiviral vectors, and transfected into BJ1 cells.
  • Corresponding BJ1 cell line variants were isolated by sterile FACS sorting to contain >90% of GFP-expressing cells, expanded in vitro in monolayer cultures, and analyzed for gene expression.
  • miRNA was extracted from adherent cells lysed on culture plates using the mirVana miRNA Isolation Kit (Ambion). Homogenized cell lysates were frozen at -80°C for at least 24 hours prior to miRNA purification. miRNA concentration was checked using a NanoDrop (Thermo Scientific) before checking quality on a Bioanalyzer (Agilent
  • Luciferase Reporter Vector specific for the microRNA of interest.
  • the target site sequence of the reporter vector is complementary to the miRNA; therefore, a decrease in luciferase signal would indicate an increase in microRNA activity.
  • Cells were transfected with the reporter vector using FuGENE 6 Transfection Reagent (Roche); the transfection was allowed to run 48 hours before the cells were lysed using Luciferase Cell Culture Lysis Reagent (Promega). The lysates were read using the FLUOstar OPTIMA system (BMG Lab Technologies), with 20 microliters of Luciferase Assay Reagent (Promega) injected into each well immediately prior to reading. miRNA expression analysis
  • HEPES buffered saline HBSS
  • Antibodies at appropriate dilutions: CD14-Pacific Blue, Biolegend, Inc; and CD11b-Alexa Fluor® 647, Biolegend, Inc
  • Staining was carried out for 30 min with rotation at 4°C.
  • Cells were then washed with staining medium three times and resuspended in staining medium. The stained specimens were then analyzed using FACSVantage (BD Biosciences, San Diego, CA;
  • Lentiviruses were generated by co-transfecting pLentiviral vector with GFP only plasmids (control cultures) or GFP plasmids with synthetic, allele-specific 52 nt sequences of the SNP rs2670660 and packaging mix (Invitrogen) into 293FT cells using Lipofectamine 2000 according to the manufacturer's instructions (Invitrogen), and then BJ1, U937, or THP-1 cells were infected with viral supernatant for 24 hr. Flow cytometry analysis for GFP expression was performed to confirm the infection and assess the transfection efficiency. Experiments were carried out using cultures with transfection efficiency > 90%.
  • Sense and anti-sense variants of the 52 nt snpRNA were synthesized, cloned into GFP-lentiviral vectors, and transfected into BJ1 cells. GFP-expressing cells were isolated by flow cytometry and enriched populations (> 90% GFP positive) were used for assays. Cells from sub-confluent cultures (about 70% confluence) were seeded in triplicate into 6-well plates (100 cells per well), cultured for 2 weeks, and then stained with 0.1% crystal violet for 5 min. Plates were scanned and the number of colonies containing > 50 cells was counted. Protocols for identification of endogenous trans-regulatory small RNAs encoded by the SNP rs2670660
  • PCR reagents to a 25 ul final volume: Water, RNase-free; PCR Buffer (10X) 2.5 ul; PCR Nucleotide Mix (10 mM) 0.5 ul; Taq DNA polymerase (50X) 0.5 ul; template; Forward primer (10 uM) 1 ul (0.4 uM final conc.); Reverse primer (10 uM) 1 ul (0.4 uM final conc.).
  • Thermal cycle profile: 95°C 3 min followed by 40 or more cycles: 95°C 30 s, 55°C 30 s, 72°C 1 min (or 1-2 min per kilobase); followed by final extension 72°C 3 min and hold at 4°C.
  • IL-converting enzyme/caspase-1 inhibitor VX-765 blocks the hypersensitive response to an inflammatory stimulus in monocytes from familial cold autoinflammatory syndrome patients. J Immunol 2005; 175: 2630-4.
  • Glinsky GV et al. Microarray analysis identifies a death-from-cancer signature predicting therapy failure in patients with multiple types of cancer. J Clin Invest 2005; 115: 1503-1521. Glinsky GV et al. Classification of human breast cancer using gene expression profiling as a component of the survival predictor algorithm. Clin Cancer Res 2004; 10: 2272-2283.
  • Affymetrix Microarray Suite version 5.0 software in these experiments.
  • the concordance analysis of differential gene expression across the data sets was performed using Affymetrix MicroDB version 3.0 and DMT version 3.0 software as described in the references above.
  • the microarray data was processed using the Affymetrix Microarray Suite version 5.0 software and statistical analysis of the expression data set was performed using the
  • Affymetrix MicroDB and Affymetrix DMT software. The Pearson correlation coefficients for individual test samples and the appropriate reference standard were determined using GraphPad Prism version 4.00 software (GraphPad Software). The significance of the overlap between the lists of differentially-regulated genes was calculated by using the hypergeometric distribution test (See Seila, A.C. et al. Divergent transcription from active promoters, Science (2008) 322:1849-51).
  • Expression profiling data included 697 clinical samples obtained from 185 control subjects and 350 patients diagnosed with 9 common human disorders including Crohn's disease (59 patients), ulcerative colitis (26 patients), rheumatoid arthritis (20 patients), Huntington's disease (17 patients), autism (15 patients), Alzheimer's disease (36 patients), obesity (14 subjects), prostate cancer (64 patients), and breast cancers (99 patients).
  • Microarray data and associated clinical information are publicly available in the Gene Expression Omnibus (GEO) database maintained by the National Center for Biotechnology Information using the following GEO accession numbers: GDS2601; GDS810; GDS2824; GDS1615; GDS711; GDS1480; GDS2545; GDS1331; GDS1407; GDS3203; GDS2255.
  • Genomic information related to the PluriNet network genes is publicly available from the Stem Cell Mesa microarray data server and also from Stem Cell Matrix.
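
The two statistical steps described in the analysis methods above, a Pearson-correlation signature score against a reference vector of log10-transformed fold changes and a hypergeometric test for the overlap between gene lists, can be illustrated with a minimal sketch. This is not the software used in the original experiments (those relied on GraphPad Prism and Affymetrix tools); the function names, the one-sided upper-tail form of the test, and any input values are assumptions made for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def signature_score(reference_fold_changes, sample_fold_changes):
    """Correlate a sample with a reference signature.

    Both inputs are positive fold-change ratios for the same genes in the
    same order. They are log10-transformed before correlation, so the
    reference acts as the 'standard vector' (hypothetical helper name).
    """
    ref = [math.log10(v) for v in reference_fold_changes]
    smp = [math.log10(v) for v in sample_fold_changes]
    return pearson(ref, smp)


def hypergeom_overlap_pvalue(population, list_a, list_b, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: number of genes assayed
    list_a, list_b: sizes of the two differentially regulated gene lists
    overlap: number of genes shared by both lists
    Assumes a one-sided enrichment test; the source does not state the tail.
    """
    upper = min(list_a, list_b)
    total = math.comb(population, list_b)
    tail = sum(
        math.comb(list_a, k) * math.comb(population - list_a, list_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

A score near +1 would correspond to a concordant profile and a score near -1 to a discordant one; a small overlap probability would suggest that two gene lists share more members than expected by chance.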

Abstract

Disclosed are methods and compositions related to small, non-coding RNA molecules having gene regulatory activity, compositions comprising same, and methods for their use. Provided are isolated small non-coding RNA molecules transcribed from an intergenic region of the human genome, wherein the intergenic region contains at least one single nucleotide polymorphism (SNP) associated with one or more human diseases or disorders. Also disclosed are methods for the detection of these small non-coding RNA molecules in a biological sample and related therapeutic, diagnostic, and prognostic methods.

Description

SMALL NON-CODING REGULATORY RNAs AND METHODS FOR THEIR USE
CROSS REFERENCE TO RELATED APPLICATIONS
[01] This application claims priority to U.S. Provisional Application Nos. 61/226,448, filed July 17, 2009; 61/264,057, filed November 24, 2009; 61/307,666, filed February 24, 2010; and 61/263,556, filed November 23, 2009, each of which is incorporated herein by reference in its entirety.
INCORPORATION BY REFERENCE OF SEQUENCE LISTING
[02] The contents of the text file named "26141 511001WO_SeqList_ST25.txt" which was created on July 16, 2010 and is 92 KB in size, are hereby incorporated by reference in their entirety.
FIELD OF THE INVENTION
[03] The invention relates to small, non-coding RNA molecules having gene regulatory activity, compositions comprising same, and methods for their use.
BACKGROUND OF THE INVENTION
[04] Recent genome-wide analyses of transcription in humans have revealed the surprisingly pervasive transcription of non-coding regions of DNA, both within introns and in intergenic sequences distant from known protein-coding genes. See for review, Malecova and Morris, Curr. Opin. Mol. Ther. 12(2):214-22 (2010). Evidence has emerged of widespread divergent transcription at protein-encoding gene promoters. See Seila, A. C. et al., Science (2008) 322: 1849-51. Transcription start site-associated RNAs were found to nonrandomly flank active promoters, with peaks of antisense and sense short RNAs at 250 nucleotides upstream and 50 nucleotides downstream, respectively. These transcription start site RNAs form part of a diverse family of small non-coding RNAs generated from posttranscriptional processing of messenger RNAs. See Fejes-Toth, K. et al., Nature (2009) 457: 1028-32. Several kinds of non-coding RNA molecules have been identified that act to regulate gene expression by transcriptional or translational silencing. These are small interfering RNA molecules ("siRNAs"), short hairpin RNA molecules ("shRNAs"), long interfering antisense non-coding RNAs (referred to herein as "liRNAs"), and microRNAs ("miRNAs").
[05] siRNAs involved in gene silencing have been described in various organisms including S. pombe, T. thermophila, A. thaliana, D. melanogaster and C. elegans. Transcriptional suppression of human genes by exogenously added siRNAs targeted to specific promoters has been well documented, but the mechanism of siRNA action is not well understood. It is believed to involve chromosomal remodeling in the vicinity and downstream of the initial siRNA target site. One type of "remodeling" takes the form of enriching the chromatin at the siRNA-targeted promoter with silent chromatin "marks." Two of these marks are posttranslational modifications of histone proteins, specifically the dimethylation of histone 3 at lysine 9 ("H3K9me2") and the trimethylation of histone 3 at lysine 27 ("H3K27me3"). The human proteins involved in chromatin remodeling include the de novo DNA methyltransferase Dnmt3A, histone deacetylase 1 ("HDAC1"), and the histone lysine methyltransferase KMT6, also known as EZH2.
[06] There is one published case of an exogenously added non-coding RNA molecule mediating long-term transcriptional silencing. This was an shRNA targeted to the promoter of the UBC gene in human cells. UBC gene expression was suppressed for one month even though the shRNA was expressed for only 7 days. The data suggested that the silencing was initially established by histone methylation and followed by DNA methylation. The methylation of CpG islands in the promoter regions of genes is known to play a significant role in the stable, long-term epigenetic silencing of genes throughout development.
[07] liRNAs have been identified in mammalian cells acting to silence particular chromosomal regions, such as the HOX family of genes in eukaryotes and the X chromosome in mice and humans. A total of 231 liRNAs were identified as transcribed from the intergenic regions of the HOX loci. The majority of these were antisense compared to the HOX genes. At least one liRNA was identified (HOTAIR) that negatively regulates a gene (HOXD) distant from its site of transcription. The mechanism apparently involves recruiting proteins of the Polycomb complex to the promoter region and thereby increasing the amount of repressive H3K27me3. The Polycomb (PcG) proteins are transcriptional repressors which act as genome-wide regulators of expression during development. The PcG proteins alter the epigenetic state of chromatin, for example, by increasing histone methylation or ubiquitination. It is not clear how the PcG complex is targeted to a specific promoter region, but recruitment of the complex and the subsequent formation of heterochromatin is believed to underlie PcG-mediated gene silencing.
[08] With respect to the X chromosome, an liRNA was identified in humans and mice that mediates silencing. Although the mechanism of action is not known in human cells, in the mouse it appears to involve recruitment of a PcG complex to the promoter region through direct interaction between the liRNA and a subunit of the complex.
[09] IiRNAs are also involved in genomic imprinting of autosomal genes.
Imprinting is a mono-allelic mechanism of gene silencing based on the parent-of-origin. In at least two cases (Air and Kcnqlotl) the IiRNAs silence large domains of the genome through their interaction with chromatin, specifically be recruiting methyltransferases and PcG complexes to the loci of the silenced genes.
[10] The limited data that exists suggests that non-coding RNA molecules function in combination with PcG proteins and perhaps other, unidentified proteins, to silence the expression of particular genes in cancer cells, such as tumor suppressor genes, analogous to their putative role during development. However, the complex role of these molecules in transcriptional silencing during normal development and in diseases such as cancer remains to be established.
[11] miRNAs are a class of small (20-30 nucleotides in length) non-coding regulatory RNAs that perfectly match the 3' untranslated regions (3' UTR) of target messenger RNAs. Binding of the miRNA to its target sequence results in degradation of the messenger RNA or inhibition of its translation. For a review, see He, L. and Hannon, G.J. Nat. Rev. Genet. (2004) 5:522-531.
[12] Large-scale genome-wide association studies (GWAS) of single nucleotide polymorphisms (SNPs) have identified genetic variants associated with disease phenotypes at high levels of statistical confidence. The dominant approach to understanding how these genetic variations contribute to disease has been to examine the effects of the SNP allelic variants on nearby protein-coding genes. This protein-centric strategy was recently extended to the SNPs residing within the boundaries of genomic regions encoding microRNAs (miRNAs) and also within miRNA target sites in messenger RNAs.
[13] The present inventors demonstrated that many disease-linked SNPs are located far from protein-coding genes but in transcriptionally active regions of the genome. The invention is based upon the discovery of a novel class of non-coding RNAs transcribed from these intergenic regions containing disease-linked SNPs.
SUMMARY OF THE INVENTION
[14] The present invention is based upon the discovery that genomic regions containing disease-associated single nucleotide polymorphisms (SNPs) are actively transcribed to produce small non-coding SNP-bearing RNA molecules having biological activity. These RNA molecules are referred to herein as "snpRNAs". In particular, specific RNA molecules of the invention are demonstrated to modulate the expression of other non-coding RNA molecules as well as protein-coding genes. In one embodiment, the small non-coding SNP-bearing RNA molecules of the invention modulate the activity of the innate immunity/inflammasome pathway by modulating the expression of particular genes in that pathway. In a specific embodiment, an snpRNA molecule of the invention modulates the expression of a gene selected from NLRP3, NLRP1, HMGA1, and MYB. In another embodiment, an snpRNA molecule of the invention facilitates hormone-independent growth of a hormone-dependent cell or cell line. In a specific embodiment, the hormone-dependent cell is a prostate cell. In one embodiment, the cell is a prostate cancer cell.
[15] In certain embodiments, the snpRNAs regulate the expression of genes distant from their site of transcription, and thus may also be referred to as "transRNAs." The invention provides the sequences of specific cDNA molecules corresponding to the snpRNAs described herein, methods and reagents for their detection in a biological sample from a subject, and methods for their use in diagnostic and prognostic assays.
[16] An snpRNA molecule of the invention contains a disease-associated SNP which is located within a loop structure of the RNA molecule. Preferably, this loop structure containing the SNP also contains a binding site for a microRNA ("miRNA") molecule. Preferably, the SNP is located within a binding site for one or more of the following proteins: H3K27Me3, CBP/CREB, Ezh2, and POL2. In certain embodiments where the SNP is located within the binding site for more than one protein, the binding sites overlap. In another embodiment, the SNP is within the binding site for a nuclear lamina protein. In a specific embodiment, the SNP is located within 200 base pairs of a binding site for a lamin B1 protein.
[17] In one embodiment, the invention provides isolated, purified cDNA molecules corresponding to the snpRNA molecules described herein. The cDNA molecules are useful to express the snpRNA molecules of the invention in heterologous cells and to detect the presence of the snpRNA molecules in a biological sample from a subject. In certain embodiments, the cDNA molecules are useful as probes to detect the snpRNA molecules in the sample, e.g., in hybridization based assays. In other embodiments the cDNA molecules are used as positive controls for the detection of the snpRNA molecules in a biological sample from a subject.
[18] The invention provides an isolated small non-coding RNA molecule transcribed from an intergenic region of the human genome, wherein the RNA molecule is less than 500, less than 400, less than 300, less than 200, less than 150, less than 100, or less than 75 nucleotides and the intergenic region contains at least one single nucleotide polymorphism (SNP) associated with one or more human diseases or disorders. In a particular embodiment, the intergenic region contains only one SNP. In one embodiment, the snpRNA molecule is contiguous.
[19] In one embodiment, the cDNA form of the RNA molecule comprises a sequence selected from the group consisting of SEQ ID NOs: 1-101, 332, and 333. In another embodiment, the cDNA form of the RNA molecule comprises a sequence selected from the group consisting of SEQ ID NOs: 1, 4, 6, 7, 9-18, 39, 88-90, 332, and 333. In another embodiment, the cDNA form of the RNA molecule comprises a sequence selected from the group consisting of SEQ ID NOs: 1, 7, 332, and 333.
[20] In one embodiment, the SNP is selected from the group consisting of rs2670660, rs6596075, rs6983561, rs16901979, rs13281615, rs10505477, rs10808556, rs6983267, rs7014346, rs7000448, rs1447295, rs2820037, rs889312, rs1937506, rs13387042, rs7716600, rs11249433, and rs3803662.
[21] In one embodiment, the SNP is selected from the group consisting of rs9469220, rs9270986, rs6457617, rs615672, rs7837688, rs6997709, rs16892766, rs2670660, and rs2542151.
[22] The invention also provides a vector comprising a polynucleotide encoding an RNA molecule of the invention. In one embodiment, the vector comprises the cDNA form of an RNA molecule described herein. The invention further provides a cell comprising said vector. In one embodiment, the cell is ex vivo or in vitro.
[23] The invention also provides a kit comprising, in one or more containers, a vector comprising a polynucleotide encoding an RNA molecule of the invention. In one embodiment, the vector comprises the cDNA form of an RNA molecule described herein and instructions for expressing the RNA molecule from the vector. In one embodiment, the kit further comprises one or more polynucleotide primers for amplifying an RNA or a cDNA molecule of the invention. In one embodiment, the one or more primers comprises a sequence selected from the group consisting of SEQ ID NOs: 102-331. In one embodiment, the one or more primers comprises a sequence selected from the group consisting of SEQ ID NOs: 102-161. In one embodiment, the one or more primers comprises a sequence selected from the group consisting of SEQ ID NOs: 102, 103, 114, 115, 326, and 327.
[24] The invention also provides a kit comprising, in one or more containers, a cell comprising said vector and instructions for expressing the RNA molecule in the cell.
[25] The invention also provides a method for detecting the small non-coding RNA molecules described herein in a sample from a subject, the method comprising detecting the RNA molecules in the sample. In one embodiment, the step of detecting the RNA molecules comprises the step of detecting the cDNA form of the RNA molecule in the sample. In one embodiment, the cDNA form is detected by a method comprising reverse transcription and polymerase chain reaction (RT-PCR) technology. In another embodiment, the cDNA form is detected by a method comprising nucleic acid hybridization technology.
[26] In one embodiment, the method further comprises the steps of isolating the small RNA fraction from the sample and converting the RNA into cDNA prior to the step of detecting the cDNA in the sample.
[27] The invention also provides a method for evaluating the risk that a human subject will develop a disease or condition associated with a specific allele of an SNP ("the pathological allele") by detecting the presence of an RNA molecule of claim 1 in a sample from the subject, wherein the RNA molecule is transcribed from the pathological allele, and wherein detection of said RNA molecule indicates that the subject has an increased risk for developing the disease or condition and the failure to detect said RNA molecule indicates that the subject has a decreased risk for developing the disease or condition.
[28] In one embodiment, the method further comprises detecting the expression level of the RNA molecule transcribed from the pathological allele relative to its expression in a population of healthy subjects, wherein an increased or decreased level of expression relative to the population of healthy subjects indicates that the subject has an increased risk for developing the disease or condition.
[29] In one embodiment, the step of detecting the presence of an RNA molecule transcribed from the pathological allele is performed indirectly, by detecting the expression of one or more genes whose expression is regulated by the RNA molecule.
[30] The invention also provides a method for diagnosing a disease or condition associated with a specific allele of an SNP ("the pathological allele") in a human subject, the method comprising detecting the presence of an RNA molecule of claim 1 in a sample from the subject, wherein the RNA molecule is transcribed from the pathological allele, and wherein the disease or condition is positively diagnosed if the RNA molecule is detected in the sample.
[31] The invention also provides a method for treating, preventing, or ameliorating a disease or condition associated with a specific allele of an SNP ("the pathological allele") in a subject in need thereof, the method comprising administering one or more therapeutic agents that act to suppress the expression or antagonize the activity of an RNA molecule of claim 1, wherein the RNA molecule is transcribed from the pathological allele.
[32] In one embodiment of the claimed methods, the presence of the G-allele snpRNA of rs2670660 is detected in a sample from the subject, wherein the presence of the G-allele snpRNA indicates that the subject is at an increased risk for developing a disease or disorder selected from vitiligo, Crohn's disease, rheumatoid arthritis, Huntington's disease, Alzheimer's disease, breast cancer, metastatic breast cancer, prostate cancer, metastatic prostate cancer, autism, and obesity.
[33] In one embodiment of the claimed methods, the presence of the A-allele snpRNA of rs16901979 is detected in a sample from the subject, wherein the presence of the A-allele snpRNA indicates that the subject is at an increased risk for developing a cancer of epithelial origin. In one embodiment, the cancer is selected from breast cancer, metastatic breast cancer, prostate cancer, and metastatic prostate cancer.
[34] Preferably, with respect to any of the methods described above, the subject is human.
[35] In certain embodiments of the methods described above, the sample is a blood, tissue, or cell sample.
[36] In one embodiment, the disease or condition is selected from the group consisting of vitiligo, Crohn's disease, rheumatoid arthritis, Huntington's disease, Alzheimer's disease, breast cancer, metastatic breast cancer, prostate cancer, metastatic prostate cancer, autism, and obesity.
[37] In one embodiment, the disease or condition is selected from the group consisting of autism, Alzheimer's disease, schizophrenia, and bipolar disorder.
[38] In one embodiment, the disease or condition is an autoimmune disease or disorder. In one embodiment, the disease or condition is selected from the group consisting of vitiligo, ankylosing spondylitis, rheumatoid arthritis, multiple sclerosis, systemic lupus erythematosus and autoimmune thyroid disease.
[39] In one embodiment, the disease or condition is selected from the group consisting of ulcerative colitis and Crohn's disease.
[40] In one embodiment, the disease or condition is selected from the group consisting of breast cancer, colorectal cancer, lung cancer, ovarian cancer, and prostate cancer.
[41] In one embodiment, the disease or condition is selected from the group consisting of coronary artery disease, hypertension, type 1 diabetes, type 2 diabetes, and obesity.
[42] The invention also provides an apparatus for evaluating a disease or condition, or evaluating the risk of developing a disease or condition, in a subject, the apparatus comprising a model configured to evaluate a dataset for the subject to thereby evaluate the risk of disease in the subject, wherein the model is based upon determining the similarity in the expression profile of a defined set of genes in a sample from the subject and the expression profile for that set of genes in one or more reference sets of the model, wherein a reference set comprises one or more of a population of healthy subjects and a population of subjects suffering from the disease, wherein the set of genes is a set of genes whose expression is regulated by a small RNA molecule of claim 1.
BRIEF DESCRIPTION OF THE FIGURES
[43] Figure 1: Identification of 12 small RNAs encoded by intergenic disease-associated SNPs using reverse-transcription PCR-based screening. Small RNA fractions were isolated from various human cell lines and subjected to the RT-PCR based screen. PCR products of expected size were purified, subjected to nested PCR analysis and gel electrophoresis. Molecular identities of identified RNA molecules were validated by sequencing of primary PCR and nested PCR products. The 12 RNAs identified by this method are designated A3, A6, A9, A16, A21-26, A28, and A29. The sequences are given in Table 1. The primers used to amplify the sequences are given in Table 3. Figure 15 shows the identification of other RNAs from the "A" set in different cell lines.
[44] Figure 2: (A) Genomic coordinates of the endogenous small RNAs described in Figure 1 and corresponding disease-associated SNPs. Abbreviations used: Crohn's disease (CD), rheumatoid arthritis (RA), type 1 diabetes (T1D), autoimmune disorders (AID), hypertension (HT), prostate cancer (PC), breast cancer (BC), ovarian cancer (OC), colorectal cancer (CRC).
[45] (B): Examples of predicted secondary structures of RNAs. Arrows indicate the positions of nucleotide variations which are associated with increased risk of developing the corresponding disorders. The bottom right panel shows alignments of the miRNA target sites in RNA A21, which is transcribed from a region containing the prostate cancer susceptibility SNP rs7837688. Individual human miRNAs (short horizontal bars) are aligned along the A21 RNA sequence according to the positions of respective target sites. A single vertical bar marks the position of the prostate cancer-predisposition SNP. Note that the vast majority of microRNA target sites segregate to the A21 transRNA segment around the SNP and include SNP nucleotides.
[46] (C) Chromatin state map analysis of genomic sequences encoding evolutionarily conserved snpRNAs reveals a consensus chromatin domain signature comprising histone H3K27Me3, CBP/CREB, EZH2, and POL2 proteins. Chromatin state maps of corresponding human and mouse genome sequences are visualized using the custom tracks of the UCSC Genome Browser. Color-coded horizontal lines depict alignments of DNA sequences derived from ChIP-Seq experiments using antibodies against corresponding proteins. Each color-coded horizontal line represents data from independent biological replicates. Note nearly ubiquitous alignments of the evolutionarily conserved RNA-encoding sequences within binding sites of the histone H3K27Me3, CBP/CREB, EZH2, and POL2 proteins. Positions of disease-linked SNP nucleotides within RNA-encoding sequences are indicated by arrows and vertical lines. Original experiments describing the corresponding mouse and human genome-wide chromatin state maps were reported elsewhere.
[47] Figure 3: Identification of rs2670660-encoded endogenous transRNAs.
[48] (A) Sequence mapping of nucleotide primer sets utilized for identification of rs2670660-encoded endogenous small RNAs and corresponding PCR products. Sense and anti-sense variants of a 52 nucleotide ("nt") rs2670660 sequence (shown in a shaded box, SEQ ID NO:1) were chemically synthesized, cloned into GFP-expressing lentiviral vectors, and utilized in biological and mechanistic experiments.
[49] (B) PCR analysis of genomic DNA products generated by individual sets of primers shown in (A).
[50] (C) PCR analysis of cDNA products derived from small RNA fraction <200 nt using primer sequences shown in (A). Only primer set 2 generated a product of the expected size (152 nt).
[51] (D-F) Nested PCR using primer sets 1 and 2 in the small RNA fraction from BJ1 cells. Products of the expected size for set 2 (152 nt) and set 1 (110 nt) are shown. Sequences of PCR products were confirmed by direct sequencing. Nested PCR of the 152 nt product with primer set 1 using small RNA fractions (containing RNA of less than 200 nt in length) from various cell lines as template. Product of the expected size (110 nt) is shown. Sequences of PCR products were confirmed by direct sequencing.
[52] (G) Sequence homology profiling of rs2670660-encoded RNAs, miRNAs, and long non-coding RNAs identifies extensive sequence homology/complementarity features.
[53] a) Genomic location (top left), secondary structures of 152 nt (bottom left) and 52 nt (top right) RNA molecules, and position of the miRNA-target sites along the 152 nt transRNA sequence (bottom right).
[54] b) Visualization of individual miRNA-target sites within the rs2670660-encoded RNA.
[55] c, d) miRNAs which are differentially regulated in BJ1 cells expressing distinct allelic variants of the NALP1-locus transRNAs share multiple sequence identity segments of at least 11 nucleotides in length with sequences of MEG3 (c) and MALAT1 (d) long non-coding RNAs.
[56] Figure 4: Expression of a small RNA transcribed from the G-allele of rs2670660 inhibits cell growth and results in G1 arrest. The following notation is used to designate the 4 small RNAs transcribed from the A-allele, the G-allele, and their antisense counterparts: A, G, asA, and asG. These 4 RNAs are also referred to collectively as "the '660 RNAs." Transfected BJ1 cells were sorted by GFP expression and an enriched population (>90% GFP positive) was used in monolayer and clonal growth assays.
[57] (A) Monolayer cultures expressing GFP only (BJ1/GFP), or 50 nucleotide RNAs from the G-allele (rs2670660_G) or the A-allele (rs2670660_A) of the SNP rs2670660 were cultured for five days; cells were counted every 24 hours. Top line in graph is A; middle line is GFP only; bottom line is G.
[58] (B) Clonal growth of cells expressing GFP only (EGFP), the G-allele RNA (1), the A-allele RNA (2), the anti-sense G-allele RNA (3), or the anti-sense A-allele RNA (4). Cells were cultured as described in methods. The average of triplicates is shown.
[59] (C) Flow cytometric analysis (FACS) of cells expressing empty vector (GFP), sense and anti-sense (as) variants of the A- and G-allele RNAs. Representative FACS plots are shown above the bar graphs which represent the number of cells in each phase of the cell cycle (G1, S, G2M), normalized to the vector control. Average values of three independent biological replicates are shown.
[60] Figure 5: Representative results of clonogenic growth experiments of BJ1 cells expressing sense and anti-sense allele small RNAs encoded by rs2670660.
[61] (A): cells expressing GFP from vector controls lacking insert (GFP, top row), or one of the following small RNAs encoded by rs2670660 (next 4 rows): A-allele (A), G-allele (G), anti-sense A (asA), or anti-sense G (asG).
[62] (B): top to bottom rows show cells co-expressing the following transcripts: G and vector control (GFP); asG; asA and vector control; A and asA; vector control alone; G and asA.
[63] Figure 6: Constitutive expression of distinct allelic variants of NALP1-locus transRNAs exerts allele-specific effects on phenotypes of human cells.
[64] (A) Expression of the G-allele of the rs2670660-encoded RNA interferes with TPA-induced monocyte/macrophage differentiation. THP-1 cells expressing control vector or allele-specific sense and anti-sense variants of rs2670660-encoded RNAs were treated with TPA for 4 days to induce differentiation into macrophages. Left panels (top to bottom) show light microscopy images of control, A-allele, and G-allele transfected cells. Right panels show fluorescence images of the same. The cells expressing the G-allele variant failed to differentiate and retained a non-differentiated state.
[65] (B) In response to induction of differentiation, THP-1 cells expressing the G-allele of the rs2670660-encoded RNA undergo massive apoptosis and produce approximately 5-fold fewer macrophages, which are half as potent in the sheep erythrocyte phagocytosis assay as macrophages derived from THP-1 cells expressing A-allele RNAs.
[66] (C) Human cells stably expressing G-allele RNAs manifest diminished expression levels of the genes encoding components of PRC1-type Polycomb group (PcG) protein chromatin remodeling complexes (BMI1 and RING1B) compared to components of the PRC2-type PcG protein chromatin silencing complexes (EZH2, EED, SUZ12) and differential regulation of the 586 transcripts encoded by PcG pathway targets, bivalent chromatin domain genes.
[67] (D) Allele-specific effects on monocyte/macrophage differentiation are modulated by BMI1 expression. BMI1 knock-down markedly diminishes macrophage production by A-allele expressing THP-1 cells (top and bottom left panels), whereas BMI1 over-expression rescues the macrophage-producing defect of G-allele expressing THP-1 cells (bottom right panels). Insets show the results of RT-PCR analysis validating the efficiency of the gene knock-down (inset, bottom left panels) and gene transfer (inset, bottom left panels) experiments.
[68] (E) G-allele expressing human fibroblast BJ1 cells manifest significantly higher motility compared to ancestral A-allele expressing BJ1 cells. Gaps of defined distances were created in confluent cultures of BJ1 cells and motility sequences were continuously monitored and recorded using time-lapse video cinematography. For each culture, the initial distance, motility sequence time (time to complete closing of the gap), and motility speed were measured. Average values of six replicate measurements are reported.
[69] Figure 7: Gene expression patterns of BJ1 cells expressing allele-specific RNAs encoded by the rs2670660 sequence. Gene expression was analyzed using Affymetrix HG-U133A Plus 2.0 microarrays. Panels A-D each show two (A, C) or three (B, D) rows of paired bars representing the expression of representative genes in cells expressing, from left to right, G, A, asG, asA, or GFP only (unlabeled, 5th set of bars for each gene). Panel A shows the expression data for 4 particular genes, Panel B for 9 genes, Panel C for 4 genes, and Panel D for 9 genes. Panels E-M show the same relationships for large sets of genes using linear regression analysis to demonstrate the concordant and discordant patterns of gene expression under the various allele-specific conditions. In panels E-M, the y-axis is mRNA expression and the x-axis represents individual genes. Thus, each dot on the graph represents the mRNA expression level of a particular gene.
[70] (A, B): Examples of allele-specific antagonism of gene expression for genes showing increased expression in BJ1 cells in response to ectopic expression of the G-allele RNA and decreased expression in response to ectopic expression of the A-allele RNA of rs2670660.
[71] (C, D): Examples of allele-specific antagonism of gene expression for genes showing decreased expression in BJ1 cells in response to ectopic expression of the G-allele RNA and increased expression in response to ectopic expression of the A-allele RNA of rs2670660.
[72] (E, F): A set of 3299 genes whose expression was differentially regulated in cells expressing the G-allele RNA of rs2670660 compared to vector controls was defined by t-statistics. The expression of these 3299 genes was then evaluated in cells expressing the G-allele RNA and in cells expressing the A-allele RNA of rs2670660. Regression analysis shows highly concordant expression of this set of genes in cells expressing the G- and A-allele RNA of rs2670660. 87% of the 3299 genes were concordantly expressed (1562 up- and 1732 down-regulated) (Panel E). Concordance was greater than 95% for a subset of 1491 genes identified as differentially expressed in cells expressing the G-allele RNA of rs2670660 (at p = 0.05) and then evaluated in cells expressing the G-allele RNA and in cells expressing the A-allele RNA of rs2670660 (at p = 0.1) (Panel F).
[73] (G, H): A set of 3268 genes whose expression was differentially regulated in cells expressing the G-allele compared to cells expressing the A-allele RNA of rs2670660 was defined by t-statistics. The expression of these 3268 genes was then evaluated in cells expressing the G-allele of rs2670660 compared to vector controls. Regression analysis shows highly concordant expression of this set of genes. 89% of 3268 genes were concordantly expressed (1583 up- and 1685 down-regulated). Concordance was greater than 95% for a subset of 1568 genes identified as differentially expressed in cells expressing the G-allele RNA of rs2670660 (at p = 0.05) and then evaluated in cells expressing the G-allele RNA and in cells expressing vector controls (at p = 0.1) (Panel H).
[74] (I-L): The set of 3299 genes whose expression was differentially regulated in cells expressing the G-allele RNA of rs2670660 compared to vector controls was evaluated in cells expressing the G-allele RNA and in cells expressing the A-allele RNA of rs2670660. Panel I (top) shows the discordant expression of these genes (A- versus G-). The lower panel shows the discordant expression of a subset of 418 genes whose expression was differentially regulated by at least 4-fold.
[75] (J): 2598 genes were identified as differentially regulated by t-statistics in A-allele small RNA-expressing cells compared to the control cultures. Panel J (top) shows the discordant expression profile for these genes in G-allele RNA-expressing cells compared to A-allele RNA-expressing cells. The lower panel shows the discordant expression of a subset of 379 genes whose expression was differentially regulated by at least 4-fold.
[76] (K): 2844 genes were identified as differentially regulated by t-statistics in asG-allele small RNA-expressing cells compared to the control cultures. Panel K (top) shows the discordant expression profile for these genes in asA-allele RNA-expressing cells compared to asG-allele RNA-expressing cells. The lower panel shows the discordant expression of a subset of 352 genes whose expression was differentially regulated by at least 4-fold.
[77] (L): 2766 genes were identified as differentially regulated by t-statistics in asA-allele small RNA-expressing cells compared to the control cultures. Panel L (top) shows the discordant expression profile for these genes in asG-allele RNA-expressing cells compared to asA-allele RNA-expressing cells. The lower panel shows the discordant expression of a subset of 342 genes whose expression was differentially regulated by at least 4-fold.
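The concordance percentages and the 4-fold subsets quoted for Figure 7 can be read as simple counts over per-gene expression changes. The following is a minimal sketch only: the application does not state how concordance was computed, so the sign-agreement definition, the function names, the use of log2 fold changes, and the five toy values are all assumptions made for illustration.

```python
import math


def concordance(changes_a, changes_b):
    """Fraction of genes whose log fold changes share a sign in both conditions.

    Assumption: genes with a zero change in either condition are skipped.
    """
    if len(changes_a) != len(changes_b):
        raise ValueError("both conditions must cover the same genes")
    pairs = [(a, b) for a, b in zip(changes_a, changes_b) if a != 0 and b != 0]
    if not pairs:
        raise ValueError("no genes with a non-zero change in both conditions")
    same_direction = sum(1 for a, b in pairs if (a > 0) == (b > 0))
    return same_direction / len(pairs)


def changed_at_least(log2_changes, fold=4.0):
    """Indices of genes changed by at least `fold` in either direction."""
    cutoff = math.log2(fold)
    return [i for i, x in enumerate(log2_changes) if abs(x) >= cutoff]


# Hypothetical log2 fold changes versus vector control for five genes.
g_allele = [1.2, -0.8, 2.5, -2.1, 0.4]
a_allele = [0.9, -1.1, 2.0, 1.5, 0.3]

print(concordance(g_allele, a_allele))
print(changed_at_least(g_allele))
```

Under this definition a gene counts as discordant when its change has opposite signs in the two conditions, which matches the up-regulated and down-regulated tallies given for panels E through L.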
[78] Figure 8: Expression of rs2670660-encoded allele-specific variants of small RNAs induces mRNA expression changes of the inflammasome regulatory genes (NLRP1, NLRP3, HMGA1, Myb).
[79] (A) mRNA expression changes of the NLRP1 (top left panel) and HMGA1 (top right panel) genes in BJ1 cells expressing the A- or G-alleles of the rs2670660-encoded RNAs. Bottom panels show the ratios of NLRP3 to NLRP1 (bottom left) and HMGA1 to Myb (bottom right).
[80] (B) mRNA expression of the NLRP1 and NLRP3 genes in circulating human neutrophils (left panels) and alveolar neutrophils (right panels) after bronchoscopic endotoxin (LPS) challenge. Top panels show NLRP1 and NLRP3 expression. Bottom panels show the ratio of NLRP3 to NLRP1 expression.
[81] (C) mRNA expression changes of the NLRP1 and NLRP3 genes in human leukocytes after in vitro LPS challenge. Left panels (top and bottom) show the expression in unstimulated cells. Right panels show expression in LPS-stimulated cells. Bottom panels show NLRP3/NLRP1 expression ratios in unstimulated (bottom left) and LPS-stimulated cells (bottom right).
[82] (D) mRNA expression changes of the HMGA1 and Myb genes in circulating human neutrophils (left panels) and alveolar neutrophils (right panels) after bronchoscopic endotoxin (LPS) challenge. Top panels show HMGA1 and Myb expression. Bottom panels show the ratio of HMGA1 to Myb expression.
[83] (E) mRNA expression changes of the HMGA1 (top left) and Myb (top right) genes in human monocytes undergoing adhesion-induced transdifferentiation. Bottom panels show HMGA1/Myb mRNA expression ratios in non-adherent cultures (bottom left) and differentiating cultures (bottom right).
[84] Figure 9: Microarray analysis of gene expression signatures (GES) associated with expression of rs2670660-encoded small RNAs identifies human cells with experimentally-induced activation of the inflammasome pathway. Expression profiles of G-allele concordant and G-allele discordant signatures in individual experimental and control samples of each data set were evaluated by calculating Pearson correlation coefficients (signature scores) using log10-transformed fold expression changes of G-allele-specific GES in BJ1 cells as a multidimensional standard vector. Shaded area identifies the range defined by the average ± 2 STDEV values of the signature scores in the control set of samples.
[85] (A) Expression profiles (bars) and linear regression analysis (scatter) of an 82 gene G-allele concordant signature in human circulating leukocytes after in vitro endotoxin (LPS) challenge. Note distinct expression profiles of G-allele concordant signatures in experimental (left set of bars) and control (right set of bars) samples.
[86] (B) Expression profiles (bars) and linear regression analysis (scatter) of a 262 gene G-allele concordant signature in human alveolar (left set of bars) and circulating (right set of bars) neutrophils after in vivo bronchoscopic endotoxin (LPS) challenge. Note distinct expression profiles of G-allele concordant signatures in alveolar (left set of bars) and circulating (right set of bars) neutrophils.
[87] (C) Expression profiles (bars) and linear regression analysis (scatter) of a 43 gene G-allele concordant signature in human circulating neutrophils after in vivo
bronchoscopic endotoxin (LPS) challenge. Note distinct expression profiles of G-allele concordant signatures in circulating neutrophils from LPS-exposed subjects (left set of bars) and control subjects (right set of bars).
[88] (D) Expression profiles (bars) and linear regression analysis (scatter) of a 134 gene G-allele discordant signature in human circulating leukocytes after in vitro endotoxin (LPS) challenge. Note distinct expression profiles of G-allele concordant 45 signatures in experimental (left set of bars) and control (right set of bars) samples.
[89] (E) Expression profiles (bars) and linear regression analysis (scatter) of a 325 gene G-allele discordant signature in human alveolar (left set of bars) and circulating (right set of bars) neutrophils after in vivo bronchoscopic endotoxin (LPS) challenge. Note distinct expression profiles of G-allele concordant signatures in alveolar (left set of bars) and circulating (right set of bars) neutrophils.
[90] (F) Expression profiles (bars) and linear regression analysis (scatter) of a 51 gene G-allele concordant signature in human circulating neutrophils after in vivo bronchoscopic endotoxin (LPS) challenge. Note distinct expression profiles of G-allele concordant signatures in circulating neutrophils from LPS-exposed subjects (left set of bars) and control subjects (right set of bars).
[91] (G) Diminished sample discrimination by GES associated with expression of G-specific 52 nt small RNAs without segregation into concordant and discordant subsets. Designations of control and experimental samples as in A - F. From left to right, the number of genes in each signature is 216, 587, and 94.
[92] Figure 10: microRNA-signatures induced by expression of rs2670660-encoded transRNAs and associated mRNA GES recapitulating miRNA expression patterns. miRNAs differentially-regulated by rs2670660-allele-specific sense and anti-sense 52 nt small RNAs in BJ1 cells were identified using the quantitative PCR protocol for detection of 365 human miRNAs in 384-well-format TaqMan Low Density Arrays (TaqMan Human MicroRNA Array v1.0; Applied Biosystems). Expression of selected differentially-regulated microRNAs (miR-20b and miR-375) and control miRNAs (miR-205) was induced in BJ1 cells by lentiviral gene transfer and resulting cell lines were subjected to microarray analysis using Affymetrix HG-U133 Plus 2.0 chips.
[93] (A) Expression profiles (bars) and linear regression analysis of expression patterns (scatter) of the 47 miRNA-signature manifesting highly concordant patterns of expression induced by all four allelic variants of the rs2670660 RNAs.
[94] (B) Expression profiles defined by the RQ values (left) and log10-transformed RQ values of the 38 miRNA-signature manifesting highly allele-specific patterns of expression induced by distinct sense and anti-sense allelic variants of the rs2670660 RNAs. Note that expression of each miRNA is below Q-PCR detection limit in at least one cell variant and markedly up-regulated (8.4-fold to 496.3-fold) in at least one cell variant.
[95] (C) Expression profiles (bars) and linear regression analysis of expression patterns (scatter) of the 140-gene mRNA-signature manifesting highly concordant patterns of expression induced by all four allelic variants of the rs2670660 RNAs.
[96] (D) Expression profiles of the 59-gene mRNA-signature defined by expression of the miR-20b microRNA in BJ1 cells and manifesting highly concordant patterns of expression induced by all four allelic variants of the rs2670660 RNAs.
[97] (E,F) Expression profiles (bars) and linear regression analysis of expression patterns (scatter) of the 86-gene mRNA-signature which was selected to resemble allele-specific patterns of expression of miR-375 (bottom left set of bars). Note that the expression profile of the 14-gene mRNA-signature (bottom right sets of bars), which was independently defined by induced expression of miR-375 in BJ1 cells, recapitulates G/A-allele-antagonistic patterns of expression of the 86-gene mRNA-signature and miR-375 microRNA. mRNAs comprising the 14-gene signature are a subset of mRNAs comprising the 86-gene signature.
[98] (G) Linear regression analysis of microRNA expression patterns exhibiting concordant (top two scatter plots) and discordant (bottom two scatter plots) allelic context-defined expression profiles induced by expression of the rs2670660-encoded 52 nt transRNAs (top left, G and asA alleles; top right, A and asG alleles; bottom left, A and asA alleles; bottom right, G and asG alleles).
[99] (H) Microarray analysis of human BJ1 cells stably expressing distinct allelic variants of the rs2670660-encoded snpRNAs reveals allele-specific alterations of expression in multiple classes of non-coding RNAs including snoRNAs and snoRNA-host genes (SNORD113; SNHG1; SNHG3; SNHG8), long non-coding RNAs (MEG3, tncRNA, and MALAT1), miRNAs, miRNA-precursors, and protein-coding miRNA-host genes (ATAD2; KIAA1199).
[100] (I) An ABI PCR-based screen identified a statistically significant set of 36 microRNAs expression of which is altered at least 1.5-fold in NALP1-locus snpRNA-expressing cells compared to control BJ1/EGFP cells and differentially regulated in pathology-linked G-allele-expressing BJ1 cells compared to the ancestral A-allele-expressing cells.
[101] (J) Allele affinity model of snpRNA-mediated regulation of miRNA expression and activity.
[102] (a)-(c): high affinity (low mfe) snpRNA alleles facilitate increased abundance levels of corresponding miRNAs. Inverse correlation between allele-specific changes in minimal free energy (mfe) of snpRNA/miRNA hybridization and experimentally-defined changes of miRNA expression and activity; that is, lower mfe values correspond to higher levels of miRNA expression and activity. These relationships are shown for miRNAs the abundance levels of which in human cells are induced (miR-302a; miR-629; miR-548d; miR-200a; miR-627; miR-770-5p) or repressed (miR-133a; miR-20b; miR-205; let-7b) by forced expression of pathology-linked G-allele snpRNAs compared to ancestral A-allele-expressing cells. Insert bars show the results of Q-PCR analysis of expression of corresponding microRNAs.
[103] (d) Luciferase reporter assay of miR-205 and let-7b activities in RWPE1 cells stably expressing distinct allelic variants of the NALP1-locus transRNAs demonstrates increased activity of both microRNAs in high affinity ancestral A-allele-expressing cells compared to low affinity pathology-linked G-allele-expressing cells.
[104] (e) Application of the allele affinity model of transRNA-mediated regulation of microRNA expression and activity to development of the allele equilibrium hypothesis, which explains the phenotype-altering effects of transRNAs as the consequence of direct actions on microRNA abundance and activity and downstream effects of transRNA-regulated microRNAs on expression of protein-coding genes.
[105] Figure 11: rs2670660-encoded RNAs alter expression of the PluriNet network transcripts and Polycomb pathway genes. Gene expression signatures (GES) associated with expression of rs2670660-encoded sense and anti-sense allele-specific 52 nt small RNAs in BJ1 cells were independently identified for each experimental setting using t-statistics and 155 differentially-regulated transcripts of the PluriNet network and Polycomb pathway were selected for visualization.
[106] (A) Expression profiles (bars) and linear regression analysis of expression patterns (scatters) of PluriNet network transcripts defined as differentially regulated by the indicated allele-specific variants of the rs2670660-encoded transRNAs: the G-allele signature of 100 PluriNet genes; the A-allele signature of 28 PluriNet genes; the asA-allele signature of 77 PluriNet genes; and the asG signature of 42 PluriNet genes.
[107] Note highly concordant expression profiles for G and asA (top left); A and asG (top right); asA and G (bottom left); asG and A (bottom right) signatures. Middle panel shows integrated allele-context-defined views of expression profiles of 155 PluriNet network transcripts expression of which is altered by rs2670660-encoded small RNAs. Note that almost all PluriNet transcripts expression of which is altered by G and asA allele-specific rs2670660 transRNAs are upregulated, suggesting that expression of G-allele-specific transRNAs would favor retention of a less-differentiated state in a cell.
[108] (B) G-allele-specific rs2670660-encoded transRNAs induce concomitant upregulation of the Polycomb Repressive Complex 2 (PRC2) genes Ezh2, Suz12, and EED. Individual measurements of the mRNA expression levels of corresponding genes derived from two independent biological replicate experiments are shown. Note that in contrast to the PRC2 genes, the expression level of the BMI1 gene, a key component of the PRC1 complex, is decreased in BJ1 cells expressing G-allele-specific rs2670660-encoded transRNAs compared to A-allele-specific transRNA-expressing cells.
[109] Figure 12: Microarray analysis of gene expression signatures (GES) associated with expression of rs2670660-encoded small RNAs discriminates peripheral blood mononuclear cells (PBMC) from patients with multiple common human disorders and control subjects. GES associated with expression of G-allele-specific 52 nt small RNAs in BJ1 cells was identified using t-statistics and screened for concordant and discordant features in corresponding clinical settings to segregate G-allele concordant and G-allele discordant signatures. Expression profiles of G-allele concordant and G-allele discordant signatures in individual samples of each data set were evaluated by calculating Pearson correlation coefficients (signature scores) using log10-transformed fold expression changes of G-allele-specific GES in BJ1 cells as a multidimensional standard vector. Shaded area identifies the range defined by the average +/- 2STDEV values of the signature scores in control set of samples.
[110] (A) Expression profiles (bars) and linear regression analysis (scatter) of a 309 gene G-allele concordant signature in PBMC of patients with Crohn's disease (left set of bars), ulcerative colitis (right set of bars), and control subjects (middle set of bars). Note distinct expression profiles of G-allele concordant signatures in PBMC from patients and control individuals.
[111] (B) Expression profiles (bars) and linear regression analysis (scatter) of a 203 gene G-allele concordant signature in PBMC of patients with rheumatoid arthritis (left set of bars) and control subjects (middle set of bars). Note distinct expression profiles of G-allele concordant signatures in PBMC from patients and control individuals.
[112] (C) Expression profiles (bars) and linear regression analysis (scatter) of a 525 gene G-allele concordant signature in PBMC of patients with symptomatic Huntington's disease (left set of bars), asymptomatic Huntington's disease (middle set of bars), and control subjects (right set of bars). Note distinct expression profiles of G-allele concordant signatures in PBMC from patients and control individuals.
[113] (D) Expression profiles (bars) and linear regression analysis (scatter) of a 25 gene G-allele concordant signature in PBMC of patients with Alzheimer's disease (left set of bars) and control subjects (middle set of bars). Note distinct expression profiles of G-allele concordant signatures in PBMC from patients and control individuals.
[114] (E) Expression profiles (bars) and linear regression analysis (scatter) of a 439 gene G-allele discordant signature in PBMC of patients with Crohn's disease (left set of bars), ulcerative colitis (right set of bars), and control subjects (middle set of bars). Note distinct expression profiles of G-allele concordant signatures in PBMC from patients and control individuals.
[115] (F) Expression profiles (bars) and linear regression analysis (scatter) of a 190 gene G-allele discordant signature in PBMC of patients with rheumatoid arthritis (left set of bars) and control subjects (middle set of bars). Note distinct expression profiles of G-allele concordant signatures in PBMC from patients and control individuals.
[116] (G) Expression profiles (bars) and linear regression analysis (scatter) of a 377 gene G-allele discordant signature in PBMC of patients with symptomatic Huntington's disease (left set of bars), asymptomatic Huntington's disease (middle set of bars), and control subjects (right set of bars). Note distinct expression profiles of G-allele concordant signatures in PBMC from patients and control individuals.
[117] (H) Expression profiles (bars) and linear regression analysis (scatter) of a 33 gene G-allele discordant signature in PBMC of patients with Alzheimer's disease (left set of bars) and control subjects (middle set of bars). Note distinct expression profiles of G-allele concordant signatures in PBMC from patients and control individuals.
[118] (I) Diminished clinical sample discrimination by GES associated with expression of G-allele-specific 52 nt small RNAs without segregation into concordant and discordant subsets. Designations of PBMC samples from patients and control subjects as in A-H.
[119] Figure 13: Microarray analysis of gene expression signatures (GES) associated with expression of rs2670660-encoded small RNAs discriminates normal and pathological tissue samples from patients with multiple common human disorders and control subjects. GES associated with expression of G-allele-specific 52 nt small RNAs in BJ1 cells was identified using t-statistics and screened for concordant and discordant features in corresponding clinical settings to segregate G-allele concordant and G-allele discordant signatures. Expression profiles of G-allele concordant and G-allele discordant signatures in individual samples of each data set were evaluated by calculating Pearson correlation coefficients (signature scores) using log10-transformed fold expression changes of G-allele-specific GES in BJ1 cells as a multidimensional standard vector. Shaded area identifies the range defined by the average +/- 2STDEV values of the signature scores in control set of samples.
[120] (A) Expression profiles of a 102 gene G-allele concordant signature (left panel) and a 148 gene G-allele discordant signature (right panel) in normal and pathological tissue samples (brain hippocampus) of control subjects (far left sets of bars) and patients with Alzheimer's disease (right sets of bars). Tissue samples from Alzheimer's patients are segregated into three sub-sets based on clinically-defined severity of the disease (left to right): incipient, moderate, and severe. Note highly statistically significant distinct expression profiles of G-allele concordant signatures in normal and pathological tissue samples from patients and control individuals.
[121] (B) Expression profiles of a 490 gene G-allele concordant signature (left panel) and a 299 gene G-allele discordant signature (right panel) in normal and pathological tissue samples of control subjects (far left sets of bars; normal prostate tissues) and patients with prostate cancer (right sets of bars). Tissue samples from prostate cancer patients are segregated into three sub-sets based on pathology-defined types of tissue samples (left to right): defined by histological examination morphologically normal prostate tissues adjacent to tumor; primary prostate tumors; metastatic prostate tumors in distant organs. Note highly statistically significant distinct expression profiles of G-allele concordant signatures in normal and pathological tissue samples from patients and control individuals.
[122] (C) Expression profiles of a 29 gene G-allele concordant signature (left panel) and a 16 gene G-allele discordant signature (right panel) in normal and pathological tissue samples of control subjects (far left sets of bars; normal breast tissues) and patients with breast cancer (right sets of bars). Tissue samples from breast cancer patients are segregated into five sub-sets based on pathology-defined types of tissue samples (left to right): defined by histological examination morphologically normal breast tissues adjacent to tumor; primary breast tumors from patients without metastatic disease; primary breast tumors from patients with metastatic disease; lymph nodes from patients with metastatic disease; metastatic breast tumors in distant organs. Note highly statistically significant distinct expression profiles of G-allele concordant signatures in normal and pathological tissue samples from patients and control individuals.
[123] Figure 14: Microarray analysis of gene expression signatures (GES) associated with expression of rs2670660-encoded small RNAs discriminates normal and pathological tissue samples from patients with autism and control subjects (A) as well as lean and obese subjects (B,C). GES associated with expression of G-allele-specific 52 nt small RNAs in BJ1 cells was identified using t-statistics and screened for concordant and discordant features in corresponding clinical settings to segregate G-allele concordant and G-allele discordant signatures. Expression profiles of G-allele concordant and G-allele discordant signatures in individual samples of each data set were evaluated by calculating Pearson correlation coefficients (signature scores) using log10-transformed fold expression changes of G-allele-specific GES in BJ1 cells as a multidimensional standard vector. Shaded area identifies the range defined by the average +/- 2STDEV values of the signature scores in control set of samples.
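The signature-score calculation described for Figures 9 and 12-14 can be sketched as follows. This is a minimal illustration, not the procedure actually used to generate the figures: the function names, the four-gene example values, and the use of the sample standard deviation for "STDEV" are assumptions, and how per-sample fold changes are derived is not specified here.

```python
import math
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def signature_score(sample_fold_changes, standard_fold_changes):
    """Correlate log10 fold changes of a sample with the standard vector."""
    sample = [math.log10(v) for v in sample_fold_changes]
    standard = [math.log10(v) for v in standard_fold_changes]
    return pearson(sample, standard)


def control_range(control_scores):
    """Average +/- 2 standard deviations of the control signature scores.

    Assumption: the sample standard deviation is used.
    """
    m, s = mean(control_scores), stdev(control_scores)
    return m - 2 * s, m + 2 * s


# Hypothetical fold changes for a four-gene signature (illustrative only).
standard = [4.0, 0.25, 2.0, 0.5]          # standard vector (e.g. BJ1 cells)
sample_like = [3.0, 0.4, 1.8, 0.6]        # changes in the same direction
sample_opposite = [0.3, 3.5, 0.5, 2.2]    # changes in the opposite direction

print(signature_score(sample_like, standard))      # close to +1
print(signature_score(sample_opposite, standard))  # negative
print(control_range([0.1, -0.1, 0.0]))
```

Under these assumptions, a sample whose score falls outside the range returned by `control_range` would lie outside the shaded area of the corresponding figure panel.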
[124] Figure 15: Intergenic trans-regulatory RNAs represent a most prevalent class of transcripts containing SNP variants associated with common human disorders (A) and display cell-type specific patterns of expression in human cells (B; C).
[125] (A) Graphical representation of the relative prevalence of distinct SNP classes defined by analysis of genomic coordinates of disease-linked SNPs identified in genome-wide association studies (GWAS) of 22 common human disorders. Distinct SNP classes were defined based on the assessment of chromosomal positions of 277 SNPs identified in genome-wide association studies (GWAS) of up to 712,263 samples comprising 221,158 disease cases, 322,862 controls and 168,233 case/control subjects of obesity GWAS.
[126] (B) Cell type-specific expression profiles of 11 intergenic small transRNAs containing SNP sequences associated with high risk of developing prostate cancer. Note that small transRNAs A10, A11, A18 (marked in boxes) are expressed exclusively in human cells of epithelial origin (RWPE1); transRNA A9 is expressed in cells of mesenchymal (BJ1) and lymphoid (U937) origins, but not in epithelial RWPE1 cells; transRNA A18 is expressed in epithelial RWPE1 cells and mesenchymal BJ1 cells, but not in lymphoid U937 cells; transRNA A21 is expressed in epithelial RWPE1 cells and lymphoid U937 cells, but not in mesenchymal BJ1 cells. Nearly ubiquitous patterns of expression of long non-coding RNAs containing the corresponding SNP sequences suggest a model of cell type-specific biogenesis of small transRNAs based on differentiation-associated processing of long non-coding RNAs. Small transRNAs and long non-coding RNAs containing identical SNP variants are aligned in columns designated A5, A6, A9, A10, A11, A13, A14, A18, A19, A20, and A21.
[127] (C) Cell type-specific expression profiles of six intergenic small transRNAs containing SNP sequences associated with high risk of developing breast cancer. Small transRNAs A7, A8, and B6 (shown in boxes) are expressed exclusively in human cells of epithelial origin (RWPE1); transRNA B7 is expressed in human cells of lymphoid (U937) origin, but not in epithelial (RWPE1) and mesenchymal (BJ1) cells. Note that long non-coding RNAs containing corresponding SNP sequences manifest more uniform expression profiles compared to small transRNA counterparts. Small transRNAs and long non-coding RNAs containing identical SNP variants are aligned in columns designated A7, A8, A16, B5, B6, and B7.
[128] Figure 16: (A) Expression of RNA A6 (SEQ ID NO:7) facilitates androgen-independent growth of the androgen-dependent human prostate cancer cell line LNCap and the highly metastatic cell line LNCapLN3. (B) Expression of RNA A6 enhances the colony-formation ability of LNCap cells in soft agar.
[129] Figure 17: Concordance analysis of 3299 and 1561 rs2670660 G-allele RNA-regulated transcripts.
[130] Figure 18: Concordance analysis of 3268 and 1636 rs2670660 G-allele RNA-regulated transcripts.
DETAILED DESCRIPTION OF THE INVENTION
[131] The present invention is based upon the discovery of small SNP sequence-bearing RNA molecules having gene regulatory activity. The small non-coding RNA molecules of the present invention are distinct from the non-coding RNA molecules of the prior art, which include, e.g., small and large interfering RNA molecules, hairpin RNA molecules, and microRNA molecules. See background, infra. The term "non-coding" means that the RNA molecule is not translated into an amino acid sequence. Thus, the RNA molecules of the invention do not encode proteins. The small RNA molecules of the invention are transcribed from intergenic or intronic regions of the human genome containing at least one disease-linked SNP. These small non-coding RNA molecules are referred to herein as "snpRNAs." The snpRNA molecules of the invention are able to regulate the expression of genes distant from the genomic site of their transcription. Accordingly, they may also be referred to as "transRNA" molecules. As used herein, the terms "snpRNAs" and "transRNAs" are synonymous. The snpRNA molecules of the invention, and their corresponding DNA and cDNA molecules, are isolated and preferably purified.
[132] The term "isolated," in the context of a polynucleotide molecule of the invention, refers to a polynucleotide molecule that has been isolated from a cell. An isolated polynucleotide may contain various impurities which are removed by subsequent purification. Methods for purifying polynucleotides from various cellular contaminants are known in the art.
[133] The term "purified," in the context of a polynucleotide molecule of the invention, refers to a polynucleotide molecule that is substantially free of cellular material or contaminating proteins from the cell or tissue source from which it is isolated or recombinantly produced, or substantially free of chemical precursors or other chemical agents when chemically synthesized. Preferably, a purified polynucleotide of the invention has less than about 30%, 20%, 10%, or 5% (by dry weight) of heterologous protein, polypeptide, peptide, or antibody (also referred to as a "contaminating protein"). In a specific embodiment, the purified polynucleotide is 60%, preferably 65%, 70%, 75%, 80%, 85%, 90%, or 99% free of contaminating proteins, cellular material, chemical agents, and precursors.
[134] The snpRNA molecules of the invention are non-coding RNA molecules transcribed from a genomic sequence containing a disease-linked SNP. Preferably, the SNP-containing genomic sequence is an intergenic sequence. An intergenic sequence is one that is distant from a protein coding region of the genome. An SNP refers to a particular kind of DNA sequence variation occurring in a population, preferably a human population, in which a single nucleotide (denoted A, T, C, or G, in accordance with the convention in the art) in the genome differs between members of a species at a particular location in the genome, also referred to as a genetic locus. The differences are referred to as alleles based on the identity of the possible single nucleotide differences. Thus, where the nucleotide at the variant position is either C or T, these variants are referred to as the C-allele and the T-allele, respectively. In a preferred embodiment, the SNP has only two alleles. Since an individual has paired sets of chromosomes, an individual is said to be homozygous or heterozygous for a particular allele depending on whether both chromosomes contain the same or different alleles, respectively. Within a population, SNPs can be assigned an allele frequency which refers to the frequency of a particular allele at a given genetic locus within the population. Preferably, allelic frequency is based upon a geographical population or an ethnic population.
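The notions of zygosity and allele frequency defined above can be illustrated with a short sketch. The genotype list and function names below are hypothetical and serve only to show the arithmetic: each individual contributes two alleles, one per chromosome of the pair.

```python
from collections import Counter


def zygosity(genotype):
    """Classify a diploid genotype such as "AG" at a biallelic SNP."""
    return "homozygous" if genotype[0] == genotype[1] else "heterozygous"


def allele_frequencies(genotypes):
    """Return {allele: frequency} for the population at one genetic locus.

    Every individual contributes two alleles (one per chromosome).
    """
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical genotypes of five individuals at an A/G SNP.
genotypes = ["AA", "AG", "GG", "AG", "AA"]

print([zygosity(g) for g in genotypes])
print(allele_frequencies(genotypes))  # A: 6 of 10 alleles, G: 4 of 10
```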
[135] By "containing at least one disease-linked SNP" it is meant that the snpRNA is transcribed from an SNP-bearing allele of a DNA molecule. In certain embodiments, the snpRNA is transcribed from one or both alleles of the DNA molecule bearing the SNP. The allele of the SNP that is associated with a disease or disorder is referred to as the "pathological allele." The allele of the SNP that is not so associated is referred to as the "ancestral allele."
[136] All polynucleotide sequences described herein are written in the 5' to 3' orientation, unless specifically denoted otherwise.
[137] The term "disease-linked" or "disease-associated" and synonymous terms when used in the context of an SNP refers to an SNP that has been associated with one or more diseases or disorders in a population of subjects, preferably human subjects, using methods known in the art. Such methods include, for example, genome-wide association studies of SNP variations. For example, a particular SNP may be associated with an increased incidence of the disease or disorder, meaning that individuals containing a particular allele at the site of the SNP are statistically more likely to have the disease or disorder. The statistical methods used to establish the association between SNPs and diseases or disorders are well known by those skilled in the art.
[138] In one embodiment, the SNP is selected from the group consisting of rs2670660, rs6596075, rs6983561, rs16901979, rs13281615, rs10505477, rs10808556, rs6983267, rs7014346, rs7000448, rs1447295, rs2820037, rs889312, rs1937506, rs13387042, rs7716600, rs11249433, and rs3803662.
[139] In one embodiment, the SNP is selected from the group consisting of rs9469220, rs9270986, rs6457617, rs615672, rs7837688, rs6997709, rs16892766, rs2670660, and rs2542151.
[140] As used herein, the singular form of a noun is meant to encompass both the singular and plural forms. Thus, "an isolated small non-coding RNA molecule" is meant to refer to one or more isolated small non-coding RNA molecules.
[141] The invention provides an isolated small non-coding RNA molecule transcribed from an intergenic region of the human genome, wherein the RNA molecule is less than 1000, less than 800, less than 500, less than 400, less than 200, less than 150, less than 100, or less than 75 nucleotides and the intergenic region contains at least one SNP associated with one or more human diseases or disorders. In a particular embodiment, the intergenic region contains only one SNP. An intergenic region is a region of a genome, preferably the human genome, located between clusters of genes. It is substantially devoid of protein-coding genes.
[142] The RNA molecules of the present invention are depicted as their cDNA forms. In one embodiment, the cDNA form of the RNA molecule comprises a sequence selected from the group consisting of SEQ ID NOs: 1-101, 332, and 333. In another embodiment, the cDNA form of the RNA molecule comprises a sequence selected from the group consisting of SEQ ID NOs: 1, 4, 7, 10, 17, 22-28, 32-34, 332, and 333. In another embodiment, the cDNA form of the RNA molecule comprises a sequence selected from the group consisting of SEQ ID NOs: 1, 7, 332, and 333.
[143] The invention also provides a vector comprising a polynucleotide molecule of the invention. In one embodiment, the vector comprises the cDNA form of an RNA molecule described herein. As used herein, the term "vector" in this context refers to a cloning vector or an expression vector, or both (i.e., the same vector may be designed for cloning and expression). The terms are used consistent with their common meaning in the art. Thus, a cloning vector refers to a DNA molecule, typically a plasmid molecule, into which a foreign DNA fragment can be inserted, e.g., by restriction digest and ligation. Non-limiting examples of cloning vectors include genetically engineered plasmids and bacteriophages (such as phage λ) or other viruses, as well as bacterial artificial chromosomes (BACs) and yeast artificial chromosomes (YACs). An expression vector is typically engineered to contain regulatory sequences that act as enhancer and promoter regions and lead to efficient transcription of the foreign DNA. In a preferred embodiment, the vector is a viral vector. In one embodiment, the vector is an expression vector. In another embodiment, the vector is a cloning vector.
[144] The invention further provides a cell comprising said vector. Preferably, the cell is a mammalian cell and most preferably a human cell. In a preferred embodiment, the cell stably expresses the vector.
[145] The invention also provides a kit comprising, in one or more containers, a vector comprising a polynucleotide molecule of the invention. In one embodiment, the kit comprises an RNA molecule described herein and instructions for expressing the RNA molecule from the vector. In one embodiment, the kit comprises the cDNA form of an RNA molecule described herein and instructions for expressing the RNA molecule from the vector.
[146] In one embodiment, the kit further comprises one or more polynucleotide primers for amplifying the cDNA molecule. In one embodiment, the one or more primers comprises a sequence selected from the group consisting of SEQ ID NOs: 102-331. In one embodiment, the one or more primers comprises a sequence selected from the group consisting of SEQ ID NOs: 102-161. In one embodiment, the one or more primers comprises a sequence selected from the group consisting of SEQ ID NOs: 102, 103, 114, 115, 326, and 327.
[147] The invention also provides a kit comprising, in one or more containers, a cell comprising said vector and instructions for expressing the RNA molecule in the cell.
[148] The invention also provides a method for detecting the small non-coding RNA molecules described herein in a sample from a subject, the method comprising detecting the RNA molecules in the sample. In one embodiment, the step of detecting the RNA molecules comprises the step of detecting the cDNA form of the RNA molecule in the sample. In one embodiment, the cDNA form is detected by a method comprising reverse transcription and polymerase chain reaction (RT-PCR) technology. In one embodiment, the method comprises the technique of nested PCR. These terms are used here in accordance with their normal and customary meaning in the art. Thus, "RT-PCR" refers to a PCR technique in which reverse transcriptase is first used to reverse transcribe RNA into its complementary DNA, also referred to as cDNA. The cDNA is then amplified by PCR. PCR is a well known technique used to amplify a particular DNA molecule of interest, typically from a mixture containing a high background of non-specific DNA molecules. Nested PCR employs two sets of primers in two successive PCR reactions to achieve increased specificity.
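The logic of RT-PCR followed by nested PCR can be sketched in silico as below. This is a deliberately simplified illustration under stated assumptions: the RNA and primer sequences are invented and are not any of SEQ ID NOs: 1-333, primers are required to match exactly, and no annealing temperature, mismatch tolerance, or enzyme behavior is modeled. All function names are hypothetical.

```python
def cdna_sense_strand(rna):
    """cDNA form of an RNA, written 5' to 3' as the sense strand (U -> T)."""
    return rna.upper().replace("U", "T")


def reverse_complement(dna):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(dna))


def amplicon(template, forward, reverse):
    """Region of the sense-strand template bounded by a primer pair.

    The forward primer must match the template exactly; the reverse primer
    must match the opposite strand downstream of it. Returns None if the
    primers do not both bind.
    """
    start = template.find(forward)
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


# Hypothetical small RNA and primer sets (illustrative sequences only).
rna = "AUGGCUAGCUUGACCGAUACGUUAGGCCAUUCGAAGCUUGGA"
template = cdna_sense_strand(rna)

outer = amplicon(template, "ATGGCTAG", "TCCAAGCT")   # first PCR round
inner = amplicon(outer, "GACCGATA", "ATGGCCTA")      # second (nested) round

print(outer)
print(inner)
```

The second round uses the product of the first round as its template, so a product is obtained only if both primer pairs bind, which is the source of the increased specificity noted above.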
[149] In one embodiment, the method further comprises the steps of isolating the small RNA fraction from the sample and converting the RNA into cDNA prior to the step of detecting the cDNA in the sample.
[150] In another embodiment, the cDNA form of the RNA molecules is detected by a method comprising nucleic acid hybridization technology.
[151] The invention also provides a method for evaluating the risk that a human subject will develop a disease or condition associated with a specific allele of an SNP ("the pathological allele") by detecting the presence of an RNA molecule of claim 1 in a sample from the subject, wherein the RNA molecule is transcribed from the pathological allele, and wherein detection of said RNA molecule indicates that the subject has an increased risk for developing the disease or condition and the failure to detect said RNA molecule indicates that the subject has a decreased risk for developing the disease or condition.
[152] In one embodiment, the method further comprises detecting the expression level of the RNA molecule transcribed from the pathological allele relative to its expression in a population of healthy subjects, wherein an increased or decreased level of expression relative to the population of healthy subjects indicates that the subject has an increased risk for developing the disease or condition.
[153] In one embodiment, the step of detecting the presence of an RNA molecule transcribed from the pathological allele is performed indirectly, by detecting the expression of one or more genes whose expression is regulated by the RNA molecule.
[154] The invention also provides a method for diagnosing a disease or condition associated with a specific allele of an SNP ("the pathological allele") in a human subject, the method comprising detecting the presence of an RNA molecule of claim 1 in a sample from the subject, wherein the RNA molecule is transcribed from the pathological allele, and wherein the disease or condition is positively diagnosed if the RNA molecule is detected in the sample.
[155] The invention also provides a method for treating, preventing, or ameliorating a disease or condition associated with a specific allele of an SNP ("the pathological allele") in a subject in need thereof, the method comprising administering one or more therapeutic agents that act to suppress the expression or antagonize the activity of an RNA molecule of claim 1, wherein the RNA molecule is transcribed from the pathological allele.
[156] As used herein, the term "subject" refers to an animal, preferably a mammal including a non-primate (e.g., a cow, pig, horse, cat, dog, rat, and mouse) and a primate (e.g., a chimpanzee, a monkey such as a cynomolgus monkey and a human), and more preferably a human.
[157] Preferably, with respect to any of the methods described above, the subject is human.
[158] In certain embodiments of the methods described above, the sample is a blood, tissue, or cell sample.
[159] In a specific embodiment, the disease or condition is selected from the group consisting of Crohn's disease, rheumatoid arthritis, Huntington's disease, Alzheimer's disease, breast cancer, prostate cancer, autism, and obesity.
[160] The invention also provides an apparatus for evaluating a disease or condition, or evaluating the risk of developing a disease or condition, in a subject, the apparatus comprising a model configured to evaluate a dataset for the subject to thereby evaluate the risk of disease in the subject, wherein the model is based upon determining the similarity in the expression profile of a defined set of genes in a sample from the subject and the expression profile for that set of genes in one or more reference sets of the model, wherein a reference set comprises one or more of a population of healthy subjects and a population of subjects suffering from the disease, wherein the set of genes is a set of genes whose expression is regulated by a small RNA molecule of claim 1.
[161] In one embodiment, the disease or disorder is selected from Crohn's disease, rheumatoid arthritis, bipolar disorder, Alzheimer's disease, vitiligo, ulcerative colitis, type 1 diabetes, type 2 diabetes, autoimmune thyroid disease, coronary artery diseases, hypertension, multiple sclerosis, obesity, and epithelial cancers. In a specific embodiment, the epithelial malignancy is selected from prostate, breast, ovarian, and colorectal cancer.
snpRNA MOLECULES AND PRIMERS FOR THEIR DETECTION
[162] The snpRNA molecules of the invention are a novel class of non-coding RNA molecule transcribed from intergenic SNP-containing regions of the human genome. This class of RNA molecule is defined by the following structural features. The RNA molecules of the invention each contain a disease-associated SNP. The disease-associated SNP is located within a loop structure of the RNA molecule. Preferably, this loop structure containing the SNP also contains a binding site for an miRNA molecule. Preferably, the SNP is located within a binding site for one or more of the following proteins: H3K27Me3, CBP/CREB, Ezh2, and POL2. In certain embodiments where the SNP is located within the binding site for more than one protein, the binding sites overlap. In another embodiment, the SNP is within the binding site for a nuclear lamina protein. In a specific embodiment, the SNP is located within 200 basepairs of a binding site for a lamin B1 protein.
[163] The invention provides isolated snpRNA molecules, their cDNA counterparts, and primers for their detection in a biological sample using, e.g., reverse-transcription polymerase chain reaction (RT-PCR) technology. In certain embodiments the isolated snpRNA molecules are purified. In some embodiments, the snpRNA molecules are in the form of their cDNA counterparts. The snpRNA molecules of the invention are polynucleotide sequences comprising the bases adenine (A), guanine (G), cytosine (C), and uracil (U). The counterpart cDNA molecules are polynucleotide sequences comprising the bases adenine (A), guanine (G), cytosine (C), and thymine (T). The sequences are denoted as strings of these bases, in accordance with the common practice in the art. The sequences of the present invention are denoted as cDNA sequences of the corresponding RNA molecules. The corresponding RNA molecule is easily envisioned from the cDNA sequences depicted here using methods routine in the art.
[164] In one embodiment, the snpRNA is an allelic variant. An "allelic variant" of an snpRNA molecule of the invention refers to the allele of the SNP from which the snpRNA is transcribed. In one embodiment, the snpRNA corresponds to the pathological allele of the SNP. In another embodiment, the snpRNA corresponds to the ancestral allele. In particular embodiments, the snpRNA is an A-allele RNA, a G-allele RNA, a C-allele RNA, or a T-allele RNA, wherein the reference to the particular allele is in the context of the SNP which encodes the RNA.
[165] In some embodiments, the snpRNA molecule of the invention is an SNP-containing fragment of a larger RNA molecule. In one embodiment, an snpRNA molecule of the invention is a processing variant of a longer non-coding RNA molecule.
[166] Preferably, the snpRNA molecules of the invention are molecules of 50 to 300 nucleotides in length, each containing at least one disease-linked SNP. In specific embodiments, an snpRNA molecule of the invention is about 25, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 225, 250, 275, or 300 nucleotides in length. Preferably, the snpRNA molecule is between 50-100, 50-75, or 50-60 nucleotides in length. In specific embodiments, the snpRNA molecule is about 50 nucleotides in length. In certain embodiments, the snpRNA molecules comprise about 50, 60, 70, 80, 90, 100, 125 or 150 nucleotides flanking a disease-associated SNP. Preferably, an snpRNA molecule of the invention comprises 50, 60, 70, 80 or 90 nucleotides flanking the SNP.
[167] In one embodiment, the snpRNA molecule is contiguous. As used herein, the term "contiguous" in the context of an snpRNA molecule means that the snpRNA molecule is a single sequence, uninterrupted by any intervening sequence or sequences.
[168] In one embodiment, the snpRNA molecule of the invention acts as a transcriptional suppressor on one or more genes encoding proteins selected from the Polycomb group (PcG), the bivalent chromatin domain (BCD) group, NALP1, NALP3, and the PluriNet group. The term "Polycomb group" refers to a family of chromatin remodeling proteins that function in the epigenetic silencing of genes. The terms "NALP1" and "NALP3" refer to proteins that assemble into complexes called "inflammasomes" which activate caspase-1, resulting in the processing of pro-inflammatory cytokines and triggering an innate immune response. The term "PluriNet" refers to a protein network common to pluripotent cells which enables them to differentiate into multiple cell types. See e.g., Müller, F.J. et al., Regulatory networks define phenotypic classes of human stem cell lines, Nature 455:401-405 (18 September 2008).
[169] The invention provides isolated snpRNA molecules and the cDNA counterparts of the RNA molecules. The following tables give the cDNA sequences of the snpRNA molecules of the invention. Each sequence in the table below represents two sequences, one for each allelic variant of the SNP. The two sequences for each allelic variant are identical except for a single nucleotide at the position indicated in the sequence as variable. The variable position is denoted in the sequence as, e.g., "[G/A]" which indicates that one allele contains a "G" at that position in the sequence and the other allele contains an "A" at that position in the sequence. The sequences below are referred to as "cDNA" sequences because they are the DNA sequence complementary to the RNA molecules transcribed from the genomic DNA.
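The bracket notation described in [169] can be expanded mechanically into the two allelic sequences. The following is a minimal sketch only: the function names are hypothetical, the example sequence is a toy string rather than an entry from Table 1, and the `same_sense` flag is an assumption introduced here because the listing convention (same sense as the RNA, or the complementary strand) should be confirmed for each sequence.

```python
import re

# Matches a single variable position written as, e.g., "[G/A]".
VARIABLE_SITE = re.compile(r"\[([ACGT])/([ACGT])\]")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expand_alleles(cdna):
    """Return {allele base: full cDNA sequence} for a sequence with one [X/Y] site."""
    cdna = cdna.replace(" ", "").upper()
    sites = VARIABLE_SITE.findall(cdna)
    if len(sites) != 1:
        raise ValueError("expected exactly one variable position")
    return {base: VARIABLE_SITE.sub(base, cdna) for base in sites[0]}


def cdna_to_rna(cdna, same_sense=True):
    """Write the RNA form of a cDNA listing.

    same_sense=True assumes the listing reads in the same sense as the RNA,
    so only T is replaced by U; otherwise the reverse complement is taken first.
    """
    seq = cdna.upper()
    if not same_sense:
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq.replace("T", "U")


# Toy sequence for illustration only (not taken from Table 1).
variants = expand_alleles("ACGT[G/A]TTCA")
for allele, seq in variants.items():
    print(allele, seq, cdna_to_rna(seq))
```

Each table entry therefore yields two strings that differ at a single position, one per allelic variant.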
[170] The intergenic RNA molecules of the invention are represented by their respective cDNA sequences in Table 1. Additional RNA molecules identified or predicted to be encoded by intronic sequences are represented by their respective cDNA sequences in Table 2. Primers which can be used to amplify the RNA molecules of the invention using reverse transcription followed by a polymerase chain reaction are shown in Table 3.
Table 1: cDNA sequences of small snpRNA molecules transcribed from intergenic SNPs.
Figure imgf000032_0001
Figure imgf000033_0001
Figure imgf000034_0001
Figure imgf000035_0001
Figure imgf000036_0001
Figure imgf000037_0001
Figure imgf000038_0001
Figure imgf000039_0001
Table 2: Primer sequences (Forward, F; Reverse, R)
Figure imgf000039_0002
Figure imgf000040_0001
Figure imgf000041_0001
Figure imgf000042_0001
Figure imgf000043_0001
METHODS OF USE
[171] The invention provides methods and reagents for the detection of specific snpRNAs in a biological sample from a subject. In one embodiment, the invention provides primers that can be used in an RT-PCR-based assay to identify the presence of one or more snpRNAs in a sample. The invention also provides probes, in the form of cDNA molecules of the snpRNAs, for use in detecting the snpRNAs in a sample, and allelic variants thereof. The invention also provides diagnostic and prognostic methods based on the detection of the snpRNAs.
[172] Preferably, the presence of a particular allelic variant of the snpRNA is detected according to the methods of the invention. In a specific embodiment, the allelic variant is the A-allele, the G-allele, the C-allele, or the T-allele, denoted with respect to the SNP sequence. In one embodiment, the allele is the pathological allele of the SNP. In another embodiment the allele is the ancestral allele of the SNP.
[173] In a specific embodiment, the pathological allele is selected from the G-allele of rs2670660 or the A-allele of rs16901979.
[174] An snpRNA molecule of the invention is an RNA molecule transcribed from a genomic sequence containing a disease-linked SNP. Thus, the snpRNA can be transcribed from either allele, or from both alleles, of the SNP-bearing genomic sequence. In accordance with the invention, the detection of an snpRNA molecule transcribed from the pathological allele of the SNP indicates an increased risk for the disease or disorder linked to the SNP. The risk is based upon the risk associated with the specific allele of the SNP.
[175] In certain embodiments, the presence of an snpRNA transcribed from a pathological allele translates to an increased risk of developing the disease or disorder or an increased risk of having a more severe or refractory form of the disease or disorder. Likewise, the failure to detect an snpRNA transcribed from a pathological allele, or the detection of an snpRNA transcribed from an ancestral allele, indicates a decreased risk for the disease or disorder. In this context, the term "refractory" describes patients treated with a currently available therapy for a disease or disorder, wherein the treatment with the currently available therapy is not clinically adequate either (i) to relieve one or more symptoms associated with the disease or disorder, (ii) to stop or adequately slow the progression of the disease or disorder, or (iii) to resolve the pathological effects of the disease or disorder.
[176] The methods of the present invention, because they are based upon the detection of snpRNA molecules, and allelic variants thereof, offer an improvement over methods based on the detection of the SNPs themselves. This is because, according to the present invention, the SNP itself is not functional and its mere presence, like that of a gene, does not necessarily have a biological consequence. Rather, the biological consequence results from its transcription, in this case into a non-coding regulatory RNA molecule.
[177] The invention provides methods for detecting an snpRNA molecule in a sample. In a preferred embodiment, the sample comprises the fraction of small RNA molecules from a cell or tissue. Preferably the fraction of small RNA molecules is substantially free of contaminating DNA molecules and protein.
[178] In one embodiment, the method comprises contacting the sample with one or more short (10-30 base pairs) oligonucleotides under conditions permitting the hybridization of the one or more short oligonucleotides with the snpRNA molecule or a corresponding cDNA thereof. In accordance with this embodiment, the method further comprises one or more rounds of a polymerase chain reaction ("PCR") after the contacting step. In one embodiment, a step of reverse transcription precedes the contacting step. In one embodiment, the PCR reaction is a nested PCR reaction. In one embodiment, the method further comprises the step of visualizing the PCR products of the PCR reaction using gel electrophoresis with or without an additional step comprising Southern hybridization. In accordance with this embodiment, the snpRNA molecule is detected in the sample if a PCR product of the predicted size is amplified in the PCR reaction. In one embodiment, the oligonucleotides are labeled with a detectable label.
[179] In another embodiment, the method comprises contacting the sample with one or more longer oligonucleotides (50-300 base pairs) under conditions permitting the hybridization of the oligonucleotides with the snpRNA molecule or a corresponding cDNA thereof. In one embodiment, the oligonucleotides are labeled with a detectable label. In one embodiment, the sample is bound to a solid support. In a specific embodiment, the solid support is a bead or a membrane support. In accordance with this embodiment, the snpRNA molecule is detected in the sample if the oligonucleotide selectively hybridizes with a molecule of the predicted size. Selective hybridization is determined using methods routine in the art of nucleic acid hybridization assays. For example, decreasing the salt content of the wash buffers and increasing the number, length, and temperature of the washing steps increases the specificity of binding.
[180] The invention provides methods for determining the likelihood that a human subject will develop a disease or condition linked to an SNP by detecting the presence of an SNP sequence-bearing RNA molecule in a sample from the subject. In accordance with this embodiment, the subject has an increased likelihood of developing the disease or condition where an snpRNA transcribed from a pathological allele of the SNP is detected in a sample from the subject. Likewise, the subject has a decreased likelihood of developing the disease or condition where either no snpRNA is detected in the sample or an snpRNA transcribed from an ancestral allele is detected in the sample.
[181] In one embodiment, the invention provides a method for determining the risk to a subject of developing a particular disease or disorder, wherein a risk of developing the disease or disorder has been associated with an SNP, the method comprising detecting a small RNA containing the SNP in a sample from the subject by (1) obtaining a biological sample from the subject; (2) extracting the population of small RNAs from the sample; and (3) performing a reverse transcription polymerase chain reaction (RT-PCR) on the extract of small RNA from the sample, wherein the PCR is performed with a set of primers designed to amplify a complementary DNA fragment (cDNA) corresponding to the genomic region containing the SNP. In specific embodiments, the primers are designed to amplify a cDNA fragment that is either sense or antisense with respect to the genomic DNA containing the SNP. In certain embodiments, more than one set of primers is used to amplify the cDNA, wherein the more than one set of primers includes a set of nested PCR primers. In certain embodiments, the more than one set of primers includes a set of primers to amplify the antisense cDNA fragment and the sense cDNA fragment.
[182] In particular embodiments of the methods of the invention, the sample is a cell or tissue sample, a tumor tissue sample, a blood sample, or the sample comprises or is enriched for peripheral blood mononuclear cells (PBMC). It is understood that the embodiment in which the sample is "a cell" includes a plurality of cells. In one embodiment, the cells are a line of immortalized cells. In another embodiment the cells are primary cells which have been cultured for a period of time to increase their cell number. In each of these embodiments "a cell" or a plurality of cells refers to cells which are outside of a body, i.e., cells in vitro.
[183] In one embodiment of the claimed methods, the presence of the G-allele snpRNA of rs2670660 is detected in a sample from the subject, wherein the presence of the G-allele snpRNA indicates that the subject is at an increased risk for developing an autoimmune disorder. In one embodiment, the autoimmune disorder is selected from the group consisting of vitiligo, ankylosing spondylitis, rheumatoid arthritis, multiple sclerosis, systemic lupus erythematosus and autoimmune thyroid disease.
[184] In one embodiment of the claimed methods, the presence of the A-allele snpRNA of rs16901979 is detected in a sample from the subject, wherein the presence of the A-allele snpRNA indicates that the subject is at an increased risk for developing a cancer of epithelial origin. In one embodiment, the cancer is selected from breast cancer, metastatic breast cancer, prostate cancer, and metastatic prostate cancer. In a preferred embodiment, the cancer is prostate cancer or metastatic prostate cancer.
[185] In one embodiment of the claimed methods, the presence of the C-allele snpRNA of rs6596075 is detected in a sample from the subject, wherein the presence of the C-allele snpRNA indicates that the subject is at an increased risk for developing Crohn's disease.
[186] In one embodiment of the claimed methods, the presence of the C-allele snpRNA of rs6983561 is detected in a sample from the subject, wherein the presence of the C-allele snpRNA indicates that the subject is at an increased risk for developing prostate cancer.
[187] In one embodiment of the claimed methods, the presence of the G-allele snpRNA of rs13281615 is detected in a sample from the subject, wherein the presence of the G-allele snpRNA indicates that the subject is at an increased risk for developing breast cancer.
[188] In one embodiment of the claimed methods, the presence of the T-allele snpRNA of rs10505477 is detected in a sample from the subject, wherein the presence of the T-allele snpRNA indicates that the subject is at an increased risk for developing colorectal or prostate cancer.
[189] In one embodiment of the claimed methods, the presence of the C-allele snpRNA of rs10808556 is detected in a sample from the subject, wherein the presence of the C-allele snpRNA indicates that the subject is at an increased risk for developing colorectal or prostate cancer.
[190] In one embodiment of the claimed methods, the presence of the G-allele snpRNA of rs6983267 is detected in a sample from the subject, wherein the presence of the G-allele snpRNA indicates that the subject is at an increased risk for developing colorectal or prostate cancer.
[191] In one embodiment of the claimed methods, the presence of the A-allele snpRNA of rs7014346 is detected in a sample from the subject, wherein the presence of the A-allele snpRNA indicates that the subject is at an increased risk for developing colorectal cancer.
[192] In one embodiment of the claimed methods, the presence of the T-allele snpRNA of rs7000448 is detected in a sample from the subject, wherein the presence of the T-allele snpRNA indicates that the subject is at an increased risk for developing prostate cancer.
[193] In one embodiment of the claimed methods, the presence of the A-allele snpRNA of rs1447295 is detected in a sample from the subject, wherein the presence of the A-allele snpRNA indicates that the subject is at an increased risk for developing prostate cancer.
[194] In one embodiment of the claimed methods, the presence of the T-allele snpRNA of rs2820037 is detected in a sample from the subject, wherein the presence of the T-allele snpRNA indicates that the subject is at an increased risk for developing hypertension.
[195] In one embodiment of the claimed methods, the presence of the C-allele snpRNA of rs889312 is detected in a sample from the subject, wherein the presence of the C-allele snpRNA indicates that the subject is at an increased risk for developing breast cancer.
[196] In one embodiment of the claimed methods, the presence of the A-allele snpRNA of rs1937506 is detected in a sample from the subject, wherein the presence of the A-allele snpRNA indicates that the subject is at an increased risk for developing hypertension.
[197] In one embodiment of the claimed methods, the presence of the A-allele snpRNA of rs13387042 is detected in a sample from the subject, wherein the presence of the A-allele snpRNA indicates that the subject is at an increased risk for developing breast cancer.
[198] In one embodiment of the claimed methods, the presence of the A-allele snpRNA of rs7716600 is detected in a sample from the subject, wherein the presence of the A-allele snpRNA indicates that the subject is at an increased risk for developing breast cancer.
[199] In one embodiment of the claimed methods, the presence of the C-allele snpRNA of rs11249433 is detected in a sample from the subject, wherein the presence of the C-allele snpRNA indicates that the subject is at an increased risk for developing breast cancer.
[200] In one embodiment of the claimed methods, the presence of the T-allele snpRNA of rs3803662 is detected in a sample from the subject, wherein the presence of the T-allele snpRNA indicates that the subject is at an increased risk for developing breast cancer.
[201] In accordance with the methods of the invention, the table below lists the pathological allele of a number of exemplary SNPs which encode an snpRNA molecule of the invention.
Table 3: Selected examples of pathological alleles and the associated disease or disorder
SNP                 Pathological Allele   Associated Disease/Disorder
rs2670660           G allele              Autoimmune disorders
A3: rs6596075       C allele              Crohn's disease
A5: rs6983561       C allele              Prostate cancer
A6: rs16901979      A allele              Prostate cancer
A8: rs13281615      G allele              Breast cancer
A9: rs10505477      T allele              Colorectal and prostate cancer
A10: rs10808556     C allele              Colorectal and prostate cancer
A11: rs6983267      G allele              Prostate and colorectal cancers
A12: rs7014346      A allele              Colorectal cancers
A13: rs7000448      T allele              Prostate cancer
A14: rs1447295      A allele              Prostate cancer
A15: rs2820037      T allele              Hypertension
A16: rs889312       C allele              Breast cancer
A17: rs1937506      A allele              Hypertension
B5: rs13387042      A allele              Breast cancer
E1: rs7716600       A allele              Breast cancer
E2: rs11249433      C allele              Breast cancer
E3: rs3803662       T allele              Breast cancer
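The qualitative risk call described in [174] to [175] and [180], combined with the alleles listed in Table 3, can be sketched as a simple lookup. This is an illustrative sketch only: the dictionary is transcribed from Table 3, while the function name `interpret` and the wording of the returned strings are assumptions made here and are not part of the specification.

```python
# SNP -> (pathological allele, associated disease/disorder), transcribed from Table 3.
PATHOLOGICAL_ALLELES = {
    "rs2670660": ("G", "autoimmune disorders"),
    "rs6596075": ("C", "Crohn's disease"),
    "rs6983561": ("C", "prostate cancer"),
    "rs16901979": ("A", "prostate cancer"),
    "rs13281615": ("G", "breast cancer"),
    "rs10505477": ("T", "colorectal and prostate cancer"),
    "rs10808556": ("C", "colorectal and prostate cancer"),
    "rs6983267": ("G", "prostate and colorectal cancers"),
    "rs7014346": ("A", "colorectal cancers"),
    "rs7000448": ("T", "prostate cancer"),
    "rs1447295": ("A", "prostate cancer"),
    "rs2820037": ("T", "hypertension"),
    "rs889312": ("C", "breast cancer"),
    "rs1937506": ("A", "hypertension"),
    "rs13387042": ("A", "breast cancer"),
    "rs7716600": ("A", "breast cancer"),
    "rs11249433": ("C", "breast cancer"),
    "rs3803662": ("T", "breast cancer"),
}


def interpret(snp_id, detected_alleles):
    """Hypothetical helper: map the snpRNA alleles detected for one SNP to a risk call.

    Detection of the pathological-allele snpRNA gives "increased risk"; no snpRNA,
    or only the other allele's snpRNA, gives "decreased risk".
    """
    risk_allele, disease = PATHOLOGICAL_ALLELES[snp_id]
    if risk_allele in detected_alleles:
        return f"increased risk: {disease}"
    return f"decreased risk: {disease}"


print(interpret("rs2670660", {"G"}))
print(interpret("rs16901979", {"G"}))
print(interpret("rs6596075", set()))
```

The call is deliberately binary, mirroring the text; it does not model expression level or the relative-expression embodiment of [152].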
EXAMPLES
[202] The following examples describe the identification of small non-coding RNAs of the invention (snpRNAs) and the biological activity of specific examples of these snpRNAs.
1.1 META-ANALYSIS OF DISEASE-LINKED SNPS REVEALS THAT THE MAJORITY OCCUR WITHIN NON-CODING GENOMIC REGIONS
[203] To assess the genomic distribution of disease-linked SNPs, a meta-analysis was carried out using SNPs identified in several genome-wide association studies. See Glinskii et al, Cell Cycle 2009 Dec;8(23):3925-42. The data set consisted of up to 712,253 samples (comprising 221,158 disease cases, 322,862 controls, and 168,233 case/control subjects of obesity GWAS). This analysis revealed that 39% of SNPs associated with 22 common human disorders are located within intergenic regions and 29% within introns. Thus, a majority of disease-linked SNPs identified to date are located within introns (29%) or intergenic (39%) regions of the human genome having no direct relation either to known protein-coding sequences or to known non-coding RNA sequences such as miRNA or IiRNA sequences. These data are summarized in the table below.
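As a quick arithmetic check of the figures quoted in [203], the three sample counts can be summed and the two non-coding percentages added. The variable names below are chosen here for illustration only.

```python
# Sample counts quoted in paragraph [203].
disease_cases = 221_158
controls = 322_862
obesity_case_control = 168_233

total_samples = disease_cases + controls + obesity_case_control
print(total_samples)  # 712253, matching the stated total of up to 712,253 samples

# Share of disease-linked SNPs outside known coding or non-coding RNA sequences.
intergenic_pct = 39
intronic_pct = 29
non_coding_pct = intergenic_pct + intronic_pct
print(non_coding_pct)  # 68, i.e. a majority
```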
[204] Chromatin-state maps based on H3K4me3-H3K36me3 signatures show that many intergenic disease-linked SNPs are located within the boundaries of the K4-K36 domains indicating that these intergenic SNP-harboring genomic regions are transcribed, even though none are located within the boundaries of exons of genomic sequences encoding long non-coding RNAs identified to date. The following data demonstrate that these SNP-containing intergenic regions are in fact transcribed to produce non-coding RNA molecules having gene regulatory activity.
Table 4: SNP classes defined by analysis of genomic coordinates of disease-linked SNPs identified in genome-wide association studies of 22 common human disorders. Five intergenic SNPs are associated with multiple diseases (3 with 3; and 2 with 2); 4 intronic SNPs are associated with 2 different diseases; 4 missense SNPs are associated with 2 different diseases.
Figure imgf000049_0001
1.2 IDENTIFICATION OF SMALL TRANSRNAS ENCODED BY INTERGENIC SEQUENCES CONTAINING DISEASE-LINKED SNPS
[205] An RT-PCR-based screening protocol was used to identify RNA molecules encoded by disease-associated SNP sequences. This protocol was initially used to identify RNAs 100 to 200 nucleotides in length encoded by intergenic SNPs associated with multiple common human disorders including Crohn's disease, rheumatoid arthritis, type 1 diabetes, vitiligo, and multiple types of epithelial malignancies (prostate, breast, ovarian, and colorectal cancers). RNAs identified in the initial screen using human cells of mesenchymal (BJ1) and lymphoid (U937) origin are shown in Figures 1 and 2. The sequences of these RNA molecules are represented by their respective cDNA sequences in Table 1, supra. Further experiments also included human cells of epithelial origin (RWPE1) (Fig. 15, Tables 1 and 3). The results demonstrate the cell-type specific expression of many of the small RNAs.
[206] The RT-PCR-based screening protocol comprised the following steps: extraction of small RNA from cells; determination of DNA contamination by PCR for beta-actin; synthesis of cDNA; first PCR using primer set 2 (GC2F and GC2R); nested PCR of purified first PCR product using primer set 1 (GC1F and GC1R); gel purification of final PCR product; and confirmation of the sequence of the final PCR product by direct sequencing. Detailed protocols are found infra, in the section entitled Materials and Methods.
[207] Further analysis identified a subset of sequences flanked by the same protein-coding genes in both human and mouse genomes. These sequences are selected from A6, A9-11, A16, A23, B6, C12, D2, D5, D26, E3, E12, and the rs2670660 (NALP1 Loci) RNAs, all of which are shown in Table 1, supra. Further analysis using genome-wide chromatin domain maps (see Kim et al., Nature 465:182-87 (2010) and Ku et al., PLoS Genet. 4:e1000242 (2008)) suggested that these intergenic disease-associated genetic loci represent Polycomb-regulated intergenic chromatin domains.
[208] Analysis of the predicted secondary structures of these RNA molecules revealed the presence of loop sequences containing SNP-bearing segments of 8-11 nucleotides in length which are identical to primary sequences of microRNAs (Fig. 2B). The loop structures of the allelic variants also are predicted to have distinct secondary structures. The RNA molecules contain multiple potential target sites for microRNAs which are often clustered around SNP nucleotides. These data suggested an epigenetic regulatory cross-talk between the intergenic RNAs and microRNAs. As shown infra, microarray expression profiling of human cell lines stably expressing distinct allelic variants of the NALP1-locus SNP rs2670660 RNAs identified microRNAs whose expression was differentially regulated by the '660 RNAs in an allele-specific manner.
1.3 NALP1 LOCI-ASSOCIATED INTERGENIC SNP, RS2670660, ENCODES SMALL RNAS THAT CAUSE ALLELE-SPECIFIC CHANGES IN HUMAN CELLS
[209] The NLRP1/NALP1 loci, including the hypothetical extended NLRP1 (NALP1) regulatory region, is strongly associated with vitiligo and multiple autoimmune and autoinflammatory disorders. One of the NALP1-associated SNPs, rs2670660, is of particular interest because it occurs within a segment of the genome that is remarkably conserved among species, including human, chimpanzee, macaque, bush baby, cow, mouse, and rat. Four sets of primers were designed to detect the predicted RNA molecules encoded by the rs2670660 sequences. The primer sequences (5' to 3') are as follows:
Set 1: (forward) CACGCACAAGTGATCTACCAG (SEQ ID NO: 326)
(reverse) GCATCAGGATVCACCAGTC (SEQ ID NO: 327)
Set 2: (forward) CCACGCACAAGTGATCTACC (SEQ ID NO: 102)
(reverse) CAAGATGCCTCTATGCCTTAAA (SEQ ID NO: 103)
Set 3: (forward) CCACGCACAAGTGATCTACC (SEQ ID NO: 328)
(reverse) TCCCCTTACATCTGCCACTT (SEQ ID NO: 329)
Set 4: (forward) GTGTTCAGGAGCTGGGTGAC (SEQ ID NO: 330)
(reverse) TCCCCTTACATCTGCCACTT (SEQ ID NO: 331)
[210] The expected size of the PCR product generated by each primer set is as follows: Set 1: 110 basepairs (bp); Set 2: 152 bp; Set 3: 205 bp; Set 4: 225 bp. The primers' specificity was validated by PCR of the genomic sequences. Only primer set 2 consistently amplified products of the expected size (152 nt) in RT-PCR of the small RNA fraction (< 200 nt) isolated from various cells. Nested PCR of the 152 nt sequence using primer set 1 also generated products of the expected size (110 nt). The purified PCR products were confirmed by direct sequencing. The sequences of the 152 and 110 nt PCR products are shown below.
152 nt sequence: 5'-CCACGCACAAGTGATCTACCAGTCTTTTAAA[A/G]TTCTATTATTAAAACCCAAACATGCTCTTTCATTTCCACAGAACACTGGGTCTAAATTTAGACTGGTGCATCCTGATGCTGCACCAGTCTGCTCTTAATTTAAGGCATACAGGCATCTTG-3' SEQ ID NO: 332
110 nt sequence: 5'-CACGCACAAGTGATCTACCAGTCTTTTAAA[A/G]TTCTATTATTAAAACCCAAACATGCTCTTTCATTTCCACAGAACACTGGGTCTAAATTTAGACTGGTGCATCCTGATGC-3' SEQ ID NO: 333
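The nested relationship between the two products can be checked in silico. The sketch below is illustrative only: the sequences are transcribed from the listing above with whitespace removed and the variable position written as [A/G]; the helper names are hypothetical; and the letter V in the set 1 reverse primer is treated as the IUPAC ambiguity code for A, C, or G, which is an assumption about how that character should be read.

```python
# IUPAC codes used here, mapped to the bases each one admits.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "V": "ACG", "B": "CGT", "N": "ACGT"}
COMPLEMENT = str.maketrans("ACGTVBN", "TGCABVN")

SEQ_152 = (
    "CCACGCACAAGTGATCTACCAGTCTTTTAAA[A/G]TTCTATTATTAAAACCCAAACATGCT"
    "CTTTCATTTCCACAGAACACTGGGTCTAAATTTAGACTGGTGCATCCTGATGCTGCACCA"
    "GTCTGCTCTTAATTTAAGGCATACAGGCATCTTG"
)
SEQ_110 = (
    "CACGCACAAGTGATCTACCAGTCTTTTAAA[A/G]TTCTATTATTAAAACCCAAACATGCTC"
    "TTTCATTTCCACAGAACACTGGGTCTAAATTTAGACTGGTGCATCCTGATGC"
)
SET1_FORWARD = "CACGCACAAGTGATCTACCAG"
SET1_REVERSE = "GCATCAGGATVCACCAGTC"


def reverse_complement(seq):
    """Reverse complement, keeping IUPAC ambiguity codes consistent."""
    return seq.translate(COMPLEMENT)[::-1]


def matches(pattern, target):
    """True if each IUPAC code in pattern admits the base at that position in target."""
    return len(pattern) == len(target) and all(
        base in IUPAC[code] for code, base in zip(pattern, target)
    )


for allele in "AG":
    outer = SEQ_152.replace("[A/G]", allele)
    inner = SEQ_110.replace("[A/G]", allele)
    # Product lengths, and the offset of the nested product within the outer product.
    print(allele, len(outer), len(inner), outer.find(inner))
    # The forward primer should sit at the 5' end of the nested product; the
    # reverse primer's reverse complement should sit at its 3' end.
    print(matches(SET1_FORWARD, inner[: len(SET1_FORWARD)]))
    print(matches(reverse_complement(SET1_REVERSE), inner[-len(SET1_REVERSE):]))
```

Counting the variable position as a single nucleotide, the two listings come to 152 and 110 nucleotides, and the 110 nt product lies wholly within the 152 nt product, as expected for a nested PCR.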
[211] A short 52 nucleotide subsequence around the rs2670660 SNP (which did not include other SNPs) was selected for further analysis. The sequence of the 52 nucleotide rs2670660 subsequence used in the biological experiments is SEQ ID NO:1 (see Table 1, supra). As demonstrated by the following experiments, this minimal SNP-containing sequence was biologically active. Without being bound by any particular theory, it is suggested that the minimal 52 nucleotide sequence represents a biologically active splice variant of the longer endogenous RNA sequence and that this small SNP-containing variant is the active species catalyzing the changes in gene transcription that underlie the observed effects of the SNP on disease association.
[212] The following terms are used to designate the 4 small RNAs transcribed from the A-allele of rs2670660, the G-allele of rs2670660, and their antisense counterparts: "A-allele RNA", "G-allele RNA", "asA-allele RNA", and "asG-allele RNA". These 4 RNAs are also referred to collectively as "the '660 RNAs" or the "rs2670660-encoded small RNAs." These RNAs may also be referred to herein as NALP1-locus RNAs or NALP1-locus transRNAs.
[213] Sequence homology profiling and structure/function analyses showed that the
'660 RNAs may physically interact with certain miRNAs. The set of miRNAs analyzed was one of those whose expression was found to be modulated by ectopic expression of the '660 RNAs (see below). 36 miRNAs had at least one potential target site within the 152 nt '660 RNA sequence (Fig. 3G). Many miRNA target sites showed allele-associated changes in the minimal free energy (mfe) of hybridization (between the '660 RNA alleleic variant and the miRNA). The miRNAs also share multiple sequence identity segments of at least 11 nucleotides in length with the MEG3 and MALATl long non-coding RNAs (Fig. 3G).
Comparisons of the allele-associated changes of the mfe values and experimentally-defined changes of the miRNA expression levels revealed a highly significant inverse correlation between these two variables. Lower mfe values correlated with higher levels of miRNA expression (Fig. X). These results suggest a model of snpRNA-mediated regulation of miRNA expression according to which high affinity (low mfe) snpRNA alleles would facilitate increased abundance levels of the corresponding microRNAs.
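The inverse relationship between mfe changes and miRNA expression changes can be illustrated numerically. The following is a minimal sketch only: the source does not name the correlation statistic used, so a Spearman rank correlation is assumed here, the function names are hypothetical, and the per-miRNA values are invented placeholders rather than data from the experiments.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation via the rank-difference formula (no ties)."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Placeholder values, one entry per miRNA (NOT taken from the source):
# allele-associated change in mfe (kcal/mol) and log fold change in expression.
delta_mfe = [-3.1, -1.8, -0.4, 0.6, 1.9, 2.7]
delta_expression = [1.4, 0.9, 0.3, -0.2, -0.8, -1.1]

rho = spearman_rho(delta_mfe, delta_expression)
print(f"Spearman rho = {rho:.2f}")  # a negative value indicates an inverse correlation
```

A negative coefficient corresponds to the pattern described in the text, in which lower mfe values accompany higher miRNA expression.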
1.4 EXPRESSION OF RS2670660 SEQUENCE-BEARING SMALL RNAS CAUSES ALLELE-SPECIFIC CHANGES IN THE BIOLOGICAL BEHAVIOR OF CELLS
[214] A panel of GFP-tagged lentiviral vectors containing allele-specific variants of the rs2670660 sequence under the constitutive expression of the CMV promoter was constructed. The same vector, without the rs2670660 sequences and expressing GFP only, was used as a control (referred to variously in the following and the figures as "vector," "control," or "GFP"). The 52 nt allele-specific variants of the rs2670660 sequence were chemically synthesized in sense and anti-sense orientations and cloned into the lentiviral vectors. The sequences were confirmed by restriction mapping and direct sequencing. Preliminary experiments established that hTERT-immortalized BJ1 cells consistently produced the highest transfection efficiency (> 90% of GFP-expressing cells by flow cytometry (FACS) analysis). These cells were used for subsequent experiments.
Monolayer cell growth and clonogenic cell growth
[215] Monolayer cultures of BJ1 cells expressing 50 nucleotide RNAs from the G-allele of rs2670660 showed reduced growth compared to either cells transfected with the empty GFP vector or cells expressing 50 nucleotide RNAs from the A-allele of rs2670660 (Fig. 4A). Clonogenicity assays demonstrated that cells expressing G-allele RNA and anti-sense A-allele RNA also had markedly reduced clonogenic growth compared to vector control and cells expressing the A-allele RNA (Fig. 4B). In contrast, cells expressing anti-sense G-allele RNA showed increased clonogenic growth. These data indicate that the antisense transcripts are able to antagonize the biological activity of the A- and G-allele transcripts.
Cell cycle progression
[216] Fluorescence-activated cell sorting ("FACS"), also referred to herein as "flow cytometry," was used to evaluate the cell-cycle specific effects of these small RNAs. Cells expressing either the anti-sense A (asA) or G-allele (G) RNAs showed an increase in the G1 phase and a concomitant decrease in S and G2/M phases. In contrast, cells expressing either the anti-sense G- (asG) or A-allele (A) RNAs showed a decrease in G1 and an increase in S phase (Figure 4C). These results indicate that the growth inhibitory effects of the asA and G RNAs are associated with G1 arrest while the growth stimulatory effects of asG and A are associated with increased entry into S-phase.
[217] The sequence-specificity of the observed effects on cell growth was tested in a series of allele-combination experiments. In these experiments, cells were co-transfected with lentiviruses expressing complementary rs2670660 sequences in sense and anti-sense orientations (Fig. 5A-B). Co-expression of asG with G allele RNAs markedly reduced the inhibition of clonogenic growth observed for cells expressing only the G allele RNA (compare top 2 rows of Fig. 5B). Co-expression of A allele RNAs with asA RNAs substantially reduced the growth inhibitory effects of the A-allele RNAs. The simultaneous expression of the G- and asA allele RNAs resulted in the almost complete inhibition of clonogenic growth (Figure 5B, compare bottom row (row 6 from top) with row 5 (GFP only)). These results further indicate that the growth inhibitory effects of the G-allele RNA and asA allele RNA are sequence specific.
TPA-induced differentiation
[218] THP-1 cells undergo differentiation from monocytes to macrophages in response to TPA. Differentiated cells are easily recognized due to their morphological appearance. THP-1 cells expressing the rs2670660-encoded RNAs were identified and sorted by flow cytometry so that cells used for analysis were more than 90% GFP-positive. Cells containing either vector alone (control), A-allele, or G-allele RNAs were exposed to TPA for 4 days. Figure 6A shows light microscopy (left 3 panels) and fluorescence (right 3 panels) images of cells transfected with vector alone (top 2 panels), A-allele RNA (middle panels), or G-allele RNA (bottom panels). Both the vector-transfected and A-allele expressing cells show a high proportion of cells exhibiting the morphology of the differentiated phenotype. In contrast, G-allele expressing cells failed to differentiate in response to TPA. Instead, the G-allele expressing cells underwent apoptosis during TPA-induced differentiation and as a consequence generated 5-fold fewer macrophages compared to cells expressing the A-allele (Fig. 6B). In contrast, A-allele expressing cells produced nearly 2-fold more macrophages than control cells expressing only GFP. These cells also exhibited more potent phagocytic activity compared to controls or G-allele expressing cells (Fig. 6B, inset). These phenotypic changes were not the result of generally diminished cellular function in the G-allele expressing cells because cells expressing the G-allele showed a sustained long-term viability and increased motility (Fig. 6E).
[219] Cells stably expressing the rs2670660-encoded RNAs were further analyzed for gene expression changes by microarray analysis. The G-allele expressing cells showed lower expression of genes comprising the PRC1-type PcG protein complexes (BMI1 and RING1B) compared to components of the PRC2-type PcG complexes (EZH2, EED, and SUZ12). There was also differential regulation of 586 PcG targeted bivalent chromatin domain genes (see Fig. 6C).
[220] Lentiviral gene transfer was used to (1) inhibit the expression of the BMI1 gene in ancestral A-allele-expressing THP-1 cells (using shRNAs) and (2) overexpress the BMI1 gene in pathological G-allele-expressing THP-1 cells. RT-PCR analysis was used to validate the specificity of gene silencing and gene transfer experiments. The cells were assessed for their ability to undergo the differentiation from monocyte to macrophage (Figure 6D). The BMI1 knock-down markedly diminished macrophage production by A-allele expressing THP-1 cells (Figure 6D, top and bottom left panels), whereas BMI1 over-expression rescued the macrophage-producing defect of G-allele expressing THP-1 cells (Figure 6D, bottom right panels).
[221] Further analysis revealed that G-allele expressing cells had pleiotropic deficiencies within the inflammasome/innate immunity pathways. G-allele-associated molecular defects included a concomitant decrease in expression of the NLRP1, CASP1, and IL1-beta genes. These genes are key linear components of an essential functional axis within the inflammasome/innate immunity pathway.
[222] Collectively, these data indicate that expression of NALP1-locus transRNAs containing a disease-associated G-allele may cause a significant functional deficiency of the immune system. Markedly enhanced apoptosis during differentiation would reduce the production of specialized immune cells, including effector cells and cells with critical immuno-regulatory functions. Significantly diminished expression of the NLRP1, CASP1, and IL1-beta genes would likely severely limit the functional potency of the inflammasome/innate immunity pathways.
1.5 EXPRESSION OF RS2670660 SEQUENCE-BEARING 50 NT RNAS CAUSES GENOME-WIDE ALLELE-SPECIFIC CHANGES IN GENE EXPRESSION
[223] Microarray analysis revealed allele-specific changes in the global gene expression profiles of cells expressing the A- and G-allele RNAs of rs2670660 compared to cells expressing the vector alone. Analysis of individual genes showed that expression of the asA- or asG-allele RNA specifically antagonized the expression pattern observed with the corresponding sense allele (Fig. 7A-D).
[224] Microarray analyses revealed genome-wide allele-specific concordant and discordant expression profiles in BJ1 cells expressing the rs2670660 RNAs (Fig. 7E-L). Linear regression analysis of the gene expression data was used to graphically illustrate concordant (E-H) and discordant (I-L) expression patterns.
[225] Gene expression that is concordant across tissues is more likely to be influenced by genetic variability than expression that is discordant between tissues. See e.g., French, D. et al., (2008) Concordant Gene Expression in Leukemia Cells and Normal Leukocytes Is Associated with Germline cis-SNPs, PLoS ONE 3(5): e2144. doi:10.1371/journal.pone.0002144. Here, the set of genes that was segregated according to specific concordant and discordant expression profiles demonstrated better sample discrimination (see e.g., Fig. 12A-H, compared to Fig. 12I).
[226] A summary of the concordance analyses is shown in the tables below. In Table 5, a set of 3299 genes whose expression was differentially regulated in cells expressing the G-allele RNA of rs2670660 compared to vector controls was defined by t-statistics. The expression of these 3299 genes was then evaluated in cells expressing the G-allele RNA and in cells expressing the A-allele RNA of rs2670660. Regression analysis shows highly concordant expression of this set of genes in cells expressing the G- and A-allele RNA of rs2670660. 87% of the 3299 genes were concordantly expressed (1562 up- and 1732 down-regulated). See also Fig. 7E. Concordance was greater than 95% for a subset of genes identified as differentially expressed in cells expressing the G-allele RNA of rs2670660 (at p = 0.05) and then evaluated in cells expressing the G-allele RNA and in cells expressing the A-allele RNA of rs2670660 (at p = 0.1). See also Fig. 7F. As shown in Table 5, 1,562 genes showed concordant up-regulation in cells expressing the G-allele RNA compared with cells expressing GFP only. When compared to cells expressing the A-allele RNA, 87% showed concordant up-regulation (1,365 out of 1,562).
Table 5: Concordance analysis of 3299 and 1561 rs2670660 G-allele RNA-regulated transcripts
Figure imgf000056_0001
Concordance for 3299 transcripts identified at cut-off p = 0.050 (for G vs Control) and concordant changes in G vs A samples. Concordance for 1561 transcripts identified at p = 0.050 (for G vs Control) and p = 0.10 (for G vs A).
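The concordance figures summarized for Table 5 can be illustrated with a short calculation. This is a minimal sketch under stated assumptions: a gene is treated as concordant when its log fold change has the same sign in both comparisons, which is one plausible reading of the analysis and is not spelled out in the source. The function name, gene names, and values are hypothetical.

```python
def concordance(fold_first, fold_second):
    """Count genes whose log fold changes share a sign in two comparisons.

    Returns (n_up, n_down, fraction_concordant) over the genes present in both.
    """
    shared = fold_first.keys() & fold_second.keys()
    up = sum(1 for g in shared if fold_first[g] > 0 and fold_second[g] > 0)
    down = sum(1 for g in shared if fold_first[g] < 0 and fold_second[g] < 0)
    return up, down, (up + down) / len(shared)


# Placeholder log fold changes (NOT data from the source).
g_vs_control = {"geneA": 1.2, "geneB": -0.8, "geneC": 0.5, "geneD": -1.1, "geneE": 0.9}
g_vs_a = {"geneA": 0.7, "geneB": -0.3, "geneC": -0.4, "geneD": -0.6, "geneE": 0.2}

up, down, fraction = concordance(g_vs_control, g_vs_a)
print(f"{up} up, {down} down, {fraction:.0%} concordant")
```

Applied to the counts reported in the text, 1,365 concordantly up-regulated genes out of 1,562 corresponds to about 87%.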
Table 6: Concordance analysis of 3268 and 1636 rs2670660 G-allele RNA-regulated transcripts
Figure imgf000056_0002
Concordance for 3268 transcripts identified at cut-off p = 0.050 (for G vs A) and concordant changes in G vs Control samples. Concordance for 1636 transcripts identified at p = 0.050 (for G vs A) and p = 0.10 (for G vs Control).
[227] In Table 6, a set of 3268 genes whose expression was differentially regulated in cells expressing the G-allele compared to cells expressing the A-allele RNA of rs2670660 was defined by t-statistics. The expression of these 3268 genes was then evaluated in cells expressing the G-allele of rs2670660 compared to vector (GFP only) controls. Regression analysis shows highly concordant expression of this set of genes. 89% of 3268 genes were concordantly expressed (1583 up- and 1685 down-regulated). See also Fig. 7G. Concordance was greater than 95% for a subset of 1568 genes identified as differentially expressed in cells expressing the G-allele RNA of rs2670660 (at p = 0.05) and then evaluated in cells expressing the G-allele RNA and in cells expressing vector controls (at p = 0.1). See also Fig. 7H.
[228] Figures 17 and 18 show the complete set of genes identified in the concordance analyses summarized in Tables 5 and 6, respectively. Shown in the figures are the probe set used to measure gene transcription next to the gene expression level (i.e., relative to vector controls for Table 5), the normalized (log10) gene expression level, and the t-statistic, followed by identification of the gene and alignment used in the analysis.
[229] One set of genes identified as being differentially regulated by the rs2670660 RNAs included the NLRP1, NLRP3, HMGA1, and Myb genes, which are regulators of inflammation and innate immunity (Figure 8A, top panels). These changes in gene expression are further illustrated by the ratios of the functionally-related transcripts, NLRP3/NLRP1 (Figure 8A, bottom left panel) and HMGA1/Myb (Figure 8A, bottom right panel).
[230] The changes in the expression of these genes in human neutrophils after bronchoscopic endotoxin (LPS) challenge (Fig. 8B) and in human leukocytes after in vitro LPS challenge (Fig. 8C, E) were also analyzed. Alveolar neutrophils (Fig. 8B right sets of bars) showed decreased NLRP1 mRNA expression, increased NLRP3 mRNA expression, and increased NLRP3/NLRP1 mRNA expression ratios compared to the circulating neutrophils (Fig. 8B left sets of bars). LPS-treated leukocytes (Fig. 8C right sets of bars) showed decreased NLRP1 mRNA expression, increased NLRP3 mRNA expression, and increased NLRP3/NLRP1 mRNA expression ratios compared to the control cultures (Fig. 8C left sets of bars). Alveolar neutrophils (Fig. 8D right sets of bars) showed increased Myb mRNA expression, increased HMGA1 mRNA expression, and increased HMGA1/Myb mRNA expression ratios compared to the circulating neutrophils (Fig. 8D left sets of bars). Adherent cultures of monocytes (Fig. 8E, right sets of bars) showed decreased Myb mRNA expression, increased HMGA1 mRNA expression, and increased HMGA1/Myb mRNA expression ratios compared to the control cultures (Fig. 8E left sets of bars).
[231] The set of genes whose expression was differentially regulated in G-allele expressing cells compared to vector (GFP) controls was identified by t-statistics in BJ1 cells. This set was screened for concordance in model systems for activation of the inflammasome pathway (Figure 9). Concordant G-allele signatures were identified in experimental (Fig. 9A, left set of bars) and control (Fig. 9A, right set of bars) samples for human circulating leukocytes after in vitro endotoxin (LPS) challenge. Similar results are shown for human alveolar (Fig. 9B, left set of bars) and circulating neutrophils (Fig. 9B, right set of bars) after in vivo bronchoscopic endotoxin (LPS) challenge. Discordant signatures are shown in panels D and E. Results for human circulating neutrophils after in vivo bronchoscopic endotoxin (LPS) challenge are shown in Fig. 9C and 9F. Where the gene expression data is not segregated into concordant and discordant groups, diminished sample discrimination is seen (Fig. 9G).
[232] The following tables show the total numbers of genes whose expression changed (either up or down) under various experimental conditions modeling activation of the innate immunity/inflammasome pathways in cells expressing the G-allele RNA of rs2670660 and in control cells expressing only GFP. As shown in the tables, a statistically significant subset of genes regulated by the G-allele RNA of rs2670660 is also differentially regulated when the innate immunity/inflammasome pathways are activated.
Table 7: rs2670660-associated gene expression signatures in transdifferentiating human monocytes
Figure imgf000058_0001
Table 8: rs2670660-associated gene expression signatures in LPS-challenged human leukocytes
Figure imgf000058_0002
Table 9: rs2670660-associated gene expression signatures in human neutrophils after bronchoscopic endotoxin (LPS) challenge
Figure imgf000059_0001
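The statistical significance of an overlap between a G-allele-regulated gene set and a set of genes changed in a model system can be estimated in several ways. The source does not state which test was used; the sketch below assumes a one-sided hypergeometric test, and the function name, the size of the gene universe, and the example counts are all hypothetical.

```python
from math import comb


def hypergeom_tail(overlap, signature_size, changed_size, universe_size):
    """Probability of at least `overlap` shared genes arising by chance.

    Models drawing `changed_size` genes at random from `universe_size` genes,
    of which `signature_size` belong to the signature.
    """
    upper = min(signature_size, changed_size)
    total = comb(universe_size, changed_size)
    tail = sum(
        comb(signature_size, k) * comb(universe_size - signature_size, changed_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers (NOT from the source): 20 genes measured, 5 in the signature,
# 4 changed in the model system, 3 of them shared with the signature.
p_value = hypergeom_tail(overlap=3, signature_size=5, changed_size=4, universe_size=20)
print(f"p = {p_value:.4f}")
```

A small p value indicates that the observed overlap is larger than expected from random co-occurrence alone. Exact integer arithmetic via `math.comb` avoids a dependency on external statistics packages.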
[233] In summary, the allele-specific changes in gene expression in cells expressing the A- and G-allele RNAs of rs2670660 were readily detectable in both in vitro and in vivo models of the activated state of the innate immunity/inflammasome pathways. These results indicate that an rs2670660-encoded RNA-driven pathway is activated when innate immunity/inflammasome pathways are activated in a cell.
1.6 RS2670660-ENCODED RNAS AFFECT EXPRESSION OF MICRORNAS
[234] The genome-wide effects of rs2670660-encoded RNAs on gene expression described above indicate that the specific targets of these RNAs are either transcription factors or miRNAs, both of which control the expression of multiple genes. As discussed above, the predicted secondary structures for many of the identified intergenic small non-coding RNAs also indicated some interaction with miRNAs. Indeed, as demonstrated by the following experiments, the rs2670660 RNAs affect the expression of hundreds of miRNAs and miRNA-targeted proteins.
[235] The effects of the rs2670660-encoded RNAs on the expression of miRNAs were analyzed using an ABI Q-RT-PCR technology platform. The results demonstrated that the rs2670660-encoded RNAs alter the abundance levels of hundreds of miRNAs (Fig. 10). Both allele-specific and allele context-independent patterns of miRNA expression were identified. The matching mRNA expression profiles of both the common 140-gene signature (Fig. 10C) and the allele-specific 86-gene miRNA signatures were identified (Fig. 10E). Forced expression of selected individual miRNAs recapitulated both allele context-independent (Fig. 10D) and allele-specific (Fig. 10F) patterns of mRNA expression changes. Interestingly, many mRNAs comprising the 59-gene signature manifest discordant patterns of regulation in response to expression of the control miRNA, miR-205 (right set of bars), expression of which is not altered by rs2670660-encoded RNAs. Also note that miR-20b is one of the up-regulated miRNAs shown in Fig. 10A and mRNAs comprising the 59-gene signature are a sub-set of mRNAs comprising the 140-gene signature shown in Fig. 10C.
[236] Expression profiling experiments also identified 36 miRNAs differentially regulated in BJ1 cells expressing distinct allelic variants of the rs2670660-encoded RNAs (Fig. 10H, I). These represent distinct classes of non-coding RNAs including snoRNAs and snoRNA-host genes (SNORD113; SNHG1; SNHG3; SNHG8); long non-coding RNAs (MEG3, tncRNA, and MALAT1); microRNAs, microRNA-precursors, and protein-coding microRNA-host genes (ATAD2; KIAA1199). 18 of 36 (50%) of these miRNAs are derived from a single miRNA cluster on a continuous region of about 200 kb in the 14q32 band of chromosome 14, which suggests that the 14q32 cluster miRNAs may be a primary molecular target of the rs2670660-encoded RNAs.
[237] Analysis of genomic coordinates revealed that the sequences encoding 18 of these RNAs are located within a region of about 200 kilobases on chromosome 14q32 that is immediately adjacent to the long non-coding RNA gene, MEG3. Changes of expression of intron-residing miRNAs miR-548d (intron of the ATAD2 gene) and miR-549 (intron of the KIAA1199 gene) corresponded to the allele-specific expression levels of the corresponding miRNA-host genes, suggesting a coordinated mechanism of regulation. These results indicate that one of the important epigenetic features of the expression of the rs2670660-encoded RNAs is genome-wide changes in expression of multiple diverse classes of non-coding RNAs.
[238] Recent experiments demonstrate that let-7 miRNA release from complexes with Argonaute proteins and subsequent degradation can both be blocked by addition of miRNA target RNA which results in increased levels of let-7 miRNA (Chatterjee et al., Nature 461:546-9, 2009). Computer modeling experiments demonstrated that let-7b miRNA follows the pattern of allele-associated mfe changes characteristic of miRNAs expression levels of which are lower in G-allele expressing cells (Fig. 10J(d)). If the let-7 bioactivity model is valid for the snpRNA-mediated effects on miRNAs, then let-7b expression and activity should be higher in A-allele expressing cells. As shown in Figure 10J(d), consistent with this, Q-RT-PCR experiments and luciferase reporter assays showed that both expression and activity of the let-7 miRNA are significantly increased in RWPE1 cells stably expressing the A-allele of rs2670660. Similar relationships between snpRNA allele-context-specific mfe changes and effects on miRNA expression and activity were demonstrated for the miR-205 microRNA (Fig. 10J(d), bottom panels). These data suggest that the snpRNAs regulate miRNA abundance and activity in an allele-specific manner by interfering with miRNA release from complexes with Argonaute proteins and preventing subsequent degradation of the miRNA.
[239] A survey of the mRNA targets of the rs2670660-encoded RNAs indicated that rs2670660-associated GES are enriched for genes with an established role in controlling the transition from pluripotency to a differentiated state during development. For example, rs2670660-associated GES are enriched for genes of loci containing bivalent chromatin domains and PluriNet network genes (Figure 11A, Table 12). Microarray analysis revealed that expression of rs2670660-encoded RNAs triggers concomitant allele-specific activation of the Polycomb pathway genes (PcG) comprising the Polycomb repressive complex 2 (PRC2). The PRC2 complex catalyzes histone H3 lysine 27 trimethylation (H3K27me3), induces a chromatin silencing state, and mediates transcriptional repression (Figure 11B).
Table 10: Correlation matrix of the rs2670660 allele-specific effects on expression of 155 PluriNet transcripts
Pearson        G allele    A allele    AS G allele    AS A allele
G allele       1           0.2949      0.0026         <0.0001
A allele       0.3148      1           <0.0001        0.2215
AS G allele    0.6495      0.961       1              0.0232
AS A allele    0.8012      0.364       0.5177         1
[240] The table below shows the genes whose expression was regulated by all 4 alleles at a statistical significance of p < 0.05. The log-transformed expression values are shown. Positive numbers indicate increased expression, negative numbers indicate decreased expression. Also shown is the primer probe set used in the microarray analysis for each gene.
Figure imgf000061_0001
Figure imgf000062_0001
Figure imgf000063_0001
Correlation matrix for the 140-gene signature
               G allele       A allele       AS A           AS G
G allele       1              < 0.0001       < 0.0001       < 0.0001
A allele       0.851355013    1              < 0.0001       < 0.0001
AS A           0.905274554    0.919669399    1              < 0.0001
AS G           0.891048722    0.94446759     0.943972803    1
1.7 CLINICAL RELEVANCE OF ALLELE-SPECIFIC EFFECTS ON GENE TRANSCRIPTION BY RS2670660-ENCODED TRANS-REGULATORY RNAS
[241] The microarray gene expression profiling results discussed above were expanded to analyze the effects of the expression of the rs2670660-encoded RNAs in other cell types and experimental systems as detailed in the table below. In each of these experimental systems, there was statistically significant evidence of the activation of rs2670660-associated gene expression signatures. The table below shows the spectrum of common human diseases and types of clinical samples analyzed by microarray gene expression profiling.
Table 12: Patient samples analyzed by microarray gene expression profiling. Abbreviations: PBMC, peripheral blood mononuclear cells. List of GEO accession numbers and original references for microarray analyses and associated clinical information can be found in references listed in Materials and Methods.
Figure imgf000064_0001
Figure imgf000065_0001
[242] The following tables show the total numbers of genes differentially expressed in clinical samples of diseased tissues compared to matched healthy tissues and concordance with the set of genes differentially regulated by the G-allele RNA of rs2670660. As shown in the tables, a statistically significant subset of genes regulated by the G-allele RNA of rs2670660 is also differentially regulated in various diseased tissues.
Table 13: rs2670660-associated Crohn's disease (CD) gene expression signatures
Figure imgf000065_0002
Table 14: rs2670660-associated rheumatoid arthritis (RA) gene expression signatures
Figure imgf000065_0003
Figure imgf000065_0004
Table 16: rs2670660-associated autism gene expression signatures
Figure imgf000065_0005
Autism                TOTAL 664
Common transcripts    79             7          24          15          33
P value               4.49191E-09    0.14825    0.001092    0.003585    3.44537E-06
Table 17: rs2670660-associated metastatic prostate cancer (PC METS) gene expression signatures
Figure imgf000066_0001
Table 18: rs2670660-associated Alzheimer's (ALZH) gene expression signatures
Figure imgf000066_0002
Table 19: rs2670660-associated obesity (OB) gene expression signatures
Figure imgf000066_0003
Table 20: Expression signatures of hESC bivalent domain genes (BDG) in rs2670660 G-allele-associated gene expression models of human diseases
Figure imgf000066_0004
Figure imgf000067_0001
[243] It has been reported that the activated state of the innate immunity/inflammasome pathways in patients with Crohn's disease and rheumatoid arthritis is associated with altered expression of the NLRP1, NLRP3, HMGA1, and Myb genes, which is reflected in altered NLRP3/NLRP1 and HMGA1/Myb mRNA expression ratios. Clinical samples from patients diagnosed with a broad spectrum of disorders associated with activation of these pathways were analyzed for expression of the genes identified in the global gene expression profiles of cells expressing the A- and G-allele RNAs of rs2670660. The set of genes whose expression is altered in cells expressing SNP-associated small RNA molecules is referred to herein as a gene expression signature ("GES"). Thus, the sets of genes whose expression was altered in cells expressing the small RNAs of rs2670660 are referred to as rs2670660-associated allele-specific GES. Specifically, there are four rs2670660-associated allele-specific GES, namely, the signatures of the A-allele, the G-allele, the antisense-A allele, and the antisense-G allele.
[244] Patient samples of peripheral blood mononuclear cells (PBMC) and diseased tissues were analyzed for the rs2670660-associated allele-specific GES by microarray gene expression analysis. rs2670660-associated allele-specific GES were detected with a level of statistical significance that markedly exceeded the probability of random co-occurrence by chance alone in clinical samples from patients diagnosed with Crohn's disease, rheumatoid arthritis, Huntington's disease, and Alzheimer's disease (Fig. 12). GES associated with the expression of the G-allele-specific 52 nt small RNAs in BJ1 cells were identified in clinical samples using t-statistics and screened for concordant and discordant features in corresponding clinical settings to segregate G-allele concordant and G-allele discordant signatures. The assessment of rs2670660-associated allele-specific GES in these clinical samples indicates that the GES are detectable in about 80-100% of samples from patients diagnosed with one of several common diseases manifested by activation of the innate immunity/inflammasome pathways. These data indicate that assays for rs2670660-associated GES may be useful diagnostic and prognostic tools for diseases and disorders characterized by activation of these pathways.
[245] The ability of GES associated with the expression of rs2670660-encoded small RNAs to discriminate normal and pathological tissue samples was further validated in a set of patients with Alzheimer's disease, prostate cancer, and breast cancer (Fig. 13). The set of genes whose expression was differentially regulated by ectopic expression of the rs2670660 G-allele RNA was identified in BJ1 cells using t-statistics. This set of genes was then screened for concordant and discordant expression in clinical samples and matched controls (see Table 13, supra). Expression profiles of G-allele concordant and G-allele discordant signatures in individual samples of each data set were evaluated by calculating Pearson correlation coefficients (signature scores) using the log10-transformed fold expression changes of G-allele-specific GES in BJ1 cells as a multidimensional standard vector.
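The signature score described in this paragraph, a Pearson correlation between a sample's expression changes and the log10-transformed reference vector from BJ1 cells, can be sketched as follows. This is an illustrative sketch only: the function names, the dictionary-based data layout, and all gene names and values are hypothetical, and fold changes are assumed to be positive ratios so that the log10 transform is defined.

```python
from math import log10, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def signature_score(sample_fold, reference_fold):
    """Correlate log10 fold changes of a sample with the reference vector."""
    genes = sorted(sample_fold.keys() & reference_fold.keys())
    sample_vector = [log10(sample_fold[g]) for g in genes]
    reference_vector = [log10(reference_fold[g]) for g in genes]
    return pearson(sample_vector, reference_vector)


# Placeholder fold changes (NOT data from the source).
reference = {"g1": 4.0, "g2": 0.25, "g3": 2.0, "g4": 0.5}    # reference cell line
sample_one = {"g1": 2.0, "g2": 0.5, "g3": 1.5, "g4": 0.8}    # changes in the same direction
sample_two = {"g1": 0.5, "g2": 2.0, "g3": 0.8, "g4": 1.5}    # changes in the opposite direction

score_one = signature_score(sample_one, reference)
score_two = signature_score(sample_two, reference)
print(f"sample one: {score_one:+.2f}, sample two: {score_two:+.2f}")
```

Each sample yields one score per signature, so concordant and discordant gene sets would be scored separately by passing the corresponding subset of genes.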
[246] Figure 13A shows the expression profiles of G-allele concordant (left panel) and discordant (right panel) genes in hippocampal tissue from Alzheimer's patients and normal subjects. Each bar represents the G-allele-specific GES for a particular subject calculated as described above. In each panel, the group of 9 bars on the far left shows the GES from tissue in each of 9 control subjects. The next three groups of bars in each panel represent the GES of tissue from Alzheimer's patients segregated based on the clinically-defined severity of the disease, left to right: incipient (7 subjects), moderate (8 subjects), and severe (7 subjects), for a total of 22 subjects. The data show distinct expression profiles in the tissues from Alzheimer's patients versus controls, indicating that these GES can differentiate between normal and diseased tissue with high statistical significance.
[247] Figure 13B shows the expression profiles of G-allele concordant (left panel) and discordant (right panel) genes in normal and prostate cancer tissues. Each bar represents the G-allele-specific GES for a particular subject calculated as described above. In each panel, the group of 18 bars on the far left shows the GES from normal prostate tissue in each of 18 control subjects. The next three groups of bars in each panel represent the GES of prostate cancer tissues segregated based on histological examination (left to right): morphologically normal prostate tissues adjacent to tumor (62 samples); primary prostate tumors (64); metastatic prostate tumors in distant organs (25). The data show distinct expression profiles, particularly for the metastatic tumors, compared to controls and morphologically normal tissues adjacent to tumor tissue. These data demonstrate that the G-allele GES, segregated into concordant and discordant expression groups, can differentiate between normal and metastatic tumor tissue with high statistical significance.
[248] Figure 13C shows the expression profiles of G-allele concordant (left panel) and discordant (right panel) genes in normal and breast cancer tissues. Each bar represents the G-allele-specific GES for a particular subject calculated as described above. In each panel, the group of 8 bars on the far left shows the GES from normal breast tissue. The next five groups of bars in each panel represent the GES of breast cancer tissues segregated based on histological examination as follows (left to right): morphologically normal breast tissues adjacent to tumor (8 samples); primary breast tumors from patients without metastatic disease; primary breast tumors from patients with metastatic disease (99 total for primary tumors); lymph nodes from patients with metastatic disease (26); metastatic breast tumors in distant organs (12). The data show distinct expression profiles, particularly for the metastatic tumors, compared to controls and morphologically normal tissues adjacent to tumor tissue. These data demonstrate that the G-allele GES, segregated into concordant and discordant expression groups, can differentiate between normal and metastatic tumor tissue with high statistical significance.
[249] The above data show the ability of the gene expression signatures of the G-allele RNA to discriminate between diseased and normal tissues in Crohn's disease, rheumatoid arthritis, Huntington's disease, Alzheimer's disease, breast cancer, and prostate cancers (Figs. 12, 13, Table 12). Several GES were also identified, using the same protocols as described above, to discriminate between autistic and control subjects using gene expression from lymphoblastoid cells (Table 12, Fig. 14A). A 36-gene signature was particularly useful in discriminating between autistic and control subjects. In addition, a 133-gene G-allele concordant signature was identified using preadipocytes from lean and obese subjects that was able to effectively discriminate between these two groups (Table 12, Fig. 14B). A further 112-gene G-allele discordant signature was also identified that could distinguish obese from lean subjects (Fig. 14C).
[250] The data presented in Figures 12-14 indicate that the activated states of the innate immunity/inflammasome pathways (as evidenced by rs2670660-associated GES, see Figs. 8, 9, 11) are readily detectable in pathology-affected tissues of patients with Crohn's disease, rheumatoid arthritis, Huntington's disease, Alzheimer's disease, breast cancer, prostate cancer, autism, and obesity. Accordingly, the rs2670660-associated GES identified here provide useful research and diagnostic tools for studying and detecting these disease states in tissue from human subjects.
[251] The data presented here demonstrate that intergenic small regulatory RNAs represent a prevalent class of transcripts containing SNP variants associated with common human disorders (Fig. 15A, Tables 21, 22). The data also show that these small RNAs display cell-type specific patterns of expression in human cells (Fig. 1; Fig. 15B, C). This is in contrast to the expression of long non-coding RNAs containing the small RNAs described here. As shown in Figures 15B and 15C, the long non-coding RNAs are expressed nearly ubiquitously among cells of mesenchymal (BJ1), lymphoid (U937), and epithelial (RWPE1) origin. This suggests a model of cell type-specific biogenesis of these small non-coding RNA molecules based on differentiation-associated processing of the long non-coding RNAs.
[252] In summary, the data presented here indicate a role for these small non-coding RNAs transcribed from disease-linked SNPs (such as the rs2670660-encoded RNAs) in epigenetic reprogramming during development, clonal specialization, and differentiation, as well as during disease progression.
Table 21: Small non-coding RNAs and associated long non-coding RNAs containing SNP sequences expressed in human cells. Molecular identities of listed non-coding small RNAs were validated by sequencing of the purified PCR products.
Figure imgf000070_0001
rs7020996; rs10490072; rs1153188;
rs13071168; rs358806; rs7659604
Ulcerative colitis 1 (0) rs660895
Vitiligo 3 (3) rs2670660; rs2733359; rs8182354
Total 87 (62)
Table 22: Classification of SNPs associated with common human disorders.
Figure imgf000071_0001
Figure imgf000072_0001
Figure imgf000073_0001
Figure imgf000074_0001
Figure imgf000075_0001
1.8 MATERIALS AND METHODS
Disease associated SNP meta-analysis and mapping of genomic coordinates
[253] Primary data sets of SNPs for meta-analysis of genomic coordinates of SNP variations identified in genome-wide association studies (GWAS) of up to 712,253 samples comprising 221,158 disease cases, 322,862 controls, and 168,233 case/control subjects of obesity GWAS were obtained from the following previously published studies:
Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007 447: 661-678.
Tenesa A, Farrington SM, Prendergast JG, et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21. Nat Genet 2008 40: 631-7.
Haiman CA et al., A common genetic risk factor for colorectal and prostate cancer. Nat Genet 2007 39: 954-6.
Zeggini E et al., Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet 2008 40: 638-645.
Barton A. et al., Re-evaluation of putative rheumatoid arthritis susceptibility genes in the post-genome wide association study era and hypothesis of a key pathway underlying susceptibility. Hum Mol Genet. 2008 Apr 22.
Remmers EF et al., STAT4 and the risk of rheumatoid arthritis and systemic lupus erythematosus. N Engl J Med. 2007 357: 977-986.
Plenge RM et al., Two independent alleles at 6q23 associated with risk of rheumatoid arthritis. Nat Genet 2007 39: 1477-1482.
Thomson W et al., Wellcome Trust Case Control Consortium, Wilson AG, Marinou I, Morgan A, Emery P et al., Rheumatoid arthritis association at 6q23. Nat Genet. 2007 39: 1431-1433.
Wellcome Trust Case Control Consortium; Australo-Anglo-American Spondylitis Consortium (TASC), Burton PR et al., Association scan of 14,500 nonsynonymous SNPs in four diseases identifies autoimmunity variants. Nat Genet 2007 39: 1329-1337.
International Consortium for Systemic Lupus Erythematosus Genetics (SLEGEN), Harley JB et al., Genome-wide association scan in women with systemic lupus erythematosus identifies susceptibility variants in ITGAM, PXK, KIAA1542 and other loci. Nat Genet 2008 40: 204-210.
Nath SK et al., A nonsynonymous functional variant in integrin-alpha(M) (encoded by ITGAM) is associated with systemic lupus erythematosus. Nat Genet 2008 40: 152-154.
Kozyrev SV et al., Functional variants in the B-cell gene BANK1 are associated with systemic lupus erythematosus. Nat Genet 2008 40:211-216.
Hom G, et al., Association of systemic lupus erythematosus with C8orf13-BLK and ITGAM-ITGAX. N Engl J Med. 2008 358: 900-909.
Zheng SL, et al., Cumulative association of five genetic variants with prostate cancer. N Engl J Med 2008 358: 910-919.
Gudmundsson J, et al., Common sequence variants on 2p15 and Xp11.22 confer susceptibility to prostate cancer. Nat Genet 2008 40: 281-283.
Jin Y, et al., NALP1 in vitiligo-associated multiple autoimmune disease. N Engl J Med 2007 356:1216-1225.
Fisher SA, et al., Genetic determinants of ulcerative colitis include the ECM1 locus and five loci implicated in Crohn's disease. Nat Genet 2008 40:710-712.
Cox A, et al., A common coding variant in CASP8 is associated with breast cancer risk. Nat Genet 2007; 39:352-8.
Easton DF, et al., Genome-wide association study identifies novel breast cancer susceptibility loci. Nature 2007; 447:1087-93.
Hunter DJ, et al., A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nat Genet 2007; 39:870-4.
Stacey SN et al., Common variants on chromosomes 2q35 and 16q12 confer susceptibility to estrogen receptor-positive breast cancer. Nat Genet 2007; 39:865-9.
Tomlinson IP et al., A genome-wide association study identifies colorectal cancer susceptibility loci on chromosomes 10p14 and 8q23.3. Nat Genet 2008 40: 623-30.
Jaeger E et al., Common genetic variants at the CRAC1 (HMPS) locus on chromosome 15q13.3 influence colorectal cancer risk. Nat Genet. 2008 40: 26-8.
Broderick P, et al., A genome-wide association study shows that common alleles of SMAD7 influence colorectal cancer risk. Nat Genet 2007 39: 1315-7.
Tomlinson I, et al., A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nat Genet. 2007 39: 984-8.
Gruber SB, et al., Genetic Variation in 8q24 Associated with Risk of Colorectal Cancer. Cancer Biol Ther. 2007 6
[254] Mapping of the SNP genomic coordinates was performed based on the NCBI release of Human Genome Build 36.3 (reference assembly). Genomic coordinates of the human K4-K36 domains and human lincRNAs are publicly available in the online Supplemental data set of Khalil AM et al., Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc Natl Acad Sci U S A. 2009 Jul 1.
[255] Genomic coordinates and gene names of the human bivalent domain genes were obtained from the recently published study, Ku, M. et al., Genomewide analysis of PRC1 and PRC2 occupancy identifies two classes of bivalent domains. PLoS Genet. 2008; 4: e1000242.
Cell Lines
[256] Human BJ1, U937, and THP-1 cell lines were obtained from ATCC. hTERT-immortalized BJ1 cells were previously described in Holt SE et al., Resistance to apoptosis in human cells conferred by telomerase function and telomere stability. Mol Carcinog. 1999; 25: 241-8.
Microarray gene expression analysis
[257] Sense and anti-sense variants of the 52 nt rs2670660 sequence were chemically synthesized, cloned into GFP-expressing lentiviral vectors, and transfected into BJ1 cells. Corresponding BJ1 cell line variants were isolated by sterile FACS sorting to contain >90% of GFP-expressing cells, expanded in vitro in monolayer cultures, and analyzed for gene expression.
[258] Technical and analytical aspects as well as stringent QC and statistical protocols for gene expression analysis experiments are essentially as described in the following published works:
Glinsky, GV et al., Microarray analysis identifies a death-from-cancer signature predicting therapy failure in patients with multiple types of cancer. J Clin Invest; 2005; 115: 1503-1521.
Glinsky GV et al., Classification of human breast cancer using gene expression profiling as a component of the survival predictor algorithm. Clin Cancer Res. 2004 10: 2272-2283.
Glinsky GV et al., Gene expression profiling predicts clinical outcome of prostate cancer. J Clin Invest. 2004 113: 913-923.
Glinsky GV, et al., Microarray analysis of xenograft-derived cancer cell lines representing multiple experimental models of human prostate cancer. Mol Carcinog. 2003 37: 209-221.
[259] Briefly, the array hybridization and processing, data retrieval and analysis were carried out using standard sets of the Affymetrix equipment, software, and protocols in a state-of-the-art Affymetrix microarray core facility. RNA was extracted from cell cultures of two independent biological replicates for each experimental condition and analyzed for sample purity and integrity using a BioAnalyzer (Agilent). Expression analysis of 54,675 transcripts was carried out for each sample in duplicate using Affymetrix HG-U133A Plus 2.0 arrays. Data retrieval and analysis were performed using MAS5.0 software, and concordant changes of gene expression for each experimental condition were determined at the statistical threshold p value < 0.05 (two-tailed T-test).
microRNA isolation and activity analysis
[260] miRNA was extracted from adherent cells lysed on culture plates using the mirVana miRNA Isolation kit (Ambion). Homogenized cell lysates were frozen at -80 °C for at least 24 hours prior to miRNA purification. miRNA concentration was checked using a NanoDrop (Thermo Scientific) before checking quality on a Bioanalyzer (Agilent Technologies).
[261] To assay the activity of microRNAs in transfected cells we used a miRNA Luciferase Reporter Vector (Signosis) specific for the microRNA of interest. The target site sequence of the reporter vector is complementary to the miRNA; therefore, a decrease in luciferase signal would indicate an increase in microRNA activity. Cells were transfected with the reporter vector using FuGENE 6 Transfection Reagent (Roche); the transfection was allowed to run 48 hours before the cells were lysed using Luciferase Cell Culture Lysis Reagent (Promega). The lysates were read using the FLUOstar OPTIMA system (BMG Lab Technologies), with 20 microliters of Luciferase Assay Reagent (Promega) injected into each well immediately prior to reading.
miRNA expression analysis
[262] To analyze a spectrum of miRNA activity in the infected cell lines, we performed qPCR using the TaqMan Human MicroRNA Array v1.0 (Applied Biosystems) run on the 7900HT Fast Real-Time PCR System, fitted with the specific block to run 384-well TaqMan Low Density Arrays (Applied Biosystems). This TaqMan array is distributed on a microfluidics card, which allows for high reproducibility with minimal error. The array contains 365 different human miRNA assays and two small nucleolar RNAs that function as endogenous controls for data normalization. All miRNA samples were analyzed for quality control and processed at the Functional Genomics Core of the University of Rochester in Rochester, New York. We used the SDS 2.2 software, the platform for the computer interface with the 7900HT PCR System, to generate normalized data, compare samples, and calculate RQ.
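As an illustration of how relative quantification (RQ) values of this kind are commonly derived, the following minimal sketch applies the comparative Ct calculation, RQ = 2^-(ddCt). It is an assumption that the RQ values reported by the SDS software correspond to this calculation; the function name relative_quantity and all Ct values shown are hypothetical and are not data from these experiments.

```python
def relative_quantity(ct_target_sample, ct_control_sample,
                      ct_target_calibrator, ct_control_calibrator):
    """Return RQ = 2 ** -(ddCt) for one miRNA assay.

    ct_control_* are Ct values of an endogenous control (for example,
    one of the small nucleolar RNAs on the array). The calibrator is
    the reference sample, such as GFP-only control cells.
    """
    # Normalize the target miRNA to the endogenous control in each sample.
    d_ct_sample = ct_target_sample - ct_control_sample
    d_ct_calibrator = ct_target_calibrator - ct_control_calibrator
    # Compare the test sample with the calibrator sample.
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values, for illustration only.
rq = relative_quantity(24.0, 20.0, 26.0, 20.0)
print(rq)
```

In this convention an RQ above 1 indicates higher expression of the miRNA in the test sample than in the calibrator, and an RQ below 1 indicates lower expression.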
Cell Staining and Flow Cytometry
[263] Cells were stained at a concentration of 1 x 10^6 cells per 100 microliters (ul) of HEPES buffered saline (HBSS) with 2% HICS. Antibodies at appropriate dilutions (CD14-Pacific Blue, Biolegend, Inc; and CD11b-Alexa Fluor® 647, Biolegend, Inc) were added. Staining was carried out for 30 min with rotation at 4°C. Cells were then washed with staining medium three times and resuspended in staining medium. The stained specimens were then analyzed using FACSVantage (BD Biosciences, San Diego, CA; http://www.bdbiosciences.com) or FACSAria with either Diva or CellQuest software (BD Biosciences). The cell counter of the flow cytometers was used to determine cell numbers. Cells were collected into HBSS with 2% HICS.
Induced Differentiation of U937 and THP-1 cells
[264] Approximately 2 x 10^6 U937 or THP-1 cells (5 x 10^5 cells/ml) in a 25 cm2 flask were induced to differentiate by treatment with 20 uM PMA (Sigma-Aldrich) for 4 days.
Lentivirus Production and Generation of stably transfected BJ1, U937, and THP-1 cells
[265] Allele-specific sense and anti-sense variants of the 52 nucleotide rs2670660 sequence, SEQ ID NO:_ (5' CACAA GTGAT CTACC AGTCT TTTAA A(G/A)TTC TATTA TTAAA ACCCA AACAT GC 3') were chemically synthesized and cloned sequentially into pUC57 plasmid by EcoRV (GeneScript Corporation) and pCDH-CMV-MCS-EF1-copGFP plasmid by EcoRI and NotI (SystemsBio). The integrity and molecular identity of the synthetic sequences as well as designed plasmid vectors were monitored by restriction enzyme mapping analysis and direct sequencing. Lentiviruses were generated by co-transfecting pLentiviral vector with GFP only plasmids (control cultures) or GFP plasmids with synthetic, allele-specific 52 nt sequences of the SNP rs2670660 and packaging mix (Invitrogen) into 293FT cells using Lipofectamine 2000 according to the manufacturer's instructions (Invitrogen), and then BJ1, U937, or THP-1 cells were infected with viral supernatant for 24 hr. Flow cytometry analysis for GFP expression was performed to confirm the infection and assess the transfection efficiency. Experiments were carried out using cultures with transfection efficiency > 90%.
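The four synthetic inserts implied by the preceding paragraph (G-allele and A-allele, each as sense and anti-sense) can be written out from the 52 nucleotide sequence given there. The sketch below is illustrative only: it assumes that the anti-sense variant is the reverse complement of the sense strand, which the text does not state explicitly, and the helper names allele_variant and reverse_complement are hypothetical.

```python
# 52-nt rs2670660 sequence as written above, with the SNP position marked.
TEMPLATE = ("CACAA GTGAT CTACC AGTCT TTTAA A(G/A)TTC TATTA "
            "TTAAA ACCCA AACAT GC").replace(" ", "")

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def allele_variant(template, allele):
    """Substitute the chosen allele for the '(G/A)' placeholder."""
    if allele not in ("G", "A"):
        raise ValueError("allele must be 'G' or 'A'")
    return template.replace("(G/A)", allele)


def reverse_complement(seq):
    """Return the reverse-complement DNA strand (assumed anti-sense form)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


for allele in ("G", "A"):
    sense = allele_variant(TEMPLATE, allele)
    antisense = reverse_complement(sense)
    # Print the allele, the insert length, and both strands (5' to 3').
    print(allele, len(sense), sense, antisense)
```

Each sense variant produced this way is 52 nucleotides long, with the polymorphic base at position 27 counting from the 5' end.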
Colony Growth Assay
[266] Sense and anti-sense variants of the 52 nt snpRNA were synthesized, cloned into GFP-lentiviral vectors, and transfected into BJ1 cells. GFP-expressing cells were isolated by flow cytometry and enriched populations (> 90% GFP positive) were used for assays. Cells from sub-confluent cultures (about 70% confluence) were seeded in triplicates into 6-well plates (100 cells per well), cultured for 2 weeks, and then stained with 0.1% crystal violet for 5 min. Plates were scanned and the number of colonies containing > 50 cells was counted.
Protocols for identification of endogenous trans-regulatory small RNAs encoded by the SNP rs2670660
1. Extract small RNA from cells (mirVana™ miRNA Isolation Kit from Ambion, Inc., according to manufacturer's directions)
2. Check for DNA contamination by performing PCR using the extracted RNA as template and beta-actin primers
3. Synthesize cDNA from small RNA using standard protocols
4. Perform first PCR using primer set 2 (GC2F and GC2R): In a clean tube on ice, combine PCR reagents to a 25 ul final volume: Water, RNase-free; PCR Buffer (10X) 2.5 ul; PCR Nucleotide Mix (10 mM) 0.5 ul; Taq DNA polymerase (50X) 0.5 ul; template; Forward primer (10 uM) 1 ul (0.4 uM final conc); Reverse primer (10 uM) 1 ul (0.4 uM final conc). Thermal cycle profile: 95 °C 3 min followed by 40 or more cycles: 95 °C 30 s, 55 °C 30 s, 72 °C 1 min (or 1-2 min per kilobase); followed by final extension 72 °C 3 min and hold at 4 °C.
5. Clean up PCR product and evaluate cleanup PCR product on 1.2% gel (Montage PCR Centrifugal Filter Devices available from Millipore, Inc., according to manufacturer's instructions)
6. Perform nested PCR using cleanup of the first PCR product as template and primer set 1 (GC1F and GC1R) and evaluate nested PCR product on 1.2% gel (protocol as per no. 4, supra)
7. Cut the DNA band of interest from the gel, extract and purify the DNA for further sequencing analysis (QIAquick Gel Extraction Kit, Qiagen, Inc., according to manufacturer's instructions)
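The reagent volumes in step 4 follow the usual dilution rule C1 x V1 = C2 x V2. The sketch below, offered only as a worked check, reproduces the stated 0.4 uM final primer concentration and computes the volume of RNase-free water needed to reach 25 ul. The template volume of 2 ul is an assumption made for illustration, since the protocol does not state it, and the function names are hypothetical.

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Dilution rule C1 * V1 = C2 * V2, solved for C2."""
    return stock_conc * stock_volume_ul / total_volume_ul


def water_volume(total_volume_ul, component_volumes_ul):
    """Volume of RNase-free water needed to reach the final volume."""
    water = total_volume_ul - sum(component_volumes_ul.values())
    if water < 0:
        raise ValueError("components exceed the total reaction volume")
    return water


TOTAL_UL = 25.0
components = {
    "PCR buffer (10X)": 2.5,
    "nucleotide mix (10 mM)": 0.5,
    "Taq DNA polymerase": 0.5,
    "forward primer (10 uM)": 1.0,
    "reverse primer (10 uM)": 1.0,
    "template": 2.0,  # assumed; the protocol does not state this volume
}

print(final_concentration(10.0, 1.0, TOTAL_UL))  # each primer, in uM
print(final_concentration(10.0, 0.5, TOTAL_UL))  # nucleotide mix, in mM
print(final_concentration(10.0, 2.5, TOTAL_UL))  # buffer, in X
print(water_volume(TOTAL_UL, components))        # water, in ul
```

With a different template volume, only the water volume changes; the final primer, nucleotide, and buffer concentrations depend solely on the 25 ul total.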
Statistical and Bioinformatics Analysis
[267] Detailed protocols for data analysis and documentation of the sensitivity, reproducibility, and other aspects of the quantitative statistical microarray analysis using Affymetrix technology have been described in:
Stack JH et al., IL-converting enzyme/caspase-1 inhibitor VX-765 blocks the hypersensitive response to an inflammatory stimulus in monocytes from familial cold autoinflammatory syndrome patients. J Immunol 2005;175:2630-4.
Holt SE et al., Resistance to apoptosis in human cells conferred by telomerase function and telomere stability. Mol Carcinog. 1999; 25: 241-8.
Glinsky, GV et al., Microarray analysis identifies a death-from-cancer signature predicting therapy failure in patients with multiple types of cancer. J Clin Invest; 2005; 115: 1503-1521.
Glinsky GV et al., Classification of human breast cancer using gene expression profiling as a component of the survival predictor algorithm. Clin Cancer Res. 2004 10: 2272-2283.
Glinsky GV et al., Gene expression profiling predicts clinical outcome of prostate cancer. J Clin Invest. 2004 113: 913-923.
Glinsky GV, et al., Microarray analysis of xenograft-derived cancer cell lines representing multiple experimental models of human prostate cancer. Mol Carcinog. 2003 37: 209-221.
[268] Briefly, forty to sixty percent of the surveyed genes were called present by Affymetrix Microarray Suite version 5.0 software in these experiments. The concordance analysis of differential gene expression across the data sets was performed using Affymetrix MicroDB version 3.0 and DMT version 3.0 software as described in the references above. The microarray data was processed using the Affymetrix Microarray Suite version 5.0 software and statistical analysis of the expression data set was performed using the Affymetrix MicroDB and Affymetrix DMT software. The Pearson correlation coefficient for individual test samples and the appropriate reference standard was determined using GraphPad Prism version 4.00 software (GraphPad Software). The significance of the overlap between the lists of differentially-regulated genes was calculated by using the hypergeometric distribution test (See Seila, A.C. et al. Divergent transcription from active promoters, Science (2008) 322:1849-51).
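The two statistics named in the preceding paragraph can be sketched in a few lines. The code below is not the GraphPad Prism or Affymetrix implementation; it is a minimal illustration, using only the Python standard library, of a one-sided hypergeometric test for gene-list overlap and of the Pearson correlation coefficient between a sample profile and a reference profile. The function names and all input numbers are hypothetical.

```python
from math import comb, sqrt


def hypergeom_overlap_pvalue(population, list_a, list_b, overlap):
    """P(X >= overlap) when list_b genes are drawn without replacement
    from a population that contains list_a 'marked' genes."""
    upper = min(list_a, list_b)
    total = comb(population, list_b)
    tail = sum(comb(list_a, k) * comb(population - list_a, list_b - k)
               for k in range(overlap, upper + 1))
    return tail / total


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical example: two lists of 200 and 150 differentially regulated
# genes, drawn from 20,000 surveyed genes, share 30 genes.
print(hypergeom_overlap_pvalue(20000, 200, 150, 30))

# Hypothetical log-ratio profiles for a test sample and a reference standard.
print(pearson_r([1.2, -0.8, 0.5, 2.0], [1.0, -1.1, 0.3, 1.7]))
```

A small p-value from the first function indicates that the two gene lists overlap more than expected by chance; a correlation coefficient near +1 or -1 from the second indicates that a sample's profile tracks, or runs opposite to, the reference signature.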
Expression profiling data included 697 clinical samples obtained from 185 control subjects and 350 patients diagnosed with 9 common human disorders including Crohn's disease (59 patients), ulcerative colitis (26 patients), rheumatoid arthritis (20 patients), Huntington's disease (17 patients), autism (15 patients), Alzheimer's disease (36 patients), obesity (14 subjects), prostate cancer (64 patients), and breast cancers (99 patients). Microarray data and associated clinical information are publicly available in the Gene Expression Omnibus (GEO) database maintained by the National Center for Biotechnology Information using the following GEO accession numbers: GDS2601; GDS810; GDS2824; GDS1615; GDS711; GDS1480; GDS2545; GDS1331; GDS1407; GDS3203; GDS2255. Genomic information related to the PluriNet network genes is publicly available from the Stem Cell Mesa microarray data server and also from Stem Cell Matrix.
EQUIVALENTS
[269] Those skilled in the art will recognize or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the following claims.
[270] All references cited herein are incorporated herein by reference in their entirety and for all purposes to the same extent as if each individual publication or patent or patent application was specifically and individually indicated to be incorporated by reference in its entirety for all purposes.
[271] The present invention is not to be limited in scope by the specific embodiments described herein. Indeed, various modifications of the invention in addition to those described herein will become apparent to those skilled in the art from the foregoing description and accompanying figures. Such modifications are intended to fall within the scope of the appended claims.

Claims

What is claimed is:
1. An isolated small non-coding RNA molecule transcribed from an intergenic region of the human genome, wherein the RNA molecule is less than 300 nucleotides and the intergenic region contains at least one single nucleotide polymorphism (SNP) associated with one or more human diseases or disorders.
2. The RNA molecule of claim 1, wherein the cDNA form of the RNA molecule comprises a sequence selected from the group consisting of SEQ ID NOs: 1-101, 332, and 333.
3. The RNA molecule of claim 1, wherein the SNP is selected from the group consisting of rs2670660, rs6596075, rs6983561, rs16901979, rs13281615, rs10505477, rs10808556, rs6983267, rs7014346, rs7000448, rs1447295, rs2820037, rs889312, rs1937506, rs13387042, rs7716600, rs11249433, and rs3803662.
4. The RNA molecule of claim 3, wherein the cDNA form of the RNA molecule comprises a sequence selected from the group consisting of SEQ ID NOs: 1, 4, 6, 7, 9-18, 39, 88-90, 332, and 333.
5. The RNA molecule of claim 4, wherein the cDNA form of the RNA molecule comprises a sequence selected from the group consisting of SEQ ID NOs: 1, 7, 332, and 333.
6. A vector comprising the cDNA form of the RNA molecule of any one of claims 1-5.
7. A cell comprising the vector of claim 6.
8. A kit comprising, in one or more containers, the vector of claim 6 and instructions for expressing the RNA molecule from the vector.
9. A kit comprising, in one or more containers, the cell of claim 6 and instructions for expressing the RNA molecule in the cell.
10. A kit comprising, in one or more containers, the vector of claim 6 and one or more polynucleotide primers for amplifying the cDNA molecule.
11. The kit of claim 10, wherein the one or more primers comprises a sequence selected from the group consisting of SEQ ID NOs: 102-331.
12. The kit of claim 11, wherein the one or more primers comprises a sequence selected from the group consisting of SEQ ID NOs: 102-161.
13. The kit of claim 10, wherein the one or more primers comprises a sequence selected from the group consisting of SEQ ID NOs: 102, 103, 114, 115, 326, and 327.
14. A method for detecting the small non-coding RNA molecule of any one of claims 1-5 in a sample from a subject, the method comprising the step of detecting the cDNA form of the small non-coding RNA molecule in the sample.
15. The method of claim 14, wherein the cDNA form is detected by a method comprising reverse transcription and polymerase chain reaction (RT-PCR) technology.
16. The method of claim 14, wherein the cDNA form is detected by a method comprising nucleic acid hybridization technology.
17. The method of any one of claims 14-16, further comprising the steps of isolating the small RNA fraction from the sample and converting the RNA into cDNA prior to the step of detecting the cDNA in the sample.
18. The method of claim 14, wherein the method comprises detecting the cDNA form of the RNA molecule of claim 5.
19. A method for evaluating the risk that a human subject will develop a disease or condition associated with a specific allele of an SNP ("the pathological allele") by detecting the presence of an RNA molecule of claim 1 in a sample from the subject, wherein the RNA molecule is transcribed from the pathological allele, and wherein detection of said RNA molecule indicates that the subject has an increased risk for developing the disease or condition and the failure to detect said RNA molecule indicates that the subject has a decreased risk for developing the disease or condition.
20. The method of claim 19, further comprising detecting the expression level of the RNA molecule transcribed from the pathological allele relative to its expression in a population of healthy subjects, wherein an increased or decreased level of expression relative to the population of healthy subjects indicates that the subject has an increased risk for developing the disease or condition.
21. The method of claim 19 or 20, wherein the step of detecting the presence of an RNA molecule transcribed from the pathological allele is performed indirectly, by detecting the expression of one or more genes whose expression is regulated by the RNA molecule.
22. A method for diagnosing a disease or condition associated with a specific allele of an SNP ("the pathological allele") in a human subject, the method comprising detecting the presence of an RNA molecule of claim 1 in a sample from the subject, wherein the RNA molecule is transcribed from the pathological allele, and wherein the disease or condition is positively diagnosed if the RNA molecule is detected in the sample.
23. A method for treating, preventing, or ameliorating a disease or condition associated with a specific allele of an SNP ("the pathological allele") in a subject in need thereof, the method comprising administering one or more therapeutic agents that act to suppress the expression or antagonize the activity of an RNA molecule of claim 1, wherein the RNA molecule is transcribed from the pathological allele.
24. The method of any one of claims 14-23, wherein the subject is human.
25. The method of any one of claims 14-23, wherein the sample is a blood, tissue, or cell sample.
26. The method of any one of claims 19-23, wherein the disease or condition is selected from the group consisting of Crohn's disease, rheumatoid arthritis, Huntington's disease, Alzheimer's disease, breast cancer, prostate cancer, autism, and obesity.
27. An apparatus for evaluating a disease or condition, or evaluating the risk of developing a disease or condition, in a subject, the apparatus comprising a model configured to evaluate a dataset for the subject to thereby evaluate the risk of disease in the subject, wherein the model is based upon determining the similarity in the expression profile of a defined set of genes in a sample from the subject and the expression profile for that set of genes in one or more reference sets of the model, wherein a reference set comprises one or more of a population of healthy subjects and a population of subjects suffering from the disease, wherein the set of genes is a set of genes whose expression is regulated by a small RNA molecule of claim 1.
PCT/US2010/042346 2009-07-17 2010-07-16 SMALL NON-CODING REGULATORY RNAs AND METHODS FOR THEIR USE WO2011009089A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/261,142 US20120316218A1 (en) 2009-07-17 2010-07-16 SMALL NON-CODING REGULARTORY RNA's and METHODS FOR THEIR USE

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US22644809P 2009-07-17 2009-07-17
US61/226,448 2009-07-17
US26355609P 2009-11-23 2009-11-23
US61/263,556 2009-11-23
US26405709P 2009-11-24 2009-11-24
US61/264,057 2009-11-24
US30766610P 2010-02-24 2010-02-24
US61/307,666 2010-02-24

Publications (1)

Publication Number Publication Date
WO2011009089A1 true WO2011009089A1 (en) 2011-01-20

Family

ID=43449841

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2010/042346 WO2011009089A1 (en) 2009-07-17 2010-07-16 SMALL NON-CODING REGULATORY RNAs AND METHODS FOR THEIR USE

Country Status (1)

Country Link
WO (1) WO2011009089A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014014767A1 (en) * 2012-07-18 2014-01-23 Idexx Laboratories, Inc. Boone cardiovirus
CN109750042A (en) * 2019-03-27 2019-05-14 石家庄市第一医院 Systemic loupus erythematosus auxiliary diagnosis marker and its application

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008050356A1 (en) * 2006-10-27 2008-05-02 Decode Genetics Cancer susceptibility variants on chr8q24.21
US20090099789A1 (en) * 2007-09-26 2009-04-16 Stephan Dietrich A Methods and Systems for Genomic Analysis Using Ancestral Data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10800637

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 13261142

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10800637

Country of ref document: EP

Kind code of ref document: A1