WO2008053358A2 - A common gene expression signature in dilated cardiomyopathy - Google Patents

A common gene expression signature in dilated cardiomyopathy

Info

Publication number
WO2008053358A2
WO2008053358A2 PCT/IB2007/004191 IB2007004191W WO2008053358A2 WO 2008053358 A2 WO2008053358 A2 WO 2008053358A2 IB 2007004191 W IB2007004191 W IB 2007004191W WO 2008053358 A2 WO2008053358 A2 WO 2008053358A2
Authority
WO
WIPO (PCT)
Prior art keywords
gene
microarray
nucleic acid
genes
cardiomyopathy
Prior art date
Application number
PCT/IB2007/004191
Other languages
French (fr)
Other versions
WO2008053358A3 (en)
WO2008053358A8 (en)
Inventor
Ruprecht Kuner
Holger Sueltmann
Markus Ruschhaupt
Andreas Buness
Andreas Barth
Michael Nabauer
Annemarie Poustka
Original Assignee
Deutsches Krebsforschungszentrum
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Deutsches Krebsforschungszentrum
Priority to EP07866600A (EP2046997A2)
Priority to JP2009521384A (JP2009544306A)
Publication of WO2008053358A2
Publication of WO2008053358A3
Publication of WO2008053358A8

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/158: Expression markers

Definitions

  • the present invention is drawn to a convenient and highly effective microarray tool and method for diagnosing a cardiomyopathy state, especially dilated cardiomyopathy, in a subject, with a high degree of accuracy.
  • DCM Dilated cardiomyopathy
  • natriuretic peptides are the typical markers for diagnosing and managing heart failure. However, there is significant heterogeneity in expression levels of these natriuretic molecular disease markers that is not explained by left ventricular function alone.
  • Transcriptional signature analysis is a powerful technique for identifying potential molecular targets that could ultimately become important for diagnosis and therapy of heart failure. Yet, differences in platform technologies, experimental design, and the biological heterogeneity associated with the use of human tissue samples are obstacles to the successful comparison and integration of results obtained by different microarray studies in heart failure. In addition, substantial regional variation in gene expression exists in mammalian myocardium, including atrium, ventricle and septum, or left and right side of the heart. See Barth et al., "Functional profiling of human atrial and ventricular gene expression," Pflügers Archiv.
  • the present invention takes into account the various variables that can otherwise confound a cardiomyopathy diagnosis, by integrating independent microarray studies from large numbers of failing and non-failing hearts. Accordingly, the present invention provides a convenient and highly effective tool and method for diagnosing a cardiomyopathic state with accuracy.
  • a general aspect of the present invention is the detection of particular nucleic acid or protein expression levels in a biological sample, which is useful for preparing an expression profile of that sample, wherein the profile is indicative of a healthy or diseased condition associated with that sample.
  • the invention provides expression profiles in various conditions associated or symptomatic of cardiomyopathy. Hence, the invention produces expression profiles that are useful for distinguishing, for instance, between cardiomyopathic tissue and healthy tissue, failing and non-failing heart conditions, and ischemic and non-ischemic cardiomyopathy.
  • an aspect of the invention entails comparing the expression profile from a biological sample from an individual, such as a human patient, with the expression profile of a healthy individual or the expression profile of an individual who does not have the cardiomyopathic condition that the tested individual is believed or suspected of having.
  • nucleic acids each one of which shares sequence identity with either the sense or antisense strand sequence of a particular target gene or target nucleic acid implicated in a particular cardiomyopathic condition, that is useful for determining the expression levels of those target genes in any given sample.
  • each nucleic acid comprises a sequence that is completely or partially complementary to a sequence of the target gene or target nucleic acid.
  • a nucleic acid sequence of the collection is completely or partially complementary to 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 250, 300, 350, 400, 450,
  • a nucleic acid sequence that is completely complementary is one that comprises a sequence that is identical in its complement to the equivalent sequence of the target. That is, the nucleic acid of the collection comprises a sequence that has no nucleotide mismatches compared to the corresponding target sequence.
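The distinction between complete and partial complementarity can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the probe and the target region have equal length, only the four unambiguous DNA bases occur, and the function names and example sequence are hypothetical rather than taken from the application.

```python
# Illustrative sketch only; helper names and the example sequence are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def count_mismatches(probe: str, target: str) -> int:
    """Count mismatched positions between a probe and an equal-length target region."""
    if len(probe) != len(target):
        raise ValueError("probe and target region must have equal length")
    expected = reverse_complement(target)
    return sum(1 for p, e in zip(probe.upper(), expected) if p != e)


def is_completely_complementary(probe: str, target: str) -> bool:
    """True when the probe has no nucleotide mismatches against the target."""
    return count_mismatches(probe, target) == 0


target = "ATGGCCTTA"                        # made-up target fragment
perfect_probe = reverse_complement(target)  # completely complementary
partial_probe = "A" + perfect_probe[1:]     # one mismatch at the first base
```

A partially complementary probe in this sketch is simply one with a non-zero mismatch count; how many mismatches still allow hybridization depends on the assay conditions and is not modelled here.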
  • an array, such as a micro- or nanoarray, that comprises a collection of nucleic acid molecules, wherein each nucleic acid molecule comprises a nucleotide sequence that is complementary to a target sequence of one of the genes listed below from SEQ ID NOs: 1-27.
  • the collection of nucleic acids that are affixed or immobilized on the surface of the array may include nucleic acids that share sequence identity with some or all of these twenty-seven sequences.
  • the array may comprise multiple nucleic acids each of which
  • Synuclein alpha non A4 component of amyloid precursor
  • SNCA non A4 component of amyloid precursor
  • Asporin (SEQ ID NO: 2);
  • Pleckstrin homology-like domain, family A, member 1 ("PHLDA1") (SEQ ID NO: 4);
  • Frizzled-related protein ("FRZB") (SEQ ID NO: 5);
  • MYH6 Myosin heavy chain 6, cardiac muscle, alpha (cardiomyopathy, hypertrophic 1) ("MYH6") (SEQ ID NO: 6);
  • CCL2 Chemokine (C-C motif) ligand 2
  • ODC1 Ornithine decarboxylase 1
  • Retinoic acid receptor responder (tazarotene induced) 1 ("RARRES1") (SEQ ID NO: 9);
  • MYH10 Myosin heavy chain 10, non-muscle ("MYH10") (SEQ ID NO: 12);
  • FCN3 Ficolin (collagen/fibrinogen domain containing) 3 (Hakata antigen) ("FCN3") (SEQ ID NO: 13);
  • S100A8 SEQ ID NO: 14
  • CORIN Corin serine peptidase
  • NPPA Natriuretic peptide precursor A
  • PCOLCE2 Procollagen C-endopeptidase enhancer 2
  • NPPB Natriuretic peptide precursor B (SEQ ID NO: 18);
  • ATF3 Activating transcription factor 3
  • CTGF Connective tissue growth factor
  • G0/G1 switch 2 (SEQ ID NO: 23);
  • KLHL3 Kelch-like 3 (Drosophila)
  • ZBTB16 Zinc finger and BTB domain containing 16
  • AEBP1 AE binding protein 1
  • ETS variant gene 5 (ets-related molecule) (ETV5) (SEQ ID NO: 27).
  • each of the nucleic acid molecules in the collection on the microarray comprises a different sequence as compared to each of the other nucleic acid molecules in the collection.
  • each of the nucleic acid molecules in the collection comprises a complementary sequence to a target gene that is different from the complementary target gene sequence in each of the other nucleic acid molecules in the collection.
  • the nucleic acid molecule is an oligonucleotide or a probe.
  • the oligonucleotide or probe is labeled with a moiety to promote signal detection after the oligonucleotide or probe hybridizes to its corresponding target sequence.
  • the collection comprises nucleic acid molecules that comprise complementary sequences to at least the following sequences: a myosin heavy polypeptide 10 (non-muscle) gene, a synuclein alpha gene, a putative lymphocyte G0/G1 switch gene, an ets variant gene 5, an AE binding protein 1 gene, a kelch-like 3 gene, a zinc finger and BTB domain containing 16 gene, and a procollagen C-endopeptidase enhancer 2 gene.
  • Another aspect of the present invention is directed to a method for diagnosing a cardiomyopathic state in a subject comprising: (i) exposing the microarray of claim 1 to an isolated biological sample, (ii) determining which nucleic acid molecules of the collection hybridize to their corresponding target gene sequences, and (iii) comparing that hybridization pattern to the hybridization pattern of a control for the same genes, wherein a difference between the two patterns is indicative that the individual has a cardiomyopathic disease.
  • the biological sample is blood or a tissue biopsy.
  • the tissue biopsy is a heart muscle biopsy.
  • the cardiomyopathic disease is dilated cardiomyopathy or idiopathic dilated cardiomyopathy.
  • Another aspect of the present invention is directed to an assay for diagnosing a cardiomyopathic state in a subject, comprising: (i) determining the gene expression levels of one or more of the genes or nucleic acids selected from the group consisting of:
  • the gene or nucleic acid is in a biological sample isolated from the subject, and (ii) comparing the expression levels to a biological sample from a healthy individual, wherein a difference in the expression levels indicates that the subject has a cardiomyopathy disease.
  • the cardiomyopathy disease is dilated cardiomyopathy or idiopathic dilated cardiomyopathy.
  • the expression levels that are determined are the levels of either the respective gene transcripts or proteins.
  • the expression levels of one or more, at least about two, at least about three, at least about four, at least about five, at least about six, at least about seven, at least about eight, at least about nine, at least about ten, at least about eleven, at least about twelve, at least about thirteen, at least about fourteen, at least about fifteen, at least about sixteen, at least about seventeen, at least about eighteen, at least about nineteen, at least about twenty, at least about twenty one, at least about twenty two, at least about twenty three, at least about twenty-four, at least about twenty five, at least about twenty six, or about twenty seven genes are determined in the subject's biological sample.
  • the expression levels of all 27 genes are determined in the subject's biological sample and compared to the expression levels of the same 27 genes in the healthy subject's biological sample.
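The comparison of a subject's expression levels with those of a healthy individual can be pictured with a short Python sketch. Everything specific here is an assumption made for illustration: the log2 fold-change measure, the threshold of 1.0 and the expression values are invented placeholders. They are not data or decision rules from the application, and they say nothing about the real direction of regulation of any gene.

```python
import math

# Illustrative sketch only. The expression values are invented placeholders,
# and the log2 fold-change threshold is an assumption, not a rule from the text.


def log2_fold_changes(subject: dict, control: dict) -> dict:
    """log2(subject / control) for every gene measured in both samples."""
    return {
        gene: math.log2(subject[gene] / control[gene])
        for gene in subject
        if gene in control
    }


def differing_genes(subject: dict, control: dict, threshold: float = 1.0) -> dict:
    """Genes whose absolute log2 fold change reaches the chosen threshold."""
    changes = log2_fold_changes(subject, control)
    return {gene: fc for gene, fc in changes.items() if abs(fc) >= threshold}


# Hypothetical intensities for four of the listed gene symbols.
subject_profile = {"NPPA": 800.0, "NPPB": 400.0, "MYH6": 50.0, "CORIN": 110.0}
control_profile = {"NPPA": 100.0, "NPPB": 100.0, "MYH6": 200.0, "CORIN": 100.0}

flagged = differing_genes(subject_profile, control_profile)
```

In practice the choice of measure and cut-off would come from validated reference data, and protein-level measurements would need their own scale.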
  • the biological sample is blood or a tissue biopsy.
  • the tissue biopsy is cardiac muscle.
  • the step of determining gene expression levels is performed by at least one of a method selected from the group consisting of microarray, PCR, RT-PCR, TaqMan RT-PCR, Northern Blot, Western Blot, antibody detection, ELISA, or any combination thereof.
  • the present invention also encompasses the diagnosis of other cardiomyopathies using the compositions and methods described herein.
  • the present invention encompasses the diagnosis or prognosis of ischemic cardiomyopathy (ICM) using the genes, detection methods, and assays disclosed herein.
  • ICM ischemic cardiomyopathy
  • Figure 1 shows a functional analysis based on Gene Ontology for selected gene classes comparing up-regulated genes in DCM (yellow bars) and down-regulated genes in DCM (green bars) of Dataset A (open bars) and B (striped bars). Differences in gene classes marked by an asterisk were statistically significant (Fisher's exact test, p < 0.05) according to "FatiGO" (20).
  • Figure 2 shows a prediction analysis for microarrays ("PAM") classification.
  • PAM prediction analysis for microarrays
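For readers unfamiliar with the test named in the Figure 1 description, the following is a minimal, standard-library sketch of a two-sided Fisher's exact test on a 2x2 table (for example, genes inside or outside a Gene Ontology class against up- or down-regulated). It is not the FatiGO implementation, and the counts used are arbitrary illustrative numbers rather than values from the datasets.

```python
from math import comb

# Illustrative sketch only; not the FatiGO implementation. Counts are arbitrary.


def hypergeom_pmf(a: int, row1: int, row2: int, col1: int) -> float:
    """Probability of 'a' in the top-left cell when all margins are fixed."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    p_observed = hypergeom_pmf(a, row1, row2, col1)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    total = 0.0
    for x in range(lowest, highest + 1):
        p_x = hypergeom_pmf(x, row1, row2, col1)
        if p_x <= p_observed * (1 + 1e-9):  # small tolerance for float ties
            total += p_x
    return min(total, 1.0)


p_value = fisher_exact_two_sided(8, 2, 1, 5)
is_significant = p_value < 0.05
```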
  • In the first step, PAM classification was applied to all four datasets separately. Very low misclassification rates were found in datasets A, B and D for the classification of non-failing (NF) vs. DCM samples, whereas the classification algorithm did not show any power in Dataset C.
  • In the second step, the procedure was repeated with the smallest gene signature obtained from Dataset B, now achieving more than 90% accuracy for classifying DCM and NF samples across all studies, including Dataset C.
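The idea behind this kind of classification can be sketched with a plain nearest-centroid classifier. This is a deliberate simplification: PAM uses nearest shrunken centroids, and the shrinkage step that selects a small gene signature is omitted here. The two-gene expression vectors are toy values invented for the example, not data from Datasets A to D.

```python
from statistics import mean

# Simplified sketch: plain nearest-centroid classification. The shrinkage step
# of PAM is NOT implemented. All expression values are invented toy numbers.


def class_centroids(samples: list, labels: list) -> dict:
    """Mean expression vector for each class label."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, lab in zip(samples, labels) if lab == label]
        centroids[label] = [mean(column) for column in zip(*rows)]
    return centroids


def classify(sample: list, centroids: dict) -> str:
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lab: squared_distance(sample, centroids[lab]))


def misclassification_rate(samples: list, labels: list, centroids: dict) -> float:
    """Fraction of samples whose predicted label differs from the true label."""
    errors = sum(classify(s, centroids) != lab for s, lab in zip(samples, labels))
    return errors / len(samples)


train_samples = [[1.0, 5.0], [1.2, 4.8], [3.0, 2.0], [3.2, 2.2]]
train_labels = ["NF", "NF", "DCM", "DCM"]
centroids = class_centroids(train_samples, train_labels)
```

Evaluating such a classifier on the samples it was trained on overstates its accuracy; a cross-study check like the one described above, where a signature from one dataset is applied to the others, is the more demanding test.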
  • Figure 3 shows mean expression ± S.E.M. of pro-BNP in NF (black bars) and DCM samples (red bars) in datasets A-D. Statistical comparison was carried out by Student's t-test.
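The summary statistics named in the Figure 3 description can be computed as follows. This sketch assumes the classical pooled-variance form of Student's t-test and returns only the t statistic and degrees of freedom, since the Python standard library has no t-distribution from which to derive a p-value. The input values are placeholders, not pro-BNP measurements.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Illustrative sketch only; input values are placeholders, not study data.


def sem(values: list) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / sqrt(len(values))


def student_t(group_a: list, group_b: list) -> tuple:
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t_stat = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t_stat, df


nf_values = [1.0, 2.0, 3.0]    # hypothetical non-failing samples
dcm_values = [4.0, 5.0, 6.0]   # hypothetical DCM samples
t_stat, df = student_t(nf_values, dcm_values)
```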
  • the present invention provides methods and compositions for identifying genetic biomarkers that are useful for classifying and diagnosing cardiomyopathy disease states.
  • the present invention identifies genes and biological processes that are either known to be involved with, or are implicated in, dilated cardiomyopathy (DCM).
  • DCM dilated cardiomyopathy
  • NF non-failing
  • protein is understood to include the terms “polypeptide” and “peptide” (which, at times, may be used interchangeably herein) within its meaning.
  • Recombinant proteins or polypeptides refer to proteins or polypeptides produced by recombinant DNA techniques, i.e., produced from cells, microbial or mammalian, transformed by an exogenous recombinant DNA expression construct encoding the desired protein or polypeptide. Proteins or polypeptides expressed in most bacterial cultures will typically be free of glycan. Proteins or polypeptides expressed in yeast may have a glycosylation pattern different from that expressed in mammalian cells.
  • a DNA or polynucleotide “coding sequence” is a DNA or polynucleotide sequence that is transcribed into mRNA and translated into a polypeptide in a host cell when placed under the control of appropriate regulatory sequences. The boundaries of the coding sequence are the start codon at the 5' N-terminus and the translation stop codon at the 3' C-terminus.
  • a coding sequence can include prokaryotic sequences, cDNA from eukaryotic mRNA, genomic DNA sequences from eukaryotic DNA, and synthetic DNA sequences. A transcription termination sequence will usually be located 3' to the coding sequence.
  • DNA or polynucleotide sequence is a heteropolymer of deoxyribonucleotides (bases adenine, guanine, thymine, cytosine). DNA or polynucleotide sequences can be assembled from synthetic cDNA-derived DNA fragments and short oligonucleotide linkers.
  • analogs when referring to the nucleic acids of this invention mean analogs, fragments, derivatives, and variants of such nucleotides having, for example, at least about 60% sequence identity, at least about 70% sequence identity, at least about 80% sequence identity, at least about 90% sequence identity, at least about 91% sequence identity, at least about 92% sequence identity, at least about 93% sequence identity, at least about 94% sequence identity, at least about 95% sequence identity, at least about 96% sequence identity, at least about 97% sequence identity, at least about 98% sequence identity, at least about 99% sequence identity, or at least about 100% sequence identity to the native or naturally occurring nucleic acid, as described herein.
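A percent-identity threshold of this kind can be checked with a few lines of Python. The sketch assumes two sequences that are already aligned and of equal length, with no gaps; real comparisons would normally run an alignment first. The function names and example sequences are hypothetical.

```python
# Illustrative sketch only: ungapped comparison of pre-aligned, equal-length
# sequences. Function names and sequences are hypothetical.


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two aligned sequences carry the same base."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def meets_identity_threshold(seq_a: str, seq_b: str, threshold: float = 90.0) -> bool:
    """True when the identity is at least the given percentage."""
    return percent_identity(seq_a, seq_b) >= threshold


native = "ACGTACGTAC"
variant = "ACGTACGTAA"   # differs from the native sequence at the last base
identity = percent_identity(native, variant)
```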
  • Similarity between two polynucleotides is determined by comparing the amino acid sequence corresponding to one polynucleotide to the amino acid sequence corresponding to the second polynucleotide.
  • An amino acid of one amino acid sequence is similar to the corresponding amino acid of a second amino acid sequence if it is identical or a conservative amino acid substitution.
  • Conservative substitutions include those described in Dayhoff, M.O., ed., The Atlas of Protein Sequence and Structure 5, National Biomedical Research Foundation, Washington, D.C. (1978), and in Argos, P. (1989) EMBO J. 8:779-785.
  • amino acids belonging to one of the following groups represent conservative changes or substitutions:
  • -Ala, Pro, Gly, Gln, Asn, Ser, Thr; -Cys, Ser, Tyr, Thr; -Val, Ile, Leu, Met, Ala, Phe; -Lys, Arg, His; -Phe, Tyr, Trp, His; and
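The similarity rule above (identical residue, or both residues in the same conservative group) can be expressed as a small lookup. The sketch uses standard three-letter residue codes and only the five groups that are legible in the list above; the list breaks off after "and", so any further group is deliberately not guessed. The function names and example sequences are hypothetical.

```python
# Illustrative sketch only. Only the five groups legible in the text are used;
# the source list is cut off, so any further group is left out on purpose.
CONSERVATIVE_GROUPS = [
    {"Ala", "Pro", "Gly", "Gln", "Asn", "Ser", "Thr"},
    {"Cys", "Ser", "Tyr", "Thr"},
    {"Val", "Ile", "Leu", "Met", "Ala", "Phe"},
    {"Lys", "Arg", "His"},
    {"Phe", "Tyr", "Trp", "His"},
]


def is_similar(residue_a: str, residue_b: str) -> bool:
    """True if the residues are identical or share a conservative group."""
    if residue_a == residue_b:
        return True
    return any(residue_a in g and residue_b in g for g in CONSERVATIVE_GROUPS)


def similarity_fraction(seq_a: list, seq_b: list) -> float:
    """Fraction of aligned positions that are identical or conservative."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    similar = sum(is_similar(a, b) for a, b in zip(seq_a, seq_b))
    return similar / len(seq_a)


peptide_1 = ["Ala", "Lys", "Trp"]   # hypothetical aligned peptides
peptide_2 = ["Gly", "Arg", "Asp"]
fraction = similarity_fraction(peptide_1, peptide_2)
```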
  • “Mammal” includes humans and domesticated animals, such as cats, dogs, swine, cattle, sheep, goats, horses, rabbits, and the like.
  • One aspect of the present invention is directed to a collection of 27 genes that represent useful targets for successfully distinguishing non-failing heart samples from dilated cardiomyopathy heart samples with over 90% accuracy. That is, identifying the presence or expression level of one or more or all of these 27 genes can be indicative of a cardiomyopathy disease phenotype.
  • expression levels of any combination of the 27 genes can be measured, such as the expression levels of one or more, at least about two, at least about three, at least about four, at least about five, at least about six, at least about seven, at least about eight, at least about nine, at least about ten, at least about eleven, at least about twelve, at least about thirteen, at least about fourteen, at least about fifteen, at least about sixteen, at least about seventeen, at least about eighteen, at least about nineteen, at least about twenty, at least about twenty one, at least about twenty two, at least about twenty three, at least about twenty-four, at least about twenty five, at least about twenty six, or about twenty seven genes can be determined in the subject's biological sample.
  • the expression levels of all 27 genes are determined in the subject's biological sample and compared to the expression levels of the same 27 genes in the healthy subject's biological sample.
  • the group of 27 genes includes the following, which are categorized based on existing knowledge of their individual involvement in DCM.
  • BNP natriuretic peptide precursor B
  • NPPA natriuretic peptide precursor A
  • CORIN corin
  • CCL2 chemokine (C-C motif) ligand 2
  • MYH6 myosin, heavy polypeptide 6, cardiac, alpha
  • ATF3 activating transcription factor 3
  • CTGF connective tissue growth factor
  • FRZB frizzled-related protein
  • PHLDA1 pleckstrin homology-like domain, family A, member 1
  • ODC1 ornithine decarboxylase 1
  • RARRES1 retinoic acid receptor responder 1
  • AXT2L1 complement factor H-related 3
  • SPOCK osteonectin proteoglycan
  • AEBP1 AE binding protein 1
  • KLHL3 kelch-like 3
  • ZBTB16 zinc finger and BTB domain containing 16
  • PCOLCE2 procollagen C-endopeptidase enhancer 2
  • the present invention provides an arrangement of markers for these 27 genes, and the markers can collectively be used to determine whether a particular biological sample is indicative of cardiomyopathic heart disease.
  • the abbreviations are used in Tables elsewhere in this application.
  • the present invention encompasses the use of markers to all 27 genes, as well as the use of markers to subsets of the 27 genes. For instance, markers to one or more of the Group 4 genes may be arranged or combined alongside one or more markers of the Group 1 genes. Hence, the present invention contemplates various combinations of gene markers based on the four groups outlined above, such as one or more markers of each Group according to the following exemplary combinations:
  • Microarrays are useful for identifying cancer-specific genes and inflammatory-specific genes. See DeRisi et al., Nat. Genet. 14(4):457-60 (1996); and Heller et al., Proc. Natl. Acad. Sci. USA 94(6):2150-55 (1997).
  • a microarray may typically be composed of a number of unique, single-stranded nucleic acid sequences, usually either synthetic antisense oligonucleotides or fragments of cDNAs, fixed to a solid support.
  • microarray connotes an array of polynucleotides or oligonucleotides that are placed, arranged, or otherwise affixed on to a substrate, such as paper, nylon or other type of membrane, filter, chip, glass slide, or any other such suitable solid support.
  • one microarray may comprise one or more combinations of nucleic acid sequences that share sequence identity with, or share sequence identity with the complement of, one or more of the 27 genes denoted above characterized into the four groups.
  • One embodiment of the invention uses solid support-based oligonucleotide hybridization methods to detect gene expression.
  • Solid support-based methods suitable for practicing the present invention are widely known and are described, for example, in PCT application WO 95/11755; Huber et al., Anal. Biochem. 299:24 (2001); Meiyanto et al., Biotechniques 31:406 (2001); Relogio et al., Nucleic Acids Res. 30:e51 (2002).
  • Any solid surface to which oligonucleotides can be bound, covalently or non-covalently, can be used.
  • Such solid supports include, but are not limited to, filters, polyvinyl chloride dishes, silicon or glass based chips.
  • the nucleic acid molecule can be directly bound to the solid support or bound through a linker arm, which is typically positioned between the nucleic acid sequence and the solid support.
  • a linker arm that increases the distance between the nucleic acid molecule and the substrate can increase hybridization efficiency.
  • the solid support is coated with a polymeric layer that provides linker arms with many reactive ends or sites.
  • a common example of this type is glass slides coated with polylysine (see, U.S. Patent No. 5667976), which are commercially available.
  • the linker arm may be synthesized as part of or conjugated to the nucleic acid molecule, and then this complex is bonded to the solid support.
  • the streptavidin-biotin interaction is stable enough to withstand stringent washing conditions and is not cleaved by the laser pulses used in some detection systems, such as matrix-assisted laser desorption/ionization time of flight (MALDI-TOF) mass spectrometry. Therefore, streptavidin may be covalently attached to a solid support, and the nucleic acid molecule is labeled with a biotin group (or vice versa).
  • MALDI-TOF matrix-assisted laser desorption/ionization time of flight
  • biotinylated nucleic acid molecule effectively sticks wherever it is placed on the streptavidin-covered support surface.
  • an amino-coated silicon wafer is reacted with the N-hydroxysuccinimido ester of biotin and complexed with streptavidin.
  • Biotinylated oligonucleotides are bound to the surface at a concentration of about 20 fmol DNA per mm².
  • the support is coated with hydrazide groups, then treated with carbodiimide.
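To give a sense of scale, the stated surface density of about 20 fmol per square millimetre can be converted to a molecule count. The conversion itself is plain arithmetic with Avogadro's constant; the circular spot and its diameter in the second function are hypothetical parameters introduced only for this example.

```python
from math import isclose, pi

# Illustrative arithmetic only. The spot geometry is a hypothetical parameter.
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_per_mm2(fmol_per_mm2: float) -> float:
    """Convert a surface density in fmol/mm^2 to molecules/mm^2."""
    return fmol_per_mm2 * 1e-15 * AVOGADRO


def molecules_per_spot(fmol_per_mm2: float, spot_diameter_mm: float) -> float:
    """Molecules on one circular spot of the given diameter."""
    area_mm2 = pi * (spot_diameter_mm / 2) ** 2
    return molecules_per_mm2(fmol_per_mm2) * area_mm2


density = molecules_per_mm2(20.0)   # roughly 1.2e10 molecules per mm^2
```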
  • Carboxy-modified nucleic acid molecules are then coupled to the treated support.
  • Epoxide-based chemistries are also being employed with amine modified oligonucleotides.
  • Other chemistries for coupling nucleic acid molecules to solid substrates are known to those of skill in the art.
  • the nucleic acid molecules are typically delivered to the substrate material. Because of the miniaturization of the arrays, delivery techniques should be capable of positioning very small amounts of liquids (e.g., less than 1 nanoliter) in very small regions (e.g., 100 µm diameter dots), very close to one another (e.g., 250 µm separation), and should be amenable to automation. Several techniques and apparatus are available to achieve such delivery. Among these are mechanical mechanisms (e.g., arrayers from GeneticMicroSystems, MA, USA) and ink-jet technology. Very fine pipettes may also be used. Other formats are also suitable within the context of this invention. For example, a 96-well format with fixation of the nucleic acids to a nitrocellulose or nylon membrane may also be employed.
  • After the nucleic acid molecules have been bound to the solid support, it is often useful to block reactive sites on the solid support that are not consumed in binding to the nucleic acid molecule. Otherwise, the probes will, to some extent, bind directly to the solid support itself, giving rise to so-called non-specific binding. Non-specific binding can sometimes hinder the ability to detect low levels of specific binding.
  • a variety of effective blocking agents e.g., milk powder, serum albumin or other proteins with free amine groups, polyvinylpyrrolidine
  • the choice depends at least in part upon the binding chemistry.
  • An oligonucleotide may preferably be about 6 to about 60 nucleotides in length, or any length in between these two parameters. In other embodiments of the invention, an oligonucleotide may be about 15 to about 30 nucleotides in length, or about 20 to about 25 nucleotides in length. For a certain type of microarray, it may be preferable to use oligonucleotides which are about 7 to about 10 nucleotides in length.
  • the microarray may comprise oligonucleotides which cover the known 5' or 3' sequence; sequential oligonucleotides which cover the full-length sequence; or unique oligonucleotides selected from particular areas along the length of the sequence.
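The option of sequential oligonucleotides covering a full-length sequence amounts to sliding a fixed-length window along the target. The sketch below returns target-sense windows; an actual probe would be the complement of each window. The default probe length of 25 is an arbitrary value within the ranges mentioned above, and the step size and example sequence are likewise assumptions.

```python
# Illustrative sketch only. Window length, step size and the example sequence
# are assumptions; real probes would be complementary to each window.


def tile_probes(sequence: str, probe_length: int = 25, step: int = 25) -> list:
    """Return consecutive windows of probe_length, advancing by step bases."""
    if probe_length <= 0 or step <= 0:
        raise ValueError("probe_length and step must be positive")
    last_start = len(sequence) - probe_length
    return [sequence[i:i + probe_length] for i in range(0, last_start + 1, step)]


windows = tile_probes("ACGTACGTAC", probe_length=4, step=3)
```

A step smaller than the probe length gives overlapping probes; a step equal to the probe length gives end-to-end tiling, with any incomplete final window dropped.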
  • Polynucleotides used in the microarray may be oligonucleotides that are specific to a gene or genes of interest in which at least a fragment of the sequence is known or that are specific to one or more unidentified cDNAs which are common to a particular cell type, development or disease state.
  • oligonucleotide arrays i.e. microarrays, to simultaneously observe the expression of a number of genes or gene products.
  • Oligonucleotide arrays comprise two or more oligonucleotide probes provided on a solid support, wherein each probe occupies a unique location on the support.
  • the location of each probe may be predetermined, such that detection of a detectable signal at a given location is indicative of hybridization to an oligonucleotide probe of a known identity.
  • Each predetermined location can contain more than one molecule of a probe, but each molecule within the predetermined location has an identical sequence. Such predetermined locations are termed features.
  • each oligonucleotide is located at a unique position on an array at least 2, at least 3, at least 4, at least 5, at least 6, or at least 10 times.
  • Oligonucleotide probe arrays for detecting gene expression can be made and used according to conventional techniques described, for example, in Lockhart et al., Nature Biotechnol. 14:1675 (1996), McGall et al., Proc. Natl. Acad. Sci. USA 93:13555 (1996), and Hughes et al., Nature Biotechnol. 19:342 (2001).
  • a variety of oligonucleotide array designs is suitable for the practice of this invention.
  • the one or more oligonucleotides include a plurality of oligonucleotides that each hybridize to a different gene expressed in a particular tissue type.
  • oligonucleotides of the present invention hybridize to nucleic acid sequences of any of the following genes: Synuclein alpha (non A4 component of amyloid precursor) ("SNCA"), Asporin ("ASPN"), Secreted frizzled-related protein 4 ("SFRP4"), Pleckstrin homology-like domain, family A, member 1 ("PHLDA1"), Frizzled-related protein ("FRZB"), Myosin heavy chain 6, cardiac muscle, alpha (cardiomyopathy, hypertrophic 1) ("MYH6"), Chemokine (C-C motif) ligand 2 ("CCL2"), Ornithine decarboxylase 1 ("ODC1"), Retinoic acid receptor responder (tazarotene induced) 1 ("RARRES1"), Complement factor H
  • a detectable molecule also referred to herein as a label
  • a label will be incorporated or added to an array's nucleic acid sequences.
  • Many types of molecules can be used within the context of this invention. Such molecules include, but are not limited to, fluorochromes, chemiluminescent molecules, chromogenic molecules, radioactive molecules, mass spectrometry tags, proteins, and the like. Other labels will be readily apparent to one skilled in the art. Indirect detection can also be used within the context of this invention. Proteins and other molecules are available that will bind to double-stranded DNA but not to single-stranded DNA. Thus, hybridization can be measured.
  • a nucleic acid sample obtained from an individual can be amplified and, optionally, labeled with a detectable label.
  • Any method of nucleic acid amplification and any detectable label suitable for such purpose can be used.
  • amplification reactions can be performed using, e.g., Ambion's MessageAmp, which creates "antisense" RNA or "aRNA" (complementary in nucleic acid sequence to the RNA extracted from the sample tissue).
  • the RNA can optionally be labeled using CyDye fluorescent labels.
  • CyDye fluorescent labels are coupled to the aaUTPs in a non-enzymatic reaction.
  • labeled amplified antisense RNAs are precipitated and washed with appropriate buffer, and then assayed for purity.
  • purity can be assayed using a NanoDrop spectrophotometer.
  • the nucleic acid sample is then contacted with an oligonucleotide array having, attached to a solid substrate (a "microarray slide"), oligonucleotide sample probes capable of hybridizing to nucleic acids of interest which may be present in the sample.
  • the step of contacting is performed under conditions where hybridization can occur between the nucleic acids of interest and the oligonucleotide probes present on the array.
  • the array is then washed to remove non-specifically bound nucleic acids and the signals from the labeled molecules that remain hybridized to oligonucleotide probes on the solid substrate are detected.
  • the step of detection can be accomplished using any method appropriate to the type of label used.
  • the step of detecting can be accomplished using a laser scanner and detector.
  • one can use an Axon scanner, which optionally uses GenePix Pro software to analyze the position of the signal on the microarray slide. Data from one or more microarray slides can be analyzed by any appropriate method known in the art.
  • Oligonucleotide probes used in the methods of the present invention can be generated using PCR.
  • PCR primers used in generating the probes are chosen, for example, based on the sequences of SEQ ID NOs: 1-27.
  • oligonucleotide control probes also are used.
  • Exemplary control probes can fall into at least one of three categories referred to herein as (1) normalization controls, (2) expression level controls and (3) negative controls.
  • one or more of these control probes may be provided on the array with the inventive cell cycle gene-related oligonucleotides.
  • Normalization controls correct for dye biases, tissue biases, dust, slide irregularities, malformed slide spots, etc.
  • Normalization controls are oligonucleotide or other nucleic acid probes that are complementary to labeled reference oligonucleotides or other nucleic acid sequences that are added to the nucleic acid sample to be screened.
  • the signals obtained from the normalization controls, after hybridization provide a control for variations in hybridization conditions, label intensity, reading efficiency and other factors that can cause the signal of a perfect hybridization to vary between arrays.
  • signals (e.g., fluorescence intensity or radioactivity) read from all other probes used in the method are divided by the signal from the control probes, thereby normalizing the measurements.
  • Virtually any probe can serve as a normalization control. Hybridization efficiency varies, however, with base composition and probe length. Preferred normalization probes are selected to reflect the average length of the other probes being used, but they also can be selected to cover a range of lengths. Further, the normalization control(s) can be selected to reflect the average base composition of the other probes being used. In one embodiment, only one or a few normalization probes are used, and they are selected such that they hybridize well (i.e., without forming secondary structures) and do not match any test probes. In one embodiment, the normalization controls are mammalian genes.
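The normalization step described above (dividing every probe signal by the signal from the normalization controls) can be sketched as follows. This is a minimal illustration, not the method used in the study: averaging several control spots into one control level is an assumption, and the intensities and the function name `normalize_signals` are hypothetical.

```python
from statistics import mean

def normalize_signals(probe_signals, control_signals):
    """Divide each probe signal by the mean signal of the normalization controls."""
    if not control_signals:
        raise ValueError("at least one normalization control signal is required")
    control_level = mean(control_signals)  # assumption: controls are averaged
    if control_level <= 0:
        raise ValueError("control signal must be positive")
    return {probe: signal / control_level for probe, signal in probe_signals.items()}

# Hypothetical raw intensities from a single array
raw = {"NPPB": 1200.0, "CTGF": 900.0, "CCL2": 150.0}
controls = [290.0, 310.0]  # two normalization control spots, mean 300
normalized = normalize_signals(raw, controls)
```

After this step, values from different arrays are expressed relative to their own control level, which is what makes them comparable between arrays.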
  • Expression level control probes hybridize specifically with constitutively expressed genes present in the biological sample. Virtually any constitutively expressed gene provides a suitable target for expression level control probes. Typically, expression level control probes have sequences complementary to subsequences of constitutively expressed "housekeeping genes" including, but not limited to, certain photosynthesis genes.
  • Negative control probes are not complementary to any of the test oligonucleotides (i.e., the inventive cell cycle gene-related oligonucleotides), normalization controls, or expression controls.
  • the negative control is a mammalian gene which is not complementary to any other sequence in the sample.
  • background and background signal intensity refer to hybridization signals resulting from non-specific binding or other interactions between the labeled target nucleic acids (i.e., mRNA present in the biological sample) and components of the oligonucleotide array. Background signals also can be produced by intrinsic fluorescence of the array components themselves. A single background signal can be calculated for the entire array, or a different background signal can be calculated for each target nucleic acid. In one embodiment, background is calculated as the average hybridization signal intensity for the lowest 5 to 10 percent of the oligonucleotide probes being used, or, where a different background signal is calculated for each target gene, for the lowest 5 to 10 percent of the probes for each gene.
  • background can be calculated as the average hybridization signal intensity produced by hybridization to probes that are not complementary to any sequence found in the sample (e.g., probes directed to nucleic acids of the opposite sense or to genes not found in the sample).
  • background can be calculated as the average signal intensity produced by regions of the array that lack any oligonucleotide probes at all.
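As an illustration of the first background definition above (the average intensity of the lowest 5 to 10 percent of probes), the following sketch computes a single array-wide background value. The function name, the rounding rule for the number of probes, and the example intensities are assumptions made for the example.

```python
from statistics import mean

def background_from_lowest(signals, fraction=0.05):
    """Average intensity of the lowest `fraction` of probe signals."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(signals)
    # assumption: round to the nearest whole probe, but always use at least one
    n_lowest = max(1, round(len(ordered) * fraction))
    return mean(ordered[:n_lowest])

# Hypothetical intensities: 20 probes with values 10, 20, ..., 200
signals = [float(v) for v in range(10, 210, 10)]
bg_5_percent = background_from_lowest(signals, 0.05)   # lowest 1 probe
bg_10_percent = background_from_lowest(signals, 0.10)  # lowest 2 probes
```

A per-gene background would apply the same function to the probes of each gene separately.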
  • the nucleic acid molecules are directly or indirectly coupled to an enzyme.
  • a chromogenic substrate is applied and the colored product is detected by a camera, such as a charge-coupled camera.
  • enzymes include alkaline phosphatase, horseradish peroxidase and the like.
  • the invention also provides methods of labeling nucleic acid molecules with cleavable mass spectrometry tags (CMST) (see for example, U.S. Patent No: 60279890). After an assay is complete, and the uniquely CMST-labeled probes are distributed across the array, a laser beam is sequentially directed to each member of the array.
  • the light from the laser beam both cleaves the unique tag from the tag-nucleic acid molecule conjugate and volatilizes it.
  • the volatilized tag is directed into a mass spectrometer. Based on the mass spectrum of the tag and knowledge of how the tagged nucleotides were prepared, one can unambiguously identify the nucleic acid molecules to which the tag was attached (see, e.g., WO9905319).
  • the nucleic acids can be labeled readily by any of a variety of techniques.
  • the nucleic acids can be labeled during the reaction by incorporation of a labeled dNTP or use of labeled amplification primer.
  • if the amplification primers include a promoter for an RNA polymerase, post-reaction labeling can be achieved by synthesizing RNA in the presence of labeled NTPs.
  • Amplified fragments that were unlabeled during amplification or unamplified nucleic acid molecules can be labeled by one of a number of end labeling techniques or by a transcription method, such as nick-translation or random-primed DNA synthesis.
  • PCR-based methods are used to detect gene expression. These methods include reverse-transcriptase-mediated polymerase chain reaction (RT-PCR) including real-time and endpoint quantitative reverse-transcriptase-mediated polymerase chain reaction (Q-RTPCR). These methods are well known in the art. For example, methods of quantitative PCR can be carried out using kits and methods that are commercially available from, for example, Applied Biosystems and Stratagene®. See also Kochanowski, QUANTITATIVE PCR PROTOCOLS (Humana Press, 1999); Innis et al., supra; Vandesompele et al., Genome Biol. 3: RESEARCH0034 (2002); Stein, Cell Mol. Life Sci.
  • Q-RTPCR relies on detection of a fluorescent signal produced proportionally during amplification of a PCR product. See Innis et al, supra.
  • this technique employs PCR oligonucleotide primers, typically 15-30 bases long, that hybridize to opposite strands and regions flanking the DNA region of interest.
  • a probe (e.g., TaqMan®, Applied Biosystems) is designed to hybridize to the target sequence between the forward and reverse primers traditionally used in the PCR technique.
  • the probe is labeled at the 5' end with a reporter fluorophore, such as 6-carboxyfluorescein (6-FAM), and at the 3' end with a quencher fluorophore, such as 6-carboxy-tetramethyl-rhodamine (TAMRA).
  • the forward and reverse amplification primers and internal hybridization probe are designed to hybridize specifically and uniquely with one nucleotide sequence derived from the transcript of a target gene.
  • the selection criteria for primer and probe sequences incorporate constraints regarding nucleotide content and size to accommodate TaqMan® requirements.
  • SYBR Green® can be used as a probe-less Q-RTPCR alternative to the TaqMan®-type assay, discussed above.
  • a device measures changes in fluorescence emission intensity during PCR amplification. The measurement is done in "real time," that is, as the amplification product accumulates in the reaction. Other methods can be used to measure changes in fluorescence resulting from probe digestion. For example, fluorescence polarization can distinguish between large and small molecules based on molecular tumbling (see U.S. patent No. 5,593,867).
  • stringent hybridization and washing conditions are useful for nucleic acid molecules over about 500 bp.
  • Stringent hybridization conditions include a solution comprising about 1 M Na+ at 25° to 30°C below the Tm (e.g., 5x SSPE, 0.5% SDS, at 65°C; see Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing, 1995; Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, 1989).
  • Tm is dependent on both the G+C content and the concentration of Na+.
  • Tm = 81.5 + 0.41(%(G+C)) - log10[Na+]. Washing conditions are generally performed at least at equivalent stringency conditions as the hybridization. If the background levels are high, washing may be performed at higher stringency, such as around 15°C below the Tm.
  • Low stringency hybridizations are performed at conditions approximately 40°C below Tm, and are used for short fragments, e.g., less than about 500 bp. For fragments between about 100 and 500 bp, the Tm decreases about 1.5°C for every 50 bp fewer than 500. For very small fragments, e.g., less than about 50 bp, a formula for calculating Tm is 2°C for each AT pair and 4°C for each GC pair. Very high stringency hybridizations are performed at conditions approximately 10°C below Tm.
  • Hybridization conditions are tailored to the length and GC content of the oligonucleotide. Suitable hybridization conditions may be found in Sambrook et al., supra, Ausubel et al., supra, and furthermore hybridization solutions may contain additives such as tetramethylammonium chloride or other chaotropic reagents or hybotropic reagents to increase specificity of hybridization (see for example, PCT/US97/17413).
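The rule of thumb given above for very small fragments (2°C for each AT pair and 4°C for each GC pair) can be written as a short function. This is a sketch for illustration only; the function name and the example sequence are hypothetical, and the rule is the one stated in the text for fragments under about 50 bp.

```python
def short_oligo_tm(sequence):
    """Estimate Tm (in °C) as 2 per A/T base and 4 per G/C base."""
    seq = sequence.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    if at_count + gc_count != len(seq):
        raise ValueError("sequence must contain only A, C, G and T")
    return 2 * at_count + 4 * gc_count

# Hypothetical 20-mer with 10 A/T and 10 G/C bases
example_tm = short_oligo_tm("ATGCATGCATGCATGCATGC")
```

For this 20-mer the estimate is 2 x 10 + 4 x 10 = 60°C.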
  • Hybridization may be detected in a variety of ways and with a variety of equipment.
  • the methods may be categorized as those that rely upon detectable molecules incorporated into the diversity panels and those that rely upon measurable properties of double-stranded nucleic acids (i.e., hybridized nucleic acids) that distinguish them from single-stranded nucleic acids (i.e., unhybridized nucleic acids).
  • the latter category of methods includes intercalation of dyes, such as ethidium bromide, into double-stranded nucleic acids, differential absorbance properties of double and single stranded nucleic acids, binding of proteins that preferentially bind double-stranded nucleic acids, and the like.
  • when a radioactive label is used, autoradiography or storage phosphor screens (PhosphorImager) are common methods of detection.
  • An alternative detection system that can be used with radioactive, fluorescent or chemiluminescent labels is a CCD integrated silicon wafer.
  • a charge-coupled device, designed to detect high energy beta particles or photons, is placed in direct contact with a silicon support for an array. Upon binding of the sample to the immobilized nucleic acids, a radioisotope decay product or photon is generated. Electron-hole pairs are generated in the silicon and then electrons are collected by the CCD.
  • An alternative detection system for fluorescent molecules is a lens based camera detecting one or more fluorescent labels.
  • these cameras include epifluorescent microscopes, confocal microscopes, and charge-coupled cameras.
  • a laser excites a fluorescent label, the emitted light is collected through a bandpass filter, and the signal is detected by a photomultiplier tube that has electronics for counting photons.
  • labels are also amenable to use with either a lens-based camera or a CCD.
  • chemiluminescent labels or chromogenic substrates can be detected with a lens-based charge-coupled camera.
  • the label is a cleavable mass-spectrometry tag.
  • Such labels are then detected using a mass-spectrometer.
  • Many detection systems are commercially available (e.g., Affymetrix, Santa Clara, CA).
  • One skilled in the art is able to choose an appropriate detection means and equipment for the label used.
  • Patterns of hybridization can be expressed as presence or absence of hybridization, the degree of hybridization, or some combination of these.
  • the simplest analysis is performed by determining the presence or absence of hybridization.
  • when the complexity of the genome of the organism to be genotyped is greater than the complexity of the genome(s) represented on the array, the absence of hybridization conclusively signifies a polymorphism.
  • when the complexity is less than on the array, the absence of hybridization can signify either a polymorphism or a lack of representation of those sequences in the probing diversity panel.
  • the presence of hybridization does not necessarily signify the absence of a polymorphism under either scenario.
  • the pattern of hybridization is informative.
  • each addressable area is queried for hybridization using a method appropriate to the label. For example, when fluorescent labels are used, such as Cy3 and Cy5, both green and red signals are assayed.
  • when positive and negative controls are included on the array, signals are compared to the controls and each addressable area is assigned a value, e.g., 1 for detectable hybridization and 0 for no detectable hybridization. In general, a value of 1 is assigned for detection over a threshold level and 0 assigned for detection under a threshold level. It will be appreciated by those skilled in the art that detection of polymorphisms is based primarily on finding a binary distribution of signal values for any particular array feature when hybridized with multiple diversity panels.
  • the panels are the same as those used to create the diversity array (see Example 5).
  • if a diversity panel is generated from a heterozygote for a polymorphism, one will then detect a trimodal distribution.
  • two threshold values are calculated: the first threshold separates the "0" cluster (lack of hybridization) from the "0/1" cluster (heterozygote) and the second threshold separates the "0/1" cluster from the "1" cluster (hybridization present).
  • Conventional statistical methods may be used to determine the threshold levels.
  • the genotype of the organism may then be expressed as a value for each addressable area.
  • for example, the addressable array may be in a 96-spot format (a grid of 8 rows (A-H) x 12 columns (1-12)).
  • if the value for detectable hybridization is 1 and the value for no detectable hybridization is 0, then it is possible to visualize the individual's expression profile on a two-dimensional grid reflecting those 1/0 detections.
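The threshold-based calls and the 96-spot grid described above can be sketched as follows. The example assumes a row-major list of 96 signals, rows lettered A through H (eight letters for eight rows) and columns numbered 1 to 12; the threshold values, signal values and function names are hypothetical, and how thresholds are derived (e.g., from controls or by statistical methods) is left open.

```python
import string

def call_spot(signal, low, high=None):
    """Call one addressable area.

    With one threshold: 1 above it, 0 otherwise.
    With two thresholds: "0", "0/1" (intermediate cluster) or "1".
    """
    if high is None:
        return 1 if signal > low else 0
    if signal <= low:
        return "0"
    return "0/1" if signal <= high else "1"

def binary_grid(signals, threshold, n_rows=8, n_cols=12):
    """Map a row-major list of signals onto spot names (A1..H12) with 1/0 calls."""
    if len(signals) != n_rows * n_cols:
        raise ValueError("expected one signal per addressable area")
    grid = {}
    for index, signal in enumerate(signals):
        row = string.ascii_uppercase[index // n_cols]
        col = index % n_cols + 1
        grid[f"{row}{col}"] = call_spot(signal, threshold)
    return grid

# Hypothetical array: background-level signal everywhere except two spots
signals = [50.0] * 96
signals[0] = 900.0    # spot A1
signals[95] = 700.0   # spot H12
grid = binary_grid(signals, threshold=200.0)
```

The resulting dictionary can be printed row by row to give the two-dimensional picture of 1/0 detections mentioned in the text.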
  • relative values are assigned to each addressable location. The relative values will generally be normalized to controls. All data can be collected into database formats to facilitate comparisons as well as perform further analyses, such as construction of genotype trees.
  • oligonucleotides of some or all of those 27 genes disclosed herein may be used as components of a microarray.
  • the present invention is not limited to markers, such as oligonucleotides and nucleic acid probes, that specifically target the denoted 27 genes.
  • Nucleic acid probes or oligonucleotides also may be designed to target isoforms and homologs of any one of the 27 genes.
  • Proteins also can be observed by any means known in the art, including immunological methods, enzyme assays and protein array/proteomics techniques, for determining the expression profile of a sample instead of, or in addition to, determining expression levels by detecting nucleic acid transcripts. Measurement of the translational state of proteins can be performed according to several protein methods. For example, whole genome monitoring of protein (the "proteome") can be carried out by constructing a microarray in which binding sites comprise immobilized, preferably monoclonal, antibodies specific to a plurality of proteins having an amino acid sequence of any of SEQ ID NOs: 236-470 and 718-737 or proteins encoded by the genes of SEQ ID NOs: 1-235 and 698-717 or conservative variants thereof.
  • proteins can be separated by two-dimensional gel electrophoresis systems.
  • Two-dimensional gel electrophoresis is well-known in the art and typically involves isoelectric focusing along a first dimension followed by SDS-PAGE electrophoresis along a second dimension. See, e.g., Hames et al., GEL ELECTROPHORESIS OF PROTEINS: A PRACTICAL APPROACH (IRL Press, 1990).
  • the resulting electropherograms can be analyzed by numerous techniques, including mass spectrometric techniques, western blotting and immunoblot analysis using polyclonal and monoclonal antibodies, and internal and N-terminal micro-sequencing.
  • a nucleic acid marker to, say, the corin gene can be a nucleic acid sequence, such as an oligonucleotide or probe, that is complementary to a corin-specific gene sequence; so that, when it is affixed to a substrate surface, the probe will anneal to or hybridize to a corin gene nucleic acid transcript, be it genomic DNA, cDNA, or RNA.
  • an oligonucleotide may be designed to any and all of the 27 genes denoted above.
  • the present invention contemplates the presence of multiple oligonucleotides on a particular substrate that are designed to anneal to or hybridize to the same gene.
  • the "pairs" will be identical, except for one nucleotide which preferably is located in the center of the sequence.
  • the second oligonucleotide in the pair serves as a control.
  • the number of oligonucleotide pairs may range from two to one million.
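The oligonucleotide pairs described above (two sequences identical except for one nucleotide, preferably in the center, with the second serving as a control) can be illustrated with a short sketch. Replacing the central base with its complement is an assumption chosen for the example; the text only requires that the central nucleotide differ. The sequence and function name are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def mismatch_partner(probe):
    """Return a copy of `probe` whose central base is swapped for its complement."""
    seq = probe.upper()
    center = len(seq) // 2
    return seq[:center] + COMPLEMENT[seq[center]] + seq[center + 1:]

# Hypothetical 25-mer; its central base (index 12) is 'A'
perfect_match = "ACGTACGTACGTAGCTAGCTAGCTA"
mismatch = mismatch_partner(perfect_match)
differences = [i for i, (a, b) in enumerate(zip(perfect_match, mismatch)) if a != b]
```

Signal on the perfect-match oligonucleotide that clearly exceeds signal on its mismatch partner suggests specific rather than non-specific hybridization.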
  • the oligomers are synthesized at designated areas on a substrate using a light-directed chemical process.
  • the substrate may be paper, nylon or other type of membrane, filter, chip, glass slide or any other suitable solid support.
  • the gene of interest may be examined using a computer algorithm which starts at the 5' or more preferably at the 3' end of the nucleotide sequence.
  • the algorithm identifies oligomers of defined length that are preferably unique to the gene, have a GC content within a range suitable for hybridization, and lack predicted secondary structure that may interfere with hybridization.
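A simplified version of such a selection algorithm is sketched below: it scans windows of fixed length starting at the 3' end, keeps those with GC content in a chosen range, and discards windows that also occur elsewhere in the gene or in other supplied sequences. The length, the GC range, the function names and the toy sequences are all assumptions, and the check for predicted secondary structure is deliberately omitted because it would require a folding model.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def candidate_oligos(gene, other_genes, length=25, gc_min=0.4, gc_max=0.6):
    """Return (start, oligo) pairs, scanning from the 3' end toward the 5' end."""
    gene = gene.upper()
    others = [other.upper() for other in other_genes]
    hits = []
    for start in range(len(gene) - length, -1, -1):  # 3'-most window first
        oligo = gene[start:start + length]
        if not gc_min <= gc_fraction(oligo) <= gc_max:
            continue  # GC content outside the accepted range
        if gene.count(oligo) != 1:
            continue  # repeated within the gene itself
        if any(oligo in other for other in others):
            continue  # not unique to this gene
        hits.append((start, oligo))
    return hits

# Toy example with 4-base windows so the result can be checked by eye
toy_hits = candidate_oligos("AAAAACGTAAAA", ["TTACGTTT"], length=4)
```

In the toy example, the windows CGTA, ACGT and AACG pass the GC filter, and ACGT is then rejected because it also occurs in the second sequence.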
  • an oligonucleotide may be synthesized on the surface of the substrate by using a chemical coupling procedure and an ink jet application apparatus, as described in PCT application WO95/251116 (Baldeschweiler et al.) which is incorporated herein in its entirety by reference.
  • a "gridded" array analogous to a dot or slot blot may be used to arrange and link cDNA fragments or oligonucleotides to the surface of a substrate using a vacuum system, thermal, UV, mechanical or chemical bonding procedures.
  • An array such as those described above, may be produced by hand or by using available devices (slot blot or dot blot apparatus), materials (any suitable solid support), and machines (including robotic instruments).
  • oligonucleotides, probes, or pieces of target-specific nucleic acids may be labeled in such fashion that a detectable signal is generated from their annealing or hybridizing to the target nucleic acid.
  • a microarray of the present invention is placed into contact with a biological sample, such as blood, urine, saliva, phlegm, gastric juices, cultured cells, tissue biopsies, or other tissue preparations; and a detection system may then be used to measure the absence, presence, and/or amount of hybridization for all of the distinct sequences simultaneously.
  • the present invention contemplates a microarray of at least about one, at least about two, or any number in between about two and up to 27, or all of the genes of SEQ ID NOs: 1-27 denoted above in Groups 1-4, which can be used as described herein to evaluate a biological sample from a subject to determine whether that subject is symptomatic or is afflicted with a cardiomyopathic disease, such as dilated cardiomyopathy.
  • the present invention is not limited to the use of a microarray assay, however, for diagnosing a cardiomyopathic disease via gene expression level analysis.
  • the present invention also contemplates the use of techniques such as polymerase chain reaction (PCR), quantitative-PCR and Real Time PCR, Quantitative Competitive Reverse Transcription-PCR and Real Time Detection 5'-Nuclease-PCR.
  • Real Time Detection 5'-Nuclease-PCR also is known as TaqMan RT-PCR.
  • TaqMan RT-PCR is useful for correlating the concentration of a protein in a sample tissue to its mRNA expression. See Hirayama et al., "Concentrations of Thrombopoietin in Bone Marrow in Normal Subjects and in Patients with Idiopathic Thrombocytopenic Purpura, Aplastic Anemia, and Essential Thrombocythemia Correlate With Its mRNA Expression of Bone Marrow Stromal Cells," Blood, 92(1): 46-52, 1998.
  • This quantitative replicative method relies on the presence of a 5'-nuclease assay in the RT-PCR reactions, wherein a probe specific for the target protein contains a fluorescent moiety such as 6-carboxyfluorescein (FAM) on its 5'-end, and a phosphate-capped quencher fluor moiety such as 6-carboxytetramethylrhodamine (TAMRA) on its 3'-end.
  • TaqMan RT-PCR also may be coupled with an ABI Prism 7700 Sequence Detection System, or Competitive PCR for quantification of DNA. See Desjardin et al., "Comparison of the ABI 7700 System (TaqMan) and Competitive PCR for Quantification of IS6110 DNA in Sputum During Treatment of Tuberculosis," J. Clin. Microbiol., 36(7): 1964-1976, 1998.
  • Another assay contemplated by the present invention is an ELISA assay to detect protein products of one or more of the denoted 27 genes from a subject's biological sample.
  • ELISA assays and associated plate-reader apparatus are well known to those skilled in the art.
  • PAM Prediction analysis for microarrays
  • Based on the smallest gene set for classification from Dataset B, the PAM method classified all four human heart failure datasets with low misclassification rates. Notably, despite the large variation of gene expression values for single genes, the classifier as a whole is highly valuable for distinguishing DCM and NF samples. These results support the usefulness of this molecular approach for diagnostic applications. In addition, the classifier gene set based on DCM and NF hearts achieved a similarly high accuracy of classification in ICM samples as in DCM samples, suggesting that this gene set could be representative of molecular changes of heart failure in general.
  • the classifier based on Dataset B performed as well in Datasets A and D as classifiers generated from these two datasets alone.
  • the gene signature from Dataset B was also able to accurately discriminate NF and DCM samples in Dataset C.
  • differences in gene expression were greater between left ventricular assist-device (LVAD) and non-LVAD hearts in the DCM group than between DCM and NF samples (13). This peculiarity might impede the PAM approach for identifying a useful classifier between DCM and NF within this dataset itself.
  • the classifier gene signature can be grouped into different functional sets with respect to the pathogenesis of DCM.
  • Up-regulation of the cardiomyopathy markers pro-ANP and pro-BNP is well established in heart failure (23) and mediated by neurohormonal dysregulation (27).
  • Activation of pro-fibrotic stress hormone pathways leads to prominent structural remodeling in DCM, exemplified by deregulation of genes coding for sarcomere structure and extracellular matrix proteins like myosin 6 and 10, asporin, procollagen C-endopeptidase enhancer 2 (PCOLCE2), kelch-like 3 (KLHL3) and AE binding protein 1 (AEBP1).
  • transcripts of this set of classifier genes including the transcription factor ZBTB16 (28), the connective tissue growth factor CTGF (29) and the chemokine CCL2 (30), characterize important targets of the renin-angiotensin system in failing myocardium, as they all have been shown to be induced by angiotensin-II.
  • transcripts of this classifier gene set belong to anti-apoptotic (PHLDA1, SNCA, CCL2) and cell growth processes (FRZB, SFRP4, SPOCK, CTGF).
  • FRZB frizzled-related protein
  • SFRP4 secreted frizzled related protein 4
  • CFHL3 complement factor H-related 3
  • FCN3 ficolin 3
  • CCL2 chemokine ligand 2
  • S100A8 calgranulin A
  • CCL2 is a prominent member of the broader functional group of immune and inflammatory processes and was found to be down-regulated in both datasets A and B.
  • This chemokine capable of interacting with TNF-alpha and IL-6-related pathways, has been localized to the cardiomyocyte compartment by immunohistochemistry (24). It promotes attraction and invasion of activated leukocytes into the failing myocardium, but is also involved in shaping the extracellular matrix by modulating the activity of matrix metalloproteinases and collagen turnover (35) as well as cell proliferation and induction of apoptosis (24). Down-regulation of CCL2 transcripts in end-stage heart failure may therefore represent an adaptive mechanism to promote cell survival.
  • additional chemokines like CCL11 and CCL18 were also found to be down-regulated in Affymetrix and Unigene arrays, respectively.
  • Dataset A: cDNA microarray study with 28 septal myocardial samples obtained from 13 DCM hearts at the time of transplantation and 15 NF donor hearts which were not transplanted because of palpable coronary calcifications. The latter patient group was not known to have any history of overt cardiovascular disease. Detailed patient characteristics are listed in Table 3.
  • Dataset B: oligonucleotide microarray study with twelve independent subendocardial left ventricular samples collected from seven DCM patients and five NF donors. Detailed patient characteristics are listed in Table 3.
  • RNA isolation, sample preparation, labeling, and hybridization to RZPD Unigene 3.1 cDNA (37.5K) and to Affymetrix U133A (22.2K) arrays were carried out as described previously (6, 11, 12).
  • Dataset C: six NF, 21 DCM and 10 ICM samples hybridized to Affymetrix HG-U133A arrays (13). Normalized gene expression data were downloaded from Gene Expression Omnibus (accession number GSE1869).
  • Dataset D: available online through a program for genomic application funded by the National Heart, Lung, and Blood Institute; it consisted of 14 NF, 27 DCM and 32 ICM samples hybridized to Affymetrix HG-U133 Plus 2.0 arrays (http://www.cardiogenomics.org) (14).
  • Prediction analysis for microarrays was used for classification.
  • the ability to correctly classify the status of DCM and NF samples was assessed by complete cross-validation implemented in the Bioconductor package "MCRestimate" (22).
  • the samples in every study were randomly divided into equally sized subsets. In each following step, one subset was left aside and the classifier (filtering and PAM) was built on the remaining samples (training set).
  • the status (NF vs. DCM) of the left-out samples was predicted and compared with the clinically diagnosed status. Optimization of the PAM parameter and of the number of genes remaining after variance filtering was achieved through a second cross-validation within each training set.
  • To estimate the variability of the cross-validation result based on different sample compositions of the training set the procedure was repeated 50 times. A sample was called "misclassified” if it was incorrectly classified in more than half of all cross-validations.
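The outer loop of this procedure (random division into equally sized subsets, prediction of each left-out subset, 50 repetitions, and the "more than half" rule for calling a sample misclassified) can be sketched in plain Python. This is not the MCRestimate implementation: a plain nearest-centroid classifier stands in for PAM, the variance filter and the inner cross-validation used for parameter tuning are omitted, and the toy data and function names are hypothetical.

```python
import random
from statistics import mean

def nearest_centroid_predict(train, sample):
    """Predict the label whose class centroid is closest (stand-in for PAM)."""
    centroids = {}
    for label in {lab for _, lab in train}:
        rows = [features for features, lab in train if lab == label]
        centroids[label] = [mean(column) for column in zip(*rows)]
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(sample, centroid))
    return min(sorted(centroids), key=lambda lab: squared_distance(centroids[lab]))

def repeated_cv(data, n_folds=3, n_repeats=50, seed=0):
    """Return indices of samples misclassified in more than half of the repeats."""
    rng = random.Random(seed)
    errors = [0] * len(data)
    for _ in range(n_repeats):
        order = list(range(len(data)))
        rng.shuffle(order)
        folds = [order[i::n_folds] for i in range(n_folds)]
        for held_out in folds:
            train = [data[i] for i in order if i not in held_out]
            for i in held_out:  # each sample is predicted once per repeat
                features, true_label = data[i]
                if nearest_centroid_predict(train, features) != true_label:
                    errors[i] += 1
    return [i for i, count in enumerate(errors) if count > n_repeats / 2]

# Toy data: two well-separated groups plus one sample (index 8) labeled NF
# whose expression values resemble the DCM group
toy_data = [
    ([1.0, 1.0], "NF"), ([1.2, 0.9], "NF"), ([0.9, 1.1], "NF"), ([1.1, 1.0], "NF"),
    ([5.0, 5.0], "DCM"), ([5.2, 4.9], "DCM"), ([4.9, 5.1], "DCM"), ([5.1, 5.0], "DCM"),
    ([5.0, 5.1], "NF"),
]
misclassified = repeated_cv(toy_data)
```

In the toy data only the atypical sample is flagged, because it is predicted as DCM every time it is left out, while the other samples are predicted correctly in every split.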
  • in Dataset A, 1353 transcripts were up-regulated and 384 were down-regulated in DCM.
  • in Dataset B, 399 transcripts were up-regulated and 75 transcripts were down-regulated in DCM.
  • up-regulation was about four to five times more common than down-regulation, indicating a net transcriptional activation in heart failure.
  • 76 transcripts were found to be consistently deregulated in both studies, representing an approximate 16% overlap at the single gene level between both microarray studies.
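The counts quoted above can be checked with simple arithmetic. The sketch below assumes that the roughly 16% overlap is expressed relative to the total number of deregulated transcripts in Dataset B (399 + 75); that choice of denominator is an inference from the numbers, not something the text states.

```python
# Transcript counts as quoted in the text
dataset_a = {"up": 1353, "down": 384}
dataset_b = {"up": 399, "down": 75}
shared = 76

total_b = dataset_b["up"] + dataset_b["down"]         # 474 deregulated transcripts
overlap_percent = 100 * shared / total_b              # assumed denominator: Dataset B
up_down_ratio_a = dataset_a["up"] / dataset_a["down"]
up_down_ratio_b = dataset_b["up"] / dataset_b["down"]
```

With these counts the overlap comes to about 16%, and the up/down ratios are about 3.5 for Dataset A and 5.3 for Dataset B.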
  • NPPB pro-brain natriuretic peptide
  • CCL2 chemokine ligand 2
  • differentially expressed genes were related to their respective GO classes. Thereby, it was possible to identify specific biological processes which were consistently enriched in up- or down-regulated transcripts of both studies. For example, both studies showed a marked up-regulation of transcripts involved in protein biosynthesis in DCM.
  • extracellular matrix protein 2 (a member of the small leucine rich proteoglycans (SLRP), important for collagen fibrillogenesis), asporin and most other members of the SLRP family were found to be up-regulated as well (decorin, lumican, biglycan, fibromodulin, osteoglycin, and osteomodulin), highlighting their importance in extracellular remodeling.
  • genes coding for Z-disc components were noted, including caldesmon 1, sarcospan, sarcoglycan epsilon, utrophin, spectrin, titin, vinculin, sarcoglycan D and G, alpha-actinin, LIM-domain binding 3, and alpha-2-capping protein.
  • the Z-disc is thought to act as a sensor, linking biomechanical forces to the activation of stress pathways (25).
  • pro- and anti-apoptotic programs may determine if relevant loss of myocytes occurs.
  • up-regulation of anti-apoptotic (FGF1, DSIPI, CCL2) and pro-apoptotic transcripts (BCLAF1, FOXO3A) was noted in these studies.
  • the second goal of the experiments of the present invention was to identify a specific set of transcripts which could reliably classify DCM and NF samples.
  • the classification method "PAM" was performed on four independent microarray studies. Very low misclassification rates were found in two studies (Datasets A and B) and in Dataset D for the classification of NF versus DCM samples (Figure 2). Specifically, one out of twelve samples was misclassified in Dataset B. Likewise, Datasets A and D showed similar results, with one out of 28 and three out of 41 misclassified samples, respectively. In contrast, the classification algorithm did not show any predictive power in Dataset C. This was unexpected as the expression levels of established molecular cardiomyopathy markers, including pro-BNP or pro-ANP, suggested a clear separation into NF and failing ventricular samples (Figure 3).
  • the 27-gene signature included known marker genes of heart failure: pro-BNP, pro-ANP, corin (converts pro-ANP to biologically active ANP), transcripts encoding sarcomere structure proteins (MYH6, MYH10), and transcripts involved in anti-apoptotic processes (CCL2, PHLDA1, SNCA), cell growth (FRZB, SFRP4, SPOCK, CTGF) and cell cycle control (G0S2, ETV5, RARRES1).
  • RNA samples used for microarray hybridization were amplified once. Linear amplification was performed using the MessageAmp™ aRNA Kit (Ambion, Huntingdon, United Kingdom) according to the manufacturer's instructions.
  • Agilent 2100 bioanalyzer Agilent Technologies GmbH, Waldbronn, Germany.
  • all aRNA samples showed a length distribution of 50-6000 nucleotides with maximum peaks at about 900-1000 nucleotides.
  • a slight length shortening of the aRNA samples was observed.
  • Utility of T7 RNA polymerase based linear amplification has been shown previously. See Sultmann et al., "Gene expression in kidney cancer is associated with cytogenetic abnormalities, metastasis formation, and patient survival," Clin Cancer Res. 2005;11:646-655.
  • Cy3- and Cy5-labeled probes were purified with Microcon YM-30 columns (Millipore, Bedford, MA, USA), combined and resuspended in 50 µl 1x DIG-Easy hybridization buffer (Roche Diagnostics, Mannheim, Germany), containing 10x Denhardt's solution and 2 ng/µl Cot1-DNA (Invitrogen). Hybridizations were carried out in duplicate on Unigene 3.1 microarrays.
  • the hybridized arrays were scanned with the GenePix 4000B microarray scanner (Axon Instruments Inc., Union City, CA, USA), and analyzed using GenePix Pro 4.1 software (Axon Instruments).
  • HG-U133A chip (Affymetrix, Santa Clara, CA, USA) representing 22,283 probe sets was used for each human heart sample.
  • the sequences were derived from GenBank, dbEST and RefSeq. Sequence clusters were created from Build 133 of UniGene (April 20, 2001). Further information about the Gene Chip System can be obtained at www.affymetrix.com. See Liu et al., "NetAffx: Affymetrix probesets and annotations," Nucleic Acids Res., 2003;31:82-6. mRNA Preparation and Hybridization to Affymetrix HG-U133A Microarrays
  • Double-stranded cDNA was synthesized from 10 µg total RNA by using the Superscript double-stranded cDNA synthesis kit (Invitrogen, Düsseldorf, Germany) with an HPLC-purified oligo(dT) primer containing a T7 RNA polymerase promoter (GENSET, La Jolla, CA, USA) following the manufacturer's protocol.
  • Biotinylated cRNA probes were synthesized by in vitro transcription using ENZO BioArray RNA transcript labeling kit (ENZO Diagnostics, Farmingdale, NY, USA). Fragmentation of 10 µg biotinylated cRNA as well as subsequent steps of hybridization, washing and staining followed instructions provided by Affymetrix (Affymetrix, Santa Clara, CA, USA).
  • the hybridized arrays were scanned with the GeneChip Scanner 2500 (Affymetrix, Santa Clara, CA, USA) and preprocessed using Microarray Suite 5 software (Affymetrix, Santa Clara, CA, USA).
  • Gene-specific primers and probes were designed using Primer 3 software (Applied Biosystems, Foster City, CA, USA) to amplify fragments of 70-150 base pairs in length close to the 3'-end of the transcript. Real-time PCR was performed in triplicate for each sample with a 10 µl aliquot of diluted cDNA (1:3).
  • a 2x Universal PCR Master-Mix from Perkin Elmer (containing AmpliTaq Gold™ DNA-Polymerase, AmpErase UNG, dNTPs with dUTPs, passive reference dye and optimized buffer including MgCl2), 900 nM primer and 200 nM probe were used.
  • PCR-amplification of cDNA started with a "hot start" activation of SureStart Taq polymerase at 95°C for 10 minutes, followed by 40 cycles of 15s denaturation at 95°C, annealing for 60s at 58°C, and 10s elongation at 72°C. All experimental results for the samples with a coefficient of variation >10% were retested.
  • ΔCt relative quantification method based on the REST-program developed by Pfaffl was used. See Pfaffl et al., "Relative expression software tool (REST) for group-wise comparison and statistical analysis of relative expression results in real-time PCR," Nucleic Acids Res., 2002;30(9):e36.
  • Taqman validation of twelve candidate genes was performed; GAPDH served as the housekeeping gene.
  • q-value based on "Significance Analysis of Microarrays" (SAM) is given, whereas Taqman data was analyzed by Student's t-test.
  • SAM: Significance Analysis of Microarrays
  • BCL2-associated transcription factor 1, basic helix-loop-helix domain containing, class B, 3, chromosome 16 open reading frame 45, caldesmon 1, CDC14 cell division cycle 14 homolog B (S. cerevisiae), carbohydrate (N-acetylglucosamine 6-O) sulfotransferase 5, collagen, type V, alpha 1, collagen, type VIII, alpha 1, coatomer protein complex, subunit zeta 2, cofactor required for Sp1 transcriptional activation, subunit 6, 77kDa, connective tissue growth factor, discs, large homolog 1 (Drosophila), dynein, cytoplasmic, light polypeptide 1, dedicator of cytokinesis 9, delta sleep inducing peptide, immunoreactor, extracellular matrix protein 2, female organ and adipocyte specific, exostoses (multiple) 1, Fibroblast growth factor 1 (acidic), hypothetical protein FLJ22662, forkhead box O3A
  • septin 2, sarcoglycan, epsilon, solute carrier family 25 (mitochondrial carrier), member 5, solute carrier family 30 (zinc transporter), member 1, sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican), sprouty-related, EVH1 domain containing 2, sprouty homolog 1, antagonist of FGF signaling (Drosophila), sarcospan, t-complex-associated-testis-expressed 1-like 1, transmembrane protein 43, thioredoxin domain containing 7, and exportin 1 (CRM1 homolog, yeast).
  • transcripts classifying DCM and NF samples (generated by PAM classification from dataset B and listed in alphabetical order). In addition to expression values for DCM and NF samples of dataset B, the ranking of the single genes for classification of datasets A-D based on PAM-parameters is given. When transcripts were represented by two probe sets, the ranking of both is indicated. Genes marked with a line in dataset A were not present in the cDNA array dataset.
  • Kittleson MM, Minhas KM, Irizarry RA, et al. Gene expression analysis of ischemic and nonischemic cardiomyopathy: shared and distinct genes in the development of heart failure. Physiol Genomics. 2005;21:299-307.

Abstract

The present invention provides a convenient and highly effective microarray tool and method for diagnosing a cardiomyopathy state, especially dilated cardiomyopathy, in a subject, with a high degree of accuracy.

Description

A COMMON GENE EXPRESSION SIGNATURE IN DILATED CARDIOMYOPATHY
This International application claims priority to United States Provisional Application Serial Number 60/832,959, which was filed on July 25, 2007, and which is incorporated herein by reference.
FIELD OF THE INVENTION
The present invention is drawn to a convenient and highly effective microarray tool and method for diagnosing a cardiomyopathy state, especially dilated cardiomyopathy, in a subject, with a high degree of accuracy.
BACKGROUND OF THE INVENTION
Dilated cardiomyopathy (DCM) is characterized by dilatation and impaired contraction of one or both ventricles in the absence of significant coronary artery disease. The incidence of DCM is estimated to be about 5-8 cases per 100,000 individuals, with a prevalence of 36 per 100,000. See Dec et al., "Idiopathic dilated cardiomyopathy," N. Engl. J. Med., 1994;331:1564-75. For this reason, DCM is deemed to be a leading cause of heart failure and cardiac transplantation in Western countries. See Roger et al., "Trends in heart failure incidence and survival in a community-based population," J. Am. Med. Assoc., 2004;292:344-50.
The high morbidity and mortality associated with DCM underscores the need for a better understanding of the underlying molecular events leading to heart failure in DCM. To date, natriuretic peptides are the typical markers for diagnosing and managing heart failure. However, there is significant heterogeneity in expression levels of these natriuretic molecular disease markers that is not explained by left ventricular function alone. See Hwang et al., "Microarray gene expression profiles in dilated and hypertrophic cardiomyopathic end-stage heart failure," Physiol Genomics, 2002;10:31-44; Tan et al., "The gene expression fingerprint of human heart failure," Proc Natl Acad Sci U S A., 2002;99:11387-92; and et al., "Gene expression profiles in end-stage human idiopathic dilated cardiomyopathy: altered expression of apoptotic and cytoskeletal genes," Genomics, 2004;83:281-97.
Transcriptional signature analysis is a powerful technique for identifying potential molecular targets that could ultimately become important for diagnosis and therapy of heart failure. Yet, differences in platform technologies, experimental design, and the biological heterogeneity associated with the use of human tissue samples are obstacles to the successful comparison and integration of results obtained by different microarray studies in heart failure. In addition, substantial regional variation in gene expression exists in mammalian myocardium including atrium, ventricle and septum or left and right side of the heart. See Barth et al., "Functional profiling of human atrial and ventricular gene expression," Pflügers Archiv. 2005;450:201-8; Nabauer et al., "Regional differences in current density and rate-dependent properties of the transient outward current in subepicardial and subendocardial myocytes of human left ventricle," Circulation. 1996;93:168-77; Ramakers et al., "Molecular and electrical characterization of the canine cardiac ventricular septum," J Mol Cell Cardiol. 2005;38:153-61; and Tabibiazar et al., "Transcriptional profiling of the heart reveals chamber-specific gene expression patterns," Circ Res. 2003;93:1193-201.
These differences make it difficult to determine which transcripts are actually related to DCM. Indeed, different etiologies and duration of dilated cardiomyopathy, differences in age, gender and medications, as well as individual course of the disease, all contribute to the variability of gene expression data. In addition, it can be difficult to obtain true "non-failing" human ventricular tissue, as donor hearts may have been exposed to varying degrees of hypoxia or hemodynamic stress, which are known to be potent inducers of chemokine and BNP gene expression. See Goetze et al., "Acute myocardial hypoxia increases BNP gene expression," FASEB J. 2004;18:1928-30.
The present invention takes into account the many variables that can otherwise confound a cardiomyopathy diagnosis, by integrating independent microarray studies from large numbers of failing and non-failing hearts. Accordingly, the present invention provides a convenient and highly effective tool and method for diagnosing a cardiomyopathic state with accuracy.
SUMMARY OF THE INVENTION
A general aspect of the present invention is the detection of particular nucleic acid or protein expression levels in a biological sample, which is useful for preparing an expression profile of that sample, wherein the profile is indicative of a healthy or diseased condition associated with that sample. In one embodiment, the invention provides expression profiles in various conditions associated or symptomatic of cardiomyopathy. Hence, the invention produces expression profiles that are useful for distinguishing, for instance, between cardiomyopathic tissue and healthy tissue, failing and non-failing heart conditions, and ischemic and non-ischemic cardiomyopathy.
Accordingly, an aspect of the invention entails comparing the expression profile from a biological sample from an individual, such as a human patient, with the expression profile of a healthy individual or the expression profile of an individual who does not have the cardiomyopathic condition that the tested individual is believed or suspected of having.
One aspect of the present invention, therefore, provides a collection of nucleic acids, each one of which shares sequence identity with either the sense or antisense strand sequence of a particular target gene or target nucleic acid implicated in a particular cardiomyopathic condition, that is useful for determining the expression levels of those target genes in any given sample. Ideally, each nucleic acid comprises a sequence that is completely or partially complementary to a sequence of the target gene or target nucleic acid. In one embodiment, a nucleic acid sequence of the collection is completely or partially complementary to 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 250, 300, 350, 400, 450, 500, or more than 500 contiguous nucleotides of the target gene or target nucleic acid. A nucleic acid sequence that is completely complementary is one that comprises a sequence that is identical in its complement to the equivalent sequence of the target. That is, the nucleic acid of the collection comprises a sequence that has no nucleotide mismatches compared to the corresponding target sequence.
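The notion of complete versus partial complementarity can be illustrated with a minimal Python sketch. The function names and the 10-nucleotide sequences are hypothetical and chosen only for illustration; they are not taken from SEQ ID NOs: 1-27, and the sketch assumes an ungapped, equal-length, antiparallel alignment of a DNA probe against a DNA target stretch.

```python
# Illustrative only: helper names and sequences are assumptions, not part of the disclosure.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def count_mismatches(probe: str, target_region: str) -> int:
    """Count positions where the probe fails to pair with the target region.

    The probe is compared against the reverse complement of the target,
    i.e. the sequence that would pair with it perfectly.
    """
    if len(probe) != len(target_region):
        raise ValueError("probe and target region must have equal length")
    perfect_probe = reverse_complement(target_region)
    return sum(1 for p, q in zip(probe.upper(), perfect_probe) if p != q)


def is_completely_complementary(probe: str, target_region: str) -> bool:
    """True when there are no nucleotide mismatches against the target."""
    return count_mismatches(probe, target_region) == 0


target = "ATGGCCTTAG"                      # made-up 10-nt target stretch
perfect = reverse_complement(target)       # completely complementary probe
partial = perfect[:-1] + "A"               # one mismatch at the last position
print(perfect, is_completely_complementary(perfect, target))
print(partial, count_mismatches(partial, target))
```

A probe with zero mismatches corresponds to "completely complementary" in the sense used above; any non-zero count corresponds to partial complementarity over that stretch.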
Thus one embodiment of the present invention is an array, such as a micro- or nanoarray, that comprises a collection of nucleic acid molecules, wherein each nucleic acid molecule comprises a nucleotide sequence that is complementary to a target sequence of one of the genes listed below (SEQ ID NOs: 1-27). The collection of nucleic acids that are affixed or immobilized on the surface of the array may include nucleic acids that share sequence identity with some or all of these twenty-seven sequences. In certain instances, the array may comprise multiple nucleic acids each of which
(1) Synuclein alpha (non A4 component of amyloid precursor) ("SNCA") (SEQ ID NO: 1);
(2) Asporin ("ASPN") (SEQ ID NO: 2);
(3) Secreted frizzled-related protein 4 ("SFRP4") (SEQ ID NO: 3);
(4) Pleckstrin homology-like domain, family A, member 1 ("PHLDA1") (SEQ ID NO: 4);
(5) Frizzled-related protein ("FRZB") (SEQ ID NO: 5);
(6) Myosin heavy chain 6, cardiac muscle, alpha (cardiomyopathy, hypertrophic 1) ("MYH6") (SEQ ID NO: 6);
(7) Chemokine (C-C motif) ligand 2 ("CCL2") (SEQ ID NO: 7);
(8) Ornithine decarboxylase 1 ("ODC1") (SEQ ID NO: 8);
(9) Retinoic acid receptor responder (tazarotene induced) 1 ("RARRES1") (SEQ ID NO: 9);
(10) Complement factor H ("CFH") (SEQ ID NO: 10);
(11) Alanine-glyoxylate aminotransferase 2-like 1 ("AGXT2L1") (SEQ ID NO: 11);
(12) Myosin heavy chain 10, non-muscle ("MYH10") (SEQ ID NO: 12);
(13) Ficolin (collagen/ fibrinogen domain containing) 3 (Hakata antigen) ("FCN3") (SEQ ID NO: 13);
(14) S100 calcium binding protein A8 ("S100A8") (SEQ ID NO: 14);
(15) Corin serine peptidase ("CORIN") (SEQ ID NO: 15);
(16) Natriuretic peptide precursor A ("NPPA") (SEQ ID NO: 16);
(17) Procollagen C-endopeptidase enhancer 2 ("PCOLCE2") (SEQ ID NO: 17);
(18) Natriuretic peptide precursor B ("NPPB") (SEQ ID NO: 18);
(19) Activating transcription factor 3 ("ATF3") (SEQ ID NO: 19);
(20) Inhibitor of DNA binding 4, dominant negative helix-loop-helix protein ("ID4") (SEQ ID NO: 20);
(21) Sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 1 ("SPOCK1") (SEQ ID NO: 21);
(22) Connective tissue growth factor ("CTGF") (SEQ ID NO: 22);
(23) G0/G1 switch 2 ("G0S2") (SEQ ID NO: 23);
(24) Kelch-like 3 (Drosophila) ("KLHL3") (SEQ ID NO: 24);
(25) Zinc finger and BTB domain containing 16 ("ZBTB16") (SEQ ID NO: 25);
(26) AE binding protein 1 ("AEBP1") (SEQ ID NO: 26); and
(27) ETS variant gene 5 (ets-related molecule) ("ETV5") (SEQ ID NO: 27).
In one embodiment, each of the nucleic acid molecules in the collection on the microarray comprises a different sequence as compared to each of the other nucleic acid molecules in the collection.
In another embodiment, each of the nucleic acid molecules in the collection comprises a complementary sequence to a target gene that is different from the complementary target gene sequence in each of the other nucleic acid molecules in the collection.
In one embodiment, the nucleic acid molecule is an oligonucleotide or a probe. In one embodiment, the oligonucleotide or probe is labeled with a moiety to promote signal detection after the oligonucleotide or probe hybridizes to its corresponding target sequence.
In another embodiment, the collection comprises nucleic acid molecules that comprise complementary sequences to at least the following sequences: a myosin heavy polypeptide 10 (non-muscle) gene, a synuclein, alpha gene, a putative lymphocyte G0/G1 switch gene, an ets variant gene 5, an AE binding protein 1 gene, a kelch-like 3 gene, a zinc finger and BTB domain containing 16 gene, and a procollagen C-endopeptidase enhancer 2 gene.
Another aspect of the present invention is directed to a method for diagnosing a cardiomyopathic state in a subject comprising: (i) exposing the microarray of claim 1 to an isolated biological sample, (ii) determining which nucleic acid molecules of the collection hybridize to their corresponding target gene sequences, and (iii) comparing that hybridization pattern to the hybridization pattern of a control for the same genes, wherein a difference between the two patterns is indicative that the individual has a cardiomyopathic disease.
In one embodiment, the biological sample is blood or a tissue biopsy. In one embodiment, the tissue biopsy is a heart muscle biopsy.
In another embodiment of this method, the cardiomyopathic disease is dilated cardiomyopathy or idiopathic dilated cardiomyopathy.
Another aspect of the present invention is directed to an assay for diagnosing a cardiomyopathic state in a subject, comprising: (i) determining the gene expression levels of one or more of the genes or nucleic acids selected from the group consisting of:
Synuclein alpha (non A4 component of amyloid precursor) ("SNCA"), Asporin ("ASPN"), Secreted frizzled-related protein 4 ("SFRP4"), Pleckstrin homology-like domain, family A, member 1 ("PHLDA1"), Frizzled-related protein ("FRZB"), Myosin heavy chain 6, cardiac muscle, alpha (cardiomyopathy, hypertrophic 1) ("MYH6"), Chemokine (C-C motif) ligand 2 ("CCL2"), Ornithine decarboxylase 1 ("ODC1"), Retinoic acid receptor responder (tazarotene induced) 1 ("RARRES1"), Complement factor H ("CFH"), Alanine-glyoxylate aminotransferase 2-like 1 ("AGXT2L1"), Myosin heavy chain 10, non-muscle ("MYH10"), Ficolin (collagen/fibrinogen domain containing) 3 (Hakata antigen) ("FCN3"), S100 calcium binding protein A8 ("S100A8"), Corin serine peptidase ("CORIN"), Natriuretic peptide precursor A ("NPPA"), Procollagen C-endopeptidase enhancer 2 ("PCOLCE2"), Natriuretic peptide precursor B ("NPPB"), Activating transcription factor 3 ("ATF3"), Inhibitor of DNA binding 4, dominant negative helix-loop-helix protein ("ID4"), Sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 1 ("SPOCK1"), Connective tissue growth factor ("CTGF"), G0/G1 switch 2 ("G0S2"), Kelch-like 3 (Drosophila) ("KLHL3"), Zinc finger and BTB domain containing 16 ("ZBTB16"), AE binding protein 1 ("AEBP1"), and ETS variant gene 5 (ets-related molecule) ("ETV5"),
wherein the gene or nucleic acid is in a biological sample isolated from the subject, and (ii) comparing the expression levels to a biological sample from a healthy individual, wherein a difference in the expression levels indicates that the subject has a cardiomyopathy disease. In one embodiment, the cardiomyopathy disease is dilated cardiomyopathy or idiopathic dilated cardiomyopathy.
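The comparison step (ii) of this assay can be sketched in Python as follows. This is an assumption-laden illustration rather than the method of the invention: the log2 fold-change measure, the cut-off of 1.0, the function names and all expression values are made up for the example, and the text above does not prescribe any particular threshold.

```python
import math


def log2_fold_change(subject_level: float, control_level: float) -> float:
    """log2 ratio of subject to control expression (both must be positive)."""
    return math.log2(subject_level / control_level)


def differing_genes(subject: dict, control: dict, min_abs_log2fc: float = 1.0) -> dict:
    """Return genes whose absolute log2 fold change meets the cut-off.

    Only genes measured in both samples are compared. The cut-off is a
    hypothetical choice for illustration.
    """
    flagged = {}
    for gene in sorted(set(subject) & set(control)):
        lfc = log2_fold_change(subject[gene], control[gene])
        if abs(lfc) >= min_abs_log2fc:
            flagged[gene] = lfc
    return flagged


# Invented expression values in arbitrary units.
subject_sample = {"NPPA": 800.0, "MYH6": 50.0, "CTGF": 110.0}
healthy_sample = {"NPPA": 100.0, "MYH6": 200.0, "CTGF": 100.0}

print(differing_genes(subject_sample, healthy_sample))
```

In this toy input, two of the three genes exceed the cut-off, one in each direction, while the third is treated as unchanged.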
In one embodiment, the expression levels that are determined are the levels of either the respective gene transcripts or proteins.
In yet another embodiment, the expression levels of one or more, at least about two, at least about three, at least about four, at least about five, at least about six, at least about seven, at least about eight, at least about nine, at least about ten, at least about eleven, at least about twelve, at least about thirteen, at least about fourteen, at least about fifteen, at least about sixteen, at least about seventeen, at least about eighteen, at least about nineteen, at least about twenty, at least about twenty one, at least about twenty two, at least about twenty three, at least about twenty-four, at least about twenty five, at least about twenty six, or about twenty seven genes are determined in the subject's biological sample. In another embodiment, the expression levels of all 27 genes are determined in the subject's biological sample and compared to the expression levels of the same 27 genes in the healthy subject's biological sample.
In one embodiment of this assay the biological sample is blood or a tissue biopsy. In another embodiment, the tissue biopsy is cardiac muscle.
In one embodiment of this assay the step of determining gene expression levels is performed by at least one of a method selected from the group consisting of microarray, PCR, RT-PCR, TaqMan RT-PCR, Northern Blot, Western Blot, antibody detection, ELISA, or any combination thereof.
In addition to diagnosing whether a subject has dilated cardiomyopathy, the present invention also encompasses the diagnosis of other cardiomyopathies using the compositions and methods described herein. For instance, the present invention encompasses the diagnosis or prognosis of ischemic cardiomyopathy (ICM) using the genes, detection methods, and assays disclosed herein. Both the foregoing summary of the invention and the following brief description of the drawings and the detailed description of the invention are exemplary and explanatory and are intended to provide further details of the invention as claimed. Other objects, advantages, and novel features will be readily apparent to those skilled in the art from the following detailed description of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1: Shows a functional analysis based on Gene Ontology for selected gene classes comparing up-regulated genes in DCM (yellow bars) and down-regulated (green bars) genes in DCM of Dataset A (open bars) and B (striped bars). Differences in gene classes marked by an asterisk were statistically significant (Fisher's exact test, p<0.05) according to "FatiGO" (20).
Figure 2: Shows a Prediction Analysis for Microarrays ("PAM") classification. In the first step, PAM classification was applied to all four datasets separately. Very low misclassification rates were found in datasets A, B and D for the classification of non-failing (NF) vs. DCM samples, whereas the classification algorithm did not show any power in Dataset C. In the second step, the procedure was repeated with the smallest gene signature obtained from Dataset B, now achieving more than 90% accuracy for classifying DCM and NF samples across all studies, including Dataset C.
Figure 3: Shows mean expression ± S.E.M. of pro-BNP in NF (black bars) and DCM samples (red bars) in datasets A-D. Statistical comparison was carried out by Student's t-test.
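PAM is generally described as a nearest shrunken centroid classifier. The following Python sketch illustrates that general idea on invented toy data; it is a simplified illustration, not the software, parameters or data used in the studies. The function names, the threshold `delta`, the use of the median pooled standard deviation as the offset `s0`, and the form of the class weight `m_k` are all assumptions made for this example.

```python
import math
from statistics import median


def fit_shrunken_centroids(X, y, delta):
    """Fit a simplified nearest shrunken centroid model.

    X: list of samples, each a list of per-gene expression values.
    y: list of class labels, one per sample.
    delta: soft-threshold amount (hypothetical tuning parameter).
    """
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    members = {k: [row for row, lab in zip(X, y) if lab == k] for k in classes}
    overall = [sum(row[i] for row in X) / n for i in range(p)]
    centroid = {k: [sum(r[i] for r in rows) / len(rows) for i in range(p)]
                for k, rows in members.items()}
    # Pooled within-class standard deviation for each gene.
    s = []
    for i in range(p):
        ss = sum((r[i] - centroid[k][i]) ** 2 for k in classes for r in members[k])
        s.append(math.sqrt(ss / (n - len(classes))))
    s0 = median(s)  # offset so low-variance genes do not dominate
    shrunken = {}
    for k in classes:
        m_k = math.sqrt(1.0 / len(members[k]) - 1.0 / n)
        row = []
        for i in range(p):
            scale = m_k * (s[i] + s0)
            d = (centroid[k][i] - overall[i]) / scale
            # Soft thresholding: shrink toward zero by delta, clip at zero.
            d_shrunk = math.copysign(max(abs(d) - delta, 0.0), d)
            row.append(overall[i] + scale * d_shrunk)
        shrunken[k] = row
    priors = {k: len(members[k]) / n for k in classes}
    return {"centroids": shrunken, "overall": overall,
            "scale": [si + s0 for si in s], "priors": priors}


def predict(model, x):
    """Assign x to the class with the smallest discriminant score."""
    def score(k):
        dist = sum(((xi - ci) / sc) ** 2
                   for xi, ci, sc in zip(x, model["centroids"][k], model["scale"]))
        return dist - 2.0 * math.log(model["priors"][k])
    return min(model["centroids"], key=score)


# Toy data: gene 0 separates the classes, gene 1 is uninformative.
X = [[1.0, 5.0], [1.2, 5.1], [0.8, 4.9], [3.0, 5.1], [3.2, 4.9], [2.8, 5.0]]
y = ["NF", "NF", "NF", "DCM", "DCM", "DCM"]
model = fit_shrunken_centroids(X, y, delta=1.0)
print(predict(model, [2.9, 5.0]), predict(model, [1.1, 5.0]))
```

In the toy data the uninformative gene is shrunk all the way to the overall centroid and so no longer contributes to the class decision, which is how this kind of classifier arrives at a small gene signature.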
DETAILED DESCRIPTION
The present invention provides methods and compositions for identifying genetic biomarkers that are useful for classifying and diagnosing cardiomyopathy disease states. The present invention identifies genes and biological processes that are either known to be involved with, or are implicated in, dilated cardiomyopathy (DCM). By identifying and collating a subset of genes that are differentially expressed in failing and non-failing (NF) hearts, the present invention creates a genetic "signature" that can be used in diagnostic tests to evaluate a subject's condition that may be symptomatic of a cardiomyopathy, such as DCM.
A. Definitions
As used herein, the singular forms "a," "an," and "the" designate both the singular and the plural, unless expressly stated to designate the singular only. specific conformation or aggregative state of a protein.
The term "protein" is understood to include the terms "polypeptide" and "peptide" (which, at times, may be used interchangeably herein) within its meaning.
As used herein, "about" will be understood by persons of ordinary skill in the art and will vary to some extent on the context in which it is used. If there are uses of the term which are not clear to persons of ordinary skill in the art given the context in which it is used, "about" will mean up to plus or minus 10% of the particular term.
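As a small worked example of the fallback definition, the range covered by "about" at plus or minus 10% can be computed as below. The function name is hypothetical, and the sketch applies only to the fallback case where context does not otherwise fix the meaning.

```python
def about_range(value: float, tolerance: float = 0.10) -> tuple:
    """Return the (low, high) bounds for 'about value' at +/- tolerance."""
    margin = abs(value) * tolerance
    return (value - margin, value + margin)


# "about 20" under the fallback definition spans 18 to 22.
print(about_range(20))
```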
"Recombinant proteins or polypeptides" refer to proteins or polypeptides produced by recombinant DNA techniques, i.e., produced from cells, microbial or mammalian, transformed by an exogenous recombinant DNA expression construct encoding the desired protein or polypeptide. Proteins or polypeptides expressed in most bacterial cultures will typically be free of glycan. Proteins or polypeptides expressed in yeast may have a glycosylation pattern different from that expressed in mammalian cells.
"Native" or "naturally occurring" proteins or polypeptides refer to proteins or polypeptides recovered from a source occurring in nature. A DNA or polynucleotide "coding sequence" is a DNA or polynucleotide sequence that is transcribed into mRNA and translated into a polypeptide in a host cell when placed under the control of appropriate regulatory sequences. The boundaries of the coding sequence are the start codon at the 5' N-terminus and the translation stop codon at the 3' C-terminus. A coding sequence can include prokaryotic sequences, cDNA from eukaryotic mRNA, genomic DNA sequences from eukaryotic DNA, and synthetic DNA sequences. A transcription termination sequence will usually be located 3' to the coding sequence.
"DNA or polynucleotide sequence" is a heteropolymer of deoxyribonucleotides (bases adenine, guanine, thymine, cytosine). DNA or polynucleotide sequences can be assembled from synthetic cDNA-derived DNA fragments and short oligonucleotide linkers.
The terms "analog", "fragment", "derivative", and "variant", when referring to the nucleic acids of this invention mean analogs, fragments, derivatives, and variants of such nucleotides having, for example, at least about 60% sequence identity, at least about 70% sequence identity, at least about 80% sequence identity, at least about 90% sequence identity, at least about 91% sequence identity, at least about 92% sequence identity, at least about 93% sequence identity, at least about 94% sequence identity, at least about 95% sequence identity, at least about 96% sequence identity, at least about 97% sequence identity, at least about 98% sequence identity, at least about 99% sequence identity, or at least about 100% sequence identity to the native or naturally occurring nucleic acid, as described herein.
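Percent sequence identity of the kind recited above can be illustrated with a short Python sketch. It assumes two already-aligned, equal-length, ungapped sequences, which is a simplification; real comparisons normally rely on an alignment program. The function name and the sequences are invented for illustration.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical positions between two aligned, equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


native = "ATGGCCTTAG"    # made-up native sequence
variant = "ATGGCCTTAC"   # made-up variant differing at one position
print(percent_identity(native, variant))
```

With nine of ten positions identical, this toy variant sits at the 90% identity level.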
"Similarity" between two polynucleotides is determined by comparing the amino acid sequence corresponding to each polynucleotide to the amino acid sequence corresponding to the second polynucleotide. An amino acid of one amino acid sequene is similar to the corresponding amino acid of a second amino acid sequence if it is identical or a conservative amino acid substitution. Conservative substitutions include those described in Dayhoff, M.O., ed., The Atlas of Protein Sequence and Structure 5, National Biomedical Research Foundation, Washington, D. C. (1978), and in Argos, P. (1989) EMBOJ. 8:779-785. For example, amino acids belonging to one of the following groups represent conservative changes or substitutions:
-Ala, Pro, GIy, GIn, Asn, Ser, Thr: -Cys, Ser, Tyr, Thr; -VaI, lie, Leu, Met, Ala, Phe; -Lys, Arg, His; -Phe, Tyr, Trp, His; and
-Asp, GIu.
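The similarity rule can be sketched in Python using the six groups listed above. The three-letter codes follow the standard amino acid abbreviations (the garbled entries in the list are read as Gly, Gln, Val, Ile and Glu, which is an assumption about the intended text), and the function name is hypothetical.

```python
# Conservative substitution groups as listed in the text (standard 3-letter codes).
CONSERVATIVE_GROUPS = [
    {"Ala", "Pro", "Gly", "Gln", "Asn", "Ser", "Thr"},
    {"Cys", "Ser", "Tyr", "Thr"},
    {"Val", "Ile", "Leu", "Met", "Ala", "Phe"},
    {"Lys", "Arg", "His"},
    {"Phe", "Tyr", "Trp", "His"},
    {"Asp", "Glu"},
]


def is_similar(residue_a: str, residue_b: str) -> bool:
    """True if the residues are identical or share at least one group."""
    if residue_a == residue_b:
        return True
    return any(residue_a in g and residue_b in g for g in CONSERVATIVE_GROUPS)


print(is_similar("Asp", "Glu"))  # same group
print(is_similar("Lys", "Asp"))  # no shared group
```

Because some residues appear in more than one group, similarity defined this way is not transitive: Ser pairs with both Ala and Cys, but Ala and Cys share no group.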
"Mammal" includes humans and domesticated animals, such as cats, dogs, swine, cattle, sheep, goats, horses, rabbits, and the like.
All other technical terms used herein have the same meaning as is commonly used by those skilled in the art to which the present invention belongs.
B. Embodiments of the Invention
One aspect of the present invention is directed to a collection of 27 genes that represent useful targets for successfully distinguishing non-failing heart samples from dilated cardiomyopathy heart samples with over 90% accuracy. That is, identifying the presence or expression level of one or more or all of these 27 genes can be indicative of a cardiomyopathy disease phenotype. In other embodiments of the invention, expression levels of any combination of the 27 genes can be measured, such as the expression levels of one or more, at least about two, at least about three, at least about four, at least about five, at least about six, at least about seven, at least about eight, at least about nine, at least about ten, at least about eleven, at least about twelve, at least about thirteen, at least about fourteen, at least about fifteen, at least about sixteen, at least about seventeen, at least about eighteen, at least about nineteen, at least about twenty, at least about twenty one, at least about twenty two, at least about twenty three, at least about twenty-four, at least about twenty five, at least about twenty six, or about twenty seven genes can be determined in the subject's biological sample. In another embodiment, the expression levels of all 27 genes are determined in the subject's biological sample and compared to the expression levels of the same 27 genes in the healthy subject's biological sample.
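For orientation, the per-dataset misclassification counts reported elsewhere in this document (one of 28 samples in Dataset A, one of twelve in Dataset B, and three of 41 in Dataset D) can be converted into rates and accuracies. The short Python sketch below does only that arithmetic; the variable names are illustrative, and Dataset C is omitted because no count is reported for it.

```python
# (misclassified, total) per dataset, as reported in the text.
counts = {"A": (1, 28), "B": (1, 12), "D": (3, 41)}

rates = {name: errors / total for name, (errors, total) in counts.items()}
accuracy = {name: 1.0 - rate for name, rate in rates.items()}

for name in sorted(counts):
    print(f"Dataset {name}: error {rates[name]:.1%}, accuracy {accuracy[name]:.1%}")
```

All three per-dataset error rates fall below 10%.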
The group of 27 genes includes the following, which are categorized based on existing knowledge of their individual involvement in DCM.
Group (1): Genes that are known cardiomyopathy biomarkers
natriuretic peptide precursor B (BNP)
natriuretic peptide precursor A (NPPA)
Group (2): Genes with a strong association with cardiomyopathy
corin (CORIN)
chemokine (C-C motif) ligand 2 (CCL2)
myosin, heavy polypeptide 6, cardiac, alpha (MYH6)
activating transcription factor 3 (ATF3)
connective tissue growth factor (CTGF)
Group (3): Genes that are not so strongly associated with cardiomyopathy
secreted frizzled-related protein 4 (SFRP4)
asporin (ASPN)
frizzled-related protein (FRZB)
pleckstrin homology-like domain, family A, member 1 (PHLDA1)
ornithine decarboxylase 1 (ODC1)
retinoic acid receptor responder 1 (RARRES1)
alanine-glyoxylate aminotransferase 2-like 1 (AGXT2L1)
complement factor H-related 3 (CFHL3)
sparc/osteonectin proteoglycan (SPOCK)
S100 calcium binding protein A8 (S100A8)
ficolin 3 (FCN3)
inhibitor of DNA binding 4 (ID4)
Group (4): Genes that hitherto have not been associated with any cardiomyopathy
myosin, heavy polypeptide 10, non-muscle (MYH10)
synuclein, alpha (SNCA)
putative lymphocyte G0/G1 switch gene (G0S2)
ets variant gene 5 (ETV5)
AE binding protein 1 (AEBP1)
kelch-like 3 (KLHL3)
zinc finger and BTB domain containing 16 (ZBTB16)
procollagen C-endopeptidase enhancer 2 (PCOLCE2)
The present invention provides an arrangement of markers for these 27 genes, and the markers can collectively be used to determine whether a particular biological sample is indicative of cardiomyopathic heart disease. The abbreviations are used in Tables elsewhere in this application.
The present invention encompasses the use of markers to all 27 genes, as well as the use of markers to subsets of the 27 genes. For instance, markers to one or more of the Group 4 genes may be arranged or combined alongside one or more markers of the Group 1 genes. Hence, the present invention contemplates various combinations of gene markers based on the four groups outlined above, such as one or more markers of each Group according to the following exemplary combinations:
(a) Groups 1 and 2.
(b) Groups 1 and 3.
(c) Groups 1 and 4.
(d) Groups 1, 2, and 3.
(e) Groups 1, 2, 3, and 4.
(f) Groups 2 and 3.
(g) Groups 2 and 4.
(h) Groups 2, 3, and 4.
(i) Groups 3 and 4.
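Because the combinations (a) through (i) are described as exemplary, it can be useful to see how they relate to the full set of multi-group combinations. The Python sketch below enumerates every combination of two or more of the four groups and reports those not explicitly listed; it is an illustration of the counting only, and whether the unlisted combinations are intended is not stated in the text.

```python
from itertools import combinations

groups = (1, 2, 3, 4)

# Every combination of two, three or four groups.
all_combos = [c for size in range(2, 5) for c in combinations(groups, size)]

# Combinations (a) through (i) as listed in the text.
listed = [(1, 2), (1, 3), (1, 4), (1, 2, 3), (1, 2, 3, 4),
          (2, 3), (2, 4), (2, 3, 4), (3, 4)]

not_listed = [c for c in all_combos if c not in listed]
print(len(all_combos), len(listed), not_listed)
```

Eleven multi-group combinations exist in total, of which nine are listed; the two not spelled out are Groups 1, 2, and 4 and Groups 1, 3, and 4.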
Types of microarrays
Microarrays are useful for identifying cancer-specific genes and inflammatory-specific genes. See DeRisi et al., Nat. Genet. 14(4):457-60 (1996); and Heller et al., Proc. Natl. Acad. Sci. USA 94(6):2150-55 (1997). A microarray may typically be composed of a number of unique, single-stranded nucleic acid sequences, usually either synthetic antisense oligonucleotides or fragments of cDNAs, fixed to a solid support. Hence, the term "microarray" connotes an array of polynucleotides or oligonucleotides that are placed, arranged, or otherwise affixed on to a substrate, such as paper, nylon or other type of membrane, filter, chip, glass slide, or any other such suitable solid support. According to the present invention, one microarray may comprise one or more combinations of nucleic acid sequences that share sequence identity with, or share sequence identity with the complement of, one or more of the 27 genes denoted above characterized into the four groups.
Nucleic acids bound to a surface
One embodiment of the invention uses solid support-based oligonucleotide hybridization methods to detect gene expression. Solid support-based methods suitable for practicing the present invention are widely known and are described, for example, in PCT application WO 95/11755; Huber et al., Anal. Biochem. 299:24 (2001); Meiyanto et al., Biotechniques 31:406 (2001); and Relogio et al., Nucleic Acids Res. 30:e51 (2002). Any solid surface to which oligonucleotides can be bound, covalently or non-covalently, can be used. Such solid supports include, but are not limited to, filters, polyvinyl chloride dishes, and silicon or glass based chips.
In certain embodiments, the nucleic acid molecule can be directly bound to the solid support or bound through a linker arm, which is typically positioned between the nucleic acid sequence and the solid support. A linker arm that increases the distance between the nucleic acid molecule and the substrate can increase hybridization efficiency. There are a number of ways to position a linker arm. In one common approach, the solid support is coated with a polymeric layer that provides linker arms with many reactive ends or sites. A common example of this type is glass slides coated with polylysine (see U.S. Patent No. 5667976), which are commercially available. Alternatively, the linker arm may be synthesized as part of or conjugated to the nucleic acid molecule, and then this complex is bonded to the solid support. For example, one approach takes advantage of the extremely high affinity biotin-streptavidin interaction. The biotin-streptavidin interaction is stable enough to withstand stringent washing conditions and is not cleaved by laser pulses used in some detection systems, such as matrix-assisted laser desorption/ionization time of flight (MALDI-TOF) mass spectrometry. Therefore, streptavidin may be covalently attached to a solid support, and the nucleic acid molecule is labeled with a biotin group (or vice versa). The biotinylated nucleic acid molecule effectively sticks wherever it is placed on the streptavidin-covered support surface. In one version of this method, an amino-coated silicon wafer is reacted with the N-hydroxysuccinimide ester of biotin and complexed with streptavidin. Biotinylated oligonucleotides are bound to the surface at a concentration of about 20 fmol DNA per mm².
Alternatively, one may directly bind DNA to the support using carbodiimides, for example. In one such method, the support is coated with hydrazide groups, then treated with carbodiimide. Carboxy-modified nucleic acid molecules are then coupled to the treated support. Epoxide-based chemistries are also employed with amine-modified oligonucleotides. Other chemistries for coupling nucleic acid molecules to solid substrates are known to those of skill in the art.
Positioning of nucleic acids on a surface
The nucleic acid molecules are typically delivered to the substrate material. Because of the miniaturization of the arrays, delivery techniques should be capable of positioning very small amounts of liquids (e.g., less than 1 nanoliter) in very small regions (e.g., 100 µm diameter dots), very close to one another (e.g., 250 µm separation), and should be amenable to automation. Several techniques and apparatus are available to achieve such delivery. Among these are mechanical mechanisms (e.g., arrayers from GeneticMicroSystems, MA, USA) and ink-jet technology. Very fine pipettes may also be used. Other formats are also suitable within the context of this invention. For example, a 96-well format with fixation of the nucleic acids to a nitrocellulose or nylon membrane may also be employed.
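To give a feel for the scale implied by the example dimensions above (dots of about 100 micrometres in diameter, about 250 micrometres apart), the following minimal sketch estimates spot density. It assumes a square grid and treats the separation as a centre-to-centre pitch; both assumptions, and the function names, are illustrative and do not come from the text.

```python
import math


def spots_per_cm2(pitch_um: float) -> float:
    """Spots per square centimetre on a square grid.

    Assumption: `pitch_um` is the centre-to-centre spacing in micrometres.
    """
    spots_per_cm = 10_000 / pitch_um  # 1 cm = 10,000 micrometres
    return spots_per_cm ** 2


def covered_fraction(diameter_um: float, pitch_um: float) -> float:
    """Fraction of the surface covered by circular spots on that grid."""
    spot_area = math.pi * (diameter_um / 2) ** 2
    return spot_area / pitch_um ** 2


density = spots_per_cm2(250)           # 40 spots per cm in each direction
coverage = covered_fraction(100, 250)  # roughly one eighth of the surface
```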
Blocking non-specific sites on a surface
After the nucleic acid molecules have been bound to the solid support, it is often useful to block reactive sites on the solid support that are not consumed in binding to the nucleic acid molecule. Otherwise, the probes will, to some extent, bind directly to the solid support itself, giving rise to so-called non-specific binding. Non-specific binding can sometimes hinder the ability to detect low levels of specific binding. A variety of effective blocking agents (e.g., milk powder, serum albumin or other proteins with free amine groups, polyvinylpyrrolidine) can be used and others are known to those skilled in the art (see, for example U.S. Patent No. 5994065). The choice depends at least in part upon the binding chemistry.
Use of oligonucleotides
Methods of making and using oligonucleotide microarrays suitable for diagnostic use are disclosed in U.S. Pat. Nos. 5,492,806; 5,525,464; 5,589,330; 5,695,940; 5,849,483; 6,018,041; 6,045,996; 6,136,541; 6,142,681; 6,156,501; 6,197,506; 6,223,127; 6,225,625; 6,229,911; 6,239,273; WO 00/52625; WO 01/25485; WO 01/29259, which are all incorporated herein by reference.
An oligonucleotide may preferably be about 6 to about 60 nucleotides in length, or any length in between these two parameters. In other embodiments of the invention, an oligonucleotide may be about 15 to about 30 nucleotides in length, or about 20 to about 25 nucleotides in length. For a certain type of microarray, it may be preferable to use oligonucleotides which are about 7 to about 10 nucleotides in length. The microarray may comprise oligonucleotides which cover the known 5', or 3', sequence, sequential oligonucleotides which cover the full length sequence; or unique oligonucleotides selected from particular areas along the length of the sequence. Polynucleotides used in the microarray may be oligonucleotides that are specific to a gene or genes of interest in which at least a fragment of the sequence is known or that are specific to one or more unidentified cDNAs which are common to a particular cell type, development or disease state.
One embodiment of the present invention, therefore, uses oligonucleotide arrays, i.e., microarrays, to simultaneously observe the expression of a number of genes or gene products. Oligonucleotide arrays comprise two or more oligonucleotide probes provided on a solid support, wherein each probe occupies a unique location on the support. The location of each probe may be predetermined, such that detection of a detectable signal at a given location is indicative of hybridization to an oligonucleotide probe of a known identity. Each predetermined location can contain more than one molecule of a probe, but each molecule within the predetermined location has an identical sequence. Such predetermined locations are termed features. There can be, for example, 2, 10, 100, 1,000, 2,000, or 5,000 or more such features on a single solid support. In one embodiment, each oligonucleotide is located at a unique position on an array at least 2, at least 3, at least 4, at least 5, at least 6, or at least 10 times.
Oligonucleotide probe arrays for detecting gene expression can be made and used according to conventional techniques described, for example, in Lockhart et al., Nat. Biotechnol. 14:1675 (1996), McGall et al., Proc. Natl. Acad. Sci. USA 93:13555 (1996), and Hughes et al., Nature Biotechnol. 19:342 (2001). A variety of oligonucleotide array designs are suitable for the practice of this invention.
In one embodiment the one or more oligonucleotides include a plurality of oligonucleotides that each hybridize to a different gene expressed in a particular tissue type. In one embodiment, oligonucleotides of the present invention hybridize to nucleic acid sequences of any of the following genes: Synuclein alpha (non A4 component of amyloid precursor) ("SNCA"), Asporin ("ASPN"), Secreted frizzled-related protein 4 ("SFRP4"), Pleckstrin homology-like domain, family A, member 1 ("PHLDA1"), Frizzled-related protein ("FRZB"), Myosin heavy chain 6, cardiac muscle, alpha (cardiomyopathy, hypertrophic 1) ("MYH6"), Chemokine (C-C motif) ligand 2 ("CCL2"), Ornithine decarboxylase 1 ("ODC1"), Retinoic acid receptor responder (tazarotene induced) 1 ("RARRES1"), Complement factor H ("CFH"), Alanine-glyoxylate aminotransferase 2-like 1 ("AGXT2L1"), Myosin heavy chain 10, non-muscle ("MYH10"), Ficolin (collagen/fibrinogen domain containing) 3 (Hakata antigen) ("FCN3"), S100 calcium binding protein A8 ("S100A8"), Corin serine peptidase ("CORIN"), Natriuretic peptide precursor A ("NPPA"), Procollagen C-endopeptidase enhancer 2 ("PCOLCE2"), Natriuretic peptide precursor B ("NPPB"), Activating transcription factor 3 ("ATF3"), Inhibitor of DNA binding 4, dominant negative helix-loop-helix protein ("ID4"), Sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 1 ("SPOCK1"), Connective tissue growth factor ("CTGF"), G0/G1 switch 2 ("G0S2"), Kelch-like 3 (Drosophila) ("KLHL3"), Zinc finger and BTB domain containing 16 ("ZBTB16"), AE binding protein 1 ("AEBP1"), and ETS variant gene 5 (ets-related molecule) ("ETV5"). In another embodiment, oligonucleotides of the present invention hybridize to nucleic acid sequences of any of the sequences of SEQ ID NOs.: 1-27.
Labeling nucleic acids and oligonucleotides
Generally, a detectable molecule, also referred to herein as a label, will be incorporated or added to an array's nucleic acid sequences. Many types of molecules can be used within the context of this invention. Such molecules include, but are not limited to, fluorochromes, chemiluminescent molecules, chromogenic molecules, radioactive molecules, mass spectrometry tags, proteins, and the like. Other labels will be readily apparent to one skilled in the art. Indirect detection can also be used within the context of this invention. Proteins and other molecules are available that will bind to double-stranded DNA but not to single-stranded DNA. Thus, hybridization can be measured. In one embodiment, therefore, a nucleic acid sample obtained from an individual can be amplified and, optionally, labeled with a detectable label. Any method of nucleic acid amplification and any detectable label suitable for such purpose can be used. For example, amplification reactions can be performed using, e.g., Ambion's MessageAmp, which creates "antisense" RNA or "aRNA" (complementary in nucleic acid sequence to the RNA extracted from the sample tissue). The RNA can optionally be labeled using CyDye fluorescent labels. During the amplification step, aaUTP is incorporated into the resulting aRNA. The CyDye fluorescent labels are coupled to the aaUTPs in a non-enzymatic reaction. Subsequent to the amplification and labeling steps, labeled amplified antisense RNAs are precipitated and washed with appropriate buffer, and then assayed for purity. For example, purity can be assayed using a NanoDrop spectrophotometer. The nucleic acid sample is then contacted with an oligonucleotide array having, attached to a solid substrate (a "microarray slide"), oligonucleotide sample probes capable of hybridizing to nucleic acids of interest which may be present in the sample.
The step of contacting is performed under conditions where hybridization can occur between the nucleic acids of interest and the oligonucleotide probes present on the array. The array is then washed to remove non-specifically bound nucleic acids, and the signals from the labeled molecules that remain hybridized to oligonucleotide probes on the solid substrate are detected. The step of detection can be accomplished using any method appropriate to the type of label used. For example, the step of detecting can be accomplished using a laser scanner and detector. For example, one can use an Axon scanner, which optionally uses GenePix Pro software to analyze the position of the signal on the microarray slide. Data from one or more microarray slides can be analyzed by any appropriate method known in the art.
Oligonucleotide probes used in the methods of the present invention, including microarray techniques, can be generated using PCR. PCR primers used in generating the probes are chosen, for example, based on the sequences of SEQ ID NOs: 1-27.
In one embodiment, oligonucleotide control probes also are used. Exemplary control probes can fall into at least one of three categories referred to herein as (1) normalization controls, (2) expression level controls and (3) negative controls. In microarray methods, one or more of these control probes may be provided on the array with the inventive cell cycle gene-related oligonucleotides.
Normalization controls correct for dye biases, tissue biases, dust, slide irregularities, malformed slide spots, etc. Normalization controls are oligonucleotide or other nucleic acid probes that are complementary to labeled reference oligonucleotides or other nucleic acid sequences that are added to the nucleic acid sample to be screened. The signals obtained from the normalization controls, after hybridization, provide a control for variations in hybridization conditions, label intensity, reading efficiency and other factors that can cause the signal of a perfect hybridization to vary between arrays. In one embodiment, signals (e.g., fluorescence intensity or radioactivity) read from all other probes used in the method are divided by the signal from the control probes, thereby normalizing the measurements.
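The division described above can be sketched as follows. This is a minimal illustration, not the procedure of the invention: it assumes that, when several normalization control probes are present, their signals are averaged before dividing, and the function name and the input layout (a dictionary of probe name to raw signal) are hypothetical.

```python
from statistics import mean


def normalize_signals(probe_signals, control_signals):
    """Divide each probe signal by the normalization-control signal.

    Assumption: multiple control signals are summarized by their mean.
    `probe_signals` maps a probe name to its raw signal on one array.
    """
    if not control_signals:
        raise ValueError("at least one normalization control signal is required")
    control = mean(control_signals)
    if control <= 0:
        raise ValueError("control signal must be positive")
    return {probe: signal / control for probe, signal in probe_signals.items()}


# Made-up raw intensities for two of the genes named in the text.
raw = {"NPPA": 1200.0, "CORIN": 300.0}
normalized = normalize_signals(raw, control_signals=[400.0, 600.0])
```

Because every array is divided by its own control signal, the resulting ratios can be compared between arrays even when overall label intensity or reading efficiency differs.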
Virtually any probe can serve as a normalization control. Hybridization efficiency varies, however, with base composition and probe length. Preferred normalization probes are selected to reflect the average length of the other probes being used, but they also can be selected to cover a range of lengths. Further, the normalization control(s) can be selected to reflect the average base composition of the other probes being used. In one embodiment, only one or a few normalization probes are used, and they are selected such that they hybridize well (i.e., without forming secondary structures) and do not match any test probes. In one embodiment, the normalization controls are mammalian genes.
Expression level control probes hybridize specifically with constitutively expressed genes present in the biological sample. Virtually any constitutively expressed gene provides a suitable target for expression level control probes. Typically, expression level control probes have sequences complementary to subsequences of constitutively expressed "housekeeping genes" including, but not limited to, certain photosynthesis genes.
"Negative control" probes are not complementary to any of the test oligonucleotides (i.e., the inventive cell cycle gene-related oligonucleotides), normalization controls, or expression controls. In one embodiment, the negative control is a mammalian gene which is not complementary to any other sequence in the sample.
The terms "background" and "background signal intensity" refer to hybridization signals resulting from non-specific binding or other interactions between the labeled target nucleic acids (i.e., mRNA present in the biological sample) and components of the oligonucleotide array. Background signals also can be produced by intrinsic fluorescence of the array components themselves. A single background signal can be calculated for the entire array, or a different background signal can be calculated for each target nucleic acid. In one embodiment, background is calculated as the average hybridization signal intensity for the lowest 5 to 10 percent of the oligonucleotide probes being used, or, where a different background signal is calculated for each target gene, for the lowest 5 to 10 percent of the probes for each gene. Where the oligonucleotide probes corresponding to a particular cell cycle gene hybridize well and, hence, appear to bind specifically to a target sequence, they should not be used in a background signal calculation. Alternatively, background can be calculated as the average hybridization signal intensity produced by hybridization to probes that are not complementary to any sequence found in the sample (e.g., probes directed to nucleic acids of the opposite sense or to genes not found in the sample). In microarray methods, background can be calculated as the average signal intensity produced by regions of the array that lack any oligonucleotide probes at all.
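The first background estimate described above, the average signal of the lowest 5 to 10 percent of probes, can be sketched as below. The function name and the rule of always using at least one probe are assumptions made for the illustration; the text does not specify how fractional probe counts are handled.

```python
def background_from_lowest(signals, fraction=0.05):
    """Mean intensity of the lowest `fraction` of probe signals.

    Assumptions: the probe count is rounded down, and at least one
    probe is always used so that small arrays still give a value.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    ordered = sorted(signals)
    if not ordered:
        raise ValueError("no signals supplied")
    count = max(1, int(len(ordered) * fraction))
    lowest = ordered[:count]
    return sum(lowest) / count


# Made-up intensities 1..100 for one array.
intensities = list(range(1, 101))
background_5_percent = background_from_lowest(intensities, fraction=0.05)
background_10_percent = background_from_lowest(intensities, fraction=0.10)
```

The same function could be called once per gene, with only that gene's probes, where a separate background is wanted for each target.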
In an alternative embodiment, the nucleic acid molecules are directly or indirectly coupled to an enzyme. Following hybridization, a chromogenic substrate is applied and the colored product is detected by a camera, such as a charge-coupled camera. Examples of such enzymes include alkaline phosphatase, horseradish peroxidase and the like. The invention also provides methods of labeling nucleic acid molecules with cleavable mass spectrometry tags (CMST) (see for example, U.S. Patent No: 60279890). After an assay is complete, and the uniquely CMST-labeled probes are distributed across the array, a laser beam is sequentially directed to each member of the array. The light from the laser beam both cleaves the unique tag from the tag-nucleic acid molecule conjugate and volatilizes it. The volatilized tag is directed into a mass spectrometer. Based on the mass spectrum of the tag and knowledge of how the tagged nucleotides were prepared, one can unambiguously identify the nucleic acid molecules to which the tag was attached (see, e.g., WO9905319).
Thus, the nucleic acids can be labeled readily by any of a variety of techniques. When the diversity panel is generated by amplification, the nucleic acids can be labeled during the reaction by incorporation of a labeled dNTP or use of a labeled amplification primer. If the amplification primers include a promoter for an RNA polymerase, post-reaction labeling can be achieved by synthesizing RNA in the presence of labeled NTPs. Amplified fragments that were unlabeled during amplification or unamplified nucleic acid molecules can be labeled by one of a number of end labeling techniques or by a transcription method, such as nick translation or random-primed DNA synthesis. Details of these methods are well known to one of skill in the art and are set out in methodology books (e.g., Ausubel et al., supra). Other types of labeling reactions are performed by denaturation of the nucleic acid molecules in the presence of a DNA-binding molecule, such as RecA, and subsequent hybridization under conditions that favor the formation of a stable RecA-incorporated DNA complex.
PCR-based methods for detection
In another embodiment, PCR-based methods are used to detect gene expression. These methods include reverse-transcriptase-mediated polymerase chain reaction (RT-PCR), including real-time and endpoint quantitative reverse-transcriptase-mediated polymerase chain reaction (Q-RTPCR). These methods are well known in the art. For example, methods of quantitative PCR can be carried out using kits and methods that are commercially available from, for example, Applied BioSystems and Stratagene®. See also Kochanowski, QUANTITATIVE PCR PROTOCOLS (Humana Press, 1999); Innis et al., supra; Vandesompele et al., Genome Biol. 3: RESEARCH0034 (2002); Stein, Cell Mol. Life Sci. 59:1235 (2002). Gene expression can also be observed in solution using Q-RTPCR. Q-RTPCR relies on detection of a fluorescent signal produced proportionally during amplification of a PCR product. See Innis et al., supra. Like the traditional PCR method, this technique employs PCR oligonucleotide primers, typically 15-30 bases long, that hybridize to opposite strands and regions flanking the DNA region of interest. Additionally, a probe (e.g., TaqMan®, Applied Biosystems) is designed to hybridize to the target sequence between the forward and reverse primers traditionally used in the PCR technique. The probe is labeled at the 5' end with a reporter fluorophore, such as 6-carboxyfluorescein (6-FAM), and with a quencher fluorophore, such as 6-carboxy-tetramethyl-rhodamine (TAMRA). As long as the probe is intact, fluorescent energy transfer occurs, which results in the absorbance of the fluorescence emission of the reporter fluorophore by the quenching fluorophore. As Taq polymerase extends the primer, however, the intrinsic 5' to 3' nuclease activity of Taq degrades the probe, releasing the reporter fluorophore. The increase in the fluorescence signal detected during the amplification cycle is proportional to the amount of product generated in each cycle.
The forward and reverse amplification primers and internal hybridization probe are designed to hybridize specifically and uniquely with one nucleotide derived from the transcript of a target gene. In one embodiment, the selection criteria for primer and probe sequences incorporate constraints regarding nucleotide content and size to accommodate TaqMan® requirements.
SYBR Green® can be used as a probe-less Q-RTPCR alternative to the TaqMan®-type assay discussed above. See ABI PRISM® 7900 SEQUENCE DETECTION SYSTEM USER GUIDE, APPLIED BIOSYSTEMS, chap. 1-8, App. A-F (2002).
A device measures changes in fluorescence emission intensity during PCR amplification. The measurement is done in "real time," that is, as the amplification product accumulates in the reaction. Other methods can be used to measure changes in fluorescence resulting from probe digestion. For example, fluorescence polarization can distinguish between large and small molecules based on molecular tumbling (see U.S. patent No. 5,593,867).
Hybridization parameters
Typically, for oligonucleotide hybridization, stringent hybridization and washing conditions are useful for nucleic acid molecules over about 500 bp. Stringent hybridization conditions include a solution comprising about 1 M Na+ at 25° to 30°C below the Tm (e.g., 5 x SSPE, 0.5% SDS, at 65°C; see Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing, 1995; Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, 1989). Tm is dependent on both the G+C content and the concentration of Na+. A formula to calculate the Tm of nucleic acid molecules greater than about 500 bp is Tm = 81.5 + 0.41(%(G+C)) - log10[Na+]. Washing is generally performed at a stringency at least equivalent to that of the hybridization. If the background levels are high, washing may be performed at higher stringency, such as around 15°C below the Tm.
Low stringency hybridizations are performed at conditions approximately 40°C below Tm, and are used for short fragments, e.g., less than about 500 bp. For fragments between about 100 and 500 bp, the Tm decreases about 1.5°C for every 50 bp below 500. For very small fragments, e.g., less than about 50 bp, a formula for calculating Tm is 2°C for each AT pair and 4°C for each GC pair. Very high stringency hybridizations are performed at conditions approximately 10°C below Tm.
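The rule given above for very small fragments (2°C for each AT pair and 4°C for each GC pair) is simple enough to sketch in code. The function names and the example sequence below are hypothetical; the stringency offsets passed to the second function are the approximate values stated in the text.

```python
def short_oligo_tm(oligo: str) -> int:
    """Estimate Tm in degrees C: 2 per A/T base plus 4 per G/C base.

    Intended only for very small fragments (under about 50 bases).
    """
    seq = oligo.upper()
    if not seq or any(base not in "ACGT" for base in seq):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count


def hybridization_temperature(tm_c: float, offset_below_tm_c: float) -> float:
    """Temperature a given number of degrees below Tm."""
    return tm_c - offset_below_tm_c


# Hypothetical 20-mer with ten A/T bases and ten G/C bases.
example_tm = short_oligo_tm("ATGCATGCATGCATGCATGC")
very_high_stringency = hybridization_temperature(example_tm, 10)  # about 10 below Tm
low_stringency = hybridization_temperature(example_tm, 40)        # about 40 below Tm
```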
Hybridization conditions are tailored to the length and GC content of the oligonucleotide. Suitable hybridization conditions may be found in Sambrook et al., supra, and Ausubel et al., supra. Furthermore, hybridization solutions may contain additives such as tetramethylammonium chloride or other chaotropic reagents or hybotropic reagents to increase specificity of hybridization (see, for example, PCT/US97/17413).
Hybridization may be detected in a variety of ways and with a variety of equipment. In general, the methods may be categorized as those that rely upon detectable molecules incorporated into the diversity panels and those that rely upon measurable properties of double-stranded nucleic acids (i.e., hybridized nucleic acids) that distinguish them from single-stranded nucleic acids (i.e., unhybridized nucleic acids). The latter category of methods includes intercalation of dyes, such as ethidium bromide, into double-stranded nucleic acids, differential absorbance properties of double and single stranded nucleic acids, binding of proteins that preferentially bind double-stranded nucleic acids, and the like.
Following hybridization, some means of detecting a successful reaction must be addressed. The means of detection depend on the type of label used. For example, if a radioactive label is used, autoradiography or storage phosphor screens (PhosphorImager) are common methods of detection. Other systems, including chemiluminescent and fluorescent labels in conjunction with autoradiography, charge-coupled cameras or confocal microscopy, are part of an arsenal of detection systems. An alternative detection system that can be used with radioactive, fluorescent or chemiluminescent labels is a CCD integrated silicon wafer. In this system, a charge-coupled device (CCD), designed to detect high energy beta particles or photons, is placed in direct contact with a silicon support for an array. Upon binding of the sample to the immobilized nucleic acids, a radioisotope decay product or photon is generated. Electron-hole pairs are generated in the silicon and then electrons are collected by the CCD.
An alternative detection system for fluorescent molecules is a lens based camera detecting one or more fluorescent labels. As mentioned above, these cameras include epifluorescent microscopes, confocal microscopes, and charge-coupled cameras. In the fluorescent systems, a laser excites a fluorescent label, the emitted light is collected through a bandpass filter, and the signal is detected by a photomultiplier tube that has electronics for counting photons.
Other labels are also amenable to use with either a lens-based camera or a CCD. For example, chemiluminescent labels or chromogenic substrates can be detected with a lens-based charge-coupled camera.
In some embodiments, the label is a cleavable mass-spectrometry tag. Such labels are then detected using a mass-spectrometer. Many detection systems are commercially available (e.g., Affymetrix, Santa Clara, CA). One skilled in the art is able to choose an appropriate detection means and equipment for the label used.
Patterns of hybridization can be expressed as presence or absence of hybridization, the degree of hybridization, or some combination of these. The simplest analysis is performed by determining the presence or absence of hybridization. When the complexity of the genome of the organism to be genotyped is greater than the complexity of the genome(s) represented on the array, the absence of hybridization conclusively signifies a polymorphism. When the complexity is less than on the array, the absence of hybridization can signify either a polymorphism or a lack of representation of those sequences in the probing diversity panel. The presence of hybridization, however, does not necessarily signify the absence of a polymorphism under either scenario. As described in more detail below, the pattern of hybridization is informative.
When the presence or absence of signal is assayed, each addressable area is queried for hybridization using a method appropriate to the label. For example, when fluorescent labels are used, such as Cy3 and Cy5, both green and red signals are assayed. When positive and negative controls are included on the array, signals are compared to the controls and each addressable area is assigned a value, e.g., 1 for detectable hybridization and 0 for no detectable hybridization. In general, a value of 1 is assigned for detection over a threshold level and 0 is assigned for detection under a threshold level. It will be appreciated by those skilled in the art that detection of polymorphisms is based primarily on finding a binary distribution of signal values for any particular array feature when hybridized with multiple diversity panels. Preferably, the panels are the same as those used to create the diversity array (see Example 5). In case a diversity panel is generated from a heterozygote for a polymorphism, one will then detect a trimodal distribution. In such a case two threshold values are calculated: the first threshold separates the "0" cluster (lack of hybridization) from the "0/1" cluster (heterozygote) and the second threshold separates the "0/1" cluster from the "1" cluster (hybridization present). Conventional statistical methods may be used to determine the threshold levels.
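The one-threshold and two-threshold scoring described above can be sketched as follows. The threshold values would come from the statistical analysis mentioned in the text; the function names and the treatment of a signal exactly equal to a threshold (scored in the lower class) are assumptions made for this illustration.

```python
def call_binary(signal: float, threshold: float) -> int:
    """Return 1 for detection over the threshold, otherwise 0."""
    return 1 if signal > threshold else 0


def call_trimodal(signal: float, low_threshold: float, high_threshold: float) -> str:
    """Assign a signal to the "0", "0/1", or "1" cluster.

    The first threshold separates "0" from "0/1"; the second separates
    "0/1" from "1". Signals equal to a threshold fall in the lower class.
    """
    if low_threshold >= high_threshold:
        raise ValueError("low_threshold must be below high_threshold")
    if signal <= low_threshold:
        return "0"
    if signal <= high_threshold:
        return "0/1"
    return "1"


# Made-up signals and thresholds for three array features.
binary_calls = [call_binary(s, threshold=100.0) for s in (20.0, 150.0, 900.0)]
trimodal_calls = [call_trimodal(s, 100.0, 500.0) for s in (20.0, 150.0, 900.0)]
```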
The genotype of the organism may then be expressed as a value for each addressable area. As an exemplary aid to understanding, if the addressable array is a 96-spot format (a grid of 8 rows (A-H) x 12 columns (1-12)), and the value for hybridization is 1 and no detectable hybridization is 0, then it is possible to visualize the individual's expression profile on a two-dimensional grid reflecting those 1/0 detections. In a similar fashion, if the extent of hybridization is to be measured, then relative values are assigned to each addressable location. The relative values will generally be normalized to controls. All data can be collected into database formats to facilitate comparisons as well as perform further analyses, such as construction of genotype trees.
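A two-dimensional view of 1/0 detections on a 96-spot format can be sketched as below. The sketch assumes eight row labels, A through H (eight rows need eight letters), and twelve columns numbered 1 to 12; treating wells with no recorded call as 0 is a further assumption, and the function names are hypothetical.

```python
import string

ROWS = string.ascii_uppercase[:8]  # "ABCDEFGH", one letter per row
COLUMNS = range(1, 13)             # columns 1 through 12


def grid_from_calls(calls):
    """Build an 8 x 12 grid of 1/0 values from a {well label: call} mapping.

    Assumption: wells absent from `calls` are shown as 0.
    """
    return [[calls.get(f"{row}{col}", 0) for col in COLUMNS] for row in ROWS]


def render(grid) -> str:
    """Render the grid as text, one labelled row per line."""
    lines = []
    for row_label, row in zip(ROWS, grid):
        lines.append(row_label + " " + " ".join(str(value) for value in row))
    return "\n".join(lines)


# Made-up calls: hybridization detected only in wells A1 and H12.
grid = grid_from_calls({"A1": 1, "H12": 1})
picture = render(grid)
```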
Thus, oligonucleotides of some or all of those 27 genes disclosed herein (SEQ ID NOs: 1-27) in any combination, may be used as components of a microarray. Of course, the present invention is not limited to markers, such as oligonucleotides and nucleic acid probes, that specifically target the denoted 27 genes. Nucleic acid probes or oligonucleotides also may be designed to target isoforms and homologs of any one of the 27 genes.
Protein expression profiles
Proteins also can be observed by any means known in the art, including immunological methods, enzyme assays and protein array/proteomics techniques, for determining the expression profile of a sample instead of, or in addition to, determining expression levels by detecting nucleic acid transcripts. Measurement of the translational state of proteins can be performed according to several protein methods. For example, whole genome monitoring of protein (the "proteome") can be carried out by constructing a microarray in which binding sites comprise immobilized, preferably monoclonal, antibodies specific to a plurality of proteins having an amino acid sequence of any of SEQ ID NOs: 236-470 and 718-737 or proteins encoded by the genes of SEQ ID NOs: 1-235 and 698-717 or conservative variants thereof. See Wildt et al., Nature Biotechnol. 18:989 (2000). Methods for making polyclonal and monoclonal antibodies are well known, as described, for instance, in Harlow & Lane, ANTIBODIES: A LABORATORY MANUAL (Cold Spring Harbor Laboratory Press, 1988).
Alternatively, proteins can be separated by two-dimensional gel electrophoresis systems. Two-dimensional gel electrophoresis is well known in the art and typically involves isoelectric focusing along a first dimension followed by SDS-PAGE electrophoresis along a second dimension. See, e.g., Hames et al., GEL ELECTROPHORESIS OF PROTEINS: A PRACTICAL APPROACH (IRL Press, 1990). The resulting electropherograms can be analyzed by numerous techniques, including mass spectrometric techniques, western blotting and immunoblot analysis using polyclonal and monoclonal antibodies, and internal and N-terminal micro-sequencing.
By "target" it is understood that a nucleic acid marker to, say, the corin gene, can be a nucleic acid sequence, such as an oligonucleotide or probe, that is complementary to a corin-specific gene sequence, so that, when it is affixed to a substrate surface, the probe will anneal to or hybridize to a corin gene nucleic acid transcript, be it genomic DNA, cDNA, or RNA.
Thus, according to the present invention, an oligonucleotide may be designed to any and all of the 27 genes denoted above. Furthermore, the present invention contemplates the presence of multiple oligonucleotides on a particular substrate that are designed to anneal to or hybridize to the same gene. In certain embodiments, it may be appropriate to use pairs of oligonucleotides on a microarray. The "pairs" will be identical, except for one nucleotide which preferably is located in the center of the sequence. The second oligonucleotide in the pair serves as a control. The number of oligonucleotide pairs may range from two to one million. The oligomers are synthesized at designated areas on a substrate using a light-directed chemical process. The substrate may be paper, nylon or other type of membrane, filter, chip, glass slide or any other suitable solid support.
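The paired design described above, two oligonucleotides identical except for one central nucleotide, can be illustrated with a short sketch. The text does not say which base replaces the central one; substituting its complement is an assumption made here, and the function name and example sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_control(oligo: str) -> str:
    """Return a copy of `oligo` that differs only at its central base.

    Assumption: the central base is replaced by its complement.
    For even lengths, the base just right of the midpoint is changed.
    """
    seq = oligo.upper()
    if not seq or any(base not in COMPLEMENT for base in seq):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    center = len(seq) // 2
    return seq[:center] + COMPLEMENT[seq[center]] + seq[center + 1:]


# Hypothetical 25-mer and its single-mismatch control.
probe = "ACGTACGTACGTACGTACGTACGTA"
control = mismatch_control(probe)
```

Comparing the signal at the first oligonucleotide with the signal at its control gives a per-probe indication of how much of the signal reflects specific hybridization.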
To produce oligonucleotides to a known sequence for a microarray, the gene of interest may be examined using a computer algorithm which starts at the 5' or more preferably at the 3' end of the nucleotide sequence. The algorithm identifies oligomers of defined length that are preferably unique to the gene, have a GC content within a range suitable for hybridization, and lack predicted secondary structure that may interfere with hybridization. In another aspect, an oligonucleotide may be synthesized on the surface of the substrate by using a chemical coupling procedure and an ink jet application apparatus, as described in PCT application WO95/251116 (Baldeschweiler et al.) which is incorporated herein in its entirety by reference. In another aspect, a "gridded" array analogous to a dot or slot blot may be used to arrange and link cDNA fragments or oligonucleotides to the surface of a substrate using a vacuum system, thermal, UV, mechanical or chemical bonding procedures. An array, such as those described above, may be produced by hand or by using available devices (slot blot or dot blot apparatus), materials (any suitable solid support), and machines (including robotic instruments).
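The oligomer selection algorithm described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the algorithm actually used: the window length, the GC limits, and the definition of uniqueness (no exact occurrence in any other supplied sequence) are hypothetical, and the secondary-structure check mentioned in the text is omitted.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in an uppercase DNA string."""
    return sum(base in "GC" for base in seq) / len(seq)


def candidate_oligos(gene, other_genes, length=25, gc_min=0.40, gc_max=0.60):
    """Scan from the 3' end and keep windows passing the GC and uniqueness filters."""
    candidates = []
    # Start with the window at the 3' end and walk towards the 5' end.
    for start in range(len(gene) - length, -1, -1):
        oligo = gene[start:start + length]
        if not gc_min <= gc_fraction(oligo) <= gc_max:
            continue  # GC content outside the assumed hybridization range
        if any(oligo in other for other in other_genes):
            continue  # not unique: the same sequence occurs in another gene
        candidates.append((start, oligo))
    return candidates
```

Each returned tuple holds the start position within the gene and the candidate sequence, with the candidates nearest the 3' end listed first.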
The oligonucleotides, probes, or pieces of target-specific nucleic acids may be labeled in such fashion that a detectable signal is generated from their annealing or hybridizing to the target nucleic acid.
Hence, a microarray of the present invention is placed into contact with a biological sample, such as blood, urine, saliva, phlegm, gastric juices, cultured cells, tissue biopsies, or other tissue preparations; and a detection system may then be used to measure the absence, presence, and/or amount of hybridization for all of the distinct sequences simultaneously. This data may be used for large scale correlation studies on the sequences, mutations, variants, or polymorphisms among samples.
Accordingly, the present invention contemplates a microarray of at least about one, at least about two, or any number in between about two and up to 27, or all of SEQ ID NOs: 1-27 genes denoted above in Groups 1-4, which can be used as described herein to evaluate a biological sample from a subject to determine whether that subject is symptomatic or is afflicted with a cardiomyopathic disease, such as dilated cardiomyopathy.
The present invention is not limited to the use of a microarray assay, however, for diagnosing a cardiomyopathic disease via gene expression level analysis. The present invention also contemplates the use of techniques such as polymerase chain reaction (PCR), quantitative-PCR and Real Time PCR, Quantitative Competitive Reverse Transcription-PCR and Real Time Detection 5'-Nuclease-PCR. The latter also is known as TaqMan RT-PCR.
TaqMan RT-PCR is useful for correlating the concentration of a protein in a sample tissue to its mRNA expression. See Hirayama et al., "Concentrations of Thrombopoietin in Bone Marrow in Normal Subjects and in Patients with Idiopathic Thrombocytopenic Purpura, Aplastic Anemia, and Essential Thrombocythemia Correlate With Its mRNA Expression of Bone Marrow Stromal Cells," Blood, 92(1): 46-52, 1998.
This quantitative replicative method relies on the presence of a 5'-nuclease assay in the RT-PCR reactions, wherein a probe specific for the target protein, contains a fluorescent moiety such as 6-carboxyfluorescein (FAM) on its 5'-end, and a phosphate-capped quencher fluor moiety such as 6-carboxytetramethylfluorescein (TAMRA) on its 3'-end. As amplification of the mRNA for the target protein proceeds, the intensity of fluorescence of the FAM moiety increases as a function of time and mRNA concentration due to 5'-endonuclease cleavage of the probe, releasing more and more FAM moiety into the solution.
TaqMan RT-PCR also may be coupled with an ABI Prism 7700 Sequence Detection System, or Competitive PCR for quantification of DNA. See Desjardin et al., "Comparison of the ABI 7700 System (TaqMan) and Competitive PCR for Quantification of IS6110 DNA in Sputum During Treatment of Tuberculosis," J. Clin. Microbiol., 36(7): 1964-1976, 1998.
Another assay contemplated by the present invention is an ELISA assay to detect protein products of one or more of the denoted 27 genes from a subject's biological sample. ELISA assays and associated plate-reader apparatus are well known to those skilled in the art.
Prediction analysis for microarrays ("PAM") is a useful tool for attributing a meaning to microarray data. It therefore is a very useful tool for classifying and diagnosing a particular biological test sample. See Tibshirani et al., "Diagnosis of multiple cancer types by shrunken centroids of gene expression," Proc Natl Acad Sci U S A., 2002;99:6567-72.
Based on the smallest gene set for classification from Dataset B, the PAM method classified all four human heart failure datasets with low misclassification rates. Notably, despite the large variation of gene expression values for single genes, the classifier as a whole is highly valuable for distinguishing DCM and NF samples. These results support the usefulness of this molecular approach for diagnostic applications. In addition, the classifier gene set based on DCM and NF hearts achieved a similarly high accuracy of classification in ICM samples as in DCM samples, suggesting that this gene set could be representative of molecular changes of heart failure in general.
Of note, the classifier based on Dataset B performed as well in Datasets A and D as classifiers generated from these two datasets alone. What is more, the gene signature from Dataset B was also able to accurately discriminate NF and DCM samples in Dataset C. As noted by the authors of Dataset C, differences in gene expression were greater between left ventricular assist-device (LVAD) and non-LVAD hearts in the DCM group than between DCM and NF samples (13). This peculiarity might impede the PAM approach for identifying a useful classifier between DCM and NF within this Dataset itself.
The classifier gene signature can be grouped into different functional sets with respect to the pathogenesis of DCM. Up-regulation of the cardiomyopathy markers pro-ANP and pro-BNP is well established in heart failure (23) and mediated by neurohormonal dysregulation (27). Activation of pro-fibrotic stress hormone pathways leads to prominent structural remodeling in DCM, exemplified by deregulation of genes coding for sarcomere structure and extracellular matrix proteins like myosin 6 and 10, asporin, procollagen C-endopeptidase enhancer 2 (PCOLCE2), kelch-like 3 (KLHL3) and AE binding protein 1 (AEBP1). In addition to pro-BNP, further transcripts of this set of classifier genes, including the transcription factor ZBTB16 (28), the connective tissue growth factor CTGF (29) and the chemokine CCL2 (30), characterize important targets of the renin-angiotensin system in failing myocardium, as they all have been shown to be induced by angiotensin-II.
Other transcripts of this classifier gene set belong to anti-apoptotic (PHLDA1, SNCA, CCL2) and cell growth processes (FRZB, SFRP4, SPOCK, CTGF). Of note, the two genes coding for frizzled-related protein (FRZB) and secreted frizzled related protein 4 (SFRP4) are members of the Wnt signaling pathway, implicated in wound healing and regeneration of heart failure (31, 32). Especially the expression of the Wnt antagonist SFRP4 is associated with myocyte apoptosis in overload-induced heart failure (31).
Furthermore, the genes coding for the complement factor H-related 3 (CFHL3), ficolin 3 (FCN3), chemokine ligand 2 (CCL2) and calgranulin A (S100A8) are related to stress and immune response. Remarkably, the GO classes "immune response" and "inflammatory response" showed the most significant changes of all biological processes in our two datasets A and B. Given that DCM has a very heterogeneous etiology, it is plausible that a subgroup of DCM represents postinfectious auto-immune disease, especially in individuals with genetic susceptibility. However, independent experimental models of cardiomyopathy suggest that cardiac remodeling itself is able to trigger immune response (33, 34). CCL2 is a prominent member of the broader functional group of immune and inflammatory processes and was found to be down-regulated in both datasets A and B. This chemokine, capable of interacting with TNF-alpha and IL-6-related pathways, has been localized to the cardiomyocyte compartment by immunohistochemistry (24). It promotes attraction and invasion of activated leukocytes into the failing myocardium, but is also involved in shaping the extracellular matrix by modulating the activity of matrix metalloproteinases and collagen turnover (35) as well as cell proliferation and induction of apoptosis (24). Down-regulation of CCL2 transcripts in end-stage heart failure may therefore represent an adaptive mechanism to promote cell survival. Of note, additional chemokines like CCL11 and CCL18 were also found to be down-regulated in Affymetrix and Unigene arrays, respectively.
In addition to the known cardiomyopathy markers pro-ANP and pro-BNP, further genes involved in the natriuretic system and immune response processes were identified by the present invention. These are promising candidates for disease biomarkers. In this respect, the identification of a molecular diagnostic gene signature of DCM across different microarray platforms and independent studies is useful for validating these biomarkers of heart failure and testing their clinical utility.
It is understood that the following examples are for illustrative purposes only, and should not be interpreted as restricting the spirit and scope of the invention, as defined by the scope of the claims that follow. All references identified herein, including U.S. patents, are hereby expressly incorporated by reference.
EXAMPLES
EXAMPLE 1: STUDY DESIGN
Two microarray studies were performed on a total of 40 patient samples according to the following datasets, which were deposited in Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo) and are accessible through GEO Series accession numbers GSE3585 and GSE3586. All transplanted patients gave written informed consent. The investigation was approved by the Institutional Review Board. As described below, 68 DCM and 40 NF samples from four independent studies were used for classification.
Dataset A: cDNA microarray study with 28 septal myocardial samples obtained from 13 DCM hearts at the time of transplantation and 15 NF donor hearts which were not transplanted because of palpable coronary calcifications. The latter patient group was not known to have any history of overt cardiovascular disease. Detailed patient characteristics are listed in Table 3.
Dataset B: oligonucleotide microarray study with twelve independent subendocardial left ventricular samples collected from seven DCM patients and five NF donors. Detailed patient characteristics are listed in Table 3.
After excision, all tissue specimens were frozen in liquid nitrogen and stored at -80°C. RNA isolation, sample preparation, labeling, and hybridization to RZPD Unigene 3.1 cDNA (37.5K) and to Affymetrix U133A (22.2K) arrays were carried out as described previously (6, 11, 12).
For the classification and the verification of the classifier gene set, two additional studies were included:
Dataset C: six NF, 21 DCM and 10 ICM samples hybridized to Affymetrix HG-U133A arrays (13). Normalized gene expression data were downloaded from Gene Expression Omnibus (accession number GSE1869).
Dataset D: available online through a program for genomic applications funded by the National Heart, Lung, and Blood Institute; it consisted of 14 NF, 27 DCM and 32 ICM samples hybridized to Affymetrix HG-U133 Plus 2.0 arrays (http://www.cardiogenomics.org) (14).
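The sample totals quoted for the study design can be checked by simple bookkeeping. The short Python sketch below tallies the per-dataset counts given in the four dataset descriptions; the dictionary layout and names are illustrative only.

```python
# Sample counts per dataset, copied from the dataset descriptions.
samples = {
    "A": {"NF": 15, "DCM": 13},
    "B": {"NF": 5, "DCM": 7},
    "C": {"NF": 6, "DCM": 21, "ICM": 10},
    "D": {"NF": 14, "DCM": 27, "ICM": 32},
}


def total(group):
    """Sum the samples of one diagnostic group across all four datasets."""
    return sum(counts.get(group, 0) for counts in samples.values())


total_dcm = total("DCM")  # 13 + 7 + 21 + 27
total_nf = total("NF")    # 15 + 5 + 6 + 14
# Samples in the two studies performed here (Datasets A and B).
own_studies = sum(samples["A"].values()) + sum(samples["B"].values())
```

The tallies reproduce the 68 DCM and 40 NF samples used for classification, and the 40 patient samples of Datasets A and B (28 plus 12).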
EXAMPLE 2: DATA EXTRACTION AND STATISTICAL ANALYSIS
Preprocessing and most of the statistical analysis were performed using R (www.r-project.org) and Bioconductor (www.bioconductor.org). After quality control, all cDNA microarray data were normalized using arrayMagic (15) and "VSN" (16). Normalized data were filtered with respect to signal intensity. Quality of the HG-U133A arrays was assured by controlling for dynamic range, perfect match saturation, pixel noise, grid misalignment and signal to noise ratio. Microarray data of all samples in microarray study B were normalized together using robust multi-array average (RMA) (17) implemented in Bioconductor's "affy" package. Probe sets with "absent" calls in more than 50% of tissue samples in either group (NF and DCM) were excluded.
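The absent-call filter in the last step can be sketched as follows. This Python fragment is an illustration only (the analysis itself was done in R and Bioconductor); the function name and the single-letter call codes "P", "M" and "A" for present, marginal and absent are assumptions.

```python
def keep_probe_set(calls, groups, max_absent_fraction=0.5):
    """Return False if more than half of the calls are 'A' in any one group.

    calls  -- detection call per sample, e.g. 'P', 'M' or 'A' (assumed codes)
    groups -- group label per sample, e.g. 'NF' or 'DCM'
    """
    for group in set(groups):
        group_calls = [c for c, g in zip(calls, groups) if g == group]
        absent = sum(c == "A" for c in group_calls)
        if absent / len(group_calls) > max_absent_fraction:
            return False  # excluded: too many absent calls in this group
    return True
```

A probe set with exactly 50% absent calls in a group is retained under this reading, because the text excludes only probe sets with absent calls in more than 50% of samples.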
To determine differentially expressed genes, two class unpaired Significance Analysis of Microarrays (SAM) (18) was applied in both studies. Differences in gene expression were regarded as statistically relevant if a false discovery rate (FDR) of q<0.05 and a fold-change of ≥1.2 were achieved.
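The two selection criteria can be combined as in the following Python sketch. It assumes that q-values have already been computed (for example by SAM), that mean expression values are on a linear scale, and that the fold-change threshold applies symmetrically to up- and down-regulation; none of these details is specified in the text, and the function name is hypothetical.

```python
def is_differential(q_value, mean_dcm, mean_nf, q_max=0.05, min_fold=1.2):
    """Apply the FDR and fold-change cut-offs to one transcript."""
    ratio = mean_dcm / mean_nf
    # Express down-regulation as a fold-change >= 1 as well (assumed convention).
    fold = ratio if ratio >= 1 else 1 / ratio
    return q_value < q_max and fold >= min_fold
```

For example, a transcript with q = 0.01 and mean intensities of 150 (DCM) and 100 (NF) passes both criteria, whereas one with a 1.1-fold change does not.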
Mapping of transcripts between cDNA clones and Affymetrix probe sets was achieved by means of the MatchMiner software tool (19). Functional annotation of differentially expressed genes was based on the hierarchical system of GO domains "cellular component", "biological process" and "molecular function". Overrepresentation of specific GO classes in a gene set was statistically analyzed by "FatiGO" (20).
Expression values of selected transcripts were validated by quantitative real-time PCR (Taqman). A detailed list of genes examined by RT-PCR including protocols used for RT-PCR is given in the supplemental section of "Materials and Methods".
EXAMPLE 3: IDENTIFICATION OF A GENE EXPRESSION SIGNATURE FOR DCM
Prediction analysis for microarrays (PAM) (21) was used for classification. The ability to correctly classify the status of DCM and NF samples was assessed by complete cross-validation implemented in the Bioconductor package "MCRestimate" (22). First, the samples in every study were randomly divided into equally sized subsets. In each following step, one subset was left aside and the classifier (filtering and PAM) was built on the remaining samples (training set). The status (NF vs. DCM) of the left-out samples was predicted and compared with the clinically diagnosed status. Optimization of the PAM parameter and of the number of genes remaining after variance filtering was achieved through a second cross-validation within each training set. To estimate the variability of the cross-validation result based on different sample compositions of the training set, the procedure was repeated 50 times. A sample was called "misclassified" if it was incorrectly classified in more than half of all cross-validations.
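The repeated cross-validation and the rule for calling a sample "misclassified" can be sketched in Python as below. This is a simplified illustration, not the MCRestimate implementation: a plain nearest-centroid classifier stands in for PAM (no centroid shrinkage), the variance filter and the inner cross-validation for parameter tuning are omitted, and the fold count, seed and function names are assumptions.

```python
import random


def centroid(rows):
    """Mean expression vector of a list of samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def predict(train_x, train_y, sample):
    """Assign the sample to the class with the nearest centroid."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        c = centroid([x for x, y in zip(train_x, train_y) if y == label])
        dist = sum((a - b) ** 2 for a, b in zip(sample, c))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def misclassified_samples(x, y, folds=3, repeats=50, seed=0):
    """Indices of samples predicted wrongly in more than half of the repeats."""
    rng = random.Random(seed)
    errors = [0] * len(x)
    for _ in range(repeats):
        order = list(range(len(x)))
        rng.shuffle(order)  # new random split into subsets for each repeat
        for k in range(folds):
            test_idx = order[k::folds]  # subset left aside in this step
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            train_x = [x[i] for i in train_idx]
            train_y = [y[i] for i in train_idx]
            for i in test_idx:
                if predict(train_x, train_y, x[i]) != y[i]:
                    errors[i] += 1
    return [i for i, e in enumerate(errors) if e > repeats / 2]
```

Each sample is left out exactly once per repeat, so its error count over 50 repeats can be compared directly with the "more than half" threshold.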
EXAMPLE 4: COMMON CHANGES OF BIOLOGICAL PROCESSES IN DILATED CARDIOMYOPATHY
To identify differentially expressed genes, Datasets A and B were analyzed using SAM analysis, from which the following results were observed:
Dataset A: 1353 transcripts were up-regulated and 384 were down-regulated in DCM.
Dataset B: 399 transcripts were up-regulated and 75 transcripts were down-regulated in DCM.
In both studies, up-regulation was about four to five times more common than down-regulation, indicating a net transcriptional activation in heart failure. Overall, 76 transcripts were found to be consistently deregulated in both studies, representing an approximate 16% overlap at the single gene level between both microarray studies.
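The ratios behind these statements can be recomputed from the transcript counts. In the Python sketch below, the reading that the approximately 16% overlap is expressed relative to the deregulated transcripts of Dataset B is an inference from the numbers, not something the text states.

```python
up_a, down_a = 1353, 384  # Dataset A
up_b, down_b = 399, 75    # Dataset B
shared = 76               # transcripts deregulated in both studies

ratio_a = up_a / down_a                # up- versus down-regulated, Dataset A
ratio_b = up_b / down_b                # up- versus down-regulated, Dataset B
overlap_b = shared / (up_b + down_b)   # 76 of 474 transcripts in Dataset B
overlap_a = shared / (up_a + down_a)   # 76 of 1737 transcripts in Dataset A

print(f"A: {ratio_a:.1f}x, B: {ratio_b:.1f}x")
print(f"overlap vs. B: {overlap_b:.1%}, vs. A: {overlap_a:.1%}")
```

The up/down ratios come out at roughly 3.5 and 5.3, and the shared transcripts make up about 16% of the Dataset B list but only about 4% of the larger Dataset A list.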
These 76 transcripts included known marker genes of heart failure, such as pro-brain natriuretic peptide (NPPB) (23), and chemokine (C-C motif) ligand 2 (CCL2) (24), but also many genes that have not previously been associated with cardiomyopathies.
Validation of microarray expression values was done by using quantitative real-time PCR (Taqman®). A strong correlation between the expression ratios of arrays and quantitative PCR was found for eleven of twelve differentially expressed genes analyzed in Dataset A.
To gain a comprehensive insight into the biological processes associated with DCM, differentially expressed genes were related to their respective GO classes. Thereby, it was possible to identify specific biological processes which were consistently enriched in up- or down-regulated transcripts of both studies. For example, both studies showed a marked up-regulation of transcripts involved in protein biosynthesis in DCM.
It is interesting to note that this functional GO class comprises qualitatively different genes in the two studies. While the Dataset A (cDNA microarray study) detected many elongation factors, ribosomal transcripts were more frequently recognized in Dataset B (the Affymetrix study).
Immunological genes
The biological processes "immune" and "inflammatory response" displayed the most significant changes in the group of down-regulated genes in DCM. These functional gene classes included components of the complement system (C1QB, C1QR1, C1R, C3), chemokines (CCL2, CCL11, CCL18), interferon-induced genes (IFI27, IFI30, IFITM1, IFITM3, STAT3), calgranulins (S100A8, S100A9) and leukocyte antigens (CD14, CD53, CD163). This suggests a profound deregulation of the immune system in DCM. See Figure 1. In accordance with down-regulation of immune response genes in DCM, the functional class of "chemokine activity" displayed the most prominent down-regulation specified by level six of GO category "molecular function."
Extracellular genes
Consistent with prominent structural remodeling in end-stage DCM, many deregulated transcripts were related to extracellular matrix composition and turnover. Notably, up-regulation of collagen transcripts (COL5A1, COL8A1) and the procollagen COOH-terminal proteinase enhancer 2 (PCOLCE2), which binds to type I procollagen and potentiates its cleavage by procollagen C-proteinases, was observed. In addition, extracellular matrix protein 2 (a member of the small leucine rich proteoglycans (SLRP), important for collagen fibrillogenesis), asporin and most other members of the SLRP family were found to be up-regulated as well (decorin, lumican, biglycan, fibromodulin, osteoglycin, and osteomodulin), highlighting their importance in extracellular remodeling.
Z-disc genes
Furthermore, prominent up-regulation of genes coding for Z-disc components was noted, including caldesmon 1, sarcospan, sarcoglycan epsilon, utrophin, spectrin, titin, vinculin, sarcoglycan D and G, alpha-actinin, LIM-domain binding 3, and alpha-2-capping protein. The Z-disc is thought to act as a sensor, linking biomechanical forces to the activation of stress pathways (25).
In this sense, we were able to corroborate the desensitization of beta-adrenergic signaling by gene expression analysis with down-regulation of adrenergic, beta-1-receptor in end-stage DCM.
Profound changes in signal transduction were also reflected in the down-regulation of the GO class "integral to plasma membrane" specified by level 6 of GO category "cellular component". With regard to the deregulation of important signaling pathways, we found transcriptional repression of the oncostatin M receptor (OSMR), anti-apoptotic gene BCL2L1 and signal transducer STAT3 which are involved in the protection of the myocardium from heart failure via the JAK-STAT pathway.
Ultimately, the balance between pro- and anti-apoptotic programs may determine if relevant loss of myocytes occurs. In line with this notion, up-regulation of anti-apoptotic (FGF1, DSIPI, CCL2) and pro-apoptotic transcripts (BCLAF1, FOXO3A) was noted in these studies.
EXAMPLE 5: IDENTIFICATION OF A GENE EXPRESSION SIGNATURE FOR DCM
The second goal of the experiments of the present invention was to identify a specific set of transcripts which could reliably classify DCM and NF samples. To do so, the classification method "PAM" was performed on four independent microarray studies. Very low misclassification rates were found in two studies (Datasets A and B) and in Dataset D for the classification of NF versus DCM samples (Figure 2). Specifically, one out of twelve samples was misclassified in Dataset B. Likewise, Datasets A and D showed similar results, with one out of 28 and three out of 41 misclassified samples, respectively. In contrast, the classification algorithm did not show any predictive power in Dataset C. This was unexpected as the expression levels of established molecular cardiomyopathy markers, including pro-BNP or pro-ANP, suggested a clear separation into NF and failing ventricular samples (Figure 3).
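For orientation, the misclassification counts reported above translate into error rates as shown in this short Python sketch; the function and variable names are illustrative only.

```python
def error_rate(errors, n_samples):
    """Fraction of misclassified samples."""
    return errors / n_samples


# (misclassified samples, total NF + DCM samples) as reported for each dataset
reported = {"A": (1, 28), "B": (1, 12), "D": (3, 41)}

for name, (errors, n) in reported.items():
    rate = error_rate(errors, n)
    print(f"Dataset {name}: {errors}/{n} misclassified = {rate:.1%}")
```

This gives error rates of about 3.6% (Dataset A), 8.3% (Dataset B) and 7.3% (Dataset D), that is, more than 90% of samples correctly classified in each of the three datasets.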
The smallest number of probe sets used for classification was found in Dataset B, with a median of 5 probe sets (31 different probe sets in total) and a median absolute deviation of 2.9. Therefore, the ability of this set of 31 probe sets coding for 27 genes (Table 3) to correctly classify DCM and NF samples was evaluated in the remaining three studies. Datasets C and D were reduced to the 31 probe sets that were used for classification in Dataset B.
Next, PAM was applied without filtering to the reduced set, and the prediction power was estimated by performing a complete classification procedure. For Dataset A, 17 of the 27 genes were first determined to be present on the cDNA microarray, and classification then proceeded as described above.
In summary, it was found that a set of 27 genes was sufficient to classify DCM and NF hearts across all four independent studies with more than 90% accuracy (Figure 2). Remarkably, this comprehensive gene set was able to accurately classify DCM and NF samples in Dataset C, for which the PAM method had initially failed.
The 27-gene signature included known marker genes of heart failure: pro-BNP, pro-ANP, corin (converts pro-ANP to biologically active ANP), transcripts encoding for sarcomere structure proteins (MYH6, MYH10), anti-apoptotic processes (CCL2, PHLDA1, SNCA), cell growth (FRZB, SFRP4, SPOCK, CTGF) and cell cycle control (G0S2, ETV5, RARRES1). Notably, the selection of individual known marker genes like pro-BNP alone was not sufficient to classify the DCM cases because of the heterogeneous gene expression across all 108 myocardial samples (Figure 3).
Since several previously described genes for heart failure were part of the classifier gene set, its validity might well hold for heart failure in general, irrespective of etiology. This hypothesis was tested by PAM analysis for ICM and NF samples included in Dataset C (ICM n=10; NF n=6) and D (ICM n=32; NF n=14) and it was found that this gene set also classified more than 90% of these samples correctly (data not shown).
Table 1. [Table supplied as an image in the original publication (imgf000036_0001).]
EXAMPLE 6: MICROARRAY EXPERIMENTS
Hybridization to Unigene 3.1 cDNA microarray: Microarray spotting
Glass slides used for this study carried 37,530 cDNA clones selected from the Human UniGene 3.1 clone set (German Resource Center for Genome Research, Berlin, Germany). PCR products from cDNA clones were purified by isopropanol precipitation, washed in 70% ethanol, and dissolved in 3x SSC/1.5 M betaine. The DNA was spotted on epoxysilane glass slides (Quantifoil, Jena, Germany) using the VersArray ChipWriter Pro System (Biorad, Munich, Germany) and SMP3 pins (Telechem, Sunnyvale, CA, USA). After spotting, microarrays were rehydrated, and DNA was denatured with boiling water prior to washing with 0.2% SDS, water, ethanol, and isopropanol. The arrays were dried with air pressure.
RNA isolation and amplification
Total cellular RNA was isolated from these tissue shavings by RNeasy Mini-Kit (Qiagen, Hilden, Germany) following shock freezing in liquid nitrogen and homogenization with a Micro-Dismembrator S (Braun Biotech, Melsungen, Germany). Quality of total RNA was checked with the Agilent 2100 bioanalyzer (Agilent Technologies GmbH, Waldbronn, Germany). All of the samples yielded high-quality RNA (28S/18S rRNA and E260/E280 ratios larger than 1.8) and could be used for further experiments. All RNA samples used for microarray hybridization were amplified once. Linear amplification was performed using the MessageAmp™ aRNA Kit (Ambion, Huntingdon, United Kingdom) according to the manufacturer's instructions. Quality of the amplified RNA (aRNA) was checked with the Agilent 2100 bioanalyzer (Agilent Technologies GmbH, Waldbronn, Germany). In the electropherogram, all aRNA samples showed a length distribution of 50-6000 nucleotides with maximum peaks at about 900-1000 nucleotides. In comparison to total RNA samples, a slight length shortening of the aRNA samples was observed. Utility of T7 RNA polymerase based linear amplification has been shown previously. See Sultmann et al., "Gene expression in kidney cancer is associated with cytogenetic abnormalities, metastasis formation, and patient survival," Clin Cancer Res. 2005;11:646-655.
RNA labeling and hybridization
Labeling of 2 μg amplified aRNA was performed using 1 μg random hexamer primers. Hybridization and washing were done as previously described (see Schneider et al., "Systematic analysis of T7 RNA polymerase based in vitro linear RNA amplification for use in microarray experiments," BMC Genomics, 2004. 30;5:29). Sample aRNA was labeled with Cy3 and a common reference aRNA (Stratagene, La Jolla, CA, USA) with Cy5. Cy3- and Cy5-labeled probes were purified with Microcon YM-30 columns (Millipore, Bedford, MA, USA), combined and resuspended in 50 μl 1x DIG-Easy hybridization buffer (Roche Diagnostics, Mannheim, Germany), containing 10x Denhardt's solution and 2 ng/μl Cot1-DNA (Invitrogen). Hybridizations were carried out in duplicate on Unigene 3.1 microarrays.
Image quantification and Data analysis
The hybridized arrays were scanned with the GenePix 4000B microarray scanner (Axon Instruments Inc., Union City, CA, USA), and analyzed using GenePix Pro 4.1 software (Axon Instruments).
Hybridization to Affymetrix oligonucleotide HG-U133A microarrays
In a first series of experiments on four samples we conducted duplicate replication to assess the technical reproducibility of the HG-U133A microarray platform. Scatterplots of the replicate samples revealed an excellent correlation of the corresponding probe sets. For every replicate pair a Pearson correlation coefficient was calculated. All correlation coefficients achieved values >0.99. A two-sample Kolmogorov-Smirnov test revealed equal distributions of the replicate sample values (data not shown). Analysis of the differences of the paired observations showed symmetric distributions (medians less than 0.01, range -1 to +1) and confirmed the excellent published reproducibility of this array platform. Considering this high reproducibility and the fact that Affymetrix experiments are still relatively expensive, only one HG-U133A chip (Affymetrix, Santa Clara, CA, USA) representing 22,283 probe sets was used for each human heart sample. The sequences were derived from GenBank, dbEST and RefSeq. Sequence clusters were created from Build 133 of UniGene (April 20, 2001). Further information about the Gene Chip System can be obtained at www.affymetrix.com. See Liu et al., "NetAffx: Affymetrix probesets and annotations," Nucleic Acids Res., 2003;31:82-6.
mRNA-Preparation and Hybridization to Affymetrix HG-U133A Microarrays
Total RNA was purified from homogenized deep-frozen tissue samples following the TRIZOL standard protocol as described by the manufacturer (GibcoBRL, Eggenstein, Germany) and quantified by photometry. 150 ng total RNA was used for quality control using the RNA 6000 Nano LabChip kit and Agilent Bioanalyzer 2100 (Agilent Technologies, Palo Alto, CA, USA). Double-stranded cDNA was synthesized from 10 μg total RNA by using the Superscript double-stranded cDNA synthesis kit (Invitrogen, Karlsruhe, Germany) with an HPLC-purified oligo(dT) primer containing a T7 RNA polymerase promoter (GENSET, La Jolla, CA, USA) following the manufacturer's protocol. Biotinylated cRNA probes were synthesized by in vitro transcription using ENZO BioArray RNA transcript labeling kit (ENZO Diagnostics, Farmingdale, NY, USA). Fragmentation of 10 μg biotinylated cRNA as well as subsequent steps of hybridization, washing and staining followed instructions provided by Affymetrix (Affymetrix, Santa Clara, CA, USA).
Image quantification and Data analysis
The hybridized arrays were scanned with the GeneChip Scanner 2500 (Affymetrix, Santa Clara, CA, USA) and preprocessed using Microarray Suite 5 software (Affymetrix, Santa Clara, CA, USA).
EXAMPLE 7: VALIDATION BY TAQMAN® PCR
Expression patterns of six randomly chosen genes were validated by real-time RT-PCR using validated TaqMan probes (Applied Biosystems, Assay on Demands). GAPDH served as an internal control in real-time RT-PCR experiments. Before PCR amplification, contaminating genomic DNA was removed from the isolated RNA using the DNA-free-kit from Ambion (Austin, Texas, USA). Total RNA content was then quantified by spectrophotometry. For in vitro reverse transcription, 200 ng total RNA was preincubated at 70°C for 10 min with random hexamer primer. Then, 1 μl RNase-inhibitor (1.5 U/μl; RNAsin; from Promega, Heidelberg, Germany), 1 μl dNTP-mix (containing each deoxyribonucleotide in a concentration of 10 mM), 2 μl 0.1 M dithiothreitol, 4 μl 5x reaction buffer (250 mM Tris-HCl pH=8.4, 375 mM KCl, 15 mM MgCl2; from Invitrogen, Karlsruhe, Germany) and 1 μl Superscript II RNase H- reverse transcriptase were added to obtain a total volume of 20 μl and subsequently incubated at 42°C for 60 minutes. Finally, the enzyme was inactivated at 70°C for 15 minutes.
Gene-specific primers and probes were designed using Primer 3 software (Applied Biosystems, Foster City, CA, USA) to amplify fragments of 70-150 base pairs in length close to the 3'-end of the transcript. Real-time PCR was performed in triplicate for each sample with 10 μl aliquot of diluted cDNA (1:3). For the genes examined with TaqMan probes, a 2x Universal PCR Master-Mix from Perkin Elmer (containing AmpliTaq Gold™ DNA-Polymerase, AmpErase UNG, dNTPs with dUTPs, passive reference dye and optimized buffer including MgCl2), 900 nM primer and 200 nM probe were used. PCR-amplification of cDNA started with a "hot start" activation of SureStart Taq polymerase at 95°C for 10 minutes, followed by 40 cycles of 15 s denaturation at 95°C, annealing for 60 s at 58°C, and 10 s elongation at 72°C. All experimental results for the samples with a coefficient of variation >10% were retested. To evaluate differences in gene expression, a relative quantification method (ΔΔCt) based on the REST-program developed by Pfaffl was used. See Pfaffl et al., "Relative expression software tool (REST) for group-wise comparison and statistical analysis of relative expression results in real-time PCR," Nucleic Acids Res., 2002;30(9):e36.
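The relative quantification step can be illustrated with the basic ΔΔCt calculation. The Python sketch below is a simplified stand-in for the REST program, not its implementation: it assumes an identical amplification efficiency of 2 (perfect doubling per cycle) for the target gene and for GAPDH, and it assumes that the coefficient of variation for the retest rule is computed on the triplicate Ct values, which the text does not specify. All function and parameter names are hypothetical.

```python
import statistics


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control, efficiency=2.0):
    """Fold change of the target gene in a sample relative to a control."""
    delta_sample = ct_target_sample - ct_ref_sample     # normalize to GAPDH
    delta_control = ct_target_control - ct_ref_control  # same for the control
    delta_delta = delta_sample - delta_control
    return efficiency ** (-delta_delta)


def needs_retest(ct_triplicate, cv_max=0.10):
    """Flag a triplicate whose coefficient of variation exceeds 10%."""
    cv = statistics.stdev(ct_triplicate) / statistics.mean(ct_triplicate)
    return cv > cv_max
```

With hypothetical Ct values of 24 (target) and 18 (GAPDH) in a DCM sample and 26 and 18 in an NF control, ΔΔCt is -2 and the function returns a 4-fold higher expression in the DCM sample.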
Taqman validation of twelve candidate genes. GAPDH served as housekeeping gene. For the microarray analysis, q-value based on "Significance Analysis of Microarrays" (SAM) is given, whereas Taqman data was analyzed by Student's t-test. For Taqman data, the lowest, highest and median Ct-value are shown. Taqman assays were carried out with myocardial samples from dataset A only.
Table 2. [Table and key supplied as images in the original publication (imgf000040_0001, imgf000041_0001).]
The following is a list of the transcripts, among the 76 consistently deregulated in Datasets A and B, whose expression levels were up-regulated in DCM hearts in both Datasets: ankyrin repeat and BTB domain containing 2; actin, alpha 2, smooth muscle, aorta; ATPase family homolog up-regulated in senescence cells; A kinase anchor protein 13; adaptor-related protein complex 3, mu 2 subunit; ADP-ribosylation factor 4; asporin; ATPase, Class VI, type 11B; ataxin 10; BMP and activin membrane-bound inhibitor homolog (X. laevis); BCL2-associated transcription factor 1; basic helix-loop-helix domain containing, class B, 3; chromosome 16 open reading frame 45; caldesmon 1; CDC14 cell division cycle 14 homolog B (S. cerevisiae); carbohydrate (N-acetylglucosamine 6-O) sulfotransferase 5; collagen, type V, alpha 1; collagen, type VIII, alpha 1; coatomer protein complex, subunit zeta 2; cofactor required for Sp1 transcriptional activation, subunit 6, 77kDa; connective tissue growth factor; discs, large homolog 1 (Drosophila); dynein, cytoplasmic, light polypeptide 1; dedicator of cytokinesis 9; delta sleep inducing peptide, immunoreactor; extracellular matrix protein 2, female organ and adipocyte specific; exostoses (multiple) 1; fibroblast growth factor 1 (acidic); hypothetical protein FLJ22662; forkhead box O3A; high-mobility group nucleosomal binding domain 2; heat shock 90kDa protein 1, beta; isoleucine-tRNA synthetase; insulin receptor; integrin, beta 5; laminin, beta 1; latent transforming growth factor beta binding protein 1; latent transforming growth factor beta binding protein 2; microtubule-associated protein 1B; microtubule-associated protein 4; metallothionein 1E (functional); myosin, heavy polypeptide 10, non-muscle; nucleosome assembly protein 1-like 3; nuclear factor I/B; natriuretic peptide precursor B; NAD(P)H:quinone oxidoreductase type 3, polypeptide A2; ornithine decarboxylase 1; poly(A) binding protein, cytoplasmic 1; propionyl Coenzyme A carboxylase, beta polypeptide; procollagen C-endopeptidase enhancer 2; phosphoinositide-3-kinase, catalytic, alpha polypeptide; proline synthetase co-transcribed homolog (bacterial); protease, serine, 11 (IGF binding); protein tyrosine phosphatase, non-receptor type substrate 1; RAB28, member RAS oncogene family; RNA binding motif protein 15; Rho-related BTB domain containing 1; ribosomal protein L22; ribosomal protein L4; sin3-associated polypeptide, 18kDa; SEC31-like 1 (S. cerevisiae); septin 2; sarcoglycan, epsilon; solute carrier family 25 (mitochondrial carrier), member 5; solute carrier family 30 (zinc transporter), member 1; sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican); sprouty-related, EVH1 domain containing 2; sprouty homolog 1, antagonist of FGF signaling (Drosophila); sarcospan; t-complex-associated-testis-expressed 1-like 1; transmembrane protein 43; thioredoxin domain containing 7; and exportin 1 (CRM1 homolog, yeast).
Three transcripts were down-regulated in both Datasets: a disintegrin-like and metalloprotease (reprolysin type) with thrombospondin type 1 motif, 9; chemokine (C-C motif) ligand 2; and oncostatin M receptor.
Table 3 Classifier gene set in DCM
27 transcripts classifying DCM and NF samples (generated by PAM classification from dataset B and listed in alphabetical order). In addition to expression values for DCM and NF samples of dataset B, the ranking of the single genes for classification of datasets A-D based on PAM parameters is given. When transcripts were represented by two probe sets, the ranking of both is indicated. Genes marked with a line in dataset A were not present in the cDNA array dataset.
Figure imgf000043_0001
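As an illustration only, the following is a minimal sketch of the nearest shrunken centroid idea behind PAM classification (reference 21), written in plain Python. It is not the analysis used to produce Table 3. The function names (fit_shrunken_centroids, predict, ranked_genes), the gene labels, the threshold value and the expression values are all hypothetical, and the class-size factor m_k uses the sqrt(1/n_k - 1/n) form, which is an assumption of this sketch.

```python
import math
from statistics import mean, median


def fit_shrunken_centroids(X, y, delta):
    """Fit nearest shrunken centroids; X is samples x genes, y holds class labels."""
    n, p = len(X), len(X[0])
    groups = {k: [row for row, lab in zip(X, y) if lab == k] for k in sorted(set(y))}
    overall = [mean(row[i] for row in X) for i in range(p)]
    raw = {k: [mean(r[i] for r in rows) for i in range(p)] for k, rows in groups.items()}
    # Pooled within-class standard deviation of each gene.
    s = [
        math.sqrt(
            sum((r[i] - raw[k][i]) ** 2 for k, rows in groups.items() for r in rows)
            / (n - len(groups))
        )
        for i in range(p)
    ]
    s0 = median(s)  # offset that guards against genes with very small variance
    scale = [si + s0 for si in s]
    scores, centroids = {}, {}
    for k, rows in groups.items():
        m_k = math.sqrt(1 / len(rows) - 1 / n)  # assumed form of the class-size factor
        d = [(raw[k][i] - overall[i]) / (m_k * scale[i]) for i in range(p)]
        # Soft thresholding: move every standardized score toward zero by delta.
        scores[k] = [math.copysign(max(abs(v) - delta, 0.0), v) for v in d]
        centroids[k] = [overall[i] + m_k * scale[i] * scores[k][i] for i in range(p)]
    priors = {k: len(rows) / n for k, rows in groups.items()}
    return {"scores": scores, "centroids": centroids, "scale": scale, "priors": priors}


def predict(model, x):
    """Assign x to the class with the smallest standardized distance to its centroid."""
    def discriminant(k):
        dist = sum(
            ((xi - ci) / sc) ** 2
            for xi, ci, sc in zip(x, model["centroids"][k], model["scale"])
        )
        return dist - 2 * math.log(model["priors"][k])
    return min(model["centroids"], key=discriminant)


def ranked_genes(model, names):
    """List genes that survive shrinkage, strongest shrunken score first."""
    strength = {
        g: max(abs(model["scores"][k][i]) for k in model["scores"])
        for i, g in enumerate(names)
    }
    return sorted((g for g in names if strength[g] > 0), key=strength.get, reverse=True)


# Hypothetical toy data: three genes measured in three DCM and three NF samples.
genes = ["gene_up", "gene_down", "gene_flat"]
X = [
    [8.1, 5.0, 6.0], [7.9, 5.2, 6.1], [8.3, 4.9, 5.9],  # DCM
    [6.0, 6.1, 6.0], [6.2, 5.9, 6.1], [5.9, 6.0, 5.9],  # NF
]
y = ["DCM", "DCM", "DCM", "NF", "NF", "NF"]

model = fit_shrunken_centroids(X, y, delta=2.0)
print(ranked_genes(model, genes))
print(predict(model, [8.0, 5.1, 6.0]))
```

In this toy input, the gene whose mean does not differ between the classes is shrunk to zero and drops out of the ranking, which mirrors how a thresholded classifier arrives at a small gene set. In practice the threshold would be chosen by cross-validation rather than fixed by hand.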
It will be apparent to those skilled in the art that various modifications and variations can be made in the methods and compositions of the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention cover the modifications and variations of the invention provided they come within the scope of the appended claims and their equivalents.
REFERENCES
(1) Dec GW, Fuster V. Idiopathic dilated cardiomyopathy. N Engl J Med. 1994;331:1564-75.
(2) Roger VL, Weston SA, Redfield MM, et al. Trends in heart failure incidence and survival in a community-based population. J Am Med Assoc. 2004;292:344-50.
(3) Hwang JJ, Allen PD, Tseng GC, et al. Microarray gene expression profiles in dilated and hypertrophic cardiomyopathic end-stage heart failure. Physiol Genomics. 2002;10:31-44.
(4) Tan FL, Moravec CS, Li J, et al. The gene expression fingerprint of human heart failure. Proc Natl Acad Sci U S A. 2002;99:11387-92.
(5) Yung CK, Halperin VL, Tomaselli GF, et al. Gene expression profiles in end-stage human idiopathic dilated cardiomyopathy: altered expression of apoptotic and cytoskeletal genes. Genomics. 2004;83:281-97.
(6) Barth AS, Merk S, Arnoldi E, et al. Functional profiling of human atrial and ventricular gene expression. Pflügers Archiv. 2005;450:201-8.
(7) Nabauer M, Beuckelmann DJ, Uberfuhr P, et al. Regional differences in current density and rate-dependent properties of the transient outward current in subepicardial and subendocardial myocytes of human left ventricle. Circulation. 1996;93:168-77.
(8) Ramakers C, Stengl M, Spatjens RLHM, et al. Molecular and electrical characterization of the canine cardiac ventricular septum. J Mol Cell Cardiol. 2005;38:153-61.
(9) Tabibiazar R, Wagner RA, Liao A, Quertermous T. Transcriptional profiling of the heart reveals chamber-specific gene expression patterns. Circ Res. 2003;93:1193-201.
(10) Ashburner M, Ball CA, Blake JA, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25-9.
(11) Sültmann H, von Heydebreck A, Huber W, et al. Gene expression in kidney cancer is associated with cytogenetic abnormalities, metastasis formation, and patient survival. Clin Cancer Res. 2005;11:646-55.
(12) Barth AS, Merk S, Arnoldi E, et al. Reprogramming of the human atrial transcriptome in permanent atrial fibrillation - Expression of a ventricular-like genomic signature. Circ Res. 2005;96:1022-9.
(13) Kittleson MM, Minhas KM, Irizarry RA, et al. Gene expression analysis of ischemic and nonischemic cardiomyopathy: shared and distinct genes in the development of heart failure. Physiol Genomics. 2005;21:299-307.
(14) Genomics of Cardiovascular Development, Adaptation, and Remodeling. NHLBI Program for Genomic Applications, Harvard Medical School. URL: http://www.cardiogenomics.org (accessed 16 Oct 2004).
(15) Buness A, Huber W, Steiner K, et al. arrayMagic: two-colour cDNA microarray quality control and preprocessing. Bioinformatics. 2005;21:554-6.
(16) Huber W, von Heydebreck A, Sültmann H, et al. Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics. 2002;18 Suppl 1:S96-S104.
(17) Irizarry RA, Hobbs B, Collin F, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003;4:249-64.
(18) Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A. 2001;98:5116-21.
(19) Bussey KJ, Kane D, Sunshine M, et al. MatchMiner: a tool for batch navigation among gene and gene product identifiers. Genome Biol. 2003;4:R27.
(20) Al Shahrour F, Diaz-Uriarte R, Dopazo J. FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics. 2004;20:578-80.
(21) Tibshirani R, Hastie T, Narasimhan B, et al. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci U S A. 2002;99:6567-72.
(22) Ruschhaupt M, Huber W, Poustka A, et al. A Compendium to Ensure Computational Reproducibility in High-Dimensional Classification Tasks. Statistical Applic Gen Mol Biol. 2004;3(1):Article 37.
(23) Kaab S, Barth AS, Margerie D, et al. Global gene expression in human myocardium-oligonucleotide microarray analysis of regional diversity and transcriptional regulation in heart failure. J Mol Med. 2004;82:308-16.
(24) Damas JK, Eiken HG, Oie E, et al. Myocardial expression of CC- and CXC-chemokines and their receptors in human end-stage heart failure. Cardiovasc Res. 2000;47:778-87.
(25) Chien KR. Genomic circuits and the integrative biology of cardiac diseases. Nature. 2000;407:227-32.
(26) Goetze JP, Gore A, Moller CH, et al. Acute myocardial hypoxia increases BNP gene expression. FASEB J. 2004;18:1928-30.
(27) Wiese S, Breyer T, Dragu A, et al. Gene expression of brain natriuretic peptide in isolated atrial and ventricular human myocardium: influence of angiotensin II and diastolic fiber length. Circulation. 2000;102:3074-9.
(28) Senbonmatsu T, Saito T, Landon EJ, et al. A novel angiotensin II type 2 receptor signaling pathway, possible role in cardiac hypertrophy. EMBO J. 2003;22:6471-82.
(29) Ruperez M, Lorenzo O, Blanco-Colio LM, et al. Connective tissue growth factor is a mediator of angiotensin II-induced fibrosis. Circulation. 2003;108:1499-1505.
(30) Omura T, Yoshiyama M, Kim S, et al. Involvement of apoptosis signal-regulating kinase-1 on angiotensin II-induced monocyte chemoattractant protein-1 expression. Arterioscler Thromb Vasc Biol. 2004;24:270-5.
(31) Schumann H, Holtz J, Zerkowski HR, et al. Expression of secreted frizzled related proteins 3 and 4 in human ventricular myocardium correlates with apoptosis related gene expression. Cardiovasc Res. 2000;45:720-8.
(32) van Gijn ME, Daemen MJ, Smits JF, et al. The wnt-frizzled cascade in cardiovascular disease. Cardiovasc Res. 2002;55:16-24.
(33) Liu HR, Zhao RR, Jiao XY, et al. Relationship of myocardial remodeling to the genesis of serum autoantibodies to cardiac beta(1)-adrenoceptors and muscarinic type 2 acetylcholine receptors in rats. J Am Coll Cardiol. 2002;39:1866-73.
(34) Torre-Amione G. Immune activation in chronic heart failure. Am J Cardiol. 2005;95:3C-8C.
(35) Yamamoto T, Eckes B, Mauch C, et al. Monocyte chemoattractant protein-1 enhances gene expression and synthesis of matrix metalloproteinase-1 in human fibroblasts by an autocrine IL-1 alpha loop. J Immunol. 2000;164:6174-9.

Claims

WHAT IS CLAIMED IS:
1. A microarray comprising at least about two nucleic acid molecules, wherein each nucleic acid molecule comprises a nucleotide sequence that is complementary to a target nucleotide sequence selected from the group consisting of Synuclein alpha (non A4 component of amyloid precursor) ("SNCA"), Asporin ("ASPN"), Secreted frizzled-related protein 4 ("SFRP4"), Pleckstrin homology-like domain, family A, member 1 ("PHLDA1"), Frizzled-related protein ("FRZB"), Myosin heavy chain 6, cardiac muscle, alpha (cardiomyopathy, hypertrophic 1) ("MYH6"), Chemokine (C-C motif) ligand 2 ("CCL2"), Ornithine decarboxylase 1 ("ODC1"), Retinoic acid receptor responder (tazarotene induced) 1 ("RARRES1"), Complement factor H ("CFH"), Alanine-glyoxylate aminotransferase 2-like 1 ("AGXT2L1"), Myosin heavy chain 10, non-muscle (MYH10), Ficolin (collagen/fibrinogen domain containing) 3 (Hakata antigen) ("FCN3"), S100 calcium binding protein A8 ("S100A8"), Corin serine peptidase ("CORIN"), Natriuretic peptide precursor A ("NPPA"), Procollagen C-endopeptidase enhancer 2 ("PCOLCE2"), Natriuretic peptide precursor B ("NPPB"), Activating transcription factor 3 ("ATF3"), Inhibitor of DNA binding 4, dominant negative helix-loop-helix protein ("ID4"), Sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 1 ("SPOCK1"), Connective tissue growth factor ("CTGF"), G0/G1 switch 2 ("G0S2"), Kelch-like 3 (Drosophila) ("KLHL3"), Zinc finger and BTB domain containing 16 ("ZBTB16"), AE binding protein 1 ("AEBP1"), and ETS variant gene 5 (ets-related molecule) (ETV5).
2. The microarray of claim 1, wherein each of the nucleic acid molecules comprises a nucleotide sequence that is complementary to a target nucleotide sequence selected from the group consisting of SEQ ID NOs 1-27.
3. The microarray of claim 1, wherein each of the nucleic acid molecules comprises a different nucleotide sequence as compared to each of the other nucleic acid molecules present in the microarray.
4. The microarray of claim 1, wherein the nucleic acid molecule is an oligonucleotide or a probe.
5. The microarray of claim 4, wherein the oligonucleotide or probe is labeled with a moiety to promote signal detection after the oligonucleotide or probe hybridizes to its corresponding target sequence.
6. The microarray of claim 1, comprising nucleic acid molecules that comprise complementary sequences to at least the following sequences: a myosin heavy polypeptide 10 (non-muscle) gene, a synuclein gene, an alpha putative lymphocyte G0/G1 switch gene gene, an ets variant gene 5, an AE binding protein 1 gene, a kelch-like 3 gene, a zinc finger and BTB domain containing 16 gene, and a procollagen C-endopeptidase enhancer 2 gene.
7. The microarray of claim 1, wherein the microarray comprises nucleic acids that are completely or partially complementary to a sequence from each sequence depicted by SEQ ID NOs: 1-27.
8. A method for diagnosing a cardiomyopathy state in a subject, comprising:
(a) exposing the microarray of claim 1 to an isolated biological sample;
(b) determining which nucleic acid molecules of the collection hybridize to their corresponding target gene sequences, and
(c) comparing that hybridization pattern to the hybridization pattern of a control for the same genes, wherein a difference between the two patterns is indicative that the subject has a cardiomyopathy disease.
9. The method of claim 8, wherein the biological sample is a tissue biopsy.
10. The method of claim 9, wherein the tissue biopsy is a heart muscle biopsy.
11. The method of claim 8, wherein the cardiomyopathy disease is dilated cardiomyopathy or idiopathic dilated cardiomyopathy.
12. An assay for diagnosing a cardiomyopathic state in a subject, comprising:
(a) determining the gene expression levels of at least one of the genes selected from the group consisting of SEQ ID NOs 1-27, in a biological sample isolated from the subject, and
(b) comparing the expression levels to a biological sample from a healthy individual, wherein a difference in the expression levels indicates that the subject has a cardiomyopathic disease.
13. The assay of claim 12, wherein the expression levels that are determined are the levels of either the respective gene transcripts or proteins.
14. The assay of claim 12, wherein the expression levels of all 27 genes are determined in the subject's biological sample and compared to the expression levels of the same 27 genes in the healthy subject's biological sample.
15. The assay of claim 12, wherein the biological sample is blood or muscle.
16. The assay of claim 15, wherein the muscle is cardiac muscle.
17. The assay of claim 12, wherein the step of determining gene expression levels is performed by at least one of a microarray, PCR, RT-PCR, TaqMan RT-PCR, Northern Blot, Western Blot, antibody detection, or ELISA.
18. The assay of claim 12, wherein the cardiomyopathic disease is dilated cardiomyopathy or idiopathic dilated cardiomyopathy.
PCT/IB2007/004191 2006-07-25 2007-07-24 A common gene expression signature in dilated cardiomyopathy WO2008053358A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP07866600A EP2046997A2 (en) 2006-07-25 2007-07-24 A common gene expression signature in dilated cardiomyopathy
JP2009521384A JP2009544306A (en) 2006-07-25 2007-07-24 Common gene expression signature for dilated cardiomyopathy

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US83295906P 2006-07-25 2006-07-25
US60/832,959 2006-07-25

Publications (3)

Publication Number Publication Date
WO2008053358A2 true WO2008053358A2 (en) 2008-05-08
WO2008053358A3 WO2008053358A3 (en) 2008-11-20
WO2008053358A8 WO2008053358A8 (en) 2009-08-27

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999024571A2 (en) * 1997-11-10 1999-05-20 Curagen Corporation Differentially expressed genes in cardiac hypertrophy and their uses in treatment and diagnosis
WO2003006687A2 (en) * 2001-07-10 2003-01-23 Medigene Ag Novel target genes for diseases of the heart
WO2003040407A2 (en) * 2001-11-09 2003-05-15 Max-Planck-Gesellschaft Novel markers for cardiopathies
WO2005060656A2 (en) * 2003-12-16 2005-07-07 Hare Joshua M Identification of a gene expression profile that differentiates ischemic and nonischemic cardiomyopathy
US20050158756A1 (en) * 2003-12-16 2005-07-21 Joshua Hare Identification of a gene expression profile that differentiates ischemic and nonischemic cardiomyopathy

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"Affymetrix GeneChip Human Genome U133 Array Set HG-U133A" GEO,, 1 January 1900 (1900-01-01), XP002254749 *
BOHELER ET AL: "Gene expression in cardiac hypertrophy" TRENDS IN CARDIOVASCULAR MEDICINE, ELSEVIER SCIENCE, NEW YORK, NY, US, vol. 2, no. 5, 1 September 1992 (1992-09-01), pages 176-182, XP002096139 ISSN: 1050-1738 *
HWANG JUEY-JEN ET AL: "Microarray gene expression profiles in dilated and hypertrophic cardiomyopathic end-stage heart failure" PHYSIOLOGICAL GENOMICS, vol. 10, October 2002 (2002-10), pages 31-44, XP002488465 ISSN: 1094-8341 *
TAN F-L ET AL: "The gene expression fingerprint of human heart failure" PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF USA, NATIONAL ACADEMY OF SCIENCE, WASHINGTON, DC, vol. 99, no. 17, 20 August 2002 (2002-08-20), pages 11387-11392, XP002970100 ISSN: 0027-8424 *
YUNG C K ET AL: "Gene expression profiles in end-stage human idiopathic dilated cardiomyopathy: altered expression of apoptotic and cytoskeletal genes" GENOMICS, ACADEMIC PRESS, SAN DIEGO, US, vol. 83, no. 2, 1 February 2004 (2004-02-01), pages 281-297, XP004482956 ISSN: 0888-7543 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011519039A (en) * 2008-04-30 2011-06-30 エフ.ホフマン−ラ ロシュ アーゲー Use of SFRP-3 in the assessment of heart failure
US11142570B2 (en) 2017-02-17 2021-10-12 Bristol-Myers Squibb Company Antibodies to alpha-synuclein and uses thereof
US11827695B2 (en) 2017-02-17 2023-11-28 Bristol-Myers Squibb Company Antibodies to alpha-synuclein and uses thereof
CN110809718A (en) * 2017-06-21 2020-02-18 韩国生命工学研究院 Method and kit for diagnosing muscle weakness-related diseases using blood biomarkers
CN113355332A (en) * 2021-07-22 2021-09-07 青岛市妇女儿童医院 HEG1 gene mutant and application thereof

Also Published As

Publication number Publication date
JP2009544306A (en) 2009-12-17
WO2008053358A3 (en) 2008-11-20
EP2046997A2 (en) 2009-04-15
WO2008053358A8 (en) 2009-08-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 07866600; Country of ref document: EP; Kind code of ref document: A2)
WWE Wipo information: entry into national phase (Ref document number: 2009521384; Country of ref document: JP)
NENP Non-entry into the national phase (Ref country code: DE)
WWE Wipo information: entry into national phase (Ref document number: 2007866600; Country of ref document: EP)
NENP Non-entry into the national phase (Ref country code: RU)