WO2014197713A2 - Molecular phenotyping of idiopathic interstitial pneumonia identifies two subtypes of idiopathic pulmonary fibrosis - Google Patents

Molecular phenotyping of idiopathic interstitial pneumonia identifies two subtypes of idiopathic pulmonary fibrosis

Info

Publication number
WO2014197713A2
Authority
WO
WIPO (PCT)
Prior art keywords
ipf
subtype
sample
expression
sequences
Prior art date
Application number
PCT/US2014/041129
Other languages
French (fr)
Other versions
WO2014197713A3 (en
Inventor
Ivana V. Yang
Christopher D. Coldren
David A. Schwartz
Original Assignee
The Regents Of The University Of Colorado, A Body Corporate
Priority date
Filing date
Publication date
Application filed by The Regents Of The University Of Colorado, A Body Corporate
Publication of WO2014197713A2
Publication of WO2014197713A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/112 Disease subtyping, staging or classification
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers

Definitions

  • the present disclosure relates to compositions and devices useful in the identification, characterization, diagnosis, and treatment of previously unknown subtypes of idiopathic pulmonary fibrosis, and methods of using the same.
  • Idiopathic pulmonary fibrosis is the most common form of idiopathic interstitial pneumonias (IIPs).
  • IPF is defined by the presence of the prototypical form of pulmonary fibrosis, usual interstitial pneumonia (UIP), which is a fibrosing interstitial pneumonia with a pattern of heterogeneous, subpleural regions of fibrotic and remodeled lung.
  • IPF can develop as a result of excessive, sequential lung injury and/or aberrant wound healing but the mechanisms that account for excessive lung injury or aberrant repair remain unknown.
  • IPF is generally untreatable and often results in death within 3 years of diagnosis. Some IPF patients may, however, live with a diagnosis of IPF for 20 years or more.
  • IPF: idiopathic pulmonary fibrosis
  • compositions, devices, and methods for aiding in the diagnosis and treatment of IPF are needed.
  • compositions and devices capable of distinguishing between two previously unknown molecular subtypes of IPF are useful, as are methods of using the same.
  • the present disclosure relates to the discovery that the IPF phenotype is actually heterogeneous and consists of two distinct molecular phenotypes or subtypes, which have previously presented clinically the same.
  • the two subtypes of IPF can be distinguished from each other by identification of differences in expression of certain genes that have not been previously implicated in IPF.
  • IPF existed in two distinct molecular phenotypes.
  • the two subtypes display markedly different survival rates. Therefore, differentiation between the two subtypes of IPF can determine the relative life expectancy of an individual IPF patient, which is decreased in one subtype and increased in the other.
  • compositions, nucleic acids, devices, and methods that aid in identifying novel sub-types of pulmonary disease. Also disclosed herein are compositions, nucleic acids, devices and methods that are useful in differentiating between novel subtypes of pulmonary disease and are thus capable of indicating whether an individual patient has one subtype or another.
  • the pulmonary disease is Idiopathic Pulmonary Fibrosis (IPF).
  • the subtypes are designated IPF subtype I and IPF subtype II.
  • certain genes disclosed herein are more highly expressed in tissue from IPF subtype II than subtype I and/or normal tissue.
  • compositions for diagnosing or classifying a lung disease comprising: one or more nucleic acids derived from expressed sequences obtained from a test sample; and diagnostic nucleic acids comprising one or more of all or a part of the sequences disclosed in SEQ ID NOs: 1-197.
  • the test sample nucleic acids and/or the diagnostic nucleic acids are labeled.
  • the composition may further comprise a device for quantifying the label.
  • Also disclosed are devices for diagnosing or classifying a lung disease, the devices comprising: a composition comprising one or more labelled nucleic acids derived from expressed sequences obtained from a test sample, and diagnostic nucleic acids comprising one or more of all or a part of expressed sequences; and a device for measuring the label.
  • Also described herein are methods of diagnosing or classifying a lung disease comprising: collecting a lung tissue sample; processing the tissue sample; purifying expressed sequences from the tissue sample; determining the abundance of one or more expressed sequences in the tissue sample; and classifying the lung disease based on the abundance of one or more expressed sequences.
  • IPF subtype II tissues exhibit overexpression of one or more diagnostic genes.
  • methods of treating IPF comprising: providing a lung tissue sample; obtaining expressed sequences from the sample; detecting the abundance of one or more expressed sequences having nucleotide sequences identical or homologous to all or a part of one or more of expressed sequences; classifying the lung tissue sample as belonging to IPF subtype I or IPF subtype II.
  • Figure 1 shows gene expression profiling data identifying the two subtypes of IPF.
  • mRNA profiles from 119 IPF lungs were subjected to hierarchical clustering based on the expression of 472 transcripts that are differentially expressed at 5% FDR and with greater than 2-fold change in IPF compared to control lung.
  • the distance metric is Euclidean, with complete linkage across samples and Ward's linkage across genes.
  • honeycombing and fibroblastic foci in each sample as assessed by pathology is depicted by the shade of gray: light gray (none; 0%), medium gray (mild; 1-25%), dark gray (moderate; 25-50%), and black (severe; >75%); white indicates missing data.
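  • The transcript selection described for Figure 1 (5% FDR and greater than 2-fold change) can be sketched as follows. This is only an illustration under stated assumptions: the text does not name the FDR procedure, so Benjamini-Hochberg adjustment is assumed here; the input layout, the function names, and the choice to count a 2-fold change in either direction are likewise hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the source only says "5% FDR"; BH is one common way to
    control FDR and is used here purely for illustration.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_transcripts(records, fdr=0.05, min_fold=2.0):
    """Keep transcripts that pass the FDR cutoff with > min_fold change.

    records: list of (name, p_value, fold_change) tuples, where
    fold_change is a positive IPF/control ratio (hypothetical layout).
    """
    adjusted = benjamini_hochberg([p for _, p, _ in records])
    selected = []
    for (name, _, fold), q in zip(records, adjusted):
        # Assumption: a change counts in either direction (up or down).
        magnitude = fold if fold >= 1.0 else 1.0 / fold
        if q <= fdr and magnitude > min_fold:
            selected.append(name)
    return selected


# Made-up example values, not data from the disclosure.
example = [("t1", 0.001, 3.0), ("t2", 0.02, 0.4),
           ("t3", 0.03, 1.5), ("t4", 0.5, 4.0)]
print(select_transcripts(example))
```

  • In this made-up example, t3 fails the fold-change cutoff and t4 fails the FDR cutoff, so only t1 and t2 are retained.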
  • Figure 2 shows quantitative real-time PCR data confirming increased expression of cilium-associated genes in IPF subtype-II. Plotted are average fold change for IPF compared to control lung (black bars) and IPF subtype-II compared to IPF subtype-I (white bar) for four cilium-associated genes. Error bars represent standard deviations.
  • Figure 3 shows data confirming that expression of cilium-associated genes distinguishes IPF subtype-I from IPF subtype-II and that the differential expression of cilium-associated genes in IPF defines two subcategories of IPF.
  • (A) Hierarchical clustering of cilium-associated genes (GO:0005929, cellular component (cilium)) across samples in IPF subtype-I (grey) and IPF subtype-II (black) in Figure 1. An asterisk next to a gene name indicates presence in cluster B of Figure 1.
  • (C) Expression levels of DNAH6 and DNAH7 correlate with the extent of microscopic honeycombing (left) but not with the presence of fibroblastic foci (right).
  • Figure 4 shows data confirming that cilium-associated gene expression signature predicts survival in an independent cohort of IPF patients.
  • Figure 5 depicts a list of differentially expressed genes from Cluster A (Figs. 5A and 5B) and Cluster B (Figs. 5C, 5D, 5E, and 5F).
  • Figure 6 depicts a list of 15 differentially expressed cilium-associated genes.
  • Figure 7 depicts hierarchical clustering of 119 IPF/UIP (black) and 50 control (white) samples based on expression of 472 transcripts with >2-fold change between IPF and control.
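  • As a rough illustration of clustering samples by expression profile, the sketch below implements agglomerative clustering with Euclidean distance and complete linkage, the combination the text names for clustering across samples. It is a minimal pure-Python sketch, not the inventors' pipeline: the function names and the toy profiles are hypothetical, and Ward's linkage across genes is not implemented.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage_clusters(profiles, n_clusters):
    """Agglomerative clustering of samples with complete linkage.

    profiles: dict mapping sample name -> list of expression values
    (hypothetical input layout). Returns a sorted list of clusters,
    each a sorted list of sample names.
    """
    n_clusters = max(1, n_clusters)
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: the distance between two clusters is
                # the largest pairwise distance between their members.
                d = max(
                    euclidean(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy profiles (made-up values): two low-expression and two
# high-expression samples.
toy = {
    "s1": [1.0, 1.1],
    "s2": [1.2, 0.9],
    "s3": [5.0, 5.2],
    "s4": [5.1, 4.9],
}
print(complete_linkage_clusters(toy, 2))
```

  • For data sets of realistic size, a library implementation would normally be preferred; the loop above is cubic in the number of samples and is meant only to make the linkage rule explicit.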
  • Figure 8 shows IHC staining of ARL13B, FOXJ1, and MUC5B in transition zones from normal airways/bronchioles to honeycomb cysts in IPF lung. Tissue was counterstained with hematoxylin. The inset shows a magnification of basal and hyperplastic ATII cells.
  • Figure 9 shows additional IHC staining that shows widespread dysregulation of FOXJ1 in honeycomb and alveolar cysts, the same areas where MUC5B is overproduced in IPF lung. Tissue was counterstained with hematoxylin. Images were taken at 5X or 10X.
  • Figure 10 shows IHC staining for ARL13B, demonstrating increased and misregulated expression of this early marker of ciliogenesis 21 days following 1.5 U/kg i.t. bleomycin.
  • the magnitude of misregulated expression is correlated with Muc5b status and the extent of fibrosis as a result of bleomycin exposure.
  • Tissue was counterstained with hematoxylin. Images were taken at 10X magnification (inset at 20X).
  • Hybridization refers to the binding of two single-stranded nucleic acids via complementary base pairing. Extensive guides to the hybridization of nucleic acids can be found in: Tijssen, Laboratory Techniques in Biochemistry and Molecular Biology- Hybridization with Nucleic Acid Probes Part I, Ch. 2, “Overview of principles of hybridization and the strategy of nucleic acid probe assays” (1993), Elsevier, N.Y.; and Sambrook et al., Molecular Cloning: A Laboratory Manual (3rd ed.) Vol. 1-3 (2001), Cold Spring Harbor Laboratory, Cold Spring Harbor Press, N.Y.
  • “Hybridizing specifically to” refers to the preferential binding, duplexing, or hybridizing of a nucleic acid molecule to a particular probe under stringent conditions.
  • “Stringent conditions” refers to hybridization conditions under which a probe will hybridize preferentially to its target subsequence, and to a lesser extent, or not at all, to other sequences in a mixed population (e.g., an mRNA extraction from a tissue biopsy).
  • Stringent hybridization and stringent hybridization wash conditions are sequence-dependent and are different under different environmental parameters.
  • highly stringent hybridization and wash conditions are selected to be about 5 °C lower than the thermal melting point (Tm) for a specific sequence at a defined ionic strength and pH.
  • Tm is the temperature at which 50% of the target sequence hybridizes to a perfectly matched probe.
  • Very stringent conditions are selected to be equal to the Tm for a particular probe.
  • a high stringency wash is preceded by a low stringency wash to remove background probe signal.
  • An example of stringent hybridization conditions for hybridization of complementary nucleic acids which have more than 100 complementary residues on an array is 42 °C using standard hybridization solutions, with the hybridization being carried out overnight.
  • An example of highly stringent wash conditions is a 0.15 M NaCl wash at 72 °C for 15 minutes.
  • An example of stringent wash conditions is a wash in 0.2X Standard Saline Citrate (SSC) buffer at 65 °C for 15 minutes.
  • An example of a medium stringency wash for a duplex of, for example, more than 100 nucleotides is 1X SSC at 45 °C for 15 minutes.
  • An example of a low stringency wash for a duplex of, for example, more than 100 nucleotides is 4X to 6X SSC at 40 °C for 15 minutes.
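  • The stringency rules and example wash conditions above can be collected into a small lookup, sketched below. The dictionary layout and function name are hypothetical; the numeric values are the ones quoted in the text, and the probe Tm is taken as a caller-supplied input because no Tm formula is given here.

```python
# Example wash conditions quoted in the text, keyed by stringency level.
# The dictionary layout itself is a hypothetical convenience.
WASH_EXAMPLES = {
    "highly stringent": {"buffer": "0.15 M NaCl", "temp_c": 72, "minutes": 15},
    "stringent": {"buffer": "0.2X SSC", "temp_c": 65, "minutes": 15},
    "medium": {"buffer": "1X SSC", "temp_c": 45, "minutes": 15},
    "low": {"buffer": "4X to 6X SSC", "temp_c": 40, "minutes": 15},
}


def target_temperature(tm_c, level):
    """Return the temperature (deg C) implied by the Tm-relative rules.

    tm_c: thermal melting point of the probe/target duplex, supplied by
    the caller (assumption: measured or estimated elsewhere).
    """
    if level == "highly stringent":
        return tm_c - 5.0  # about 5 deg C below Tm
    if level == "very stringent":
        return tm_c  # equal to Tm
    raise ValueError(f"no Tm-relative rule is stated for {level!r}")


print(target_temperature(70.0, "highly stringent"))
print(WASH_EXAMPLES["stringent"])
```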
  • Nucleic acid can refer to a deoxyribonucleotide (DNA) or ribonucleotide (RNA) in either single- or double-stranded form and includes all nucleic acids comprising naturally occurring nucleotide bases as well as nucleic acids containing any and/or all analogues of natural nucleotides. This term also includes nucleic acid analogues that are metabolized in a manner similar to naturally occurring nucleotides, but at rates that are improved for the purposes desired.
  • DNA: deoxyribonucleotide
  • RNA: ribonucleotide
  • nucleic-acid-like structures with synthetic backbone analogues including, without limitation, phosphodiester, phosphorothioate, phosphorodithioate, methylphosphonate, phosphoramidate, alkyl phosphotriester, sulfamate, 3'-thioacetal, methylene(methylimino), 3'-N-carbamate, morpholino carbamate, and peptide nucleic acids (PNAs) (see, e.g.: "Oligonucleotides and Analogues, a Practical Approach," edited by F.
  • PNAs contain non-ionic backbones, such as N-(2-aminoethyl) glycine units. Phosphorothioate linkages are described in: WO 97/03211; WO 96/39154; and Mata (1997) Toxicol. Appl. Pharmacol. 144:189-197.
  • a patient means a subject whose tissue is used for analysis.
  • a patient may be a mammal such as, for example, a murine animal, a canine animal, a porcine animal, a feline animal, a simian animal, a hominid, or the like.
  • a patient is a human.
  • “Probe” or “nucleic acid probe” refers to one or more nucleic acid fragments whose specific hybridization to a sample can be detected.
  • probes are arranged on a substrate surface in an array. The probe may be unlabelled, or it may contain one or more labels so that its binding to a nucleic acid can be detected.
  • a probe can be produced from any source of nucleic acids from one or more particular, pre-selected portions of a chromosome including, without limitation, one or more clones, an isolated whole chromosome, an isolated chromosome fragment, or a collection of polymerase chain reaction (PCR) amplification products.
  • PCR: polymerase chain reaction
  • the probes contain sequences specific to, or characteristic of, any one or more of the genes described in FIGS. 5 and 6, and SEQ ID NOS: 1-197.
  • the sequence of the probes can be varied.
  • the probe sequence can be varied to produce probes that are substantially identical to the probes disclosed hereinbelow, but that retain the ability to hybridize specifically to the same targets or samples as the probe from which they were derived.
  • IPF exists as two distinct subtypes that have previously presented clinically the same.
  • the two subtypes differ in their expression of cilium genes and can thus be distinguished based on this differential expression.
  • IPF subtype-I displays decreased levels of cilium gene expression as compared to IPF subtype-II, which displays increased levels of cilium gene expression.
  • IPF subtype-I, which displays decreased levels of cilium gene expression, is associated with significantly longer survival than IPF subtype-II, which displays increased levels of cilium gene expression.
  • measuring the differential levels of expression of cilium genes in patients with IPF identifies two novel subtypes of IPF that otherwise present clinically the same. Identification of the subtypes can allow for better determination of disease prognosis, as well as improved therapeutic approaches for one or both IPF subtypes.
  • IPF patients with elevated cilium gene expression display increased microscopic honeycombing. In some embodiments, IPF patients with elevated cilium gene expression display increased microscopic honeycombing and elevated MUC5B expression.
  • MUC5B is a secreted airway mucin. Prior to the filing of the present disclosure, secreted airway mucins had not been implicated in the development of pulmonary fibrosis. Therefore, in certain embodiments, IPF subtype II displays elevated MUC5B expression. Without wishing to be bound by any theory, it is believed that excess concentrations of MUC5B compromise mucosal host defense, reducing lung clearance of inhaled particles, dissolved chemicals, and microorganisms, resulting in enhanced injury, chronic
  • IPF subtype-II displays a unique IPF phenotype comprising honeycomb cysts, elevated MUC5B expression, and elevated expression of cilium-associated genes.
  • IPF subtype-II patients displaying elevated cilium-associated gene expression do not show increased levels of fibroblastic foci.
  • the IPF phenotype is heterogeneous and comprises two distinct molecular phenotypes, groups, or subtypes: IPF subtype-I and IPF subtype-II.
  • the two subtypes of IPF are characterized by differences in expression of at least one diagnostic gene that has not been previously implicated in IPF.
  • the diagnostic gene may be one or more than one cilium-associated gene.
  • the two subtypes of IPF are characterized by differences in expression of a plurality of cilium-associated genes that have not been previously implicated in IPF.
  • high cilium-associated gene expression is associated with pathological features of IPF (e.g. honeycombing) and survival, as compared to low cilium-associated gene expression.
  • analysis of gene expression in IPF subjects can be used to identify subjects that can respond differently to pharmacological intervention.
  • the application of gene expression profiling to identify unique disease subtypes can be used to improve clinical course and response to therapy for IPF/UIP subjects/patients.
  • over-expression of a diagnostic gene can be relative to a reference gene, a reference sample, or the total amount of expressed RNA.
  • over-expression may also be determined by comparing expression levels of diagnostic genes (for example cilium-associated genes) to expression levels of reference genes.
  • compositions, nucleic acids, devices, and methods that aid in identifying novel sub-types of pulmonary disease.
  • the pulmonary disease is Idiopathic Pulmonary Fibrosis (IPF).
  • the novel subtypes are IPF subtype-I and IPF subtype-II.
  • the present disclosure relates to compositions, devices and methods useful in distinguishing patients with IPF subtype-I from patients with IPF subtype-II.
  • the present disclosure relates to compositions, devices, and methods of distinguishing patients with IPF subtype I from patients with IPF subtype II in lung tissue samples.
  • the two subtypes can be distinguished from each other by measuring the level of expression of one or more genes which, in some embodiments, are cilium-associated genes.
  • a patient with IPF subtype II has a shorter survival time, post-diagnosis, than a subject with IPF subtype I. Therefore, in certain embodiments, identification of the IPF subtype in an IPF patient can help determine the proper therapeutic approach for such patient.
  • one or more cilium-associated genes are more highly expressed in samples from patients with IPF subtype II relative to the expression level of the same cilium-associated genes in samples from patients with subtype I and/or from non-IPF samples. Therefore, in certain embodiments, identification of novel IPF subtypes having distinct survival rates and phenotypes aids in the discovery and development of new treatments for this otherwise untreatable disease.
  • pharmaceutical compositions can have limited effect on the overall population of IPF patients but be selectively effective for treating IPF subtype I or IPF subtype II.
  • the present disclosure relates to devices for quantitating expression levels of genes in a sample.
  • the genes are human genes.
  • gene expression levels are analyzed by quantitating mRNA levels.
  • gene expression levels are analyzed by quantitating protein levels.
  • the sample is a lung tissue sample.
  • the lung tissue sample is from a healthy patient.
  • the lung tissue sample is from a patient with pulmonary disease.
  • the pulmonary disease is IPF.
  • the present disclosure relates to devices for quantitating expression levels of cilium-associated genes in a lung tissue sample from a patient with pulmonary disease.
  • the pulmonary disease is IPF. Therefore, in some embodiments, the quantified expression levels can be used to classify the lung tissue sample as IPF subtype-I or IPF subtype-II.
  • expression levels of the cilium-associated genes are analyzed by quantitating mRNA levels. In some embodiments, expression levels of the cilium-associated genes are analyzed by quantitating protein levels. In some embodiments, the cilium-associated genes are human genes. In some
  • the mRNA comprises a human cilium-associated gene transcript.
  • the device comprises one or more cilium-associated nucleic acid sequences.
  • In Example 1 of this disclosure, the present inventors have revealed that IPF exists in two distinct molecular phenotypes or subtypes, which have previously presented clinically the same. Additionally, the present inventors have shown that the two subtypes differ in their expression of cilium genes, with IPF subtype-II displaying enhanced levels of cilium-associated gene expression as compared to IPF subtype-I and/or normal, non-diseased lung tissue. Additionally, the present inventors have revealed that patients with IPF subtype-I survive significantly longer than patients with IPF subtype-II, making the determination of subtype ideally suited as a predictive measure of survival in IPF patients.
  • the present disclosure is based on the discovery that certain differentially expressed genes, and the mRNA sequences transcribed therefrom, can be used to identify IPF subtype in patients with IPF. Such genes, and their mRNA sequences, can also be used to predict the life expectancy of IPF patients and whether such patients will benefit from one or more medical treatments.
  • devices comprising isolated nucleic acid sequences that selectively hybridize to one or more mRNA sequences of genes that are differentially regulated in lung tissue samples from IPF patients are disclosed.
  • the devices comprise nucleic acid sequences that will selectively hybridize to one or more mRNA sequences of genes that are differentially regulated in lung tissue of IPF patients.
  • the devices comprise isolated nucleic acid sequences that may be used to measure gene expression by quantitating amounts of mRNA from specific genes in a tissue sample.
  • the tissue sample may be from lung tissue, for example from a patient diagnosed with pulmonary disease.
  • the pulmonary disease is IPF.
  • the isolated nucleic acid sequences comprise sequences of cilium-associated genes, and the mRNA comprises sequences of transcribed cilium-associated genes. In some embodiments, mRNA levels from one, more than one, or more than 5 cilium-associated genes may be analyzed.
  • a diagnostic gene is a gene that is indicative of a disease state, or disease subtype.
  • a diagnostic gene is more highly expressed in lung tissue of one disease subtype compared to a second disease subtype and/or normal, non-diseased lung tissue.
  • a diagnostic gene is differentially expressed in a patient with IPF.
  • the abundance of expressed sequences from a diagnostic gene may be determined by comparing the number of expressed sequences per cell, per amount of total RNA and/or per amount of polyA+ RNA.
  • the expression level of a diagnostic gene may be measured relative to the expression level of a specific gene, relative to the expression of all genes in a tissue sample, or relative to expression of the diagnostic gene in non-diseased lung tissue.
  • enhanced expression may be identified by comparing the expression level of the diagnostic gene to the expression level of a reference gene.
  • a reference gene may provide a basis for comparison for a diagnostic gene.
  • the abundance of transcripts from a reference gene provides a basis for comparison for the expression level of a diagnostic gene.
  • the abundance of expressed sequences from a diagnostic gene may be determined by comparing the total amount of expressed sequences from the diagnostic gene to the total amount of expressed sequences of a reference gene.
  • the abundance of expressed sequences from a diagnostic gene may be determined relative to the expression level of all genes in a tissue sample. For example, in some embodiments a standard amount of mRNA may be analyzed and the abundance of the expressed sequence may be determined relative to the amount of mRNA analyzed.
  • the expression levels from a plurality of diagnostic genes are analyzed. In some embodiments, the expression levels from a plurality of diagnostic genes are compared to the expression levels of a plurality of reference genes. In some embodiments, a decrease in the expression level of one or more diagnostic genes, as compared to the expression level of one or more reference genes or the total background expression level, is indicative of IPF subtype-I. In some embodiments, no, or little, change in the expression level of one or more diagnostic genes, as compared to the expression level of one or more reference genes or the total background expression level, is indicative of IPF subtype-I. In some embodiments, an increase in the expression level of one or more diagnostic genes, as compared to the expression level of one or more reference genes or the total background expression level, is indicative of IPF subtype-II.
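  • The decision rule in the preceding passage (an increase relative to reference genes indicates subtype II; a decrease or little change indicates subtype I) might be sketched as below. No numeric cutoff is given in the text, so the `log2_threshold` parameter, the use of mean expression, and the function name are hypothetical choices made only for illustration.

```python
import math


def classify_ipf_subtype(diagnostic, reference, log2_threshold=1.0):
    """Classify a sample from diagnostic vs. reference expression levels.

    diagnostic, reference: non-empty lists of positive expression values
    for the diagnostic genes and the reference genes in one sample.
    log2_threshold: hypothetical cutoff on the log2 ratio of the means;
    the source does not state a numeric threshold.
    """
    mean_diag = sum(diagnostic) / len(diagnostic)
    mean_ref = sum(reference) / len(reference)
    log2_ratio = math.log2(mean_diag / mean_ref)
    if log2_ratio > log2_threshold:
        # Increased diagnostic-gene expression relative to reference.
        return "IPF subtype-II"
    # Decreased expression, or little or no change.
    return "IPF subtype-I"


# Made-up expression values for three samples.
print(classify_ipf_subtype([8.0, 10.0, 12.0], [2.0, 2.0, 2.0]))
print(classify_ipf_subtype([2.0, 2.2], [2.0, 2.0]))
print(classify_ipf_subtype([0.5], [2.0]))
```

  • In practice the threshold would need to be calibrated against samples of known subtype; the value used here carries no clinical meaning.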
  • a reference gene may be a gene whose expression level is the same or similar in lung tissue from diseased patients and subjects that do not suffer from lung disease.
  • a reference gene and a diagnostic gene may be the same gene, with the reference gene expression level determined in a different tissue type and/or different patient.
  • the expression level of a diagnostic gene may be determined from the lung tissue of a diseased patient, and the reference gene expression level is obtained by determining the expression level of the diagnostic gene in a sample taken from the lung tissue of a non-diseased patient, and/or from non-diseased lung tissue from the same patient.
  • a control or reference gene may be an expressed sequence whose expression is similar in both IPF subtype I and subtype II.
  • similar expression may be greater than about 0.7-fold, 0.8-fold, 0.9-fold, 1.0-fold, or 1.1-fold, and/or less than about 1.2-fold, 1.1-fold, 1.0-fold, 0.9-fold, or 0.8-fold.
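  • One way to read the "similar expression" bounds above is as a fold-change window for picking candidate reference genes. The sketch below uses the widest window listed (greater than about 0.7-fold and less than about 1.2-fold); the function name, input layout, placeholder gene labels REF1 and REF2, and all expression values are hypothetical.

```python
def candidate_reference_genes(expr_subtype_1, expr_subtype_2,
                              lower=0.7, upper=1.2):
    """Return genes whose subtype-II/subtype-I fold change is 'similar'.

    expr_subtype_1, expr_subtype_2: dicts mapping gene name -> positive
    mean expression level in each subtype (hypothetical input layout).
    lower, upper: fold-change window taken from the bounds in the text.
    """
    similar = []
    for gene, level_1 in expr_subtype_1.items():
        fold = expr_subtype_2[gene] / level_1
        if lower < fold < upper:
            similar.append(gene)
    return sorted(similar)


# Made-up expression levels; REF1 and REF2 are placeholder gene labels.
subtype_1 = {"REF1": 100.0, "DNAH6": 50.0, "REF2": 80.0}
subtype_2 = {"REF1": 110.0, "DNAH6": 200.0, "REF2": 40.0}
print(candidate_reference_genes(subtype_1, subtype_2))
```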
  • the abundance of an expressed sequence or transcript from a tissue sample from a patient having lung disease may be compared to the relative abundance of that same expressed sequence from a non-lung-disease subject or patient.
  • diagnostic genes are expressed at greater levels in lung tissue obtained from a patient having lung disease than from a patient without lung disease.
  • diagnostic genes are expressed at greater levels in lung tissue obtained from a patient with IPF subtype-II than from a patient without lung disease.
  • diagnostic genes are expressed at lesser levels in lung tissue obtained from a patient with IPF subtype-I than from a patient with subtype-II and/or a patient without lung disease. In some embodiments, diagnostic genes are expressed at greater levels in lung tissue obtained from a patient having one subtype of lung disease than from a patient with a different subtype of lung disease. In some embodiments, diagnostic genes are expressed at lesser levels in lung tissue obtained from a patient with IPF subtype-I than from a patient with IPF subtype-II. In some embodiments, diagnostic genes are expressed at greater levels in lung tissue obtained from a patient with IPF subtype-II than from a patient with IPF subtype-I, and/or a patient without lung disease. In various aspects, the genes described in FIGS.
  • a diagnostic gene is a gene described in FIGS. 5 and 6, and SEQ ID NOS: 1-197, and/or a fragment(s) thereof.
  • devices provided by the present disclosure comprise at least one gene described in FIGS. 5 and 6, and SEQ ID NOS: 1-197.
  • devices provided by the present disclosure comprise a plurality of the genes described in FIGS. 5 and 6, and SEQ ID NOS: 1-197.
  • devices provided by the present disclosure comprise all of the genes described in FIGS. 5 and 6, and SEQ ID NOS: 1-197.
  • a diagnostic gene(s) is more highly expressed in lung tissue obtained from a patient with IPF type-II than in lung tissue obtained from a patient with type-I IPF. In some embodiments, a diagnostic gene(s) is more highly expressed in lung tissue obtained from a patient with IPF type-II than in lung tissue obtained from a non-diseased patient. In some embodiments, a diagnostic gene(s) is expressed at a lower level in lung tissue obtained from a patient with IPF type-I than in lung tissue obtained from a patient with subtype-II and/or a non-diseased patient. In various embodiments, one or more genes and expressed sequences in Cluster A (FIGS. 5A and 5B) and Cluster B (FIGS. 5C-5F) are expressed at higher levels in lung tissue obtained from IPF type-II patients than from patients with IPF type-I.
  • a reference gene may be a gene that is not in Cluster A or Cluster B. In other embodiments, a reference gene may be a gene that is listed in FIGS. 5A-5B. In some embodiments a reference gene may be expressed at the same level in lung tissue samples from patients with lung disease, without lung disease, with IPF subtype-I, and with IPF subtype-II.
  • the diagnostic genes may be one or more genes belonging to the Gene Ontology (GO) category 0005929, which comprises many genes of the cellular component of cilium. In various embodiments, the diagnostic genes may be one or more of the cilium-associated genes described in FIG. 6.
  • expressed sequences from cilium-associated genes are more abundant in lung tissue obtained from IPF type-II patients than from patients with IPF type-I. In some embodiments, expressed sequences from cilium-associated genes are less abundant in lung tissue obtained from IPF type-II patients than from patients with IPF type-I. In some embodiments, expressed sequences from cilium-associated genes are more abundant in lung tissue obtained from IPF type-I patients than from non-diseased patients. In some embodiments, expressed sequences from cilium-associated genes are less abundant in lung tissue obtained from IPF type-II patients than from non-diseased patients. In most cases, diagnostic genes may be more highly expressed in lung tissue samples from patients with IPF subtype-II than in lung tissue samples from patients with subtype-I.
  • Expression levels can be analyzed by a variety of methods and techniques including, for example, differential expression screening, PCR, RT-PCR, SAGE analysis, high-throughput sequencing, microarrays, liquid or other arrays, protein-based methods (e.g., western blotting, proteomics, and other methods described herein), and data mining methods, as further described herein.
  • the diagnostic gene sequences are homologous or identical to genes described in FIGS. 5 and 6, and SEQ ID NOS: 1-197 or portions thereof; for example, the diagnostic gene sequence may be more than about 5 nucleotides (nt), 10 nt, 15 nt, 20 nt, 25 nt, 30 nt, 35 nt, 40 nt, 45 nt, 50 nt, 60 nt, 70 nt, 80 nt, 90 nt, 100 nt, 150 nt, 200 nt, 300 nt, 400 nt, 500 nt, 600 nt, 700 nt, 800 nt, 900 nt, 1.0k nt, 1.1k nt, 1.2k nt, 1.3k nt, 1.4k nt, 1.5k nt, 1.6k nt, 1.7k nt,
  • the diagnostic gene sequences can be aligned with the gene sequences described in FIGS. 5 and 6, and SEQ ID NOS: 1-197 by a nucleotide sequence alignment algorithm. For example, blastn can be used to align two nucleotide sequences, with the program optimized either for highly similar sequences (megablast) or for somewhat similar sequences (blastn; this can be useful where sequences have less than about 90% identity or the sequences have low complexity).
  • the maximum target sequence is set to the length of the longer of the two sequences to be aligned, the expected threshold can be 10, the word size can be 28, the match/mismatch scores can be 1, -2, and the gap costs can be linear.
  • homology can be expressed as percent identity.
  • the diagnostic gene sequences, when aligned with the sequences of FIGS. 5 and 6, and SEQ ID NOS: 1-197, have identity of more than about 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% and/or less than about 100%, 95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, or 45% identities.
  • the sequence alignment can have gaps of less than about 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, or 1%.
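The percent identity and gap thresholds above can be checked on an existing pairwise alignment. The following is a minimal sketch, assuming the two sequences have already been aligned (for example by blastn) and padded with "-" for gaps. The function name `alignment_stats` and the example sequences are hypothetical, and dividing by the number of alignment columns is only one common convention for reporting identity.

```python
def alignment_stats(aligned_a: str, aligned_b: str) -> tuple[float, float]:
    """Return (percent identity, percent gaps) for two aligned sequences.

    Both strings must have the same length; '-' marks a gap.
    Percentages are taken over the number of alignment columns (assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = len(aligned_a)
    if columns == 0:
        raise ValueError("alignment is empty")
    pairs = list(zip(aligned_a.upper(), aligned_b.upper()))
    # A column counts as a match only if both residues agree and neither is a gap.
    matches = sum(1 for a, b in pairs if a == b and a != "-")
    # A column counts as a gap if either sequence has '-' there.
    gaps = sum(1 for a, b in pairs if a == "-" or b == "-")
    return 100.0 * matches / columns, 100.0 * gaps / columns


# Hypothetical 10-column alignment: 8 matches, 1 gap, 1 mismatch.
identity, gap_pct = alignment_stats("ACGT-ACGTA", "ACGTTACGAA")
print(f"identity = {identity:.1f}%, gaps = {gap_pct:.1f}%")
```

A candidate sequence could then be compared against whichever identity and gap limits a given embodiment recites.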
  • the diagnostic gene sequences may include one or more of the genes selected from the group consisting of: ABCA13 (Gene Symbol), NM_152701 (Accession Number); ADAM28, NM_014265; ADH7, NM_000673; AGR2, NM_006408; AGR3, NM_176813; ALOX15, NM_001140; ANKRD18B, ENST00000290943; C10orf81, NM_001193434; C12orf75, NM_001145199; C1orf110, NM_178550; C20orf114,
  • the diagnostic gene sequences may include one or more of the genes selected from the group consisting of: AGBL2 (Gene Symbol), NM_024783 (Accession Number); ARMC3, NM_173081; ARMC4, NM_018076; C10orf107, NM_173554; C10orf79, NM_025145; C11orf70, NM_032930; C11orf88, NM_207430; C12orf55, ENST00000298953; C12orf63, ENST00000342887; C13orf30, NM_182508; C1orf129, NM_025063; C1orf173, NM_001002912; C1orf192, NM_001013625; C1orf194,
  • NM_001122961; C1orf87, NM_152377; C20orf26, NM_015585; C20orf85, NM_178456; C2orf39, NM_145038; C2orf77, NM_001085447; C3orf15, NM_033364; C6,
  • RSPH4A, NM_001010892; SERPINI2, NM_006217; SNTN, NM_001080537; SPA17, NM_017425; SPAG17, NM_206996; SPAG6, NM_012443; SPATA17, NM_138796;
  • TTC25, NM_031421; UBXN10, NM_152376; VWA3A, NM_173615; VWA3B, NM_144992; WDR16, NM_145054; WDR49, NM_178824; WDR63, NM_145172; WDR65, NR_030778; WDR66, NM_144668; WDR69, NM_178821; WDR78, NM_024763; YSK4, NM_025052; ZBBX, NM_024687, and combinations thereof.
  • the diagnostic genes and/or the mRNA transcripts transcribed from the diagnostic genes may be from cilium-associated genes.
  • the cilium-associated genes may be selected from isoforms, alternative splice variants, and genes of the group consisting of: Homo sapiens sperm associated antigen 17 (SPAG17); Homo sapiens sperm flagellar protein Repro-SA-1; Homo sapiens enkurin, TRPC channel interacting protein (ENKUR); Homo sapiens dynein, axonemal, heavy chain 10; Homo sapiens dynein, axonemal, heavy chain 10 (DNAH10); Homo sapiens TATA box binding protein (TBP)-associated factor, RNA polymerase I, C, 110kDa (TAF1C), transcript variant 2, mRNA; Homo sapiens axonemal heavy chain dynein type 3 (DNAH3); Homo sapiens sperm associated antigen 17 (SPA
  • NM_018100 NM_018100
  • AK299754 NM_001161664, NM_001010892
  • AK095018 NM_018719, NM_001127370, NM_003777, NM_001127371; AF091619, NM_012144; and combinations thereof.
  • devices provided by the present disclosure comprise arrays or micro-arrays.
  • An array refers to an arrangement, on a substrate surface, of multiple nucleic acid sequences, which may be single-stranded, double-stranded, or a combination thereof.
  • the nucleic acid sequences may comprise sequences of transcribed genes.
  • an array can comprise many different nucleic acid sequences representing the same or different transcribed genes.
  • nucleic acid sequences may comprise different portions of the same transcribed gene. In some embodiments, the different nucleic acid sequences may overlap, for example by more than about 1 nucleotide (nt), 2 nt, 3 nt, 4 nt, 5 nt, 6 nt, 7 nt, 8 nt, or 9 nt and/or less than about 10 nt, 9 nt, 8 nt, 7 nt, 6 nt, 5 nt, 4 nt, 3 nt, or 2 nt.
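Overlapping probes that cover different portions of one transcript can be pictured as a simple tiling. The sketch below is illustrative only: the function name `tile_probes`, the 20 nt example transcript, and the choice of an 8 nt probe with a 4 nt overlap are assumptions, not values taken from the disclosure.

```python
def tile_probes(transcript: str, probe_len: int, overlap: int) -> list[str]:
    """Cut a transcript into fixed-length probes that overlap by `overlap` nt."""
    if probe_len <= 0:
        raise ValueError("probe_len must be positive")
    if not 0 <= overlap < probe_len:
        raise ValueError("overlap must be >= 0 and smaller than probe_len")
    step = probe_len - overlap  # distance between consecutive probe starts
    last_start = len(transcript) - probe_len
    # Only full-length probes are emitted; a short tail is dropped.
    return [transcript[i:i + probe_len] for i in range(0, last_start + 1, step)]


# Hypothetical 20 nt transcript, 8 nt probes overlapping by 4 nt.
probes = tile_probes("ACGTACGTACGTACGTACGT", probe_len=8, overlap=4)
for p in probes:
    print(p)
```

Each consecutive pair of probes shares its last and first 4 nt, which is the kind of overlap the passage describes.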
  • nucleic acid sequences on an array may have the same or similar melting temperatures, for example greater than 45 °C, 46 °C, 47 °C, 48 °C, 49 °C, 50 °C, 51 °C, 52 °C, 53 °C, 54 °C, 55 °C, 56 °C, 57 °C, 58 °C, 59 °C, 60 °C, 61 °C, 62 °C, 63 °C, 64 °C, 65 °C, 66 °C, 67 °C, 68 °C, 69 °C, 70 °C or 75 °C and/or less than about 80 °C, 75 °C, 70 °C, 69 °C, 68 °C, 67 °C, 66 °C, 65 °C, 64 °C, 63 °C, 62 °C, 61 °C, 60 °C, 59 °C, 69 °C
  • the nucleic acids may have the same or similar lengths, for example greater than about 17 nt, 18 nt, 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, 26 nt, 27 nt, 28 nt, 29 nt, or 30 nt and/or less than about 31 nt, 30 nt, 29 nt, 28 nt, 27 nt, 26 nt, 25 nt, 24 nt, 23 nt, 22 nt, 21 nt, 20 nt, 19 nt, or 18 nt.
  • the nucleic acids may have the same or similar lengths, for example greater than about 1 nt, 2 nt, 3 nt, 4 nt, 5 nt, 6 nt, 7 nt, 8 nt, 9 nt, 10 nt, 15 nt, 20 nt, 25 nt, 30 nt, 35 nt, 40 nt, 45 nt, 50 nt, 55 nt, 60 nt, 65 nt, 70 nt, 75 nt, 80 nt, 85 nt, 90 nt, 95 nt, 100 nt, 150 nt, 200 nt, 250 nt, 300 nt, 350 nt, 400 nt, 450 nt, or 500 nt, and/or less than about 600 nt, 550 nt, 500 nt, 450 nt, 400 nt, 350 nt, 300 nt, 250 nt, 200
  • the diagnostic genes and/or the mRNA transcripts transcribed from the diagnostic genes may be referred to as a target.
  • an array can comprise a plurality of nucleotide sequences from a plurality of targets.
  • each individual nucleic acid sequence is immobilized to a designated, discrete location (i.e., a defined location or assigned position) on the substrate surface.
  • each nucleic acid sequence is immobilized to a discrete location on an array and each has a sequence that is either specific to, or characteristic of, a particular mRNA sequence, expressed sequence, or transcribed gene.
  • a given target may be represented at several positions on an array surface.
  • an array can comprise the same nucleotide sequence at more than one position.
  • a target may be represented by different oligonucleotide sequences on the same array surface.
  • a nucleic acid sequence on an array can be specific to, or characteristic of a particular mRNA sequence because it contains a nucleic acid sequence that is identical or homologous to the nucleotide sequence of a transcribed gene or the complement of that mRNA sequence.
  • Such a nucleic acid sequence represents a single mRNA sequence of a single transcribed gene and is able to discriminate the mRNA sequence from the single transcribed gene relative to other mRNA sequences from other transcribed genes.
  • nucleic acid sequences immobilized on an array surface can comprise sequence(s) corresponding to specific transcribed genes.
  • nucleic acid sequences comprise sequences identical or homologous to at least part of a transcribed gene described in FIGS. 5 and 6, and SEQ ID NOS: 1-197.
  • the nucleotides attached to the arrays and/or microarrays can be referred to as probes.
  • a target gene may be represented by one or more probes.
  • a probe is a single stranded nucleotide attached to a substrate surface of an array or a microarray.
  • a single probe may reside at one or more positions or spots on an array.
  • each spot or position on an array may comprise one or more copies of the same probe.
  • the probes may be arranged on the microarray substrate in a single density, or in varying densities.
  • the density of each of the probes can be varied to accommodate certain factors such as, for example, the nature of the test sample, the nature of a label used during hybridization, the type of substrate used, and the like. Techniques capable of producing high density arrays can also be used for this purpose (see, e.g., Fodor (1991) Science 767-773; Johnston (1998) Curr. Biol. 8: R171-R174; Schummer (1997) Biotechniques 23: 1087-1092; Kern (1997) Biotechniques 23: 120-124; and U.S. Patent No. 5,143,854).
  • the sequence of the probes can be varied.
  • the probe sequence can be varied to produce probes that are substantially identical to the nucleic acid sequences described in FIGS. 5 and 6, and SEQ ID NOS: 1-197.
  • probes hybridize specifically to a nucleotide sequence from a transcribed gene, from which the probe sequence was derived.
  • the probe may not be identical to the transcribed gene from which it was derived, but retains the ability to hybridize to a nucleic acid sequence from that gene.
  • the length, sequence, and complexity of the nucleic acid probes may be varied. In various embodiments, the length, sequence and complexity are varied to provide optimum hybridization and signal production for a given hybridization procedure, and to provide the required resolution among different genes or genomic locations.
  • single stranded nucleotides derived from a tissue sample can hybridize to probes on an array.
  • Hybridization as defined herein, can refer to binding of two single stranded nucleic acids via complementary base pairing.
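Complementary base pairing can be illustrated with a reverse-complement check. This is a deliberately simplified sketch: it handles DNA letters only, requires perfect complementarity, and ignores mismatch tolerance, temperature, and salt conditions that govern real hybridization. The names `reverse_complement` and `can_hybridize` and the example sequences are hypothetical.

```python
# Translation table for Watson-Crick pairs (DNA only; an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def can_hybridize(probe: str, target: str) -> bool:
    """True if the target contains a stretch perfectly complementary to the probe."""
    return reverse_complement(probe.upper()) in target.upper()


# Hypothetical probe and single-stranded target.
probe = "ATGC"
target = "TTGCATAA"
print(reverse_complement(probe))      # complement strand read 5'->3'
print(can_hybridize(probe, target))
```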
  • the two nucleic acid sequences may be DNA sequences, RNA sequences, or a combination of both.
  • samples are obtained from various tissues to be analyzed for gene expression of one or more diagnostic or reference genes or transcribed sequences.
  • Samples can refer to both test samples (i.e., samples from subjects/patients with lung disease) and reference samples (i.e., samples from subjects/patients without apparent lung disease).
  • a reference sample is a tissue sample that serves as a basis for comparison to a test sample and thus may be used to compare expression levels.
  • a reference sample therefore represents a non-diseased state.
  • the reference sample is obtained from lung tissue of a healthy patient.
  • the reference sample is obtained from lung tissue of a patient who has a disease that is not an IIP, or an IPF-related disease.
  • the reference sample is analyzed with, and its expression profile compared to, a sample from a patient having lung disease.
  • lung tissue may be obtained from a patient having IPF.
  • RNA is purified from the lung tissue of a patient with IPF to create a test sample.
  • this purified RNA comprises expressed sequences of cells within the lung tissue.
  • this test sample can comprise a full complement of expressed mRNA molecules from the tissue sample.
  • the present disclosure is directed to the detection of the expression level of certain differentially regulated proteins and/or mRNA molecules from diagnostic genes in one or more test samples.
  • RNA purified from a sample is reverse transcribed to produce complementary DNA, also known as cDNA, to create a test sample.
  • a cDNA sequence generally refers to a "complementary DNA" sequence of an expressed mRNA.
  • the cDNA can be labeled during reverse transcription.
  • cDNAs may, in some embodiments, be created by PCR (Polymerase Chain Reaction) amplification of a library of expressed sequences of mRNA, for example an mRNA library purified from a tissue sample.
  • the sequence of the complementary DNA is able to hybridize to the sequence of an expressed gene, or mRNA.
  • labeled cDNA is hybridized to probes on an array. The intensity of label at a given spot on an array correlates with the expression level of the transcribed gene represented by that probe.
  • RNA can be isolated from lung cells or lung tissues of interest, including for example lung tissue from a diseased patient and/or lung tissue from a non-diseased patient, and is reverse transcribed to yield cDNA.
  • the DNA or RNA can be labeled during reverse transcription by incorporating a labeled nucleotide in the reaction mixture. Although various labels can be used, most commonly the nucleotide is conjugated with the fluorescent dyes Cy3 or Cy5.
  • two different samples can be analyzed simultaneously, for example by labeling one with Cy5-dUTP and the other with Cy3-dUTP, and hybridizing similar amounts of labeled DNA or RNA from each sample to the array.
  • the primary data, obtained by scanning the array using a detector capable of quantitatively detecting fluorescence intensity, can be expressed as a ratio of fluorescence intensities (Cy3/Cy5).
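A Cy3/Cy5 intensity ratio is straightforward to compute per spot once both channels have been scanned. The sketch below is an assumption-laden illustration: the spot names and intensity values are invented, no background subtraction or dye normalization is applied, and the log2 transform is a common convenience rather than something the text requires.

```python
import math


def cy3_cy5_ratios(
    spots: dict[str, tuple[float, float]]
) -> dict[str, tuple[float, float]]:
    """Map each spot to (Cy3/Cy5 ratio, log2 ratio).

    `spots` maps a spot name to (Cy3 intensity, Cy5 intensity).
    Spots with a non-positive intensity in either channel are skipped.
    """
    results = {}
    for name, (cy3, cy5) in spots.items():
        if cy3 <= 0 or cy5 <= 0:
            continue  # a ratio is undefined or meaningless for these spots
        ratio = cy3 / cy5
        results[name] = (ratio, math.log2(ratio))
    return results


# Hypothetical scanned intensities (arbitrary fluorescence units).
spots = {
    "spot_1": (2000.0, 500.0),
    "spot_2": (300.0, 300.0),
    "spot_3": (250.0, 1000.0),
}
ratios = cy3_cy5_ratios(spots)
for name, (ratio, log_ratio) in ratios.items():
    print(f"{name}: ratio={ratio:.2f}, log2={log_ratio:+.2f}")
```

On the log2 scale, equal intensities give 0 and a doubling or halving gives +1 or -1, which makes increases and decreases symmetric.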
  • a subject having IPF may exhibit differential expression of one or more genes having the nucleic acid sequences of genes in FIGS. 5 and 6, and
  • a nucleic acid sequence may exhibit differential expression at the RNA level if its RNA transcript varies in abundance relative to a reference transcript.
  • a gene exhibits differential expression at the protein level, if a polypeptide encoded by the gene varies in abundance between different samples in a sample set. In the context of a microarray experiment, differential expression generally refers to differential expression at the RNA level.
  • the expression level or transcript abundance of a diagnostic gene in a sample from a patient with IPF subtype II may be greater than the transcript abundance of that same gene in a subject or patient without IPF. In some embodiments, the expression level or transcript abundance of a diagnostic gene in a sample from a patient with IPF subtype II may be greater than the transcript abundance of that same gene in a subject or patient with IPF subtype I. In some embodiments, the expression level or transcript abundance of a diagnostic gene in a sample from a patient with IPF subtype I may be less than the transcript abundance of that same gene in a subject or patient without IPF. In some embodiments, the expression level or transcript abundance of a diagnostic gene in a sample from a patient with IPF subtype I may be less than the transcript abundance of that same gene in a subject or patient with IPF subtype II.
  • the expression level of one or more expressed sequences in a sample from that patient may be determined.
  • the expression level of one or more diagnostic genes is determined relative to the amount of total RNA.
  • the expression level of one or more diagnostic genes is determined relative to the expression level of one or more reference genes in that same patient or from a sample from a patient known to be healthy.
  • the expression level of one or more diagnostic genes is determined relative to the expression level of that same expressed sequence in a sample from a patient known to be healthy.
  • the expression level of one or more expressed sequences in a sample from that patient may be compared to the expression level of that same expressed sequence in a sample from a patient known to have IPF subtype I or subtype II.
  • a patient may be diagnosed with IPF subtype I or II by comparing the expression level or transcript abundance of one or more expressed sequences of that patient to that of a healthy patient. In some embodiments, a patient may be diagnosed with IPF subtype I or II by comparing the expression level or transcript abundance of one or more expressed sequences of that patient to that of a patient known to have IPF subtype I or subtype II. In some embodiments, the expression levels of one or more cilium-associated genes will be greater in a patient with IPF subtype II than in a healthy patient. In some embodiments, the expression levels of one or more cilium-associated genes will be greater in a patient with IPF subtype II than in a patient with IPF subtype I.
  • the expression levels of one or more cilium-associated genes will be less in a patient with IPF subtype I than in a healthy patient. In some embodiments, the expression levels of one or more cilium-associated genes will be less in a patient with IPF subtype I than in a patient with IPF subtype II.
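Comparing a patient's expression levels with those of samples of known subtype can be sketched as a nearest-reference assignment. This is not presented as the disclosed diagnostic procedure: the Euclidean distance, the function name `nearest_subtype`, and every expression value below are assumptions chosen for illustration. Only the gene symbols appear in the document.

```python
import math


def nearest_subtype(
    sample: dict[str, float], references: dict[str, dict[str, float]]
) -> str:
    """Return the label of the reference profile closest to the sample.

    Profiles map gene symbol -> log2 expression. Distance is Euclidean over
    the genes shared by the sample and a reference (an illustrative choice).
    """
    def distance(a: dict[str, float], b: dict[str, float]) -> float:
        shared = set(a) & set(b)
        if not shared:
            raise ValueError("profiles share no genes")
        return math.sqrt(sum((a[g] - b[g]) ** 2 for g in shared))

    return min(references, key=lambda label: distance(sample, references[label]))


# Hypothetical log2 expression of two cilium-associated genes.
references = {
    "IPF subtype I": {"SPAG17": 4.0, "DNAH10": 4.5},
    "IPF subtype II": {"SPAG17": 7.0, "DNAH10": 7.5},
}
patient = {"SPAG17": 6.8, "DNAH10": 7.1}
print(nearest_subtype(patient, references))
```

In this toy example the patient's higher cilium-gene expression places the sample closer to the subtype II reference, consistent with the direction of difference described above.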
  • Differential expression may correlate with diagnostic information.
  • Diagnostic information, or information for use in diagnosis, can be information that is useful in determining whether a patient has a UIP and/or in classifying IPF into a phenotypic category of IPF subtype-I or IPF subtype-II, or any category having significance with regard to the prognosis of or likely response to treatment (either treatment in general or any particular treatment) of IPF.
  • diagnosis refers to providing any type of diagnostic information, including, but not limited to, whether a subject is likely to have an indication associated with IPF, information related to the nature or classification of IPF, information related to prognosis and/or information useful in selecting an appropriate treatment for IPF. Selection of treatment for IPF may include the choice of a particular chemotherapeutic agent or other treatment modality such as surgery, lung transplantation, a choice about whether to withhold or deliver therapy, etc.
  • the present disclosure encompasses the realization that genes that are differentially expressed are of use in classifying IPF subtypes.
  • the differentially expressed genes can be responsible for the different phenotypic characteristics and/or indicators of clinical outcomes of IPF subtypes.
  • the present disclosure identifies such genes.
  • the transcript abundance of that gene varies between different samples, e.g., between different IPF samples and/or between normal and IPF subtypes.
  • the transcript level of a differentially expressed gene varies by at least about 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, or 9-fold from its average abundance in a given sample.
  • a given gene may be expressed more than about 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, or 10-fold, and/or less than about 10-fold, 9-fold, 8-fold, 7-fold, 6-fold, 5-fold, 4-fold, 3-fold, or 2-fold.
  • the variation may be less than 2-fold, for example expression may be greater than about 1.2-fold, 1.3-fold, 1.4-fold, 1.5-fold, 1.6-fold, 1.7-fold, 1.8-fold, or 1.9-fold, and/or less than about 2.0-fold, 1.9-fold, 1.8-fold, 1.7-fold, 1.6-fold, 1.5-fold, 1.4-fold, or 1.3-fold.
  • more than one gene can be differentially expressed, and the levels of differential expression may be the same or different.
  • gene 1 may be 4-fold more abundant and gene 2 may be 3-fold more abundant in an IPF sample compared to a non-IPF lung sample.
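The fold-change language above reduces to a ratio of abundances plus a range check. The sketch below reproduces the "gene 1 is 4-fold, gene 2 is 3-fold" example; the abundance values themselves, and the helper names `fold_change` and `within_range`, are hypothetical.

```python
def fold_change(test_abundance: float, reference_abundance: float) -> float:
    """Fold change of a transcript in a test sample relative to a reference."""
    if reference_abundance <= 0:
        raise ValueError("reference abundance must be positive")
    return test_abundance / reference_abundance


def within_range(fold: float, more_than: float, less_than: float) -> bool:
    """True if the fold change is more than one bound and less than the other."""
    return more_than < fold < less_than


# Hypothetical abundances: (IPF sample, non-IPF lung sample), arbitrary units.
abundances = {"gene 1": (400.0, 100.0), "gene 2": (300.0, 100.0)}
folds = {gene: fold_change(t, r) for gene, (t, r) in abundances.items()}
for gene, fold in folds.items():
    print(f"{gene}: {fold:.1f}-fold, in (2, 10) range: {within_range(fold, 2.0, 10.0)}")
```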
  • the differential expression of a gene may be the same in two IPF samples or it may be different.
  • the differential expression of an expressed gene in lung samples from different subjects, or patients having the same IPF subtypes may be the same, similar, or different.
  • an array comprising nucleotide sequences as in FIGS. 5 and 6, and SEQ ID NOS: 1-197, can be referred to as a gene expression system. Such systems can include a system, device or means to detect gene expression, diagnostic agents, candidate libraries, and oligonucleotides, oligonucleotide sets, or probe sets.
  • the use of about 300 ng of processed total RNA prepared from a lung sample from a patient with IPF subtype II may result in a diagnostic gene having an expression level measured as a mean log(2) intensity on an Affymetrix gene chip of greater than 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9,
  • a diagnostic gene in a lung sample from a patient with IPF subtype II may have mean log(2) intensity on an Affymetrix gene chip of between about 5 and about 9. In some aspects, a diagnostic gene in a lung sample from a patient with IPF subtype II may have mean log(2) intensity on an Affymetrix gene chip of between 5.9 and 8.3.
  • a diagnostic gene in a lung sample from a patient with IPF subtype II may have mean log(2) intensity on an Affymetrix gene chip of between 5.99 (wherein the standard deviation is about 0.57 and the standard error of the mean is about 0.08) and 8.23 (wherein the standard deviation is about 1.23 and the standard error of the mean is about 0.15).
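The mean, standard deviation, and standard error of the mean quoted above are related by SEM = SD / sqrt(n). The sketch below shows that calculation on a small set of log(2) intensities. The four intensity values are invented for illustration and are not data from the disclosure; the sample standard deviation (n - 1 denominator) is an assumption.

```python
import math
import statistics


def summarize_log2_intensity(values: list[float]) -> tuple[float, float, float]:
    """Return (mean, sample standard deviation, standard error of the mean)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two measurements are needed")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD, n - 1 in the denominator
    sem = sd / math.sqrt(n)         # SEM shrinks as more samples are measured
    return mean, sd, sem


# Hypothetical log2 intensities for one diagnostic gene across four samples.
mean, sd, sem = summarize_log2_intensity([6.0, 7.0, 8.0, 7.0])
print(f"mean={mean:.2f}, SD={sd:.2f}, SEM={sem:.2f}")
```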
  • the disclosure relates to diagnostic genes, diagnostic expressed sequences and diagnostic oligonucleotides comprising sequences of the genes and expressed sequences in FIGS. 5 and 6, SEQ ID NOS: 1-197, and of Clusters A and B.
  • the genes and expressed sequences in FIGS. 5 and 6, SEQ ID NOS: 1-197, and Clusters A and B represent nucleotide sequences which, when differentially expressed, can correlate with one of two IPF subtypes, as disclosed herein.
  • identification of the IPF subtype to which a sample belongs can require analysis of only one gene or expressed sequence in FIGS. 5 and 6, Clusters A and B, and SEQ ID NOS: 1-197.
  • identification of the IPF subtype to which a sample belongs can require analysis of a plurality of the genes or expressed sequences in FIGS. 5 and 6, Clusters A and B, and SEQ ID NOS: 1-197. In some embodiments, identification of the IPF subtype to which a sample belongs can require analysis of all of the genes or expressed sequences in FIGS. 5 and 6, Clusters A and B, and SEQ ID NOS: 1-197.
  • the IPF subtype can be identified by any means capable of detecting expression levels of RNA and/or the presence of specific protein products coded for by those genes or expressed sequences.
  • gene expression profiling of tissue samples from a number of IPF subjects/patients can allow identification of novel molecular subcategories or subtypes. These subtypes and subcategories can allow development of novel methods to diagnose and classify these complex diseases. In some embodiments, identification of subtypes and subcategories can aid in more predictive diagnosis or identification of clinically meaningful endpoints.
  • the expression level of the mRNA sequences obtained from a patient having IPF and/or the expression level of mRNA sequences of a healthy patient or a patient with a known IPF subtype may be measured via array-based
  • Array comparative genomic hybridization is a technique that is used to detect copy number variations of nucleic acids at a higher level of resolution than chromosome-based comparative genomic hybridization.
  • nucleic acids from a test sample and nucleic acids from a reference sample are labeled differentially.
  • the test sample and the reference sample are then hybridized to an array comprising a plurality of probes, which are derived from sequences of interest.
  • the differential labeling is then used to visualize the hybridized nucleic acids from the test and reference samples.
  • the ratio of the signal intensity of the test sample to that of the reference sample is then calculated, to measure the copy number changes between the test sample and the reference sample.
  • the difference in the signal ratio determines whether the total copy numbers of the nucleic acids in the test sample are increased or decreased, as compared to the reference sample.
  • the test sample and the reference sample may be hybridized to the array separately or they may be mixed together and hybridized simultaneously. Exemplary methods of performing aCGH can be found, for example, in U.S. Patent Nos. 5,635,351; 5,665,549; 5,721,098; 5,830,645; 5,856,097; 5,965,362; 5,976,790; 6,159,685; 6,197,501; and 6,335,167; European Patent Nos. EP 1 134 293 and EP 1 026 260; van Beers et al., Brit. J.
  • Information relating to the expression level of the mRNA sequences present in a sample can include, for example, an increase in expression level in one or more mRNA molecules, a decrease in expression level in one or more mRNA molecules, and/or no change in the expression level of one or more mRNA molecules.
  • this information is obtained by analyzing the difference in signal intensity between a test sample and a reference or control sample at one or more corresponding locations on the array representing one or more nucleic acid sequences of interest. The analysis can be performed using any of a variety of methods, means and variations thereof for carrying out array-based comparative genomic hybridization.
  • information relating to the expression level of mRNA sequences is obtained by determining the abundance of the expressed sequence(s) relative to the amount of input RNA or cDNA.
  • the test sample and the reference sample are mRNA.
  • the mRNA molecules comprising the test samples and the reference samples may be obtained by any suitable method of nucleic acid isolation and/or extraction. Methods of mRNA extraction are well known in the art and several kits for the extraction and purification of mRNA from tissue samples are commercially available from, e.g., Clontech (Mountain View, CA), Qiagen (Valencia, CA) and Life Technologies/Invitrogen (Carlsbad, CA), among others.
  • the test samples and the reference samples may be differentially labeled with any detectable agents or moieties.
  • the detectable agents or moieties are selected such that they generate signals that can be readily measured and such that the intensity of the signals is proportional to the amount of labeled nucleic acids present in the sample.
  • the detectable agents or moieties are selected such that they generate localized signals, thereby allowing resolution of the signals from each spot on an array.
  • Standard nucleic acid labeling methods include: incorporation of radioactive agents, direct attachment of fluorescent dyes or of enzymes, chemical modification of nucleic acids to make them detectable immunochemically or by other affinity reactions, and enzyme-mediated labeling methods including, without limitation, random priming, nick translation, PCR and tailing with terminal transferase.
  • Other suitable labeling methods include psoralen-biotin, photoreactive azido derivatives, and DNA alkylating agents.
  • test sample and reference sample nucleic acids are labelled by Universal Linkage System, which is based on the reaction of monoreactive cisplatin derivatives with the N7 position of guanine moieties in DNA (see, e.g., Heetebrij et al., Cytogenet. Cell. Genet. (1999), 87: 47-52).
  • detectable agents or moieties can be used to label test and/or reference samples.
  • Suitable detectable agents or moieties include, but are not limited to: various ligands; radionuclides such as, for example, ³²P, ³⁵S, ³H, ¹⁴C, ¹²⁵I, ¹³¹I, and others; fluorescent dyes; chemiluminescent agents such as, for example, acridinium esters, stabilized dioxetanes, and others; microparticles such as, for example, quantum dots, nanocrystals, phosphors and others; enzymes such as, for example, those used in an ELISA, horseradish peroxidase, beta-galactosidase, luciferase, alkaline phosphatase and others; colorimetric labels such as, for example, dyes, colloidal gold and others; magnetic labels such as, for example, Dynabeads™; and biotin, digoxigenin or other haptens and proteins
  • the test samples and/or the reference samples are labelled with fluorescent dyes.
  • Suitable fluorescent dyes include, without limitation, Cy-3, Cy-5, Texas red, FITC, Spectrum Red, Spectrum Green, phycoerythrin, rhodamine, and fluorescein, as well as equivalents, analogues and/or derivatives thereof.
  • the fluorescent dyes selected display a high molar absorption coefficient, high fluorescence quantum yield, and photostability.
  • the fluorescent dyes exhibit absorption and emission wavelengths in the visible spectrum (i.e., between 400 nm and 750 nm) rather than in the ultraviolet range of the spectrum (i.e., lower than 400 nm).
  • the fluorescent dyes are Cy-3 (3-N,N'-diethyltetramethylindo-dicarbocyanine) and Cy-5 (5-N,N'-diethyltetramethylindo-dicarbocyanine). Cy-3 and Cy-5 form a matched pair of fluorescent labels that are compatible with most fluorescence detection systems for array-based instruments.
  • the fluorescent dyes are Spectrum Red and Spectrum Green.
  • hybridization protocols used for aCGH are those of Pinkel et al., Nature Genetics (1998), 20:207-211.
  • the hybridization protocols used for aCGH are those of Kallioniemi, Proc. Natl. Acad. Sci. USA (1992), 89:5321-5325.
  • the array may be contacted simultaneously with differentially labeled mRNA sequences of the test sample and the reference sample. This may be done by, for example, mixing the labeled test sample and the labeled reference sample together to form a hybridization mixture, and contacting the array with the mixture.
  • the specificity of hybridization may be enhanced by inhibiting repetitive sequences.
  • repetitive sequences (e.g., Alu sequences, L1 sequences, satellite sequences, MRE sequences, simple homo-nucleotide tracts, and/or simple oligonucleotide tracts) present in the nucleic acids of the test sample, reference sample and/or probes immobilized on the array are either removed, or their hybridization capacity is disabled. Removing repetitive sequences or disabling their hybridization capacity can be accomplished using any of a variety of well-known methods. These methods include, but are not limited to, removing repetitive sequences by
  • the hybridization capacity of highly repeated sequences in a test sample and/or in a reference sample is competitively inhibited by including, in the hybridization mixture, unlabelled blocking nucleic acids.
  • the unlabelled blocking nucleic acids are therefore mixed with the hybridization mixture, and thus with a test sample and a reference sample, before the mixture is contacted with an array.
  • the unlabelled blocking nucleic acids act as a competitor for the highly repeated sequences and bind to them before the hybridization mixture is contacted with an array. Therefore, the unlabelled blocking nucleic acids prevent labelled repetitive sequences from binding to any highly repetitive sequences of the nucleic acid probes, thus decreasing the amount of background signal present in a given hybridization.
  • the unlabelled blocking nucleic acids are Human Cot-1 DNA. Human Cot-1 DNA is commercially available from a number of sources including, for example, Gibco/BRL Life Technologies (Gaithersburg, MD).
  • the ratio of the signal intensity of the test sample as compared to the signal intensity of the reference sample is calculated. This calculation quantifies the differential level of expression of the mRNA molecules of the test sample, as compared to the reference sample, if any. In some embodiments, this calculation is carried out quantitatively or semi-quantitatively. In certain embodiments, it is not necessary to determine the exact number associated with differential expression of the mRNA molecules comprising the test sample and the reference sample, as detection of a significant increase or decrease in expression level from the expression level in the reference sample is sufficient.
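Where only the direction of change matters, the test-to-reference signal ratio can be reduced to a semi-quantitative call. The sketch below is one possible way to do that: the 2-fold default threshold, the function name `call_expression_change`, and the example signals are assumptions, and no statistical test of significance is performed.

```python
def call_expression_change(
    test_signal: float, reference_signal: float, threshold: float = 2.0
) -> str:
    """Classify a test/reference signal ratio as increased, decreased, or unchanged.

    A ratio at or above `threshold` is 'increased'; a ratio at or below
    1/threshold is 'decreased'; anything in between is 'unchanged'.
    """
    if test_signal <= 0 or reference_signal <= 0:
        raise ValueError("signals must be positive")
    if threshold <= 1.0:
        raise ValueError("threshold must be greater than 1")
    ratio = test_signal / reference_signal
    if ratio >= threshold:
        return "increased"
    if ratio <= 1.0 / threshold:
        return "decreased"
    return "unchanged"


# Hypothetical spot intensities (test, reference).
print(call_expression_change(900.0, 300.0))   # ratio 3.0
print(call_expression_change(300.0, 900.0))   # ratio about 0.33
print(call_expression_change(320.0, 300.0))   # ratio about 1.07
```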
  • the quantification of the expression levels of the mRNA molecules of a test sample comprises an estimation of the level of expression, as a semi-quantitative or relative measure, which usually suffices to predict the IPF subtype a patient has and thus prospectively direct the determination of therapy for that patient.
  • Quantitative techniques may be used to determine the expression level of the mRNA molecules present in a test sample and/or in a reference sample.
  • quantitative and semi-quantitative techniques to determine expression levels exist including, for example, semi-quantitative PCR analysis or quantitative real-time PCR.
  • the Polymerase Chain Reaction (PCR) per se is not a quantitative technique; however, PCR-based methods have been developed that are quantitative or semi-quantitative in that they give a reasonable estimate of original copy numbers of nucleic acids present in a tissue sample (i.e., expression level of mRNA), within certain limits.
  • PCR techniques include, for example, quantitative PCR and quantitative real-time PCR (also known as RT-PCR, RQ-PCR, QRT-PCR or RTQ-PCR).
  • Fluorescence in situ hybridization permits the analysis of the expression level of individual mRNA molecules and can be used to study the expression level of individual mRNA molecules across tissue samples obtained from different donor sources (see, e.g., Pinkel et al., Proc. Natl. Acad. Sci. U.S.A. (1988), 85, 9138-42). Comparative genomic hybridization can also be used to probe for mRNA expression levels (see, e.g., Kallioniemi et al., Science (1992), 258: 818-21; and Houldsworth et al., Am. J. Pathol. (1994), 145: 1253-60).
  • the expression level of mRNA molecules of interest may also be determined using quantitative PCR techniques such as real-time PCR (see, e.g., Suzuki et al., Cancer Res. (2000), 60:5405-9).
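  • By way of non-limiting illustration, relative expression from real-time PCR data is often estimated from threshold cycle (Ct) values. The sketch below uses the commonly applied comparative Ct (2^-ΔΔCt) calculation, which is an assumption introduced for this example; this disclosure does not state which calculation is used. The function and parameter names are hypothetical, and the calculation assumes approximately 100% amplification efficiency for both the target and the normalizer gene.

```python
def relative_expression(ct_target_test, ct_norm_test, ct_target_ref, ct_norm_ref):
    """Fold change of a target mRNA in a test sample relative to a reference sample.

    Each Ct of the target gene is first normalized to a normalizer
    (housekeeping) gene measured in the same sample, then the test sample
    is compared with the reference sample.
    """
    delta_test = ct_target_test - ct_norm_test  # normalized Ct, test sample
    delta_ref = ct_target_ref - ct_norm_ref     # normalized Ct, reference sample
    delta_delta = delta_test - delta_ref
    # A lower Ct means more starting template, hence the negative exponent.
    return 2.0 ** (-delta_delta)
```

  • For example, a target that crosses the threshold two cycles earlier in the test sample than in the reference sample, with identical normalizer Ct values, corresponds to an approximately four-fold higher expression level.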
  • quantitative microsatellite analysis can be performed for rapid measurement of relative mRNA sequence copy numbers.
  • the copy number of a test sample relative to a reference sample is assessed using quantitative, real-time PCR amplification of loci carrying simple sequence repeats. Simple sequence repeats are used because of the large numbers that have been precisely mapped in numerous organisms.
  • Exemplary protocols for quantitative PCR are provided in Innis et al., PCR Protocols, A Guide to Methods and Applications (1990), Academic Press, Inc., N.Y.
  • Semi-quantitative techniques that may be used to determine specific copy numbers include, for example, multiplex ligation-dependent probe amplification (see, e.g., Schouten et al. Nucleic Acids Res. (2002), 30(12):e57; and Sellner et al., Human Mutation (2004), 23(5):413-419) and multiplex amplification and probe hybridization (see, e.g., Sellner et al. (2004), supra).
  • In Example 1 of this disclosure, the present inventors have revealed that IPF exists in two distinct molecular phenotypes or subtypes, which have previously presented clinically the same. Additionally, the present inventors have shown that the two subtypes differ in their expression of cilium genes, with IPF subtype-l displaying decreased levels of cilium gene expression as compared to normal, non-diseased lung tissue and/or lung tissue from IPF subtype-ll samples. The present inventors have also revealed that patients with IPF subtype-l have a significantly longer survival than patients with IPF subtype-ll, making the determination of subtype in IPF patients ideally suited as a predictive measure of survival in IPF patients.
  • the present disclosure is based on the discovery that determining the expression levels of certain genes, and the proteins translated therefrom, can be used to identify IPF subtype in patients with IPF.
  • these genes, and their corresponding protein products, can also be used to predict the life expectancy of IPF patients and whether such patients will benefit from medical treatment.
  • expressed protein signatures are created. Protein signatures may comprise the identity of a plurality of proteins. In some embodiments, protein signatures can be created from a tissue sample of a diseased patient, thus creating a test sample. In some embodiments, protein signatures can be created from a tissue sample of a non-diseased or healthy patient, thus creating a reference sample.
  • a protein signature reference sample can be obtained from a tissue sample of a patient known to have IPF subtype-l. In some embodiments, a protein signature reference sample can be obtained from a tissue sample of a patient known to have IPF subtype-ll.
  • protein signatures can be created from any one or more of the translation products of the genes selected from the group consisting of: ABCA13 (Gene Symbol), NM_152701 (Accession Number); ADAM28, NM_014265; ADH7,
  • MMP13, NM_002427; MMP7, NM_002423; MSMB, NM_002443; MUC16, NM_024690; MUC4, NM_018406; MUC5B, NM_002458; PFN2, NM_053024; PIP, NM_002652;
  • ST6GALNAC1, NM_018414; TMEM45A, NM_018004; TMPRSS4, NM_019894; TP63, NM_003722; TPPP3, NM_016140; TRIM2, NM_015271; TRIM29, NM_012101; UGT1A1, NM_000463; VTCN1, NM_024626, and combinations thereof.
  • protein signatures can be created from any one or more of the translation products of the genes selected from the group consisting of: AGBL2 (Gene Symbol), NM_024783 (Accession Number); ARMC3, NM_173081; ARMC4, NM_018076; C10orf107, NM_173554; C10orf79, NM_025145; C11orf70, NM_032930; C11orf88, NM_207430; C12orf55, ENST00000298953; C12orf63, ENST00000342887; C13orf30, NM_182508; C1orf129, NM_025063; C1orf173, NM_001002912; C1orf192,
  • NM_017539; DNAH5, NM_001369; DNAH6, NM_001370; DNAH6, NM_001370; DNAH6, NM_001370; DNAH6, NM_001370; DNAH7, NM_018897; DNAH9, NM_001372; DNAI1, NM_012144; DNAJA4, NM_018602; DPY19L2P2, NR_027768; DTHD1, NM_001136536; DZIP3, NM_014648; EFCAB1, NM_024593; EFHB, NM_144715; EFHC1, NR_033327; EFHC2, NM_025184; ENKUR, NM_145010; FAM154B, AK304339; FAM183A,
  • NM_001039845; MNS1, NM_018365; MORN5, NM_198469; MS4A8B, NM_031457;
  • NEK10, NM_199347; NEK11, NM_024800; NEK5, NM_199289; NME5, NM_003551;
  • NM_145286; STOX1, NM_152709; TEKT1, NM_053285; TEX9, NM_198524; TMEM212, NM_001164436; TMEM232, NM_001039763; TSGA10, NM_025244; TSPAN1,
  • the expression levels of reference samples may be compared to the expression levels of a test sample of proteins obtained from a patient. Therefore, the expression levels of the proteins comprising any of the protein signatures disclosed herein (reference samples) can be compared to the expression level of the same proteins obtained from a tissue sample from a patient (test sample).
  • the reference protein signature may comprise expressed sequences whose protein levels are not changed. Thus, a reference signature may be determined from the same sample as the test signature.
  • a patient having similar protein expression levels as compared to the reference sample identifies the patient as having IPF subtype-l.
  • the expression level of a test sample of proteins obtained from a patient and the expression level of a reference sample protein signature disclosed herein identifies the patient as having IPF subtype-l.
  • a decrease in the expression level of a test sample of proteins obtained from a patient, as compared to the expression level of a reference sample protein signature disclosed herein, identifies the patient as having IPF subtype-l.
  • an increase in the expression level of a test sample of proteins obtained from a patient, as compared to the expression level of a reference sample protein signature disclosed herein, identifies the patient as having IPF subtype-ll.
  • the reference sample comprises proteins obtained from a patient known to have IPF subtype-l and an increase in the expression level of a test sample of proteins obtained from a patient as compared to the expression level of the reference sample identifies the patient as having IPF subtype-ll. In some embodiments, the reference sample comprises proteins obtained from a patient known to have IPF subtype-ll and a decrease in the expression level of a test sample of proteins obtained from a patient as compared to the expression level of the reference sample identifies the patient as having IPF subtype-l.
  • the reference sample comprises proteins obtained from a non-diseased patient and a decrease in the expression level of a test sample of proteins obtained from a patient as compared to the expression level of the reference sample identifies the patient as having IPF subtype-l. In some embodiments, the reference sample comprises proteins obtained from a non-diseased patient and an increase in the expression level of a test sample of proteins obtained from a patient as compared to the expression level of the reference sample identifies the patient as having IPF subtype-ll.
  • the degree of similarity or dissimilarity between the level of expression of the proteins comprising a test sample and the level of expression of the proteins comprising a reference sample is determined based on signal intensity, such as that derived from an assay (e.g., ELISA, see below).
  • the ratio of the signal intensity of the proteins comprising a test sample, as compared to the signal intensity of the proteins comprising a reference sample is calculated. This calculation quantifies the differential level of expression of the proteins of the test sample, as compared to the reference sample, if any. In some embodiments, this calculation is carried out quantitatively or semi-quantitatively. In certain embodiments, it is not necessary to determine an exact number associated with the level of expression of the proteins comprising the test sample and the reference sample.
  • the reference sample comprises proteins taken from a non-diseased patient and detection of a statistically significant deviation
  • the quantification of the expression levels of proteins of a test sample comprises an estimation of the level of expression, as a semi-quantitative or relative measure, that is sufficient to predict the IPF subtype for an individual patient (as compared to a reference sample) and thus prospectively direct the determination of therapy for a patient.
  • determination of a level of protein expression in a test sample that is less than that produced by the reference sample is indicative of IPF subtype-l in the patient from which the test sample was derived.
  • determination of a level of protein expression in a test sample that is greater than that produced by the reference sample is indicative of IPF subtype-ll in the patient from which the test sample was derived. Therefore, in certain embodiments detection of signal intensity from a test sample that is less than (in the case of IPF subtype-l) or greater than (in the case of IPF subtype-ll) the signal intensity produced by the reference sample, within experimentally acceptable margins of error, is sufficient to determine the IPF subtype for a given patient.
  • the deviation of signal intensity of the test sample from the reference sample is measured as a percent difference.
  • a test sample is deemed to have produced a signal that is less than the reference sample if the signal intensity of the test sample measures at the level selected from: the signal intensity of the reference sample less 5%; the signal intensity of the reference sample less 10%; the signal intensity of the reference sample less 15%; the signal intensity of the reference sample less 20%; the signal intensity of the reference sample less 25%; the signal intensity of the reference sample less 30%; the signal intensity of the reference sample less 35%; the signal intensity of the reference sample less 40%; the signal intensity of the reference sample less 45%; the signal intensity of the reference sample less 50%; the signal intensity of the reference sample less 55%; the signal intensity of the reference sample less 60%; the signal intensity of the reference sample less 65%; the signal intensity of the reference sample less 70%; the signal intensity of the reference sample less 75%; the signal intensity of the reference sample less 80%; the signal intensity of the reference sample less 85%; the signal intensity of the reference sample
  • a test sample is deemed to have produced a signal that is greater than the reference sample if the signal intensity of the test sample measures at the level selected from: the signal intensity of the reference sample plus 5%; the signal intensity of the reference sample plus 10%; the signal intensity of the reference sample plus 15%; the signal intensity of the reference sample plus 20%; the signal intensity of the reference sample plus 25%; the signal intensity of the reference sample plus 30%; the signal intensity of the reference sample plus 35%; the signal intensity of the reference sample plus 40%; the signal intensity of the reference sample plus 45%; the signal intensity of the reference sample plus 50%; the signal intensity of the reference sample plus 55%; the signal intensity of the reference sample plus 60%; the signal intensity of the reference sample plus 65%; the signal intensity of the reference sample plus 70%; the signal intensity of the reference sample plus 75%; the signal intensity of the reference sample plus 80%; the signal intensity of the reference sample plus 85%; the signal intensity of the reference sample plus 90%; the signal intensity of the reference sample plus 95%; and the signal intensity of the reference sample plus 100%.
  • the deviation of signal intensity of the test sample from the reference sample is measured as a -fold difference, or a difference based upon unit signal production.
  • a test sample is deemed to have produced a signal that is less than the reference sample if the signal intensity of the test sample is selected from: two-fold less than the signal intensity of the reference sample; three-fold less than the signal intensity of the reference sample; four-fold less than the signal intensity of the reference sample; five-fold less than the signal intensity of the reference sample; six-fold less than the signal intensity of the reference sample; seven-fold less than the signal intensity of the reference sample; eight-fold less than the signal intensity of the reference sample; nine-fold less than the signal intensity of the reference sample; ten-fold less than the signal intensity of the reference sample; and greater than ten-fold less than the signal intensity of the reference sample.
  • a test sample is deemed to have produced a signal that is greater than the reference sample if the signal intensity of the test sample is selected from: two-fold more than the signal intensity of the reference sample; three-fold more than the signal intensity of the reference sample; four-fold more than the signal intensity of the reference sample; five-fold more than the signal intensity of the reference sample; six-fold more than the signal intensity of the reference sample; seven-fold more than the signal intensity of the reference sample; eight-fold more than the signal intensity of the reference sample; nine-fold more than the signal intensity of the reference sample; ten-fold more than the signal intensity of the reference sample; and greater than ten-fold more than the signal intensity of the reference sample.
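  • By way of non-limiting illustration, the percent-difference and fold-difference measures described above can be sketched as follows. The function names, the signed-fold convention, and the default 20% cutoff are assumptions chosen for this example. The subtype call assumes a reference sample from a non-diseased patient, following the embodiments in which a decrease indicates IPF subtype-I and an increase indicates IPF subtype-II.

```python
def percent_difference(test_signal, reference_signal):
    """Deviation of the test signal from the reference signal, in percent."""
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    return (test_signal - reference_signal) / reference_signal * 100.0


def fold_difference(test_signal, reference_signal):
    """Signed fold difference: +3.0 is three-fold more, -3.0 is three-fold less."""
    if test_signal <= 0 or reference_signal <= 0:
        raise ValueError("signal intensities must be positive")
    if test_signal >= reference_signal:
        return test_signal / reference_signal
    return -(reference_signal / test_signal)


def call_subtype(test_signal, reference_signal, min_percent=20.0):
    """Hypothetical call against a non-diseased reference sample.

    min_percent is an illustrative cutoff; any of the enumerated
    percent levels could be substituted.
    """
    deviation = percent_difference(test_signal, reference_signal)
    if deviation <= -min_percent:
        return "IPF subtype-I"
    if deviation >= min_percent:
        return "IPF subtype-II"
    return "indeterminate"
```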
  • complete identity, within acceptable levels of experimental error, between the expression level of a test sample of proteins obtained from a patient known to have IPF subtype-l and the expression levels of any one or more of the reference sample protein signatures disclosed herein identifies the patient as having IPF subtype-l.
  • complete identity, within acceptable levels of experimental error, between the expression level of a test sample of proteins obtained from a patient known to have IPF subtype-ll and the expression levels of any one or more of the reference sample protein signatures disclosed herein identifies the patient as having IPF subtype-ll.
  • the expression level of any one or more of the translation products of the differentially regulated genes disclosed herein, and/or the expression levels of any one or more proteins isolated from a test sample can be determined using any one or more of a number of techniques. In some embodiments, the expression levels can be determined using routine assays such as, for example, antibody-based methods such as
  • the expression levels can be determined using targeted multiplex mass spectrometry as a means of quantifying protein signatures in tissue samples taken from a patient.
  • the expression levels can be determined using mass-spectrometry based proteomics technologies, which have matured to the extent that they can now identify and quantify thousands of proteins.
  • protein expression levels can be determined via immunohistochemistry, which is a process capable of detecting proteins directly in the cells of a section of isolated and fixed tissue via the use of antibodies that bind specifically to the proteins of interest.
  • Immunohistochemistry is a widely used technique to visualize the distribution and localization of differentially expressed proteins between two tissues.
  • a tissue sample is taken from a subject and properly fixed (e.g., by heat fixation, perfusion, immersion or chemical fixation) to make the epitopes of the proteins of interest available for binding by the antibodies.
  • the tissue sample may be taken from the lung of a subject known to have either IPF subtype-l or IPF subtype-ll to create a reference sample.
  • the tissue sample may be taken from a tumor in a subject whose IPF subtype is unknown, to create a test sample.
  • the tissue samples are taken from corresponding tissues and corresponding regions within the tissues in order to create similar testing parameters between the reference and test samples. The proteins in the reference sample and the test sample can be analyzed in parallel or individually.
  • Detecting the protein(s) of interest in a reference sample or a test sample can be accomplished by contact with an antibody that is specifically directed to the protein(s) of interest.
  • One or more antibodies may be used, depending on the number of proteins to be tested in a single reference or test sample. Detection via contact with an antibody may be done directly, whereby the antibody itself is coupled with a label that will allow for visualization of binding to the protein, or indirectly, where a second antibody that specifically binds to the first antibody is used, the second antibody having the label to allow for visualization. Visualizing an antibody-protein interaction can be accomplished in a number of ways.
  • the antibody itself is conjugated to an agent that allows for visualization such as an enzyme (e.g., a peroxidase) that can catalyze a color-producing reaction, or a fluorophore (e.g., fluorescein or rhodamine) that fluoresces under certain conditions to visually display binding.
  • the level of differential expression of a protein between a reference sample and a test sample may be determined by measuring the difference in intensity of the visualization means employed. In that regard, in some embodiments the same means of visualization is utilized in both samples.
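  • if the signal produced in the reference sample is different from the signal produced in the test sample, then the protein of interest is present in different quantities in the samples, indicating differential expression.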
  • the means of visualizing the signal in each sample is linked to an antibody and each antibody will bind to a limited number of proteins in the sample. Therefore, the number of antibodies binding to proteins of a sample is directly proportional to the total number of proteins present in the sample and the strength of the signal produced by the antibody-protein interaction in a sample is directly proportional to the amount of protein present in the sample. The ratio of the signal intensity of the test sample to that of the reference sample is then calculated, to measure the protein expression levels between the test sample and the reference sample.
  • the difference in the signal ratio determines whether the total level of protein expression of each protein in the test sample is increased or decreased, as compared to the reference sample. If the signal produced by the reference sample is the same (within acceptable levels of experimental error) as the signal produced in the test sample, then the protein of interest is present in approximately the same quantity in each sample.
  • protein expression levels can be determined via enzyme-linked immunosorbent assay (ELISA), which is an analytic assay that utilizes a solid-phase enzyme immunoassay to detect the presence of a protein in an isolated sample.
  • an unknown amount of a sample is affixed to a substrate surface and an antibody is placed into contact with the substrate surface such that the antibody is also placed into contact with the sample.
  • the antibody will bind to the sample provided that an antigen capable of being bound by the antibody is present in the sample.
  • the antibody is typically linked to some means of visualizing binding, which in some embodiments is an enzyme, so that binding of the antibody to the sample can be detected.
  • a substance that contains the enzyme's substrate is placed into contact with the surface, and thus the antibody, such that the subsequent enzymatic reaction produces a detectable signal.
  • the signal may be a color change in the substrate or a fluorescent emission.
  • a protein sample isolated from a patient known to have IPF subtype-l is affixed to a substrate surface to create a reference sample.
  • a protein sample isolated from a patient known to have IPF subtype-ll is affixed to a substrate surface to create a reference sample.
  • a protein sample isolated from a patient whose IPF subtype is unknown is affixed to a substrate surface to create a test sample.
  • the proteins in each sample are the same (the reference sample protein is the same as the test sample protein).
  • the substrate surface may contain more than one isolated area (e.g., wells) such that the reference sample protein and the test sample protein are each affixed in their own isolated area and also so that the same substrate surface may accommodate multiple proteins from the reference sample and the test sample.
  • the substrate surface is a microtiter plate.
  • At least one antibody having specificity for the protein in the reference and test samples is placed in contact with the protein in the reference and test samples so that it may bind to the protein.
  • the proteins in the reference sample and the test sample can be analyzed in parallel or individually.
  • the antibody can be covalently linked to an enzyme, or can itself be detected by a secondary antibody that is linked to an enzyme.
  • the substrate of the enzyme is then placed in contact with each of the reference sample and the test sample to produce a visible signal, which indicates the quantity of protein in each sample. If the signal produced in the reference sample is different from the signal produced in the test sample, then the protein is present in different quantities in the samples, indicating differential expression.
  • the means of visualizing the signal in each sample is linked to an antibody and each antibody will bind to a limited number of proteins in the sample.
  • the number of antibodies binding to proteins of a sample is directly proportional to the total number of proteins present in the sample and the strength of the signal produced by the antibody-protein interaction in a sample is directly proportional to the amount of protein present in the sample.
  • the ratio of the signal intensity of the test sample to that of the reference sample is then calculated, to measure the protein expression levels between the test sample and the reference sample. The difference in the signal ratio determines whether the total level of protein expression of each protein in the test sample is increased or decreased, as compared to the reference sample. If the signal in the reference sample is the same (within acceptable levels of experimental error) as the signal produced in the test sample, then the protein of interest is present in approximately the same quantity in each sample.
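  • By way of non-limiting illustration, the comparison of ELISA signals from test and reference wells can be sketched as follows. The use of replicate wells, the subtraction of a blank reading, the function names, and the 10% tolerance standing in for "acceptable levels of experimental error" are all assumptions introduced for this example.

```python
from statistics import mean


def blank_corrected_mean(readings, blank):
    """Average replicate well readings after subtracting the blank reading."""
    return mean(reading - blank for reading in readings)


def compare_wells(test_readings, reference_readings, blank=0.0, tolerance=0.10):
    """Return (ratio, status) for test versus reference signal.

    status is "same" when the ratio is within the tolerance of 1.0,
    otherwise "increased" or "decreased" relative to the reference sample.
    """
    test_signal = blank_corrected_mean(test_readings, blank)
    reference_signal = blank_corrected_mean(reference_readings, blank)
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive after blank correction")
    ratio = test_signal / reference_signal
    if abs(ratio - 1.0) <= tolerance:
        status = "same"
    elif ratio > 1.0:
        status = "increased"
    else:
        status = "decreased"
    return ratio, status
```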
  • protein expression levels can be determined via targeted multiplex mass spectrometry.
  • liquid chromatography-tandem mass spectrometry (LC-MS/MS) can be used to determine the expression level of proteins isolated from a patient whose IPF subtype is not known to create a test sample. The levels of protein expression can then be compared between the two samples.
  • matrix-assisted laser desorption/ionization time-of-flight mass spectrometry can be used to image histological sections taken from IPF subtype-l and/or IPF subtype-ll patients (reference samples) and histological sections taken from patients whose IPF subtype is not known (test samples).
  • MALDI-MS can be used to image naturally occurring molecules, such as proteins, within a reference sample and within a test sample such that the presence and the levels of expression of the proteins can be compared between the two samples.
  • the protein signatures disclosed herein are advantageous in comparison to transcript-based and genomic markers, as the expression level of the disclosed proteins comprising the deficiency signatures can be measured using routine assays such as, for example, antibody-based methods such as immunohistochemistry and ELISA, of which the latter allows for non-invasive testing.
  • protein profiles of patients known to have IPF subtype-l or IPF subtype-ll can be generated using high-resolution tandem mass spectrometry-based proteomics.
  • proteomics can be employed based on 1D gel electrophoresis in combination with nano-LC-MS/MS and spectral counting to compare the protein profile of an IPF subtype-l patient and the protein profile of an IPF subtype-ll patient.
  • the two protein profiles can then be compared in order to determine which proteins are differentially regulated between the two subtypes, as well as which proteins are not differentially expressed.
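  • By way of non-limiting illustration, a comparison of two spectral-count profiles can be sketched as follows. The normalization to total counts, the pseudocount, the two-fold cutoff, and all names (including the protein identifiers in the usage note) are assumptions chosen for this example; the sketch flags large differences and is not a statistical test.

```python
import math


def compare_profiles(counts_a, counts_b, min_abs_log2=1.0, pseudocount=0.5):
    """Compare spectral counts per protein between two profiles.

    counts_a and counts_b map protein identifiers to spectral counts.
    Returns {protein: (log2 ratio of b to a, is_differential)}.
    """
    proteins = sorted(set(counts_a) | set(counts_b))
    # A pseudocount avoids division by zero for proteins seen in only one profile.
    adjusted_a = {p: counts_a.get(p, 0) + pseudocount for p in proteins}
    adjusted_b = {p: counts_b.get(p, 0) + pseudocount for p in proteins}
    total_a = sum(adjusted_a.values())
    total_b = sum(adjusted_b.values())
    result = {}
    for p in proteins:
        # Normalize each count to the total counts of its own profile.
        fraction_a = adjusted_a[p] / total_a
        fraction_b = adjusted_b[p] / total_b
        log2_ratio = math.log2(fraction_b / fraction_a)
        result[p] = (log2_ratio, abs(log2_ratio) >= min_abs_log2)
    return result
```

  • For example, with hypothetical counts {"P1": 10, "P2": 40, "P3": 50} for one profile and {"P1": 40, "P2": 40, "P3": 20} for the other, P1 and P3 are flagged as differentially expressed and P2 is not.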
  • protein profiles of IPF subtype-l and subtype-ll can be compared with protein profiles of non-diseased patients. These comparisons can be used to identify a protein or proteins that can aid in differentiating subtype-l from subtype-ll.
  • protein signatures may include both differentially expressed proteins and non-differentially expressed proteins. Pathway and protein complex analysis can then be used to identify the functions of the proteins that are differentially regulated between the two subtypes.
  • Isolation of proteins from tissue samples can be accomplished via any number of techniques. For example, in certain embodiments, a tissue sample from a patient known to have IPF subtype-l may be taken and homogenized. In some embodiments, a tissue sample from a patient known to have IPF subtype-ll may be taken and separately homogenized. In some embodiments, a tissue sample from a patient whose IPF subtype is not known may be taken and separately homogenized. As will be evident to a person of ordinary skill in the art, each sample is processed separately to avoid cross-contamination and to ensure that the comparison between the two samples is scientifically sound. For purposes of brevity, the following description relates to the processing of a generic sample.
  • the proteins in the tissue sample are solubilized in an appropriate buffer (e.g., a buffer containing an anionic surfactant such as sodium dodecyl sulfate), and then heat denatured.
  • the proteins can then be fractionated according to their electrophoretic mobility using any number of gel electrophoresis techniques such as, for example, one-dimensional sodium dodecyl sulfate-polyacrylamide gel electrophoresis.
  • the gel can be fixed and stained to reveal the bands of fractionated proteins isolated from a non-IPF tissue sample or from a sample taken from a person known to have subtype-l or subtype-ll. Data relating to electrophoretic mobility and band color intensity can be obtained.
  • each of the individual gel lanes can be cut into a plurality of bands and each band can be processed separately to remove the proteins therefrom, thereby creating a library of individual pools of proteins isolated from the tissue sample.
  • each gel band can be processed for in-gel digestion by reducing any cysteine disulfide bonds that may be present in the proteins in each band (e.g., by treatment with dithiothreitol) and then incubating each band with an appropriate protease (e.g., trypsin). The resulting peptides can then be extracted from each gel band and stored prior to LC-MS analysis.
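  • By way of non-limiting illustration, the peptides expected from a tryptic digestion can be predicted in silico. The sketch below applies the commonly used rule that trypsin cleaves on the C-terminal side of lysine (K) or arginine (R) unless the next residue is proline (P); this rule, the function name, and the example sequence in the usage note are assumptions introduced for this example, and missed cleavages are not modeled.

```python
from typing import List


def trypsin_digest(sequence: str, min_length: int = 1) -> List[str]:
    """Split a protein sequence (one-letter amino acid codes) into tryptic peptides."""
    seq = sequence.upper()
    peptides = []
    start = 0
    for i, residue in enumerate(seq):
        is_last = i == len(seq) - 1
        # Cleave after K or R, except when followed by P or at the C-terminus.
        if residue in "KR" and not is_last and seq[i + 1] != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide
    return [p for p in peptides if len(p) >= min_length]
```

  • For example, the hypothetical sequence "MKTAYRPLLKGR" yields the peptides "MK", "TAYRPLLK" and "GR"; the arginine followed by proline is not cleaved.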
  • the peptides in each pool can then be separated by LC-MS/MS.
  • the MS/MS spectra obtained from each pool can then be analyzed (e.g., by use of one or more algorithms and comparison to known databases) to determine the intact protein and peptide fragment composition.
  • the MS/MS spectra of the proteins contained in each gel band pool can be searched against known human protein databases, and the results imported into one or more software programs that can organize the gel-band data, validate peptide identifications and generate a list of identified proteins for the gel band pool.
  • MS/MS analysis can also serve to quantify the amount of each protein and peptide present in each gel band pool.
  • proteins that are significantly differentially expressed in a patient known to have IPF subtype-ll are suitable for use in the protein signatures disclosed herein.
  • proteins that are more highly expressed in a patient known to have IPF subtype-ll are suitable for use in the protein signatures disclosed herein.
  • proteins that are more highly expressed in IPF subtype-ll are cilium-associated genes, or genes described in FIGS. 5, 6, and SEQ ID NOs:1-197.
  • proteins that are not differentially expressed in IPF are suitable for use in the protein signatures disclosed herein. Proteins that are not differentially expressed may be reference proteins.
  • Table 1 summarizes demographic and clinical characteristics of the LTRC IPF subjects and the non-diseased control cohort used in the initial analysis.
  • the IPF cohort is older and composed of more males than the control cohort. There are no notable differences in racial distribution between the two groups. Approximately half of the individuals with IPF are former smokers, as compared to controls that are almost 50% current smokers. IPF individuals on average have smoked more cigarettes than controls but there is substantial variability in pack years in the IPF cohort.
  • Table 1 Subject demographics and clinical characteristics of derivation cohort.
  • IPF idiopathic pulmonary fibrosis
  • UIP usual interstitial pneumonia
  • the LTRC is a resource created by the NHLBI to provide human lung tissue and DNA to qualified investigators for use in research.
  • the program enrolls donor subjects who are anticipating lung surgery, collects blood and extensive phenotypic data from the prospective donors, and then processes their surgical waste tissues for research use. Most donor subjects have fibrotic interstitial lung disease or COPD.
  • Clinical data include clinical and pathological diagnoses, chest CT images, pulmonary function tests (spirometry, DLCO, and ABG), exposure (including cigarette smoking history) and symptom questionnaires (including Borg dyspnea scale), and family history of lung disease.
  • Affymetrix GeneChip® arrays are fabricated by using in-situ synthesis of short oligonucleotide sequences on a small glass chip using light directed synthesis.
  • mRNA samples of interest are labeled.
  • the first step of the labeling procedure is the synthesis of double stranded cDNA from the RNA sample using reverse transcriptase and an oligo-dT primer.
  • the cDNA serves as a template in an in vitro transcription (IVT) reaction that produces amplified amounts of biotin-labeled antisense mRNA.
  • This biotinylated RNA is referred to as labeled aRNA or cRNA - the microarray target.
  • Affymetrix GeneChip® Human Gene 1.0 ST Array: each of the 28,869 genes is represented on the array by approximately 26 probes spread across the full length of the gene, providing a more complete and more accurate picture of gene expression than 3' based expression array designs.
  • False discovery rate (FDR) control is a statistical method used in multiple hypothesis testing to correct for multiple comparisons. In a list of rejected hypotheses, FDR controls the expected proportion of incorrectly rejected null hypotheses (type I errors). It is a less conservative procedure for comparison than the Bonferroni correction, with greater power than familywise error rate (FWER) control, at a cost of increasing the likelihood of obtaining type I errors. In practical terms, the FDR is the expected proportion of false positives among all significant hypotheses. 5% FDR means that 5 out of 100 identified genes are expected to be false positives.
  • ANCOVA is a general linear model with a continuous outcome variable
  • ANCOVA is a merger of ANOVA and regression for continuous variables. ANCOVA tests whether certain factors have an effect on the outcome variable after removing the variance for which quantitative predictors (covariates) account.
  • Hierarchical clustering of IPF and control samples is shown in FIG. 7, and of IPF samples only in FIG. 1.
  • FIG. 1 illustrates the presence of two groups of subjects with IPF and six clusters of transcripts (A-F).
  • the most prominent feature of the heatmap is the group of 51 subjects (43%; subject group II) with relatively high expression compared to 68 subjects (57%; subject group I) of a large set of transcripts (transcript clusters A and B).
  • Transcript cluster A contains 80 unique transcripts, which are shown in FIGS. 5A and 5B.
  • Cluster A includes a number of genes that have been previously shown to be upregulated in IPF, namely osteopontin, MMP1, PLUNC, MMP7, MUC5B, collagen COL17A1, and keratins 5, 6C, 15 and 17.
  • Cluster A also contains a number of other potentially interesting genes such as lipocalin2, a gene with an established role in innate immunity and a marker of acute exacerbations in cystic fibrosis that has more recently been shown to promote epithelial-to-mesenchymal transition (EMT) in cancer and play an active role in renal fibrosis.
  • Functional enrichment analysis, using the Fisher exact test, of the 121 unique transcripts in cluster B showed it to be strongly enriched in transcripts associated with cilium genes (Benjamini-corrected p value 3.7 x 10^-11) and their structural components (axoneme, 3.9 x 10^-11; dynein, 9.4 x 10^-7). Expression of cilium-associated mRNAs (DNAH6, DNAH7, DNAI1 and RPGRIP1L) was confirmed in the LTRC subjects with IPF and controls by quantitative RT-PCR (FIG. 2).
  • Quantitative RT-PCR: each 20-µL PCR contained 15 ng cDNA, 0.5 µM final concentration of forward and reverse primers and 1x final concentration of the Power SYBR Green master mix.
  • Real-time PCR was performed on an Applied Biosystems ViiA 7 instrument using the following profile: 50°C for 2 min, 95°C for 10 min, and 40 cycles of 95°C for 15 sec and 60°C for 1 min. Dissociation curves were collected at the end of each run. Data were analyzed using the ΔΔCT relative quantification method (16). ΔCT values were calculated relative to GAPDH, and ΔΔCT values were calculated by comparison among different groups of samples.
  • RT-PCR Real time quantitative RT-PCR.
  • the real-time reverse transcription polymerase chain reaction uses fluorescent reporter molecules to monitor the production of amplification products during each cycle of the PCR reaction. This combines the nucleic acid amplification and detection steps into one homogeneous assay and obviates the need for gel electrophoresis to detect amplification products.
  • Use of appropriate chemistries and data analysis eliminates the need for Southern blotting or DNA sequencing for amplicon identification. Its simplicity, specificity and sensitivity, together with its potential for high throughput and the ongoing introduction of new chemistries, more reliable instrumentation and improved protocols, has made real-time RT-PCR the benchmark technology for the detection and/or comparison of RNA levels.
  • LTRC subjects with IPF in group II do not differ in age, gender or smoking status from IPF subjects in group I (Table 3).
  • there were more individuals with higher scores for microscopic honeycombing but not fibroblastic foci in group II compared to group I (Fisher exact test; Table 3).
  • the National Jewish Health (NJH) IPF cohort consists of 111 IPF patients who were clinically evaluated by investigators at National Jewish Health. All subjects in this cohort have undergone a standardized evaluation designed to provide a specific diagnosis. The evaluation included a standardized history focused on the presence of current or previous systemic disease; medications; tobacco and recreational drug use; familial lung disease; and avocational, occupational, environmental, and accidental exposures. Additional testing includes serologic evaluation for evidence of systemic disease, chest radiography, pulmonary physiology (including lung volumes by body plethysmography, spirometry before and after inhaled bronchodilator, and diffusing capacity), pressure volume curves, and gas exchange with exercise (formal six-minute walk testing and/or cardiopulmonary exercise testing). Video-assisted thoracoscopic (VAT) or open surgical lung biopsy was performed as clinically indicated. The diagnosis of IIP was established using the criteria defined in the ATS/ERS consensus statement (1, 2).
  • Censored survival analysis in the NJH cohort was performed in GraphPad Prism. Mantel-Cox log-rank test was used for curve comparison between high and low cilium expression groups.
  • That cilia may play a key role in the pathogenesis of IPF is further supported by the expression of cilium gene markers in honeycomb cysts in IPF lung, the same pathogenic lesions in which MUC5B expression is dysregulated.
  • Immunohistochemical (IHC) staining for ARL13B, a marker that is expressed early in ciliogenesis, and FOXJ1, which is required for formation of motile cilia following primary cilia, reveals dysregulation of both genes in IPF lung.
  • ARL13B stains cilia on epithelial cells lining the trachea, airways and bronchioles and FOXJ1 stains nuclei of the same cells (data not shown). There is no expression of either marker in the normal alveoli with the exception of FOXJ1 expression in alveolar macrophages (data not shown).
  • ARL13B and FOXJ1 are expressed in the cytoplasm of basal and alveolar type II (ATII) cells in the transition zone from normal bronchioles to honeycomb cysts in IPF lung; these cells also express MUC5B (Figure 8) and have been termed "hyperplastic" ATII cells.
  • ATII alveolar type II
  • Table 4 is a brief summary of the most important findings from the IHC analysis of non-diseased and IPF lung tissue sections for cilium gene markers.
  • Muc5b-/-, WT and Scgb1a1-driven Muc5b overexpressing mice were exposed to bleomycin.
  • Bleomycin is an antineoplastic antibiotic that forms a complex with oxygen and metals such as Fe2+, leading to the production of oxygen radicals, double-stranded DNA breaks, and ultimately cell death.
  • Wild type C57BL/6 mice intratracheally instilled with bleomycin (1-2 U/kg) develop significant fibrosis after 2-3 weeks compared to saline controls, as determined by increased mortality, decreased lung static compliance, and increased collagen accumulation and lung hydroxyproline levels.
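The ΔΔCT relative quantification mentioned in the quantitative RT-PCR item above can be sketched as follows. This is a minimal illustration, not the analysis code used for FIG. 2: it assumes roughly 100% amplification efficiency (hence the base of 2), the function names are hypothetical, and the CT values are invented for the example. GAPDH is the reference gene named in the text.

```python
def delta_ct(ct_target, ct_reference):
    """Delta-CT: target-gene CT normalized to the reference gene (GAPDH in the text)."""
    return ct_target - ct_reference


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def fold_change_ddct(test_samples, control_samples):
    """Relative expression (2^-ddCT) of a test group versus a control group.

    Each sample is a (ct_target, ct_reference) pair. Assumes ~100% PCR efficiency.
    """
    dct_test = mean(delta_ct(t, r) for t, r in test_samples)
    dct_ctrl = mean(delta_ct(t, r) for t, r in control_samples)
    ddct = dct_test - dct_ctrl
    return 2.0 ** (-ddct)


# Invented CT values (target gene, GAPDH) for two groups of samples.
group_ii = [(24.0, 18.0), (24.5, 18.5)]
group_i = [(26.0, 18.0), (26.5, 18.5)]
print(fold_change_ddct(group_ii, group_i))
```

A lower CT means the target was detected earlier, so a negative ΔΔCT corresponds to higher expression in the test group.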

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Genetics & Genomics (AREA)
  • Wood Science & Technology (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Hospice & Palliative Care (AREA)
  • Biophysics (AREA)
  • Oncology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Nucleic acids, devices, and methods that aid in identifying novel sub-types of pulmonary disease are disclosed. In some embodiments, the pulmonary disease is Idiopathic Pulmonary Fibrosis (IPF) and the subtypes are designated IPF subtype I and IPF subtype II.

Description

PATENT APPLICATION
MOLECULAR PHENOTYPING OF IDIOPATHIC INTERSTITIAL PNEUMONIA IDENTIFIES TWO SUBTYPES OF IDIOPATHIC PULMONARY FIBROSIS
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims benefit of priority pursuant to 35 U.S.C. § 119(e) of U.S. provisional patent application no. 61/831,379 filed June 5, 2013, which is hereby incorporated herein by reference in its entirety.
GOVERNMENT LICENSE RIGHTS
[0002] This invention was made with government support under grant number HL095393 awarded by the National Institutes of Health. The government has certain rights in the invention.
FIELD
[0003] The present disclosure relates to compositions and devices useful in the identification, characterization, diagnosis, and treatment of previously unknown subtypes of idiopathic pulmonary fibrosis, and methods of using the same.
SEQUENCE LISTING
[0004] A Sequence Listing in computer readable form (CRF) is submitted with this application. The CRF file is named 231758us02_ST25.txt, was created on June 5, 2013, and contains 1039 kilobytes. The entire contents of the CRF file are incorporated herein by this reference.
BACKGROUND
[0005] Although progress has been made in the clinical characterization of idiopathic interstitial pneumonias (IIPs), concern remains about the certainty of diagnosis. The uncertainty of the diagnosis affects the ability both to predict response to various therapies and to provide a patient with a prognosis. In addition, diagnostic uncertainty hinders the ability to understand the etiology and pathogenesis of IIPs.
[0006] Idiopathic pulmonary fibrosis (IPF) is the common form of IIPs. Histopathologically, IPF is defined by the presence of the prototypical form of pulmonary fibrosis, usual interstitial pneumonia (UIP), which is a fibrosing interstitial pneumonia with a pattern of heterogeneous, subpleural regions of fibrotic and remodeled lung. IPF can develop as a result of excessive, sequential lung injury and/or aberrant wound healing, but the mechanisms that account for excessive lung injury or aberrant repair remain unknown.
[0007] IPF is generally untreatable and often results in death within 3 years of diagnosis. Some IPF patients may, however, live with a diagnosis of IPF for 20 years or more.
Diagnosing idiopathic pulmonary fibrosis (IPF) requires a combination of clinical, radiographic, and pathologic criteria, but the diagnosis of IPF provides little certainty in regard to prognosis or outcome.
[0008] Despite recent efforts to develop treatments for IPF, drugs tested to date have not proven to be clinically beneficial. The only viable treatment for IPF, at this time, is lung transplantation.
[0009] What is needed are compositions, devices, and methods for aiding in the diagnosis and treatment of IPF.
SUMMARY
[0010] Therefore, compositions and devices capable of distinguishing between two previously unknown molecular subtypes of IPF are useful, as are methods of using the same.
[0011] In various aspects, the present disclosure relates to the discovery that the IPF phenotype is actually heterogeneous and consists of two distinct molecular phenotypes or subtypes, which have previously presented clinically the same. The two subtypes of IPF can be distinguished from each other by identification of differences in expression of certain genes that have not been previously implicated in IPF. Prior to the filing of the present disclosure, it was unknown that IPF existed in two distinct molecular phenotypes.
[0012] The two subtypes display markedly different survival rates. Therefore, differentiation between the two subtypes of IPF can determine the relative life expectancy of an individual IPF patient, which is decreased in one subtype and increased in the other.
[0013] Therefore, described herein are compositions, nucleic acids, devices, and methods that aid in identifying novel sub-types of pulmonary disease. Also disclosed herein are compositions, nucleic acids, devices and methods that are useful in differentiating between novel subtypes of pulmonary disease and are thus capable of indicating whether an individual patient has one subtype or another. In some embodiments, the pulmonary disease is Idiopathic Pulmonary Fibrosis (IPF). In some embodiments, the subtypes are designated IPF subtype I and IPF subtype II. In some embodiments, certain genes disclosed herein are more highly expressed in tissue from IPF subtype II than subtype I and/or normal tissue.
[0014] In some aspects, compositions for diagnosing or classifying a lung disease are described, the compositions comprising: one or more nucleic acids derived from expressed sequences obtained from a test sample; and diagnostic nucleic acids comprising one or more of all or a part of the sequences disclosed in SEQ ID NOs:1-197. In some aspects, the test sample nucleic acids and/or the diagnostic nucleic acids are labeled. In some aspects, the composition may further comprise a device for quantifying the label. Also disclosed are devices for diagnosing or classifying a lung disease, the devices comprising a composition comprising one or more labelled nucleic acids derived from expressed sequences obtained from a test sample; and diagnostic nucleic acids comprising one or more of all or a part of expressed sequences; and a device for measuring the label.
[0015] Also described herein are methods of diagnosing or classifying a lung disease comprising: collecting a lung tissue sample; processing the tissue sample; purifying expressed sequences from the tissue sample; determining the abundance of one or more expressed sequences in the tissue sample; and classifying the lung disease based on the abundance of one or more expressed sequences. In some aspects, IPF subtype II tissues exhibit overexpression of one or more diagnostic genes. Also described herein are methods of treating IPF comprising: providing a lung tissue sample; obtaining expressed sequences from the sample; detecting the abundance of one or more expressed sequences having nucleotide sequences identical or homologous to all or a part of one or more of expressed sequences; and classifying the lung tissue sample as belonging to IPF subtype I or IPF subtype II.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] Figure 1 shows gene expression profiling data identifying the two subtypes of IPF. mRNA profiles from 119 IPF lungs were subjected to hierarchical clustering based on the expression of 472 transcripts that are differentially expressed at 5% FDR and with greater than 2 fold change in IPF compared to control lung. The distance metric is Euclidean, with complete linkage across samples and Ward's linkage across genes. Extent of honeycombing and fibroblastic foci in each sample as assessed by pathology is depicted by the shade of gray: light gray (none; 0%), medium gray (mild; 1-25%), dark gray (moderate; 25-50%) and black (severe; >75%); white indicates missing data.
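The transcript selection described for Figure 1 (5% FDR and greater than 2 fold change) can be sketched as below. This is an illustration under stated assumptions: the disclosure does not say which FDR procedure was applied to the differential expression test, so the Benjamini-Hochberg step-up adjustment is assumed here, and the function names, transcript names, and p-values are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_transcripts(records, fdr=0.05, min_fold=2.0):
    """Keep transcripts passing both the FDR and the fold-change cutoff.

    records: list of (name, p_value, log2_fold_change) for IPF versus control.
    Fold change is tested in either direction (up or down).
    """
    q_values = benjamini_hochberg([p for _, p, _ in records])
    cutoff = math.log2(min_fold)
    return [
        name
        for (name, _, lfc), q in zip(records, q_values)
        if q <= fdr and abs(lfc) > cutoff
    ]


# Invented example records: (transcript, raw p-value, log2 fold change).
records = [
    ("T1", 0.001, 1.8),
    ("T2", 0.02, 0.4),
    ("T3", 0.04, -2.1),
    ("T4", 0.5, 3.0),
]
print(select_transcripts(records))
```

Applying both cutoffs together keeps only transcripts that are statistically supported and also change by a biologically meaningful amount.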
[0017] Figure 2 shows quantitative real-time PCR data confirming increased expression of cilium-associated genes in IPF subtype-II. Plotted are average fold change for IPF compared to control lung (black bars) and IPF subtype-II compared to IPF subtype-I (white bar) for four cilium-associated genes. Error bars represent standard deviations.
[0018] Figure 3 shows data confirming that expression of cilium-associated genes distinguishes IPF subtype-I from IPF subtype-II and that the differential expression of cilium-associated genes in IPF defines two subcategories of IPF. (A) Hierarchical clustering of cilium-associated genes (GO:0005929, cellular component (cilium)) across samples in IPF subtype-I (grey) and IPF subtype-II (black) in Figure 1. An asterisk next to a gene name indicates presence in cluster B of Figure 1. (B) Representative dot plots of two cilium-associated genes, DNAH6 and DNAH7, illustrate the bimodal distribution of gene expression in IPF; IPF subtype-I = grey, IPF subtype-II = black, control = white. (C) Expression levels of DNAH6 and DNAH7 correlate with the extent of microscopic honeycombing (left) but not with the presence of fibroblastic foci (right).
[0019] Figure 4 shows data confirming that the cilium-associated gene expression signature predicts survival in an independent cohort of IPF patients. (A) Expression of cilium-associated genes divides the NJH cohort of 111 IPF subjects into two groups of subjects, one with high cilium gene expression (black bar; n = 39) and one with low cilium gene expression (grey bar; n = 72). (B) Kaplan-Meier survival curves grouped by cilium gene expression signature into high (solid line) and low (dashed line) cilium groups. p = 0.01 by Mantel-Cox log-rank test for curve comparison.
[0020] Figure 5 depicts a list of differentially expressed genes from Cluster A (Figs. 5A and 5B) and Cluster B (Figs. 5C, 5D, 5E, and 5F).
[0021] Figure 6 depicts a list of 15 differentially expressed cilium-associated genes.
[0022] Figure 7 depicts hierarchical clustering of 119 IPF/UIP (black) and 50 control (white) samples based on expression of 472 transcripts with >2 fold change between IPF and control.
[0023] Figure 8 shows IHC staining of ARL13B, FOXJ1, and MUC5B in transition zones from normal airways/bronchioles to honeycomb cysts in IPF lung. Tissue was counterstained with hematoxylin. The inset shows a magnification of basal and hyperplastic ATII cells.
[0024] Figure 9 shows additional IHC staining that shows widespread dysregulation of FOXJ1 in honeycomb and alveolar cysts, the same areas where MUC5B is overproduced in IPF lung. Tissue was counterstained with hematoxylin. Images were taken at 5X or 10X.
[0025] Figure 10 shows IHC staining for ARL13B (brown) demonstrating increased and misregulated expression of this early marker of ciliogenesis 21 days following 1.5 U/kg i.t. bleomycin. The magnitude of misregulated expression is correlated with Muc5b status and the extent of fibrosis as a result of bleomycin exposure. Tissue was counterstained with hematoxylin. Images were taken at 10X magnification (inset at 20X).
[0026] DETAILED DESCRIPTION
Definitions
[0027] "Hybridization" refers to the binding of two single stranded nucleic acids via complementary base pairing. Extensive guides to the hybridization of nucleic acids can be found in: Tijssen, Laboratory Techniques in Biochemistry and Molecular Biology - Hybridization with Nucleic Acid Probes, Part I, Ch. 2, "Overview of principles of hybridization and the strategy of nucleic acid probe assays" (1993), Elsevier, N.Y.; and Sambrook et al., Molecular Cloning: A Laboratory Manual (3rd ed.) Vol. 1-3 (2001), Cold Spring Harbor Laboratory, Cold Spring Harbor Press, N.Y. The phrases "hybridizing specifically to", "specific hybridization", and "selectively hybridize to" refer to the preferential binding, duplexing, or hybridizing of a nucleic acid molecule to a particular probe under stringent conditions. The term "stringent conditions" refers to hybridization conditions under which a probe will hybridize preferentially to its target subsequence, and to a lesser extent, or not at all, to other sequences in a mixed population (e.g., an mRNA extraction from a tissue biopsy). "Stringent hybridization" and "stringent hybridization wash conditions" are sequence-dependent and are different under different environmental parameters.
[0028] Generally, highly stringent hybridization and wash conditions are selected to be about 5° C lower than the thermal melting point (Tm) for a specific sequence at a defined ionic strength and pH. The Tm is the temperature at which 50% of the target sequence hybridizes to a perfectly matched probe. Very stringent conditions are selected to be equal to the Tm for a particular probe. Often, a high stringency wash is preceded by a low stringency wash to remove background probe signal. An example of stringent hybridization conditions for hybridization of complementary nucleic acids which have more than 100 complementary residues on an array is 42° C using standard hybridization solutions, with the hybridization being carried out overnight. An example of highly stringent wash conditions is a 0.15 M NaCl wash at 72° C for 15 minutes. An example of stringent wash conditions is a wash in 0.2X Standard Saline Citrate (SSC) buffer at 65° C for 15 minutes. An example of a medium stringency wash for a duplex of, for example, more than 100 nucleotides, is 1X SSC at 45° C for 15 minutes. An example of a low stringency wash for a duplex of, for example, more than 100 nucleotides, is 4X to 6X SSC at 40° C for 15 minutes.
[0029] "Nucleic acid," as used herein, can refer to a deoxyribonucleotide (DNA) or ribonucleotide (RNA) in either single- or double-stranded form and includes all nucleic acids comprising naturally occurring nucleotide bases as well as nucleic acids containing any and/or all analogues of natural nucleotides. This term also includes nucleic acid analogues that are metabolized in a manner similar to naturally occurring nucleotides, but at rates that are improved for the purposes desired. This term also encompasses nucleic-acid-like structures with synthetic backbone analogues including, without limitation, phosphodiester, phosphorothioate, phosphorodithioate, methylphosphonate, phosphoramidate, alkyl phosphotriester, sulfamate, 3'-thioacetal, methylene(methylimino), 3'-N-carbamate, morpholino carbamate, and peptide nucleic acids (PNAs) (see, e.g.: "Oligonucleotides and Analogues, a Practical Approach," edited by F. Eckstein, IRL Press at Oxford University Press (1991); "Antisense Strategies," Annals of the New York Academy of Sciences, Volume 600, Eds. Baserga and Denhardt (NYAS 1992); Milligan (1993) J. Med. Chem. 36:1923-1937; and "Antisense Research and Applications" (1993, CRC Press)). PNAs contain non-ionic backbones, such as N-(2-aminoethyl) glycine units. Phosphorothioate linkages are described in: WO 97/03211; WO 96/39154; and Mata (1997) Toxicol. Appl. Pharmacol. 144:189-197. Other synthetic backbones encompassed by this term include methylphosphonate linkages or alternating methyl-phosphonate and phosphodiester linkages (Strauss-Soukup (1997) Biochemistry 36: 8692-8698), and benzyl-phosphonate linkages (Samstag (1996) Antisense Nucleic Acid Drug Dev 6: 153-156).
[0030] "Patient" means a subject whose tissue is used for analysis. As used herein, a patient may be a mammal such as, for example, a murine animal, a canine animal, a porcine animal, a feline animal, a simian animal, a hominid, or the like. In some embodiments, a patient is a human.
[0031] "Probe" or "nucleic acid probe" refer to one or more nucleic acid fragments whose specific hybridization to a sample can be detected. In various embodiments, probes are arranged on a substrate surface in an array. The probe may be unlabelled, or it may contain one or more labels so that its binding to a nucleic acid can be detected. In various embodiments, a probe can be produced from any source of nucleic acids from one or more particular, pre-selected portions of a chromosome including, without limitation, one or more clones, an isolated whole chromosome, an isolated chromosome fragment, or a collection of polymerase chain reaction (PCR) amplification products.
[0032] In some embodiments, the probes contain sequences specific to, or characteristic of, any one or more of the genes described in FIGS. 5 and 6, and SEQ ID NOS:1-197.
Techniques capable of producing high density arrays can also be used for this purpose (see, e.g., Fodor (1991) Science 767-773; Johnston (1998) Curr. Biol. 8: R171-R174; Schummer (1997) Biotechniques 23: 1087-1092; Kern (1997) Biotechniques 23: 120-124; and U.S. Patent No. 5,143,854).
[0033] The sequence of the probes can be varied. In various embodiments, the probe sequence can be varied to produce probes that are substantially identical to the probes disclosed hereinbelow, but that retain the ability to hybridize specifically to the same targets or samples as the probe from which they were derived.
[0034] Reference is now made in detail to certain embodiments of compositions, nucleic acids, devices, and methods useful in differentiation of IPF subtype-I and IPF subtype-II. The disclosed embodiments are not intended to be limiting of the claims. To the contrary, the claims are intended to cover all alternatives, modifications, and equivalents.
IPF Subtypes I and II
[0035] In various aspects, the present disclosure relates to the discovery that the IPF phenotype is actually heterogeneous and consists of two distinct molecular phenotypes or subtypes, which have previously presented clinically the same.
[0036] The present inventors have discovered that IPF exists as two distinct subtypes that have previously presented clinically the same. The two subtypes differ in their expression of cilium genes and can thus be distinguished based on this differential expression. IPF subtype-I displays decreased levels of cilium gene expression as compared to IPF subtype-II, which displays increased levels of cilium gene expression. As shown in FIG. 4B, IPF subtype-I, which displays decreased levels of cilium gene expression, displays significantly longer survival than IPF subtype-II, which displays increased levels of cilium gene expression.
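The survival comparison referred to in FIG. 4B uses a Mantel-Cox log-rank test. The sketch below shows how such a two-group test can be computed from censored follow-up data. It is an illustration only and not the software used for the reported analysis; the function name, data layout, and follow-up times are hypothetical.

```python
import math


def logrank_test(group_a, group_b):
    """Two-group Mantel-Cox log-rank test.

    Each group is a list of (time, event) pairs, where event is 1 for a death
    and 0 for a censored observation. Returns (chi_square, p_value), 1 d.f.
    """
    event_times = sorted({t for t, e in group_a + group_b if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        # Subjects still at risk just before time t.
        n_a = sum(1 for time, _ in group_a if time >= t)
        n_b = sum(1 for time, _ in group_b if time >= t)
        # Deaths occurring exactly at time t.
        d_a = sum(1 for time, e in group_a if time == t and e == 1)
        d_b = sum(1 for time, e in group_b if time == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi_square = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Invented follow-up data (months, event flag) for two expression groups.
high_cilium = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)]
low_cilium = [(6, 1), (7, 1), (8, 1), (9, 1), (10, 1)]
chi_square, p_value = logrank_test(high_cilium, low_cilium)
print(chi_square, p_value)
```

The test compares observed deaths in one group with the number expected if both groups shared the same hazard, so censored subjects contribute to the at-risk counts until they leave follow-up.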
[0037] Prior to the information presented in the present disclosure, it was unknown that IPF existed in two distinguishable subtypes. Thus both subtypes, which are molecularly and pathologically distinct and have different survival rates, were previously treated the same.
[0038] To discover the two distinct subtypes of IPF, transcription/expression profiles were produced from lung tissue from a pool of subjects with IPF. Expression/transcription profiles were also produced from a pool of non-diseased control subjects. Strong molecular signatures were identified that were associated with the expression of cilium genes. The molecular signatures separated the IPF subjects into two subtypes, one with increased cilium gene expression (subtype II), relative to the other (subtype I). The more abundant cilium gene expression levels (subtype II) were associated with microscopic honeycombing. The more abundant cilium-associated gene expression levels (subtype II) also correlated with diminished survival in an independent cohort of subjects/patients with IPF.
[0039] Therefore, surprisingly, measuring the differential levels of expression of cilium genes in patients with IPF identifies two novel subtypes of IPF that otherwise present clinically the same. Identification of the subtypes can allow for better determination of disease prognosis, as well as improved therapeutic approaches for one or both IPF subtypes.
[0040] In some embodiments, IPF patients with elevated cilium gene expression display increased microscopic honeycombing. In some embodiments, IPF patients with elevated cilium gene expression display increased microscopic honeycombing and elevated MUC5B expression. MUC5B is a secreted airway mucin. Prior to the filing of the present disclosure, secreted airway mucins had not been implicated in the development of pulmonary fibrosis. Therefore, in certain embodiments, IPF subtype II displays elevated MUC5B expression. Without wishing to be bound by any theory, it is believed that excess concentrations of MUC5B compromise mucosal host defense, reducing lung clearance of inhaled particles, dissolved chemicals, and microorganisms, resulting in enhanced injury, chronic inflammation, and fibroproliferation in the distal airspace. Therefore, in certain embodiments, IPF subtype-II displays a unique IPF phenotype comprising honeycomb cysts, elevated MUC5B expression, and elevated expression of cilium-associated genes.
[0041] In some embodiments, IPF subtype-II patients displaying elevated cilium-associated gene expression do not show increased levels of fibroblastic foci.
[0042] In various aspects, the IPF phenotype is heterogeneous and comprises two distinct molecular phenotypes, groups, or subtypes: IPF subtype-I and IPF subtype-II. In some embodiments, the two subtypes of IPF are characterized by differences in expression of at least one diagnostic gene that has not been previously implicated in IPF. In some embodiments, the diagnostic gene may be one or more than one cilium-associated gene. In some embodiments, the two subtypes of IPF are characterized by differences in expression of a plurality of cilium-associated genes that have not been previously implicated in IPF.
[0043] In various aspects, high cilium-associated gene expression is associated with pathological features of IPF (e.g. honeycombing) and survival, as compared to low cilium-associated gene expression. Thus, in various aspects, analysis of gene expression in IPF subjects can be used to identify subjects that can respond differently to pharmacological intervention. The application of gene expression profiling to identify unique disease subtypes can be used to improve clinical course and response to therapy for IPF/UIP subjects/patients.
[0044] In various embodiments, over-expression of a diagnostic gene can be relative to a reference gene, a reference sample, or the total amount of expressed RNA. In various embodiments, high cilium gene expression may correspond to Affymetrix mean intensity readings ranging from about 5 to about 9, and in other embodiments from about 5.99 log(2) (standard deviation = 0.57, and standard error of the mean = 0.08) to about 8.23 (standard deviation = 1.23, and standard error of the mean = 0.15) (for example, see Cluster B gene expression data in Figs. 5C-5F). In some embodiments, over-expression may also be determined by comparing expression levels of diagnostic genes (for example cilium-associated genes) to expression levels of reference genes.
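One way to picture the comparison of diagnostic genes against reference genes is the sketch below. It is a hedged illustration, not a disclosed classifier: the four cilium-associated genes are those named in the RT-PCR confirmation, GAPDH is assumed here as the reference gene, and the threshold, function names, and log2 intensity values are hypothetical and would need to be calibrated on real data.

```python
# Cilium-associated genes named in the disclosure's RT-PCR confirmation.
CILIUM_GENES = ("DNAH6", "DNAH7", "DNAI1", "RPGRIP1L")
# Assumption: GAPDH as the reference gene for this illustration.
REFERENCE_GENES = ("GAPDH",)


def cilium_score(log2_expr):
    """Mean log2 intensity of cilium genes minus mean of reference genes."""
    cilium = [log2_expr[g] for g in CILIUM_GENES]
    reference = [log2_expr[g] for g in REFERENCE_GENES]
    return sum(cilium) / len(cilium) - sum(reference) / len(reference)


def classify(log2_expr, threshold):
    """Label a sample by its cilium score; the threshold is hypothetical."""
    if cilium_score(log2_expr) > threshold:
        return "IPF subtype II"
    return "IPF subtype I"


# Invented log2 intensities for two samples.
sample_high = {"DNAH6": 8.0, "DNAH7": 8.5, "DNAI1": 8.0, "RPGRIP1L": 7.5, "GAPDH": 12.0}
sample_low = {"DNAH6": 6.0, "DNAH7": 6.0, "DNAI1": 6.0, "RPGRIP1L": 6.0, "GAPDH": 12.0}
HYPOTHETICAL_THRESHOLD = -5.0

print(classify(sample_high, HYPOTHETICAL_THRESHOLD))
print(classify(sample_low, HYPOTHETICAL_THRESHOLD))
```

Normalizing to a reference gene makes the score less sensitive to differences in total RNA input between samples, which is the motivation the paragraph gives for using reference genes.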
[0045] Disclosed herein are compositions, nucleic acids, devices, and methods that aid in identifying novel sub-types of pulmonary disease. In some embodiments, the pulmonary disease is Idiopathic Pulmonary Fibrosis (IPF). In some embodiments, the novel subtypes are IPF subtype-I and IPF subtype-II.
[0046] Therefore, in various aspects, the present disclosure relates to compositions, devices and methods useful in distinguishing patients with IPF subtype-I from patients with IPF subtype-II. In some embodiments, the present disclosure relates to compositions, devices, and methods of distinguishing patients with IPF subtype I from patients with IPF subtype II in lung tissue samples. In various aspects, the two subtypes can be distinguished from each other by measuring the level of expression of one or more genes which, in some embodiments, are cilium-associated genes.
[0047] In certain embodiments, a patient with IPF subtype II has a shorter survival time, post-diagnosis, than a subject with IPF subtype I. Therefore, in certain embodiments, identification of the IPF subtype in an IPF patient can help determine the proper therapeutic approach for such patient. In certain embodiments, one or more cilium-associated genes are more highly expressed in samples from patients with IPF subtype II relative to the expression level of the same cilium-associated genes in samples from patients with subtype I and/or from non-IPF samples. Therefore, in certain embodiments, identification of novel IPF subtypes having distinct survival rates and phenotypes aids in the discovery and development of new treatments for this otherwise untreatable disease. For example, pharmaceutical compositions can have limited effect on the overall population of IPF patients but be selectively effective for treating IPF subtype I or IPF subtype II.
Differential Expression
[0048] In various aspects, the present disclosure relates to devices for quantitating expression levels of genes in a sample. In some embodiments, the genes are human genes. In some embodiments, gene expression levels are analyzed by quantitating mRNA levels. In some embodiments, gene expression levels are analyzed by quantitating protein levels. In some embodiments, the sample is a lung tissue sample. In some embodiments, the lung tissue sample is from a healthy patient. In some embodiments, the lung tissue sample is from a patient with pulmonary disease. In some embodiments, the pulmonary disease is IPF.
[0049] In some aspects, the present disclosure relates to devices for quantitating expression levels of cilium-associated genes in a lung tissue sample from a patient with pulmonary disease. In some embodiments, the pulmonary disease is IPF. Therefore, in some embodiments, the quantified expression levels can be used to classify the lung tissue sample as IPF subtype-I or IPF subtype-II. In some embodiments, expression levels of the cilium-associated genes are analyzed by quantitating mRNA levels. In some embodiments, expression levels of the cilium-associated genes are analyzed by quantitating protein levels. In some embodiments, the cilium-associated genes are human genes. In some embodiments, the mRNA comprises a human cilium-associated gene transcript. In some embodiments, the device comprises one or more cilium-associated nucleic acid sequences.
Differential Expression of mRNA
[0050] As shown in Example 1 of this disclosure, the present inventors have revealed that IPF exists in two distinct molecular phenotypes or subtypes, which previously presented as clinically the same. Additionally, the present inventors have shown that the two subtypes differ in their expression of cilium genes, with IPF subtype-II displaying enhanced levels of cilium-associated gene expression as compared to IPF subtype-I and/or normal, non-diseased lung tissue. Additionally, the present inventors have revealed that patients with IPF subtype-I have significantly longer survival than patients with IPF subtype-II, making the determination of subtype well suited as a predictive measure of survival in IPF patients.
[0051] In various aspects, the present disclosure is based on the discovery that certain differentially expressed genes, and the mRNA sequences transcribed therefrom, can be used to identify IPF subtype in patients with IPF. Such genes, and their mRNA sequences, can also be used to predict the life expectancy of IPF patients and whether such patients will benefit from one or more medical treatments.
[0052] In various embodiments, devices comprising isolated nucleic acid sequences that selectively hybridize to one or more mRNA sequences of genes that are differentially regulated in lung tissue samples from IPF patients are disclosed. In some embodiments, the devices comprise nucleic acid sequences that will selectively hybridize to one or more mRNA sequences of genes that are differentially regulated in lung tissue of IPF patients. In various embodiments, the devices comprise isolated nucleic acid sequences that may be used to measure gene expression by quantitating amounts of mRNA from specific genes in a tissue sample. In some embodiments, the tissue sample may be from lung tissue, for example from a patient diagnosed with pulmonary disease. In some embodiments, the pulmonary disease is IPF. In some embodiments the isolated nucleic acid sequences comprise sequences of cilium-associated genes, and the mRNA comprises sequences of transcribed cilium-associated genes. In some embodiments, mRNA levels from one, more than one, or more than 5 cilium-associated genes may be analyzed.
Diagnostic Genes
[0053] In various aspects, the abundance of mRNA transcripts from a diagnostic gene, or the abundance of expressed sequence from a diagnostic gene, can correlate with the presence or absence of lung disease, or a specific subtype of lung disease. Therefore, in various aspects, a diagnostic gene is a gene that is indicative of a disease state, or disease subtype. In some embodiments, a diagnostic gene is more highly expressed in lung tissue of one disease subtype compared to a second disease subtype and/or normal, non-diseased lung tissue. In some embodiments, a diagnostic gene is differentially expressed in a patient with IPF. In some embodiments, the abundance of expressed sequences from a diagnostic gene may be determined by comparing the number of expressed sequences per cell, per amount of total RNA and/or per amount of polyA+ RNA.
[0054] The expression level of a diagnostic gene may be measured relative to the expression level of a specific gene, relative to the expression of all genes in a tissue sample, or relative to expression of the diagnostic gene in non-diseased lung tissue. For example, in some embodiments enhanced expression may be identified by comparing the expression level of the diagnostic gene to the expression level of a reference gene. A reference gene may provide a basis for comparison for a diagnostic gene. In other words, the abundance of transcripts from a reference gene provides a basis for comparison for the expression level of a diagnostic gene. In some embodiments, the abundance of expressed sequences from a diagnostic gene may be determined by comparing the total amount of expressed sequences from the diagnostic gene to the total amount of expressed sequences of a reference gene. In some embodiments, a difference in the abundance of transcripts from a diagnostic gene, as compared to the abundance of transcripts from a reference gene, is indicative of a disease state. In some embodiments, the abundance of expressed sequences from a diagnostic gene may be determined relative to the expression level of all genes in a tissue sample. For example, in some embodiments a standard amount of mRNA may be analyzed and the abundance of the expressed sequence may be determined relative to the amount of mRNA analyzed.
[0055] In various embodiments, the expression levels from a plurality of diagnostic genes are analyzed. In some embodiments, the expression levels from a plurality of diagnostic genes are compared to the expression levels of a plurality of reference genes. In some embodiments, a decrease in the expression level of one or more diagnostic genes, as compared to the expression level of one or more reference genes or the total background expression level, is indicative of IPF subtype-I. In some embodiments, no, or little, change in the expression level of one or more diagnostic genes, as compared to the expression level of one or more reference genes or the total background expression level, is indicative of IPF subtype-I. In some embodiments, an increase in the expression level of one or more diagnostic genes, as compared to the expression level of one or more reference genes or the total background expression level, is indicative of IPF subtype-II.
[0056] A reference gene may be a gene whose expression level is the same or similar in lung tissue from diseased patients and subjects that do not suffer from lung disease. In that regard, a reference gene and a diagnostic gene may be the same gene, with the reference gene expression level determined in a different tissue type and/or different patient. For example, the expression level of a diagnostic gene may be determined from the lung tissue of a diseased patient, and the reference gene expression level is obtained by determining the expression level of the diagnostic gene in a sample taken from the lung tissue of a non-diseased patient, and/or from non-diseased lung tissue from the same patient.
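As a non-limiting illustration of the comparison described above, the following Python sketch labels a sample already known to be from an IPF patient by comparing the mean log2 expression of a set of diagnostic genes to the mean log2 expression of a set of reference genes. The function name `call_subtype` and the cutoff `min_log2_increase` are hypothetical; the disclosure does not specify a numeric threshold, so the default of 1.0 (a two-fold increase) is an assumption chosen only for the example.

```python
from statistics import mean


def call_subtype(diagnostic_log2, reference_log2, min_log2_increase=1.0):
    """Label an IPF sample from log2 expression values.

    min_log2_increase is a hypothetical cutoff, not a value from the disclosure.
    """
    # The difference of mean log2 values equals the log2 ratio of geometric means.
    delta = mean(diagnostic_log2) - mean(reference_log2)
    if delta >= min_log2_increase:
        # Increased diagnostic-gene expression relative to reference genes.
        return "IPF subtype-II"
    # Decrease, or little or no change, relative to reference genes.
    return "IPF subtype-I"


print(call_subtype([8.0, 8.4], [6.0, 6.2]))
print(call_subtype([6.1, 6.0], [6.0, 6.2]))
```

In practice the cutoff would need to be calibrated on samples of known subtype; a single fixed value is used here only to keep the sketch short.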
[0057] In some embodiments, a control or reference gene may be an expressed sequence whose expression is similar in both IPF subtype I and subtype II. In some embodiments, similar expression may be greater than about 0.7-fold, 0.8-fold, 0.9-fold, 1.0-fold, or 1.1-fold, and/or less than about 1.2-fold, 1.1-fold, 1.0-fold, 0.9-fold, or 0.8-fold.
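By way of example only, the fold-change window described above can be checked as in the following Python sketch. The function name `is_similar_expression` is hypothetical, the inputs are assumed to be expression levels on a linear (not log) scale, and the default bounds of 0.7-fold and 1.2-fold are simply the widest pair of values recited above.

```python
def is_similar_expression(subtype_i_level, subtype_ii_level, lower=0.7, upper=1.2):
    """Return True when the subtype II / subtype I fold change lies inside the window."""
    if subtype_i_level <= 0:
        raise ValueError("subtype_i_level must be positive")
    fold = subtype_ii_level / subtype_i_level
    return lower < fold < upper


print(is_similar_expression(100.0, 95.0))   # candidate reference gene
print(is_similar_expression(100.0, 250.0))  # differentially expressed gene
```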
[0058] In some embodiments, the abundance of an expressed sequence or transcript from a tissue sample from a patient having lung disease may be compared to the relative abundance of that same expressed sequence from a non-lung-disease subject or patient. In some embodiments, diagnostic genes are expressed at greater levels in lung tissue obtained from a patient having lung disease than from a patient without lung disease. In some embodiments, diagnostic genes are expressed at greater levels in lung tissue obtained from a patient with IPF subtype-II than from a patient without lung disease. In some embodiments, diagnostic genes are expressed at lesser levels in lung tissue obtained from a patient with IPF subtype-I than from a patient with subtype-II and/or a patient without lung disease. In some embodiments, diagnostic genes are expressed at greater levels in lung tissue obtained from a patient having one subtype of lung disease than from a patient with a different subtype of lung disease. In some embodiments, diagnostic genes are expressed at lesser levels in lung tissue obtained from a patient with IPF subtype-I than from a patient with IPF subtype-II. In some embodiments, diagnostic genes are expressed at greater levels in lung tissue obtained from a patient with IPF subtype-II than from a patient with IPF subtype-I, and/or a patient without lung disease.
[0059] In various aspects, the genes described in FIGS. 5 and 6, and in SEQ ID NOS: 1-197 can be diagnostic genes. In some embodiments, a diagnostic gene is a gene described in FIGS. 5 and 6, and SEQ ID NOS: 1-197, and/or a fragment(s) thereof. In some embodiments, devices provided by the present disclosure comprise at least one gene described in FIGS. 5 and 6, and SEQ ID NOS: 1-197. In some embodiments, devices provided by the present disclosure comprise a plurality of the genes described in FIGS. 5 and 6, and SEQ ID NOS: 1-197. In some embodiments, devices provided by the present disclosure comprise all of the genes described in FIGS. 5 and 6, and SEQ ID NOS: 1-197.
[0060] In some embodiments, a diagnostic gene(s) is more highly expressed in lung tissue obtained from a patient with IPF type-II than in lung tissue obtained from a patient with type-I IPF. In some embodiments, a diagnostic gene(s) is more highly expressed in lung tissue obtained from a patient with IPF type-II than in lung tissue obtained from a non-diseased patient. In some embodiments, a diagnostic gene(s) is expressed at a lower level in lung tissue obtained from a patient with IPF type-I than in lung tissue obtained from a patient with subtype-II and/or a non-diseased patient. In various embodiments, one or more genes and expressed sequences in Cluster A (FIGS. 5A and 5B) and Cluster B (FIGS. 5C-5F) are expressed at higher levels in lung tissue obtained from IPF type-II patients than from patients with IPF type-I.
[0061] In some embodiments, a reference gene may be a gene that is not in Cluster A or Cluster B. In other embodiments, a reference gene may be a gene that is listed in FIGS. 5A-5B. In some embodiments, a reference gene may be expressed at the same level in lung tissue samples from patients with lung disease, without lung disease, with IPF subtype-I, and with IPF subtype-II.
[0062] In various embodiments, the diagnostic genes may be one or more genes belonging to the Gene Ontology (GO) category 0005929, which comprises many genes of the cellular component of cilium. In various embodiments, the diagnostic genes may be one or more of the cilium-associated genes described in FIG. 6.
[0063] In some embodiments, expressed sequences from cilium-associated genes are more abundant in lung tissue obtained from IPF type-II patients than from patients with IPF type-I. In some embodiments, expressed sequences from cilium-associated genes are less abundant in lung tissue obtained from IPF type-II patients than from patients with IPF type-I. In some embodiments, expressed sequences from cilium-associated genes are more abundant in lung tissue obtained from IPF type-I patients than from non-diseased patients. In some embodiments, expressed sequences from cilium-associated genes are less abundant in lung tissue obtained from IPF type-II patients than from non-diseased patients.
[0064] In most cases, diagnostic genes may be more highly expressed in lung tissue samples from patients with IPF subtype-II than in lung tissue samples from patients with subtype-I.
[0065] Expression levels can be analyzed by a variety of methods and techniques including, for example, differential expression screening, PCR, RT-PCR, SAGE analysis, high-throughput sequencing, microarrays, liquid or other arrays, protein-based methods (e.g., western blotting, proteomics, and other methods described herein), and data mining methods, as further described herein.
[0066] In some embodiments, the diagnostic gene sequences are homologous or identical to genes described in FIGS. 5 and 6, and SEQ ID NOS: 1-197, or portions thereof. For example, the diagnostic gene sequence may be more than about 5 nucleotides (nt), 10 nt, 15 nt, 20 nt, 25 nt, 30 nt, 35 nt, 40 nt, 45 nt, 50 nt, 60 nt, 70 nt, 80 nt, 90 nt, 100 nt, 150 nt, 200 nt, 300 nt, 400 nt, 500 nt, 600 nt, 700 nt, 800 nt, 900 nt, 1.0k nt, 1.1k nt, 1.2k nt, 1.3k nt, 1.4k nt, 1.5k nt, 1.6k nt, 1.7k nt, 1.8k nt, 1.9k nt, 2.0k nt, 2.5k nt, 3.0k nt, 4.5k nt, 5.5k nt, 5.5k nt, 6.0k nt, 6.5k nt, 7.0k nt, 7.5k nt, 8.0k nt, 8.5k nt, 9.0k nt, 9.5k nt, 10.0k nt, 10.5k nt, 11.0k nt, 11.5k nt, 12.0k nt, or 12.5k nt, and/or less than about 13.0k nt, 12.5k nt, 12.0k nt, 11.5k nt, 11.0k nt, 10.5k nt, 10.0k nt, 9.5k nt, 9.0k nt, 8.5k nt, 8.0k nt, 7.5k nt, 6.0k nt, 5.5k nt, 5.0k nt, 4.5k nt, 4.0k nt, 3.5k nt, 3.0k nt, 2.5k nt, 2.0k nt, 1.9k nt, 1.8k nt, 1.7k nt, 1.6k nt, 1.5k nt, 1.4k nt, 1.3k nt, 1.2k nt, 1.1k nt, 1.0k nt, 900 nt, 800 nt, 700 nt, 600 nt, 500 nt, 400 nt, 300 nt, 200 nt, 150 nt, 90 nt, 80 nt, 70 nt, 60 nt, 55 nt, 50 nt, 45 nt, 40 nt, 35 nt, 30 nt, 25 nt, 20 nt, 15 nt, 10 nt, or 5 nt of the sequences described in FIGS. 5 and 6, and SEQ ID NOS: 1-197. In various embodiments, the homologous sequences can include deleted nucleotides or inserted nucleotides.
[0067] In various embodiments, the diagnostic gene sequences can be aligned with the gene sequences described in FIGS. 5 and 6, and SEQ ID NOS: 1-197 by a nucleotide sequence alignment algorithm. For example, blastn can be used to align two nucleotide sequences, with the program optimized for highly similar sequences (megablast) or for somewhat similar sequences (blastn; this can be useful where sequences have less than about 90% identity or the sequences have low complexity). In various embodiments, the maximum target sequence is set to the length of the longer of the two sequences to be aligned, the expected threshold can be 10, the word size can be 28, the match/mismatch scores can be 1,-2, and the gap costs can be linear. In various embodiments, homology can be expressed as percent identity.
[0068] In some embodiments, the diagnostic gene sequences, when aligned with the sequences of FIGS. 5 and 6, and SEQ ID NOS: 1-197, have identity of more than about 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% and/or less than about 100%, 95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, or 45% identities. In various embodiments, the sequence alignment can have gaps of less than about 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, or 1%.
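As a non-limiting illustration, percent identity and percent gaps can be computed from a pairwise alignment as in the following Python sketch. It assumes the two sequences have already been aligned by an external tool and are supplied as equal-length strings with "-" marking gap positions. It also assumes one common convention in which both percentages are taken over the total number of alignment columns; other conventions exist. The function name `alignment_stats` is hypothetical.

```python
def alignment_stats(aligned_a, aligned_b):
    """Return (percent identity, percent gaps) over all alignment columns."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    columns = len(aligned_a)
    # A column is an identity when both residues match and neither is a gap.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    # A column is a gap when either sequence has a gap character there.
    gaps = sum(1 for x, y in zip(aligned_a, aligned_b) if x == "-" or y == "-")
    return 100 * matches / columns, 100 * gaps / columns


identity, gap_pct = alignment_stats("ACGT-ACGTA", "ACGTTACCTA")
print(identity, gap_pct)
```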
Cluster A
[0069] In some embodiments, the diagnostic gene sequences may include one or more of the genes selected from the group consisting of: ABCA13 (Gene Symbol), NM_152701 (Accession Number); ADAM28, NM_014265; ADH7, NM_000673; AGR2, NM_006408; AGR3, NM_176813; ALOX15, NM_001140; ANKRD18B, ENST00000290943; C10orf81, NM_001193434; C12orf75, NM_001145199; C1orf110, NM_178550; C20orf114, NM_033197; C7orf63, NM_001039706; CAPN13, NM_144575; CD24, NM_013230; CDH3, NM_001793; CHST9, NM_031422; CKMT1A, NM_001015001; CKMT1A, NM_001015001; CLCA2, NM_006536; CLDN1, NM_021101; CLIC6, NM_053277; CNTN3, NM_020872; COL17A1, NM_000494; CP, NM_000096; CRISPLD1, NM_031461; CXCL13, NM_006419; CYP24A1, NM_000782; DNAH5, NM_001369; DNAJA4, NM_018602; DSC3, NM_024423; FAT2, NM_001447; FGF14, NM_175929; GOLM1, NM_016548; GPR110, NM_153840; GSTA1, NM_145740; HHLA2, NM_007072; HSPA4L, NM_014278; ITGB8, NM_002214; KIAA1324, NM_020775; KIAA1377, NM_020802; KLHL13, NM_033495; KRT15, NM_002275; KRT17, NM_000422; KRT5, NM_000424; KRT6C, NM_173086; LCN2, NM_005564; MAP1A, NM_002373; MMP1, NM_002421; MMP13, NM_002427; MMP7, NM_002423; MSMB, NM_002443; MUC16, NM_024690; MUC4, NM_018406; MUC5B, NM_002458; PFN2, NM_053024; PIP, NM_002652; PLEKHG7, NM_001004330; PLUNC, NM_130852; PROM1, NM_006017; PRSS12, NM_003619; SCGB1A1, NM_003357; SCGB3A1, NM_052863; SCGB3A2, NM_054023; SERPINB3, NM_006919; SERPINB5, NM_002639; SIX1, NM_005982; SIX4, NM_017420; SLC27A2, NM_003645; SLC44A4, NM_025257; SLC44A4, NM_025257; SLC44A4, NM_025257; SLITRK6, NM_032229; SOX2, NM_003106; SPP1, NM_001040058; ST6GALNAC1, NM_018414; TMEM45A, NM_018004; TMPRSS4, NM_019894; TP63, NM_003722; TPPP3, NM_016140; TRIM2, NM_015271; TRIM29, NM_012101; UGT1A1, NM_000463; VTCN1, NM_024626, and combinations thereof.
Cluster B
[0070] In some embodiments, the diagnostic gene sequences may include one or more of the genes selected from the group consisting of: AGBL2 (Gene Symbol), NM_024783 (Accession Number); ARMC3, NM_173081; ARMC4, NM_018076; C10orf107, NM_173554; C10orf79, NM_025145; C11orf70, NM_032930; C11orf88, NM_207430; C12orf55, ENST00000298953; C12orf63, ENST00000342887; C13orf30, NM_182508; C1orf129, NM_025063; C1orf173, NM_001002912; C1orf192, NM_001013625; C1orf194, NM_001122961; C1orf87, NM_152377; C20orf26, NM_015585; C20orf85, NM_178456; C2orf39, NM_145038; C2orf77, NM_001085447; C3orf15, NM_033364; C6, NM_001115131; C6orf103, NM_024694; C6orf118, NM_144980; C6orf165, NM_001031743; C6orf97, NM_025059; C9orf135, NM_001010940; CAPS, NM_004058; CAPSL, NM_144647; CASC1, NM_018272; CCDC11, NM_145020; CCDC113, NM_014157; CCDC146, NM_020879; CCDC39, NM_181426; CCDC60, NM_178499; CDHR3, NM_152750; CERKL, NM_201548; CXorf22, NM_152632; CXorf59, NM_173695; DNAH10, NM_207437; DNAH10, NM_207437; DNAH11, NM_003777; DNAH12, NM_178504; DNAH12, NM_178504; DNAH12, NM_198564; DNAH3, NM_017539; DNAH5, NM_001369; DNAH6, NM_001370; DNAH6, NM_001370; DNAH6, NM_001370; DNAH6, NM_001370; DNAH7, NM_018897; DNAH9, NM_001372; DNAI1, NM_012144; DNAJA4, NM_018602; DPY19L2P2, NR_027768; DTHD1, NM_001136536; DZIP3, NM_014648; EFCAB1, NM_024593; EFHB, NM_144715; EFHC1, NR_033327; EFHC2, NM_025184; ENKUR, NM_145010; FAM154B, AK304339; FAM183A, NM_001101376; FAM81B, NM_152548; FANK1, NM_145235; HYDIN, NM_032821; HYDIN, NM_032821; HYDIN, NM_032821; HYDIN, NM_032821; IL5RA, NM_000564; IQUB, NM_178827; KCNRG, NM_173605; LOC646851, NM_001013647; LRRC46, NM_033413; LRRC48, NM_001130090; LRRC50, NM_178452; LRRIQ1, NM_032165; MDH1B, NM_001039845; MNS1, NM_018365; MORN5, NM_198469; MS4A8B, NM_031457; NEK10, NM_199347; NEK11, NM_024800; NEK5, NM_199289; NME5, NM_003551; PACRG, NM_152410; PIH1D2, NM_138789; PLEKHG7, NM_001004330; PTRH1, ENST00000419060; RGS22, NM_015668; RINT1, NM_021930; ROPN1L, NM_031916; RP1, NM_006269; RPGRIP1L, NM_015272; RSPH1, NM_080860; RSPH10B, NM_173565; RSPH10B, NM_173565; RSPH4A, NM_001010892; SERPINI2, NM_006217; SNTN, NM_001080537; SPA17, NM_017425; SPAG17, NM_206996; SPAG6, NM_012443; SPATA17, NM_138796; SPATA18, NM_145263; STK33, NM_030906; STOML3, NM_145286; STOX1, NM_152709; TEKT1, NM_053285; TEX9, NM_198524; TMEM212, NM_001164436; TMEM232, NM_001039763; TSGA10, NM_025244; TSPAN1, NM_005727; TTC18, NM_145170; TTC25, NM_031421; UBXN10, NM_152376; VWA3A, NM_173615; VWA3B, NM_144992; WDR16, NM_145054; WDR49, NM_178824; WDR63, NM_145172; WDR65, NR_030778; WDR66, NM_144668; WDR69, NM_178821; WDR78, NM_024763; YSK4, NM_025052; ZBBX, NM_024687, and combinations thereof.
Cilium-Associated Genes
[0071] In some embodiments, the diagnostic genes and/or the mRNA transcripts transcribed from the diagnostic genes may be from cilium-associated genes. The cilium-associated genes may be selected from isoforms, alternative splice variants, and genes of the group consisting of: Homo sapiens sperm associated antigen 17 (SPAG17); Homo sapiens sperm flagellar protein Repro-SA-1; Homo sapiens enkurin, TRPC channel interacting protein (ENKUR); Homo sapiens dynein, axonemal, heavy chain 10; Homo sapiens dynein, axonemal, heavy chain 10 (DNAH10); Homo sapiens TATA box binding protein (TBP)-associated factor, RNA polymerase I, C, 110kDa (TAF1C), transcript variant 2, mRNA; Homo sapiens axonemal heavy chain dynein type 3 (DNAH3); Homo sapiens RPGRIP1-like, mRNA; Homo sapiens dynein, axonemal, heavy chain 7 (DNAH7); Homo sapiens sentan, cilia apical structure protein (SNTN), mRNA; Homo sapiens dynein, axonemal, heavy chain 12 (DNAH12); Homo sapiens dynein, axonemal, heavy chain 12 (DNAH12), transcript variant 1; Homo sapiens dynein, axonemal, heavy chain 12 (DNAH12); Homo sapiens EF-hand domain (C-terminal); Homo sapiens cDNA FLJ58105 highly similar to Homo sapiens radial spokehead-like 3 (RSHL3); Homo sapiens cell division cycle associated 7-like (CDCA7L); Homo sapiens dynein intermediate chain DNAI1 (DNAI1). In some embodiments, the cilium-associated genes are isoforms, variants, and genes with accession numbers selected from the group consisting of AY555274, NM_206996; AF079363, NM_172242, NM_012443; BC026165, NM_145010; BC144575, BC150622, AK125475, NM_207437; AK125796, NM_207437; BX648657, NM_139353, NM_005679, NM_178452; AF494040, NM_017539; BC136433, NM_001127897, NM_015272; AB023161, NM_018897; AK126350, NM_001080537; AK128592, NM_178504; U53532, NM_178504; NM_198564, NM_178504; BC020210, NR_033327, NM_001172420, NM_018100; AK299754, NM_001161664, NM_001010892; AK095018, NM_018719, NM_001127370, NM_003777, NM_001127371; AF091619, NM_012144; and combinations thereof.
Array
[0072] In various aspects, devices provided by the present disclosure comprise arrays or micro-arrays. An array refers to an arrangement, on a substrate surface, of multiple nucleic acid sequences, which may be single-stranded, double-stranded, or a combination thereof. In some embodiments, the nucleic acid sequences may comprise sequences of transcribed genes. In some embodiments, an array can comprise many different nucleic acid sequences representing the same or different transcribed genes. In some embodiments, several different nucleic acid sequences may comprise different portions of the same transcribed gene. In some embodiments, the different nucleic acid sequences may overlap, for example by more than about 1 nucleotide (nt), 2 nt, 3 nt, 4 nt, 5 nt, 6 nt, 7 nt, 8 nt, or 9 nt and/or less than about 10 nt, 9 nt, 8 nt, 7 nt, 6 nt, 5 nt, 4 nt, 3 nt, or 2 nt.
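By way of illustration only, overlapping nucleic acid sequences that cover different portions of the same transcribed gene can be generated as in the following Python sketch. The function name `tile_probes` and the default probe length (25 nt) and overlap (5 nt) are hypothetical choices for the example; no particular probe design procedure is implied by the disclosure.

```python
def tile_probes(transcript, probe_len=25, overlap=5):
    """Split a transcript into fixed-length probes whose neighbours share `overlap` nt."""
    if not 0 <= overlap < probe_len:
        raise ValueError("overlap must be at least 0 and smaller than probe_len")
    step = probe_len - overlap
    # Only full-length windows are kept; a trailing partial window is dropped.
    return [
        transcript[start:start + probe_len]
        for start in range(0, len(transcript) - probe_len + 1, step)
    ]


print(tile_probes("ACGTACGTAC", probe_len=4, overlap=2))
```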
[0073] In some embodiments, nucleic acid sequences on an array may have the same or similar melting temperatures, for example greater than 45 °C, 46 °C, 47 °C, 48 °C, 49 °C, 50 °C, 51 °C, 52 °C, 53 °C, 54 °C, 55 °C, 56 °C, 57 °C, 58 °C, 59 °C, 60 °C, 61 °C, 62 °C, 63 °C, 64 °C, 65 °C, 66 °C, 67 °C, 68 °C, 69 °C, 70 °C, or 75 °C and/or less than about 80 °C, 75 °C, 70 °C, 69 °C, 68 °C, 67 °C, 66 °C, 65 °C, 64 °C, 63 °C, 62 °C, 61 °C, 60 °C, 59 °C, 58 °C, 57 °C, 56 °C, 55 °C, 54 °C, 53 °C, 52 °C, 51 °C, 50 °C, 49 °C, 48 °C, 47 °C, or 46 °C. In other embodiments, the nucleic acids may have the same or similar lengths, for example greater than about 17 nt, 18 nt, 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, 26 nt, 27 nt, 28 nt, 29 nt, or 30 nt and/or less than about 31 nt, 30 nt, 29 nt, 28 nt, 27 nt, 26 nt, 25 nt, 24 nt, 23 nt, 22 nt, 21 nt, 20 nt, 19 nt, or 18 nt. In some embodiments, the nucleic acids may have the same or similar lengths, for example greater than about 1 nt, 2 nt, 3 nt, 4 nt, 5 nt, 6 nt, 7 nt, 8 nt, 9 nt, 10 nt, 15 nt, 20 nt, 25 nt, 30 nt, 35 nt, 40 nt, 45 nt, 50 nt, 55 nt, 60 nt, 65 nt, 70 nt, 75 nt, 80 nt, 85 nt, 90 nt, 95 nt, 100 nt, 150 nt, 200 nt, 250 nt, 300 nt, 350 nt, 400 nt, 450 nt, or 500 nt, and/or less than about 600 nt, 550 nt, 500 nt, 450 nt, 400 nt, 350 nt, 300 nt, 250 nt, 200 nt, 150 nt, 100 nt, 95 nt, 90 nt, 85 nt, 80 nt, 75 nt, 70 nt, 65 nt, 60 nt, 55 nt, 50 nt, 45 nt, 40 nt, 35 nt, 30 nt, 25 nt, 20 nt, 15 nt, 10 nt, 9 nt, 8 nt, 7 nt, 6 nt, 5 nt, 4 nt, 3 nt, or 2 nt.
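As a non-limiting illustration, candidate probes can be screened for similar melting temperature and length as in the following Python sketch. The disclosure does not state how melting temperature is determined; the sketch assumes the Wallace rule, Tm = 2 °C x (A+T) + 4 °C x (G+C), which is only a rough approximation intended for short oligonucleotides. The function names `wallace_tm` and `within_window` are hypothetical, and the default bounds are the outermost temperature and short-probe length values recited above.

```python
def wallace_tm(probe):
    """Approximate melting temperature (deg C) of a short DNA oligo by the Wallace rule."""
    probe = probe.upper()
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    return 2 * at + 4 * gc


def within_window(probes, tm_min=45, tm_max=80, len_min=17, len_max=31):
    """Keep probes whose estimated Tm and length both fall strictly inside the bounds."""
    return [
        p for p in probes
        if tm_min < wallace_tm(p) < tm_max and len_min < len(p) < len_max
    ]


candidates = ["ATGCATGCATGCATGCATGC", "ATAT"]
print(wallace_tm(candidates[0]))
print(within_window(candidates))
```

A nearest-neighbour thermodynamic model would normally be preferred for probes near the upper end of the recited length range; the simpler rule is used here only to keep the example self-contained.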
[0074] In some embodiments, the diagnostic genes and/or the mRNA transcripts transcribed from the diagnostic genes may be referred to as a target. In general, an array can comprise a plurality of nucleotide sequences from a plurality of targets. In various embodiments, each individual nucleic acid sequence is immobilized to a designated, discrete location (i.e., a defined location or assigned position) on the substrate surface. In some embodiments, each nucleic acid sequence is immobilized to a discrete location on an array and each has a sequence that is either specific to, or characteristic of, a particular mRNA sequence, expressed sequence, or transcribed gene. In some embodiments, a given target may be represented at several positions on an array surface. In some embodiments an array can comprise the same nucleotide sequence at more than one position. In some embodiments, a target may be represented by different oligonucleotide sequences on the same array surface.
[0075] A nucleic acid sequence on an array can be specific to, or characteristic of a particular mRNA sequence because it contains a nucleic acid sequence that is identical or homologous to the nucleotide sequence of a transcribed gene or the complement of that mRNA sequence. Such a nucleic acid sequence represents a single mRNA sequence of a single transcribed gene and is able to discriminate the mRNA sequence from the single transcribed gene relative to other mRNA sequences from other transcribed genes.
[0076] In some embodiments, nucleic acid sequences immobilized on an array surface can comprise sequence(s) corresponding to specific transcribed genes. In some embodiments, at least some of the nucleic acid sequences comprise sequences identical or homologous to at least part of a transcribed gene described in FIGS. 5 and 6, and SEQ ID NOS: 1-197.
[0077] In some embodiments, the nucleotides attached to the arrays and/or microarrays can be referred to as probes. A target gene may be represented by one or more probes. In some embodiments, a probe is a single stranded nucleotide attached to a substrate surface of an array or a microarray. In some embodiments, a single probe may reside at one or more positions or spots on an array. In some embodiments, each spot or position on an array may comprise one or more copies of the same probe. The probes may be arranged on the microarray substrate in a single density, or in varying densities. The density of each of the probes can be varied to accommodate certain factors such as, for example, the nature of the test sample, the nature of a label used during hybridization, the type of substrate used, and the like. Techniques capable of producing high density arrays can also be used for this purpose (see, e.g., Fodor (1991) Science 767-773; Johnston (1998) Curr. Biol. 8: R171-R174; Schummer (1997) Biotechniques 23: 1087-1092; Kern (1997) Biotechniques 23: 120-124; and U.S. Patent No. 5,143,854).
[0078] The sequence of the probes can be varied. In various embodiments, the probe sequence can be varied to produce probes that are substantially identical to the nucleic acid sequences described in FIGS. 5 and 6, and SEQ ID NOS: 1-197. In some embodiments, probes hybridize specifically to a nucleotide sequence from a transcribed gene, from which the probe sequence was derived. In some embodiments, the probe may not be identical to the transcribed gene from which it was derived, but retains the ability to hybridize to a nucleic acid sequence from that gene.
[0079] The length, sequence, and complexity of the nucleic acid probes may be varied. In various embodiments, the length, sequence and complexity are varied to provide optimum hybridization and signal production for a given hybridization procedure, and to provide the required resolution among different genes or genomic locations.
[0080] In some embodiments, single stranded nucleotides derived from a tissue sample can hybridize to probes on an array. Hybridization, as defined herein, can refer to binding of two single stranded nucleic acids via complementary base pairing. In some embodiments the two nucleic acid sequences may be DNA sequences, RNA sequences, or a combination of both.
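By way of example only, complementary base pairing between a probe and a sample-derived strand can be modeled as in the following Python sketch. It considers DNA sequences only and requires perfect complementarity over the full probe length, which is a simplification, since real hybridization tolerates some mismatches depending on conditions. The function names `reverse_complement` and `can_hybridize` are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def can_hybridize(probe, target):
    """True when the target contains a stretch perfectly complementary to the probe."""
    return reverse_complement(probe) in target.upper()


print(reverse_complement("ATGC"))
print(can_hybridize("GCAT", "TTATGCTT"))
```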
Samples
[0081] In some embodiments, samples are obtained from various tissues to be analyzed for gene expression of one or more diagnostic or reference genes or transcribed sequences. Samples can refer to both test samples (i.e. samples from subjects/patients with lung disease), or reference samples (i.e. samples from subjects/patients without apparent lung disease).
[0082] A reference sample is a tissue sample that serves as a basis for comparison to a test sample and thus may be used to compare expression levels. A reference sample therefore represents a non-diseased state. In some embodiments, the reference sample is obtained from lung tissue of a healthy patient. In some embodiments, the reference sample is obtained from lung tissue of a patient who has a disease that is not an IIP, or an IPF-related disease. In some embodiments, the reference sample is analyzed with, and its expression profile compared to, a sample from a patient having lung disease.
[0083] In some embodiments, lung tissue may be obtained from a patient having IPF. In some embodiments, RNA is purified from the lung tissue of a patient with IPF to create a test sample. In some embodiments, this purified RNA comprises expressed sequences of cells within the lung tissue. In some embodiments, this test sample can comprise a full complement of expressed mRNA molecules from the tissue sample. In various aspects, the present disclosure is directed to the detection of the expression level of certain differentially regulated proteins and/or mRNA molecules from diagnostic genes in one or more test samples.
[0084] In some embodiments, RNA purified from a sample is reverse transcribed to produce complementary DNA, also known as cDNA, to create a test sample. A cDNA sequence generally refers to a "complementary DNA" sequence of an expressed mRNA. In some embodiments, the cDNA can be labeled during reverse transcription. cDNAs may, in some embodiments, be created by PCR (Polymerase Chain Reaction) amplification of a library of expressed sequences of mRNA, for example an mRNA library purified from a tissue sample. The sequence of the complementary DNA is able to hybridize to the sequence of an expressed gene, or mRNA. In some embodiments, labeled cDNA is hybridized to probes on an array. The intensity of label at a given spot on an array correlates with the expression level of the transcribed gene represented by that probe.
[0085] In some embodiments, RNA (either total RNA or poly A+RNA) can be isolated from lung cells or lung tissues of interest, including for example lung tissue from a diseased patient and/or lung tissue from a non-diseased patient, and is reverse transcribed to yield cDNA. In some embodiments, the DNA or RNA can be labeled during reverse transcription by incorporating a labeled nucleotide in the reaction mixture. Although various labels can be used, most commonly the nucleotide is conjugated with the fluorescent dyes Cy3 or Cy5. In some embodiments, two different samples can be analyzed simultaneously, for example by labeling one with Cy5-dUTP and the other with Cy3-dUTP, and hybridizing similar amounts of labeled DNA or RNA from each sample to the array. In a two-sample array experiment, the primary data (obtained by scanning the array using a detector capable of quantitatively detecting fluorescence intensity) can be a ratio of fluorescence intensity (Cy3/Cy5). These ratios represent the relative concentrations of cDNA molecules that hybridized to the cDNAs represented on the array and thus reflect the relative expression levels of the mRNA corresponding to each cDNA/gene represented on the array.
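As a minimal illustration of the two-channel ratio described above, the following Python sketch converts Cy3 and Cy5 spot intensities into log2 ratios. The function name, the probe names, the intensity values and the use of a base-2 logarithm are assumptions made for illustration only and are not taken from the disclosure.

```python
import math

def log2_ratios(cy3, cy5):
    """Return log2(Cy3/Cy5) for each probe measured in both channels."""
    ratios = {}
    for probe, i3 in cy3.items():
        i5 = cy5.get(probe)
        # A ratio is undefined for missing or non-positive signal, so skip those spots.
        if i5 is None or i3 <= 0 or i5 <= 0:
            continue
        ratios[probe] = math.log2(i3 / i5)
    return ratios

# Hypothetical background-corrected intensities (arbitrary units).
cy3 = {"probe_a": 800.0, "probe_b": 200.0, "probe_c": 500.0}
cy5 = {"probe_a": 200.0, "probe_b": 800.0, "probe_c": 500.0}
result = log2_ratios(cy3, cy5)
```

On a log2 scale, equal increases and decreases are symmetric about zero, which is one reason such ratios are often log-transformed before comparison.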
Differential Expression
[0086] In various embodiments, a subject having IPF may exhibit differential expression of one or more genes having the nucleic acid sequences of genes in FIGS. 5 and 6, and SEQ ID NOS: 1-197. A nucleic acid sequence may exhibit differential expression at the RNA level if its RNA transcript varies in abundance relative to a reference transcript. A gene exhibits differential expression at the protein level if a polypeptide encoded by the gene varies in abundance between different samples in a sample set. In the context of a microarray experiment, differential expression generally refers to differential expression at the RNA level.
[0087] In some embodiments, the expression level or transcript abundance of a diagnostic gene in a sample from a patient with IPF subtype II may be greater than the transcript abundance of that same gene in a subject or patient without IPF. In some embodiments, the expression level or transcript abundance of a diagnostic gene in a sample from a patient with IPF subtype II may be greater than the transcript abundance of that same gene in a subject or patient with IPF subtype I. In some embodiments, the expression level or transcript abundance of a diagnostic gene in a sample from a patient with IPF subtype I may be less than the transcript abundance of that same gene in a subject or patient without IPF. In some embodiments, the expression level or transcript abundance of a diagnostic gene in a sample from a patient with IPF subtype I may be less than the transcript abundance of that same gene in a subject or patient with IPF subtype II.
[0088] In some embodiments, in order to determine the subtype of a patient with IPF, the expression level of one or more expressed sequences in a sample from that patient may be determined. In some aspects, the expression level of one or more diagnostic genes is determined relative to the amount of total RNA. In some aspects, the expression level of one or more diagnostic genes is determined relative to the expression level of one or more reference genes in that same patient or from a sample from a patient known to be healthy. In some aspects, the expression level of one or more diagnostic genes is determined relative to the expression level of that same expressed sequence in a sample from a patient known to be healthy. In some embodiments, in order to determine the subtype of a patient with IPF, the expression level of one or more expressed sequences in a sample from that patient may be compared to the expression level of that same expressed sequence in a sample from a patient known to have IPF subtype I or subtype II.
[0089] In some embodiments, a patient may be diagnosed with IPF subtype I or II by comparing the expression level or transcript abundance of one or more expressed sequences of that patient to a patient that is healthy. In some embodiments, a patient may be diagnosed with IPF subtype I or II by comparing the expression level or transcript abundance of one or more expressed sequences of that patient to a patient known to have IPF subtype I or subtype II. In some embodiments the expression levels of one or more cilium-associated genes will be greater in a patient with IPF subtype II than a healthy patient. In some embodiments the expression levels of one or more cilium-associated genes will be greater in a patient with IPF subtype II than a patient with IPF subtype I. In some embodiments the expression levels of one or more cilium-associated genes will be less in a patient with IPF subtype I than a healthy patient. In some embodiments the expression levels of one or more cilium-associated genes will be less in a patient with IPF subtype I than a patient with IPF subtype II.
[0090] Differential expression, in some embodiments, may correlate with diagnostic information. Diagnostic information, or information for use in diagnosis, can be information that is useful in determining whether a patient has a UIP and/or in classifying IPF into a phenotypic category of IPF subtype-I or IPF subtype-II, or any category having significance with regard to the prognosis of or likely response to treatment (either treatment in general or any particular treatment) of IPF. Similarly, diagnosis refers to providing any type of diagnostic information, including, but not limited to, whether a subject is likely to have an indication associated with IPF, information related to the nature or classification of IPF, information related to prognosis and/or information useful in selecting an appropriate treatment for IPF. Selection of treatment for IPF may include the choice of a particular chemotherapeutic agent or other treatment modality such as surgery, lung transplantation, a choice about whether to withhold or deliver therapy, etc.
[0091] In various aspects, the present disclosure encompasses the realization that genes that are differentially expressed are of use in classifying IPF subtypes. In some embodiments, the differentially expressed genes can be responsible for the different phenotypic characteristics and/or indicators of clinical outcomes of IPF subtypes. The present disclosure identifies such genes. In general, when a gene is differentially expressed, the transcript abundance of that gene varies between different samples, e.g., between different IPF samples and/or between normal and IPF subtypes. In some embodiments, the transcript level of a differentially expressed gene varies by at least about 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, or 9-fold from its average abundance in a given sample. For example, a given gene may be expressed more than about 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, or 10-fold, and/or less than about 10-fold, 9-fold, 8-fold, 7-fold, 6-fold, 5-fold, 4-fold, 3-fold, or 2-fold. In some embodiments, the variation may be less than 2-fold, for example expression may be greater than about 1.2-fold, 1.3-fold, 1.4-fold, 1.5-fold, 1.6-fold, 1.7-fold, 1.8-fold, or 1.9-fold, and/or less than about 2.0-fold, 1.9-fold, 1.8-fold, 1.7-fold, 1.6-fold, 1.5-fold, 1.4-fold, or 1.3-fold. In some embodiments, more than one gene can be differentially expressed, and the levels of differential expression may be the same or different. For example, gene 1 may be 4-fold more abundant and gene 2 may be 3-fold more abundant in an IPF sample compared to a non-IPF lung sample. In some embodiments, the differential expression of a gene may be the same in two IPF samples or it may be different. In some embodiments, the differential expression of an expressed gene in lung samples from different subjects, or patients having the same IPF subtypes, may be the same, similar, or different.
In some embodiments, an array comprising nucleotide sequences as in FIGS. 5 and 6, and SEQ ID NOS: 1-197, can be referred to as a gene expression system. Such systems can include a system, device or means to detect gene expression, diagnostic agents, candidate libraries, and oligonucleotides, oligonucleotide sets, or probe sets.
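The fold-change language of paragraph [0091] can be illustrated with a short Python sketch. The signed fold-change convention, the function names, the default 2-fold cutoff and the abundance values are assumptions chosen for illustration; the disclosure itself recites a range of possible thresholds rather than a single one.

```python
def fold_change(test_abundance, reference_abundance):
    """Signed fold change of test relative to reference.

    Positive values mean the transcript is more abundant in the test sample;
    negative values mean it is less abundant (e.g. -4.0 is a 4-fold decrease).
    """
    if test_abundance <= 0 or reference_abundance <= 0:
        raise ValueError("abundances must be positive")
    ratio = test_abundance / reference_abundance
    return ratio if ratio >= 1 else -1.0 / ratio

def is_differential(test_abundance, reference_abundance, min_fold=2.0):
    """True when the magnitude of the fold change meets the chosen cutoff."""
    return abs(fold_change(test_abundance, reference_abundance)) >= min_fold

# Hypothetical abundances echoing the "gene 1 is 4-fold, gene 2 is 3-fold" example.
gene1 = fold_change(400.0, 100.0)
gene2 = fold_change(300.0, 100.0)
```

Passing a smaller `min_fold` (for example 1.5) would correspond to the embodiments in which the variation is less than 2-fold.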
[0092] In some aspects, the use of about 300 ng of processed total RNA prepared from a lung sample from a patient with IPF subtype II may result in a diagnostic gene having an expression level measured as a mean log(2) intensity on an Affymetrix gene chip of greater than 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8.0, 8.1, or 8.2 and/or less than about 8.3, 8.2, 8.1, 8.0, 7.9, 7.8, 7.7, 7.6, 7.5, 7.4, 7.3, 7.2, 7.1, 7.0, 6.9, 6.8, 6.7, 6.6, 6.5, 6.4, 6.3, 6.2, 6.1, 6.0, 5.9, 5.8, 5.7, 5.6, 5.5, 5.4, 5.3, 5.2, 5.1, 5.0, 4.9, 4.8, 4.7, 4.6, 4.5, 4.4, 4.3, 4.2, 4.1, 4.0, 3.9, 3.8, 3.7, 3.6, 3.5, 3.4, 3.3, 3.2, 3.1, or 3.0. In some aspects, a diagnostic gene in a lung sample from a patient with IPF subtype II may have mean log(2) intensity on an Affymetrix gene chip of between about 5 and about 9. In some aspects, a diagnostic gene in a lung sample from a patient with IPF subtype II may have mean log(2) intensity on an Affymetrix gene chip of between 5.9 and 8.3. In some aspects, a diagnostic gene in a lung sample from a patient with IPF subtype II may have mean log(2) intensity on an Affymetrix gene chip of between 5.99 (wherein the standard deviation is about 0.57 and the standard error of the mean is about 0.08) and 8.23 (wherein the standard deviation is about 1.23 and the standard error of the mean is about 0.15).
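For readers who wish to see how a mean log(2) intensity, standard deviation and standard error of the mean relate to one another, the following Python sketch computes all three from a set of raw intensities. The function name and the input values are hypothetical, and the use of the sample (n-1) standard deviation is an assumption, since the disclosure does not state which estimator was used.

```python
import math
import statistics

def summarize_log2(intensities):
    """Return (mean, standard deviation, standard error) of log2 intensities."""
    logs = [math.log2(x) for x in intensities]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)          # sample standard deviation (n-1), an assumption
    sem = sd / math.sqrt(len(logs))      # standard error of the mean
    return mean, sd, sem

# Hypothetical raw probe-set intensities for one gene across four samples.
mean, sd, sem = summarize_log2([64.0, 128.0, 256.0, 512.0])
```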
[0093] In various aspects, the disclosure relates to diagnostic genes, diagnostic expressed sequences and diagnostic oligonucleotides comprising sequences of the genes and expressed sequences in FIGS. 5 and 6, SEQ ID NOS: 1-197, and of Clusters A and B. The genes and expressed sequences in FIGS. 5 and 6, SEQ ID NOS: 1-197, and Clusters A and B represent nucleotide sequences which, when differentially expressed, can correlate with one of two IPF subtypes, as disclosed herein. In some embodiments, identification of the IPF subtype to which a sample belongs can require analysis of only one gene or expressed sequence in FIGS. 5 and 6, Clusters A and B, and SEQ ID NOS: 1-197. In some embodiments, identification of the IPF subtype to which a sample belongs can require analysis of a plurality of the genes or expressed sequences in FIGS. 5 and 6, Clusters A and B, and SEQ ID NOS: 1-197. In some embodiments, identification of the IPF subtype to which a sample belongs can require analysis of all of the genes or expressed sequences in FIGS. 5 and 6, Clusters A and B, and SEQ ID NOS: 1-197.
[0094] In some embodiments, the IPF subtype can be identified by any means capable of detecting expression levels of RNA and/or the presence of specific protein products coded for by those genes or expressed sequences.
[0095] In some embodiments, gene expression profiling of tissue samples from a number of IPF subjects/patients can allow identification of novel molecular subcategories or subtypes. These subtypes and subcategories can allow development of novel methods to diagnose and classify these complex diseases. In some embodiments, identification of subtypes and subcategories can aid in more predictive diagnosis or identification of clinically meaningful endpoints.
Determination of mRNA Expression Levels
[0096] In some embodiments, the expression level of the mRNA sequences obtained from a patient having IPF and/or the expression level of mRNA sequences of a healthy patient or a patient with a known IPF subtype may be measured via array-based comparative genomic hybridization. Array comparative genomic hybridization (aCGH) is a technique that is used to detect copy number variations of nucleic acids at a higher level of resolution than chromosome-based comparative genomic hybridization. In aCGH, nucleic acids from a test sample and nucleic acids from a reference sample are labeled differentially. The test sample and the reference sample are then hybridized to an array comprising a plurality of probes, which are derived from sequences of interest. The differential labeling is then used to visualize the hybridized nucleic acids from the test and reference samples. The ratio of the signal intensity of the test sample to that of the reference sample is then calculated, to measure the copy number changes between the test sample and the reference sample. The difference in the signal ratio determines whether the total copy numbers of the nucleic acids in the test sample are increased or decreased, as compared to the reference sample. The test sample and the reference sample may be hybridized to the array separately or they may be mixed together and hybridized simultaneously. Exemplary methods of performing aCGH can be found, for example, in U.S. Patent Nos. 5,635,351; 5,665,549; 5,721,098; 5,830,645; 5,856,097; 5,965,362; 5,976,790; 6,159,685; 6,197,501; and 6,335,167; European Patent Nos. EP 1 134 293 and EP 1 026 260; van Beers et al., Brit. J. Cancer (2006), 20; Joosse et al., BMC Cancer (2007), 7:43; Pinkel et al., Nat. Genet. (1998), 20: 207-211; Pollack et al., Nat. Genet. (1999), 23: 41-46; and Cooper, Breast Cancer Res. (2001), 3: 158-175.
[0097] Information relating to the expression level of the mRNA sequences present in a sample can include, for example, an increase in expression level in one or more mRNA molecules, a decrease in expression level in one or more mRNA molecules, and/or no change in the expression level of one or more mRNA molecules. In some aspects, this information is obtained by analyzing the difference in signal intensity between a test sample and a reference or control sample at one or more corresponding locations on the array representing one or more nucleic acid sequences of interest. The analysis can be performed using any of a variety of methods, means and variations thereof for carrying out array-based comparative genomic hybridization. In some aspects, information relating to the expression level of mRNA sequences is obtained by determining the abundance of the expressed sequence(s) relative to the amount of input RNA or cDNA.
[0098] In various aspects, the test sample and the reference sample are mRNA. The mRNA molecules comprising the test samples and the reference samples may be obtained by any suitable method of nucleic acid isolation and/or extraction. Methods of mRNA extraction are well known in the art and several kits for the extraction and purification of mRNA from tissue samples are commercially available from, e.g., Clontech (Mountain View, CA), Qiagen (Valencia, CA) and Life Technologies/Invitrogen (Carlsbad, CA), among others.
[0099] The test samples and the reference samples may be differentially labeled with any detectable agents or moieties. In various embodiments, the detectable agents or moieties are selected such that they generate signals that can be readily measured and such that the intensity of the signals is proportional to the amount of labeled nucleic acids present in the sample. In various embodiments, the detectable agents or moieties are selected such that they generate localized signals, thereby allowing resolution of the signals from each spot on an array.
[00100] Methods for labeling nucleic acids are well-known in the art. For exemplary reviews of labeling protocols, label detection techniques and recent developments in the field, see: Kricka, Ann. Clin. Biochem. (2002), 39: 114-129; van Gijlswijk et al., Expert Rev. Mol. Diagn. (2001), 1: 81-91; and Joos et al., J. Biotechnol. (1994), 35: 135-153. Standard nucleic acid labeling methods include: incorporation of radioactive agents, direct attachment of fluorescent dyes or of enzymes, chemical modification of nucleic acids to make them detectable immunochemically or by other affinity reactions, and enzyme-mediated labeling methods including, without limitation, random priming, nick translation, PCR and tailing with terminal transferase. Other suitable labeling methods include psoralen-biotin, photoreactive azido derivatives, and DNA alkylating agents. In various embodiments, test sample and reference sample nucleic acids are labelled by Universal Linkage System, which is based on the reaction of monoreactive cisplatin derivatives with the N7 position of guanine moieties in DNA (see, e.g., Heetebrij et al., Cytogenet. Cell. Genet. (1999), 87: 47-52).
[00101] Any of a wide variety of detectable agents or moieties can be used to label test and/or reference samples. Suitable detectable agents or moieties include, but are not limited to: various ligands; radionuclides such as, for example, 32P, 35S, 3H, 14C, 125I, 131I, and others; fluorescent dyes; chemiluminescent agents such as, for example, acridinium esters, stabilized dioxetanes, and others; microparticles such as, for example, quantum dots, nanocrystals, phosphors and others; enzymes such as, for example, those used in an ELISA, horseradish peroxidase, beta-galactosidase, luciferase, alkaline phosphatase and others; colorimetric labels such as, for example, dyes, colloidal gold and others; magnetic labels such as, for example, Dynabeads™; and biotin, digoxigenin or other haptens and proteins for which antisera or monoclonal antibodies are available.
[00102] In some embodiments, the test samples and/or the reference samples are labelled with fluorescent dyes. Suitable fluorescent dyes include, without limitation, Cy-3, Cy-5, Texas red, FITC, Spectrum Red, Spectrum Green, phycoerythrin, rhodamine, and fluorescein, as well as equivalents, analogues and/or derivatives thereof. In some embodiments, the fluorescent dyes selected display a high molar absorption coefficient, high fluorescence quantum yield, and photostability. In some embodiments, the fluorescent dyes exhibit absorption and emission wavelengths in the visible spectrum (i.e., between 400 nm and 750 nm) rather than in the ultraviolet range of the spectrum (i.e., lower than 400 nm). In some embodiments, the fluorescent dyes are Cy-3 (3-N,N'-diethyltetramethylindo-dicarbocyanine) and Cy-5 (5-N,N'-diethyltetramethylindo-dicarbocyanine). Cy-3 and Cy-5 form a matched pair of fluorescent labels that are compatible with most fluorescence detection systems for array-based instruments. In some embodiments, the fluorescent dyes are Spectrum Red and Spectrum Green.
[00103] Exemplary hybridization and wash protocols are described, for example, in Sambrook et al. (2001), supra; Tijssen (1993), supra; and Anderson (Ed.), "Nucleic Acid Hybridization" (1999), Springer Verlag: New York, N.Y. In some embodiments, the hybridization protocols used for aCGH are those of Pinkel et al., Nature Genetics (1998), 20:207-211. In some embodiments, the hybridization protocols used for aCGH are those of Kallioniemi, Proc. Natl. Acad. Sci. USA (1992), 89:5321-5325.
[00104] Methods of optimizing hybridization conditions are well known in the art (see, e.g., Tijssen, (1993), supra). To create competitive hybridization conditions, the array may be contacted simultaneously with differentially labeled mRNA sequences of the test sample and the reference sample. This may be done by, for example, mixing the labeled test sample and the labeled reference sample together to form a hybridization mixture, and contacting the array with the mixture.
[00105] In some embodiments, the specificity of hybridization may be enhanced by inhibiting repetitive sequences. In some embodiments, repetitive sequences (e.g., Alu sequences, L1 sequences, satellite sequences, MRE sequences, simple homo-nucleotide tracts, and/or simple oligonucleotide tracts) present in the nucleic acids of the test sample, reference sample and/or probes immobilized on the array are either removed, or their hybridization capacity is disabled. Removing repetitive sequences or disabling their hybridization capacity can be accomplished using any of a variety of well-known methods. These methods include, but are not limited to, removing repetitive sequences by hybridization to specific nucleic acid sequences immobilized to a solid support (see, e.g., Brison et al., Mol. Cell. Biol. (1982), 2: 578-587); suppressing the production of repetitive sequences by PCR amplification using adequately designed PCR primers; inhibiting the hybridization capacity of highly repeated sequences by self-reassociation (see, e.g., Britten et al., Methods of Enzymology (1974), 29: 363-418); or removing repetitive sequences using hydroxyapatite which is commercially available from a number of sources including, for example, Bio-Rad Laboratories, Richmond, VA. In some embodiments, the hybridization capacity of highly repeated sequences in a test sample and/or in a reference sample is competitively inhibited by including, in the hybridization mixture, unlabelled blocking nucleic acids. The unlabelled blocking nucleic acids are therefore mixed with the hybridization mixture, and thus with a test sample and a reference sample, before the mixture is contacted with an array. The unlabelled blocking nucleic acids act as a competitor for the highly repeated sequences and bind to them before the hybridization mixture is contacted with an array. Therefore, the unlabelled blocking nucleic acids prevent labelled repetitive sequences from binding to any highly repetitive sequences of the nucleic acid probes, thus decreasing the amount of background signal present in a given hybridization. In some embodiments, the unlabelled blocking nucleic acids are Human Cot-1 DNA. Human Cot-1 DNA is commercially available from a number of sources including, for example, Gibco/BRL Life Technologies (Gaithersburg, MD).
[00106] Once hybridization is complete, the ratio of the signal intensity of the test sample as compared to the signal intensity of the reference sample is calculated. This calculation quantifies the differential level of expression of the mRNA molecules of the test sample, as compared to the reference sample, if any. In some embodiments, this calculation is carried out quantitatively or semi-quantitatively. In certain embodiments, it is not necessary to determine the exact number associated with differential expression of the mRNA molecules comprising the test sample and the reference sample, as detection of a significant increase or decrease in expression level from the expression level in the reference sample is sufficient. Therefore, in several embodiments the quantification of the expression levels of the mRNA molecules of a test sample comprises an estimation of the level of expression, as a semi-quantitative or relative measure, which usually suffices to predict the IPF subtype a patient has and thus prospectively direct the determination of therapy for that patient.
[00107] Quantitative techniques may be used to determine the expression level of the mRNA molecules present in a test sample and/or in a reference sample. Several quantitative and semi-quantitative techniques to determine expression levels exist including, for example, semi-quantitative PCR analysis or quantitative real-time PCR. The Polymerase Chain Reaction (PCR) per se is not a quantitative technique; however, PCR-based methods have been developed that are quantitative or semi-quantitative in that they give a reasonable estimate of original copy numbers of nucleic acids present in a tissue sample (i.e., expression level of mRNA), within certain limits. Examples of such PCR techniques include, for example, quantitative PCR and quantitative real-time PCR (also known as RT-PCR, RQ-PCR, QRT-PCR or RTQ-PCR). In addition, many techniques exist that give estimates of relative copy numbers, as calculated relative to a reference. Such techniques include many array-based techniques. Absolute copy number estimates may be obtained by in situ hybridization techniques such as, for example, fluorescence in situ hybridization or chromogenic in situ hybridization.
[00108] Fluorescence in situ hybridization permits the analysis of the expression level of individual mRNA molecules and can be used to study the expression level of individual mRNA molecules across tissue samples obtained from different donor sources (see, e.g., Pinkel et al., Proc. Natl. Acad. Sci. U.S.A. (1988), 85, 9138-42). Comparative genomic hybridization can also be used to probe for mRNA expression levels (see, e.g., Kallioniemi et al., Science (1992), 258: 818-21; and Houldsworth et al., Am. J. Pathol. (1994), 145: 1253-60).
[00109] The expression level of mRNA molecules of interest may also be determined using quantitative PCR techniques such as real-time PCR (see, e.g., Suzuki et al., Cancer Res. (2000), 60:5405-9). For example, quantitative microsatellite analysis can be performed for rapid measurement of relative mRNA sequence copy numbers. In quantitative microsatellite analysis, the copy number of a test sample relative to a reference sample is assessed using quantitative, real-time PCR amplification of loci carrying simple sequence repeats. Simple sequence repeats are used because of the large numbers that have been precisely mapped in numerous organisms. Exemplary protocols for quantitative PCR are provided in Innis et al., PCR Protocols, A Guide to Methods and Applications (1990), Academic Press, Inc. N.Y. Semi-quantitative techniques that may be used to determine specific copy numbers include, for example, multiplex ligation-dependent probe amplification (see, e.g., Schouten et al. Nucleic Acids Res. (2002), 30(12):e57; and Sellner et al., Human Mutation (2004), 23(5):413-419) and multiplex amplification and probe hybridization (see, e.g., Sellner et al. (2004), supra).
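The disclosure does not specify how real-time PCR data are converted into a relative expression level. One commonly used approach, shown here only as an illustrative sketch, is the comparative Ct (2^-ΔΔCt) calculation, which assumes roughly 100% amplification efficiency for both the target and the reference gene. The function name, parameter names and Ct values below are hypothetical.

```python
def relative_expression(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression of a target gene by the comparative Ct method.

    Each Ct is a threshold cycle: target or reference gene, measured in the
    test sample or in the control (reference) sample.
    Assumes ~100% amplification efficiency (one doubling per cycle).
    """
    delta_test = ct_target_test - ct_ref_test    # normalize test sample
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl    # normalize control sample
    delta_delta = delta_test - delta_ctrl
    return 2.0 ** (-delta_delta)

# Hypothetical Ct values: the target crosses threshold 2 cycles earlier in the test sample.
up = relative_expression(22.0, 18.0, 24.0, 18.0)
down = relative_expression(25.0, 18.0, 24.0, 18.0)
```

A result above 1 indicates higher expression in the test sample than in the control, and a result below 1 indicates lower expression.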
Differential Expression of Protein
[00110] As shown in Example 1 of this disclosure, the present inventors have revealed that IPF exists in two distinct molecular phenotypes or subtypes, which have previously presented clinically the same. Additionally, the present inventors have shown that the two subtypes differ in their expression of cilium genes, with IPF subtype-I displaying decreased levels of cilium gene expression as compared to normal, non-diseased lung tissue and/or lung tissue from IPF subtype-II samples. Additionally, the present inventors have revealed that patients with IPF subtype-I have a significantly longer survival rate than patients with IPF subtype-II, making the determination of subtype in IPF patients ideally suited as a predictive measure of survival in IPF patients.
[00111] In various aspects, the present disclosure is based on the discovery that determining the expression levels of certain genes, and the proteins translated therefrom, can be used to identify IPF subtype in patients with IPF. Such genes, and their corresponding protein products, can also be used to predict the life expectancy of IPF patients and whether such patients will benefit from medical treatment.
[00112] In some embodiments, expressed protein signatures are created. Protein signatures may comprise the identity of a plurality of proteins. In some embodiments, protein signatures can be created from a tissue sample of a diseased patient, thus creating a test sample. In some embodiments, protein signatures can be created from a tissue sample of a non-diseased or healthy patient, thus creating a reference sample. In some embodiments, a protein signature reference sample can be obtained from a tissue sample of a patient known to have IPF subtype-I. In some embodiments, a protein signature reference sample can be obtained from a tissue sample of a patient known to have IPF subtype-II.
[00113] In some embodiments, protein signatures can be created from any one or more of the translation products of the genes selected from the group consisting of: ABCA13 (Gene Symbol), NM_152701 (Accession Number); ADAM28, NM_014265; ADH7, NM_000673; AGR2, NM_006408; AGR3, NM_176813; ALOX15, NM_001140; ANKRD18B, ENST00000290943; C10orf81, NM_001193434; C12orf75, NM_001145199; C1orf110, NM_178550; C20orf114, NM_033197; C7orf63, NM_001039706; CAPN13, NM_144575; CD24, NM_013230; CDH3, NM_001793; CHST9, NM_031422; CKMT1A, NM_001015001; CKMT1A, NM_001015001; CLCA2, NM_006536; CLDN1, NM_021101; CLIC6, NM_053277; CNTN3, NM_020872; COL17A1, NM_000494; CP, NM_000096; CRISPLD1, NM_031461; CXCL13, NM_006419; CYP24A1, NM_000782; DNAH5, NM_001369; DNAJA4, NM_018602; DSC3, NM_024423; FAT2, NM_001447; FGF14, NM_175929; GOLM1, NM_016548; GPR110, NM_153840; GSTA1, NM_145740; HHLA2, NM_007072; HSPA4L, NM_014278; ITGB8, NM_002214; KIAA1324, NM_020775; KIAA1377, NM_020802; KLHL13, NM_033495; KRT15, NM_002275; KRT17, NM_000422; KRT5, NM_000424; KRT6C, NM_173086; LCN2, NM_005564; MAP1A, NM_002373; MMP1, NM_002421; MMP13, NM_002427; MMP7, NM_002423; MSMB, NM_002443; MUC16, NM_024690; MUC4, NM_018406; MUC5B, NM_002458; PFN2, NM_053024; PIP, NM_002652; PLEKHG7, NM_001004330; PLUNC, NM_130852; PROM1, NM_006017; PRSS12, NM_003619; SCGB1A1, NM_003357; SCGB3A1, NM_052863; SCGB3A2, NM_054023; SERPINB3, NM_006919; SERPINB5, NM_002639; SIX1, NM_005982; SIX4, NM_017420; SLC27A2, NM_003645; SLC44A4, NM_025257; SLC44A4, NM_025257; SLC44A4, NM_025257; SLITRK6, NM_032229; SOX2, NM_003106; SPP1, NM_001040058; ST6GALNAC1, NM_018414; TMEM45A, NM_018004; TMPRSS4, NM_019894; TP63, NM_003722; TPPP3, NM_016140; TRIM2, NM_015271; TRIM29, NM_012101; UGT1A1, NM_000463; VTCN1, NM_024626, and combinations thereof.
[00114] In some embodiments, protein signatures can be created from any one or more of the translation products of the genes selected from the group consisting of: AGBL2 (Gene Symbol), NM_024783 (Accession Number); ARMC3, NM_173081; ARMC4, NM_018076; C10orf107, NM_173554; C10orf79, NM_025145; C11orf70, NM_032930; C11orf88, NM_207430; C12orf55, ENST00000298953; C12orf63, ENST00000342887; C13orf30, NM_182508; C1orf129, NM_025063; C1orf173, NM_001002912; C1orf192, NM_001013625; C1orf194, NM_001122961; C1orf87, NM_152377; C20orf26, NM_015585; C20orf85, NM_178456; C2orf39, NM_145038; C2orf77, NM_001085447; C3orf15, NM_033364; C6, NM_001115131; C6orf103, NM_024694; C6orf118, NM_144980; C6orf165, NM_001031743; C6orf97, NM_025059; C9orf135, NM_001010940; CAPS, NM_004058; CAPSL, NM_144647; CASC1, NM_018272; CCDC11, NM_145020; CCDC113, NM_014157; CCDC146, NM_020879; CCDC39, NM_181426; CCDC60, NM_178499; CDHR3, NM_152750; CERKL, NM_201548; CXorf22, NM_152632; CXorf59, NM_173695; DNAH10, NM_207437; DNAH10, NM_207437; DNAH11, NM_003777; DNAH12, NM_178504; DNAH12, NM_178504; DNAH12, NM_198564; DNAH3, NM_017539; DNAH5, NM_001369; DNAH6, NM_001370; DNAH6, NM_001370; DNAH6, NM_001370; DNAH6, NM_001370; DNAH7, NM_018897; DNAH9, NM_001372; DNAI1, NM_012144; DNAJA4, NM_018602; DPY19L2P2, NR_027768; DTHD1, NM_001136536; DZIP3, NM_014648; EFCAB1, NM_024593; EFHB, NM_144715; EFHC1, NR_033327; EFHC2, NM_025184; ENKUR, NM_145010; FAM154B, AK304339; FAM183A, NM_001101376; FAM81B, NM_152548; FANK1, NM_145235; HYDIN, NM_032821; HYDIN, NM_032821; HYDIN, NM_032821; HYDIN, NM_032821; IL5RA, NM_000564; IQUB, NM_178827; KCNRG, NM_173605; LOC646851, NM_001013647; LRRC46, NM_033413; LRRC48, NM_001130090; LRRC50, NM_178452; LRRIQ1, NM_032165; MDH1B, NM_001039845; MNS1, NM_018365; MORN5, NM_198469; MS4A8B, NM_031457; NEK10, NM_199347; NEK11, NM_024800; NEK5, NM_199289; NME5, NM_003551; PACRG, NM_152410; PIH1D2, NM_138789; PLEKHG7, NM_001004330; PTRH1, ENST00000419060; RGS22, NM_015668; RINT1, NM_021930; ROPN1L, NM_031916; RP1, NM_006269; RPGRIP1L, NM_015272; RSPH1, NM_080860; RSPH10B, NM_173565; RSPH10B, NM_173565; RSPH4A, NM_001010892; SERPINI2, NM_006217; SNTN, NM_001080537; SPA17, NM_017425; SPAG17, NM_206996; SPAG6, NM_012443; SPATA17, NM_138796; SPATA18, NM_145263; STK33, NM_030906; STOML3, NM_145286; STOX1, NM_152709; TEKT1, NM_053285; TEX9, NM_198524; TMEM212, NM_001164436; TMEM232, NM_001039763; TSGA10, NM_025244; TSPAN1, NM_005727; TTC18, NM_145170; TTC25, NM_031421; UBXN10, NM_152376; VWA3A, NM_173615; VWA3B, NM_144992; WDR16, NM_145054; WDR49, NM_178824; WDR63, NM_145172; WDR65, NR_030778; WDR66, NM_144668; WDR69, NM_178821; WDR78, NM_024763; YSK4, NM_025052; ZBBX, NM_024687, and combinations thereof.
Determination of Protein Expression Levels
[00115] The expression levels of the translation products of the differentially expressed genes comprising the protein signatures of the present disclosure may be readily determined. The expression levels of reference samples may be compared to the expression levels of a test sample of proteins obtained from a patient. Therefore, the expression levels of the proteins comprising any of the protein signatures disclosed herein (reference samples) can be compared to the expression level of the same proteins obtained from a tissue sample from a patient (test sample). In some aspects, the reference protein signature may comprise expressed sequences whose protein levels are not changed. Thus, a reference signature may be determined from the same sample as the test signature.
[00116] In some embodiments, a patient having similar protein expression levels as compared to the reference sample is identified as having IPF subtype-I. In some embodiments, a comparison of the expression level of a test sample of proteins obtained from a patient with the expression level of a reference sample protein signature disclosed herein identifies the patient as having IPF subtype-I. In some embodiments, a decrease in the expression level of a test sample of proteins obtained from a patient as compared to the expression level of a reference sample protein signature disclosed herein identifies the patient as having IPF subtype-I. In some embodiments, an increase in the expression level of a test sample of proteins obtained from a patient as compared to the expression level of a reference sample protein signature disclosed herein identifies the patient as having IPF subtype-II. In some embodiments, the reference sample comprises proteins obtained from a patient known to have IPF subtype-I and an increase in the expression level of a test sample of proteins obtained from a patient as compared to the expression level of the reference sample identifies the patient as having IPF subtype-II. In some embodiments, the reference sample comprises proteins obtained from a patient known to have IPF subtype-II and a decrease in the expression level of a test sample of proteins obtained from a patient as compared to the expression level of the reference sample identifies the patient as having IPF subtype-I. In some embodiments, the reference sample comprises proteins obtained from a non-diseased patient and a decrease in the expression level of a test sample of proteins obtained from a patient as compared to the expression level of the reference sample identifies the patient as having IPF subtype-I. In some embodiments, the reference sample comprises proteins obtained from a non-diseased patient and an increase in the expression level of a test sample of proteins obtained from a patient as compared to the expression level of the reference sample identifies the patient as having IPF subtype-II.
[00117] In some embodiments, the degree of similarity or dissimilarity between the level of expression of the proteins comprising a test sample and the level of expression of the proteins comprising a reference sample is determined based on signal intensity, such as that derived from an assay (e.g., ELISA, see below). In certain embodiments, the ratio of the signal intensity of the proteins comprising a test sample, as compared to the signal intensity of the proteins comprising a reference sample, is calculated. This calculation quantifies the differential level of expression of the proteins of the test sample, as compared to the reference sample, if any. In some embodiments, this calculation is carried out quantitatively or semi-quantitatively. In certain embodiments, it is not necessary to determine an exact number associated with the level of expression of the proteins comprising the test sample and the reference sample. In some embodiments, the reference sample comprises proteins taken from a non-diseased patient and detection of a statistically significant deviation (increase or decrease) in the signal intensity produced by the proteins of the test sample, as compared to the signal produced by the proteins of the reference sample, is sufficient to diagnose a patient with IPF subtype-I or IPF subtype-II. Therefore, in several embodiments the quantification of the expression levels of proteins of a test sample comprises an estimation of the level of expression, as a semi-quantitative or relative measure, that is sufficient to predict the IPF subtype for an individual patient (as compared to a reference sample) and thus prospectively direct the determination of therapy for a patient.
[00118] In various aspects, determination of a level of protein expression in a test sample that is less than that produced by the reference sample is indicative of IPF subtype-I in the patient from which the test sample was derived. In various aspects, determination of a level of protein expression in a test sample that is greater than that produced by the reference sample is indicative of IPF subtype-II in the patient from which the test sample was derived. Therefore, in certain embodiments detection of signal intensity from a test sample that is less than (in the case of IPF subtype-I) or greater than (in the case of IPF subtype-II) the signal intensity produced by the reference sample, within experimentally acceptable margins of error, is sufficient to determine the IPF subtype for a given patient.
[00119] In certain embodiments, the deviation of signal intensity of the test sample from the reference sample is measured as a percent difference. In certain embodiments, a test sample is deemed to have produced a signal that is less than that of the reference sample if the signal intensity of the test sample measures at a level selected from the signal intensity of the reference sample less 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, and 100%.
[00120] In certain embodiments, a test sample is deemed to have produced a signal that is greater than that of the reference sample if the signal intensity of the test sample measures at a level selected from the signal intensity of the reference sample plus 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, and 100%.
[00121] In certain embodiments, the deviation of signal intensity of the test sample from the reference sample is measured as a fold difference, or a difference based upon unit signal production. In certain embodiments, a test sample is deemed to have produced a signal that is less than that of the reference sample if the signal intensity of the test sample is selected from two-fold, three-fold, four-fold, five-fold, six-fold, seven-fold, eight-fold, nine-fold, ten-fold, and greater than ten-fold less than the signal intensity of the reference sample.
[00122] In certain embodiments, a test sample is deemed to have produced a signal that is greater than that of the reference sample if the signal intensity of the test sample is selected from two-fold, three-fold, four-fold, five-fold, six-fold, seven-fold, eight-fold, nine-fold, ten-fold, and greater than ten-fold more than the signal intensity of the reference sample.
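The percent-difference and fold-difference comparisons described above can be illustrated with a minimal Python sketch. It assumes a single signal value per sample and a non-diseased reference sample, so that a lower test signal points to subtype-I and a higher one to subtype-II. The function names and the 20% default cut-off are hypothetical choices made for illustration; the disclosure lists cut-offs from 5% to 100% without preferring one.

```python
def percent_difference(test_signal, reference_signal):
    """Signed percent deviation of the test signal from the reference signal."""
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    return 100.0 * (test_signal - reference_signal) / reference_signal


def fold_change(test_signal, reference_signal):
    """Signed fold change: +2.0 means two-fold more, -2.0 means two-fold less."""
    if test_signal <= 0 or reference_signal <= 0:
        raise ValueError("signals must be positive")
    ratio = test_signal / reference_signal
    return ratio if ratio >= 1.0 else -1.0 / ratio


def call_subtype(test_signal, reference_signal, min_percent=20.0):
    """Compare a test signal with a non-diseased reference signal.

    min_percent is a hypothetical cut-off used only for illustration.
    """
    deviation = percent_difference(test_signal, reference_signal)
    if deviation <= -min_percent:
        return "IPF subtype-I"    # test signal lower than reference
    if deviation >= min_percent:
        return "IPF subtype-II"   # test signal higher than reference
    return "indeterminate"        # within the chosen margin


if __name__ == "__main__":
    print(call_subtype(1500.0, 1000.0))  # IPF subtype-II
    print(fold_change(250.0, 1000.0))    # -4.0 (four-fold less)
```

In practice the cut-off would be chosen together with the assay's experimental error, which this sketch does not model.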
[00123] In some embodiments, complete identity, within acceptable levels of experimental error, between the expression level of test sample proteins obtained from a patient known to have IPF subtype-I and the expression levels of any one or more of the reference sample protein signatures disclosed herein identifies the patient as having IPF subtype-I. In some embodiments, complete identity, within acceptable levels of experimental error, between the expression level of test sample proteins obtained from a patient known to have IPF subtype-II and the expression levels of any one or more of the reference sample protein signatures disclosed herein identifies the patient as having IPF subtype-II.
[00124] The expression level of any one or more of the translation products of the differentially regulated genes disclosed herein, and/or the expression levels of any one or more proteins isolated from a test sample, can be determined using any one or more of a number of techniques. In some embodiments, the expression levels can be determined using routine assays, for example antibody-based methods such as immunohistochemistry and enzyme-linked immunosorbent assay (ELISA), of which the latter allows for non-invasive testing. In some embodiments, the expression levels can be determined using targeted multiplex mass spectrometry as a means of quantifying protein signatures in tissue samples taken from a patient. In some embodiments, the expression levels can be determined using mass-spectrometry based proteomics technologies, which have matured to the extent that they can now identify and quantify thousands of proteins.
[00125] In some embodiments, protein expression levels can be determined via immunohistochemistry, which is a process capable of detecting proteins directly in the cells of a section of isolated and fixed tissue via the use of antibodies that bind specifically to the proteins of interest. Immunohistochemistry is a widely used technique to visualize the distribution and localization of differentially expressed proteins between two tissues. When using this technique, a tissue sample is taken from a subject and properly fixed (e.g., by heat fixation, perfusion, immersion or chemical fixation) to make the epitopes of the proteins of interest available for binding by the antibodies. In some embodiments, the tissue sample may be taken from the lung of a subject known to have either IPF subtype-I or IPF subtype-II to create a reference sample. In some embodiments, the tissue sample may be taken from the lung of a subject whose IPF subtype is unknown, to create a test sample. In some embodiments, the tissue samples are taken from corresponding tissues and corresponding regions within the tissues in order to create similar testing parameters between the reference and test samples. The proteins in the reference sample and the test sample can be analyzed in parallel or individually.
[00126] Detecting the protein(s) of interest in a reference sample or a test sample can be accomplished by contact with an antibody that is specifically directed to the protein(s) of interest. One or more antibodies may be used, depending on the number of proteins to be tested in a single reference or test sample. Detection via contact with an antibody may be done directly, whereby the antibody itself is coupled with a label that will allow for visualization of binding to the protein, or indirectly, where a second antibody that specifically binds to the first antibody is used, the second antibody having the label to allow for visualization. Visualizing an antibody-protein interaction can be accomplished in a number of ways. In the direct detection method, the antibody itself is conjugated to an agent that allows for visualization such as an enzyme (e.g., a peroxidase) that can catalyze a color-producing reaction, or a fluorophore (e.g., fluorescein or rhodamine) that fluoresces under certain conditions to visually display binding. In the indirect detection method, a second antibody is used that is conjugated to an agent that allows for visualization, such as an enzyme or a fluorophore. The level of differential expression of a protein between a reference sample and a test sample may be determined by measuring the difference in intensity of the visualization means employed. In that regard, in some embodiments the same means of visualization is utilized in both samples. If the signal produced by the reference sample is different from the signal produced in the test sample, then the protein of interest is present in different quantities in the samples, indicating differential expression. The means of visualizing the signal in each sample is linked to an antibody and each antibody will bind to a limited number of proteins in the sample. Therefore, the number of antibodies binding to proteins of a sample is directly proportional to the total number of proteins present in the sample and the strength of the signal produced by the antibody-protein interaction in a sample is directly proportional to the amount of protein present in the sample. The ratio of the signal intensity of the test sample to that of the reference sample is then calculated, to measure the protein expression levels between the test sample and the reference sample. The difference in the signal ratio determines whether the total level of protein expression of each protein in the test sample is increased or decreased, as compared to the reference sample. If the signal produced by the reference sample is the same (within acceptable levels of experimental error) as the signal produced in the test sample, then the protein of interest is present in approximately the same quantity in each sample.
[00127] In some embodiments, protein expression levels can be determined via enzyme-linked immunosorbent assay (ELISA), which is an analytic assay that utilizes a solid-phase enzyme immunoassay to detect the presence of a protein in an isolated sample. Typically the sample is in liquid form. In ELISA, an unknown amount of a sample is affixed to a substrate surface and an antibody is placed into contact with the substrate surface such that the antibody is also placed into contact with the sample. The antibody will bind to the sample provided that an antigen capable of being bound by the antibody is present in the sample. The antibody is typically linked to some means of visualizing binding, which in some embodiments is an enzyme, so that binding of the antibody to the sample can be detected. A substance that contains the enzyme's substrate is placed into contact with the surface, and thus the antibody, such that the subsequent enzymatic reaction produces a detectable signal. The signal may be a color change in the substrate or a fluorescent emission. In some embodiments, a protein sample isolated from a patient known to have IPF subtype-I is affixed to a substrate surface to create a reference sample. In some embodiments, a protein sample isolated from a patient known to have IPF subtype-II is affixed to a substrate surface to create a reference sample. In some embodiments, a protein sample isolated from a patient whose IPF subtype is unknown is affixed to a substrate surface to create a test sample. In some embodiments, the proteins in each sample are the same (the reference sample protein is the same as the test sample protein). The substrate surface may contain more than one isolated area (e.g., wells) such that the reference sample protein and the test sample protein are each affixed in their own isolated area and also so that the same substrate surface may accommodate multiple proteins from the reference sample and the test sample. In some embodiments, the substrate surface is a microtiter plate.
[00128] At least one antibody having specificity for the protein in the reference and test samples is placed in contact with the protein in the reference and test samples so that it may bind to the protein. The proteins in the reference sample and the test sample can be analyzed in parallel or individually. The antibody can be covalently linked to an enzyme, or can itself be detected by a secondary antibody that is linked to an enzyme. The substrate of the enzyme is then placed in contact with each of the reference sample and the test sample to produce a visible signal, which indicates the quantity of protein in each sample. If the signal produced in the reference sample is different from the signal produced in the test sample, then the protein is present in different quantities in the samples, indicating differential expression. The means of visualizing the signal in each sample is linked to an antibody and each antibody will bind to a limited number of proteins in the sample.
Therefore, the number of antibodies binding to proteins of a sample is directly proportional to the total number of proteins present in the sample and the strength of the signal produced by the antibody-protein interaction in a sample is directly proportional to the amount of protein present in the sample. The ratio of the signal intensity of the test sample to that of the reference sample is then calculated, to measure the protein expression levels between the test sample and the reference sample. The difference in the signal ratio determines whether the total level of protein expression of each protein in the test sample is increased or decreased, as compared to the reference sample. If the signal in the reference sample is the same (within acceptable levels of experimental error) as the signal produced in the test sample, then the protein of interest is present in approximately the same quantity in each sample.
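The signal-ratio calculation described for ELISA can be sketched as follows. This is a minimal illustration rather than a validated assay analysis: averaging replicate wells and subtracting blank (no-sample) wells are common plate-reader practices assumed here and are not stated in the text above, and the 10% tolerance standing in for "acceptable levels of experimental error" is a hypothetical value.

```python
from statistics import mean


def blank_corrected_mean(well_readings, blank_readings):
    """Average replicate wells and subtract the average blank reading (assumed step)."""
    return mean(well_readings) - mean(blank_readings)


def signal_ratio(test_wells, reference_wells, blank_wells):
    """Ratio of the test-sample signal to the reference-sample signal."""
    test = blank_corrected_mean(test_wells, blank_wells)
    reference = blank_corrected_mean(reference_wells, blank_wells)
    if reference <= 0:
        raise ValueError("reference signal must exceed the blank")
    return test / reference


def compare(ratio, tolerance=0.10):
    """Label the test sample relative to the reference; tolerance is hypothetical."""
    if ratio > 1.0 + tolerance:
        return "increased"
    if ratio < 1.0 - tolerance:
        return "decreased"
    return "unchanged"


if __name__ == "__main__":
    # Invented absorbance readings for one protein, three replicate wells each.
    ratio = signal_ratio([1.25, 1.35, 1.30], [0.70, 0.70, 0.70], [0.10, 0.10])
    print(round(ratio, 3), compare(ratio))
```

Because the ratio is computed per protein, the same functions can be applied to each protein of a signature in turn.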
[00129] In some embodiments, protein expression levels can be determined via targeted multiplex mass spectrometry. For example, in some embodiments, liquid chromatography-tandem mass spectrometry (LC-MS/MS) can be used to determine the expression level of proteins isolated from a patient known to have IPF subtype-I or IPF subtype-II, and thus create a reference sample. Similarly, LC-MS/MS can be used to determine the expression level of proteins isolated from a patient whose IPF subtype is not known to create a test sample. The levels of protein expression can then be compared between the two samples. In some embodiments, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-MS) can be used to image histological sections taken from IPF subtype-I and/or IPF subtype-II patients (reference samples) and histological sections taken from patients whose IPF subtype is not known (test samples). MALDI-MS can be used to image naturally occurring molecules, such as proteins, within a reference sample and within a test sample such that the presence and the levels of expression of the proteins can be compared between the two samples.
[00130] The protein signatures disclosed herein are advantageous in comparison to transcript-based and genomic markers, as the expression level of the proteins comprising the protein signatures can be measured using routine assays, for example antibody-based methods such as immunohistochemistry and ELISA, of which the latter allows for non-invasive testing.
Creation of Protein Signatures
[00131] The protein signatures disclosed herein can be generated in a number of ways. In some embodiments, protein profiles of patients known to have IPF subtype-I or IPF subtype-II can be generated using high-resolution tandem mass spectrometry-based proteomics. For example, proteomics can be employed based on 1D gel electrophoresis in combination with nano-LC-MS/MS and spectral counting to compare the protein profile of an IPF subtype-I patient and the protein profile of an IPF subtype-II patient. The two protein profiles can then be compared in order to determine which proteins are differentially regulated between the two subtypes, as well as which proteins are not differentially expressed. In some cases, protein profiles of IPF subtype-I and subtype-II can be compared with protein profiles of non-diseased patients. These comparisons can be used to identify a protein or proteins that can aid in differentiating subtype-I from subtype-II. In some cases, protein signatures may include both differentially expressed proteins and non-differentially expressed proteins. Pathway and protein complex analysis can then be used to identify the functions of the proteins that are differentially regulated between the two subtypes.
[00132] Isolation of proteins from tissue samples can be accomplished via any number of techniques. For example, in certain embodiments, a tissue sample from a patient known to have IPF subtype-I may be taken and homogenized. In some embodiments, a tissue sample from a patient known to have IPF subtype-II may be taken and separately homogenized. In some embodiments, a tissue sample from a patient whose IPF subtype is not known may be taken and separately homogenized. As will be evident to a person of ordinary skill in the art, each sample is processed separately to avoid cross-contamination and to ensure that the comparison between the samples is scientifically sound. For purposes of brevity, the following description relates to the processing of a generic sample.
[00133] After homogenization, the proteins in the tissue sample are solubilized in an appropriate buffer (e.g., a buffer containing an anionic surfactant such as sodium dodecyl sulfate), and then heat denatured. The proteins can then be fractionated according to their electrophoretic mobility using any number of gel electrophoresis techniques such as, for example, one-dimensional sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Upon completion of electrophoresis, the gel can be fixed and stained to reveal the bands of fractionated proteins isolated from a non-IPF tissue sample or from a sample taken from a person known to have subtype-I or subtype-II. Data relating to electrophoretic mobility and band color intensity can be obtained.
[00134] In order to liberate the proteins from the gel, each of the individual gel lanes can be cut into a plurality of bands and each band can be processed separately to remove the proteins therefrom, thereby creating a library of individual pools of proteins isolated from the tissue sample. In certain embodiments, each gel band can be processed for in-gel digestion by reducing any disulfide bonds that may be present in the proteins in each band (e.g., by treatment with dithiothreitol) and then incubating each band with an appropriate protease (e.g., trypsin). The resulting peptides can then be extracted from each gel band and stored prior to LC-MS analysis.
[00135] The peptides in each pool (produced by extraction of each individual gel band) can then be separated by LC-MS/MS. The MS/MS spectra obtained from each pool can then be analyzed (e.g., by use of one or more algorithms and comparison to known databases) to determine the intact protein and peptide fragment composition. In some embodiments, the MS/MS spectra of the proteins contained in each gel band pool can be searched against known human protein databases, and the results imported into one or more software programs that can organize the gel-band data, validate peptide identifications and generate a list of identified proteins for the gel band pool. MS/MS analysis can also serve to quantify the amount of each protein and peptide present in each gel band pool.
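As a rough illustration of how MS/MS identifications could be turned into relative quantities, the Python sketch below uses spectral counting: each protein's count of matched spectra is divided by the total for its sample, and the two samples are compared as a log2 ratio. The pseudocount, the function names, and the example counts are assumptions made for illustration; `REFERENCE_PROTEIN` is a hypothetical placeholder, and the counts are invented, not measured data.

```python
import math


def normalized_counts(spectral_counts):
    """Divide each protein's spectral count by the total count in the sample."""
    total = sum(spectral_counts.values())
    if total == 0:
        raise ValueError("sample has no spectral counts")
    return {protein: count / total for protein, count in spectral_counts.items()}


def log2_ratios(test_counts, reference_counts, pseudocount=1):
    """log2(test / reference) of normalized counts for proteins seen in either sample.

    The pseudocount (an assumption) avoids division by zero for proteins
    detected in only one of the two samples.
    """
    proteins = set(test_counts) | set(reference_counts)
    test = normalized_counts(
        {p: test_counts.get(p, 0) + pseudocount for p in proteins})
    reference = normalized_counts(
        {p: reference_counts.get(p, 0) + pseudocount for p in proteins})
    return {p: math.log2(test[p] / reference[p]) for p in proteins}


if __name__ == "__main__":
    # Invented counts for illustration only.
    test_sample = {"DNAH5": 7, "REFERENCE_PROTEIN": 7}
    reference_sample = {"DNAH5": 3, "REFERENCE_PROTEIN": 11}
    for protein, ratio in sorted(log2_ratios(test_sample, reference_sample).items()):
        print(protein, round(ratio, 3))
```

A positive log2 ratio indicates higher relative abundance in the test sample; deciding which differences are significant would require replicates and a statistical test, which are outside this sketch.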
[00136] The data generated in this manner obtained from a test sample can then be compared against the corresponding data obtained from a reference sample to determine which of the proteins are differentially expressed between the two samples. In some embodiments, proteins that are significantly differentially expressed in a patient known to have IPF subtype-II are suitable for use in the protein signatures disclosed herein. In some embodiments, proteins that are more highly expressed in a patient known to have IPF subtype-II are suitable for use in the protein signatures disclosed herein. In some embodiments, proteins that are more highly expressed in IPF subtype-II are products of cilium-associated genes, or of genes described in FIGS. 5, 6, and SEQ ID NOs:1-197. In some embodiments, proteins that are not differentially expressed in IPF are suitable for use in the protein signatures disclosed herein. Proteins that are not differentially expressed may be reference proteins.
EXAMPLES
[00137] The following examples describe in detail certain embodiments of the molecular phenotyping devices and methods disclosed herein. It will be apparent to those skilled in the art that many modifications, both to materials and methods, may be practiced without departing from the scope of the disclosure.
Example 1 - Derivation of IPF Sub-Group Profiles
[00138] Lung tissue obtained from 119 subjects with an IPF diagnosis, and from 50 non-diseased controls, was used to generate transcriptional profiles for each individual. The generated transcriptional profiles were then used to identify molecular phenotypes of IPF subtypes. In some embodiments, the two subtypes are distinguished by differential expression of cilium genes.
[00139] All human tissue was collected with appropriate ethical review for the protection of human subjects. The Lung Tissue Research Consortium (LTRC) IPF cohort was used to derive gene expression signatures. The control tissue cohort was split to provide control lung expression profiles for both derivation (Example 1) and validation stages (Example 2).
Subjects and Tissue Samples
Demographic Characteristics
[00140] Table 1 summarizes demographic and clinical characteristics of the LTRC IPF subjects and the non-diseased control cohort used in the initial analysis. The IPF cohort is older and composed of more males than the control cohort. There are no notable differences in racial distribution between the two groups. Approximately half of the individuals with IPF are former smokers, whereas almost 50% of controls are current smokers. IPF individuals on average have smoked more cigarettes than controls, but there is substantial variability in pack-years in the IPF cohort.
Table 1. Subject demographics and clinical characteristics of derivation cohort.
[Table 1: image imgf000043_0001 in the original publication]
* IPF = idiopathic pulmonary fibrosis, UIP = usual interstitial pneumonia.
** Average for current and former smokers
[00141] Lung tissue specimens from lower (n=90), upper (n=20), and middle/lingula (n=9) lobes from subjects with IIP were obtained from the LTRC. The LTRC is a resource created by the NHLBI to provide human lung tissue and DNA to qualified investigators for use in research. The program enrolls donor subjects who are anticipating lung surgery, collects blood and extensive phenotypic data from the prospective donors, and then processes their surgical waste tissues for research use. Most donor subjects have fibrotic interstitial lung disease or COPD. Clinical data include clinical and pathological diagnoses, chest CT images, pulmonary function tests (spirometry, DLCO, and ABG), exposure (including cigarette smoking history) and symptom questionnaires (including Borg dyspnea scale), and family history of lung disease.
[00142] Control, non-diseased lung tissue was also obtained from either the lower (n=86) or middle (n=4) lobes, through the International Institute for Advancement of Medicine, formerly Tissue Transformation Technologies (Edison, NJ). All individuals in the control group had suffered brain death and were evaluated for organ transplantation before research consent. Informed consent was obtained at the time of transplant evaluation. All specimens failed regional lung selection criteria for transplantation. Subjects had to demonstrate no evidence of active infection or chest radiographic abnormalities, mechanical ventilation < 48 h, PaO2/FiO2 ratio > 200, and no past medical history of underlying lung disease or systemic disease that involves the lungs (e.g., rheumatoid arthritis). Lung samples were procured within 34 h after brain death (mean, 16.2 h; range, 4.5-33.25 h).
[00143] Half of the control group (45) was used in derivation experiments while the rest of the samples were used during validation.
Microarray Data Generation
[00144] Total RNA was isolated from approximately 100 mg of snap-frozen lung tissue using the mirVana kit (AB/Ambion, Austin, TX). RNA purity and concentration were determined by spectrophotometry, and RNA integrity was determined using the Bioanalyzer (Agilent, Santa Clara, CA). mRNA microarray target labeling was conducted using 300 ng of total RNA and the Message Amp II kit (AB/Ambion, Austin, TX); the labeled target was hybridized to the Human Gene 1.0 ST Array (Affymetrix, Santa Clara, CA) and processed according to the manufacturer's instructions. All microarray data met the quality control criteria established by the Tumor Analysis Best Practices Working Group (9) and are available in the Gene Expression Omnibus repository as GSE31962.
[00145] Affymetrix GeneChip® arrays are fabricated by in-situ, light-directed synthesis of short oligonucleotide sequences on a small glass chip. mRNA samples of interest are labeled. The first step of the labeling procedure is the synthesis of double-stranded cDNA from the RNA sample using reverse transcriptase and an oligo-dT primer. Next, the cDNA serves as a template in an in vitro transcription (IVT) reaction that produces amplified amounts of biotin-labeled antisense mRNA. This biotinylated RNA is referred to as labeled aRNA or cRNA - the microarray target. Prior to hybridization, the cRNA is fragmented to 25-200 bp fragments and added to a hybridization cocktail that is then hybridized to the probes on the array. After hybridization, the chip is stained with a fluorescent molecule (streptavidin-phycoerythrin) that binds to biotin. The staining protocol includes a signal amplification step that employs anti-streptavidin antibody (goat) and biotinylated goat IgG antibody. The series of washes and stains with the aforementioned reagents binds the biotin and provides an amplified fluor that emits light when the chip is scanned with a confocal laser, and the distribution pattern of signal in the array is recorded.
[00146] Affymetrix GeneChip® Human Gene 1.0 ST Array. Each of the 28,869 genes is represented on the array by approximately 26 probes spread across the full length of the gene, providing a more complete and more accurate picture of gene expression than 3' based expression array designs.
Microarray Data Analysis
[00147] Expression data from 169 mRNA arrays (119 LTRC IPF subjects and 50 controls) were analyzed using ANCOVA and hierarchical clustering methods implemented in Partek (St Louis, MO).
[00148] Intensity data were imported, log2-transformed, and quantile normalized using RMA (10), and expression levels were summarized on a transcript level using the mean value of all probe sets mapping to a transcript. Non-expressed and invariant transcripts were removed using a median variance filter, corrected by a Benjamini-Hochberg false discovery rate (FDR) of 0.10 (11), resulting in a final dataset of 11950 transcript measurements across 169 samples.
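The log2 transformation and quantile normalization steps described above can be illustrated with a minimal sketch. It is not the RMA implementation used in Partek: background correction and probe-set summarization are omitted, ties are broken by position rather than averaged, and the function name and toy intensities are hypothetical.

```python
import math

def quantile_normalize_log2(arrays):
    """Log2-transform raw intensities, then map every array onto one shared
    distribution (the mean of the sorted values at each rank)."""
    logged = [[math.log2(v) for v in arr] for arr in arrays]
    n_probes = len(logged[0])
    # Reference distribution: mean across arrays of the k-th smallest value.
    sorted_arrays = [sorted(arr) for arr in logged]
    reference = [sum(arr[k] for arr in sorted_arrays) / len(sorted_arrays)
                 for k in range(n_probes)]
    normalized = []
    for arr in logged:
        # Rank each probe within its own array (ties broken by position).
        order = sorted(range(n_probes), key=lambda i: arr[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

# Hypothetical raw intensities: 3 arrays x 4 probes.
raw = [[2.0, 8.0, 4.0, 16.0],
       [4.0, 16.0, 8.0, 64.0],
       [1.0, 2.0, 32.0, 4.0]]
norm = quantile_normalize_log2(raw)
```

After normalization every array holds the same set of values, and only the assignment of those values to probes differs between arrays.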
[00149] False discovery rate (FDR) control is a statistical method used in multiple hypothesis testing to correct for multiple comparisons. In a list of rejected hypotheses, FDR controls the expected proportion of incorrectly rejected null hypotheses (type I errors). It is a less conservative procedure for comparison than the Bonferroni correction, with greater power than familywise error rate (FWER) control, at a cost of increasing the likelihood of obtaining type I errors. In practical terms, the FDR is the expected proportion of false positives among all significant hypotheses. 5% FDR means that 5 out of 100 identified genes are expected to be false positives.
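As an illustration of FDR control, the following is a minimal sketch of the Benjamini-Hochberg step-up adjustment. The function name and the p-values are hypothetical, and the sketch is not taken from the analysis software used in the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for five transcripts.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.60]
q_vals = benjamini_hochberg(p_vals)
significant = [q <= 0.05 for q in q_vals]  # 5% FDR threshold
```

In this toy input, the third and fourth transcripts have raw p-values below 0.05 but adjusted values above it, so they are not called significant at 5% FDR.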
[00150] Differential expression of individual transcripts between IPF and control groups was identified using an ANCOVA model incorporating the final clinical diagnosis of each subject as well as age, gender, and smoking status.
[00151] ANCOVA is a general linear model with a continuous outcome variable (quantitative, scaled) and two or more predictor variables where at least one is continuous (quantitative, scaled) and at least one is categorical (nominal, non-scaled). ANCOVA is a merger of ANOVA and regression for continuous variables. ANCOVA tests whether certain factors have an effect on the outcome variable after removing the variance for which quantitative predictors (covariates) account.
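The idea of removing covariate variance before comparing groups can be sketched as follows. This simplified example assumes one categorical factor and a single continuous covariate, and it computes only the covariate-adjusted group means using the pooled within-group slope. It does not perform the F-test, and it is not the multi-factor model fitted in Partek. The function name and the (age, log2 expression) values are hypothetical.

```python
def ancova_adjusted_means(groups):
    """groups maps a label to a list of (covariate, outcome) pairs.
    Returns the pooled within-group slope and covariate-adjusted means."""
    all_x = [x for pairs in groups.values() for x, _ in pairs]
    grand_mean_x = sum(all_x) / len(all_x)
    sxy = sxx = 0.0
    means = {}
    for label, pairs in groups.items():
        mean_x = sum(x for x, _ in pairs) / len(pairs)
        mean_y = sum(y for _, y in pairs) / len(pairs)
        means[label] = (mean_x, mean_y)
        # Accumulate within-group cross-products and sums of squares.
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        sxx += sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    # Shift each group mean to the value expected at the grand mean covariate.
    adjusted = {label: mean_y - slope * (mean_x - grand_mean_x)
                for label, (mean_x, mean_y) in means.items()}
    return slope, adjusted

# Hypothetical (age, log2 expression) pairs for one transcript.
data = {
    "control": [(50, 6.0), (60, 6.5), (70, 7.0)],
    "IPF": [(60, 8.5), (70, 9.0), (80, 9.5)],
}
slope, adjusted = ancova_adjusted_means(data)
difference = adjusted["IPF"] - adjusted["control"]
```

In this toy data the unadjusted difference between group means is 2.5 log2 units. Part of that difference is attributable to the older IPF group, so the adjusted difference is smaller.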
[00152] NJH IPF mRNA expression profiles were collected and processed in the same manner as the LTRC IPF samples, with the exception of the final filtering step; in this embodiment, the 11950 transcripts from the LTRC dataset were retained in the dataset. The NJH dataset was used only for hierarchical clustering, and no statistical tests were performed on it.
[00153] To focus on the most prominent changes in mRNA profiles, a minimum 2-fold change in expression was imposed in addition to the 5% FDR criterion, and post-hoc clustering was conducted on the 472 differentially expressed mRNA transcripts that met these criteria.
Hierarchical clustering of IPF and control samples is shown in FIG. 7, and of IPF samples only in FIG. 1. FIG. 1 illustrates the presence of two groups of subjects with IPF and six clusters of transcripts (A-F). The most prominent feature of the heatmap is a group of 51 subjects (43%; subject group II) with relatively high expression of a large set of transcripts (transcript clusters A and B) compared to the remaining 68 subjects (57%; subject group I).
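The combined selection criteria (5% FDR plus a minimum 2-fold change) can be sketched as a simple filter applied before clustering. The sketch assumes that the fold-change threshold applies in either direction and that fold changes are stored on the log2 scale. The function name, transcript identifiers, and values are hypothetical.

```python
import math

def select_transcripts(results, max_fdr=0.05, min_fold=2.0):
    """Keep transcripts passing both the FDR and the fold-change threshold.
    `results` maps transcript ID -> (log2 fold change, FDR-adjusted p-value)."""
    min_log2 = math.log2(min_fold)  # 2-fold corresponds to 1.0 on the log2 scale
    return sorted(
        transcript
        for transcript, (log2_fc, q_value) in results.items()
        if q_value <= max_fdr and abs(log2_fc) >= min_log2
    )

# Hypothetical per-transcript results.
results = {
    "T1": (1.8, 0.001),   # passes both criteria
    "T2": (0.4, 0.001),   # significant, but below 2-fold
    "T3": (-1.2, 0.030),  # passes both criteria (lower in IPF)
    "T4": (2.5, 0.200),   # large change, but fails the FDR criterion
}
selected = select_transcripts(results)
```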
Novel Molecular Subtype of IPF
[00154] To identify molecular profiles associated with clinically defined subtypes of IPF, an ANCOVA model was used that incorporates disease status (IPF and controls), age, gender and smoking status as factors. The impact of several other technical variables was considered, including array batch, RNA preservative, RNA quality (RIN) and anatomic location of the lung biopsy; these variables were associated with minimal expression changes and were therefore not included in the final model (data not shown). Disease status had the largest impact on gene expression, with 5465 transcripts meeting the 5% FDR criterion for differential expression in IPF compared to controls, while other factors had substantially fewer differentially expressed transcripts associated with them.
[00155] Transcript cluster A contains 80 unique transcripts, which are shown in FIGS. 5A and 5B. Cluster A includes a number of genes that have been previously shown to be upregulated in IPF, namely osteopontin, MMP1, PLUNC, MMP7, MUC5B, collagen COL17A1, and keratins 5, 6C, 15 and 17. However, the present studies surprisingly demonstrate that these newly identified IPF-associated genes differentiate two subpopulations of subjects with IPF. Cluster A also contains a number of other potentially interesting genes such as lipocalin2, a gene with an established role in innate immunity and a marker of acute exacerbations in cystic fibrosis that has more recently been shown to promote epithelial-to-mesenchymal transition (EMT) in cancer and play an active role in renal fibrosis. Functional enrichment analysis, using Fisher exact test, of the 121 unique transcripts in cluster B showed it to be strongly enriched in transcripts associated with the cilium genes (Benjamini corrected p value 3.7 x 10^-11) and their structural components (axoneme, 3.9 x 10^-11; dynein, 9.4 x 10^-7). Expression of cilium-associated mRNAs (DNAH6, DNAH7, DNAI1 and RPGRIP1L) was confirmed in the LTRC subjects with IPF and controls by quantitative RT-PCR (FIG. 2).
Quantitative RT-PCR
[00156] Primers for mRNA expression were designed using Primer-BLAST and are listed in Table 2.
Table 2
Figure imgf000047_0001
[00157] RNA was normalized to a concentration of 100 ng/μL and reverse transcribed to cDNA using the Applied Biosystems High Capacity cDNA Reverse Transcription Kit. Each 20-μL PCR contained 15 ng cDNA, 0.5 μM final concentration of forward and reverse primers, and 1x final concentration of the Power SYBR Green master mix. Real-time PCR was performed on an Applied Biosystems ViiA 7 instrument using the following profile: 50°C for 2 min, 95°C for 10 min, and 40 cycles of 95°C for 15 sec and 60°C for 1 min. Dissociation curves were collected at the end of each run. Data were analyzed using the ΔΔCT relative quantification method (16). ΔCT values were calculated relative to GAPDH, and ΔΔCT values were calculated by comparison among different groups of samples.
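The ΔΔCT calculation can be sketched as follows. The sketch assumes the common convention that relative expression equals 2 raised to the power of minus ΔΔCT, which presumes approximately 100% amplification efficiency for both target and reference. The function names and CT values are hypothetical; only the use of GAPDH as the reference gene comes from the text.

```python
def mean(values):
    return sum(values) / len(values)

def delta_delta_ct(target_ct, reference_ct, calib_target_ct, calib_reference_ct):
    """Return (delta-delta-CT, relative expression) for a sample group
    compared with a calibrator group, both normalized to a reference gene."""
    delta_ct_sample = mean(target_ct) - mean(reference_ct)
    delta_ct_calibrator = mean(calib_target_ct) - mean(calib_reference_ct)
    ddct = delta_ct_sample - delta_ct_calibrator
    return ddct, 2 ** (-ddct)

# Hypothetical CT values (technical replicates).
ipf_target = [24.0, 24.5, 23.5]       # cilium gene in an IPF sample
ipf_gapdh = [18.0, 18.5, 17.5]        # GAPDH in the same sample
control_target = [26.0, 26.5, 25.5]   # cilium gene in a control sample
control_gapdh = [18.0, 18.5, 17.5]    # GAPDH in the control sample

ddct, fold_change = delta_delta_ct(ipf_target, ipf_gapdh,
                                   control_target, control_gapdh)
```

A negative ΔΔCT indicates that the target crosses threshold earlier in the sample than in the calibrator after normalization, that is, higher expression in the sample.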
Real time quantitative RT-PCR.
[00158] The real-time reverse transcription polymerase chain reaction (RT-PCR) uses fluorescent reporter molecules to monitor the production of amplification products during each cycle of the PCR reaction. This combines the nucleic acid amplification and detection steps into one homogeneous assay and obviates the need for gel electrophoresis to detect amplification products. Use of appropriate chemistries and data analysis eliminates the need for Southern blotting or DNA sequencing for amplicon identification. Its simplicity, specificity and sensitivity, together with its potential for high throughput and the ongoing introduction of new chemistries, more reliable instrumentation and improved protocols, have made real-time RT-PCR the benchmark technology for the detection and/or comparison of RNA levels.
[00159] LTRC subjects with IPF in group II (enriched for high expression of cilium genes and their structural components) do not differ in age, gender or smoking status from IPF subjects in group I (Table 3). To evaluate whether subjects in group II have a unique clinical presentation of IPF, the extent of microscopic honeycombing and fibroblastic foci was assessed. Seventeen specimens of IPF (9 from group I and 8 from group II) were randomly selected for semi-quantitative assessment (scores 0-3) of these two pathological features on stained sections of lung tissue. Among subjects with IPF, there were more individuals with higher scores for microscopic honeycombing, but not fibroblastic foci, in group II compared to group I (Fisher exact test; Table 3). The histologic differences between the two groups of subjects with IPF are depicted in the dendrogram in FIG. 1. Taken together, these findings suggest that groups I and II represent novel molecular phenotypes of IPF and may be important in distinguishing clinical subtypes of this disease.
Table 3. Subject demographics and clinical characteristics of the IPF cohort by groups I and II from Figure 1.
Figure imgf000048_0001
          Group I   Group II   p value
1         2         2
2         0         0
3         0         6
Lung tissue pathology - fibroblastic foci score
0         3         0
1         4         4          0.86****
2         1         2
3         1         2
* Average for current and former smokers
** by two-tailed t-test
*** by chi square test
**** by Fisher exact test
[00160] Differential expression of individual transcripts was identified using an ANCOVA model incorporating the clinical diagnosis of each subject as well as age, gender, and smoking status. Validation was performed in an independent cohort of 111 IPF and 39 non-diseased controls.
Cilium Genes and Novel Molecular Phenotypes
[00161] To further explore the role of the cilium and its structural components in defining these novel molecular phenotypes of IPF, all probesets on the Gene 1.0 ST chip that represent transcripts from the Gene Ontology (GO) category 0005929 (cellular component: cilium) were identified. The Gene Ontology database is available at www.geneontology.org. Of the 98 probe sets, 63 representing 59 unique transcripts are contained in the filtered dataset used for analysis (those transcripts filtered out had little expression or change across all subjects). Expression levels of these cilium-associated transcripts in LTRC IPF subjects (FIG. 3A) reveal that the majority of the transcripts (40/59) exhibit the pattern of higher gene expression in group II compared to group I. Shown in FIG. 3B are representative dot plots for two dynein genes, DNAH6 and DNAH7, in LTRC IPF and non-diseased control categories. It is apparent from the plots that expression of these two genes in IPF lungs is characterized by a bimodal distribution, with a subset of IPF samples expressing lower levels (group I) and another group of IPF subjects with significantly higher expression of these cilium genes (group II). Dot plots of expression levels of the two genes, grouped by the extent of honeycombing and fibroblastic foci, were also examined and demonstrated a correlation between higher expression of cilium-associated genes and microscopic honeycombing but not fibroblastic foci (FIG. 3C).
Histological Evaluation
[00162] Histological correlates of differential expression were examined using hematoxylin and eosin-stained tissue sections of a random sample of 17 IPF specimens. Slides were obtained from formalin-fixed tissue blocks of lung tissue adjacent to the frozen tissue used for transcriptional profiling, and each slide was examined simultaneously by two board-certified pathologists. Each sample was given a score from 0-3 on the presence and extent of microscopic honeycombing and fibroblastic foci.
[00163] Taken together, these data suggest that two novel subtypes of IPF are defined largely by the expression of cilium-associated genes and that the expression of cilium- associated genes is also associated with pathological features of IPF, specifically microscopic honeycombing.
Example 2 - Validation
[00164] The National Jewish Health (NJH) IPF cohort consists of 111 IPF patients who were clinically evaluated by investigators at National Jewish Health. All subjects in this cohort have undergone a standardized evaluation designed to provide a specific diagnosis. The evaluation included a standardized history focused on the presence of current or previous systemic disease; medications; tobacco and recreational drug use; familial lung disease; and avocational, occupational, environmental, and accidental exposures. Additional testing included serologic evaluation for evidence of systemic disease, chest radiography, pulmonary physiology (including lung volumes by body plethysmography, spirometry before and after inhaled bronchodilator, and diffusing capacity), pressure volume curves, and gas exchange with exercise (formal six-minute walk testing and/or cardiopulmonary exercise testing). Video-assisted thoracoscopic (VAT) or open surgical lung biopsy was performed as clinically indicated. The diagnosis of IIP was established using the criteria defined in the ATS/ERS consensus statement (1, 2).
Subjects and Tissue Samples
[00165] The National Jewish Health (NJH) IPF cohort was used to validate gene expression signatures.
Cilium Gene Expression and Survival in an Independent IPF Cohort
[00166] Expression of cilium-associated genes was validated in an independent cohort of 111 subjects with IPF (NJH cohort). Hierarchical clustering of samples based on expression profiles of 63 probes (59 unique transcripts) in the GO category 0005929 (cellular component: cilium) recapitulated the findings from the LTRC cohort and divided samples into two similar groups, 72 (65%) with low cilium gene expression and 39 (35%) with high cilium gene expression (FIG. 4A). Analysis of survival data in these two groups demonstrated that high cilium gene expression is associated with reduced survival (p=0.01 by Mantel-Cox log-rank test; FIG. 4B) in the NJH cohort of subjects with IPF.
Survival Analysis
[00167] Censored survival analysis in the NJH cohort was performed in GraphPad Prism. Mantel-Cox log-rank test was used for curve comparison between high and low cilium expression groups.
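For readers unfamiliar with the Mantel-Cox log-rank test, the following is a minimal two-group sketch. It is not the GraphPad Prism implementation, and the survival times and censoring indicators are hypothetical. The p-value uses the chi-square distribution with one degree of freedom, and the sketch assumes at least one event occurs while both groups still have subjects at risk.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank (Mantel-Cox) test.
    events: 1 = death observed, 0 = censored. Returns (chi-square, p-value)."""
    subjects = ([(t, e, 0) for t, e in zip(times_a, events_a)] +
                [(t, e, 1) for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in subjects if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Subjects still at risk just before time t, per group.
        n_a = sum(1 for s, _, g in subjects if s >= t and g == 0)
        n_b = sum(1 for s, _, g in subjects if s >= t and g == 1)
        # Deaths occurring exactly at time t.
        d_a = sum(1 for s, e, g in subjects if s == t and e == 1 and g == 0)
        d_b = sum(1 for s, e, g in subjects if s == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the death count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi_square = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_square / 2))  # chi-square, 1 df
    return chi_square, p_value

# Hypothetical survival times in months (1 = death, 0 = censored).
high_times, high_events = [6, 12, 18, 24, 30], [1, 1, 1, 1, 0]
low_times, low_events = [20, 36, 48, 60, 60], [1, 0, 1, 0, 0]
chi_square, p_value = logrank_test(high_times, high_events,
                                   low_times, low_events)
```

Censored subjects contribute to the at-risk counts up to their censoring time but never to the death counts, which is how the test accommodates incomplete follow-up.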
Example 3 - Localization of Cilium Gene Expression to Honeycomb Cysts in IPF Lung
[00168] The idea that cilia may play a key role in pathogenesis of IPF is further supported by observing expression of cilium gene markers in honeycomb cysts in IPF lung, the same pathogenic lesions in which MUC5B expression is dysregulated.
[00169] Immunohistochemical (IHC) staining for ARL13B, a marker that is expressed early in ciliogenesis, and FOXJ1, which is required for formation of motile cilia following primary cilia, reveals dysregulation of both genes in IPF lung. In normal lung, ARL13B stains cilia on epithelial cells lining the trachea, airways and bronchioles, and FOXJ1 stains nuclei of the same cells (data not shown). There is no expression of either marker in the normal alveoli, with the exception of FOXJ1 expression in alveolar macrophages (data not shown). In addition to this normal pattern of expression, ARL13B and FOXJ1 are expressed in the cytoplasm of basal and alveolar type II (ATII) cells in the transition zone from normal bronchioles to honeycomb cysts in IPF lung; these cells also express MUC5B (Figure 8) and have been termed "hyperplastic" ATII cells.
[00170] Similar to MUC5B, expression of FOXJ1 is much more broadly elevated in honeycomb and alveolar cysts (Figure 9), while expression of ARL13B is limited to basal and ATII cells in transition zones. These data suggest that the hyperplastic ATII cells in the transition zone from normal bronchioles to honeycomb cysts, which express the early marker of ciliogenesis ARL13B, are likely the result of dysregulated cell programming during the repair process following injury to bronchoalveolar duct junctions and are likely to be the origin of honeycomb cysts. These results are summarized in Table 4.
Table 4
[00171] Table 4 is a brief summary of the most important findings from the IHC analysis of non-diseased and IPF lung tissue sections for cilium gene markers.
Figure imgf000052_0001
[00172] Others have shown that airway epithelial cells give rise to ATI and ATII cells in bleomycin-induced pulmonary fibrosis using lineage tracing experiments in mice, and that distal airway stem cells proliferate into interbronchial regions of alveolar ablation following H1N1 infection and assemble into alveoli-like structures. Moreover, expression of the primary ciliary gene/adult stem cell marker prominin 1/CD133 is elevated in alveolar epithelial cells of rapidly but not slowly progressing IPF patients. Finally, a follow-up study demonstrated that BPIFB1/LPLUNC1, the most differentially expressed gene (Group II/Group I IPF = 11), co-localizes with MUC5B to the bronchiolized epithelium in the honeycomb cysts in IPF lung.
Example 4 - Correlation of Cilium Marker and Muc5B Gene Expression in Mouse Models of Fibrosis.
[00173] Muc5b-/-, WT and Scgb1a1-driven Muc5b overexpressing mice (all on pure C57BL/6 background) were exposed to bleomycin. Bleomycin is an antineoplastic antibiotic that forms a complex with oxygen and metals such as Fe2+, leading to the production of oxygen radicals, double-stranded DNA breaks, and ultimately cell death. Wild type C57BL/6 mice intratracheally instilled with bleomycin (1-2 U/kg) develop significant fibrosis after 2-3 weeks compared to saline controls, as determined by increased mortality, decreased lung static compliance, and increased collagen accumulation and lung hydroxyproline levels. Muc5b-/- mice are protected, while mice overexpressing Muc5b in the airway are more prone to fibrosis. Importantly, the magnitude of misregulated ARL13B staining correlated with Muc5b status (Figure 10).
[00174] While multiple embodiments are disclosed, still other embodiments of the present disclosure will become apparent to those skilled in the art from the detailed description provided in this disclosure. As will be apparent, the disclosure is capable of modifications in various obvious aspects, all without departing from the spirit and scope of the present disclosure. Accordingly, the detailed description is to be regarded as illustrative in nature and not restrictive.
[00175] Although the present disclosure has been described with a certain degree of particularity, it is understood the disclosure has been made by way of example, and changes in detail or structure may be made without departing from the spirit of the disclosure as defined in the appended claims.

Claims

We claim:
1. A composition for diagnosing or classifying a lung disease comprising:
one or more nucleic acids derived from expressed sequences obtained from a test sample; and
diagnostic nucleic acids comprising one or more of all or a part of sequences SEQ ID NOs:1-197.
2. The composition of claim 1, wherein the test sample nucleic acids and/or the diagnostic nucleic acids are labelled.
3. The composition of any of claims 1-2, wherein the test sample is derived from lung tissue.
4. A device for diagnosing or classifying a lung disease comprising:
the composition of claims 2 or 3; and
a device for quantitating the label.
5. A composition for classifying a lung disease as IPF subtype I or subtype II, comprising:
at least one diagnostic gene selected from the group consisting of SEQ ID NOs:1-197, or a fragment thereof.
6. The composition of claim 5, comprising the diagnostic genes of SEQ ID NOs:1-197.
7. A method of diagnosing or classifying a lung disease comprising:
collecting a lung tissue sample;
processing the tissue sample;
purifying expressed sequences from the tissue sample;
determining the abundance of one or more expressed sequences in the tissue sample; and
classifying the lung disease based on the abundance of one or more expressed sequences.
8. The method of claim 7, wherein the one or more expressed sequences have nucleotide sequences identical or homologous to all or a part of one or more of sequences SEQ ID NOs:1-197.
9. The method of any of claims 7-8, wherein the expressed sequences are quantitated by real-time PCR, quantitative PCR, or gene chip technology.
10. The method of any of claims 7-9, wherein the lung disease is classified as IPF subtype II if the abundance of the expressed sequence is higher than abundance of a reference gene, than the expressed sequence in a tissue sample from IPF subtype I lung tissue, or than the expressed sequence in a tissue sample from a non-diseased lung tissue.
11. The method of any of claims 7-10, wherein the one or more expressed sequence is quantitated by Affymetrix gene chip technology, and wherein the lung disease is classified as subtype II if about 300 ng of total processed RNA from the tissue sample results in a mean log(2) intensity of the expressed sequence between 5.9 and 8.3.
12. The method of any of claims 7-10, wherein the one or more expressed sequence is quantitated by Affymetrix gene chip technology, and wherein the lung disease is classified as subtype I if about 300 ng of total processed RNA from the tissue sample results in a mean log(2) intensity of the expressed sequence less than about 5.9.
13. A method of treating IPF comprising:
providing a lung tissue sample;
obtaining expressed sequences from the sample;
detecting the abundance of one or more expressed sequences having nucleotide sequences identical or homologous to all or a part of one or more of sequences SEQ ID NOs:1-197;
classifying the lung tissue sample as belonging to IPF subtype I or IPF subtype II.
14. A method of treating IPF, comprising:
generating a test sample comprising RNA obtained from lung tissue of a patient;
measuring the expression level of mRNA in the test sample;
comparing the expression level of the mRNA of the test sample to an mRNA signature for classifying IPF subtype I; and
administering therapy to the patient when the expression level of the mRNA of the test sample is less than the expression level of the corresponding mRNA in the signature.
PCT/US2014/041129 2013-06-05 2014-06-05 Molecular phenotyping of idiopathic interstitial pneumonia identifies two subtypes of idiopathic pulmonary fibrosis WO2014197713A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201361831379P 2013-06-05 2013-06-05
US61/831,379 2013-06-05

Publications (2)

Publication Number Publication Date
WO2014197713A2 true WO2014197713A2 (en) 2014-12-11
WO2014197713A3 WO2014197713A3 (en) 2015-02-12

Family

ID=52008751

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2014/041129 WO2014197713A2 (en) 2013-06-05 2014-06-05 Molecular phenotyping of idiopathic interstitial pneumonia identifies two subtypes of idiopathic pulmonary fibrosis

Country Status (1)

Country Link
WO (1) WO2014197713A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114694748A (en) * 2022-02-22 2022-07-01 中国人民解放军军事科学院军事医学研究院 Proteomics molecular typing method based on prognosis information and reinforcement learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007047796A2 (en) * 2005-10-17 2007-04-26 Institute For Systems Biology Tissue-and serum-derived glycoproteins and methods of their use
WO2009073167A2 (en) * 2007-12-03 2009-06-11 The Government Of The United States Of America As Represented By The Secretary Of The Department Of Health And Human Services Identification and diagnosis of pulmonary fibrosis using mucin genes, and related methods and compositions

Also Published As

Publication number Publication date
WO2014197713A3 (en) 2015-02-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14807577

Country of ref document: EP

Kind code of ref document: A2

122 Ep: pct application non-entry in european phase

Ref document number: 14807577

Country of ref document: EP

Kind code of ref document: A2