WO2014121180A1

WO2014121180A1 - Genetic variants in interstitial lung disease subjects

Info

Publication number: WO2014121180A1
Application number: PCT/US2014/014395
Authority: WO
Inventors: Imre Noth; Joe Garcia; Naftali Kaminski
Original assignee: The University Of Chicago
Priority date: 2013-02-01
Filing date: 2014-02-03
Publication date: 2014-08-07

Abstract

Disclosed are methods and kits for diagnosing or predicting risk for developing interstitial pulmonary fibrosis or predicting survival of individuals with interstitial pulmonary fibrosis.

Description

GENETIC VARIANTS IN INTERSTITIAL LUNG DISEASE SUBJECTS

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

Not applicable.

CROSS-REFERENCE TO RELATED APPLICATIONS

This PCT application claims the benefit of US Provisional Application No. 61/759,820, filed February 1, 2013, which is incorporated by reference herein.

INTRODUCTION

Idiopathic Pulmonary Fibrosis (IPF) is a low prevalence, devastating disease of unknown etiology characterized by an interstitial fibrotic process and high mortality. The course of disease is heterogeneous with a 2-5 year median survival from diagnosis. To date, lung transplantation remains the only successful treatment option, while immunosuppression regimens were recently demonstrated as harmful. Therefore, identifying genetic variants associated with susceptibility to IPF and alleles involved in the heterogeneity of disease course and mortality remains a major challenge.

A common single nucleotide polymorphism (SNP) of MUC5B is present in 34-38% of non-familial IPF cases, suggesting that a genetic underpinning contributes to disease. A prior genome-wide association study (GWAS) examining approximately 250,000 SNPs in 159 IPF cases demonstrated the association of an intronic common variant in the telomerase reverse transcriptase (TERT) gene with susceptibility to IPF.¹ Mutations in TERT or telomerase RNA component (TERC) genes result in telomere shortening and are associated with both familial and non-familial IPF. Rare heterozygous variants in surfactant protein A2 (SFTPA2) and surfactant protein C (SFTPC) genes have also been implicated in familial IPF. These findings suggest that the etiology of IPF may integrate multiple genetic loci.

There is a need in the art to identify genetic variants in interstitial lung disease subjects. Provided here are methods and compositions addressing these and other needs in the art.

SUMMARY OF THE INVENTION

In certain embodiments, provided are compositions and methods for identifying genetic variants in interstitial lung disease subjects. Also provided are compositions and methods of determining whether a human subject has, or is at risk of developing, an interstitial lung disease. In certain embodiments, the methods include detecting whether the genome of the subject comprises a genetic variant of at least one of TOLLIP, SPPL2C, and MDGA2, the presence of the genetic variant indicating that the subject has or is at risk of developing the interstitial lung disease. In certain embodiments, more than one genetic variant of TOLLIP and/or SPPL2C and/or MDGA2 is detected. In certain embodiments, in addition to detecting genetic variants of TOLLIP and/or SPPL2C and/or MDGA2, the method includes detecting whether the genome of the subject includes other genetic variants diagnostic or predictive of risk for interstitial lung disease, e.g., a genetic variant of MUC5B, such as rs35705950.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 shows probability of survival over time, revealing the association with marker SNPs in the ch11p15.5 and ch17q21.31 regions in the InterMune, UChicago and UPittsburgh case series. Brown=homozygote minor; green=heterozygote; blue=homozygote major for each single nucleotide polymorphism.

Fig. 2A is a flowchart showing the approach used in a three-stage association study; Fig. 2B is a flowchart of mortality analyses by regression.

Fig. 3 is a QQ plot of the genome-wide association study (GWAS) of idiopathic pulmonary fibrosis (IPF).

Fig. 4 includes regional association plots showing the IPF-associated regions in Ch11p15.5 (Fig. 4A) and Ch17q21.31 (Fig. 4B).

Fig. 5 shows survival probability over time for people with or without H2 and with or without an SPPL2C variant.

Fig. 6A is a KM plot for TOLLIP*/MUC5B risk alleles; Fig. 6B is a KM plot by Risk Index for WPGS using all 3 genes (TOLLIP, SPPL2C & MUC5B) and categorizing into 4 groups.

Fig. 7A-7C is a list of top associated loci with susceptibility to IPF.

Fig. 8 is a table listing the sample sources and sizes used in a three-stage study.

Fig. 9 shows the characteristics of IPF patients used in the stage 1 discovery GWAS study.

Fig. 10 lists the characteristics of IPF patients by stage and availability.

Fig. 11A-11C is a list of 44 SNPs and their association p-values with susceptibility to IPF from stage 1, stage 2, and overall.

Fig. 12 shows characteristics of IPF case series for mortality analysis.

Fig. 13 is a table showing association signals with susceptibility to IPF across stages of six SNPs followed up in Stage 3.

Fig. 14 is a table listing SNP effects on mortality.

Fig. 15 provides summaries of univariate Cox analysis for mortality.

Fig. 16 provides summaries of univariate and multivariate Cox analysis for mortality.

Fig. 17 provides summaries of Kaplan-Meier survival analysis.

Fig. 18 lists predictors of survival in IPF patients identified using a univariate Cox model.

Fig. 19A-Fig. 19B list predictors of survival in IPF patients identified using a multivariate analysis of covariance.

Fig. 20 lists 30 regions identified, showing the value of aggregation and of using information in addition to protein coding SNPs; the six p values representing the highest-ranking SNPs in each region are shown in bold.

DETAILED DESCRIPTION

As described in detail below, an independent genome-wide association study (GWAS) was used to identify novel polymorphisms associated with IPF susceptibility and/or mortality. The association of two novel genetic loci and the replication of a third locus in a 3-stage association study are reported herein. These loci are also associated with mortality in case series with follow-up data.

Specifically, the results obtained identified three genetic loci and replicated the association of four novel SNPs (rs111521887, rs5743894, rs5743890, and rs17690703) in two novel loci (ch11p15.5/TOLLIP and ch17q21.31/SPPL2C), and the MUC5B promoter SNP (rs35705950) with IPF susceptibility in European-Americans through a three-stage case-control study. Another novel SNP (rs7144383) on a third genetic locus not previously known to be associated with IPF, ch14q21.23/MDGA2, was discovered to show association with IPF susceptibility, although it did not replicate in Stage 3, possibly owing to the Stage 3 sample size.

The findings reported herein provide, inter alia, for novel compositions and methods for identifying genetic variants in interstitial lung disease subjects and/or determining whether an individual has, or is at risk for developing, interstitial lung disease and/or compositions and methods for predicting prognosis, e.g., survival time or mortality, of an individual with an interstitial lung disease, for example, a fibrotic interstitial lung disease, such as IPF, or familial interstitial pneumonia. Further, the identification of genetic loci and SNPs associated with interstitial lung disease contributes to the understanding of IPF pathogenesis and provides potential targets for novel treatment paradigms.

Definitions

Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art. See, e.g., Lackie, DICTIONARY OF CELL AND MOLECULAR BIOLOGY, Elsevier (4th ed. 2007); Sambrook et al., MOLECULAR CLONING, A LABORATORY MANUAL, Cold Spring Harbor Press (Cold Spring Harbor, NY 1989). The term "a" or "an" is intended to mean "one or more." The term "comprise" and variations thereof such as "comprises" and "comprising," when preceding the recitation of a step or an element, are intended to mean that the addition of further steps or elements is optional and not excluded. The following definitions are provided to facilitate understanding of certain terms used frequently herein and are not meant to limit the scope of the present disclosure.

The term "nucleic acid" refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form, and complements thereof. "Nucleic acid" or "oligonucleotide" or "polynucleotide" or grammatical equivalents used herein means at least two nucleotides covalently linked together. Oligonucleotides are typically from about 5, 6, 7, 8, 9, 10, 12, 15, 25, 30, 40, 50 or more nucleotides in length, up to about 100 nucleotides in length. Nucleic acids and polynucleotides are polymers of any length, including longer lengths, e.g., 200, 300, 500, 1000, 2000, 3000, 5000, 7000, 10,000, etc. The term "nucleotide" typically refers to a single unit of a polynucleotide, i.e., a monomer. Nucleotides can be ribonucleotides, deoxyribonucleotides, or modified versions thereof. As used herein, a "genetic variant" refers to a mutation, single nucleotide polymorphism (SNP), deletion variant, missense variant, insertion variant, inversion, or copy number variant.

The terms "probe" or "primer" refer to one or more nucleic acid fragments whose specific hybridization to a sample can be detected. A probe or primer can be of any length depending on the particular technique it will be used for. For example, PCR primers are generally between 10 and 40 nucleotides in length, while nucleic acid probes for, e.g., a Southern blot, can be more than a hundred nucleotides in length. The probe or primer can be unlabeled or labeled as described below so that its binding to a target sequence can be detected (e.g., with a FRET donor or acceptor label). The probe or primer can be designed based on one or more particular (preselected) portions of a chromosome, e.g., one or more clones, an isolated whole chromosome or chromosome fragment, or a collection of polymerase chain reaction (PCR) amplification products. The length and complexity of the nucleic acid fixed onto the target element are not critical to the invention. One of skill can adjust these factors to provide optimum hybridization and signal production for given hybridization and detection procedures, and to provide the required resolution among different genes or genomic locations.

Probes and primers can also be immobilized on a solid surface (e.g., nitrocellulose, glass, quartz, fused silica slides), as in an array. Techniques for producing high density arrays can also be used for this purpose (see, e.g., Fodor (1991) Science 767-773; Johnston (1998) Curr. Biol. 8: R171-R174; Schummer (1997) Biotechniques 23: 1087-1092; Kern (1997) Biotechniques 23: 120-124; U.S. Patent No. 5,143,854). One of skill will recognize that the precise sequence of particular probes and primers can be modified from the target sequence to a certain degree to produce probes that are "substantially identical" or "substantially complementary to" a target sequence, but retain the ability to specifically bind to (i.e., hybridize specifically to) the same targets from which they were derived.

A probe or primer is "capable of detecting" a genetic variant if it is complementary to a region that covers or is adjacent to the genetic variant. For example, to detect a SNP, primers can be designed on either side of the SNP, and primer extension used to determine the identity of the nucleotide at the position of the SNP. In some embodiments, FRET-labeled primers are used (at least one labeled with a FRET donor and at least one labeled with a FRET acceptor) so that FRET signal will be detected only upon hybridization of both primers. In some embodiments, a probe is used in conditions such that it hybridizes only to a genetic variant, or only to a dominant sequence. For example, the probe can be designed to hybridize to a junction point of a genetic inversion, but not to a sequence that does not include the inversion.

Again, in the context of nucleic acids, the term "capable of hybridizing to" refers to a polynucleotide sequence that forms non-covalent, Watson-Crick bonds with a complementary sequence. One of skill will understand that the percent complementarity need not be 100% for hybridization to occur, depending on the length of the polynucleotides, length of the complementary region, and stringency of the conditions. For example, a polynucleotide (e.g., primer or probe) can be capable of hybridizing (binding) to a polynucleotide having 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% complementarity over the stretch of the complementary region. Stringency can be increased by reducing the length of the complementary region, reducing the G-C content of the complementary region, increasing temperature and/or detergent levels, varying salt levels and pH, etc. as known in the art. In some embodiments, a polynucleotide is capable of hybridizing to a complementary sequence in standard PCR annealing conditions. In the context of detecting genetic variants, the tolerated percent complementarity or number of mismatches will vary depending on the technique used for detection (see below).
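As an illustration of percent complementarity, the following Python sketch counts Watson-Crick matches between a probe and an equal-length target region, pairing the two strands in antiparallel orientation. The function name `percent_complementarity` and the example sequences are hypothetical and are not taken from the disclosure; gaps, modified bases, and non-canonical pairing are ignored.

```python
# Hypothetical helper: percent Watson-Crick complementarity between a probe
# and an equal-length target region, both written 5'->3'.
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(probe: str, target: str) -> float:
    probe, target = probe.upper(), target.upper()
    if not probe or len(probe) != len(target):
        raise ValueError("probe and target must be non-empty and equal in length")
    # The strands pair antiparallel, so each probe base is compared with the
    # target read in reverse.
    matches = sum(
        1 for p, t in zip(probe, reversed(target)) if WATSON_CRICK.get(p) == t
    )
    return 100.0 * matches / len(probe)


# Illustrative sequences only (assumptions, not real probe designs).
full_match = percent_complementarity("AAGGC", "GCCTT")
one_mismatch = percent_complementarity("AAGGC", "GCCTA")
```

Whether a given percentage is sufficient for hybridization still depends on length and stringency, as stated above; the sketch only quantifies the match fraction.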

In the context of nucleic acids, the term "amplification product" refers to a polynucleotide that results from an amplification reaction, e.g., PCR and variations thereof, rtPCR, strand displacement reaction (SDR), ligase chain reaction (LCR), transcription mediated amplification (TMA), or Qbeta replication. A thermally stable polymerase, e.g., Taq, can be used to avoid repeated addition of polymerase throughout amplification procedures that involve cyclic or extreme temperatures (e.g., PCR and its variants).

The terms "label," "detectable moiety," "detectable agent," and like terms refer to a composition detectable by spectroscopic, photochemical, biochemical, immunochemical, chemical, or other physical means. For example, useful labels include fluorescent dyes, luminescent agents, radioisotopes (e.g., ³²P, ³H), electron-dense reagents, enzymes, biotin, digoxigenin, or haptens and proteins or other entities which can be made detectable, e.g., by affinity. Any method known in the art for conjugating a nucleic acid or other biomolecule to a label may be employed, e.g., using methods described in Hermanson, Bioconjugate Techniques 1996, Academic Press, Inc., San Diego. The term "tag" can be used synonymously with the term "label," but generally refers to an affinity-based moiety, e.g., a "His tag" for purification, or a "streptavidin tag" that interacts with biotin.

A "labeled" molecule (e.g., nucleic acid, protein, or antibody) is one that is bound, either covalently, through a linker or a chemical bond, or noncovalently, through ionic, van der Waals, electrostatic, or hydrogen bonds to a label such that the presence of the molecule may be detected by detecting the presence of the label bound to the molecule.

Förster resonance energy transfer (abbreviated FRET), also known as fluorescence resonance energy transfer, is a mechanism describing energy transfer between two chromophores. A donor chromophore (FRET donor), initially in its electronic excited state, can transfer energy to an acceptor chromophore (FRET acceptor), which is typically less than 10 nm away, through nonradiative dipole-dipole coupling. The energy transferred to the FRET acceptor is detected as an emission of light (energy) when the FRET donor and acceptor are in proximity. A "FRET signal" is thus the signal that is generated by the emission of light from the acceptor. The efficiency of Förster resonance energy transfer between a donor and an acceptor dye separated by a distance of R is given by E = 1/[1 + (R/R₀)⁶] with R₀ being the Förster radius of the donor-acceptor pair at which E = ½. R₀ is about 50-60 Å for some commonly used dye pairs (e.g., Cy3-Cy5). FRET signal varies as the distance to the 6th power. If the donor-acceptor pair is positioned around R₀, a small change in distance ranging from 1 Å to 50 Å can be measured with the greatest signal to noise. With current technology, 1 ms or faster parallel imaging of many single FRET pairs is achievable.
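The efficiency relation E = 1/[1 + (R/R₀)⁶] can be evaluated directly. The Python sketch below is illustrative only: the function name `fret_efficiency` is hypothetical, and the Förster radius of 55 Å is an assumed value chosen from within the 50-60 Å range mentioned above.

```python
def fret_efficiency(distance: float, forster_radius: float) -> float:
    """Return E = 1 / (1 + (R / R0)**6).

    distance and forster_radius must use the same length unit (e.g., angstroms).
    """
    if distance < 0 or forster_radius <= 0:
        raise ValueError("distance must be >= 0 and forster_radius must be > 0")
    return 1.0 / (1.0 + (distance / forster_radius) ** 6)


# Assumed Forster radius in angstroms (illustrative value, not from the source).
R0 = 55.0

# Efficiency falls steeply as the donor-acceptor distance passes through R0.
for r in (30.0, 55.0, 80.0):
    print(f"R = {r:5.1f} A  ->  E = {fret_efficiency(r, R0):.3f}")
```

At R = R₀ the function returns exactly 0.5, matching the definition of the Förster radius; the steep change around R₀ is why distance changes near that value are measured with the best signal to noise.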

A "FRET pair" refers to a FRET donor and FRET acceptor pair that are capable of FRET detection.

The terms "fluorophore," "dye," "fluorescent molecule," "fluorescent dye," "FRET dye" and like terms are used synonymously herein unless otherwise indicated.

"Subject," "patient," "individual" and like terms are used interchangeably and refer to, except where indicated, humans and non-human animals. The term does not necessarily indicate that the subject has been diagnosed with a particular disease, but typically refers to an individual under medical supervision. A patient can be an individual that is seeking diagnosis, treatment, monitoring, adjustment or modification of an existing therapeutic regimen, etc.

As used herein, a "sample" refers to a biological sample obtained from a subject. Samples include material that is processed prior to carrying out testing, e.g., genomic DNA separated or purified from other cellular and non-cellular debris. In the context of the present disclosure, the sample includes genomic DNA from the subject, e.g., cheek swab, blood sample, mucosal sample, buccal swab, skin sample, hair, etc.

A "control" sample or value refers to a sample that serves as a reference, usually a known reference, for comparison to a test sample. For example, a test sample can be taken from a test condition, e.g., a sample from an individual of unknown disease status, and compared to samples from individuals with known conditions, e.g., healthy, or lacking a given genetic variation (negative control), or with pulmonary disease or having a given genetic variation (positive control). A control can also represent an average value gathered from a number of tests or results. One of skill in the art will recognize that controls can be designed for assessment of any number of parameters. For example, a control can be devised to compare signal strength in given conditions, e.g., in the presence of a test probe or primer. One of skill in the art will understand which controls are valuable in a given situation and be able to analyze data based on comparisons to control values. Controls are also valuable for determining the significance of data. For example, if values for a given parameter are widely variant in controls, variation in test samples will not be considered as significant.

Diagnosis, prognosis, and treatment of interstitial lung disease

Provided herein are compositions and methods for determining whether a human subject has or is at risk of developing an interstitial lung disease and/or prognosing interstitial lung disease. In certain embodiments, the methods of the invention may be used in conjunction with any other diagnostic or prognostic criterion or method, including, but not limited to, currently known criteria or methods. In certain embodiments, the method for determining whether a human subject has or is at risk of developing an interstitial lung disease includes detecting whether the genome of the subject comprises a genetic variant of at least one of TOLLIP, SPPL2C, and MDGA2, the presence of the genetic variant indicating that the subject has or is at risk of developing the interstitial lung disease. In certain embodiments, more than one genetic variant of TOLLIP and/or SPPL2C and/or MDGA2 is detected. In certain embodiments, in addition to detecting genetic variants of TOLLIP and/or SPPL2C and/or MDGA2, the method includes detecting whether the genome of the subject includes other genetic variants diagnostic or predictive of risk for interstitial lung disease, e.g., a genetic variant of MUC5B, such as rs35705950. In some embodiments, the method for determining whether a human subject has or is at risk of developing an interstitial lung disease includes detecting the presence or absence of one or more SNPs selected from rs111521887, rs5743894, rs5743890, rs17690703, and rs7144383. The presence or absence of each SNP may be detected alone or in combination with each other, i.e., the methods of the invention may include detection of one, two, three, four, or five of rs111521887, rs5743894, rs5743890, rs17690703, and rs7144383 in any possible combination.

In certain embodiments, the method includes detecting the presence or absence of from one to five of rs111521887, rs5743894, rs5743890, rs17690703, and rs7144383 in any combination and the presence or absence of any other SNP associated with an interstitial lung disease or its prognosis, including, without limitation, the MUC5B SNP rs35705950.

In some embodiments, the method for determining whether a human subject has or is at risk of developing an interstitial lung disease includes detecting the presence of rs111521887 (e.g., G or other non-dominant allele). In some embodiments, the method for determining whether a human subject has or is at risk of developing an interstitial lung disease includes detecting the presence of rs5743894 (e.g., G or other non-dominant allele). In some embodiments, the method for determining whether a human subject has or is at risk of developing an interstitial lung disease includes detecting the presence of rs5743890 (e.g., G or other non-dominant allele). In some embodiments, the method for determining whether a human subject has or is at risk of developing an interstitial lung disease includes detecting the presence of rs17690703 (e.g., T or other non-dominant allele). In some embodiments, the method for determining whether a human subject has or is at risk of developing an interstitial lung disease includes detecting the presence of rs7144383 (e.g., G or other non-dominant allele).

In certain embodiments, the method for determining whether a human subject has or is at risk of developing an interstitial lung disease includes detecting one or more genetic variants listed in Fig. 7. The one or more genetic variants may be detected alone or in any possible combination of from two to 52 of the listed genetic variants. If the method includes detecting rs35705950, then the method includes detecting at least one additional genetic variant from the remaining 51 genetic variants listed in Fig. 7.

In certain embodiments, the method includes prognosing an interstitial lung disease in a human subject. In certain embodiments, the method comprises detecting whether the genome of the subject comprises a genetic variant of TOLLIP and/or SPPL2C prognostic of increased or decreased survival. In certain embodiments, the methods include detecting whether the genome of the subject comprises a genetic variant of MUC5B and whether the genome comprises a genetic variant of TOLLIP and/or SPPL2C prognostic of increased or decreased survival. In certain embodiments, the method includes detecting whether the genome comprises rs17690703 and/or rs5743890, each of which is predictive of decreased survival. In certain embodiments, the method detects whether the genome comprises rs35705950, which is predictive of increased survival, and rs17690703 and/or rs5743890. In some embodiments, the method comprises detecting rs17690703 (e.g., T or other non-dominant allele), and prognosing reduced survival time for the subject. In some embodiments, the method comprises detecting rs5743890 (e.g., G or other non-dominant allele), and prognosing reduced survival time for the subject.

In certain embodiments, the method for prognosing the interstitial lung disease in a human subject includes detecting one or more genetic variants listed in Fig. 7. The one or more genetic variants may be detected alone or in any possible combination of from two to 52 of the listed genetic variants. If the method includes detecting rs35705950, then the method includes detecting at least one additional genetic variant from the remaining 51 genetic variants listed in Fig. 7.

The present invention provides methods for detecting the presence or absence of at least one genetic variant in a human subject. In certain embodiments, the method includes detecting the presence or absence of at least one genetic variant of at least one of TOLLIP, SPPL2C, and MDGA2 in a sample from the subject. In certain embodiments, more than one genetic variant of TOLLIP and/or SPPL2C and/or MDGA2 is detected. In certain embodiments, in addition to detecting genetic variants of TOLLIP and/or SPPL2C and/or MDGA2, the method includes detecting a genetic variant of MUC5B, such as rs35705950.

In certain embodiments, the method for detecting the presence or absence of at least one genetic variant in a human subject includes detecting the presence or absence of at least one genetic variant of the genetic variants listed in Fig. 7. The one or more genetic variants may be detected alone or in any possible combination of from two to 52 of the genetic variants listed in Fig. 7. If the method includes detecting rs35705950, then the method includes detecting at least one additional genetic variant from the remaining 51 genetic variants listed in Fig. 7. In certain embodiments, the at least one genetic variant includes one or more of a single nucleotide polymorphism selected from the group consisting of rs111521887, rs5743894, rs5743890, rs17690703, and rs7144383 in any possible combination.

In other embodiments, the method for detecting the presence or absence of at least one genetic variant in a human subject includes detecting the presence or absence of heterozygosity in at least one genetic variant of the genetic variants listed in Fig. 7. Alternatively, the method for detecting the presence or absence of at least one genetic variant in a human subject includes detecting the presence or absence of homozygosity in at least one genetic variant of the genetic variants listed in Fig. 7. The heterozygosity or homozygosity of the one or more genetic variants may be detected alone or in any possible combination of from two to 52 of the genetic variants listed in Fig. 7, wherein the genetic variant may be the same or different in the individual chromosomes present in the diploid human subject. If the method includes detecting heterozygosity or homozygosity of rs35705950, then the method includes detecting heterozygosity or homozygosity of at least one additional genetic variant from the remaining 51 genetic variants listed in Fig. 7. In certain embodiments, the heterozygosity or homozygosity of at least one genetic variant includes the heterozygosity or homozygosity of one or more of a single nucleotide polymorphism selected from the group consisting of rs111521887, rs5743894, rs5743890, rs17690703, and rs7144383 in any possible combination.

Also provided is a method for testing for interstitial lung disease in a human subject that involves detecting the level of TOLLIP gene expression in a sample from the subject, a low level of TOLLIP gene expression relative to a control being indicative of interstitial lung disease. The level of gene expression may be detected by measuring, directly or indirectly, TOLLIP mRNA or by measuring Tollip protein by any suitable method, several of which are known in the art. The control may include, for example, a sample from a human that does not have interstitial lung disease or a value or set of values, for example, a normal range, derived from several humans that do not have interstitial lung disease. A low level of TOLLIP gene expression relative to a control (standard control) indicative of interstitial lung disease is a level that is less than about 50% of the control.
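The comparison against a control can be sketched as a simple ratio test. The Python example below is a minimal illustration under stated assumptions: the function name `is_low_expression` is hypothetical, "about 50%" is simplified to a fixed cutoff of 0.5, and the expression values are arbitrary units that would in practice come from whichever mRNA or protein assay is used.

```python
def is_low_expression(sample_level: float, control_level: float,
                      cutoff_fraction: float = 0.5) -> bool:
    """Return True when the sample level is below cutoff_fraction of the control.

    The 0.5 default simplifies "less than about 50% of the control" to a hard
    threshold; this is an assumption made for illustration only.
    """
    if control_level <= 0:
        raise ValueError("control_level must be positive")
    if sample_level < 0:
        raise ValueError("sample_level must be non-negative")
    return (sample_level / control_level) < cutoff_fraction


# Arbitrary illustrative values (not measured data).
print(is_low_expression(sample_level=0.4, control_level=1.0))  # below cutoff
print(is_low_expression(sample_level=0.6, control_level=1.0))  # not below cutoff
```

The control level could be a single reference sample or a summary value from several subjects without interstitial lung disease, consistent with the control definitions given earlier.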

In certain embodiments, the present invention includes a method of treating a human subject having an interstitial lung disease comprising detecting the level of TOLLIP expression in a sample from the subject, and if the subject has a low level of TOLLIP expression relative to a control (standard control), administering to the subject an amount of a Tollip agonist, Tollip or a genetic construct expressing TOLLIP effective to treat the interstitial lung disease. An amount effective to treat the interstitial lung disease is an amount effective to delay onset, reduce frequency and/or severity of one or more symptoms, ameliorate one or more symptoms, and/or improve comfort and/or some function of the subject, e.g., respiratory function, relative to an untreated second subject or pool of subjects, or relative to the same subject prior to treatment or after cessation of treatment.

Methods of detecting a genetic variant

The methods of the invention are not limited to any particular way of detecting the presence or absence of a genetic variant (e.g., a SNP) and can employ any suitable method to detect the presence or absence of one or more variants; numerous detection methods are known in the art.

Dynamic allele-specific hybridization (DASH) can be used to detect a genetic variant. DASH genotyping takes advantage of the differences in the melting temperature in DNA that result from the instability of mismatched base pairs. The process can be largely automated and encompasses a few simple principles. Typically, the target genomic segment is amplified and separated from non-target sequence, e.g., through use of a biotinylated primer and chromatography. A probe that is specific for the particular allele is added to the amplification product. The probe can be designed to hybridize specifically to a variant sequence or to the dominant allelic sequence. The probe can be either labeled with or added in the presence of a molecule that fluoresces when bound to double-stranded DNA. The signal intensity is then measured as temperature is increased until the Tm can be determined. A non-matching sequence (either genetic variant or dominant allelic sequence, depending on probe design) will result in a lower than expected Tm.

DASH genotyping relies on a quantifiable change in Tm, and is thus capable of measuring many types of mutations, not just SNPs. Other benefits of DASH include its ability to work with label free probes and its simple design and performance conditions.
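One common way to read a Tm from a melting curve is to take the temperature at which fluorescence drops fastest. The Python sketch below applies that idea to made-up data; it is not the procedure of any particular DASH instrument, and the function name `estimate_tm` and all readings are hypothetical.

```python
from typing import Sequence


def estimate_tm(temperatures: Sequence[float],
                fluorescence: Sequence[float]) -> float:
    """Estimate Tm as the midpoint of the interval with the steepest signal drop."""
    if len(temperatures) != len(fluorescence) or len(temperatures) < 2:
        raise ValueError("need two or more paired readings")
    best_drop_rate = None
    best_tm = None
    for i in range(len(temperatures) - 1):
        dt = temperatures[i + 1] - temperatures[i]
        if dt <= 0:
            raise ValueError("temperatures must be strictly increasing")
        # Negative derivative of fluorescence with respect to temperature.
        drop_rate = -(fluorescence[i + 1] - fluorescence[i]) / dt
        if best_drop_rate is None or drop_rate > best_drop_rate:
            best_drop_rate = drop_rate
            best_tm = (temperatures[i] + temperatures[i + 1]) / 2
    return best_tm


# Made-up readings in degrees C and arbitrary fluorescence units.
temps = [50, 55, 60, 65, 70, 75]
matched_probe = [1.00, 0.98, 0.95, 0.60, 0.10, 0.05]
mismatched_probe = [1.00, 0.95, 0.50, 0.10, 0.05, 0.04]

tm_matched = estimate_tm(temps, matched_probe)
tm_mismatched = estimate_tm(temps, mismatched_probe)
# A mismatched duplex is expected to melt at a lower temperature.
print(tm_matched, tm_mismatched)
```

With real data the curve would usually be smoothed before differentiation; that step is omitted here to keep the sketch short.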

Molecular beacons can also be used to detect a genetic variant. This method makes use of a specifically engineered single-stranded oligonucleotide probe. The oligonucleotide is designed such that there are complementary regions at each end and a probe sequence located in between. This design allows the probe to take on a hairpin, or stem-loop, structure in its natural, isolated state. Attached to one end of the probe is a fluorophore and to the other end a fluorescence quencher. Because of the stem-loop structure of the probe, the fluorophore is in close proximity to the quencher, thus preventing the molecule from emitting any fluorescence. The molecule is also engineered such that only the probe sequence is complementary to the targeted genomic DNA sequence.

If the probe sequence of the molecular beacon encounters its target genomic DNA sequence during the assay, it will anneal and hybridize. Because of the length of the probe sequence, the hairpin segment of the probe will be denatured in favor of forming a longer, more stable probe-target hybrid. This conformational change permits the fluorophore and quencher to be free of their tight proximity due to the hairpin association, allowing the molecule to fluoresce.

If, on the other hand, the probe sequence encounters a target sequence with as little as one non-complementary nucleotide, the molecular beacon will preferentially stay in its natural hairpin state and no fluorescence will be observed, as the fluorophore remains quenched. The unique design of these molecular beacons allows for a simple diagnostic assay to identify SNPs at a given location. If a molecular beacon is designed to match a wild-type allele and another to match a mutant of the allele, the two can be used to identify the genotype of an individual. If only the first probe's fluorophore wavelength is detected during the assay then the individual is homozygous for the wild type. If only the second probe's wavelength is detected then the individual is homozygous for the mutant allele. Finally, if both wavelengths are detected, then both molecular beacons must be hybridizing to their complements and thus the individual must contain both alleles and be heterozygous.
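The two-beacon decision rule described above can be written as a small lookup. In the Python sketch below, the function name `call_genotype`, the signal values, and the detection threshold are all hypothetical; a real assay would set the threshold from no-template and known-genotype controls.

```python
def call_genotype(wild_type_signal: float, mutant_signal: float,
                  threshold: float) -> str:
    """Call a genotype from the fluorescence of two allele-specific beacons.

    A beacon counts as detected when its signal meets or exceeds the threshold.
    """
    wild_type_detected = wild_type_signal >= threshold
    mutant_detected = mutant_signal >= threshold
    if wild_type_detected and mutant_detected:
        return "heterozygous"
    if wild_type_detected:
        return "homozygous wild-type"
    if mutant_detected:
        return "homozygous mutant"
    # Neither beacon fluoresced: the assay failed or the target is absent.
    return "no call"


# Illustrative signals in arbitrary fluorescence units (assumed values).
print(call_genotype(wild_type_signal=950.0, mutant_signal=40.0, threshold=200.0))
print(call_genotype(wild_type_signal=480.0, mutant_signal=510.0, threshold=200.0))
```

The "no call" branch is an addition for robustness; the text above only describes the three genotype outcomes.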

A microarray can also be used to detect genetic variants. Hundreds of thousands of probes can be arrayed on a small chip, allowing for many genetic variants or SNPs to be interrogated simultaneously. Because SNP alleles only differ in one nucleotide and because it is difficult to achieve optimal hybridization conditions for all probes on the array, the target DNA has the potential to hybridize to mismatched probes. This can be addressed by using several redundant probes to interrogate each SNP. Probes can be designed to have the SNP site in several different locations as well as containing mismatches to the SNP allele. By comparing the differential amount of hybridization of the target DNA to each of these redundant probes, it is possible to determine specific homozygous and heterozygous alleles.

Restriction fragment length polymorphism (RFLP) can be used to detect genetic variants and SNPs. RFLP makes use of the many different restriction endonucleases and their high affinity to unique and specific restriction sites. By performing a digestion on a genomic sample and determining fragment lengths through a gel assay it is possible to ascertain whether or not the enzymes cut the expected restriction sites. A failure to cut the genomic sample results in an identifiably larger than expected fragment implying that there is a mutation at the point of the restriction site which is rendering it protected from nuclease activity.
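As a non-limiting sketch of the RFLP logic above, the following computes the fragment lengths expected after digestion of a linear sequence. The function name and the toy sequences are hypothetical; the example site GAATTC with a cut one base into the site corresponds to EcoRI, and only the top strand is modeled.

```python
def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting `sequence` at every `site`.

    cut_offset is the number of bases into the recognition site at which
    the enzyme cuts the top strand (1 for EcoRI, G^AATTC).
    """
    lengths = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        lengths.append(cut - start)   # fragment ending at this cut
        start = cut
        pos = sequence.find(site, pos + 1)
    lengths.append(len(sequence) - start)  # final fragment
    return lengths


# Toy alleles: the variant destroys the restriction site (GAATTC -> GAGTTC).
intact_allele = "AAAA" + "GAATTC" + "TTTT"
variant_allele = "AAAA" + "GAGTTC" + "TTTT"
```

An intact site yields two shorter fragments, whereas the variant allele yields a single, larger uncut fragment, which is the difference read out on the gel.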

PCR- and amplification-based methods can be used to detect genetic variants. For example, tetra-primer PCR employs two pairs of primers to amplify two alleles in one PCR reaction. The primers are designed such that the two primer pairs overlap at a SNP location but each matches perfectly to only one of the possible alleles. As a result, if a given allele is present in the PCR reaction, the primer pair specific to that allele will produce product but not the alternative allele with a different allelic sequence. The two primer pairs can be designed such that their PCR products are of a significantly different length allowing for easily distinguishable bands by gel electrophoresis, or such that they are differently labeled.

Primer extension can also be used to detect genetic variants. Primer extension first involves the hybridization of a probe to the bases immediately upstream of the SNP nucleotide followed by a 'mini-sequencing' reaction, in which DNA polymerase extends the hybridized primer by adding a base that is complementary to the SNP nucleotide. The incorporated base that is detected determines the presence or absence of the SNP allele. Because primer extension is based on the highly accurate DNA polymerase enzyme, the method is generally very reliable. Primer extension is able to genotype most SNPs under very similar reaction conditions making it also highly flexible. The primer extension method is used in a number of assay formats, and can be detected using e.g., fluorescent labels or mass spectrometry.

Primer extension can involve incorporation of either fluorescently labeled dideoxynucleotides (ddNTP) or fluorescently labeled deoxynucleotides (dNTP). With ddNTPs, probes hybridize to the target DNA immediately upstream of the SNP nucleotide, and a single ddNTP complementary to the SNP allele is added to the 3' end of the probe (the missing 3'-hydroxyl in the dideoxynucleotide prevents further nucleotides from being added). Each ddNTP is labeled with a different fluorescent signal, allowing for the detection of all four alleles in the same reaction. With dNTPs, allele-specific probes have 3' bases which are complementary to each of the SNP alleles being interrogated. If the target DNA contains an allele complementary to the 3' base of the probe, the target DNA will completely hybridize to the probe, allowing DNA polymerase to extend from the 3' end of the probe. This is detected by the incorporation of the fluorescently labeled dNTPs onto the end of the probe. If the target DNA does not contain an allele complementary to the probe's 3' base, the target DNA will produce a mismatch at the 3' end of the probe and DNA polymerase will not be able to extend from the 3' end of the probe.

The iPLEX® SNP genotyping method takes a slightly different approach, and relies on detection by mass spectrometer. Extension probes are designed in such a way that many different SNP assays can be amplified and analyzed in a PCR cocktail. The extension reaction uses ddNTPs as above, but the detection of the SNP allele is dependent on the actual mass of the extension product and not on a fluorescent molecule. This method is for low to medium high throughput, and is not intended for whole genome scanning.

Primer extension methods are, however, amenable to high throughput analysis. Primer extension probes can be arrayed on slides allowing for many SNPs to be genotyped at once. Broadly referred to as arrayed primer extension (APEX), this technology has several benefits over methods based on differential hybridization of probes. Comparatively, APEX methods have greater discriminating power than methods using differential hybridization, as it is often impossible to obtain the optimal hybridization conditions for the thousands of probes on DNA microarrays (usually this is addressed by having highly redundant probes).

Oligonucleotide ligation assays can also be used to detect genetic variants. DNA ligase catalyzes the ligation of the 3' end of a DNA fragment to the 5' end of a directly adjacent DNA fragment. This mechanism can be used to interrogate a SNP by hybridizing two probes directly over the SNP polymorphic site, whereby ligation can occur if the probes are complementary to the target DNA. For example, two probes can be designed: an allele-specific probe which hybridizes to the target DNA so that its 3' base is situated directly over the SNP nucleotide, and a second probe that hybridizes to the template upstream (downstream in the complementary strand) of the SNP polymorphic site, providing a 5' end for the ligation reaction. If the allele-specific probe matches the target DNA, it will fully hybridize to the target DNA and ligation can occur. Ligation does not generally occur in the presence of a mismatched 3' base. Ligated or unligated products can be detected by gel electrophoresis, MALDI-TOF mass spectrometry or by capillary electrophoresis.

The 5'-nuclease activity of Taq DNA polymerase can be used for detecting genetic variants. The assay is performed concurrently with a PCR reaction and the results can be read in real-time. The assay requires forward and reverse PCR primers that will amplify a region that includes the SNP polymorphic site. Allele discrimination is achieved using FRET, and one or two allele-specific probes that hybridize to the SNP polymorphic site. The probes have a fluorophore linked to their 5' end and a quencher molecule linked to their 3' end. While the probe is intact, the quencher will remain in close proximity to the fluorophore, eliminating the fluorophore's signal. During the PCR amplification step, if the allele-specific probe is perfectly complementary to the SNP allele, it will bind to the target DNA strand and then get degraded by 5'-nuclease activity of the Taq polymerase as it extends the DNA from the PCR primers. The degradation of the probe results in the separation of the fluorophore from the quencher molecule, generating a detectable signal. If the allele-specific probe is not perfectly complementary, it will have lower melting temperature and not bind as efficiently. This prevents the nuclease from acting on the probe.

Fluorescence resonance energy transfer (FRET) detection can be used for detection in primer extension and ligation reactions where the two labels are brought into close proximity to each other. It can also be used in the 5'-nuclease reaction, the molecular beacon reaction, and the invasive cleavage reactions where the neighboring donor/acceptor pair is separated by cleavage or disruption of the stem-loop structure that holds them together. FRET occurs when two conditions are met. First, the emission spectrum of the fluorescent donor dye must overlap with the excitation wavelength of the acceptor dye. Second, the two dyes must be in close proximity to each other because energy transfer drops off quickly with distance. The proximity requirement is what makes FRET a good detection method for a number of allelic discrimination mechanisms.
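The steep distance dependence noted above is commonly modeled by the Förster relation, E = 1 / (1 + (r/R0)^6), where r is the donor-acceptor distance and R0 is the distance at which transfer is 50% efficient. The relation is standard in the art but is not recited in this description; the following minimal sketch, with a hypothetical function name and illustrative distances, is offered under that assumption.

```python
def fret_efficiency(distance, forster_radius):
    """Forster transfer efficiency for a donor/acceptor pair.

    distance       -- donor-acceptor separation r (same units as R0)
    forster_radius -- R0, the separation giving 50% transfer efficiency
    """
    if distance < 0 or forster_radius <= 0:
        raise ValueError("distance must be >= 0 and forster_radius > 0")
    return 1.0 / (1.0 + (distance / forster_radius) ** 6)


# Illustrative values only: efficiency at half, equal to, and twice R0.
examples = [fret_efficiency(r, 5.0) for r in (2.5, 5.0, 10.0)]
```

Doubling the separation from R0 to 2·R0 drops the efficiency from 1/2 to 1/65, which is why cleavage or opening of a stem-loop produces a readily detectable change in signal.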

A variety of dyes can be used for FRET, and are known in the art. The most common ones are fluorescein, cyanine dyes (Cy3 to Cy7), rhodamine dyes (e.g. rhodamine 6G), the Alexa series of dyes (Alexa 405 to Alexa 730), and the Atto series of dyes (Atto-Tec GmbH). Some of these dyes have been used in FRET networks (with multiple donors and acceptors). Optics for imaging all of these require detection from UV to near IR (e.g. Alexa 405 to Cy7). The Alexa series of dyes from Invitrogen covers the whole spectral range. They are very bright and photostable.

Example dye pairs for FRET labeling include Alexa-405/Alexa-488, Alexa-488/Alexa-546, Alexa-532/Alexa-594, Alexa-594/Alexa-680, Alexa-594/Alexa-700, Alexa-700/Alexa-790, Cy3/Cy5, Cy3.5/Cy5.5, and Rhodamine-Green/Rhodamine-Red. Fluorescent metal nanoparticles such as silver and gold nanoclusters can also be used (Richards et al. (2008) J Am Chem Soc 130:5038-39; Vosch et al. (2007) Proc Natl Acad Sci USA 104:12616-21; Petty and Dickson (2003) J Am Chem Soc 125:7780-81). Available filters, dichroics, multichroic mirrors and lasers can affect the choice of dye.

Kits

In certain embodiments, the present invention provides a kit for predicting, diagnosing, or prognosing interstitial lung disease in a human subject, the kit including (e.g. consisting essentially of) at least one probe or primer for detecting the presence or absence of at least one genetic variation. In certain embodiments, the at least one genetic variation includes a genetic variant of at least one of TOLLIP, SPPL2C, and MDGA2. In certain embodiments, the kit includes at least one primer or probe for detecting more than one genetic variant of TOLLIP and/or SPPL2C and/or MDGA2. In certain embodiments, the kit includes at least one probe or primer for detecting additional genetic variants diagnostic or predictive of risk for interstitial lung disease, e.g., a genetic variant of MUC5B, such as rs35705950. In some embodiments, the kit includes a probe or primer for detecting one or more SNPs selected from rs111521887, rs5743894, rs5743890, rs17690703, and rs7144383. The kit may include probes or primers for detecting rs111521887, rs5743894, rs5743890, rs17690703, and rs7144383 alone or in any combination. In certain embodiments, the kit may include additional primers or probes for detecting the presence or absence of rs35705950 and rs111521887, rs5743894, rs5743890, rs17690703, or rs7144383 in any combination. In certain embodiments, the kit includes at least one probe or primer for detecting one or more of the genetic variants listed in Fig. 7. The kit may include probes or primers for detecting the one or more genetic variants listed in Fig. 7 alone or in any possible combination of from two to 52 of the listed genetic variants. If the kit includes a probe or primer for detecting rs35705950, the kit also includes a probe or primer for detecting at least one additional genetic variant from the remaining 51 genetic variants listed in Fig. 7.

Claims directed to kits for predicting, diagnosing, or prognosing interstitial lung disease in a human subject "consisting essentially of" certain types of probes or primers are intended to capture kits that include probes or primers that are suitable primarily for detecting genetic variants associated with interstitial lung disease in humans, although the kits may also include additional probes or primers used as controls, for example, probes or primers for detecting housekeeping genes such as β-actin, tubulin, or glyceraldehyde-3-phosphate dehydrogenase. In this context, the use of the transitional phrase "consisting essentially of" is intended to exclude arrays containing thousands of probes, the vast majority of which are unrelated to interstitial lung disease. In certain embodiments, the kits may include buffers, enzymes, labels, and the like, for example, for use in isolating DNA or mRNA, generating cDNA, or for amplifying and/or detecting and/or sequencing specific SNPs.

In some embodiments, the kit includes (or consists essentially of) a nucleic acid primer capable of hybridizing to a genetic variant in the TOLLIP gene (e.g., a TOLLIP nucleic acid), SPPL2C gene (e.g., a SPPL2C nucleic acid), or MDGA2 gene (e.g., MDGA2 nucleic acid). In some embodiments, the genetic variant has been extracted from a human subject with an interstitial lung disease, or suspected of having an interstitial lung disease. In some embodiments, the genetic variant is an amplification product of DNA extracted from a human subject with an interstitial lung disease, or suspected of having an interstitial lung disease. In some embodiments, the interstitial lung disease is a pulmonary fibrotic condition.

In some embodiments, the kit includes a first nucleic acid probe (e.g., a labeled probe) capable of hybridizing to an amplification product of a genetic variant in the TOLLIP gene (e.g., a TOLLIP nucleic acid), SPPL2C gene (e.g., a SPPL2C nucleic acid), or MDGA2 gene (e.g., MDGA2 nucleic acid). In some embodiments, the kit includes a second nucleic acid probe capable of hybridizing to an amplification product of a genetic variant in the TOLLIP gene (e.g., a TOLLIP nucleic acid), SPPL2C gene (e.g., a SPPL2C nucleic acid), or MDGA2 gene (e.g., MDGA2 nucleic acid). In some embodiments, the second nucleic acid probe is capable of hybridizing to a different sequence than the first probe. In some embodiments, only one of the nucleic acid probes hybridizes to the variant nucleotide(s) (e.g., in the case of a SNP), while the other nucleic acid probe hybridizes to a nearby sequence. In some embodiments, the second probe is labeled, e.g., with a different label than the first probe. In some embodiments, the first nucleic acid probe is labeled with a first label, and the second nucleic acid probe is labeled with a second label, wherein the first and second label form a FRET pair (are capable of fluorescence resonance energy transfer) when hybridized to the genetic variant in the TOLLIP gene (e.g., a TOLLIP nucleic acid), SPPL2C gene (e.g., a SPPL2C nucleic acid), or MDGA2 gene (e.g., MDGA2 nucleic acid), or amplification product thereof. In some embodiments, the kit includes (or consists essentially of) primers or at least one probe capable of detecting a genetic variant, e.g., as described above, depending on the detection method selected. In some embodiments, the kit includes primers or at least one probe capable of detecting a genetic variant in a region selected from the group consisting of 11p15.5, 14q21.3, and 17q21.31.
In some embodiments, the kit includes primers or at least one probe capable of detecting at least one genetic variant in 11p15.5 (e.g., rs111521887, rs5743894, rs5743890, and rs35705950). In some embodiments, the kit includes primers or probes capable of detecting more than one (e.g., 2, 3, 4, 5, 5-10, 10-20, or more) genetic variant in 11p15.5 and 14q21.3 (e.g., rs7144383). In some embodiments, the kit includes primers or probes capable of detecting more than one (e.g., 2, 3, 4, 5, 5-10, 10-20, or more) genetic variant in 11p15.5 and 17q21.31 (e.g., rs17690703, a genetic inversion, or copy number variation). In some embodiments, the kit includes primers or probes capable of detecting more than one (e.g., 2, 3, 4, 5, or more) genetic variant in 14q21.3 and 17q21.31. In some embodiments, the kit includes primers or probes capable of detecting more than one (e.g., 2, 3, 4, 5, 5-10, 10-20, or more) genetic variant in 11p15.5, 14q21.3, and 17q21.31.

In some embodiments, the primers and/or probes are labeled, e.g., with fluorescent labels or FRET labels. In some embodiments, the primers and/or probes are unlabeled. In some embodiments, the kit includes primers and/or probes that detect both a variant allelic sequence and the dominant allelic sequence at a selected genetic variant site, e.g., with different labels, or designed to generate amplification or primer extension products with different masses.

In some embodiments, the kit further includes at least one control sample, e.g., sample(s) with dominant allele(s) at the selected genetic variation site(s), or sample(s) with variant allele(s) at the selected genetic variation site(s). In some embodiments, the kit includes a polymerase.

In vitro complexes

Provided herein are nucleic acid complexes, e.g., formed in in vitro assays to indicate the presence of a genetic variant sequence. One of skill will understand that a nucleic acid complex can also be formed to detect the presence of a dominant allelic sequence, depending on the design of the probe or primer, e.g., in assays to distinguish homozygous and heterozygous subjects. In some embodiments, the complex comprises a first nucleic acid hybridized to a genetic variant nucleic acid, wherein the genetic variant nucleic acid is a genetic variant in a region selected from 11p15.5, 14q21.3, and 17q21.31. In some embodiments, the genetic variant nucleic acid is an amplification product. In some embodiments, the genetic variant nucleic acid is on genomic DNA, e.g., from a subject that has or is suspected of having an interstitial lung disease. In some embodiments, the first nucleic acid is an amplification product or a primer extension product. In some embodiments, the first nucleic acid is labeled. In some embodiments, the nucleic acid complex further comprises a second nucleic acid hybridized to the genetic variant nucleic acid. In some embodiments, the second nucleic acid is labeled, e.g., with a FRET or other fluorescent label. In some embodiments, the first and second nucleic acids form a FRET pair when hybridized to a genetic variant sequence.

In some embodiments, the genetic variant is in the TOLLIP gene (e.g., rs111521887, rs5743894, rs5743890). In some embodiments, the genetic variant is in the MDGA2 gene (e.g., rs7144383). In some embodiments, the genetic variant is in the SPPL2C gene (e.g., rs17690703, a genetic inversion, or copy number variation).

Further provided is an in vitro complex comprising a first nucleic acid probe (e.g., a labeled probe) hybridized to a genetic variant nucleic acid, wherein said genetic variant nucleic acid comprises a genetic variant TOLLIP, SPPL2C or MDGA2 gene sequence, wherein said genetic variant nucleic acid is extracted from a human subject with an interstitial lung disease or suspected of having an interstitial lung disease, or is an amplification product thereof. In some embodiments, the complex further comprises a second nucleic acid probe (e.g., labeled with a different label) hybridized to said genetic variant nucleic acid. In some embodiments, said first nucleic acid probe comprises a first label and said second nucleic acid probe comprises a second label, wherein said first and second label are capable of fluorescence resonance energy transfer.

In some embodiments, the complex further comprises an enzyme, such as a DNA polymerase (e.g., standard DNA polymerase or thermally stable polymerase such as Taq) or ligase.

Genetic variants associated with interstitial lung disease

MUC5B and TOLLIP genes reside on the same genetic locus. Based on the analysis performed, the association of TOLLIP genetic variants was found to be independent from association with the previously reported MUC5B promoter SNP, rs35705950, on IPF susceptibility. Notably, the minor allele of TOLLIP SNP, rs5743890_G, was discovered to be a "protective" allele, as it lowered susceptibility to IPF compared with controls. However, mortality analysis demonstrated that individuals who developed IPF despite having the protective rs5743890_G allele had increased mortality in two independent case series and in a meta-analysis. The MUC5B/TOLLIP region on chromosome 11p15.5 exemplifies the association patterns, disease susceptibility and outcomes.

The Toll interacting protein (Tollip), encoded by the TOLLIP gene, is known to be a critical regulator of Toll-like receptor (TLR)-mediated innate immune responses and of the transforming growth factor-β1 (TGF-β1) signaling pathway. Tollip activates Myd88-dependent NF-κB to modulate TLR signaling and membrane trafficking; interacts with Smad7 to modulate intracellular trafficking and negatively regulate the TGF-β signaling pathway by degrading ubiquitinated TGF-β type 1 receptor; and interacts with caveolin-1 interacting protein in monocytes, regulating signaling in antigen-presenting cells to induce antigen-specific proliferation of T cells, B cells, or both. TOLLIP polymorphisms are involved in regulation of TLR2 and TLR4 and are associated with susceptibility to tuberculosis, atopic dermatitis, and sepsis, and TOLLIP is differentially hypomethylated in IPF lungs. Lastly, failure to upregulate TOLLIP expression in inflammatory bowel disease may lead to chronic inflammation.

The chromosome 17q21 region has been associated with Parkinson's disease, multiple sclerosis, Alzheimer's disease, androgenic alopecia, and, interestingly, with the response to inhaled corticosteroids in asthma and COPD. In the present study, it was discovered that the minor allele rs17690703_T in the 17q21.31 region was associated with decreased susceptibility for IPF development and also conferred increased mortality in InterMune, UChicago, and in the meta-analysis. Unique aspects of this region include a known inversion, referred to as H2, in a large region of conserved LD on the chromosome, which is positively selected in Europeans. There also appear to be a high number of copy number variants (CNVs) within this region, and it has been associated with a microdeletion syndrome. A critical span of 440 kb that partially or entirely involves five genes (CRHR1, IMP5 (SPPL2C), MAPT, STH and KIAA1267) resides on the 17q21.31 region. A large number of variants in the region with significance in Stage 1 were discovered, with a focus on the top SNPs.

MDGA2, a novel region, resides on 14q21.3 and showed association with IPF susceptibility. MDGA2 is a paralog for ICAM, which has been recently demonstrated as a potential biomarker of IPF disease activity. The instant findings indicate the importance of this gene in IPF.

IPF is a heterogeneous disease and, by definition, is a diagnosis of exclusion. As such, misdiagnoses are possible, which might lead to a reduction in power. However, all subjects met currently accepted criteria for diagnosis as outlined by ATS/ERS/JRS/ALAT, with many having been vetted with core pathology and radiology, as in InterMune and ACE-IPF, as well as through participation in a variety of studies.

This discovery GWAS revealed novel genetic loci associated with IPF susceptibility. Furthermore, susceptibility alleles within these loci were discovered to be associated with mortality. Identification of common genetic variants in association with IPF provides insight into the manifestations of this complex disease process and may lead to earlier detection, more predictable prognosis, and personalized therapeutic strategies.

EXAMPLES

A three-stage association study was conducted, comprising a discovery GWAS for susceptibility to IPF in Stage 1, with the findings replicated in two independent case-control association studies (Stage 2 and Stage 3, respectively). Association with mortality was evaluated in three case series. A flowchart illustrating the strategic approach used is shown in Fig. 2.

IPF cases and controls of each stage

Three stages of IPF cases were collected and characterized by the conventional criteria.¹²⁻¹⁴ Stage 1 samples consisting of African-Americans (AA) and European-Americans (EA) were collected for the discovery phase of the genome-wide association study (GWAS), while Stages 2 and 3 consisting of only EA samples were collected for two independent replication studies (replication 1 and 2, respectively). All eligible subjects were at least 35 years of age and reported having symptoms of idiopathic interstitial pneumonia for at least 3 months. A high-resolution computed tomographic scan was required to show definite or probable idiopathic interstitial pneumonia in accordance with predefined criteria,¹⁴ and a surgical lung biopsy confirming UIP was obtained in 37.3% of subjects in the discovery GWAS stage. Subjects with clinically significant exposure to known fibrogenic agents or another cause of interstitial lung disease were excluded.

Stage 1 discovery GWAS IPF samples (n=633) were identified and clinically characterized at the University of Chicago (UChicago), University of Pittsburgh (UPittsburgh), via the Lung Tissue Research Consortium (LTRC), and from the Correlating Outcomes with biomedical Markers to Estimate Time-progression in IPF (COMET) study. Stage 2 samples (n=544) comprised additional independent IPF patients from UChicago, InterMune,³ Lung Transplant Outcomes Group (LTOG) cohort⁴ and LTRC. Stage 3 IPF cases (n=324) consisted of additional independent IPF patients from LTOG and Anticoagulant Effectiveness in Idiopathic Pulmonary Fibrosis Study (ACE-IPF).⁵ Fig. 8, Fig. 9, and Fig. 10 feature each study population.

All eligible subjects were > 35 years old and reported symptoms of idiopathic interstitial pneumonia for at least 3 months. A high-resolution computed tomographic scan was required for diagnosis of definite or probable idiopathic interstitial pneumonia in accordance with predefined criteria.⁶ A surgical lung biopsy was obtained in 37.3% of affected subjects in the discovery GWAS stage. Subjects with clinically significant exposure to known fibrogenic agents and those with other known cause of interstitial lung disease were excluded.

For Stage 1, data of unaffected European American (EA) subjects from dbGaP (n=1,442) were compiled with healthy subjects recruited from the University of Pittsburgh (n=103) to increase the available pool of subjects (n=1,545). A subset of controls matched one-on-one to cases by means of genome-wide genetic ancestry estimates was selected for downstream analysis.

EA controls for Stages 2 and 3 (n=687 and n=702, respectively) were collected from 2005 to 2012 as part of the Translational Research in the Department of Medicine Study (TRIDOM) at the University of Chicago. Institutional review boards at each institution approved this study and informed consent was obtained from all subjects. Summarized strategic methodology of the study and detailed clinical and demographic characteristics of all study stages are shown in Fig. 2 and Fig. 10, respectively.

Genotyping, imputation, and statistical analysis

Discovery Stage 1 genotyping was conducted using the Genome-Wide Human SNP 6.0 array (Affymetrix, Santa Clara, CA). Stages 2 and 3 genotyping was conducted using the iPLEX Gold™ Platform (Sequenom, San Diego, CA). Genotype imputation was performed with IMPUTE2 using European ancestry panel data from the 1000 Genomes Project as a reference. Association testing was performed using SNPTEST software (v2.3).⁷ Fifty-two SNPs selected in 19 loci showing an association with IPF (p<10⁻⁴) in Stage 1 were carried forward to Stage 2. As the selected SNPs with the lowest p-value in Stage 1 were all a result of imputation, their association was validated by genotyping using the iPLEX Gold™ Platform. Six SNPs in 3 loci achieving an overall p<5×10⁻⁸ (i.e. Stage 1 and 2 combined) were carried forward to Stage 3.

DNA quantity was checked using PicoGreen fluorometry. Samples were dispensed at 50 ng/μl in 96-well plates and hybridized to arrays following manufacturer's protocols. Samples in which fewer than 86% of the quality control (FQC) SNPs produced genotypes were rerun. Genotypes were recalled plate-by-plate in the study, including those downloaded from dbGaP, using the "crlmm" package, a new implementation of the Corrected Robust Linear Model with Maximum Likelihood Classification (CRLMM) algorithm, available through the Oligo package at Bioconductor.¹⁸,¹⁹

Samples were excluded from the analysis if they failed any of several quality metrics: low call rate (below 97% or 93% for production plates with >35 samples or with <35 samples, respectively), incompatibility between reported gender and genetically determined gender, or incompatibility between reported race and genetically determined race. Samples were also checked for unexpected familial relationships using pairwise IBD estimation in PLINK.²⁰ The total number of European-American IPF case and control samples passing all initial QC tests was 575 and 1,427 (1,340 of the available 1,442 controls from dbGaP and 87 of the 103 controls from University of Pittsburgh), respectively.

To reduce the false-positive rate, inflated by spuriously small p-values, while having little impact on the p-values associated with true positive loci for heterogeneous human populations, controls were matched to cases on a one-on-one basis for race and ethnicity based on genetic ancestry.²¹ SMARTPCA software was used to select control individuals from a larger set of available controls with the first four principal components (PCAs) obtained from a subset of variants showing limited linkage disequilibrium (n=267,000). To do so, the distance between every case individual and control individual was defined as the Euclidean distance between the individuals in a space based on the first four principal components, where each axis was also multiplied by its corresponding eigenvalue. After pairwise matching of 575 cases and 1,427 controls, accounting for the first four PCAs, 542 cases and 542 genetically matched controls were retained for downstream analysis.
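A minimal sketch of the eigenvalue-weighted distance and one-to-one matching described above is given below. The description does not state which pairing procedure was used or why some cases were left unmatched; the greedy closest-pair-first assignment and the optional max_distance cutoff are therefore assumptions of this sketch, as are all function and variable names.

```python
import math


def weighted_distance(a, b, eigenvalues):
    """Euclidean distance in PC space with each axis scaled by its eigenvalue."""
    return math.sqrt(sum((w * (x - y)) ** 2 for x, y, w in zip(a, b, eigenvalues)))


def match_controls(cases, controls, eigenvalues, max_distance=float("inf")):
    """Greedily pair each case with at most one control, closest pairs first.

    cases, controls -- dicts mapping sample ID to a tuple of PC coordinates
    max_distance    -- hypothetical cutoff; pairs farther apart stay unmatched
    """
    candidates = sorted(
        (weighted_distance(case_pcs, ctrl_pcs, eigenvalues), case_id, ctrl_id)
        for case_id, case_pcs in cases.items()
        for ctrl_id, ctrl_pcs in controls.items()
    )
    matched, used_controls = {}, set()
    for dist, case_id, ctrl_id in candidates:
        if dist > max_distance:
            break  # remaining candidate pairs are even farther apart
        if case_id in matched or ctrl_id in used_controls:
            continue
        matched[case_id] = ctrl_id
        used_controls.add(ctrl_id)
    return matched
```

With four principal components, each coordinate tuple and the eigenvalue tuple would have length four; the code places no restriction on the number of components.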

Two tiers of filtering of control genotyping quality were performed, using call rate (<95%) and Hardy-Weinberg Equilibrium (HWE) p-value (<10⁻³) as exclusion thresholds. An additional 1,367 variants were removed because their alleles were inconsistent with the IMPUTE2 1000 Genomes Project panel data. Prior to imputation, SNPs with minor allele frequency (MAF) <5% were removed (a total of 349,801 were filtered based on QC and MAF), leaving a final number of 555,432 variants for further analysis and imputation.
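The variant-level filters above (call rate, HWE p-value, and MAF) may be illustrated as follows. The description does not specify which HWE test was applied; this sketch assumes a one-degree-of-freedom chi-square goodness-of-fit test, and the function names and argument layout are hypothetical. In the study the HWE and call-rate filters are described for control genotypes, so the counts passed in would be control counts.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa, n_ab, n_bb, n_missing,
              min_call_rate=0.95, min_hwe_p=1e-3, min_maf=0.05):
    """Apply the call-rate, HWE, and MAF thresholds to one variant."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / (called + n_missing)
    p = (2 * n_aa + n_ab) / (2 * called)
    maf = min(p, 1.0 - p)
    return (call_rate >= min_call_rate
            and maf >= min_maf
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= min_hwe_p)
```

For rare variants or small samples an exact HWE test is often preferred over the chi-square approximation; the approximation is used here only for brevity.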

Genome-wide SNP imputation was performed for the cleaned dataset to identify additional SNPs possibly showing associations. SHAPEIT²³ software was used to estimate phased haplotypes from the directly observed genotype data. Haplotypes derived from a European ancestry panel, consisting of samples from CEU, FIN, GBR, IBS and TSI from the 1000 Genomes Project (February 2012 release), were used as a reference. Imputation was conducted using IMPUTE2. The inflation factor (λ) between cases and controls across all SNPs was 1.06. SNPTEST software (v2.3)²⁴ was used to calculate p-values based on a one degree-of-freedom score test for a logistic regression which assumes that the allele effect on the genotype for each SNP is additive. The score test implemented in SNPTEST allows for genotypic uncertainty via missing data likelihood; therefore, it is applicable to both imputed genotypic data (i.e. in Stage 1) and to directly genotyped data (i.e. all stages). P-values were calculated for each stage separately, for Stages 1 and 2 combined, and finally for a joint analysis with all stages combined as one sample. Model parameters were estimated with a random subset of 200 individuals before imputation on the entire dataset. Regions were deemed for follow-up in Stage 2 if they had a SNP with an association p<10⁻⁴ in Stage 1. A minimum of 2 SNPs was selected from each region for Stage 2 genotyping. Where possible, the linkage disequilibrium (LD) of those two SNPs was low (r²<0.2), where one of them was the variant with direct genotyping data showing the lowest p-value, and the other was the variant with imputed data showing the lowest p-value. Based on these criteria, a total of 40 SNPs for 19 loci were selected (2 SNPs per locus except for chr11/TOLLIP, chr17/SPPL2C, and chr7/MAD1L1 regions with 3 SNPs; for c rtlS H region with only 1 SNP).
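The description reports the inflation factor (λ) without stating how it was computed. One conventional definition, assumed here purely for illustration, is the median of the observed one-degree-of-freedom association chi-square statistics divided by the median of the chi-square distribution with one degree of freedom (approximately 0.4549). The function name is hypothetical.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_statistics):
    """Median-based genomic inflation factor for 1-df test statistics."""
    if not chi2_statistics:
        raise ValueError("at least one test statistic is required")
    return statistics.median(chi2_statistics) / CHI2_1DF_MEDIAN
```

A value near 1 indicates that the bulk of test statistics follows the null distribution, which is the sense in which a value such as 1.06 suggests limited residual population stratification.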

In order to provide better coverage of genetic variants for the previously reported region on chromosome 11p15.5, containing TOLLIP and MUC5B, an additional set of tagging SNPs (tSNPs) was selected using the multiple-marker selection algorithm, haplotype r², included in TagIT 3.03 software.²⁵ A set of 23 chr11/TOLLIP tSNPs was selected under previously described criteria²⁶ from 380 European individuals (CEU, FIN, GBR, IBS and TSI) in the 1000 Genomes Project Consortium²⁷ data. The common polymorphism of MUC5B (rs35705950) associated with familial and sporadic IPF cases was used as a positive control for genotyping quality and association. A total of 64 SNPs were compiled and submitted to Assay Design Suite (https://www.mysequenom.com/ToolsMassArray online design) for primer and probe design. Twelve of these SNPs failed during assay design and were discarded from the analysis. A list of the remaining 52 SNPs from the 19 regions is shown in Fig. 7 along with their association p-values and MAFs.

A subset of 6 SNPs (rs111521887, rs17690703, rs35705950, rs5743890, rs5743894, rs7144383) from 3 loci showing a statistically significant association (p-value<5x10⁻⁸) in the joint analysis of Stages 1 and 2, and with the same direction of effects in the two stages, was compiled for Stage 3 replication (Fig. 11).

As the SNPs with the best p-value in the Stage 1 discovery GWAS were all a result of imputation, 541 of the 633 cases previously genotyped by the SNP array were compiled and genotyping was performed using the iPLEX Gold™ platform to validate the findings. Approximately 10% of the samples were genotyped by TaqMan™ allelic discrimination assays (Applied Biosystems) to monitor genotyping quality. Genotyping was blind to case-control status. Samples with discordant genotypes were discarded.

Linkage disequilibrium assessment

Linkage disequilibrium (LD) between SNPs in the MUC5B/TOLLIP region was measured using pairwise r² measures.⁸ The mode of inheritance for these SNPs (dominant, recessive) was determined by comparing the odds ratios of the heterozygous and at-risk homozygous genotypes. A regression-based conditional analysis of the interaction between MUC5B and TOLLIP SNPs on IPF susceptibility was implemented in the R statistical package.⁹
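
As an illustration of a pairwise r² measure, the sketch below computes the squared Pearson correlation between the allele counts (0, 1, or 2) of two SNPs across the same individuals. This genotype-based estimate is an assumption made for illustration; the source does not specify whether r² was computed from genotypes or from phased haplotypes, and the function name is hypothetical.

```python
def ld_r_squared(geno_a, geno_b):
    """Squared Pearson correlation between two SNPs' allele counts (0/1/2)."""
    n = len(geno_a)
    mean_a = sum(geno_a) / n
    mean_b = sum(geno_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(geno_a, geno_b))
    var_a = sum((a - mean_a) ** 2 for a in geno_a)
    var_b = sum((b - mean_b) ** 2 for b in geno_b)
    return cov * cov / (var_a * var_b)
```

Values near 0 indicate that the two SNPs vary largely independently, whereas values near 1 indicate that one SNP is an almost perfect proxy for the other.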

TOLLIP Gene expression in IPF lung tissues

Gene expression profiling data of IPF lungs were obtained from the Lung Genomics Research Consortium (LGRC) website. A total of 67 IPF individuals had paired data comprising genotypes of the SNPs associated with susceptibility and gene expression profiles. The TOLLIP gene expression levels in these 67 samples were stratified into two groups according to presence or absence of the minor allele. The two-group comparison was performed using an unequal variance t-test.
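
The unequal variance (Welch) t-test can be sketched as follows. The function returns the t statistic and the Welch-Satterthwaite degrees of freedom; converting these to a p-value requires the t distribution and is left out here. The function name and inputs are hypothetical and stand in for expression values of minor-allele carriers and non-carriers.

```python
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    # Squared standard error contributed by each group (sample variance / n).
    va = variance(group_a) / na
    vb = variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

Unlike the pooled-variance t-test, this form does not assume that the two genotype groups have equal variance, which matters when the groups differ in size.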

Mortality analysis for individual loci

Three case series in Stages 1 and 2, with average follow-up of between 22 and 70 months (Fig. 12), were subjected to Cox regression analyses for mortality using the SPSS package (SPSS Inc., Chicago, IL) on the three IPF susceptibility loci that showed an overall p<10⁻⁸ in Stages 1 and 2. Time "at risk" was defined as the interval between the date of enrollment in a given study and the date of the last follow-up, lung transplant, or death. Lung transplant patients (2%, 7%, and 25% in InterMune, UChicago and UPittsburgh, respectively) were censored at time of transplant from the analysis, as potential confounders of survival. Univariate and multivariate analyses, considering relevant demographic and clinical parameters in the models, were conducted as appropriate. A single aggregate result for each locus was obtained by means of a meta-analysis applying both fixed and random effect models¹⁰ as appropriate to account for the different available follow-up data among the case series studied.

Average follow-up data of 22 to 70 months were available for a subset of samples in 3 case series included in Stages 1 and 2 (Fig. 12). These case series were utilized for mortality analyses, which were performed on the previously identified MUC5B promoter SNP and 5 novel SNPs within susceptibility loci that showed an overall association p<10⁻⁸ in Stages 1 and 2, assuming that the genotypic effects were additive. Logistic regressions were used initially to explore SNP effects comparing alive vs. dead patients. A more appropriate analysis of survival was then performed on the 5 novel SNPs only, utilizing time "at risk".

All transplanted cases were censored from these analyses in order to avoid the confounding factor associated with IPF mortality. Univariate and multivariate analyses, using models considering relevant demographic and clinical parameters (such as age, gender, tobacco history, forced vital capacity (FVC) percent predicted, diffusing capacity of carbon monoxide (D_LCO) percent predicted, and recruitment center) were conducted. The heterogeneity of the Kaplan-Meier mortality curves as a function of genotypes for each SNP was assessed by the log-rank test. Hazard ratio (HR) estimates were obtained using Cox proportional hazard analyses. Schoenfeld residuals were used to assess the assumption of proportional hazards.
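
The Kaplan-Meier curves referred to above can be illustrated with a minimal estimator. In this sketch an event flag of 1 marks an observed death and 0 marks a censored observation (for example, censoring at transplant or last follow-up). The function name and input layout are hypothetical; the log-rank test and Cox model are not reproduced here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with an observed death.

    times  -- time "at risk" per patient (e.g. months)
    events -- 1 if death was observed, 0 if the patient was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= leaving
    return curve
```

Computing one such curve per genotype group gives the stratified curves whose heterogeneity the log-rank test assesses.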

A single aggregate result for each locus was obtained with METASOFT by means of a meta-analysis. For that, both fixed and random effect models were applied, the latter corresponding to an optimized model to detect associations under heterogeneity, which was applied if heterogeneity between study samples was evident, as indicated by the significance of the Cochran's Q statistic.
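
For illustration, a fixed effect meta-analysis and Cochran's Q can be sketched with inverse-variance weights, as below. This is a generic textbook formulation and is not METASOFT's implementation; in particular, the optimized random effect model is not reproduced. The function names are hypothetical, and the inputs are per-study log hazard ratios with their standard errors.

```python
from math import exp

def fixed_effect_meta(log_hrs, std_errs):
    """Inverse-variance pooled log hazard ratio, its SE, and Cochran's Q."""
    weights = [1 / se ** 2 for se in std_errs]
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / sum(weights)
    pooled_se = (1 / sum(weights)) ** 0.5
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_hrs))
    return pooled, pooled_se, q

def cochran_q_p_value_three_studies(q):
    """P-value of Q for exactly three studies (chi-square with 2 df)."""
    # The chi-square survival function with 2 df has the closed form exp(-x/2).
    return exp(-q / 2)
```

With three case series, Q is referred to a chi-square distribution with 2 degrees of freedom. Under that assumption a Q of 9.54 gives a p-value of about 0.0085, which agrees with the heterogeneity result reported for rs17690703_T.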

Sample characteristics

Demographic and clinical characteristics of IPF patients and controls in each stage are shown in Fig. 9 and Fig. 10. As in other studies, cases in the discovery stage had a wide range of disease severity and age. The Stage 2 patients were a blend of cases with milder (InterMune) and more severe disease undergoing lung transplantation (LTOG), yielding a very similar group to Stage 1 based on the overall physiologic severity as assessed by forced vital capacity (FVC) and diffusing capacity for carbon monoxide (DLCO) (Fig. 10). The Stage 3 patients were more severe, derived from the LTOG and ACE-IPF study. However, all IPF cases met diagnostic criteria¹⁶ and were all of similar age and gender. Characteristics of cases with follow-up data for survival analysis are shown in Fig. 12.

Genome-wide association study, replication, and regional association

After completion of sample quality control and genotype filtering, 542 of the 633 cases and 542 genetically matched controls selected from the available 1,545 pooled controls were retained for Stage 1. A total of 555,432 high quality genotyped variants were used for imputation, which resulted in 10,601,812 best imputed common variants with minor allele frequency (MAF) > 5%. The GWAS was then conducted using the genotyped and imputed SNPs. The inflation was modest, with a test statistic of λ=1.06, indicating insignificant confounding of the results by population stratification (Fig. 3).
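
The source does not state how λ was computed. A common definition, assumed here, is the median of the observed 1-degree-of-freedom association chi-square statistics divided by the median of the chi-square distribution with 1 degree of freedom. The function name is hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_inflation(chi2_stats):
    """Genomic inflation factor from per-SNP 1-df chi-square statistics."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

Under this definition a value close to 1 indicates that the bulk of the test statistics follow the null distribution.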

A total of 19 genomic loci with an association (p-value<10⁻⁴) were identified from the Stage 1 discovery GWAS. Fifty-two SNPs were compiled from the combination of genotyped, imputed, and tSNPs. Fig. 7 summarizes annotations for these loci, allele frequency in reference populations (CEU, EUR), IPF cases, controls, as well as their association p-values with susceptibility to IPF.

Directly genotyped SNPs in Stage 2 nominally replicated many of the associations with IPF susceptibility detected in the Stage 1 GWAS. Five imputed SNPs and the previously identified MUC5B promoter SNP reached genome-wide significance levels (p-value<4.2 x 10⁻⁸) in a joint analysis of Stages 1 and 2. These six SNPs were re-genotyped in Stage 1 samples and the association was confirmed. Fig. 1 highlights the loci chr11p15.5, containing SNPs of TOLLIP (rs111521887, rs5743894, rs5743890) and MUC5B (rs35705950); chr17q21.31, containing SPPL2C (rs17690703); and chr14q21.3, containing MDGA2 (rs7144383).

In Stage 3, the association of four of the SNPs in two novel loci (ch11p15.5/TOLLIP and ch17q21.31/SPPL2C) was replicated, as well as the association of the MUC5B promoter SNP, previously reported in an independent study,¹¹ with IPF susceptibility. Each of them had an overall combined p<10⁻⁹, showing effects in the same direction across all single stages (i.e. allele rs35705950_T confers risk for IPF, while alleles rs5743890_G and rs17690703_T protect from IPF) (Fig. 13). Regional associations of the genotyped and imputed SNPs at the ch11p15.5 and ch17q21.31 loci are shown in Fig. 4: (A) the ch11p15.5/MUC5B/TOLLIP locus and (B) the ch17q21.31/SPPL2C locus, as defined by the positions of SNPs showing linkage disequilibrium with the lead SNP rs5743894 (A; p=2.2 x 10⁻⁶) and SNP rs17690703 (B; p=4.9 x 10⁻⁶), respectively. Disease associations as indicated by -log10 p-values are plotted against chromosomal positions. Diamonds and circles represent individual SNPs of the GWA screen using genotyped and imputed data, respectively. Colored diamonds indicate SNP data obtained by the analysis of 542 IPF cases and 542 controls. Additional tSNPs selected for better coverage are included. Associations were assessed assuming recessive and additive modes of inheritance for the MUC5B/TOLLIP locus and the SPPL2C locus, respectively. Levels of linkage disequilibrium (r²) with the best-associated SNP (red diamonds) are color-coded. Blue lines indicate recombination fractions as estimated from the European panel sample. Horizontal arrows mark structural human genes as annotated by Human Genome Build 37.3/hg19 of the UCSC (Genome Bioinformatics Group, University of California, Santa Cruz). Symbols, position and direction of each gene within the loci are shown at the bottom of the plot.

In the ch11p15.5 locus, the r² values between the MUC5B promoter SNP, rs35705950, and the TOLLIP SNPs (rs111521887, rs5743894, and rs5743890) were 0.07, 0.16, and 0.01, respectively. These low levels of LD indicate that the signals of association for the TOLLIP SNPs are independent from MUC5B (Fig. 4A). Moreover, the mode of effect for the MUC5B SNP (dominant) was different than that for the TOLLIP SNPs (additive or recessive), providing additional evidence that these are independent signals. Lastly, in a conditional regression-based analysis, genotypes were combined according to the mode of inheritance and it was found that, while the MUC5B/rs35705950 SNP showed the strongest signal (p=2x10⁻¹⁶), the TOLLIP/rs111521887/rs5743894/rs5743890 SNPs remained associated with IPF (p=0.05).

Relationship between presence of susceptibility alleles by genotype and survival in IPF case series

Enrollment criteria in the InterMune study skewed patients towards better pulmonary function as assessed by FVC (71.56±12.68 percent predicted), and less heterogeneity of disease severity as assessed by a lesser standard deviation on lung function than in the UChicago study (65.17±18.29) or the UPittsburgh study (65.27±19.72). Also, InterMune had a shorter average follow-up period (22 months) in survivors, than UChicago (40 months) or UPittsburgh (70 months) (Fig. 12). Since the follow-up time varied widely in each IPF case series, it was decided to evaluate the novel susceptibility alleles in association with mortality, both separately and jointly through a meta-analysis.

Three SNPs were associated with mortality in an initial logistic regression analysis of the overall case series (Fig. 14). Univariate Cox regression analysis in InterMune, UChicago, UPittsburgh as well as in the meta-analysis further demonstrated that the novel risk alleles for susceptibility in the 11p15.5/TOLLIP and 17q21.31/SPPL2C loci were associated with protection from IPF mortality (Fig. 1 and Fig. 15). Briefly, allele rs5743890_G was associated with increased mortality in UChicago (p=0.008) and in UPittsburgh (p=0.025). Similarly, allele rs17690703_T was associated with increased mortality in InterMune (p=0.044) and in UChicago (p=0.030). Meta-analysis of the 3 case series sustained associations with mortality (p=0.034 for rs17690703_T, and p=0.0009 for rs5743890_G) (Fig. 15). Notably, the meta-analysis of rs17690703_T with increased mortality suggested significant study heterogeneity among the three case series (Cochran's Q-value=9.54, p=0.0085). Multivariate analyses adjusting for recorded covariates (i.e. age, gender, tobacco history, FVC, D_LCO, at each recruitment center) that maintained p-value<0.1 in regression models did not appreciably change these findings (Fig. 16). Results of additional analyses pertaining to survival are presented in Fig. 17, Fig. 18, and Fig. 19.

The SPPL2C variant rs17690703 failed to meet significance only after adjustment for disease severity (p=0.06). This is highly suggestive that the region may have a relationship to survival. Intronic variants are unlikely to be causal. In fact, this variant might actually be a tag for an altogether different gene within the H2 inversion. Because H2 is rare among individuals of African (6%) and Asian (1%) ancestry, and the IPF cohort was overwhelmingly comprised of individuals of European ancestry (EA), where H2 occurs in approximately 20%, further evaluation of the role of H2 focused on an EA group. H2-specific SNPs tag the inversion, and are strongly correlated with, but incompletely linked to, the SPPL2C variant (rs17690703) (r² = 0.76). Three SNPs that tag H2 (rs916793, rs2902662, rs17651213) were included on the Affymetrix 6.0 GeneChip® (Affymetrix, Santa Clara, CA) used in the GWAS. Several proxy SNPs in complete linkage disequilibrium (LD) (r² = 1) with SNPs that tag H2 were also identified. Included in this analysis were 120 EA individuals from the University of Chicago cohort for which mortality and genotype data were available. Of this group, 28.3% (n=34) carried an H2 haplotype, a 40% increase over the general population estimate. Of these 34 patients, 30 (88%) were heterozygous and 4 (12%) homozygous for the inversion. Assignment of H2 status was based on the presence of all 3 SNPs that tag H2 (rs916793, rs2902662, rs17651213). This method allowed H2 assignment to all but 3 patients in this cohort. The addition of a proxy SNP (rs199448) allowed H2 status to be determined for the 3 remaining patients. These data suggest that presence of an H2 haplotype increases susceptibility to IPF.

To perform the survival analysis, the cohort of 120 EA individuals was then stratified based on H2 (absent vs. present) and SPPL2C (wild-type (WT) vs variant (Var)) status. Inclusion of SPPL2C in the stratification is necessary given the strong correlation between the two variants and potential confounding by SPPL2C. Survival analysis for each group showed a statistically significant difference (p=0.04) in mortality risk between the 4 groups (Fig. 5). The vast majority of patients belonged to either the H2(-)/SPPL2C-WT or H2(+)/SPPL2C-Var group, making it difficult to draw a conclusion about the two smaller groups. When comparing one group to another, the statistical significance was lost. However, these data suggest that H2 and SPPL2C contribute to mortality risk independently.

The presence of a common variant associated with IPF, its location within an inversion, the independence of chromosome 17 from chromosome 11 and the possibility of variants related to survival in IPF indicate that additional sequencing may facilitate identification of causal or regulatory variants within the region.

A barrier inherent in the large amount of data generated by next generation sequencing of genetic regions involves methods to evaluate uncommon or rare variants. However, the importance of regions and the likelihood of additional uncommon and rare variants can be discovered by using aggregating or collapsing methods within regions. Indeed, regions with common variants have a greater number of uncommon or rare variants as well. One approach using the fundamentals of a logistic regression involves an L1-regularized regression to accommodate a large number of variants. The Lasso method is a shrinkage and selection method for linear regression. It minimizes the usual sum of squared errors, with a bound on the sum of the absolute values of the coefficients. It has connections to soft-thresholding of wavelet coefficients, forward stagewise regression, and boosting methods. It is recognized that the power to detect true rare variants is limited by the number of cases (n=542) genotyped on the array. A preliminary analysis based on aggregating the common and uncommon variants by region, ranking them by p values at 10⁻³ or smaller to yield expected rates of variants beyond that of the populace in general, was therefore conducted. The 542 cases versus controls were examined using a multi-variant genetic association test of functional features of the genome by LASSO. This represents 37,000 functional features, which include but are not limited to protein coding regions as well as known lincRNA and miRNA, etc. The genome was analyzed in 5 MB increments or "clusters" where, for example, 4.6 represents chromosome 4 and the 6 would then be cluster 35-30MB along that chromosome. Using 5x10⁻³ as a cutoff for significance, there were over 100 regions of interest. In Table X are the top 6 regions. Not surprisingly, MUC5B and TOLLIP in the same cluster were ranked second. Chromosome 17 actually demonstrated multiple clusters of interest.
The preliminary data demonstrate several additional loci not previously identified, while also emphasizing the importance of 17q21.31 and 11p15.5 region, as well as other regions. This analysis demonstrates the ability to handle complex datasets of uncommon and rare variants to generate novel discovery.
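
The connection between the Lasso and soft-thresholding mentioned above can be illustrated with the soft-thresholding operator, which shrinks a coefficient toward zero by a penalty amount and sets it exactly to zero when its magnitude does not exceed that amount. This is a generic sketch and not the analysis pipeline used in the study; the function and parameter names are hypothetical.

```python
def soft_threshold(coefficient, penalty):
    """Shrink a coefficient toward zero by `penalty`; small values become 0."""
    if coefficient > penalty:
        return coefficient - penalty
    if coefficient < -penalty:
        return coefficient + penalty
    return 0.0
```

Because small coefficients are set exactly to zero, an L1 penalty performs variable selection as well as shrinkage, which is what makes it usable when the number of variants is large.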

To address the issue that variants within a region could exert an effect in opposite directions, a unique dataset with survival cohorts was used that allows performance of linear regression analysis of each individual variant within the region to assess the direction of effect and assignment of an additive or subtractive model for multiple variants within a region. In fact, TOLLIP SNPs demonstrated this phenomenon, in that rs111521887 (G for T) or rs5743894 (G for T) of the TOLLIP SNPs were associated with increased susceptibility to IPF, while rs5743890 (G for T) was associated with decreased susceptibility, or a protective effect in developing IPF. However, all three SNPs seem to exert an effect reducing gene expression levels of TOLLIP in lung tissue, again arguing for a better understanding of causal or regulatory SNPs.

To further the integration of multiple genetic markers with clinical parameters, the two published SNPs in 11p15.5 were examined to determine if there was an interaction. The intersection of these two independent SNPs in TOLLIP and MUC5B demonstrated only a weak interaction, with an r² of 0.009 by linear regression. The relationship with survival therefore appears to be additive in preliminary data (Fig. 6A). Initial results indicated that the association with mortality for SPPL2C moved to only trend levels (p=0.06) after adjusting for severity of illness, with a modest hazard ratio of 1.3. However, taking into account information regarding the H1/H2 status and its influence, it is more than plausible that other SNPs in the regions will carry a greater hazard ratio of significance. Therefore, SPPL2C was incorporated into a risk index using a multidimensional approach to collapse categories down to 4 groups (Fig. 6B). An analysis using a weighted sum of risk index alleles across the SNPs, where the Weighted Personalized Gene Risk Score (WPGRS) is obtained by multiplying the logHR by the number of risk alleles by genotype across 3 SNPs, gives 17 categories. The unadjusted Cox regression model gave HR=6.51 (2.91-14.55), p=5.02x10⁻⁶ and the adjusted Cox regression gave HR=6.60 (2.71-16.09), p=3.23x10⁻⁵ (adjusted for age, sex, FVC, DLCO, study center), demonstrating the power of this approach. The identification of causal SNPs in the TOLLIP or SPPL2C region is expected to increase HR.
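
The weighted score described above can be sketched as the sum, over SNPs, of the log hazard ratio multiplied by the number of risk alleles carried (0, 1, or 2). The natural logarithm is assumed. The SNP labels and hazard ratios used to call the function would be placeholders for illustration only; per-SNP values are not given in this passage, and the function name is hypothetical.

```python
from math import log

def weighted_risk_score(risk_allele_counts, hazard_ratios):
    """Sum over SNPs of log(HR) times the number of risk alleles (0, 1, 2).

    risk_allele_counts -- {snp_label: risk allele count for one subject}
    hazard_ratios      -- {snp_label: per-allele hazard ratio}
    """
    return sum(log(hazard_ratios[snp]) * count
               for snp, count in risk_allele_counts.items())
```

A subject carrying no risk alleles scores 0, and each additional risk allele adds the log hazard ratio of its SNP, so SNPs with larger effects contribute more to the score.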

Novel genetic loci associated with IPF susceptibility.

In the joint analysis, two loci (ch11p15.5 and ch17q21.31) showed clear evidence of replication, with effects in the same direction as in the Stage 1 discovery GWAS and genome-wide significance levels of p<10⁻⁸. Association of the genotyped and imputed SNPs at the ch11p15.5 and ch17q21.31 loci is shown in Figures S2A and S2B, respectively. SNP rs35705950 on locus chr11p15.5/MUC5B has been firmly implicated in association with IPF.¹⁷ Notably, three novel SNPs were revealed on the same locus, located in the intronic regions of the TOLLIP gene, which were associated with IPF (rs111521887_G, odds ratio (OR)=1.48, 95%CI=1.32-1.66, p=2.2x10⁻¹²; rs5743894_G, OR=1.49, 95%CI=1.33-1.68, p=1.35x10⁻¹²; rs5743890_G, OR=0.61, 95%CI=0.52-0.71, p=3.43x10⁻¹¹) (Fig. 13). Subsequent logistic regression analyses conditioned on the marker SNPs in ch11p15.5 revealed that these TOLLIP SNPs were not in LD with the MUC5B SNP, rs35705950. The r² values were 0.07, 0.16, and 0.01 between rs35705950 and rs111521887, rs5743894, and rs5743890, respectively. These data indicated that the signals of association for these three SNPs were not correlated with rs35705950. Additionally, the mode of inheritance for the MUC5B SNP (dominant) is different than that for the TOLLIP SNPs (additive or recessive), adding to the evidence for independent signals. Lastly, genotypes were combined according to the mode of inheritance to identify the underlying genetic mode and perform a joint conditional analysis of rs35705950 and rs111521887: the MUC5B SNP shows the strongest signal (p<2x10⁻¹⁶), but the TOLLIP SNP remains associated (p=0.05). The second novel locus, which is located on chromosome 17q21.31, was indicated by imputation and supported by physical genotyping of SNP rs17690703_T (OR=0.70, 95%CI=0.62-0.79, p=5.70x10⁻⁹) (Fig. 13).

For the third novel locus on chromosome 14q21.3, replication of SNP rs7144383 was achieved in an independent case-control association study after imputation of the 1000 Genomes Project data, demonstrating an OR=1.57, 95%CI=1.18-2.08, p=3.50x10⁻⁸ in the joint analysis. In a joint analysis of Stage 3 data along with data from the two previous stages, this locus maintained a suggestive association (OR=1.44, 95%CI=1.23-1.69, p=3.7x10⁻⁶) (Fig. 13).

The following embodiments are included:

1. A method of determining whether a human subject has or is at risk of developing an interstitial lung disease, the method comprising detecting whether the genome of the subject comprises a genetic variant of at least one of TOLLIP, SPPL2C, and MDGA2 and determining whether the subject has or is at risk of developing an interstitial lung disease, the presence of the genetic variant indicating that the subject has or is at risk of developing the interstitial lung disease.

2. The method of embodiment 1, wherein the method comprises detecting whether the genome of the subject comprises a genetic variant of TOLLIP.

3. The method of embodiment 1 or embodiment 2, wherein the method comprises detecting whether the genome of the subject comprises a genetic variant of SPPL2C.

4. The method of any one of embodiments 1-3, wherein the method comprises detecting whether the genome of the subject comprises a genetic variant of MDGA2.

5. The method of any one of embodiments 1-4, further comprising detecting whether the genome of the subject comprises a genetic variant of MUC5B.

6. The method of any one of embodiments 1-5, wherein the method comprises detecting whether the genome of the subject comprises one or more genetic variants having a single nucleotide polymorphism selected from the group consisting of rs111521887, rs5743894, rs5743890, rs17690703, and rs7144383.

7. The method of embodiment 6, wherein the method comprises detecting whether the genome of the subject comprises the genetic variant having the single nucleotide polymorphism rs111521887.

8. The method of embodiment 6 or 7, wherein the method comprises detecting whether the genome of the subject comprises the genetic variant having the single nucleotide polymorphism rs5743894.

9. The method of any one of embodiments 6-8, wherein the method comprises detecting whether the genome of the subject comprises the genetic variant having the single nucleotide polymorphism rs5743890.

10. The method of any one of embodiments 6-9, wherein the method comprises detecting whether the genome of the subject comprises the genetic variant having the single nucleotide polymorphism rs17690703.

11. The method of any one of embodiments 6-10, wherein the method comprises detecting whether the genome of the subject comprises the genetic variant having the single nucleotide polymorphism rs7144383.

12. The method of any one of embodiments 6-11, further comprising detecting whether the genome of the subject comprises a genetic variant having a single nucleotide polymorphism rs35705950.

13. The method of any one of embodiments 1-12, wherein the interstitial lung disease is a fibrotic interstitial lung disease.

14. The method of embodiment 13, wherein the interstitial lung disease is idiopathic pulmonary fibrosis or familial interstitial pneumonia.

15. A method of prognosing an interstitial lung disease in a human subject, the method comprising detecting whether the genome of the subject comprises a genetic variant of TOLLIP or SPPL2C and determining a prognosis for the subject, the presence of the genetic variant being prognostic of increased or decreased survival.

16. The method of embodiment 15, wherein the method comprises detecting whether the genome of the subject comprises a genetic variant of TOLLIP.

17. The method of embodiment 15 or 16, wherein the method comprises detecting whether the genome of the subject comprises a genetic variant of SPPL2C.

18. The method of any one of embodiments 15-17, further comprising detecting whether the genome of the subject comprises a genetic variant of MUC5B.

19. The method of any one of embodiments 15-18, wherein the genetic variant has at least one single nucleotide polymorphism selected from the group consisting of rs17690703 and rs5743890, and wherein the single nucleotide polymorphism is predictive of decreased survival.

20. The method of any one of embodiments 15-19, wherein the genome of the subject comprises the single nucleotide polymorphism rs35705950, and wherein the single nucleotide polymorphism is predictive of increased survival.

21. The method of any one of embodiments 15-20, wherein the interstitial lung disease is a fibrotic interstitial lung disease.

22. The method of embodiment 21, wherein the interstitial lung disease is idiopathic pulmonary fibrosis or familial interstitial pneumonia.

23. A method of detecting the presence or absence of at least one genetic variant in a human subject, the method comprising: detecting the presence or absence of at least one genetic variant of at least one of TOLLIP, SPPL2C, and MDGA2 in a sample from the subject.

24. The method of embodiment 23, wherein the at least one genetic variant includes a genetic variant of TOLLIP.

25. The method of embodiment 23 or embodiment 24, wherein the at least one genetic variant includes a genetic variant of SPPL2C.

26. The method of any one of embodiments 23-25, wherein the at least one genetic variant includes a genetic variant of MDGA2.

27. The method of any one of embodiments 23-26, further comprising testing the sample for a genetic variant of MUC5B.

28. The method of any of embodiments 23-27, wherein the at least one genetic variant includes at least one of the genetic variants listed in Fig. 7.

29. The method of embodiment 28, wherein the at least one genetic variant includes one or more of a single nucleotide polymorphism selected from the group consisting of rs111521887, rs5743894, rs5743890, rs17690703, and rs7144383.

30. The method of embodiment 29, wherein the at least one genetic variant includes rs111521887.

31. The method of embodiment 29 or 30, wherein the at least one genetic variant includes rs5743894.

32. The method of any one of embodiments 29-31, wherein the at least one genetic variant includes rs5743890.

33. The method of any one of embodiments 29-32, wherein the at least one genetic variant includes rs17690703.

34. The method of any one of embodiments 29-33, wherein the at least one genetic variant includes rs7144383.

35. The method of any one of embodiments 29-34, further comprising testing the sample for the genetic variant rs35705950.

36. The method of any one of embodiments 22-35, wherein the subject has or is suspected of having or is at risk for developing an interstitial lung disease.

37. The method of embodiment 36, wherein the interstitial lung disease is a fibrotic interstitial lung disease or familial interstitial pneumonia.

38. The method of embodiment 37, wherein the interstitial lung disease is idiopathic pulmonary fibrosis.

39. A method of detecting the presence or absence of at least two genetic variants in a human subject having or suspected of being at risk for developing an interstitial lung disease, the method comprising: detecting the presence or absence of at least two of the genetic variants listed in Fig. 7 in a sample from the subject.

40. The method of embodiment 39, wherein the at least two genetic variants includes from two to 52 of the genetic variants listed in Fig. 7.

41. The method of embodiment 40, wherein the at least two genetic variants includes from two to 44 of the genetic variants listed in Fig. 11.

42. A method of testing for interstitial lung disease in a human subject, the method comprising: detecting a level of TOLLIP gene expression in a sample from the subject, a low level of TOLLIP gene expression relative to a control being indicative of interstitial lung disease.

43. The method of embodiment 42, wherein the level of gene expression is detected by measuring directly or indirectly TOLLIP mRNA.

44. The method of embodiment 42, wherein the level of gene expression is detected by measuring Tollip protein.

45. A method of treating a human subject having an interstitial lung disease, the method comprising: detecting a level of TOLLIP expression according to any one of embodiments 42-44; and if the subject has a low level of TOLLIP expression relative to a control, administering to the subject an amount of a Tollip agonist, Tollip or a genetic construct expressing TOLLIP effective to treat the interstitial lung disease.

46. A kit for predicting, diagnosing, or prognosing interstitial lung disease in a human subject, the kit consisting essentially of: at least one probe or primer for detecting the presence or absence of at least one genetic variation in at least one of TOLLIP, SPPL2C, and MDGA2.

47. The kit of embodiment 46, wherein the at least one probe or primer includes probes or primers for detecting at least one genetic variation in TOLLIP.

48. The kit of embodiment 46 or 47, wherein the at least one probe or primer includes probes or primers for detecting at least one genetic variation in SPPL2C.

49. The kit of any one of embodiments 46-48, wherein the at least one probe or primer includes probes or primers for detecting at least one genetic variation in MDGA2.

50. The kit of any one of embodiments 46-49, further comprising at least one probe or primer for detecting at least one genetic variation in MUC5B.

51. The kit of any one of embodiments 46-50, wherein the genetic variation includes at least one of rs111521887, rs5743894, rs5743890, rs17690703, rs7144383, and rs35705950.

52. The kit of any one of embodiments 46-51, wherein the at least one probe or primer includes at least one probe or primer for detecting one or more of the genetic variations listed in Fig. 7.

53. A kit for predicting, diagnosing, or prognosing interstitial lung disease in a human subject, the kit comprising: at least one probe or primer for detecting the presence or absence of at least two genetic variations selected from the genetic variations listed in Fig. 7.

54. The kit of embodiment 53, wherein the kit comprises probes and/or primers for detecting the presence or absence of from two to 52 of the genetic variations listed in Fig. 7.

55. The kit of embodiment 54, wherein the kit comprises probes and/or primers for detecting the presence or absence of from two to 44 of the genetic variations listed in Fig. 11.

56. A method of determining whether a human subject has or is at risk of developing an interstitial lung disease, the method comprising detecting whether the genome of the subject comprises at least two genetic variants selected from the group of variants listed in Fig. 7 and determining whether the subject has or is at risk of developing an interstitial lung disease, the presence of the genetic variant indicating that the subject has or is at risk of developing the interstitial lung disease.

57. The method of embodiment 56, wherein the at least two genetic variants includes from two to 52 of the genetic variants listed in Fig. 7.

58. The method of embodiment 57, wherein the at least one genetic variant includes from two to 44 of the genetic variants listed in Fig. 11.

59. The method of any one of embodiments 56-58, wherein the interstitial lung disease is a fibrotic interstitial lung disease.

60. The method of embodiment 59, wherein the interstitial lung disease is idiopathic pulmonary fibrosis or familial interstitial pneumonia.

61. A method of prognosing an interstitial lung disease in a human subject, the method comprising detecting whether the genome of the subject comprises at least two of the genetic variants listed in Fig. 7 and determining a prognosis for the subject, the presence of the genetic variant gene being prognostic of increased or decreased survival.

62. The method of embodiment 61, wherein the interstitial lung disease is a fibrotic interstitial lung disease.

63. The method of embodiment 62, wherein the interstitial lung disease is idiopathic pulmonary fibrosis or familial interstitial pneumonia.

64. A method of prognosing an interstitial lung disease in a human subject, the method comprising detecting whether the genome of the subject comprises an inversion in the 17q21.31 chromosomal region and determining a prognosis for the subject, the presence of the inversion being prognostic of increased or decreased survival.

65. A kit comprising a nucleic acid primer capable of hybridizing to a genetic variant TOLLIP nucleic acid, SPPL2C nucleic acid, or MDGA2 nucleic acid.

66. The kit of claim 65, wherein said genetic variant has been extracted from a human subject with an interstitial lung disease or is an amplification product of a nucleic acid extracted from a human subject with an interstitial lung disease.

67. The kit of claim 65 or 66, wherein said interstitial lung disease is a pulmonary fibrotic condition.

68. The kit of one of claims 65-67, further comprising a first labeled nucleic acid probe capable of hybridizing to an amplification product of said genetic variant TOLLIP nucleic acid, SPPL2C nucleic acid, or MDGA2 nucleic acid.

69. The kit of claim 68, further comprising a second labeled nucleic acid probe capable of hybridizing to an amplification product of said genetic variant TOLLIP nucleic acid, SPPL2C nucleic acid, or MDGA2 nucleic acid.

70. The kit of claim 69, wherein said first labeled nucleic acid probe comprises a first label and said additional labeled nucleic acid probe comprises a second label, wherein said first and second label are capable of fluorescence resonance energy transfer when hybridized to said genetic variant TOLLIP nucleic acid, SPPL2C nucleic acid, or MDGA2 nucleic acid.

71. An in vitro complex comprising a first nucleic acid probe hybridized to a genetic variant nucleic acid, said genetic variant nucleic acid comprising a genetic variant TOLLIP, SPPL2C or MDGA2 gene sequence, wherein said genetic variant nucleic acid is extracted from a human subject with an interstitial lung disease or is an amplification product of a nucleic acid extracted from a human subject with an interstitial lung disease.

72. The in vitro complex of claim 71, wherein said complex further comprises a second labeled nucleic acid probe hybridized to said genetic variant nucleic acid.

73. The in vitro complex of claim 72, wherein said first labeled nucleic acid probe comprises a first label and said second labeled nucleic acid probe comprises a second label, wherein said first and second label are capable of fluorescence resonance energy transfer.

74. An in vitro complex comprising a thermally stable polymerase bound to a genetic variant nucleic acid, said genetic variant nucleic acid comprising a genetic variant TOLLIP, SPPL2C or MDGA2 gene sequence, wherein said genetic variant nucleic acid is extracted from a human subject with an interstitial lung disease or is an amplification product of a nucleic acid extracted from a human subject with an interstitial lung disease.

75. The in vitro complex of claim 74, wherein the complex further comprises a nucleic acid primer hybridized to said genetic variant nucleic acid.
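By way of non-limiting illustration only, the detecting step of the embodiments that name specific single nucleotide polymorphisms (for example, embodiment 51) could be followed by a simple computational check of genotype results. The sketch below is not part of the disclosed methods. The input format (a mapping from rsID to the number of variant alleles called for the subject) and the function name detect_listed_variants are assumptions made for this example, and no risk or survival interpretation is implied by the output.

```python
# rsIDs named in embodiment 51 of this specification.
LISTED_SNPS = (
    "rs111521887",
    "rs5743894",
    "rs5743890",
    "rs17690703",
    "rs7144383",
    "rs35705950",
)


def detect_listed_variants(genotype):
    """Return the listed rsIDs at which the subject carries a variant allele.

    `genotype` is a hypothetical input format: a dict mapping an rsID to the
    number of variant alleles called for the subject (0, 1 or 2). rsIDs that
    are absent from the dict are treated as not genotyped and are skipped.
    The result follows the order of LISTED_SNPS.
    """
    return [rs for rs in LISTED_SNPS if genotype.get(rs, 0) > 0]


if __name__ == "__main__":
    # Made-up genotype calls for a single illustrative subject.
    subject = {"rs5743890": 1, "rs35705950": 2, "rs7144383": 0}
    print(detect_listed_variants(subject))
```

In this example the subject carries variant alleles at rs5743890 and rs35705950, so those two identifiers are reported; rs7144383 was genotyped but carries no variant allele, and the remaining listed SNPs were not genotyped.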

It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All patents, patent applications, internet sources, and other published reference materials cited in this specification are incorporated herein by reference in their entireties. Any discrepancy between any reference material cited herein or any prior art in general and an explicit teaching of this specification is intended to be resolved in favor of the teaching in this specification. This includes any discrepancy between an art-understood definition of a word or phrase and a definition explicitly provided in this specification of the same word or phrase.

Each of the following publications is incorporated by reference in its entirety:

1. Mushiroda T, Wattanapokayakit S, Takahashi A, et al. A genome-wide association study identifies an association of a common variant in TERT with susceptibility to idiopathic pulmonary fibrosis. Journal of medical genetics 2008;45:654-6.

3. Raghu G, Brown KK, Bradford WZ, et al. A placebo-controlled trial of interferon gamma-1b in patients with idiopathic pulmonary fibrosis. The New England journal of medicine 2004;350:125-33.

4. Lederer DJ, Kawut SM, Wickersham N, et al. Obesity and primary graft dysfunction after lung transplantation: the Lung Transplant Outcomes Group Obesity Study. American journal of respiratory and critical care medicine 2011;184:1055-61.

5. Noth I, Anstrom KJ, Calvert SB, et al. A placebo-controlled randomized trial of warfarin in idiopathic pulmonary fibrosis. American journal of respiratory and critical care medicine 2012;186:88-95.

6. Raghu G, Collard HR, Egan JJ, et al. An official ATS/ERS/JRS/ALAT statement: idiopathic pulmonary fibrosis: evidence-based guidelines for diagnosis and management. American journal of respiratory and critical care medicine 2011;183:788-824.

7. Marchini J, Howie B. Genotype imputation for genome-wide association studies. Nature reviews 2010;11:499-511.

8. Gabriel SB, Schaffner SF, Nguyen H, et al. The structure of haplotype blocks in the human genome. Science (New York, NY) 2002;296:2225-9.

9. R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2012.

10. Han B, Eskin E. Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. American journal of human genetics 2011;88:586-98.

11. Seibold MA, Wise AL, Speer MC, et al. A common MUC5B promoter polymorphism and pulmonary fibrosis. The New England journal of medicine 2011;364:1503-12.

12. American Thoracic Society. Idiopathic pulmonary fibrosis: diagnosis and treatment. International consensus statement. American Thoracic Society (ATS), and the European Respiratory Society (ERS). American journal of respiratory and critical care medicine 2000;161(2 Pt 1):646-64.

13. American Thoracic Society/European Respiratory Society International Multidisciplinary Consensus Classification of the Idiopathic Interstitial Pneumonias. This joint statement of the American Thoracic Society (ATS), and the European Respiratory Society (ERS) was adopted by the ATS board of directors, June 2001 and by the ERS Executive Committee, June 2001. American journal of respiratory and critical care medicine 2002;165(2):277-304.

14. Raghu G, Collard HR, Egan JJ, et al. An official ATS/ERS/JRS/ALAT statement: idiopathic pulmonary fibrosis: evidence-based guidelines for diagnosis and management. American journal of respiratory and critical care medicine 2011;183(6):788-824.

15. Raghu G, Brown KK, Bradford WZ, et al. A placebo-controlled trial of interferon gamma-1b in patients with idiopathic pulmonary fibrosis. The New England journal of medicine 2004;350(2):125-33.

16. Lederer DJ, Kawut SM, Wickersham N, et al. Obesity and primary graft dysfunction after lung transplantation: the Lung Transplant Outcomes Group Obesity Study. American journal of respiratory and critical care medicine 2011;184(9):1055-61.

17. Noth I, Anstrom KJ, Calvert SB, et al. A placebo-controlled randomized trial of warfarin in idiopathic pulmonary fibrosis. American journal of respiratory and critical care medicine 2012;186(1):88-95.

18. Carvalho B, Bengtsson H, Speed TP, Irizarry RA. Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data. Biostatistics (Oxford, England) 2007;8(2):485-99.

19. Carvalho BS, Louis TA, Irizarry RA. Quantifying uncertainty in genotype calls. Bioinformatics (Oxford, England) 2010;26(2):242-9.

20. Purcell S, Neale B, Todd-Brown K, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. American journal of human genetics 2007;81(3):559-75.

21. Luca D, Ringquist S, Klei L, et al. On the use of general control samples for genome-wide association studies: genetic matching highlights causal variants. American journal of human genetics 2008;82(2):453-63.

22. Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS genetics 2006;2(12):e190.

23. Delaneau O, Marchini J, Zagury JF. A linear complexity phasing method for thousands of genomes. Nature methods 2012;9(2):179-81.

24. Marchini J, Howie B. Genotype imputation for genome-wide association studies. Nature reviews 2010;11(7):499-511.

25. Weale ME, Depondt C, Macdonald SJ, et al. Selection and evaluation of tagging SNPs in the neuronal-sodium-channel gene SCN1A: implications for linkage-disequilibrium gene mapping. American journal of human genetics 2003;73(3):551-65.

26. Flores C, Ma SF, Maresso K, Ober C, Garcia JG. A variant of the myosin light chain kinase gene is associated with severe asthma in African Americans. Genetic epidemiology 2007;31(4):296-305.

27. A map of human genome variation from population-scale sequencing. Nature 2010;467(7319):1061-73.

28. Seibold MA, Wise AL, Speer MC, et al. A common MUC5B promoter polymorphism and pulmonary fibrosis. The New England journal of medicine 2011;364(16):1503-12.

Claims

CLAIMS It is claimed:

1. A method of determining whether a human subject has or is at risk of developing an interstitial lung disease, the method comprising detecting whether the genome of the subject comprises a genetic variant of TOLLIP, SPPL2C or MDGA2 and determining whether the subject has or is at risk of developing an interstitial lung disease, the presence of the genetic variant indicating that the subject has or is at risk of developing the interstitial lung disease.

2. The method of claim 1, wherein the method comprises detecting whether the genome of the subject comprises a genetic variant of TOLLIP.

3. The method of claim 1 or claim 2, wherein the method comprises detecting whether the genome of the subject comprises a genetic variant of SPPL2C.

4. The method of claim 3, further comprising determining if the subject carries an H2 inversion in 17q21.31.

5. The method of claim 4, wherein the determining comprises determining if the subject comprises one or more single nucleotide polymorphisms selected from the group consisting of rs916793, rs2902662, rs17651213, and rs199448.

6. The method of any one of claims 1-5, wherein the method comprises detecting whether the genome of the subject comprises a genetic variant of MDGA2.

7. The method of any one of claims 1-6, further comprising detecting whether the genome of the subject comprises a genetic variant of MUC5B.

8. The method of any one of claims 1-7, wherein the method comprises detecting whether the genome of the subject comprises one or more genetic variants comprising a single nucleotide polymorphism selected from the group consisting of rs111521887, rs5743894, rs5743890, rs17690703, and rs7144383.

9. The method of claim 8, wherein the method comprises detecting whether the genome of the subject comprises the genetic variant comprising the single nucleotide polymorphism rs111521887.

10. The method of claim 8 or 9, wherein the method comprises detecting whether the genome of the subject comprises the genetic variant comprising the single nucleotide polymorphism rs5743894.

11. The method of any one of claims 8-10, wherein the method comprises detecting whether the genome of the subject comprises the genetic variant comprising the single nucleotide polymorphism rs5743890.

12. The method of any one of claims 8-11, wherein the method comprises detecting whether the genome of the subject comprises the genetic variant comprising the single nucleotide polymorphism rs17690703.

13. The method of any one of claims 8-12, wherein the method comprises detecting whether the genome of the subject comprises the genetic variant comprising the single nucleotide polymorphism rs7144383.

14. The method of any one of claims 8-13, further comprising detecting whether the genome of the subject comprises a genetic variant comprising a single nucleotide polymorphism rs35705950.

15. The method of any one of claims 1-14, wherein the interstitial lung disease is a fibrotic interstitial lung disease.

16. The method of claim 15, wherein the interstitial lung disease is idiopathic pulmonary fibrosis or familial interstitial pneumonia.

17. A method of prognosing an interstitial lung disease in a human subject, the method comprising detecting whether the genome of the subject comprises a genetic variant of TOLLIP or SPPL2C and determining a prognosis for the subject, the presence of the genetic variant gene being prognostic of increased or decreased survival.

18. The method of claim 17, wherein the method comprises detecting whether the genome of the subject comprises a genetic variant of TOLLIP.

19. The method of claim 17 or 18, wherein the method comprises detecting whether the genome of the subject comprises a genetic variant of SPPL2C.

20. The method of claim 19, further comprising determining if the subject carries an H2 inversion in 17q21.31.

21. The method of claim 20, wherein the determining comprises determining if the subject comprises one or more single nucleotide polymorphisms selected from the group consisting of rs916793, rs2902662, rs17651213, and rs199448.

22. The method of any one of claims 17-21, further comprising detecting whether the genome of the subject comprises a genetic variant of MUC5B.

23. The method of any one of claims 17-22, wherein the genetic variant comprises a single nucleotide polymorphism selected from the group consisting of rs17690703 and rs5743890, and wherein the single nucleotide polymorphism is predictive of decreased survival.

24. The method of any one of claims 17-23, wherein the genome of the subject comprises the single nucleotide polymorphism rs35705950, and wherein the single nucleotide polymorphism is predictive of increased survival.

25. The method of any one of claims 17-24, wherein the interstitial lung disease is a fibrotic interstitial lung disease.

26. The method of claim 25, wherein the interstitial lung disease is idiopathic pulmonary fibrosis or familial interstitial pneumonia.

27. A method of detecting the presence or absence of a genetic variant in a human subject, the method comprising:

detecting the presence or absence of a genetic variant of TOLLIP, SPPL2C, or MDGA2 in a sample from the subject.

28. The method of claim 27, wherein the genetic variant is a genetic variant of TOLLIP.

29. The method of claim 27, wherein the genetic variant is a genetic variant of SPPL2C.

30. The method of claim 29, further comprising determining if the subject carries an H2 inversion in 17q21.31.

31. The method of claim 30, wherein the determining comprises determining if the subject comprises one or more single nucleotide polymorphisms selected from the group consisting of rs916793, rs2902662, rs17651213, and rs199448.

32. The method of claim 27, wherein the genetic variant is a genetic variant of MDGA2.

33. The method of any one of claims 27-32, further comprising detecting the presence or absence of a genetic variant of MUC5B in said sample.

34. The method of any of claims 27-33, wherein the genetic variant comprises a single nucleotide polymorphism listed in Fig. 7.

35. The method of claim 34, wherein the genetic variant comprises a single nucleotide polymorphism selected from the group consisting of rs111521887, rs5743894, rs5743890, rs17690703, and rs7144383.

36. The method of claim 35, wherein the genetic variant comprises rs111521887.

37. The method of claim 35 or 36, wherein the genetic variant comprises rs5743894.

38. The method of any one of claims 29-31, wherein the genetic variant comprises rs5743890.

33. The method of any one of claims 29-32, wherein the genetic variant comprises rs17690703.

34. The method of any one of claims 29-33, wherein the genetic variant comprises rs7144383.

35. The method of any one of claims 29-34, further comprising detecting the presence or absence of a genetic variant of MUC5B comprising rs35705950.

36. The method of any one of claims 22-35, wherein the subject has, is suspected of having, or is at risk for developing an interstitial lung disease.

37. The method of claim 36, wherein the interstitial lung disease is a fibrotic interstitial lung disease.

38. The method of claim 37, wherein the interstitial lung disease is idiopathic pulmonary fibrosis or familial interstitial pneumonia.

39. A method of detecting the presence or absence of at least two genetic variants in a human subject having, suspected of having, or at risk for developing an interstitial lung disease, the method comprising:

detecting the presence or absence of at least two of the genetic variants listed in Fig. 7 in a sample from the subject.

40. The method of claim 39, wherein the at least two genetic variants includes from two to 52 of the genetic variants listed in Fig. 7.

41. The method of claim 40, wherein the at least two genetic variants includes from two to 44 of the genetic variants listed in Fig. 11.

42. A method of testing for interstitial lung disease in a human subject, the method comprising:

detecting a level of TOLLIP gene expression in a sample from the subject, a low level of TOLLIP gene expression relative to a control being indicative of interstitial lung disease.

43. The method of claim 42, wherein the level of gene expression is detected by measuring directly or indirectly TOLLIP mRNA.

44. The method of claim 42, wherein the level of gene expression is detected by measuring Tollip protein.

45. A method of treating a human subject having an interstitial lung disease, the method comprising:

detecting a low level of TOLLIP expression relative to a control; and administering to the subject an amount of a Tollip agonist, Tollip or a genetic construct expressing TOLLIP effective to treat the interstitial lung disease.

46. A kit for predicting, diagnosing, or prognosing interstitial lung disease in a human subject, the kit comprising:

a probe or primer capable of detecting the presence or absence of a genetic variant of TOLLIP, SPPL2C, or MDGA2.

47. The kit of claim 46, wherein the probe or primer is capable of detecting a genetic variant of TOLLIP.

48. The kit of claim 46 or 47, wherein the at least one probe or primer is capable of detecting a genetic variant of SPPL2C.

49. The kit of claim 48, further comprising at least one probe or primer that is capable of detecting an H2 inversion in 17q21.31.

50. The kit of claim 49, wherein the at least one probe or primer detects one or more single nucleotide polymorphisms selected from the group consisting of rs916793, rs2902662, rs17651213, and rs199448.

51. The kit of any one of claims 46-50, wherein the at least one probe or primer is capable of detecting a genetic variant of MDGA2.

52. The kit of any one of claims 46-51, further comprising an additional probe or primer capable of detecting at least one genetic variant of MUC5B.

53. The kit of any one of claims 46-52, wherein the genetic variant comprises rs111521887, rs5743894, rs5743890, rs17690703, rs7144383, or rs35705950.

54. The kit of any one of claims 46-53, wherein the genetic variant comprises a single nucleotide polymorphism set forth in Fig. 7.

55. A kit for predicting, diagnosing, or prognosing interstitial lung disease in a human subject, the kit comprising:

at least one probe or primer for detecting the presence or absence of at least two single nucleotide polymorphisms set forth in Fig. 7.

56. The kit of claim 55, wherein the kit comprises probes and/or primers for detecting the presence or absence of from two to 52 of the single nucleotide polymorphisms set forth in Fig. 7.

57. The kit of claim 56, wherein the kit comprises probes and/or primers for detecting the presence or absence of from two to 44 of the single nucleotide polymorphisms set forth in Fig. 11.

58. A method of determining whether a human subject has or is at risk of developing an interstitial lung disease, the method comprising detecting whether the genome of the subject comprises at least two single nucleotide polymorphisms set forth in Fig. 7 and determining whether the subject has or is at risk of developing an interstitial lung disease, the presence of the genetic variant indicating that the subject has or is at risk of developing the interstitial lung disease.

59. The method of claim 58, wherein the at least two single nucleotide polymorphisms includes from two to 52 of the single nucleotide polymorphisms set forth in Fig. 7.

60. The method of claim 59, wherein the at least two single nucleotide polymorphisms includes from two to 44 of the single nucleotide polymorphisms set forth in Fig. 11.

61. The method of any one of claims 58-60, wherein the interstitial lung disease is a fibrotic interstitial lung disease.

62. The method of claim 61, wherein the interstitial lung disease is idiopathic pulmonary fibrosis or familial interstitial pneumonia.

63. A method of prognosing an interstitial lung disease in a human subject, the method comprising detecting whether the genome of the subject comprises at least two single nucleotide polymorphisms set forth in Fig. 7 and determining a prognosis for the subject, the presence of the genetic variant gene being prognostic of increased or decreased survival.

64. The method of claim 63, wherein the interstitial lung disease is a fibrotic interstitial lung disease.

65. The method of claim 64, wherein the interstitial lung disease is idiopathic pulmonary fibrosis or familial interstitial pneumonia.

66. A method of prognosing an interstitial lung disease in a human subject, the method comprising detecting whether the genome of the subject comprises an inversion in the 17q21.31 chromosomal region and determining a prognosis for the subject, the presence of the inversion being prognostic of increased or decreased survival.

67. A kit comprising a nucleic acid primer capable of hybridizing to a genetic variant TOLLIP nucleic acid, SPPL2C nucleic acid, or MDGA2 nucleic acid.

68. The kit of claim 67, wherein said genetic variant has been extracted from a human subject with an interstitial lung disease or is an amplification product of a nucleic acid extracted from a human subject with an interstitial lung disease.

69. The kit of claim 67 or 68, wherein said interstitial lung disease is a pulmonary fibrotic condition.

70. The kit of one of claims 67-69, further comprising a first labeled nucleic acid probe capable of hybridizing to an amplification product of said genetic variant TOLLIP nucleic acid, SPPL2C nucleic acid, or MDGA2 nucleic acid.

71. The kit of claim 70, further comprising a second labeled nucleic acid probe capable of hybridizing to an amplification product of said genetic variant TOLLIP nucleic acid, SPPL2C nucleic acid, or MDGA2 nucleic acid.

72. The kit of claim 71, wherein said first labeled nucleic acid probe comprises a first label and said additional labeled nucleic acid probe comprises a second label, wherein said first and second label are capable of fluorescence resonance energy transfer when hybridized to said genetic variant TOLLIP nucleic acid, SPPL2C nucleic acid, or MDGA2 nucleic acid.

73. An in vitro complex comprising a first nucleic acid probe hybridized to a genetic variant nucleic acid, said genetic variant nucleic acid comprising a genetic variant TOLLIP, SPPL2C or MDGA2 gene sequence, wherein said genetic variant nucleic acid is extracted from a human subject with an interstitial lung disease or is an amplification product of a nucleic acid extracted from a human subject with an interstitial lung disease.

74. The in vitro complex of claim 73, wherein said complex further comprises a second labeled nucleic acid probe hybridized to said genetic variant nucleic acid.

75. The in vitro complex of claim 74, wherein said first labeled nucleic acid probe comprises a first label and said second labeled nucleic acid probe comprises a second label, wherein said first and second label are capable of fluorescence resonance energy transfer.

76. An in vitro complex comprising a thermally stable polymerase bound to a genetic variant nucleic acid, said genetic variant nucleic acid comprising a genetic variant TOLLIP, SPPL2C or MDGA2 gene sequence, wherein said genetic variant nucleic acid is extracted from a human subject with an interstitial lung disease or is an amplification product of a nucleic acid extracted from a human subject with an interstitial lung disease.

77. The in vitro complex of claim 76, wherein the complex further comprises a nucleic acid primer hybridized to said genetic variant nucleic acid.