AU2015202173A1

AU2015202173A1 - Proliferation signature and prognosis for gastrointestinal cancer

Info

Publication number: AU2015202173A1
Application number: AU2015202173A
Authority: AU
Inventors: Ahmad Anjomshoaa; Michael A. Black; Yu-Hsin Lin; Anthony Edmund Reeve
Original assignee: Pacific Edge Ltd
Current assignee: Pacific Edge Ltd
Priority date: 2007-10-05
Filing date: 2015-04-29
Publication date: 2015-05-14
Also published as: AU2017201785A1; AU2019201377A1

Abstract

[Figure residue. FIG. 1 (overview): Stage 1, identification of a gene proliferation signature using a CRC cell line model (ten colorectal cell lines; fully confluent and semi-confluent cultures; 30K oligonucleotide arrays; GO analysis identifies a signature of 38 genes over-expressed in actively cycling cells). Stage 2, evaluation of the proliferation state of CRC samples based on the expression level of the signature (Cohort A: NZ patients, 32 R and 41 NR; Cohort B: German patients, 26 R and 29 NR, Affymetrix HG-U133A arrays; tumours classified into two groups by K-means clustering; low signature expression associated with poor outcome). Stage 3, evaluation of the proliferation state of Cohort A surgical samples by Ki-67 immunostaining of paraffin-embedded sections (calculation of Ki-67 PI; no association between Ki-67 PI and clinical outcome). The remaining residue from FIGS. 2 to 5 cannot be recovered. It covered the signature heat map, the Ki-67 PI bar graph, Kaplan-Meier RFS and OS curves by high and low GPS expression, and EP/SP box plots.]

Description

WO 2009/045115 PCT/NZ2008/000260

PROLIFERATION SIGNATURES AND PROGNOSIS FOR GASTROINTESTINAL CANCER

FIELD OF THE INVENTION

This invention relates to methods and compositions for determining the prognosis of cancer, particularly gastrointestinal cancer, in a patient. Specifically, this invention relates to the use of genetic markers for determining the prognosis of cancer, such as gastrointestinal cancer, based on cell proliferation signatures.

BACKGROUND OF THE INVENTION

Cellular proliferation is the most fundamental process in living organisms and, as such, is precisely regulated by the expression levels of proliferation-associated genes (1). Loss of proliferation control is a hallmark of cancer, so it is not surprising that growth-regulating genes are abnormally expressed in tumours relative to the neighbouring normal tissue (2). Proliferative changes may accompany other changes in cellular properties, such as invasiveness and the ability to metastasize, and could therefore affect patient outcome. This association has attracted substantial interest, and many studies have explored tumour cell proliferation as a potential indicator of outcome.

Cell proliferation is usually assessed by flow cytometry or, more commonly in tissues, by immunohistochemical evaluation of proliferation markers (3). The most widely used proliferation marker is Ki-67, a protein expressed in all phases of the cell cycle except the resting phase, G0 (4). Using Ki-67, a clear association between the proportion of cycling cells and clinical outcome has been established in malignancies such as breast cancer, lung cancer, soft tissue tumours, and astrocytoma (5). In breast cancer, this association has also been confirmed by microarray analysis. That work produced a proliferative gene expression profile that has been used to identify patients at increased risk of recurrence (6).
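A Ki-67 proliferation index is conventionally reported as the percentage of counted tumour nuclei that stain Ki-67-positive. Tumours are then dichotomised at some cut-off. The Python sketch below assumes that convention and uses the cohort mean as the cut-off, the rule drawn in FIG. 2B. It is illustrative only, and the counts used in the check are invented.

```python
from statistics import mean


def ki67_proliferation_index(positive_nuclei, total_nuclei):
    """Percentage of counted tumour nuclei scored as Ki-67-positive."""
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be positive")
    if not 0 <= positive_nuclei <= total_nuclei:
        raise ValueError("positive_nuclei must lie between 0 and total_nuclei")
    return 100.0 * positive_nuclei / total_nuclei


def split_by_mean(pi_by_tumour):
    """Label each tumour 'high' (PI above the cohort mean) or 'low' (at or below it)."""
    cut_off = mean(pi_by_tumour.values())
    return {tumour: ("high" if pi > cut_off else "low")
            for tumour, pi in pi_by_tumour.items()}
```

Other studies use a median or a fixed percentage instead of the mean. Swapping `cut_off` is the only change needed, which is one reason published results are hard to compare.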
However, in colorectal cancer (CRC), the proliferation index (PI) has produced conflicting results as a prognostic factor and therefore cannot be applied in a clinical context (see below). Studies vary with respect to patient selection, sampling methods, cut-off points, antibody choice, staining techniques, and the way data have been collected and interpreted. These methodological differences and the heterogeneity of the studies may partly explain the contradictory results (7), (8). The use of Ki-67 as a proliferation marker also has limitations. The Ki-67 PI estimates the fraction of actively cycling cells but gives no indication of cell cycle length (3), (9). Tumours with a similar PI may therefore grow at dissimilar rates because their cells cycle at different speeds. In addition, although Ki-67 mRNA is not produced in resting cells, the protein may still be detectable in a proportion of colorectal tumours, leading to an overestimated proliferation rate (10).

Prognostic assessment using a single proliferation marker does not appear to be reliable in CRC (see below), so further tools are needed to predict the prognosis of gastrointestinal cancer. This invention provides further methods and compositions based on prognostic cancer markers, specifically gastrointestinal cancer prognostic markers, to aid in the prognosis and treatment of cancer.

SUMMARY OF THE INVENTION

In certain aspects of the invention, microarray analysis is used to identify genes that provide a proliferation signature for cancer cells. These genes, and the proteins they encode, are herein termed gastrointestinal cancer proliferation markers (GCPMs). In one aspect of the invention, the cancer for prognosis is gastrointestinal cancer, particularly gastric or colorectal cancer. In particular aspects, the invention includes a method for determining the prognosis of a cancer by identifying the expression levels of at least one GCPM in a sample.

Selected GCPMs encode proteins that are associated with cell proliferation, e.g., cell cycle components. These GCPMs also have utility in methods for determining the best treatment regime for a particular cancer based on the prognosis. In particular aspects, GCPM levels are higher in non-recurring tumour tissue than in recurring tumour tissue. These markers can be used alone, in combination with each other, or in combination with other known cancer markers.

In an additional aspect, this invention includes a method for determining the prognosis of a cancer, comprising: (a) providing a sample of the cancer; (b) detecting the expression level of at least one GCPM family member in the sample; and (c) determining the prognosis of the cancer.
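Step (c) needs some rule for turning measured GCPM levels into a prognostic class. FIG. 2A divides tumours into two groups by K-means clustering on signature expression. The sketch below is a plain two-cluster K-means (Lloyd's algorithm) in Python. It illustrates the technique under stated assumptions and is not the exact procedure or software used here. It assumes each tumour is a vector of log2 expression values over the same ordered set of signature genes. The two centroids are seeded with the tumours of lowest and highest mean expression, a deterministic choice made here for reproducibility. The resulting clusters are named by their centroid means.

```python
from statistics import mean


def _squared_distance(a, b):
    """Squared Euclidean distance between two equal-length expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_two_groups(profiles, max_iter=100):
    """Cluster tumours into 'high' and 'low' signature-expression groups.

    profiles -- dict mapping tumour id -> list of log2 expression values,
                one value per signature gene, in the same gene order.
    """
    ids = list(profiles)
    if len(ids) < 2:
        raise ValueError("need at least two tumours to form two clusters")

    # Deterministic seeding (an assumption of this sketch): lowest- and highest-mean profiles.
    low_seed = min(ids, key=lambda t: mean(profiles[t]))
    high_seed = max(ids, key=lambda t: mean(profiles[t]))
    centroids = [list(profiles[low_seed]), list(profiles[high_seed])]

    labels = {}
    for _ in range(max_iter):
        # Assignment step: each tumour joins its nearest centroid.
        new_labels = {
            t: min(range(2), key=lambda k: _squared_distance(profiles[t], centroids[k]))
            for t in ids
        }
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the gene-wise mean of its members.
        for k in range(2):
            members = [profiles[t] for t in ids if labels[t] == k]
            if members:
                centroids[k] = [mean(values) for values in zip(*members)]

    # Name clusters by centroid mean so that 'high' always means higher expression.
    high_k = 0 if mean(centroids[0]) > mean(centroids[1]) else 1
    return {t: ("high" if labels[t] == high_k else "low") for t in ids}
```

Library implementations such as scikit-learn's `KMeans` use random restarts instead of the fixed seeding shown here. They can therefore return the same split with the cluster indices swapped, which is why the sketch names clusters by centroid mean.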
In another aspect, the invention includes a step of detecting the expression level of at least one GCPM RNA, for example, at least one mRNA. In a further aspect, the invention includes a step of detecting the expression level of at least one GCPM protein. In yet a further aspect, the invention includes a step of detecting the level of at least one GCPM peptide. In yet another aspect, the invention includes detecting the expression level of at least one GCPM family member in the sample. In an additional aspect, the GCPM is a gene associated with cell proliferation, such as a cell cycle component. In other aspects, the at least one GCPM is selected from Table A, Table B, Table C or Table D, herein.

In a still further aspect, the invention includes a method for detecting the expression level of at least one GCPM set forth in Table A, Table B, Table C or Table D, herein. In an even further aspect, the invention includes a method for detecting the expression level of at least one of CDC2, MCM6, RPA3, MCM7, PCNA, G22P1, KPNA2, ANLN, APG7L, TOPK, GMNN, RRM1, CDC45L, MAD2L1, RAN, DUT, RRM2, CDK7, MLH3, SMC4L1, CSPG6, POLD2, POLE2, BCCIP, Pfs2, TREX1, BUB3, FEN1, DRF1, PRE3, CCNE1, RPA1, POLE3, RFC4, MCM3, CHEK1, CCND1, and CDC37. In yet a further aspect, the invention comprises detecting the expression level of at least one of CDC2, RFC4, PCNA, CCNE1, CCND1, CDK7, MCM genes, FEN1, MAD2L1, MYBL2, RRM2, and BUB3.
In additional aspects, the expression levels of at least two, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, or at least 75 of the proliferation markers or their expression products are determined. The markers are, for example, selected from Table A, Table B, Table C or Table D; or from CDC2, MCM6, RPA3, MCM7, PCNA, G22P1, KPNA2, ANLN, APG7L, TOPK, GMNN, RRM1, CDC45L, MAD2L1, RAN, DUT, RRM2, CDK7, MLH3, SMC4L1, CSPG6, POLD2, POLE2, BCCIP, Pfs2, TREX1, BUB3, FEN1, DRF1, PRE3, CCNE1, RPA1, POLE3, RFC4, MCM3, CHEK1, CCND1, and CDC37; or from CDC2, RFC4, PCNA, CCNE1, CCND1, CDK7, MCM genes (e.g., one or more of MCM3, MCM6, and MCM7), FEN1, MAD2L1, MYBL2, RRM2, and BUB3.

In other aspects, the expression levels of all proliferation markers or their expression products are determined. The full set is, for example, that listed in Table A, Table B, Table C or Table D; or the group CDC2, MCM6, RPA3, MCM7, PCNA, G22P1, KPNA2, ANLN, APG7L, TOPK, GMNN, RRM1, CDC45L, MAD2L1, RAN, DUT, RRM2, CDK7, MLH3, SMC4L1, CSPG6, POLD2, POLE2, BCCIP, Pfs2, TREX1, BUB3, FEN1, DRF1, PRE3, CCNE1, RPA1, POLE3, RFC4, MCM3, CHEK1, CCND1, and CDC37; or the group CDC2, RFC4, PCNA, CCNE1, CCND1, CDK7, MCM genes (e.g., one or more of MCM3, MCM6, and MCM7), FEN1, MAD2L1, MYBL2, RRM2, and BUB3.

In yet a further aspect, the invention includes a method of determining a treatment regime for a cancer comprising: (a) providing a sample of the cancer; (b) detecting the expression level of at least one GCPM family member in the sample; (c) determining the prognosis of the cancer based on the expression level of at least one GCPM family member; and (d) determining the treatment regime according to the prognosis.
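Steps (b) and (c) of this method can also be pictured as a simple scoring rule. The sketch below is hypothetical throughout. It assumes expression values are already log2 ratios against a reference. It averages them over an illustrative subset of the listed GCPMs (`EXAMPLE_MARKERS`), using whichever of those markers were measured. The average is then compared with a hypothetical cut-off of 0. Higher GCPM levels are associated with non-recurring tumours, so a score above the cut-off is read as the more favourable prognosis. Step (d), choosing a treatment regime, is not modelled.

```python
from statistics import mean

# Hypothetical marker subset drawn from the GCPMs named in this specification.
EXAMPLE_MARKERS = ("CDC2", "MCM6", "MCM7", "PCNA", "RFC4", "FEN1", "MAD2L1", "BUB3")


def signature_score(log2_levels, markers=EXAMPLE_MARKERS):
    """Step (b) summary: unweighted mean log2 level over the signature markers measured."""
    measured = [log2_levels[gene] for gene in markers if gene in log2_levels]
    if not measured:
        raise ValueError("none of the signature markers was measured")
    return mean(measured)


def prognosis_call(log2_levels, cut_off=0.0, markers=EXAMPLE_MARKERS):
    """Step (c): a score above the hypothetical cut-off is read as favourable,
    reflecting higher GCPM levels in non-recurring tumours."""
    score = signature_score(log2_levels, markers)
    return "favourable" if score > cut_off else "unfavourable"
```

Genes outside the marker subset (for example a housekeeping gene present in the same profile) are simply ignored by the score.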
In yet another aspect, the invention includes a device for detecting at least one GCPM, comprising: (a) a substrate having at least one GCPM capture reagent thereon; and (b) a detector capable of detecting the at least one captured GCPM, the capture reagent, or a complex thereof.

An additional aspect of the invention includes a kit for detecting cancer, comprising: (a) a GCPM capture reagent; (b) a detector capable of detecting the captured GCPM, the capture reagent, or a complex thereof; and, optionally, (c) instructions for use. In certain aspects, the kit also includes a substrate for the captured GCPM.

Yet a further aspect of the invention includes a kit for detecting at least one GCPM using quantitative PCR, comprising: (a) a forward primer specific for the at least one GCPM; (b) a reverse primer specific for the at least one GCPM; (c) PCR reagents; and, optionally, at least one of: (d) a reaction vial; and (e) instructions for use.

Additional aspects of this invention include a kit for detecting the presence of at least one GCPM protein or peptide, comprising: (a) an antibody or antibody fragment specific for the at least one GCPM protein or peptide; and, optionally, at least one of: (b) a label for the antibody or antibody fragment; and (c) instructions for use. In certain aspects, the kit also includes a substrate having a capture agent for the at least one GCPM protein or peptide.

In specific aspects, this invention includes a method for determining the prognosis of gastrointestinal cancer, especially colorectal or gastric cancer, comprising the steps of: (a) providing a sample, e.g., a tumour sample, from a patient suspected of having gastrointestinal cancer; and (b) measuring the presence of a GCPM protein using an ELISA method.

In additional aspects of this invention, one or more GCPMs of the invention are selected from the group outlined in Table A, Table B, Table C or Table D, herein.
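A quantitative PCR kit of this kind yields threshold-cycle (Ct) values rather than expression levels. This document does not specify how Ct values are converted. One widely used option is the comparative Ct (2^-ΔΔCt) method, which assumes close to 100% amplification efficiency for both the GCPM assay and a reference-gene assay. The sketch below implements that formula only. The Ct values in the check are invented.

```python
def fold_change_ddct(ct_target, ct_reference, ct_target_calibrator, ct_reference_calibrator):
    """Relative expression of a target gene by the comparative Ct (2^-ddCt) method.

    ct_target, ct_reference     -- Ct of the GCPM and of a reference gene in the test sample
    ct_*_calibrator             -- the same two Ct values in a calibrator sample
    Assumes ~100% amplification efficiency, i.e. one doubling per cycle.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)
```

For example, a GCPM that crosses threshold two cycles earlier (relative to the reference gene) than in the calibrator gives a fold change of 4, i.e. a log2 fold change of 2.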
Other aspects and embodiments of the invention are described herein below.

BRIEF DESCRIPTION OF THE DRAWINGS

This invention is described with reference to specific embodiments thereof and with reference to the figures.

FIG. 1: An overview of the approach used to derive and apply the gene proliferation signature (GPS) disclosed herein.

FIG. 2A: K-means clustering of 73 Cohort A tumours into two groups according to the expression level of the gene proliferation signature. FIG. 2B: Bar graph of Ki-67 PI (%); the vertical line represents the mean Ki-67 PI across all samples. Tumours with a proliferation index above and below the mean are shown in red and green, respectively. The results show that over-expression of the proliferation signature is not always associated with a higher Ki-67 PI.

FIG. 3: Kaplan-Meier survival curves according to the expression level of the GPS (gene proliferation signature) and Ki-67 PI. Both overall survival (OS) and recurrence-free survival (RFS) are significantly shorter in patients with low GPS expression in colorectal cancer Cohort A (a, b) and colorectal cancer Cohort B (c, d). No difference was observed in the survival rates of Cohort A patients according to Ki-67 PI (e, f). P values from the log-rank test are indicated.

FIG. 4: Kaplan-Meier survival curves according to the expression level of the GPS (gene proliferation signature) in gastric cancer patients. Overall survival is significantly shorter in patients with low GPS expression in this cohort of 38 gastric cancer patients of mixed stage. P values from the log-rank test are indicated.

FIG. 5: A box-and-whisker plot showing the differential expression of 11 QRT-PCR-validated genes between cycling cells in the exponential phase (EP) and growth-inhibited cells in the stationary phase (SP). The box spans the 25th to the 75th percentiles of the data. The horizontal line in the box represents the median value. The "whiskers" are the largest and smallest values
(excluding outliers). Any point more than 1.5 times the interquartile range from the end of a box is an outlier and is shown as a dot. The Y axis represents the log2 fold change of the ratio between cell line RNA and reference RNA. Analysis was performed using SPSS software.

DETAILED DESCRIPTION OF THE INVENTION

A single proliferation marker is insufficient for obtaining a reliable CRC prognosis. The simultaneous analysis of several growth-related genes by microarray was therefore employed to provide a more quantitative and objective method of determining the proliferation state of a gastrointestinal tumour. Table 1 (below) summarizes previously published, conflicting results on the use of the proliferation index (PI) as a prognostic factor for colorectal cancer.

Table 1: Summary of studies on the association of proliferation indices with the survival of CRC patients

Study | Number of patients | Dukes stage | Marker | Association with survival
Evans et al, 2006 (11) | 40 | A-C | Ki-67 | No association found
Rosati et al, 2004 (12) | 103 | B-C | Ki-67 | No association found
Ishida et al, 2004 (13) | 51 | C | Ki-67 | No association found
Buglioni et al, 1999 (14) | 171 | A-D | Ki-67 | No association found
Guerra et al, 1998 (15) | 108 | A-C | PCNA | No association found
Kyzer and Gordon, 1997 (16) | 30 | B-D | Ki-67 | No association found
Jansson and Sun, 1997 (17) | 255 | A-D | Ki-67 | No association found
Baretton et al, 1996 (18) | 95 | A-B | Ki-67 | No association found
Sun et al, 1996 (19) | 293 | A-C | PCNA | No association found
Kubota et al, 1992 (20) | 100 | A-D | Ki-67 | No association found
Valera et al, 2005 (21) | 106 | A-D | Ki-67 | High PI associated with shorter survival
Dziegiel et al, 2003 (22) | 81 | NI | Ki-67 | High PI associated with shorter survival
Scopa et al, 2003 (23) | 117 | A-D | Ki-67 | High PI associated with shorter survival
Bhatavdekar et al, 2001 (24) | 98 | B-C | Ki-67 | High PI associated with shorter survival
Chen et al, 1997 (25) | 70 | B-C | Ki-67 | High PI associated with shorter survival
Choi et al, 1997 (26) | 86 | B-D | PCNA | High PI associated with shorter survival
Hilska et al, 2005 (27) | 363 | A-D | Ki-67 | Low PI associated with shorter survival
Salminen et al, 2005 (28) | 146 | A-D | Ki-67 | Low PI associated with shorter survival
Garrity et al, 2004 (29) | 366 | B-C | Ki-67 | Low PI associated with shorter survival
Allegra et al, 2003 (30) | 706 | B-C | Ki-67 | Low PI associated with shorter survival
Palmqvist et al, 1999 (31) | 56 | B | Ki-67 | Low PI associated with shorter survival
Paradiso et al, 1996 (32) | 71 | NI | PCNA | Low PI associated with shorter survival
Neoptolemos et al, 1995 (33) | 79 | A-C | PCNA | Low PI associated with shorter survival

NI: No information available.

In contrast, the present disclosure has succeeded in (i) defining a CRC-specific gene proliferation signature (GPS) using a cell line model; and (ii) determining the prognostic significance of the GPS in predicting patient outcome, and its association with clinicopathologic variables, in two independent cohorts of CRC patients.

Definitions

Before describing embodiments of the invention in detail, it will be useful to provide some definitions of terms used herein.

As used herein, "antibodies" and like terms refer to immunoglobulin molecules and immunologically active portions of immunoglobulin (Ig) molecules, i.e., molecules that contain an antigen-binding site that specifically binds (immunoreacts with) an antigen. These include, but are not limited to, polyclonal, monoclonal, chimeric, and single-chain antibodies, Fc, Fab, Fab', and F(ab')2 fragments, and a Fab expression library. Antibody molecules relate to any of the classes IgG, IgM, IgA, IgE, and IgD, which differ from one another by the nature of the heavy chain present in the molecule. These include subclasses as well, such as IgG1, IgG2, and others.
The light chain may be a kappa chain or a lambda chain. Reference herein to antibodies includes a reference to all classes, subclasses, and types. Also included are chimeric antibodies, for example, monoclonal antibodies or fragments thereof that are specific to more than one source, e.g., a mouse or human sequence. Further included are camelid antibodies, shark antibodies, and nanobodies.

The term "marker" refers to a molecule that is associated quantitatively or qualitatively with the presence of a biological phenomenon. Examples of "markers" include:
- a polynucleotide, such as a gene or gene fragment, or RNA or an RNA fragment;
- a polypeptide, such as a peptide, oligopeptide, protein, or protein fragment;
- any related metabolites or by-products;
- any other identifying molecules, such as antibodies or antibody fragments.
These may be related directly or indirectly to a mechanism underlying the phenomenon. The markers of the invention include the nucleotide sequences (e.g., GenBank sequences) disclosed herein, in particular, the full-length sequences, any coding sequences, any fragments, or any complements thereof.

The terms "GCPM", "gastrointestinal cancer proliferation marker" and "GCPM family member" refer to a marker whose increased expression is associated with a positive prognosis, e.g., a lower likelihood of cancer recurrence, as described herein. The terms can exclude molecules that are known in the prior art to be associated with the prognosis of gastrointestinal cancer. It is to be understood that the term GCPM does not require that the marker be specific only for gastrointestinal tumours. Rather, expression of a GCPM can be altered in other types of tumours, including malignant tumours.
Non-limiting examples of GCPMs are included in Table A, Table B, Table C or Table D, herein below. They include, but are not limited to, the specific group CDC2, MCM6, RPA3, MCM7, PCNA, G22P1, KPNA2, ANLN, APG7L, TOPK, GMNN, RRM1, CDC45L, MAD2L1, RAN, DUT, RRM2, CDK7, MLH3, SMC4L1, CSPG6, POLD2, POLE2, BCCIP, Pfs2, TREX1, BUB3, FEN1, DRF1, PRE3, CCNE1, RPA1, POLE3, RFC4, MCM3, CHEK1, CCND1, and CDC37. They also include the specific group CDC2, RFC4, PCNA, CCNE1, CCND1, CDK7, MCM genes (e.g., one or more of MCM3, MCM6, and MCM7), FEN1, MAD2L1, MYBL2, RRM2, and BUB3.

The terms "cancer" and "cancerous" refer to or describe the physiological condition in mammals that is typically characterized by abnormal or unregulated cell growth. Cancer and cancer pathology can be associated, for example, with any of the following:
- metastasis;
- interference with the normal functioning of neighbouring cells;
- release of cytokines or other secretory products at abnormal levels;
- suppression or aggravation of an inflammatory or immunological response;
- neoplasia, premalignancy, or malignancy;
- invasion of surrounding or distant tissues or organs, such as lymph nodes.
Specifically included are gastrointestinal cancers, such as esophageal, stomach, small bowel, large bowel, anal, and rectal cancers; particularly included are gastric and colorectal cancers.

The term "colorectal cancer" includes cancer of the colon, rectum, and/or anus, especially adenocarcinomas. It may also include carcinomas (e.g., squamous cloacogenic carcinomas), melanomas, lymphomas, and sarcomas. Epidermoid (nonkeratinizing squamous cell or basaloid) carcinomas are also included.
The cancer may be associated with particular types of polyps or other lesions, for example, tubular adenomas, tubulovillous adenomas (e.g., villoglandular polyps), villous (e.g., papillary) adenomas (with or without adenocarcinoma), hyperplastic polyps, hamartomas, juvenile polyps, polypoid carcinomas, pseudopolyps, lipomas, or leiomyomas. The cancer may be associated with familial polyposis and related conditions such as Gardner's syndrome or Peutz-Jeghers syndrome. The cancer may be associated, for example, with chronic fistulas, irradiated anal skin, leukoplakia, lymphogranuloma venereum, Bowen's disease (intraepithelial carcinoma), condyloma acuminatum, or human papillomavirus. In other aspects, the cancer may be associated with basal cell carcinoma, extramammary Paget's disease, cloacogenic carcinoma, or malignant melanoma.

The terms "differentially expressed gene," "differential gene expression," and like phrases refer to a gene whose expression is activated to a higher or lower level in a subject (e.g., a test sample) than in a control subject (e.g., a control sample). The subject here specifically has cancer, such as gastrointestinal cancer. The terms also include genes whose expression is activated to a higher or lower level at different stages of the same disease, in recurrent or non-recurrent disease, or in cells with higher or lower levels of proliferation. A differentially expressed gene may be either activated or inhibited at the polynucleotide level or polypeptide level, or may be subject to alternative splicing resulting in a different polypeptide product. Such differences may be evidenced, for example, by a change in mRNA levels, surface expression, secretion or other partitioning of a polypeptide.
Differential gene expression may include any of the following comparisons:
- of expression between two or more genes or their gene products;
- of the ratios of expression between two or more genes or their gene products;
- of two differently processed products of the same gene.
The compared quantities may differ between normal subjects and diseased subjects; between various stages of the same disease; between recurring and non-recurring disease; between cells with higher and lower levels of proliferation; or between normal tissue and diseased tissue, specifically cancer or gastrointestinal cancer. Differential expression includes both quantitative and qualitative differences in the temporal or cellular expression pattern of a gene or its expression products. Such differences may arise, for example, among normal and diseased cells, among cells that have undergone different disease events or disease stages, or among cells with different levels of proliferation.

The term "expression" includes production of polynucleotides and polypeptides, in particular, the production of RNA (e.g., mRNA) from a gene or portion of a gene. It also includes the production of a protein encoded by an RNA or gene or portion of a gene, and the appearance of a detectable material associated with expression. For example, the formation of a complex, such as from a protein-protein interaction, protein-nucleotide interaction, or the like, is included within the scope of the term "expression". Another example is the binding of a binding ligand, such as a hybridization probe or antibody, to a gene or other oligonucleotide, a protein or a protein fragment, together with the visualization of the binding ligand. Thus, increased intensity of a spot on a microarray, on a hybridization blot such as a Northern blot, on an immunoblot such as a Western blot, or on a bead array, or by PCR analysis, is included within the term "expression" of the underlying biological molecule.
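FIG. 5 reports expression as the log2 ratio between cell line RNA and reference RNA. Differential expression between two conditions, such as EP and SP cells, can then be summarised by how far those ratios shift. The sketch below assumes background-corrected, positive signal values. It uses a difference of medians, which is one simple summary among many and not the statistical test used in this specification.

```python
import math
from statistics import median


def log2_ratio(sample_signal, reference_signal):
    """log2(sample / reference) for one gene; positive means above the reference RNA."""
    if sample_signal <= 0 or reference_signal <= 0:
        raise ValueError("signals must be positive (background-corrected)")
    return math.log2(sample_signal / reference_signal)


def median_difference(group_a, group_b):
    """Difference of median log2 ratios, e.g. exponential phase (EP) minus stationary phase (SP)."""
    return median(group_a) - median(group_b)
```

Working on the log2 scale makes a two-fold increase and a two-fold decrease symmetric (+1 and -1), which is why fold changes are usually plotted this way.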
The term "gastric cancer" includes cancer of the stomach and surrounding tissue, especially adenocarcinomas, and may also include lymphomas and leiomyosarcomas. The cancer may be associated with gastric ulcers or gastric polyps. It may be classified as protruding, penetrating, spreading, or any combination of these categories, or, alternatively, as superficial (elevated, flat, or depressed) or excavated.

The term "long-term survival" is used herein to refer to survival for at least 5 years, more preferably for at least 8 years, most preferably for at least 10 years following surgery or other treatment.

The term "microarray" refers to an ordered arrangement of capture agents, preferably polynucleotides (e.g., probes) or polypeptides, on a substrate. See, e.g., Microarray Analysis, M. Schena, John Wiley & Sons, 2002; Microarray Biochip Technology, M. Schena, ed., Eaton Publishing, 2000; Guide to Analysis of DNA Microarray Data, S. Knudsen, John Wiley & Sons, 2004; and Protein Microarray Technology, D. Kambhampati, ed., John Wiley & Sons, 2004.

The term "oligonucleotide" refers to a polynucleotide, typically a probe or primer, including, without limitation, single-stranded deoxyribonucleotides, single- or double-stranded ribonucleotides, RNA:DNA hybrids, and double-stranded DNAs. Oligonucleotides, such as single-stranded DNA probe oligonucleotides, are often synthesized by chemical methods, for example using commercially available automated oligonucleotide synthesizers. They can also be made by a variety of other methods, including in vitro expression systems, recombinant techniques, and expression in cells and organisms.

The term "polynucleotide," when used in the singular or plural, generally refers to any polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or DNA.
This includes, without limitation, single- and double-stranded DNA, DNA including single- and double-stranded regions, single- and double-stranded RNA, and RNA including single- and double-stranded regions. It also includes hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or include single- and double-stranded regions. Also included are triple-stranded regions comprising RNA or DNA or both RNA and DNA. Specifically included are mRNAs, cDNAs, and genomic DNAs. The term includes DNAs and RNAs that contain one or more modified bases, such as tritiated bases, or unusual bases, such as inosine. The polynucleotides of the invention can encompass coding or non-coding sequences, or sense or antisense sequences.

"Polypeptide," as used herein, refers to an oligopeptide, peptide, or protein sequence, or fragment thereof, and to naturally occurring, recombinant, synthetic, or semi-synthetic molecules. "Polypeptide" is sometimes recited herein to refer to an amino acid sequence of a naturally occurring protein molecule. In that case, "polypeptide" and like terms are not meant to limit the amino acid sequence to the complete, native amino acid sequence of the full-length molecule. Each reference herein to a "polypeptide" or like term includes the full-length sequence, as well as any fragments, derivatives, or variants thereof.

The term "prognosis" refers to a prediction of medical outcome (e.g., likelihood of long-term survival). A negative prognosis, or bad outcome, includes a prediction of relapse, disease progression (e.g., tumour growth or metastasis, or drug resistance), or mortality. A positive prognosis, or good outcome, includes a prediction of disease remission (e.g., disease-free status), amelioration (e.g., tumour regression), or stabilization.
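A prognosis of this kind is commonly summarised as a survival curve. FIGS. 3 and 4 compare Kaplan-Meier curves for high and low GPS expression. The sketch below implements the standard product-limit (Kaplan-Meier) estimator for a single group. Patients censored at a given time are still counted as at risk at that time. It does not compute the log-rank P values shown in the figures, and the follow-up data in the check are invented.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate for one group of patients.

    times  -- follow-up time for each patient (e.g. months)
    events -- 1 if death/recurrence was observed at that time, 0 if censored
    Returns a list of (event_time, estimated_survival) steps.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group every patient sharing this time; censored ones are still at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```

Running the estimator separately on the high-GPS and low-GPS groups gives the two curves that a figure like FIG. 3 overlays.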
The terms "prognostic signature," "signature," and the like refer to a set of two or more markers, for example GCPMs, that when analysed together as a set allow the determination or prediction of an event, for example the prognostic outcome of colorectal cancer. The use of a signature comprising two or more markers reduces the effect of individual variation and allows for a more robust prediction. Non-limiting examples of GCPMs are included in Table A, Table B, Table C or Table D, herein below. They include, but are not limited to, the specific group CDC2, MCM6, RPA3, MCM7, PCNA, G22P1, KPNA2, ANLN, APG7L, TOPK, GMNN, RRM1, CDC45L, MAD2L1, RAN, DUT, RRM2, CDK7, MLH3, SMC4L1, CSPG6, POLD2, POLE2, BCCIP, Pfs2, TREX1, BUB3, FEN1, DRF1, PRE3, CCNE1, RPA1, POLE3, RFC4, MCM3, CHEK1, CCND1, and CDC37. They also include the specific group CDC2, RFC4, PCNA, CCNE1, CCND1, CDK7, MCM genes (e.g., one or more of MCM3, MCM6, and MCM7), FEN1, MAD2L1, MYBL2, RRM2, and BUB3.

In the context of the present invention, reference to "at least one," "at least two," "at least five," etc., of the markers listed in any particular set (e.g., any signature) means any one or any and all combinations of the markers listed.

The term "prediction method" is defined to cover the broader genus of methods from the fields of statistics, machine learning, artificial intelligence, and data mining that can be used to specify a prediction model. These are discussed further in the Detailed Description section.

The term "prediction model" refers to the specific mathematical model obtained by applying a prediction method to a collection of data. In the examples detailed herein, such data sets consist of measurements of gene activity in tissue samples taken from recurrent and non-recurrent colorectal cancer patients, for which the class (recurrent or non-recurrent) of each sample is known.
Such models can be used to (1) classify a sample of unknown recurrence status as being one of recurrent or non-recurrent, or (2) make a probabilistic prediction (i.e., produce either a proportion or percentage to be interpreted as a probability) which represents the likelihood that the unknown sample is recurrent, based on the measurement of mRNA expression levels or expression products of a specified collection of genes in the unknown sample. The exact details of how these gene-specific measurements are combined to produce classifications and probabilistic predictions depend on the specific mechanisms of the prediction method used to construct the model.

The term "proliferation" refers to the processes leading to increased cell size or cell number, and can include one or more of: tumour or cell growth, angiogenesis, innervation, and metastasis.

The term "qPCR" or "QPCR" refers to quantitative polymerase chain reaction as described, for example, in PCR Technique: Quantitative PCR, J.W. Larrick, ed., Eaton Publishing, 1997, and A-Z of Quantitative PCR, S. Bustin, ed., IUL Press, 2004.

The term "tumour" refers to all neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues.

"Sensitivity", "specificity" (or "selectivity"), and "classification rate", when used to describe the effectiveness of prediction models, mean the following:

"Sensitivity" means the proportion of truly positive samples that are also predicted (by the model) to be positive. In a test for cancer recurrence, that would be the proportion of recurrent tumours predicted by the model to be recurrent. "Specificity" or "selectivity" means the proportion of truly negative samples that are also predicted (by the model) to be negative. In a test for CRC recurrence, this equates to the proportion of non-recurrent samples that are predicted to be non-recurrent by the model. "Classification rate" is the proportion of all samples that are correctly classified by the prediction model (be that as positive or negative).

"Stringent conditions" or "high stringency conditions", as defined herein, typically: (1) employ low ionic strength and high temperature for washing, for example 0.015 M sodium chloride/0.0015 M sodium citrate/0.1% sodium dodecyl sulfate at 50°C; (2) employ a denaturing agent during hybridization, such as formamide, for example, 50% (v/v) formamide with 0.1% bovine serum albumin/0.1% Ficoll/0.1% polyvinylpyrrolidone/50 mM sodium phosphate buffer at pH 6.5 with 750 mM sodium chloride, 75 mM sodium citrate at 42°C; or (3) employ 50% formamide, 5X SSC (0.75 M NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5X Denhardt's solution, sonicated salmon sperm DNA (50 µg/ml), 0.1% SDS, and 10% dextran sulfate at 42°C, with washes at 42°C in 0.2X SSC (sodium chloride/sodium citrate) and 50% formamide at 55°C, followed by a high-stringency wash comprising 0.1X SSC containing EDTA at 55°C.

"Moderately stringent conditions" may be identified as described by Sambrook et al., Molecular Cloning: A Laboratory Manual, New York: Cold Spring Harbor Press, 1989, and include the use of washing solution and hybridization conditions (e.g., temperature, ionic strength, and % SDS) less stringent than those described above. An example of moderately stringent conditions is overnight incubation at 37°C in a solution comprising: 20% formamide, 5X SSC (150 mM NaCl, 15 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5X Denhardt's solution, 10% dextran sulfate, and 20 mg/ml denatured sheared salmon sperm DNA, followed by washing the filters in 1X SSC at about 37-50°C. The skilled artisan will recognize how to adjust the temperature, ionic strength, etc. as necessary to accommodate factors such as probe length and the like.
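As an illustration only, the three performance measures defined above (sensitivity, specificity and classification rate) can be computed from paired lists of true and predicted classes. This Python sketch assumes recurrence is coded as the positive class (`True`) and non-recurrence as the negative class (`False`); the function name `performance_metrics` and the example labels are hypothetical.

```python
from typing import Dict, Sequence


def performance_metrics(truth: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Return sensitivity, specificity and classification rate.

    True marks a positive (e.g., recurrent) sample and False a negative
    (e.g., non-recurrent) sample; the two sequences must be aligned.
    """
    if len(truth) != len(predicted) or not truth:
        raise ValueError("truth and predicted must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    positives = sum(1 for t in truth if t)
    negatives = len(truth) - positives
    return {
        # proportion of truly positive samples that the model calls positive
        "sensitivity": tp / positives if positives else float("nan"),
        # proportion of truly negative samples that the model calls negative
        "specificity": tn / negatives if negatives else float("nan"),
        # proportion of all samples classified correctly
        "classification_rate": (tp + tn) / len(truth),
    }


if __name__ == "__main__":
    truth = [True, True, True, False, False, False, False]
    predicted = [True, True, False, False, False, True, False]
    print(performance_metrics(truth, predicted))
```

With these invented labels, two of three recurrent samples and three of four non-recurrent samples are called correctly, giving a classification rate of 5/7.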
The practice of the present invention will employ, unless otherwise indicated, conventional techniques of molecular biology (including recombinant techniques), microbiology, cell biology, and biochemistry, which are within the skill of the art. Such techniques are explained fully in the literature, such as Molecular Cloning: A Laboratory Manual, 2nd edition, Sambrook et al., 1989; Oligonucleotide Synthesis, M.J. Gait, ed., 1984; Animal Cell Culture, R.I. Freshney, ed., 1987; Methods in Enzymology, Academic Press, Inc.; Handbook of Experimental Immunology, 4th edition, D.M. Weir & C.C. Blackwell, eds., Blackwell Science Inc., 1987; Gene Transfer Vectors for Mammalian Cells, J.M. Miller & M.P. Calos, eds., 1987; Current Protocols in Molecular Biology, F.M. Ausubel et al., eds., 1987; and PCR: The Polymerase Chain Reaction, Mullis et al., eds., 1994.

Description of Embodiments of the Invention

Cell proliferation is an indicator of outcome in some malignancies. In colorectal cancer, however, discordant results have been reported. Because these results are based on a single proliferation marker, the present invention discloses the use of microarrays to overcome this limitation, to reach a firmer conclusion, and to determine the prognostic role of cell proliferation in colorectal cancer. The microarray-based proliferation studies shown herein indicate that reduced expression of the proliferation signature in colorectal cancer is associated with poor outcome. The invention can therefore be used to identify patients at high risk of early death from cancer.

The present invention provides for markers for the determination of disease prognosis, for example, the likelihood of recurrence of tumours, including gastrointestinal tumours. Using the methods of the invention, it has been found that numerous markers are associated with the progression of gastrointestinal cancer, and can be used to determine the prognosis of cancer.
Microarray analysis of samples taken from patients with various stages of colorectal tumours has led to the surprising discovery that specific patterns of marker expression are associated with prognosis of the cancer.

An increase in certain GCPMs, for example, markers associated with cell proliferation, is indicative of a positive prognosis. This can include a decreased likelihood of cancer recurrence after standard treatment, especially for gastrointestinal cancer, such as gastric or colorectal cancer. Conversely, a decrease in these markers is indicative of a negative prognosis. This can include disease progression or an increased likelihood of cancer recurrence, especially for gastrointestinal cancer, such as gastric or colorectal cancer. A decrease in expression can be determined, for example, by comparison of a test sample (e.g., tumour sample) to samples associated with a positive prognosis. An increase in expression can be determined, for example, by comparison of a test sample (e.g., tumour sample) to samples associated with a negative prognosis.

For example, to obtain a prognosis, a patient's sample (e.g., tumour sample) can be compared to samples with known patient outcome. If the patient's sample shows increased expression of GCPMs that is comparable to samples with good outcome, and/or higher than samples with poor outcome, then a positive prognosis is implicated. If the patient's sample shows decreased expression of GCPMs that is comparable to samples with poor outcome, and/or lower than samples with good outcome, then a negative prognosis is implicated. Alternatively, a patient's sample can be compared to samples of actively proliferating/non-proliferating tumour cells. If the patient's sample shows increased expression of GCPMs that is comparable to actively proliferating cells, and/or higher than non-proliferating cells, then a positive prognosis is implicated.
If the patient's sample shows decreased expression of GCPMs that is comparable to non-proliferating cells, and/or lower than actively proliferating cells, then a negative prognosis is implicated.

The invention provides for a set of genes, outlined in Table C, identified from cancer patients with various stages of tumours, that are shown to be prognostic for colorectal cancer. These genes are all associated with cell proliferation and establish a relationship between cell proliferation genes and their utility in cancer prognosis. It has also been found that the genes in the prognostic signature listed in Table C are correlated with additional cell proliferation genes. Based on these findings, the invention also provides for a set of cell cycle genes, shown in Table D, that are differentially expressed between high and low proliferation groups, for use as prognostic markers. Further, based on the surprising finding of the correlation between prognosis and cell proliferation-related genes, the invention also provides for a set of proliferation-related genes differentially expressed between cell lines in high and low proliferative states (Table A) and known proliferation-related genes (Table B). The genes outlined in Table A, Table B, Table C and Table D provide for a set of gastrointestinal cancer prognostic markers (GCPMs).

As one approach, the expression of a panel of markers (e.g., GCPMs) can be analysed by techniques including Linear Discriminant Analysis (LDA) to work out a prognostic score. The marker panel selected and the prognostic score calculation can be derived through extensive laboratory testing and multiple independent clinical development studies.

The disclosed GCPMs therefore provide a useful tool for determining the prognosis of cancer, and establishing a treatment regime specific for that tumour.
In particular, a positive prognosis can be used by a patient to decide to pursue standard or less invasive treatment options. A negative prognosis can be used by a patient to decide to terminate treatment or to pursue highly aggressive or experimental treatments. In addition, a patient can choose treatments based on their impact on cell proliferation or the expression of cell proliferation markers (e.g., GCPMs). In accordance with the present invention, treatments that specifically target cells with high proliferation or specifically decrease expression of cell proliferation markers (e.g., GCPMs) would not be preferred for patients with gastrointestinal cancer, such as colorectal cancer or gastric cancer.

Levels of GCPMs can be detected in tumour tissue, tissue proximal to the tumour, lymph node samples, blood samples, serum samples, urine samples, or faecal samples, using any suitable technique, including, but not limited to, oligonucleotide probes, quantitative PCR, or antibodies raised against the markers. The expression level of one GCPM in the sample will be indicative of the likelihood of recurrence in that subject. However, it will be appreciated that by analysing the presence and amounts of expression of a plurality of GCPMs, and constructing a proliferation signature, the sensitivity and accuracy of prognosis will be increased. Therefore, multiple markers according to the present invention can be used to determine the prognosis of a cancer.

The present invention relates to a set of markers, in particular, GCPMs, the expression of which has prognostic value, specifically with respect to cancer-free survival. In specific aspects, the cancer is gastrointestinal cancer, particularly gastric or colorectal cancer, and, in further aspects, the colorectal cancer is an adenocarcinoma.
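The Linear Discriminant Analysis approach mentioned above, and the gain from combining a plurality of GCPMs into one score, can be illustrated with a simplified variant, diagonal LDA, which ignores correlations between markers and uses a pooled per-gene variance. This is a sketch under stated assumptions, not the scoring procedure of the invention: equal class priors are assumed, and the function names and expression values are hypothetical.

```python
from statistics import mean, variance
from typing import Dict, Sequence, Tuple

Profile = Dict[str, float]  # gene symbol -> normalised log2 expression (hypothetical)
Model = Dict[str, Tuple[float, float, float]]  # gene -> (mean_good, mean_poor, pooled var)


def fit_dlda(good: Sequence[Profile], poor: Sequence[Profile], genes: Sequence[str]) -> Model:
    """Estimate per-gene class means and pooled within-class variances."""
    n_good, n_poor = len(good), len(poor)
    if n_good < 2 or n_poor < 2:
        raise ValueError("need at least two samples in each outcome group")
    model: Model = {}
    for gene in genes:
        x_good = [s[gene] for s in good]
        x_poor = [s[gene] for s in poor]
        pooled = ((n_good - 1) * variance(x_good) + (n_poor - 1) * variance(x_poor)) / (
            n_good + n_poor - 2
        )
        if pooled == 0:
            raise ValueError(f"zero pooled variance for {gene}")
        model[gene] = (mean(x_good), mean(x_poor), pooled)
    return model


def prognostic_score(model: Model, sample: Profile) -> float:
    """Diagonal-LDA score; positive values lie nearer the good-outcome group.

    score = sum_j (mu_good_j - mu_poor_j) / var_j * (x_j - (mu_good_j + mu_poor_j) / 2),
    i.e. half the difference of the two equal-prior discriminant functions.
    """
    return sum(
        (mu_good - mu_poor) / var * (sample[gene] - (mu_good + mu_poor) / 2)
        for gene, (mu_good, mu_poor, var) in model.items()
    )
```

Each marker contributes in proportion to how well it separates the outcome groups relative to its noise. A full LDA with a pooled covariance matrix would additionally account for correlation between markers.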
In one aspect, the invention relates to a method of predicting the likelihood of long-term survival of a cancer patient without the recurrence of cancer, comprising determining the expression level of one or more proliferation markers or their expression products in a sample obtained from the patient, normalized against the expression level of all RNA transcripts or their products in the sample, or of a reference set of RNA transcripts or their expression products, wherein the proliferation marker is the transcript of one or more markers listed in Table A, Table B, Table C or Table D, herein. In particular aspects, a decrease in expression levels of one or more GCPMs indicates a decreased likelihood of long-term survival without cancer recurrence, while an increase in expression levels of one or more GCPMs indicates an increased likelihood of long-term survival without cancer recurrence.

In a further aspect, the expression levels of one or more, for example at least two, or at least 3, or at least 4, or at least 5, or at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, or at least 75 of the proliferation markers or their expression products are determined, e.g., as selected from Table A, Table B, Table C or Table D; as selected from CDC2, MCM6, RPA3, MCM7, PCNA, G22P1, KPNA2, ANLN, APG7L, TOPK, GMNN, RRM1, CDC45L, MAD2L1, RAN, DUT, RRM2, CDK7, MLH3, SMC4L1, CSPG6, POLD2, POLE2, BCCIP, Pfs2, TREX1, BUB3, FEN1, DRF1, PREI3, CCNE1, RPA1, POLE3, RFC4, MCM3, CHEK1, CCND1, and CDC37; or as selected from CDC2, RFC4, PCNA, CCNE1, CCND1, CDK7, MCM genes (e.g., one or more of MCM3, MCM6, and MCM7), FEN1, MAD2L1, MYBL2, RRM2, and BUB3.
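Normalization against a reference set of transcripts can be sketched for quantitative PCR data using the common delta-Ct convention. This is an illustrative sketch only: it assumes roughly 100% amplification efficiency (one cycle corresponds to a two-fold difference), summarises the signature as an unweighted mean across markers, and uses hypothetical reference genes `REF1` and `REF2` with invented Ct values.

```python
from statistics import mean
from typing import Dict, Iterable


def normalised_log2_expression(
    ct: Dict[str, float], markers: Iterable[str], references: Iterable[str]
) -> Dict[str, float]:
    """Return -deltaCt for each marker, a log2 expression relative to the reference set.

    deltaCt = Ct(marker) - mean Ct(reference genes). A lower Ct means more starting
    template, so the sign is flipped so that larger values mean higher expression.
    Assumes ~100% PCR efficiency (1 cycle = 2-fold).
    """
    ref_ct = mean(ct[r] for r in references)
    return {m: -(ct[m] - ref_ct) for m in markers}


def relative_expression(log2_value: float) -> float:
    """Convert a -deltaCt value to a linear fold change relative to the reference set."""
    return 2.0 ** log2_value


def signature_score(normalised: Dict[str, float]) -> float:
    """Unweighted mean of the normalised marker values (one simple signature summary)."""
    return mean(normalised.values())


if __name__ == "__main__":
    ct = {"PCNA": 24.0, "MCM6": 26.0, "REF1": 20.0, "REF2": 22.0}  # invented values
    norm = normalised_log2_expression(ct, ["PCNA", "MCM6"], ["REF1", "REF2"])
    print(norm, signature_score(norm))
```

A lower signature score for a patient than for comparison samples would then correspond to the "decrease in expression levels" described above.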
In another aspect, the method comprises the determination of the expression levels of all proliferation markers or their expression products, e.g., as listed in Table A, Table B, Table C or Table D; as listed for the group CDC2, MCM6, RPA3, MCM7, PCNA, G22P1, KPNA2, ANLN, APG7L, TOPK, GMNN, RRM1, CDC45L, MAD2L1, RAN, DUT, RRM2, CDK7, MLH3, SMC4L1, CSPG6, POLD2, POLE2, BCCIP, Pfs2, TREX1, BUB3, FEN1, DRF1, PREI3, CCNE1, RPA1, POLE3, RFC4, MCM3, CHEK1, CCND1, and CDC37; or as listed for the group CDC2, RFC4, PCNA, CCNE1, CCND1, CDK7, MCM genes (e.g., one or more of MCM3, MCM6, and MCM7), FEN1, MAD2L1, MYBL2, RRM2, and BUB3.

The invention includes the use of archived paraffin-embedded biopsy material for assay of all markers in the set, and therefore is compatible with the most widely available type of biopsy material. It is also compatible with several different methods of tumour tissue harvest, for example, via core biopsy or fine needle aspiration. In a further aspect, RNA is isolated from a fixed, wax-embedded cancer tissue specimen of the patient. Isolation may be performed by any technique known in the art, for example from core biopsy tissue or fine needle aspirate cells.

In another aspect, the invention relates to an array comprising polynucleotides hybridizing to two or more markers as selected from Table A, Table B, Table C or Table D; as selected from CDC2, MCM6, RPA3, MCM7, PCNA, G22P1, KPNA2, ANLN, APG7L, TOPK, GMNN, RRM1, CDC45L, MAD2L1, RAN, DUT, RRM2, CDK7, MLH3, SMC4L1, CSPG6, POLD2, POLE2, BCCIP, Pfs2, TREX1, BUB3, FEN1, DRF1, PREI3, CCNE1, RPA1, POLE3, RFC4, MCM3, CHEK1, CCND1, and CDC37; or as selected from CDC2, RFC4, PCNA, CCNE1, CCND1, CDK7, MCM genes (e.g., one or more of MCM3, MCM6, and MCM7), FEN1, MAD2L1, MYBL2, RRM2, and BUB3.
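The second specific marker group recited above (CDC2, RFC4, PCNA, CCNE1, CCND1, CDK7, MCM genes, FEN1, MAD2L1, MYBL2, RRM2, and BUB3) can be held as a simple data structure to check whether a measured panel covers it. Reading the "MCM genes" entry as satisfied by any one of MCM3, MCM6 or MCM7 is an interpretive assumption, and the helper name `missing_from_panel` is hypothetical.

```python
from typing import Iterable, List

# Second specific group named in the text. The "MCM genes" entry is modelled
# as satisfied by any one of MCM3, MCM6 or MCM7 (an interpretive assumption).
FIXED_MEMBERS = frozenset(
    {"CDC2", "RFC4", "PCNA", "CCNE1", "CCND1", "CDK7",
     "FEN1", "MAD2L1", "MYBL2", "RRM2", "BUB3"}
)
MCM_OPTIONS = frozenset({"MCM3", "MCM6", "MCM7"})


def missing_from_panel(panel: Iterable[str]) -> List[str]:
    """Return the requirements of the group that the measured panel does not cover."""
    measured = set(panel)
    missing = sorted(FIXED_MEMBERS - measured)
    if not (MCM_OPTIONS & measured):
        missing.append("MCM3/MCM6/MCM7 (any one)")
    return missing
```

An empty result means the panel includes every member of the group; symbols are matched exactly as written.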
In particular aspects, the array comprises polynucleotides hybridizing to at least 3, or at least 5, or at least 10, or at least 15, or at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, or at least 75, or all of the markers listed in Table A, Table B, Table C or Table D; as listed in the group CDC2, MCM6, RPA3, MCM7, PCNA, G22P1, KPNA2, ANLN, APG7L, TOPK, GMNN, RRM1, CDC45L, MAD2L1, RAN, DUT, RRM2, CDK7, MLH3, SMC4L1, CSPG6, POLD2, POLE2, BCCIP, Pfs2, TREX1, BUB3, FEN1, DRF1, PREI3, CCNE1, RPA1, POLE3, RFC4, MCM3, CHEK1, CCND1, and CDC37; or as listed in the group CDC2, RFC4, PCNA, CCNE1, CCND1, CDK7, MCM genes (e.g., one or more of MCM3, MCM6, and MCM7), FEN1, MAD2L1, MYBL2, RRM2, and BUB3.

In another specific aspect, the array comprises polynucleotides hybridizing to the full set of markers listed in Table A, Table B, Table C or Table D; as listed for the group CDC2, MCM6, RPA3, MCM7, PCNA, G22P1, KPNA2, ANLN, APG7L, TOPK, GMNN, RRM1, CDC45L, MAD2L1, RAN, DUT, RRM2, CDK7, MLH3, SMC4L1, CSPG6, POLD2, POLE2, BCCIP, Pfs2, TREX1, BUB3, FEN1, DRF1, PREI3, CCNE1, RPA1, POLE3, RFC4, MCM3, CHEK1, CCND1, and CDC37; or as listed for the group CDC2, RFC4, PCNA, CCNE1, CCND1, CDK7, MCM genes (e.g., one or more of MCM3, MCM6, and MCM7), FEN1, MAD2L1, MYBL2, RRM2, and BUB3.

The polynucleotides can be cDNAs or oligonucleotides, and the solid surface on which they are displayed can be glass, for example. The polynucleotides can hybridize to one or more of the markers as disclosed herein, for example, to the full-length sequences, any coding sequences, any fragments, or any complements thereof.
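Hybridization of an array polynucleotide to a marker sequence or its complement rests on antiparallel Watson-Crick base pairing. The sketch below checks only for a perfect complementary match; it ignores mismatches, stringency conditions and secondary structure, and the probe and target sequences and function names are invented for illustration.

```python
# Watson-Crick complements for upper- and lower-case DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_target_offset(probe: str, target: str) -> int:
    """Return the 0-based start in `target` of the region the probe would pair with
    perfectly (antiparallel), or -1 if no perfect match exists.
    """
    return target.upper().find(reverse_complement(probe).upper())
```

For example, the probe `GCTAACGC` pairs with the segment `GCGTTAGC` beginning at position 3 of the invented target `ATGGCGTTAGCCTAAGG`.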
In still another aspect, the invention relates to a method of predicting the likelihood of long-term survival of a patient diagnosed with cancer, without the recurrence of cancer, comprising the steps of: (1) determining the expression levels of the RNA transcripts or the expression products of the full set or a subset of the markers listed in Table A, Table B, Table C or Table D, herein, in a sample obtained from the patient, normalized against the expression levels of all RNA transcripts or their expression products in the sample, or of a reference set of RNA transcripts or their products; (2) subjecting the data obtained in step (1) to statistical analysis; and (3) determining whether the likelihood of long-term survival has increased or decreased.

In yet another aspect, the invention concerns a method of preparing a personalized genomics profile for a patient, e.g., a cancer patient, comprising the steps of: (a) subjecting a sample obtained from the patient to expression analysis; (b) determining the expression level of one or more markers selected from the marker set listed in any one of Table A, Table B, Table C or Table D, wherein the expression level is normalized against a control gene or genes and optionally is compared to the amount found in a reference set; and (c) creating a report summarizing the data obtained by the expression analysis. The report may, for example, include a prediction of the likelihood of long-term survival of the patient and/or a recommendation for a treatment modality of the patient.
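Steps (b) and (c) of the personalized-profile method can be sketched as follows, assuming the patient's values are already normalized against control genes and that comparison with the reference set is expressed as a z-score. The z-score convention, the tab-separated report layout and the names `build_report` and `format_report` are assumptions for illustration only.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Dict, List, Sequence


@dataclass
class MarkerLine:
    gene: str
    value: float    # patient's normalised expression
    z_score: float  # distance from the reference-set mean, in reference SDs


def build_report(patient: Dict[str, float], reference: Sequence[Dict[str, float]]) -> List[MarkerLine]:
    """Compare each normalised patient value with the same marker in a reference set."""
    lines = []
    for gene in sorted(patient):
        ref_values = [r[gene] for r in reference]
        sd = stdev(ref_values)  # requires at least two reference samples
        z = (patient[gene] - mean(ref_values)) / sd if sd > 0 else 0.0
        lines.append(MarkerLine(gene, patient[gene], z))
    return lines


def format_report(lines: List[MarkerLine]) -> str:
    """Render one tab-separated line per marker: gene, value, signed z-score."""
    return "\n".join(f"{ln.gene}\t{ln.value:.2f}\t{ln.z_score:+.2f}" for ln in lines)
```

Negative z-scores flag markers expressed below the reference set; how such values map to a survival prediction or treatment recommendation is left to the prediction model.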
In additional aspects, the invention relates to a prognostic method comprising: (a) subjecting a sample obtained from a patient to quantitative analysis of the expression level of the RNA transcript of at least one marker selected from Table A, Table B, Table C or Table D, herein, or its product, and (b) identifying the patient as likely to have an increased likelihood of long-term survival without cancer recurrence if the normalized expression levels of the marker or markers, or their products, are above a defined expression threshold. In alternate aspects, step (b) comprises identifying the patient as likely to have a decreased likelihood of long-term survival without cancer recurrence if the normalized expression levels of the marker or markers, or their products, fall below a defined expression threshold.

In particular, relatively low expression of proliferation markers is associated with poor outcome. This can include disease progression or an increased likelihood of cancer recurrence, especially for gastrointestinal cancer, such as gastric or colorectal cancer. By contrast, relatively high expression of proliferation markers is associated with a good outcome. This can include a decreased likelihood of cancer recurrence after standard treatment, especially for gastrointestinal cancer, such as gastric or colorectal cancer. Low expression can be determined, for example, by comparison of a test sample (e.g., tumour sample) to samples associated with a positive prognosis. High expression can be determined, for example, by comparison of a test sample (e.g., tumour sample) to samples associated with a negative prognosis.

For example, to obtain a prognosis, a patient's sample (e.g., tumour sample) can be compared to samples with known patient outcome.
If the patient's sample shows high expression of GCPMs that is comparable to samples with good outcome, and/or higher than samples with poor outcome, then a positive prognosis is implicated. If the patient's sample shows low expression of GCPMs that is comparable to samples with poor outcome, and/or lower than samples with good outcome, then a negative prognosis is implicated. Alternatively, a patient's sample can be compared to samples of actively proliferating/non-proliferating tumour cells. If the patient's sample shows high expression of GCPMs that is comparable to actively proliferating cells, and/or higher than non-proliferating cells, then a positive prognosis is implicated. If the patient's sample shows low expression of GCPMs that is comparable to non-proliferating cells, and/or lower than actively proliferating cells, then a negative prognosis is implicated.

As further examples, the expression levels of a prognostic signature comprising two or more GCPMs from a patient's sample (e.g., tumour sample) can be compared to samples of recurrent/non-recurrent cancer. If the patient's sample shows increased or decreased expression of GCPMs by comparison to samples of non-recurrent cancer, and/or comparable expression to samples of recurrent cancer, then a negative prognosis is implicated. If the patient's sample shows expression of GCPMs that is comparable to samples of non-recurrent cancer, and/or lower or higher expression than samples of recurrent cancer, then a positive prognosis is implicated.

As one approach, a prediction method can be applied to a panel of markers, for example the panel of GCPMs outlined in Table A, Table B, Table C or Table D, in order to generate a predictive model. This involves the generation of a prognostic signature comprising two or more GCPMs.
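The two comparison strategies described above, a defined expression threshold and comparison with samples of known outcome, can be sketched as follows. Nearest-centroid classification with Euclidean distance is one possible comparison rule, chosen here for illustration rather than taken from the text; the threshold, group labels and expression values are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
from math import dist
from statistics import mean
from typing import Dict, List, Sequence

Profile = Dict[str, float]  # gene symbol -> normalised expression (hypothetical)


def threshold_call(score: float, threshold: float) -> str:
    """Likelihood of long-term recurrence-free survival relative to a defined threshold."""
    if score > threshold:
        return "increased"
    if score < threshold:
        return "decreased"
    return "indeterminate"  # the text does not say how ties are handled


def centroid(profiles: Sequence[Profile], genes: Sequence[str]) -> List[float]:
    """Mean expression of each gene across a group of reference samples."""
    return [mean(p[g] for p in profiles) for g in genes]


def nearest_group(sample: Profile, groups: Dict[str, Sequence[Profile]], genes: Sequence[str]) -> str:
    """Label of the reference group whose centroid is closest (Euclidean) to the sample."""
    x = [sample[g] for g in genes]
    return min(groups, key=lambda label: dist(x, centroid(groups[label], genes)))


if __name__ == "__main__":
    groups = {
        "non-recurrent": [{"PCNA": 8.0, "MCM6": 7.0}, {"PCNA": 9.0, "MCM6": 8.0}],
        "recurrent": [{"PCNA": 6.0, "MCM6": 5.0}, {"PCNA": 5.0, "MCM6": 4.0}],
    }
    print(nearest_group({"PCNA": 8.2, "MCM6": 7.1}, groups, ["PCNA", "MCM6"]))
```

The centroid rule treats "comparable to samples of non-recurrent cancer" as "closer to their average profile than to the recurrent profile"; correlation-based distances are a common alternative.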
The disclosed GCPMs in Table A, Table B, Table C or Table D therefore provide a useful set of markers to generate prediction signatures for determining the prognosis of cancer, and establishing a treatment regime, or treatment modality, specific for that tumour. In particular, a positive prognosis can be used by a patient to decide to pursue standard or less invasive treatment options. A negative prognosis can be used by a patient to decide to terminate treatment or to pursue highly aggressive or experimental treatments. In addition, a patient can choose treatments based on their impact on the expression of prognostic markers (e.g., GCPMs).

Levels of GCPMs can be detected in tumour tissue, tissue proximal to the tumour, lymph node samples, blood samples, serum samples, urine samples, or faecal samples, using any suitable technique, including, but not limited to, oligonucleotide probes, quantitative PCR, or antibodies raised against the markers. It will be appreciated that by analysing the presence and amounts of expression of a plurality of GCPMs in the form of prediction signatures, and constructing a prognostic signature, the sensitivity and accuracy of prognosis will be increased. Therefore, multiple markers according to the present invention can be used to determine the prognosis of a cancer.

The invention includes the use of archived paraffin-embedded biopsy material for assay of the markers in the set, and therefore is compatible with the most widely available type of biopsy material. It is also compatible with several different methods of tumour tissue harvest, for example, via core biopsy or fine needle aspiration. In certain aspects, RNA is isolated from a fixed, wax-embedded cancer tissue specimen of the patient.
Isolation may be performed by any technique known in the art, for example from core biopsy tissue or fine needle aspirate cells.

In one aspect, the invention relates to a method of predicting a prognosis, e.g., the likelihood of long-term survival of a cancer patient without the recurrence of cancer, comprising determining the expression level of one or more prognostic markers or their expression products in a sample obtained from the patient, normalized against the expression level of other RNA transcripts or their products in the sample, or of a reference set of RNA transcripts or their expression products. In specific aspects, the prognostic marker is one or more markers listed in Table A, Table B, Table C or Table D, or is included as one or more of the prognostic signatures derived from the markers listed in Table A, Table B, Table C or Table D.

In further aspects, the expression levels of the prognostic markers or their expression products are determined, e.g., for the markers listed in Table A, Table B, Table C or Table D, or for a prognostic signature derived from the markers listed in Table A, Table B, Table C or Table D. In another aspect, the method comprises the determination of the expression levels of a full set of prognostic markers or their expression products, e.g., for the markers listed in Table A, Table B, Table C or Table D, or for a prognostic signature derived from the markers listed in Table A, Table B, Table C or Table D.

In an additional aspect, the invention relates to an array (e.g., microarray) comprising polynucleotides hybridizing to two or more markers, e.g., for the markers listed in Table A, Table B, Table C or Table D, or for a prognostic signature derived from the markers listed in Table A, Table B, Table C or Table D.
In particular aspects, the array comprises polynucleotides hybridizing to a prognostic signature derived from the markers listed in Table A, Table B, Table C or Table D. In another specific aspect, the array comprises polynucleotides hybridizing to the full set of markers, e.g., for the markers listed in Table A, Table B, Table C or Table D, or, e.g., for a prognostic signature.

For these arrays, the polynucleotides can be cDNAs or oligonucleotides, and the solid surface on which they are displayed can be glass, for example. The polynucleotides can hybridize to one or more of the markers as disclosed herein, for example, to the full-length sequences, any coding sequences, any fragments, or any complements thereof.

In particular aspects, an increase or decrease in expression levels of one or more GCPMs indicates a decreased likelihood of long-term survival, e.g., due to cancer recurrence, while a lack of an increase or decrease in expression levels of one or more GCPMs indicates an increased likelihood of long-term survival without cancer recurrence.

In further aspects, the invention relates to a kit comprising one or more of: (1) extraction buffer/reagents and protocol; (2) reverse transcription buffer/reagents and protocol; and (3) quantitative PCR buffer/reagents and protocol suitable for performing any of the foregoing methods.

Other aspects and advantages of the invention are illustrated in the description and examples included herein.

Table A: GCPMs for cell proliferation signature

Unique ID | Gene Symbol | Gene Name | GenBank Acc. No. | Gene Aliases
A:09020 | CCND1 | cyclin D1 | NM_053056 | BCL1; PRAD1; U21B31; D11S287E
C:0921 | CCNE1 | cyclin E1 | NM_001238, NM_057182 | CCNE
A:05382 | CDC2 | cell division cycle 2, G1 to S and G2 to M | NM_001786, NM_033379 | CDK1; MGC111195; DKFZp686L2022
A:09842 | CDK7 | cyclin-dependent kinase 7 (MO15 homolog, Xenopus laevis, cdk-activating kinase) | NM_001799 | CAK1; STK1; CDKN7; p39MO15
B:7793 | CHEK1 | CHK1 checkpoint homolog (S. pombe) | NM_001274 | CHK1
A:03447 | CSE1L | CSE1 chromosome segregation 1-like (yeast) | NM_001316 | CAS; CSE1; XPO2; MGC117283; MGC130036
A:ON55 | DKC1 | dyskeratosis congenita 1, dyskerin | NM_001363 | DKC; NAP57; NOLA4; XAP101; dyskerin
A:07296 | DUT | dUTP pyrophosphatase | NM_001025248, NM_001025249, NM_001948 | dUTPase; FLJ20622
C:2467 | E4F1 | E4F transcription factor 1 | NM_004424 | E4F; MGC99614
8:065 | FEN1 | flap structure-specific endonuclease 1 | NM_004111 | MF1; RAD2; FEN-1
A:01437 | FH | fumarate hydratase | NM_000143 | MCL; LRCC; HLRCC; MCUL1
B:9714 | XRCC6 | X-ray repair complementing defective repair in Chinese hamster cells 6 (Ku autoantigen, 70kDa) | NM_001469 | ML8; KU70; TLAA; CTC75; CTCBF; G22P1
6:3553 | GPS1 | G protein pathway suppressor 1 | NM_004127, NM_212492 | CSN1; COPS1; MGC71287
B:4036 | KPNA2 | karyopherin alpha 2 (RAG cohort 1, importin alpha 1) | NM_002266 | QIP2; RCH1; IPOA1; SRP1alpha
A:06387 | MAD2L1 | MAD2 mitotic arrest deficient-like 1 (yeast) | NM_002358 | MAD2; HSMAD2
A:08668 | MCM3 | MCM3 minichromosome maintenance deficient 3 (S. cerevisiae) | NM_002388 | HCC5; P1.h; RLFB; MGC1157; P1-MCM3
B:8147 | MCM6 | MCM6 minichromosome maintenance deficient 6 (MIS5 homolog, S. pombe) (S. cerevisiae) | NM_005915 | Mis5; P105MCM; MCG40308
B:7620 | MCM7 | MCM7 minichromosome maintenance deficient 7 (S. cerevisiae) | NM_005916, NM_182776 | MCM2; CDC47; P85MCM; P1CDC47; PNAS-146; CDABP0042; P1.1-MCM3
A:10600 | RAB8A | RAB8A, member RAS oncogene family | NM_005370 | MEL; RAB8
A:09470 | KITLG | KIT ligand | NM_000899, NM_003994 | SF; MGF; SCF; KL-1; Kitl; DKFZp686F2250
A:06037 | MYBL2 | v-myb myeloblastosis viral oncogene homolog (avian)-like 2 | NM_002466 | BMYB; MGC15600
A:01677 | NME1 | non-metastatic cells 1, protein (NM23A) expressed in | NM_000269, NM_198175 | AWD; GAAD; NM23; NDPKA; NM23-H1
A:03397 | PRDX1 | peroxiredoxin 1 | NM_002574, NM_181696, NM_181697 | PAG; PAGA; PAGB; MSP23; NKEFA; TDPX2
A:03715 | PCNA | proliferating cell nuclear antigen | NM_002592, NM_182649 | MGC8367
A:02929 | POLD2 | polymerase (DNA directed), delta 2, regulatory subunit 50kDa | NM_006230 | None
A:04680 | POLE2 | polymerase (DNA directed), epsilon 2 (p59 subunit) | NM_002692 | DPE2
A:09169 | RAN | RAN, member RAS oncogene family | NM_006325 | TC4; Gsp1; ARA24
A:09145 | RBBP8 | retinoblastoma binding protein 8 | NM_002894, NM_203291, NM_203292 | RIM; CTIP
A:09921 | RFC4 | replication factor C (activator 1) 4, 37kDa | NM_002916, NM_181573 | A1; RFC37; MGC27291
A:10597 | RPA1 | replication protein A1, 70kDa | NM_002945 | HSSB; RF-A; RP-A; REPA1; RPA70
A:00231 | RPA3 | replication protein A3, 14kDa | NM_002947 | REPA3


A:09802 | RRM1 | ribonucleotide reductase M1 polypeptide | NM_001033 | R1; RR1; RIR1
B:3501 | RRM2 | ribonucleotide reductase M2 polypeptide | NM_001034 | R2; RR2M
A:08332 | S100A5 | S100 calcium binding protein A5 | NM_002962 | S100D
A:07314 | FSCN1 | fascin homolog 1, actin-bundling protein (Strongylocentrotus purpuratus) | NM_003088 | SNL; p55; FLJ38511
A:03507 | FOSL1 | FOS-like antigen 1 | NM_005438 | FRA1; fra-1
A:09331 | CDC45L | CDC45 cell division cycle 45-like (S. cerevisiae) | NM_003504 | CDC45; CDC45L2; PORC-PI-1
09436 | SMC3 | structural maintenance of chromosomes 3 | NM_005445 | BAM; BMH; HCAP; CSPG6; SMC3L1
A:09747 | BUB3 | BUB3 budding uninhibited by benzimidazoles 3 homolog (yeast) | NM_001007793, NM_004725 | BUB3L; hBUB3
A:00891 | WDR39 | WD repeat domain 39 | NM_004804 | CIAO1
05648 | SMC4 | structural maintenance of chromosomes 4 | NM_001002799, NM_001002800, NM_005496 | CAPC; SMC4L1; hCAP-C
(not legible) | TOB1 | transducer of ERBB2, 1 | NM_005749 | TOB; TROB; APRO6; PIG49; TROB1; MGC34446; MGC104792
A:04760 | ATG7 | ATG7 autophagy related 7 homolog (S. cerevisiae) | NM_006395 | GSA7; APG7L; DKFZp434N0735
AS04950 | CCT7 | chaperonin containing TCP1, subunit 7 (eta) | NM_001009570, NM_006429 | Ccth; Nip7-1; CCT-ETA; MGC110985; TCP-1-eta
A:09500 | CCT2 | chaperonin containing TCP1, subunit 2 (beta) | NM_006431 | 99D8.1; PRO1633; CCT-beta; MGC142074; MGC142076; TCP-1-beta
05486 | CDC37 | CDC37 cell division cycle 37 homolog (S. cerevisiae) | NM_007065 | P0b3
B:7247 | TREX1 | three prime repair exonuclease 1 | NM_016381, NM_032166, NM_033627, NM_033628, NM_033629, NM_130384 | AGS1; DRN3; ATRIP; FLJ12343; DKp4J31
A:01322 | PARK7 | Parkinson disease (autosomal recessive, early onset) 7 | NM_007262 | DJ1; DJ-1; FLJ27376
A:09401 | PREI3 | preimplantation protein 3 | NM_015387, NM_199482 | 2C4D; MOB1; MOB3; CGI-95; MGC12264
(not legible) | MLH3 | mutL homolog 3 (E. coli) | NM_001040108, NM_014381 | HNPCC7; MGC138372
A:02984 | CACYBP | calcyclin binding protein | NM_001007214, NM_014412 | SIP; GIG5; MGC87971; PNAS-107; S100A6BP; RP1-102G20.6
A:09821 | MCTS1 | malignant T cell amplified sequence | NM_014060 | MCT1; MCT-1
SA03435 | GMNN | geminin, DNA replication inhibitor | NM_015895 | Gem; RP3-369A17.3
:1035 | GINS2 | GINS complex subunit 2 (Psf2 homolog) | NM_016095 | PSF2; Pfs2; HSPC037
A:02209 | POLE3 | polymerase (DNA directed), epsilon 3 (p17 subunit) | NM_017443 | p17; YBL1; CHRAC17; CHARAC17
A:05280 | ANLN | anillin, actin binding protein | NM_018685 | scra; Scraps; ANILLIN
A:07468 | SEPT11 | septin 11 | NM_018243 | None
A:03912 | PBK | PDZ binding kinase | NM_018492 | SPK; TOPK; Nori-3; FLJ14385
1:8449 | BCCIP | BRCA2 and CDKN1A interacting protein | NM_016567, NM_078468, NM_078469 | TOK-1
B:2392 | DBF4B | DBF4 homolog B (S. cerevisiae) | NM_025104, NM_145663 | DRF1; ASKL1; FLJ13087; MGC15009
I:6501 | CD276 | CD276 molecule | NM_001024736, NM_025240 | B7H3; B7-H3
[:5467 | LAMA1 | laminin, alpha 1 | NM_005559 | LAMA

Table A: Proliferation-related genes differentially expressed between cell lines in high and low proliferative states. Genes that were differentially expressed between cell lines in confluent (low proliferation) and semi-confluent (high proliferation) states (see Figure 1) were identified by microarray analysis on 30K MWG Biotech arrays. Table A comprises the subset of these genes that were categorized by gene ontology analysis as cell proliferation-related.
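The legend of Table A states that the genes were differentially expressed between confluent and semi-confluent cell-line cultures but does not specify the statistical test used. As an assumption for illustration only, the sketch below ranks genes by Welch's t statistic between replicate measurements from high- and low-proliferation states; the function names and expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List, Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for mean(a) - mean(b), not assuming equal variances."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        raise ValueError("both groups have zero variance")
    return (mean(a) - mean(b)) / se


def rank_genes(high: Dict[str, Sequence[float]], low: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Rank genes by |t|; positive t means higher in the high-proliferation state."""
    scores = [(gene, welch_t(high[gene], low[gene])) for gene in high]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    high = {"MCM7": [9.0, 9.5, 10.0], "FH": [6.0, 6.2, 5.8]}  # invented log2 values
    low = {"MCM7": [7.0, 7.5, 8.0], "FH": [6.1, 5.9, 6.0]}
    print(rank_genes(high, low))
```

A ranking like this only orders candidates; selecting genes for a table would additionally require p-values, multiple-testing control and, as the legend describes, a gene ontology filter.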
Table B: GCPMs for cell proliferation signature

Unique ID | Gene Description | LocusLink | GenBank Accession
B:7560 | v-abl Abelson murine leukaemia viral oncogene homolog 1 (ABL1), transcript variant a, mRNA | 25 | NM_005157
A:09071 | acetylcholinesterase (YT blood group) (ACHE), transcript variant E4-E5, mRNA | 43 | NM_015831, NM_000665
A:04114 | acid phosphatase 2, lysosomal (ACP2), mRNA | 53 | NM_001610
A:09146 | acid phosphatase, prostate (ACPP), mRNA | 55 | NM_001099
A:09585 | adrenergic, alpha-1D-, receptor (ADRA1D), mRNA | 146 | NM_000678
A:08793 | adrenergic, alpha-1B-, receptor (ADRA1B), mRNA | 147 | NM_000679
C:0326 | adrenergic, alpha-1A-, receptor (ADRA1A), transcript variant 4, mRNA | 148 | NM_033304
A:02272 | adrenergic, alpha-2A-, receptor (ADRA2A), mRNA | 150 | NM_000681
A:05807 | jagged 1 (Alagille syndrome) (JAG1), mRNA | 182 | NM_000214
A:02268 | aryl hydrocarbon receptor (AHR), mRNA | 196 | NM_001621
A:00978 | allograft inflammatory factor 1 (AIF1), transcript variant 2, mRNA | 199 | NM_004847
A:06335 | adenylate kinase 1 (AK1), mRNA | 203 | NM_000476
A:07028 | v-akt murine thymoma viral oncogene homolog 1 (AKT1), transcript variant 1, mRNA | 207 | NM_005163
A:05949 | v-akt murine thymoma viral oncogene homolog 2 (AKT2), mRNA | 208 | NM_001626
B:9542 | arachidonate 15-lipoxygenase, second type (ALOX15B), mRNA | 247 | NM_001141
A:02569 | bridging integrator 1 (BIN1), transcript variant 8, mRNA | 274 | NM_004305
C:0393 | amyloid beta (A4) precursor protein-binding, family B, member 1 (Fe65) (APBB1), transcript variant 1, mRNA | 322 | NM_001164
B:5288 | amyloid beta (A4) precursor protein-binding, family B, member 2 (Fe65-like) (APBB2), mRNA | 323 | NM_173075
A:09151 | adenomatosis polyposis coli (APC), mRNA | 324 | NM_000038
B:3616 | baculoviral IAP repeat-containing 5 (survivin) (BIRC5), transcript variant 1, mRNA | 332 | NM_001168
02007 | androgen receptor (dihydrotestosterone receptor; testicular feminization; spinal and bulbar muscular atrophy; Kennedy disease) (AR), transcript variant 2, mRNA | 367 | NM_001011645
A:04819 | amphiregulin (schwannoma-derived growth factor) (AREG), mRNA | 374 | NM_001657
A:01709 | ras homolog gene family, member G (rho G) (RHOG), mRNA | 391 | NM_001665
B:6554 | ataxia telangiectasia mutated (includes complementation groups A, C and D) (ATM), transcript variant 1, mRNA | 472 | NM_000051
A:02418 | ATPase, Cu++ transporting, beta polypeptide (ATP7B), transcript variant 1, mRNA | 545 | NM_000053
A:05997 | AXL receptor tyrosine kinase (AXL), transcript variant 2, mRNA | 558 | NM_001699
B:0073 | brain-specific angiogenesis inhibitor 1 (BAI1), mRNA | 575 | NM_001702
A:07209 | BCL2-associated X protein (BAX), transcript variant beta, mRNA | 581 | NM_004324
B:1845 | Bardet-Biedl syndrome 4 (BBS4), mRNA | 586 | NM_033028
A:00571 | branched chain aminotransferase 2, mitochondrial (BCAT2), mRNA | 588 | NM_001190
A:09020 | cyclin D1 (CCND1), mRNA | 595 | NM_053056
A:10775 | B-cell CLL/lymphoma 2 (BCL2), nuclear gene encoding mitochondrial protein, transcript variant alpha, mRNA | 596 | NM_000633
A:09014 | B-cell CLL/lymphoma 3 (BCL3), mRNA | 602 | NM_005178
C:2412 | B-cell CLL/lymphoma 6 (zinc finger protein 51) (BCL6), transcript variant 1, mRNA | 604 | NM_001706
A:08794 | tumour necrosis factor receptor superfamily, member 17 (TNFRSF17), mRNA | 608 | NM_001192
A:01162 | Bloom syndrome (BLM), mRNA | 641 | NM_000057
B:5276 | basonuclin 1 (BNC1), mRNA | 646 | NM_001717
B:3766 | polymerase (RNA) III (DNA directed) polypeptide D, 44kDa (POLR3D), mRNA | 661 | NM_001722
C:2188 | dystonin (DST), transcript variant 1, mRNA | 667 | NM_183380
:5103 | breast cancer 1, early onset (BRCA1), transcript variant BRCA1a, mRNA | 672 | NM_007294
A:03676 | breast cancer 2, early onset (BRCA2), mRNA | 675 | NM_000059
A:07404 | zinc finger protein 36, C3H type-like 1 (ZFP36L1), mRNA | 677 | NM_004926
B:5146 | zinc finger protein 36, C3H type-like 2 (ZFP36L2), mRNA | 678 | NM_006887
B:4758 | bone marrow stromal cell antigen 2 (BST2), mRNA | 684 | NM_004335
B:4642 | betacellulin (BTC), mRNA | 685 | NM_001729
C:2483 | B-cell translocation gene 1, anti-proliferative (BTG1), mRNA | 694 | NM_001731
B:0618 | BUB1 budding uninhibited by benzimidazoles 1 homolog (yeast) (BUB1), mRNA | 699 | NM_004336
A:09398 | BUB1 budding uninhibited by benzimidazoles 1 homolog beta (yeast) (BUB1B), mRNA | 701 | NM_001211
A:01104 | chromosome 8 open reading frame 1 (C8orf1), mRNA | 734 | NM_004337
B:3828 | calmodulin 2 (phosphorylase kinase, delta) (CALM2), mRNA | 805 | NM_001743
B:6851 | calpain 1, (mu/I) large subunit (CAPN1), mRNA | 823 | NM_005186
A:09763 | calpain, small subunit 1 (CAPNS1), transcript variant 1, mRNA | 826 | NM_001749
B:0205 | core-binding factor, runt domain, alpha subunit 2; translocated to, 3 (CBFA2T3), transcript variant 2, mRNA | 863 | NM_175931
B:2901 | runt-related transcription factor 3 (RUNX3), transcript variant 2, mRNA | 864 | NM_004350
A:01132 | cholecystokinin B receptor (CCKBR), mRNA | 887 | NM_176875
A:04253 | cyclin A2 (CCNA2), mRNA | 890 | NM_001237
A:04253 | cyclin A2 (CCNA2), mRNA | 891 | NM_001237
A:09352 | cyclin C (CCNC), transcript variant 1, mRNA | 892 | NM_005190
A:10559 | cyclin D2 (CCND2), mRNA | 894 | NM_001759
A:02240 | cyclin D3 (CCND3), mRNA | 896 | NM_001760
C:0921 | cyclin E1 (CCNE1), transcript variant 1, mRNA | 898 | NM_001238
C:0921 | cyclin E1 (CCNE1), transcript variant 1, mRNA | 899 | NM_001238
B:5261 | cyclin G1 (CCNG1), transcript variant 1, mRNA | 900 | NM_004060
A:07154 | cyclin G2 (CCNG2), mRNA | 901 | NM_004354
A:07930 | cyclin H (CCNH), mRNA | 902 | NM_001239
A:01253 | cyclin T1 (CCNT1), mRNA | 904 | NM_001240
B:0645 | cyclin T2 (CCNT2), transcript variant b, mRNA | 905 | NM_058241
C:2676 | CD3E antigen, epsilon polypeptide (TiT3 complex) (CD3E), mRNA | 916 | NM_000733
A:10068 | CD5 antigen (p56-62) (CD5), mRNA | 921 | NM_014207
A:07504 | tumour necrosis factor receptor superfamily, member 7 (TNFRSF7), mRNA | 939 | NM_001242
A:05558 | CD28 antigen (Tp44) (CD28), mRNA | 940 | NM_006139
A:07387 | CD86 antigen (CD28 antigen ligand 2, B7-2 antigen) (CD86), transcript variant 1, mRNA | 942 | NM_175862
A:06344 | tumour necrosis factor receptor superfamily, member 8 (TNFRSF8), transcript variant 1, mRNA | 943 | NM_001243
A:03064 | tumour necrosis factor (ligand) superfamily, member 8 (TNFSF8), mRNA | 944 | NM_001244
A:03802 | CD33 antigen (gp67) (CD33), mRNA | 945 | NM_001772
A:07407 | CD40 antigen (TNF receptor superfamily member 5) (CD40), transcript variant 1, mRNA | 958 | NM_001250
B:9757 | CD40 ligand (TNF superfamily, member 5, hyper-IgM syndrome) (CD40LG), mRNA | 959 | NM_000074
A:07070 | CD68 antigen (CD68), mRNA | 968 | NM_001251
A:04715 | tumour necrosis factor (ligand) superfamily, member 7 (TNFSF7), mRNA | 970 | NM_001252
A:09638 | CD81 antigen (target of antiproliferative antibody 1) (CD81), mRNA | 975 | NM_004356
A:05382 | cell division cycle 2, G1 to S and G2 to M (CDC2), transcript variant 1, mRNA | 983 | NM_001786
A:00282 | cell division cycle 2-like 1 (PITSLRE proteins) (CDC2L1), transcript variant 2, mRNA | 984 | NM_033486
A:00282 | cell division cycle 2-like 1 (PITSLRE proteins) (CDC2L1), transcript variant 2, mRNA | 985 | NM_033486
A:07718 | CDC5 cell division cycle 5-like (S. pombe) (CDC5L), mRNA | 988 | NM_001253
A:00843 | septin 7 (SEPT7), transcript variant 1, mRNA | 989 | NM_001788
A:05789 | CDC6 cell division cycle 6 homolog (S. cerevisiae) (CDC6), mRNA | 990 | NM_001254
A:03063 | CDC20 cell division cycle 20 homolog (S. cerevisiae) (CDC20), mRNA | 991 | NM_001255
B:4185 | cell division cycle 25A (CDC25A), transcript variant 1, mRNA | 993 | NM_001789
A:04022 | cell division cycle 25B (CDC25B), transcript variant 3, mRNA | 994 | NM_021873
B:9539 | cell division cycle 25C (CDC25C), transcript variant 1, mRNA | 995 | NM_001790
B:5590 | cell division cycle 27 (CDC27) | 996 | NM_001256
B:9041 | cell division cycle 34 (CDC34), mRNA | 997 | NM_004359
A:03518 | cyclin-dependent kinase 2 (CDK2), transcript variant 2, mRNA | 1017 | NM_052827
A:02068 | cyclin-dependent kinase 3 (CDK3), mRNA | 1018 | NM_001258
B:4838 | cyclin-dependent kinase 4 (CDK4), mRNA | 1019 | NM_000075
A:10302 | cyclin-dependent kinase 5 (CDK5), mRNA | 1020 | NM_004935
A:01923 | cyclin-dependent kinase 6 (CDK6), mRNA | 1021 | NM_001259
A:09842 | cyclin-dependent kinase 7 (MO15 homolog, Xenopus laevis, cdk-activating kinase) (CDK7), mRNA | 1022 | NM_001799
A:08302 | cyclin-dependent kinase 8 (CDK8), mRNA | 1024 | NM_001260
A:05151 | cyclin-dependent kinase 9 (CDC2-related kinase) (CDK9), mRNA | 1025 | NM_001261
A:09736 | cyclin-dependent kinase inhibitor 1A (p21, Cip1) (CDKN1A), transcript variant 2, mRNA | 1026 | NM_078467
05571 | cyclin-dependent kinase inhibitor 1B (p27, Kip1) (CDKN1B), mRNA | 1027 | NM_004064
A:08441 | cyclin-dependent kinase inhibitor 1C (p57, Kip2) (CDKN1C), mRNA | 1028 | NM_000076
B:9782 | cyclin-dependent kinase inhibitor 2A (melanoma, p16, inhibits CDK4) (CDKN2A), transcript variant 4, mRNA | 1029 | NM_058195
C:6459 | cyclin-dependent kinase inhibitor 2B (p15, inhibits CDK4) (CDKN2B), transcript variant 1, mRNA | 1030 | NM_004936
B:0604 | cyclin-dependent kinase inhibitor 2C (p18, inhibits CDK4) (CDKN2C), transcript variant 1, mRNA | 1031 | NM_001262
A:03310 | cyclin-dependent kinase inhibitor 2D (p19, inhibits CDK4) (CDKN2D), transcript variant 2, mRNA | 1032 | NM_079421
A:05799 | cyclin-dependent kinase inhibitor 3 (CDK2-associated dual specificity phosphatase) (CDKN3), mRNA | 1033 | NM_005192
B:9170 | centromere protein B, 80kDa (CENPB), mRNA | 1059 | NM_001810
A:07769 | centromere protein E, 312kDa (CENPE), mRNA | 1062 | NM_001813
A:06471 | centromere protein F, 350/400ka (mitosin) (CENPF), mRNA | 1063 | NM_016343
A:03128 | centrin, EF-hand protein, 1 (CETN1), mRNA | 1068 | NM_004066
A:05554 | centrin, EF-hand protein, 2 (CETN2), mRNA | 1069 | NM_004344
B:4016 | centrin, EF-hand protein, 3 (CDC31 homolog, yeast) (CETN3), mRNA | 1070 | NM_004365
B:5082 | regulator of chromosome condensation 1 (RCC1) | 1104 | NM_001048194, NM_001048195, NM_001269
B:7793 | CHK1 checkpoint homolog (S. pombe) (CHEK1), mRNA | 1111 | NM_001274
B:8504 | checkpoint suppressor 1 (CHES1), mRNA | 1112 | NM_005197
A:00320 | cholinergic receptor, muscarinic 1 (CHRM1), mRNA | 1128 | NM_000738
A:10168 | cholinergic receptor, muscarinic 3 (CHRM3), mRNA | 1131 | NM_000740
A:06655 | cholinergic receptor, muscarinic 4 (CHRM4), mRNA | 1132 | NM_000741
A:00869 | cholinergic receptor, muscarinic 5 (CHRM5), mRNA | 1133 | NM_012125
C:0649 | CDC28 protein kinase regulatory subunit 1B (CKS1B), mRNA | 1163 | NM_001826
B:6912 | CDC28 protein kinase regulatory subunit 2 (CKS2), mRNA | 1164 | NM_001827
A:07840 | CDC-like kinase 1 (CLK1), transcript variant 1, mRNA | 1195 | NM_004071
B:8665 | polo-like kinase 3 (Drosophila) (PLK3), mRNA | 1263 | NM_004073
B:8651 | collagen, type IV, alpha 3 (Goodpasture antigen) (COL4A3), transcript variant 1, mRNA | 1285 | NM_000091
B:4734 | mitogen-activated protein kinase 8 (MAP3K8), mRNA | 1326 | NM_005204
B:3778 | cysteine-rich protein 1 (intestinal) (CRIP1), mRNA | 1396 | NM_001311
B:3581 | cysteine-rich protein 2 (CRIP2), mRNA | 1397 | NM_001312
B:5543 | v-crk sarcoma virus CT10 oncogene homolog (avian) (CRK), transcript variant I, mRNA | 1398 | NM_005206
B:6254 | v-crk sarcoma virus CT10 oncogene homolog (avian)-like (CRKL), mRNA | 1399 | NM_005207
A:03447 | CSE1 chromosome segregation 1-like (yeast) (CSE1L), transcript variant 2, mRNA | 1434 | NM_177436
A:10730 | colony stimulating factor 1 (macrophage) (CSF1), transcript variant 2, mRNA | 1435 | NM_172210
A:05457 | colony stimulating factor 1 receptor, formerly McDonough feline sarcoma viral (v-fms) oncogene homolog (CSF1R), mRNA | 1436 | NM_005211
B:1908 | colony stimulating factor 3 (granulocyte) (CSF3), transcript variant 2, mRNA | 1440 | NM_172219
A:01629 | c-src tyrosine kinase (CSK), mRNA | 1445 | NM_004383
A:07097 | casein kinase 2, alpha prime polypeptide (CSNK2A2), mRNA | 1459 | NM_001896
B:3639 | cysteine and glycine-rich protein 2 (CSRP2), mRNA | 1466 | NM_001321
B:8929 | C-terminal binding protein 1 (CTBP1) | 1487 | NM_001012614, NM_001328
A:08689 | C-terminal binding protein 2 (CTBP2), transcript variant 1, mRNA | 1488 | NM_001329
02604 | cardiotrophin 1 (CTF1), mRNA | 1489 | NM_001330
A:05018 | disabled homolog 2, mitogen-responsive phosphoprotein (Drosophila) (DAB2), mRNA | 1601 | NM_001343
A:09374 | deleted in colorectal carcinoma (DCC), mRNA | 1630 | NM_005215
A:05576 | dynactin 1 (p150, glued homolog, Drosophila) (DCTN1), transcript variant 1, mRNA | 1639 | NM_004082
A:04346 | growth arrest and DNA-damage-inducible, alpha (GADD45A), mRNA | 1647 | NM_001924
B:9526 | DNA-damage-inducible transcript 3 (DDIT3), mRNA | 1649 | NM_004083
B:6726 | DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 11 (CHL1-like helicase homolog, S. cerevisiae) (DDX11), transcript variant 1, mRNA | 1663 | NM_030653
B:1955 | deoxyhypusine synthase (DHPS), transcript variant 1, mRNA | 1725 | NM_001930
A:09887 | diaphanous homolog 2 (Drosophila) (DIAPH2), transcript variant 12C, mRNA | 1730 | NM_007309
B:4704 | septin 1 (SEPT1), mRNA | 1731 | NM_052838
A:105535 | dyskeratosis congenita 1, dyskerin (DKC1), mRNA | 1736 | NM_001363
A:06695 | discs, large homolog 3 (neuroendocrine-dlg, Drosophila) (DLG3), mRNA | 1741 | NM_021120
B:9032 | dystrophia myotonica-containing WD repeat motif (DMWD), mRNA | 1762 | NM_004943
B:4936 | DNA2 DNA replication helicase 2-like (yeast) (DNA2L), mRNA | 1763 | XM_166103, XM_938629
B:5286 | dynein, cytoplasmic 1, heavy chain 1 (DYNC1H1), mRNA | 1778 | NM_001376
B:9089 | dynamin 2 (DNM2), transcript variant 4, mRNA | 1785 | NM_001005362
A:05674 | deoxynucleotidyltransferase, terminal (DNTT), transcript variant 1, mRNA | 1791 | NM_004088
A:00269 | heparin-binding EGF-like growth factor (HBEGF), mRNA | 1839 | NM_001945
B:3724 | deoxythymidylate kinase (thymidylate kinase) (DTYMK), mRNA | 1841 | NM_012145
A:01114 | dual specificity phosphatase 1 (DUSP1), mRNA | 1843 | NM_004417
A:08044 | dual specificity phosphatase 4 (DUSP4), transcript variant 2, mRNA | 1846 | NM_057158
B:0206 | dual specificity phosphatase 6 (DUSP6), transcript variant 1, mRNA | 1848 | NM_001946
A:07296 | dUTP pyrophosphatase (DUT), nuclear gene encoding mitochondrial protein, transcript variant 2, mRNA | 1854 | NM_001948
B:5540 | E2F transcription factor 1 (E2F1), mRNA | 1869 | NM_005225
B:4216 | E2F transcription factor 2 (E2F2), mRNA | 1870 | NM_004091
B:6451 | E2F transcription factor 3 (E2F3), mRNA | 1871 | NM_001949
A:03567 | E2F transcription factor 4, p107/p130-binding (E2F4), mRNA | 1874 | NM_001950
C:2484 | E2F transcription factor 5, p130-binding (E2F5), mRNA | 1875 | NM_001951
B:9807 | E2F transcription factor 6 (E2F6), transcript variant a, mRNA | 1876 | NM_001952
C:2467 | E4F transcription factor 1 (E4F1), mRNA | 1877 | NM_004424
A:04592 | endothelial cell growth factor 1 (platelet-derived) (ECGF1), mRNA | 1890 | NM_001953
A:00257 | endothelial differentiation, lysophosphatidic acid G-protein-coupled receptor, 2 (EDG2), transcript variant 1, mRNA | 1903 | NM_001401
A:08155 | endothelin 1 (EDN1), mRNA | 1906 | NM_001955
A:08447 | endothelin receptor type A (EDNRA), mRNA | 1909 | NM_001957
A:09410 | epidermal growth factor (beta-urogastrone) (EGF), mRNA | 1950 | NM_001963
A:10005 | epidermal growth factor receptor (erythroblastic leukaemia viral (v-erb-b) oncogene homolog, avian) (EGFR), transcript variant 1, mRNA | 1956 | NM_005228
A:03312 | early growth response 4 (EGR4), mRNA | 1961 | NM_001965
A:06719 | eukaryotic translation initiation factor 4 gamma, 2 (EIF4G2), mRNA | 1982 | NM_001418
A:10651 | E74-like factor 5 (ets domain transcription factor) (ELF5), transcript variant 2, mRNA | 2001 | NM_001422
07972 | ELK3, ETS-domain protein (SRF accessory protein 2) (ELK3), mRNA | 2004 | NM_005230
A:06224 | elastin (supravalvular aortic stenosis, Williams-Beuren syndrome) (ELN), mRNA | 2006 | NM_000501
A:10267 | epithelial membrane protein 1 (EMP1), mRNA | 2012 | NM_001423
A:09610 | epithelial membrane protein 2 (EMP2), mRNA | 2013 | NM_001424
A:00767 | epithelial membrane protein 3 (EMP3), mRNA | 2014 | NM_001425
:07219 | glutamyl aminopeptidase (aminopeptidase A) (ENPEP), mRNA | 2028 | NM_001977
A:10199 | E1A binding protein p300 (EP300), mRNA | 2033 | NM_001429
A:10325 | EPH receptor B4 (EPHB4), mRNA | 2050 | NM_004444
A:04352 | glutamyl-prolyl-tRNA synthetase (EPRS), mRNA | 2059 | NM_004446
A:04352 | glutamyl-prolyl-tRNA synthetase (EPRS), mRNA | 2060 | NM_004446
A:08200 | nuclear receptor subfamily 2, group F, member 6 (NR2F6), mRNA | 2063 | NM_005234
B:1429 | v-erb-b2 erythroblastic leukaemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian) (ERBB2) | 2064 | NM_001005862, NM_004448
A:02313 | v-erb-a erythroblastic leukaemia viral oncogene homolog 4 (avian) (ERBB4), mRNA | 2066 | NM_005235
A:08898 | epiregulin (EREG), mRNA | 2069 | NM_001432
A:07916 | Ets2 repressor factor (ERF), mRNA | 2077 | NM_006494
B:9779 | v-ets erythroblastosis virus E26 oncogene like (avian) (ERG), transcript variant 1, mRNA | 2078 | NM_182918
C:2388 | enhancer of rudimentary homolog (Drosophila) (ERH), mRNA | 2079 | NM_004450
B:5360 | endogenous retroviral sequence K(C4), 2 (ERVK2) | 2087 | U87595
C:2799 | estrogen receptor 1 (ESR1), mRNA | 2099 | NM_000125
A:01596 | v-ets erythroblastosis virus E26 oncogene homolog 1 (avian) (ETS1), mRNA | 2113 | NM_005238
A:07704 | v-ets erythroblastosis virus E26 oncogene homolog 2 (avian) (ETS2), mRNA | 2114 | NM_005239
A:00924 | ecotropic viral integration site 2A (EVI2A), transcript variant 2, mRNA | 2123 | NM_014210
A:077 | exostoses (multiple) 1 (EXT1) | 2131 | NM_000127
A:10493 | exostoses (multiple) 2 (EXT2), transcript variant 1, mRNA | 2132 | NM_000401
A:07741 | coagulation factor II (thrombin) (F2), mRNA | 2147 | NM_000506
A:06727 | coagulation factor II (thrombin) receptor (F2R), mRNA | 2149 | NM_001992
A:10554 | fatty acid binding protein 3, muscle and heart (mammary-derived growth inhibitor) (FABP3), mRNA | 2170 | NM_004102
A:10780 | fatty acid binding protein 5 (psoriasis-associated) (FABP5), mRNA | 2172 | NM_001444
B:9700 | fatty acid binding protein 7, brain (FABP7) | 2173 | NM_001446
C:2632 | PTK2B protein tyrosine kinase 2 beta (PTK2B), transcript variant 1, mRNA | 2185 | NM_173174
A:07570 | Fanconi anemia, complementation group G (FANCG), mRNA | 2189 | NM_004629
A:08248 | membrane-spanning 4-domains, subfamily A, member 2 (Fc fragment of IgE, high affinity I, receptor for; beta polypeptide) (MS4A2), mRNA | 2206 | NM_000139
B:9065 | flap structure-specific endonuclease 1 (FEN1), mRNA | 2237 | NM_004111
A:10689 | glypican 4 (GPC4), mRNA | 2239 | NM_001448
897 | fer (fps/fes related) tyrosine kinase (phosphoprotein NCP94) (FER), mRNA | 2242 | NM_005246
B:1852 | fibrinogen alpha chain (FGA), transcript variant alpha-E, mRNA | 2243 | NM_000508
B:1909 | fibrinogen beta chain (FGB), mRNA | 2244 | NM_005141
A:07894 | fibroblast growth factor 1 (acidic) (FGF1), transcript variant 1, mRNA | 2246 | NM_000800
B:7727 | fibroblast growth factor 2 (basic) (FGF2), mRNA | 2247 | NM_002006
A:01551 | fibroblast growth factor 3 (murine mammary tumour virus integration site (v-int-2) oncogene homolog) (FGF3), mRNA | 2248 | NM_005247
A:10568 | fibroblast growth factor 4 (heparin secretory transforming protein 1, Kaposi sarcoma oncogene) (FGF4), mRNA | 2249 | NM_002007
C:2679 | fibroblast growth factor 5 (FGF5), transcript variant 2, mRNA | 2250 | NM_033143
A:04438 | fibroblast growth factor 6 (FGF6), mRNA | 2251 | NM_020996
C:2713 | fibroblast growth factor 7 (keratinocyte growth factor) (FGF7), mRNA | 2252 | NM_002009
:8151 | fibroblast growth factor 8 (androgen-induced) (FGF8), transcript variant B, mRNA | 2253 | NM_006119
A:10353 | fibroblast growth factor 9 (glia-activating factor) (FGF9), mRNA | 2254 | NM_002010
A:10837 | fibroblast growth factor 10 (FGF10), mRNA | 2255 | NM_004465
B:1815 | fibrinogen gamma chain (FGG), transcript variant gamma-B, mRNA | 2266 | NM_021870
A:01437 | fumarate hydratase (FH), nuclear gene encoding mitochondrial protein, mRNA | 2271 | NM_000143
A:04648 | fragile histidine triad gene (FHIT), mRNA | 2272 | NM_002012
B:1938 | c-fos induced growth factor (vascular endothelial growth factor D) (FIGF), mRNA | 2277 | NM_004469
B:5100 | fms-related tyrosine kinase 1 (vascular endothelial growth factor/vascular permeability factor receptor) (FLT1) | 2321 | NM_002019
A:05859 | fms-related tyrosine kinase 3 (FLT3), mRNA | 2322 | NM_004119
A:05362 | fms-related tyrosine kinase 3 ligand (FLT3LG), mRNA | 2323 | NM_001459
A:05281 | v-fos FBJ murine osteosarcoma viral oncogene homolog (FOS), mRNA | 2353 | NM_005252
A:01965 | FBJ murine osteosarcoma viral oncogene homolog B (FOSB), mRNA | 2354 | NM_006732
A:01738 | fyn-related kinase (FRK), mRNA | 2444 | NM_002031
A:03614 | FK506 binding protein 12-rapamycin associated protein 1 (FRAP1), mRNA | 2475 | NM_004958
A:08973 | ferritin, heavy polypeptide 1 (FTH1), mRNA | 2495 | NM_002032
A:03646 | FYN oncogene related to SRC, FGR, YES (FYN), transcript variant 1, mRNA | 2534 | NM_002037
714 | X-ray repair complementing defective repair in Chinese hamster cells 6 (Ku autoantigen, 70kDa) (XRCC6), mRNA | 2547 | NM_001469
A:02378 | GRB2-associated binding protein 1 (GAB1), transcript variant 2, mRNA | 2549 | NM_002039
A:07229 | cyclin G associated kinase (GAK), mRNA | 2580 | NM_005255
B:9019 | growth arrest-specific 1 (GAS1), mRNA | 2619 | NM_002048
B:9019 | growth arrest-specific 1 (GAS1), mRNA | 2620 | NM_002048
B:9020 | growth arrest-specific 6 (GAS6), mRNA | 2621 | NM_000820
A:10093 | growth arrest-specific 8 (GAS8), mRNA | 2622 | NM_001481
A:09801 | glucagon (GCG), mRNA | 2641 | NM_002054
A:09968 | nuclear receptor subfamily 6, group A, member 1 (NR6A1), transcript variant 3, mRNA | 2649 | NM_033335
:4833 | growth factor, augmenter of liver regeneration (ERV1 homolog, S. cerevisiae) (GFER), mRNA | 2671 | NM_005262
A:08908 | growth factor independent 1 (GFI1), mRNA | 2672 | NM_005263
A:02108 | GPI anchored molecule like protein (GML), mRNA | 2765 | NM_002066
A:05004 | gonadotropin-releasing hormone 1 (luteinizing-releasing hormone) (GNRH1), mRNA | 2796 | NM_000825
B:4823 | stratifin (SFN), mRNA | 2810 | NM_006142
B:3553 | G protein pathway suppressor 1 (GPS1), transcript variant 1, mRNA | 2873 | NM_212492
A:04124 | G protein pathway suppressor 2 (GPS2), mRNA | 2874 | NM_004489
A:05918 | granulin (GRN), transcript variant 1, mRNA | 2896 | NM_002087
(illegible) | glucocorticoid receptor DNA binding factor 1 (GRLF1) | 2909 | NM_004491
A:04681 | chemokine (C-X-C motif) ligand 1 (melanoma growth stimulating activity, alpha) (CXCL1), mRNA | 2919 | NM_
A:07763 | gastrin-releasing peptide receptor (GRPR), mRNA | 2925 | NM_005314
B:9294 | glycogen synthase kinase 3 beta (GSK3B), mRNA | 2932 | NM_002093
A:07312 | G1 to S phase transition 1 (GSPT1), mRNA | 2935 | NM_002094
A:09859 | mutS homolog 6 (E. coli) (MSH6), mRNA | 2956 | NM_000179
A:04525 | general transcription factor IIH, polypeptide 1 (62kD subunit) (GTF2H1), mRNA | 2965 | NM_005316
B:9176 | hepatoma-derived growth factor (high-mobility group protein 1-like) (HDGF), mRNA | 3068 | NM_004494
B:8961 | hepatocyte growth factor (hepapoietin A; scatter factor) (HGF), transcript variant 3, mRNA | 3082 | NM_001010932
A:05880 | hematopoietically expressed homeobox (HHEX), mRNA | 3090 | NM_002729
A:05673 | hexokinase 2 (HK2), mRNA | 3099 | NM_000189
A:10377 | high-mobility group box 1 (HMGB1), mRNA | 3146 | NM_002128
A:07252 | solute carrier family 29 (nucleoside transporters), member 2 (SLC29A2), mRNA | 3177 | NM_001532
A:04416 | heterogeneous nuclear ribonucleoprotein L (HNRPL), transcript variant 1, mRNA | 3191 | NM_001533
C:1926 | homeo box C10 (HOXC10), mRNA | 3226 | NM_017409
A:08912 | homeo box D13 (HOXD13), mRNA | 3239 | NM_000523
A:05637 | v-Ha-ras Harvey rat sarcoma viral oncogene homolog (HRAS), transcript variant 1, mRNA | 3265 | NM_005343
A:08143 | heat shock 70kDa protein 1A (HSPA1A), mRNA | 3304 | NM_005345
05469 | heat shock 70kDa protein 2 (HSPA2), mRNA | 3306 | NM_021979
A:09246 | 5-hydroxytryptamine (serotonin) receptor 1A (HTR1A), mRNA | 3350 | NM_000524
A:07300 | HUS1 checkpoint homolog (S. pombe) (HUS1), mRNA | 3364 | NM_004507
B:7639 | interferon, gamma-inducible protein 16 (IFI16) | 3428 | NM_005531
A:04388 | interferon, beta 1, fibroblast (IFNB1), mRNA | 3456 | NM_002176
A:02473 | interferon, omega 1 (IFNW1), mRNA | 3467 | NM_002177
B:5220 | insulin-like growth factor 1 (somatomedin C) (IGF1) | 3479 | NM_000618
C:0361 | insulin-like growth factor 1 receptor (IGF1R) | 3480 | NM_000875
B:5688 | insulin-like growth factor 2 (somatomedin A) (IGF2), mRNA | 3481 | NM_000612
A:09232 | insulin-like growth factor binding protein 4 (IGFBP4), mRNA | 3487 | NM_001552
A:02232 | insulin-like growth factor binding protein 6 (IGFBP6), mRNA | 3489 | NM_002178
A:03385 | insulin-like growth factor binding protein 7 (IGFBP7), mRNA | 3490 | NM_001553
B:8268 | cysteine-rich, angiogenic inducer, 61 (CYR61) | 3491 | NM_001554
C:2817 | immunoglobulin mu binding protein 2 (IGHMBP2), mRNA | 3508 | NM_002180
A:07761 | interleukin 1, alpha (IL1A), mRNA | 3552 | NM_000575
A:08500 | interleukin 1, beta (IL1B), mRNA | 3553 | NM_000576
A:02668 | interleukin 2 (IL2), mRNA | 3558 | NM_000586
A:03791 | interleukin 2 receptor, alpha (IL2RA), mRNA | 3559 | NM_000417
B:4721 | interleukin 2 receptor, gamma (severe combined immunodeficiency) (IL2RG), mRNA | 3561 | NM_000206
A:09679 | interleukin 3 (colony-stimulating factor, multiple) (IL3), mRNA | 3562 | NM_000588
A:05115 | interleukin 4 (IL4), transcript variant 1, mRNA | 3565 | NM_000589
A:04767 | interleukin 5 (colony-stimulating factor, eosinophil) (IL5), mRNA | 3567 | NM_000879
A:00154 | interleukin 5 receptor, alpha (IL5RA), transcript variant 1, mRNA | 3568 | NM_000564
A:00705 | interleukin 6 (interferon, beta 2) (IL6), mRNA | 3569 | NM_000600
B:6258 | interleukin 6 receptor (IL6R), transcript variant 1, mRNA | 3570 | NM_000565
A:04305 | interleukin 7 (IL7), mRNA | 3574 | NM_000880
A:06269 | interleukin 8 (IL8), mRNA | 3576 | NM_000584
A:10396 | interleukin 9 (IL9), mRNA | 3578 | NM_000590
B:9037 | interleukin 8 receptor, beta (IL8RB), mRNA | 3579 | NM_001557
A:07447 | interleukin 9 receptor (IL9R), transcript variant 1, mRNA | 3581 | NM_002186
A:07424 | interleukin 10 (IL10), mRNA | 3586 | NM_000572
C:2709 | interleukin 11 (IL11), mRNA | 3589 | NM_000641
A:02631 | interleukin 12A (natural killer cell stimulatory factor 1, cytotoxic lymphocyte maturation factor 1, p35) (IL12A), mRNA | 3592 | NM_000882
A:01248 | interleukin 12B (natural killer cell stimulatory factor 2, cytotoxic lymphocyte maturation factor 2, p40) (IL12B), mRNA | 3593 | NM_002187
A:02885 | interleukin 12 receptor, beta 1 (IL12RB1), transcript variant 1, mRNA | 3594 | NM_00555
B:4956 | interleukin 12 receptor, beta 2 (IL12RB2), mRNA | 3595 | NM_001559
C:2230 | interleukin 13 (IL13), mRNA | 3596 | NM_002188
A:02144 | interleukin 13 receptor, alpha 2 (IL13RA2), mRNA | 3599 | NM_000640
A:05823 | interleukin 15 (IL15), transcript variant 3, mRNA | 3600 | NM_000585
A:05507 | interleukin 15 receptor, alpha (IL15RA), transcript variant 1, mRNA | 3601 | NM_002189
A:09902 | tumour necrosis factor receptor superfamily, member 9 (TNFRSF9), mRNA | 3604 | NM_001561
A:01751 | interleukin 18 (interferon-gamma-inducing factor) (IL18), mRNA | 3606 | NM_001562
B:1174 | interleukin enhancer binding factor 3, 90kDa (ILF3), transcript variant 1, mRNA | 3609 | NM_012218
A:06560 | integrin-linked kinase (ILK), transcript variant 1, mRNA | 3611 | NM_004517
A:04679 | inner centromere protein antigens 135/155kDa (INCENP), mRNA | 3619 | NM_020238
B:8330 | inhibitor of growth family, member 1 (ING1), transcript variant 4, mRNA | 3621 | NM_005537
A:05295 | inhibin, alpha (INHA), mRNA | 3623 | NM_002191
A:02189 | inhibin, beta A (activin A, activin AB alpha polypeptide) (INHBA), mRNA | 3624 | NM_002192
B:4601 | chemokine (C-X-C motif) ligand 10 (CXCL10), mRNA | 3627 | NM_001565
B:3728 | insulin induced gene 1 (INSIG1), transcript variant 1, mRNA | 3638 | NM_005542
A:08018 | insulin-like 4 (placenta) (INSL4), mRNA | 3641 | NM_002195
A:02981 | interferon regulatory factor 1 (IRF1), mRNA | 3659 | NM_002198
A:00655 | interferon regulatory factor 2 (IRF2), mRNA | 3660 | NM_002199
B:4265 | interferon stimulated exonuclease gene 20kDa (ISG20), mRNA | 3669 | NM_002201
C:0395 | jagged 2 (JAG2), transcript variant 1, mRNA | 3714 | NM_002226
A:05470 | Janus kinase 2 (a protein tyrosine kinase) (JAK2), mRNA | 3717 | NM_004972
A:04848 | v-jun sarcoma virus 17 oncogene homolog (avian) (JUN), mRNA | 3725 | NM_002228
A:08730 | jun B proto-oncogene (JUNB), mRNA | 3726 | NM_002229
A:06684 | kinesin family member 11 (KIF11), mRNA | 3832 | NM_004523
B:4887 | kinesin family member C1 (KIFC1), mRNA | 3833 | NM_002263
A:02390 | kinesin family member 22 (KIF22), mRNA | 3835 | NM_007317
B:4036 | karyopherin alpha 2 (RAG cohort 1, importin alpha 1) (KPNA2), mRNA | 3838 | NM_002266
B:8230 | v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog (KRAS), transcript variant b, mRNA | 3845 | NM_004985
A:08264 | keratin 16 (focal non-epidermolytic palmoplantar keratoderma) (KRT16), mRNA | 3868 | NM_005557
B:6112 | lymphocyte-specific protein tyrosine kinase (LCK), mRNA | 3932 | NM_005356
A:02572 | leukaemia inhibitory factor (cholinergic differentiation factor) (LIF), mRNA | 3976 | NM_002309
A:02207 | ligase I, DNA, ATP-dependent (LIG1), mRNA | 3978 | NM_000234
A:08891 | ligase III, DNA, ATP-dependent (LIG3), nuclear gene encoding mitochondrial protein, transcript variant alpha, mRNA | 8390 | NM_013975
05297 | ligase IV, DNA, ATP-dependent (LIG4), mRNA | (illegible) | (illegible)
B:8631 | LIM domain only 1 (rhombotin 1) (LMO1), mRNA | 4004 | NM_00231
A:00504 | LIM domain containing preferred translocation partner in lipoma (LPP), mRNA | 4029 | NM_005578
A:0054 | LIM domain containing preferred translocation partner in lipoma (LPP), mRNA | 4030 | NM_005578
B:0707 | low density lipoprotein-related protein 1 (alpha-2-macroglobulin receptor) (LRP1), mRNA | 4035 | NM_002332
A:09461 | low density lipoprotein receptor-related protein 5 (LRP5), mRNA | 4041 | NM_002335
03776 | low density lipoprotein receptor-related protein associated protein 1 (LRPAP1), mRNA | 4043 | NM_002337
B:7687 | latent transforming growth factor beta binding protein 2 (LTBP2), mRNA | 4053 | NM_000428
(illegible) | v-yes-1 Yamaguchi sarcoma viral related oncogene homolog (LYN), mRNA | 4067 | NM_002350
A:10613 | tumour-associated calcium signal transducer 2 (TACSTD2), mRNA | 4070 | NM_002353
A:03716 | MAX dimerization protein 1 (MXD1), mRNA | 4084 | NM_002357
A:06387 | MAD2 mitotic arrest deficient-like 1 (yeast) (MAD2L1), mRNA | 4085 | NM_002358
2:5699 | v-maf musculoaponeurotic fibrosarcoma oncogene homolog G (avian) (MAFG), transcript variant 1, mRNA | 4097 | NM_002359
:03848 | MAS1 oncogene (MAS1), mRNA | 4142 | NM_002377
B:9275 | megakaryocyte-associated tyrosine kinase (MATK), transcript variant 1, mRNA | 4145 | NM_139355
B:4426 | mutated in colorectal cancers (MCC), mRNA | 4163 | NM_002387
A:08834 | MCM2 minichromosome maintenance deficient 2, mitotin (S. cerevisiae) (MCM2), mRNA | 4171 | NM_004526
A:08668 | MCM3 minichromosome maintenance deficient 3 (S. cerevisiae) (MCM3), mRNA | 4172 | NM_002388
:7581 | MCM4 minichromosome maintenance deficient 4 (S. cerevisiae) (MCM4), transcript variant 1, mRNA | 4173 | NM_005914
B:7805 | MCM5 minichromosome maintenance deficient 5, cell division cycle 46 (S. cerevisiae) (MCM5), mRNA | 4174 | NM_006739
B:8147 | MCM6 minichromosome maintenance deficient 6 (MIS5 homolog, S. pombe) (S. cerevisiae) (MCM6), mRNA | 4175 | NM_005915
B:7620 | MCM7 minichromosome maintenance deficient 7 (S. cerevisiae) (MCM7) | 4176 | (illegible)
:4650 | midkine (neurite growth-promoting factor 2) (MDK), transcript variant 1, mRNA | 4192 | NM_001012334
B:8649 | Mdm2, transformed 3T3 cell double minute 2, p53 binding protein (mouse) (MDM2), transcript variant MDM2a, mRNA | 4193 | NM_006878
A:03964 | Mdm4, transformed 3T3 cell double minute 4, p53 binding protein (mouse) (MDM4), mRNA | 4194 | NM_002393
:10600 | RAB8A, member RAS oncogene family (RAB8A), mRNA | 4218 | NM_005370
B:8222 | met proto-oncogene (hepatocyte growth factor receptor) (MET) | 4233 | NM_000245
A:09470 | KIT ligand (KITLG), transcript variant b, mRNA | 4254 | NM_000899
A:01575 | O-6-methylguanine-DNA methyltransferase (MGMT), mRNA | 4255 | NM_002412
A:10388 | antigen identified by monoclonal antibody Ki-67 (MKI67), mRNA | 4288 | NM_002417
A:06073 | mutL homolog 1, colon cancer, nonpolyposis type 2 (E. coli) (MLH1), mRNA | 4292 | NM_000249
B:7492 | myeloid/lymphoid or mixed-lineage leukaemia (trithorax homolog, Drosophila); translocated to, 7 (MLLT7), mRNA | 4303 | NM_005938
A:09644 | meningioma (disrupted in balanced translocation) 1 (MN1), mRNA | 4330 | NM_002430
A:08968 | menage a trois 1 (CAK assembly factor) (MNAT1), mRNA | 4331 | NM_002431
A:02100 | MAX binding protein (MNT), mRNA | 4335 | NM_020310
A:02282 | v-mos Moloney murine sarcoma viral oncogene homolog (MOS), mRNA | 4342 | NM_005372
A:06141 | myeloproliferative leukaemia virus oncogene (MPL), mRNA | 4352 | NM_005373
A:04072 | MRE11 meiotic recombination 11 homolog A (S. cerevisiae) (MRE11A), transcript variant 1, mRNA | 4361 | NM_005591
A:04072 | MRE11 meiotic recombination 11 homolog A (S. cerevisiae) (MRE11A), transcript variant 1, mRNA | 4362 | NM_005591
A:04514 | mutS homolog 2, colon cancer, nonpolyposis type 1 (E. coli) (MSH2), mRNA | 4436 | NM_00025
K06785 | mutS homolog 3 (E. coli) (MSH3), mRNA | 4437 | NM_002439
A:02756 | mutS homolog 4 (E. coli) (MSH4), mRNA | 4438 | NM_002440
A:09339 | mutS homolog 5 (E. coli) (MSH5), transcript variant 1, mRNA | 4439 | NM_025259
A:04591 | macrophage stimulating 1 receptor (c-met-related tyrosine kinase) (MST1R), mRNA | 4486 | NM_002447
A:05992 | metallothionein 3 (growth inhibitory factor (neurotrophic)) (MT3), mRNA | 4504 | NM_005954
C:2393 | mature T-cell proliferation 1 (MTCP1), nuclear gene encoding mitochondrial protein, transcript variant B1, mRNA | 4515 | NM_014221
A:01898 | mutY homolog (E. coli) (MUTYH), mRNA | 4595 | NM_012222
A:10478 | MAX interactor 1 (MXI1), transcript variant 1, mRNA | 4601 | NM_005962
B:5181 | v-myb myeloblastosis viral oncogene homolog (avian) MYB | 4602 | NM_005375
B:5429 | v-myb myeloblastosis viral oncogene homolog (avian)-like 1 (MYBL1), mRNA | 4603 | XM_034274, XM_933460, XM_938064
A:06037 | v-myb myeloblastosis viral oncogene homolog (avian)-like 2 (MYBL2), mRNA | 4605 | NM_002466
A:02498 | v-myc myelocytomatosis viral oncogene homolog (avian) (MYC), mRNA | 4609 | N 2
C:2723 | myosin, heavy polypeptide 10, non-muscle (MYH10), mRNA | 4628 | NM_005964
B:4239 | NGFI-A binding protein 2 (EGR1 binding protein 2) (NAB2), mRNA | 4665 | NM_005967
B:1584 | nucleosome assembly protein 1-like 1 (NAP1L1), transcript variant 1, mRNA | 4673 | NM_139207
A:09960 | neuroblastoma suppression of tumourigenicity 1 (NBL1), transcript variant 1, mRNA | 4681 | NM_182744
A:02361 | nucleotide binding protein 1 (MinD homolog, E. coli) (NUBP1), mRNA | 4682 | NM_002484
A:10519 | nibrin (NBN), transcript variant 1, mRNA | 4683 | NM_002485
A:08868 | NCK adaptor protein 1 (NCK1), mRNA | 4690 | NM_006153
A:07320 | necdin homolog (mouse) (NDN), mRNA | 4692 | NM_002487
E:5481 | Norrie disease (pseudoglioma) (NDP), mRNA | 4693 | NM_000266
B:4761 | septin 2 (SEPT2), transcript variant S4, mRNA | 4735 | NM_004404
A:04128 | neural precursor cell expressed, developmentally down-regulated 9 (NEDD9), transcript variant, mRNA | 4739 | NM_006403
B:7542 | NIMA (never in mitosis gene a)-related kinase 1 (NEK1), mRNA | 4750 | NM_012224
:00847 | NIMA (never in mitosis gene a)-related kinase 2 (NEK2), mRNA | 4751 | NM_002497
B:7555 | NIMA (never in mitosis gene a)-related kinase 3 (NEK3), transcript variant 1, mRNA | 4752 | NM_002498
B:9751 | neurofibromin 1 (neurofibromatosis, von Recklinghausen disease, Watson disease) (NF1), mRNA | 4763 | NM_000267
B:7527 | neurofibromin 2 (bilateral acoustic neuroma) (NF2), transcript variant 12, mRNA | 4771 | NM_181825
B:8431 | nuclear factor I/A (NFIA), mRNA | 4774 | NM_005595
A:03729 | nuclear factor I/B (NFIB), mRNA | 4781 | NM_005596
B:48 | nuclear factor I/C (CCAAT-binding transcription factor) (NFIC), transcript variant 1, mRNA | 4782 | NM_005597
C:5826 | nuclear factor I/X (CCAAT-binding transcription factor) (NFIX), mRNA | 4784 | NM_002501
B:5078 | nuclear transcription factor Y, gamma NFYC | 4802 | NM_014223
A:0462 | NHP2 non-histone chromosome protein 2-like 1 (S. cerevisiae) (NHP2L1), transcript variant 1, mRNA | 4809 | NM_0
A:01677 | non-metastatic cells 1, protein (NM23A) expressed in (NME1), transcript variant 2, mRNA | |
A:04306 | non-metastatic cells 2, protein (NM23B) expressed in (NME2), transcript variant 1, mRNA | 831 | NM_002512
C:1522 | nucleolar protein 1, 120kDa (NOL1), transcript variant 2, mRNA | 4839 |
A:06565 | neuropeptide Y (NPY), mRNA | 4852 | NM_000905
A:00579 | Notch homolog 2 (Drosophila) (NOTCH2), mRNA | 4853 | NM_024408
A:02787 | neuroblastoma RAS viral (v-ras) oncogene homolog (NRAS), mRNA | 4893 | NM_002524
B:6139 | nuclear mitotic apparatus protein 1 (NUMA1), mRNA | 4926 | NM_006185
0618 32 | opioid receptor, mu 1 (OPRM1), transcript variant MOR-1, mRNA | 4988 | NM
A:02654 | origin recognition complex, subunit 1-like (yeast) (ORC1L), mRNA | 4998 | NM_004153
A:01697 | origin recognition complex, subunit 2-like (yeast) (ORC2L), mRNA | 4999 | NM_006190
A:06724 | origin recognition complex, subunit 4-like (yeast) (ORC4L), transcript variant 2, mRNA | 5000 | NM_002552
C | origin recognition complex, subunit 5-like (yeast) (ORC5L), transcript variant 2, mRNA | 5001 | NM_181747
A:09399 | oncostatin M (OSM), mRNA | 5008 | NM_020530
A:07058 | proliferation-associated 2G4, 38kDa (PA2G4), mRNA | 5036 | NM_006191
A:04710 | platelet-activating factor acetylhydrolase, isoform Ib, alpha subunit 45kDa (PAFAH1B1), mRNA | 5048 | NM_000430
A:03397 | peroxiredoxin 1 (PRDX1), transcript variant 1, mRNA | 5052 | NM_002574
B:4727 | regenerating islet-derived 3 alpha (REG3A), transcript variant 1, mRNA | 5068 | NM_002580
A:03215 | PRKC, apoptosis, WT1, regulator (PAWR), mRNA | 5074 | NM_002583
A:03715 | proliferating cell nuclear antigen (PCNA), transcript variant 1, mRNA | 5111 | NM_002592
A:09486 | PCTAIRE protein kinase 1 (PCTK1), transcript variant 1, mRNA | 5127 | NM_006201
A:09486 | PCTAIRE protein kinase 1 (PCTK1), transcript variant 1, mRNA | 5128 | NM_006201
C:2666 | platelet-derived growth factor alpha polypeptide (PDGFA), transcript variant 1, mRNA | 5154 | NM_002607
B:7519 | platelet-derived growth factor beta polypeptide (simian sarcoma viral (v-sis) oncogene homolog) (PDGFB), transcript variant 1, mRNA | 5155 | NM_002608
A:02349 | platelet-derived growth factor receptor, alpha polypeptide (PDGFRA), mRNA | 5156 | NM_006206
A:00876 | PDZ domain containing 1 (PDZK1), mRNA | 5174 | NM_002614
A:04139 | serpin peptidase inhibitor, clade F (alpha-2 antiplasmin, pigment epithelium derived factor), member 1 (SERPINF1), transcript variant 4, mRNA | 5176 | NM_002615
B:4669 | prefoldin 1 (PFDN1), mRNA | 5201 | NM_002622
A:00156 | placental growth factor, vascular endothelial growth factor-related protein (PGF), mRNA | 5228 | NM_002632
B:9242 | phosphoinositide-3-kinase, catalytic, beta polypeptide (PIK3CB), mRNA | 5291 | NM_006219
A:09957 | protein (peptidyl-prolyl cis/trans isomerase) NIMA-interacting 1 (PIN1), mRNA | 5300 | NM_006221
A:00888 | pleiomorphic adenoma gene-like 1 (PLAGL1), transcript variant 2, mRNA | 5325 | NM_006718
A:08398 | plasminogen (PLG), mRNA | 5340 | NM_000301
B:3744 | polo-like kinase 1 (Drosophila) (PLK1), mRNA | 5347 | NM_005030
B:4722 | peripheral myelin protein 22 (PMP22), transcript variant 1, mRNA | 5376 | NM_000304
A:10286 | PMS1 postmeiotic segregation increased 1 (S. cerevisiae) (PMS1), mRNA | 5378 | NM_000534
A:10286 | PMS1 postmeiotic segregation increased 1 (S. cerevisiae) (PMS1), mRNA | 5379 | NM_000534
B:9336 | postmeiotic segregation increased 2-like 2 (PMS2L2), mRNA | 5380 | NM_002679
B:9336 | postmeiotic segregation increased 2-like 2 (PMS2L2), mRNA | 5382 | NM_002679
A:10467 | postmeiotic segregation increased 2-like 5 (PMS2L5), mRNA | 5383 | NM_174930
A:10467 | postmeiotic segregation increased 2-like 5 (PMS2L5), mRNA | 5386 | NM_174930
A:02096 | PMS2 postmeiotic segregation increased 2 (S. cerevisiae) (PMS2), transcript variant 1, mRNA | 5395 | NM_000535
B:0731 | septin 5 (SEPT5), transcript variant 1, mRNA | 5413 | NM_002688
A:09062 | septin 4 (SEPT4), transcript variant 1, mRNA | 5414 | NM_004574
A:05543 | polymerase (DNA directed), alpha (POLA), mRNA | 5422 | NM_016937
A:02852 | polymerase (DNA directed), beta (POLB), mRNA | 5423 | NM_002690
A:09477 | polymerase (DNA directed), delta 1, catalytic subunit 125kDa (POLD1), mRNA | 5424 | NM_002691
A:02929 | polymerase (DNA directed), delta 2, regulatory subunit 50kDa (POLD2), mRNA | 5425 | NM_006230
B:3196 | polymerase (DNA directed), epsilon POLE | 5426 | NM_006231
A:04680 | polymerase (DNA directed), epsilon 2 (p59 subunit) (POLE2), mRNA | 5427 | NM_002692
A:08572 | polymerase (DNA directed), gamma (POLG), mRNA | 5428 | NM_002693
A:08948 | polymerase (RNA) mitochondrial (DNA directed) (POLRMT), nuclear gene encoding mitochondrial protein, mRNA | 5442 | NM_005035
A:00480 | POU domain, class 1, transcription factor 1 (Pit1, growth hormone factor 1) (POU1F1), mRNA | 5449 | NM_000306
C:6960 | peroxisome proliferative activated receptor, delta (PPARD), transcript variant 1, mRNA | 5467 | NM_006238
B:0695 | PPAR binding protein (PPARBP), mRNA | 5469 | NM_004774
A:10622 | pro-platelet basic protein (chemokine (C-X-C motif) ligand 7) (PPBP), mRNA | 5473 | NM_002704
A:08431 | protein phosphatase 1G (formerly 2C), magnesium-dependent, gamma isoform (PPM1G), transcript variant 1, mRNA | 5496 | NM_177983
A:05348 | protein phosphatase 1, catalytic subunit, alpha isoform (PPP1CA), transcript variant 1, mRNA | 5499 | NM_002708
B:0943 | protein phosphatase 1, catalytic subunit, beta isoform (PPP1CB), transcript variant 1, mRNA | 5500 | NM_00270I
A:02064 | protein phosphatase 1, catalytic subunit, gamma isoform (PPP1CC), mRNA | 5501 | NM_002710
A:01231 | protein phosphatase 2 (formerly 2A), catalytic subunit, alpha isoform (PPP2CA), mRNA | 5515 | NM_002715
A:0382 | protein phosphatase 2 (formerly 2A), regulatory subunit A (PR 65), alpha isoform (PPP2R1A), mRNA | 5518 | NM_014225
A:01064 | protein phosphatase 2 (formerly 2A), regulatory subunit A (PR 65), beta isoform (PPP2R1B), transcript variant 1, mRNA | 5519 | NM_002716
A:00874 | protein phosphatase 2 (formerly 2A), regulatory subunit B", alpha (PPP2R3A), transcript variant 1, mRNA | |
A:07683 | protein phosphatase 3 (formerly 2B), catalytic subunit, beta isoform (calcineurin A beta) (PPP3CB), mRNA | 5532 | NM_021132
A:00032 | protein phosphatase 5, catalytic subunit (PPP5C), mRNA | 5536 | 06247
:02880 | protein phosphatase 6, catalytic subunit (PPP6C), mRNA | 5537 | NM_002721
A:07833 | primase, polypeptide 1, 49kDa (PRIM1), mRNA | 5557 | NM_000946
A:08706 | primase, polypeptide 2A, 58kDa PRIM2A | 5558 | NM_000947
M -027-4 | protein kinase, cAMP-dependent, regulatory, type I, alpha (tissue specific extinguisher 1) (PRKAR1A), transcript variant 1, mRNA | |
A:07305 | protein kinase, cAMP-dependent, regulatory, type II, beta (PRKAR2B), mRNA | 5578 | NM_002736
A:08970 | protein kinase D1 (PRKD1), mRNA | 5587 | NM_002742
A:05228 | protein kinase, cGMP-dependent, type II (PRKG2), mRNA | 5593 | NM_006259
B:6263 | mitogen-activated protein kinase 1 (MAPK1), transcript variant 1, mRNA | 5594 | NM_002745
B:908 | mitogen-activated protein kinase 3 (MAPK3), mRNA | 559 | N_002747
088 | mitogen-activated protein kinase 4 (MAPK4), mRNA | 59 |
A0 4 | mitogen-activated protein kinase 6 | 5597 |
A:09951 | mitogen-activated protein kinase 7 (MAPK7), transcript variant 1, mRNA | 5598 | NM_139033
A:00932 | mitogen-activated protein kinase 13 (MAPK13), mRNA | 5603 | NM_002754
A:06747 | mitogen-activated protein kinase kinase 6 (MAP2K6), transcript variant 1, mRNA | 5608 | NM_002758
B:4014 | mitogen-activated protein kinase kinase 7 MAP2K7 | 5609 | NM_145185
B:1372 | eukaryotic translation initiation factor 2-alpha kinase 2 (EIF2AK2), mRNA | 5610 | NM_002759
B:5991 | protein-kinase, interferon-inducible double stranded RNA dependent inhibitor, repressor of (P58 repressor) (PRKRIR), mRNA | 5612 | NM_004705
A:03959 | prolactin (PRL), mRNA | 5617 | NM_000948
A:09385 | protamine 1 (PRM1), mRNA | 5619 | NM_002761
A:02848 | protamine 2 (PRM2), mRNA | 5620 | NM_002762
A:07907 | kallikrein 10 (KLK10), transcript variant 1, mRNA | 5655 | NM_002776
A:01338 | proteinase 3 (serine proteinase, neutrophil, Wegener granulomatosis autoantigen) (PRTN3), mRNA | 5657 | NM_002777
B:4949 | presenilin 1 (Alzheimer disease 3) PSEN1 | 5663 | NM_000021
A:00037 | presenilin 2 (Alzheimer disease 4) (PSEN2), transcript variant 1, mRNA | 5664 | NM_000447
A:05430 | peptide YY (PYY), mRNA | 5697 | NM_004160
A:05083 | proteasome (prosome, macropain) 26S subunit, non-ATPase, 8 (PSMD8), mRNA | 5714 | NM_002812
A:10847 | patched homolog (Drosophila) (PTCH), mRNA | 5727 | NM_000264
A:04029 | phosphatase and tensin homolog (mutated in multiple advanced cancers 1) (PTEN), mRNA | 5728 | NM_000314
A:08708 | parathyroid hormone-like hormone (PTHLH), transcript variant 2, mRNA | 5744 | NM_002820
B:4775 | prothymosin, alpha (gene sequence 28) (PTMA), mRNA | 5757 | NM_002823
A:05250 | parathymosin (PTMS), mRNA | 5763 | NM_002824
C:2316 | pleiotrophin (heparin binding growth factor 8, neurite growth-promoting factor 1) (PTN), mRNA | 5764 | NM_002825
C:2627 | quiescin Q6 (QSCN6), transcript variant 1, mRNA | 5768 | NM_002826
A:10310 | protein tyrosine phosphatase, non-receptor type 6 (PTPN6), transcript variant 2, mRNA | 5777 | NM_080548
A:02619 | RAD1 homolog (S. pombe) (RAD1), transcript variant 1, mRNA | 5810 | NM_002853
C:2196 | purine-rich element binding protein A (PURA), mRNA | 5813 | NM_005859
B:1151 | ras-related C3 botulinum toxin substrate 1 (rho family, small GTP binding protein Rac1) (RAC1), transcript variant Rac1b, mRNA | 5879 | NM_018890
A:05292 | RAD9 homolog A (S. pombe) (RAD9A), mRNA | 5883 | NM_004584
A:10635 | RAD17 homolog (S. pombe) (RAD17), transcript variant 8, mRNA | 5884 | NM_002873
A:07580 | RAD21 homolog (S. pombe) (RAD21), mRNA | 5885 | NM_006265
A:07819 | RAD51 homolog (RecA homolog, E. coli) (S. cerevisiae) (RAD51), transcript variant 1, mRNA | 5888 | NM_002875
A:09744 | RAD51-like 1 (S. cerevisiae) (RAD51L1), transcript variant 1, mRNA | 5890 | NM_002877
E:0346 | RAD51-like 3 (S. cerevisiae) RAD51L3 | 5892 | NM_002878, NM_133629
B:1043 | RAD52 homolog (S. cerevisiae) (RAD52), transcript variant beta, mRNA | 5893 | NM_134424
C:2457 | v-raf-1 murine leukaemia viral oncogene homolog 1 (RAF1), mRNA | 5894 | NM_002880
B:8341 | ral guanine nucleotide dissociation stimulator RALGDS | 5900 | NM_001042368, NM_006266
A:09169 | RAN, member RAS oncogene family (RAN), mRNA | 5901 | NM_006325
C:0082 | RAP1A, member of RAS oncogene family RAP1A | 5906 | NM_001010935, NM_002884
A:00423 | RAP1B, member of RAS oncogene family (RAP1B), transcript variant 1, mRNA | 5908 | NM_015646
A:09690 | retinoic acid receptor responder (tazarotene induced) 1 (RARRES1), transcript variant 2, mRNA | 5918 | NM_002888
A:08045 | retinoic acid receptor responder (tazarotene induced) 3 (RARRES3), mRNA | 5920 | NM_004585
B:9011 | retinoblastoma 1 (including osteosarcoma) (RB1), mRNA | 5925 | NM_000321
A:04888 | retinoblastoma binding protein 4 (RBBP4), mRNA | 5928 | NM_005610
C:2267 | retinoblastoma binding protein 6 (RBBP6), transcript variant 1, mRNA | 5930 | NM_006910
A:06741 | retinoblastoma binding protein 7 (RBBP7), mRNA | 5931 | NM_002893
A:09145 | retinoblastoma binding protein 8 (RBBP8), transcript variant 1, mRNA | 5932 | NM_002894
A:10222 | retinoblastoma-like 1 (p107) (RBL1), transcript variant 1, mRNA | 5933 | NM_002895
A:08246 | retinoblastoma-like 2 (p130) (RBL2), mRNA | 5934 | NM_005611
B:995 | RNA binding motif, single stranded interacting protein 1 (RBMS1), transcript variant 1, mRNA | 5937 | NM_016836
B:1393 | regenerating islet-derived 1 alpha (pancreatic stone protein, pancreatic thread protein) (REG1A), mRNA | 5967 | NM_002909
:741 | regenerating islet-derived 1 beta (pancreatic stone protein, pancreatic thread protein) (REG1B), mRNA | 5968 | NM_006507
B:4741 | regenerating islet-derived 1 beta (pancreatic stone protein, pancreatic thread protein) (REG1B), mRNA | 5969 | NM_006507
A:04164 | REV3-like, catalytic subunit of DNA polymerase zeta (yeast) (REV3L), mRNA | 5580 | NM_002912
A:03348 | replication factor C (activator 1) 1, 145kDa (RFC1), mRNA | 5981 | NM_002913
A:06693 | replication factor C (activator 1) 2, 40kDa (RFC2), transcript variant 1, mRNA | 5982 | NM_181471
A:02491 | replication factor C (activator 1) 3, 38kDa (RFC3), transcript variant 1, mRNA | 5983 | NM_02915
A:09921 | replication factor C (activator 1) 4, 37kDa (RFC4), transcript variant 1, mRNA | 5984 | NM_002916
B:3726 | replication factor C (activator 1) 5, 36kDa (RFC5), transcript variant 1 | 5985 | NM_007370
A:04896 | ret finger protein (RFP), transcript variant alpha, mRNA | 5987 | NM_006510
04971 | regulator of G-protein signalling 2, 24kDa (RGS2), mRNA | 5997 | NM_002923
B:8684 | relaxin 2 (RLN2), transcript variant 2, mRNA | 6024 | NM_005059
A:10597 | replication protein A1, 70kDa (RPA1), mRNA | 6117 | NM_002945
| replication protein A2, 32kDa (RPA2), mRNA | 6118 | NM_002946
A:00231 | replication protein A3, 14kDa (RPA3), mRNA | 6119 | NM_002947
B:8856 | ribosomal protein S4, X-linked (RPS4X), mRNA | 6191 | NM_001007
8 | ribosomal protein S4, X-linked (RPS4X), mRNA | 6192 | NM_001007
A:10444 | ribosomal protein S6 kinase, 70kDa, polypeptide 2 (RPS6KB2), transcript variant, mRNA | 6199 | NM_003952
A:02188 | ribosomal protein S25 (RPS25), mRNA | 6232 | NM_001028
A:08509 | related RAS viral (r-ras) oncogene homolog (RRAS), mRNA | 6237 | NM_006270
A:09802 | ribonucleotide reductase M1 polypeptide (RRM1), mRNA | 6240 | NM_001033
B:3501 | ribonucleotide reductase M2 polypeptide (RRM2), mRNA | 6241 | NM_001034
A:08332 | S100 calcium binding protein A5 (S100A5), mRNA | 6276 | NM_002962
C:1129 | S100 calcium binding protein A6 (calcyclin) (S100A6), mRNA | 6277 | NM_014624
B:3690 | S100 calcium binding protein A11 (calgizzarin) (S100A11), mRNA | 6282 | NM_005620
A:08910 | S100 calcium binding protein, beta (neural) (S100B), mRNA | 6285 | NM_006272
A:05458 | mitogen-activated protein kinase 12 (MAPK12), mRNA | 6300 | NM_002969
A:07786 | tetraspanin 31 (TSPAN31), mRNA | 6302 | NM_005981
A:09884 | C-type lectin domain family 11, member A (CLEC11A), mRNA | 6320 | NM_002975
A:00985 | chemokine (C-C motif) ligand 3 (CCL3), mRNA | 6348 | NM_002983
A:00985 | chemokine (C-C motif) ligand 3 (CCL3), mRNA | 6349 | NM_002983
B:0899 | chemokine (C-C motif) ligand 14 (CCL14), transcript variant 2, mRNA | 6358 | NM_032962
B:0898 | chemokine (C-C motif) ligand 23 (CCL23), transcript variant CKbeta8, mRNA | 6368 | NM_145898
B:5275 | chemokine (C-X-C motif) ligand 11 (CXCL11), mRNA | 6374 | NM_005409
C:2038 | SET translocation (myeloid leukaemia-associated) (SET), mRNA | 6418 | NM_003011
A:00679 | SHC (Src homology 2 domain containing) transforming protein 1 (SHC1), transcript variant 1, mRNA | 6464 | NM_183001
B:9295 | SCL/TAL1 interrupting locus (STIL), mRNA | 6491 | NM_003035
B:7410 | signal-induced proliferation-associated gene 1 (SIPA1), transcript variant 1, mRNA | 6494 | NM_1532538
C:5435 | S-phase kinase-associated protein 2 (p45) (SKP2), transcript variant 1, mRNA | 6502 | NM_005983
A:09017 | signaling lymphocytic activation molecule family member 1 (SLAMF1), mRNA | 6504 | NM_003037
A:06456 | solute carrier family 12 (potassium/chloride transporters), member 4 (SLC12A4), mRNA | 6560 | NM_005072
A:05730 | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily b, member 1 (SMARCB1), transcript variant 1, mRNA | 6598 | NM_003073
A:07314 | fascin homolog 1, actin-bundling protein (Strongylocentrotus purpuratus) (FSCN1), mRNA | 6624 | NM_003088
A:04540 | sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 1 (SPOCK1), mRNA | 6695 | NM_004598
A:09441 | secreted phosphoprotein 1 (osteopontin, bone sialoprotein I, early T-lymphocyte activation 1) (SPP1), mRNA | 6696 | NM_000582
A:02264 | v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (avian) (SRC), transcript variant 1, mRNA | 6714 | NM_005417
A:04127 | single-stranded DNA binding protein 1 (SSBP1), mRNA | 6742 | NM_0-3143
A:07245 | signal sequence receptor, alpha (translocon-associated protein alpha) (SSR1), mRNA | 6745 | NM_003144
A:08350 | somatostatin (SST), mRNA | 6750 | NM_001048
A:03956 | somatostatin receptor 1 (SSTR1), mRNA | 6751 | NM_001049
C:1740 | somatostatin receptor 2 (SSTR2), mRNA | 6752 | NM_001050
A:0482 | somatostatin receptor 3 (SSTR3), mRNA | 6753 | NM_
A:01484 | somatostatin receptor 5 (SSTR5), mRNA | 6754 | NM_001052
A:03398 | signal transducer and activator of transcription 1, 91kDa (STAT1), transcript variant alpha, mRNA | 6772 | NM_007315
A:05843 | stromal interaction molecule 1 (STIM1), mRNA | 6786 | NM_003156
A:04562 | NIMA (never in mitosis gene a)-related kinase 4 (NEK4), mRNA | 6787 | NM_003157
A:04814 | serine/threonine kinase 6 (STK6), transcript variant 1, mRNA | 6790 | NM_198433
A:01764 | aurora kinase C (AURKC), transcript variant 3, mRNA | 6795 | NM_003160
A:10309 | suppressor of variegation 3-9 homolog 1 (Drosophila) (SUV39H1), mRNA | 6839 | NM_003173
A:01895 | synaptonemal complex protein 1 (SYCP1), mRNA | 6847 | NM_003176
A:09854 | spleen tyrosine kinase (SYK), mRNA | 6850 | NM_003177
A:02589 | transcriptional adaptor 2 (ADA2 homolog; yeast)-like (TADA2L), transcript variant 1, mRNA | 6871 | NM_001488
A:01355 | TAF1 RNA polymerase II, TATA box binding protein (TBP)-associated factor, 250kDa (TAF1), transcript variant 1, mRNA | 6872 | NM_004606
C:1960 | T-cell acute lymphocytic leukaemia 1 (TAL1), mRNA | 6886 | NM_003189
C:2789 | transcription factor 3 (E2A immunoglobulin enhancer binding factors E12/E47) (TCF3), mRNA | 6930 | NM_003200
B:4738 | transcription factor 8 (represses interleukin 2 expression) (TCF8), mRNA | 6935 | NM_030751
A:03967 | transcription factor 19 (SC1) (TCF19), mRNA | 6941 | NM_007109
A:05964 | telomerase-associated protein 1 (TEP1), mRNA | 7011 | NM_007110
B:9167 | telomeric repeat binding factor (NIMA-interacting) 1 (TERF1), transcript variant 2, mRNA | 7013 | NM_003218
B:7401 | telomeric repeat binding factor 2 (TERF2), mRNA | 7014 | NM_005652
C:0355 | telomerase reverse transcriptase (TERT), transcript variant 1, mRNA | 7015 | NM_003219
A:07625 | transcription factor A, mitochondrial (TFAM), mRNA | 7019 | NM_003201
A:06784 | nuclear receptor subfamily 2, group F, member 1 (NR2F1), mRNA | 7025 | NM_005654
A:06784 | nuclear receptor subfamily 2, group F, member 1 (NR2F1), mRNA | 7027 | NM_005654
B:5016 | transcription factor Dp-2 (E2F dimerization partner 2) (TFDP2), mRNA | 7029 | NM_006286
B:5851 | transforming growth factor, alpha (TGFA), mRNA | 7039 | NM_003236
A:07050 | transforming growth factor, beta 1 (Camurati-Engelmann disease) (TGFB1), mRNA | 7040 | NM_000660
B:0094 | transforming growth factor beta 1 induced transcript 1 (TGFB1I1), mRNA | 7041 | NM_015927
A:09824 | transforming growth factor, beta 2 (TGFB2), mRNA | 7042 | NM_003238
B:7853 | transforming growth factor, beta 3 (TGFB3), mRNA | 7043 | NM_003239
B:4156 | transforming growth factor, beta-induced, 68kDa (TGFBI), mRNA | 7045 | NM_000358
03732 | transforming growth factor, beta receptor II (70/80kDa) (TGFBR2), transcript variant 2, mRNA | 7048 | NM_003242
B:0258 | thrombopoietin (myeloproliferative leukaemia virus oncogene ligand, megakaryocyte growth and development factor) (THPO), transcript variant 3, mRNA | 7066 | NM_199356
B:4371 | thyroid hormone receptor, alpha (erythroblastic leukaemia viral (v-erb-a) oncogene homolog, avian) (THRA), transcript variant 1, mRNA | 7067 | NM_199334
A:06139 | Kruppel-like factor 10 (KLF10), transcript variant 1, mRNA | 7071 | NM_005655
A:08048 | TIMP metallopeptidase inhibitor 1 (TIMP1), mRNA | 7076 | NM_003254
B:3686 | transmembrane 4 L six family member 4 (TM4SF4), mRNA | 7104 | NM_004617
B:5451 | topoisomerase (DNA) I (TOP1), mRNA | 7150 | NM_003286
B:7145 | topoisomerase (DNA) II alpha 170kDa (TOP2A), mRNA | 7153 | NM_001067
A:04487 | topoisomerase (DNA) II beta 180kDa (TOP2B), mRNA | 7155 | NM_001068
A | topoisomerase (DNA) III alpha (TOP3A), mRNA | 7156 | NM_004618
:07597 | tumour protein p53 (Li-Fraumeni syndrome) (TP53), mRNA | 7157 | NM_000546
B:6951 | tumour protein p53 binding protein, 2 (TP53BP2), transcript variant 1, mRNA | 7159 | NM_001031685
A:10089 | tumor protein p73 (TP73), mRNA | 7161 | NM_005427
A:07179 | tumour protein D52-like 1 (TPD52L1), transcript variant 4 | 7165 | NM_001003397
A:00700 | tuberous sclerosis 1 (TSC1), transcript variant 1, mRNA | 7248 | NM_000368
C:2440 | tuberous sclerosis 2 (TSC2), transcript variant 2, mRNA | 7249 | NM_021055
A:06571 | thyroid stimulating hormone receptor (TSHR), transcript variant 1, mRNA | 7253 | NM_000369
A:02759 | testis specific protein, Y-linked 1 (TSPY1), mRNA | 7258 | NM_003308
A:09121 | tumour suppressing subtransferable candidate 1 (TSSC1), mRNA | 7260 | NM_003310
A:07936 | TTK protein kinase (TTK), mRNA | 7272 | NM_003318
A:05365 | tumour necrosis factor (ligand) superfamily, member 4 (tax transcriptionally activated glycoprotein 1, 34kDa) (TNFSF4), mRNA | 7292 | NM_003326
B:0763 | thioredoxin TXN | 7295 | NM_003329
B:4917 | ubiquitin-activating enzyme E1 (A1S9T and BN75 temperature sensitivity complementing) (UBE1), transcript variant 1, mRNA | 7317 | NM_003334
A:08169 | ubiquitin-conjugating enzyme E2D 1 (UBC4/5 homolog, yeast) (UBE2D1), mRNA | 7321 | NM_003338
A:07196 | ubiquitin-conjugating enzyme E2D 3 (UBC4/5 homolog, yeast) (UBE2D3), transcript variant 1, mRNA | 7323 | NM_003340
A:04972 | ubiquitin-conjugating enzyme E2 variant 1 (UBE2V1), transcript variant 1, mRNA | 7335 | NM_021988
B:0648 | ubiquitin-conjugating enzyme E2 variant 2 (UBE2V2), mRNA | 7336 | NM_003350
C:2659 | uromodulin (uromucoid, Tamm-Horsfall glycoprotein) (UMOD), transcript variant 2, mRNA | 7369 | NM_001008389
A:06855 | vav 1 oncogene (VAV1), mRNA | 7409 | NM_005428
A:08040 | vav 2 oncogene VAV2 | 7410 | NM_003371
C:1128 | vascular endothelial growth factor (VEGF), transcript variant 5, mRNA | 7422 | NM_001025369
B:5229 | vascular endothelial growth factor B (VEGFB), mRNA | 7423 | NM_003377
A:06320 | vascular endothelial growth factor C (VEGFC), mRNA | 7424 | NM_005429
A:06488 | von Hippel-Lindau tumour suppressor (VHL), transcript variant 2, mRNA | 7428 | NM_198156
C:2407 | vasoactive intestinal peptide (VIP), transcript variant 1, mRNA | 7432 | NM_003381
B:8107 | vasoactive intestinal peptide receptor 1 (VIPR1), mRNA | 7433 | NM_004624
A:08324 | tryptophanyl-tRNA synthetase (WARS), transcript variant 1, mRNA | 7453 | NM_004184
A:06953 | WEE1 homolog (S. pombe) (WEE1), mRNA | 7465 | NM_003390
J:5487 | Wilms tumour 1 (WT1), transcript variant D, mRNA | 7490 | NM_024426
C:0172 | X-ray repair complementing defective repair in Chinese hamster cells 2 (XRCC2), mRNA | 7516 | NM_005431
A:02526 | v-yes-1 Yamaguchi sarcoma viral oncogene homolog 1 (YES1), mRNA | 7525 | NM_005433
:5702 | viral integration site 5 (EVI5), mRNA | 7813 | NM_005665
B:552 | BTG family, member 2 (BTG2), mRNA | 7832 | NM_006763
A:03788 | interferon-related developmental regulator 2 (IFRD2), mRNA | 7866 | NM_006764
A:09614 | v-maf musculoaponeurotic fibrosarcoma oncogene homolog K (avian) (MAFK), mRNA | 775 | NM_002360
A:02920 | frizzled homolog 3 (Drosophila) (FZD3), mRNA | 7976 | NM_017412
A:03507 | FOS-like antigen 1 (FOSL1), mRNA | 8061 | NM_005438
A:00218 | cullin 5 (CUL5), mRNA | 8065 | NM_003478
A:08128 | CDK2-associated protein 1 (CDK2AP1), mRNA | 8099 | NM_004642
A:9843 | melanoma inhibitory activity (MIA), mRNA | 8190 | NM_006533
A:09310 | chromatin assembly factor 1, subunit B (p60) (CHAF1B), mRNA | 8208 | NM_005441
A:05798 | SMC1 structural maintenance of chromosomes 1-like 1 (yeast) (SMC1L1), mRNA | 8243 | NM_0O-306
C:0317 | axin 1 (AXIN1), transcript variant 1, mRNA | 8312 | NMO35-2
B:0065 | BRCA1 associated protein-1 (ubiquitin carboxy-terminal hydrolase) (BAP1), mRNA | 8314 | NM_004656
A:08801 | CDC7 cell division cycle 7 (S. cerevisiae) (CDC7), mRNA | 8_17 | NM_-5503
A:09331 | CDC45 cell division cycle 45-like (S. cerevisiae) (CDC45L), mRNA | 8318 | NM_003504
A:01727 | growth factor independent 1B (potential regulator of CDKN1A, translocated in CML) (GFI1B), mRNA | 8328 | NM_004188
A:10009 | MAD1 mitotic arrest deficient-like 1 (yeast) (MAD1L1), transcript variant 1, mRNA | 8379 | NM_003550
A:06561 | breast cancer anti-estrogen resistance 3 (BCAR3), mRNA | 8412 | NM_003567
064,61 | reversion-inducing-cysteine-rich protein with kazal motifs (RECK), mRNA | 8434 | NM_021111
A:06991 | RAD54-like (S. cerevisiae) (RAD54L), mRNA | 8438 | NM_003579
A:04140 | NCK adaptor protein 2 (NCK2), transcript variant 1, mRNA | 8440 | NM_003581
B:6523 | DEAH (Asp-Glu-Ala-His) box polypeptide 16 DHX16 | 8449 | NM_003587
A:09834 | cullin 4B (CUL4B), mRNA | 8450 | NM_003588
A:06931 | cullin 4A (CUL4A), transcript variant 1, mRNA | 8451 | NM_001008895
A:05012 | cullin 3 (CUL3), mRNA | 8452 | NM_003590
A:05211 | cullin 2 (CUL2), mRNA | 8453 | NM_003591
A:01673 | cullin 1 (CUL1), mRNA | 8454 | NM_003592
C:0388 | Kruppel-like factor 11 (KLF11), mRNA | 8462 | NM_003597
A:01318 | suppressor of Ty 3 homolog (S. cerevisiae) (SUPT3H), transcript variant 2, mRNA | 8464 | NM_181356
A:01318 | suppressor of Ty 3 homolog (S. cerevisiae) (SUPT3H), transcript variant 2, mRNA | 8465 | NM_181356
A:09841 | protein phosphatase 1D magnesium-dependent, delta isoform (PPM1D), mRNA | 8493 | NM_003620
B:3627 | interferon induced transmembrane protein 1 (9-27) (IFITM1), mRNA | 8519 | NM_003641
A:06665 | growth arrest-specific 7 (GAS7), transcript variant a, mRNA | 8522 | NM_003644
A:10603 | basic leucine zipper nuclear factor 1 (JEM-1) (BLZF1), mRNA | T54 | NM_003666
A:10266 | CDC14 cell division cycle 14 homolog A (S. cerevisiae) (CDC14A), transcript variant 2, mRNA | 8556 | NM_033312
A:09697 | cyclin-dependent kinase (CDC2-like) 10 (CDK10), transcript variant 1, mRNA | 8558 | NM_003674
A:10520 | protein kinase, interferon-inducible double stranded RNA dependent activator (PRKRA), mRNA | 8575 | NM_003690
A:00630 | phosphatidic acid phosphatase type 2A (PPAP2A), transcript variant 2, mRNA | 8611 | NM_176895
B:9227 | cell division cycle 2-like 5 (cholinesterase-related cell division controller) (CDC2L5), transcript variant 1, mRNA | 8621 | NM_003718
A:08282 | tumour protein p73-like TP73L | 8626 | NM_003722
B:8989 | aldo-keto reductase family 1, member C3 (3-alpha hydroxysteroid dehydrogenase, type II) (AKR1C3), mRNA | 8644 | NM_003739
B:1328 | insulin receptor substrate 2 (IRS2), mRNA | 8660 | NM_003749
B:4001 | CDC23 (cell division cycle 23, yeast, homolog) CDC23 | 8697 | NM_004661
A:00144 | tumour necrosis factor (ligand) superfamily, member 14 (TNFSF14), transcript variant 1, mRNA | 8740 | NM_003807
B:8481 | tumour necrosis factor (ligand) superfamily, member 13 (TNFSF13), transcript variant alpha, mRNA | 8741 | NM_003808
:09478 | tumour necrosis factor (ligand) superfamily, member 9 (TNFSF9), mRNA | 8744 | NM_003811
B:8202 | CD164 antigen, sialomucin (CD164), mRNA | 8763 | NM_006016
A:01775 | RIO kinase 3 (yeast) (RIOK3), transcript variant 2, mRNA | 8780 | NM_45906
A:01775 | RIO kinase 3 (yeast) (RIOK3), transcript variant 2, mRNA | 8781 | NM_145906
C:0356 | tumour necrosis factor receptor superfamily, member 11a, NFKB activator (TNFRSF11A), mRNA | 8792 | NM_005839
A:03645 | cellular repressor of E1A-stimulated genes 1 (CREG1), mRNA | 8804 | NM_003851
A:08261 | galanin receptor 2 (GALR2), mRNA | 8812 | NM_003857
A:03558 | cyclin-dependent kinase-like 1 (CDC2-related kinase) (CDKL1), mRNA | 8814 | NM_004196
B:0089 | fibroblast growth factor 18 (FGF18), transcript variant 2, mRNA | 8817 | NM_033649
B:5592 | sin3-associated polypeptide, 30kDa SAP30 | 8819 | NM_003864
B:4763 | IQ motif containing GTPase activating protein 1 (IQGAP1), mRNA | 8827 | NM_003870
C:0673 | neuropilin 1 NRP1 | 8829 | NM_001024628, NM_001024629, NM_003873
A:09407 | histone deacetylase 3 (HDAC3), mRNA | 8841 | NM_003883
A:07011 | alkB, alkylation repair homolog (E. coli) (ALKBH), mRNA | 8847 | NM_006020
A:06184 | p300/CBP-associated factor (PCAF), mRNA | 8850 | NM_003884
A:06285 | cyclin-dependent kinase 5, regulatory subunit 1 (p35) (CDK5R1), mRNA | 8851 | NM_003885
3696 | chromosome 10 open reading frame 7 (C10orf7), mRNA | 8872 | NM_006023
C:2264 | sphingosine kinase 1 (SPHK1), transcript variant 1, mRNA | 88- | NM_021972
A:06721 | CDC16 cell division cycle 16 homolog (S. cerevisiae) (CDC16), mRNA | 8881 | NM_003903
A:04142 | zinc finger protein 259 (ZNF259) | 8882 | NM_003904
A:10737 | MCM minichromosome maintenance deficient 3 (S. cerevisiae) associated protein (MCM3AP), mRNA | 8888 | NM_003906
A:03854 | cyclin A1 (CCNA1), mRNA | 8900 | NM_003914
B:0704 | B-cell CLL/lymphoma 10 (BCL10), mRNA | 8915 | NM_003921
A:03168 | topoisomerase (DNA) III beta (TOP3B), mRNA | 8940 | NM_003935
B:9727 | cyclin-dependent kinase 5, regulatory subunit 2 (p39) (CDK5R2), mRNA | 8941 | NM_003936
A:06189 | protein regulator of cytokinesis 1 (PRC1), transcript variant 1, mRNA | 9055 | NM_003981
A:01168 | DIRAS family, GTP-binding RAS-like 3 (DIRAS3), mRNA | 9077 | NM_004675
06043 | protein kinase, membrane associated tyrosine/threonine 1 (PKMYT1), transcript variant 1, mRNA | 9088 | NM_004203
B:4778 | ubiquitin specific peptidase 8 (USP8), mRNA | 9101 | NM_005154
B:8108 | LATS, large tumour suppressor, homolog 1 (Drosophila) (LATS1), mRNA | 9113 | NM_004690
A:09436 | chondroitin sulfate proteoglycan 6 (bamacan) (CSPG6), mRNA | 9126 | NM_005445
A:03606 | cyclin B2 (CCNB2), mRNA | 9133 | NM_004701
A:10498 | cyclin E2 (CCNE2), transcript variant 1, mRNA | 9134 | NM_057749
A:00971 | Rho guanine nucleotide exchange factor (GEF) 1 (ARHGEF1), transcript variant 2, mRNA | 9138 | NM_0-470
B:3843 | hepatocyte growth factor-regulated tyrosine kinase substrate (HGS), mRNA | 9146 | NM_004712
A:03143 | exonuclease 1 (EXO1), transcript variant 1, mRNA | 9156 | NM_006027
A:07881 | oncostatin M receptor (OSMR), mRNA | 9180 | NM_003999
A:00335 | ZW10, kinetochore associated, homolog (Drosophila) (ZW10), mRNA | 9183 | NM_004724
A:09747 | BUB3 budding uninhibited by benzimidazoles 3 homolog (yeast) (BUB3), transcript variant 1, mRNA | 9184 | NM_004725
B:0692 | leucine-rich, glioma inactivated 1 (LGI1), mRNA | 9211 | NM_005097
B:0692 | leucine-rich, glioma inactivated 1 (LGI1), mRNA | 9212 | NM_005097
A:03609 | nucleolar and coiled-body phosphoprotein 1 (NOLC1), mRNA | 9221 | NM_004741
A:064043 | discs, large homolog 5 (Drosophila) (DLG5), mRNA | 9231 | NM_004747
A:05954 | pituitary tumour-transforming 1 (PTTG1), mRNA | 9232 | NM_004219
B:0420 | transforming growth factor beta regulator 4 (TBRG4), transcript variant 1, mRNA | 9238 | NM_004749
A:02479 | endothelial differentiation, sphingolipid G-protein-coupled receptor, 5 (EDG5), mRNA | 9294 | NM_004230
A:06066 | Kruppel-like factor 4 (gut) (KLF4), mRNA | 9314 | NM_004235
A:05541 | glucagon-like peptide 2 receptor (GLP2R), mRNA | 9340 | NM_004246
A:-0891 | WD repeat domain 39 (WDR39), mRNA | 9391 | NM_004804
A:00519 | lymphocyte antigen 86 (LY86), mRNA | 9450 | NM_004271
A:01180 | Rho-associated, coiled-coil containing protein kinase 2 (ROCK2), mRNA | 9475 | NM_004850
A:01080 | kinesin family member 23 (KIF23), transcript variant 2, mRNA | 9493 | NM_004856
A:04266 | ADAM metallopeptidase with thrombospondin type 1 motif, 1 (ADAMTS1), mRNA | 9510 | NM_006988
b:9060 tumour protein p53 inducible protein 9537 NM 006034 11 (TP53111, mRNA A:04813 breast cancer anti-estrogen 9564 NM 014567 .......... resistance 1 (BCARI), mRNA A:09885 M-phase phosphoprotein 1 9585 NM_016195 _L-_- ----- (MPHOSPH), mRNA B:8184 mediator of DNA damage 9656 NM 014641 checkpoint 1 (MDC), mRNA j C:1135 extra spindle poles like 1 (S. 9700 NM 012291 cerevisiae) (ESPLI), mRNA C:0186 histone deacetylase 9 (HDAC9), 9734 NM_178423 transcript variant 4, mRNA A05391 kinetochore associated 1 (KNTCI), 9735 NM_014708 mRNA B:0082 histone deacetylase 4 (HDAC4), 9759 NM 006037 mRNA B:0891 metastasis suppressor 1 (MTSS1), 9788 NM 014751 mRNA 8:0062 Rho guanine nucleotide exchange 9826 NM 014784 factor (GEF) 11 (ARHGEF11), transcript vacant 1, mRNA A:03269 tousled-like kinase I (TLK1), mRNA 9874 1 NM 012290 B:9335 RAB GTPase activating protein 1- 9910 NM_014857 like (RABGAP1 L), transcript variant 1, mRNA A:08624 chromosome condensation-related 9918 NM 014865 SMC-associated protein I (CNAP1), mRNA ---- ---- B:8937 deleted in lung and esophageal 9940 NM_007338 cancer 1 (DLEC1), transcript variant DLEC1-L1, mRNA --------- _ _4 B:8656 major vault protein (MVP), transcript 9961 NM 017458 variant 1, mRNA~ A:02173 tumor necrosis factor (ligand) 9966 NM_005118 superfamily, member 15 (TNFSF15, mRNA 99 fibroblast growth factor binding 9982 NM005130 protein 1 (FGFBP1), m RNA .... . ...----- A:00752 REC8-like 1 (yeast) (REC8L1), 9985 NM_005132 mRNA A:01592 solute carrier family 12 9990 NM 005135 (potassium/chloride transporters), member 6 (SLC12A6), mRNA I A:04645 abl-interactor I (ABII), transcript 10006 NM_005470 I variant 1, mRNA A:10156 histone deacetylase 6 (FIDAC6), 10013 NM_006044 mRNA ! B:2818 histone deacetylase 5 HDAC5 10014 NM 001015053. 
NM 005474 57 WO 2009/045115 PCT/NZ2008/000260 -g:10510 chromatin assembly factor 1, 10036 NM 005483 subunit A (p150 (CHAFA), mRNA A:05648 SMC4 structural maintenance of 10051 NM 001002799 chromosomes 4-like I (yeast) (SMC4L1), transcript variant 3, mRNA B:0675 tetraspanin 5 (TSPAN5), mRNA 10098 _____J NM 005723 B:0685 tetraspanin 3 (TSPAN3),_transcript 10099 NM 005724 variant 1, mRNA A:82 tetraspanin 2 TSPAN2), mRNA 01000 NM 005725 A:02634 tetraspanin I (TSPAN1), mRNA 101o3 NM 005727 A:07852 RAD50 homoloa (S. cerevisiae) 10111 NM 005732 (RAD50), transcript variant 1, mRNA B:4820 pre-B-cell colony enhancing factor 1 10135 NM_005746 (PBEF1), transcript variant 1, rmRNA B:7911 transducer of ERBB2 (TOBI) 10140 NM 005749 ___ImRNA___ _____ :0969 odz, odd Oz/ten-m homolog 10178 NM 014253 1(Drosophila) (ODZ1),mRNA A:06242 RNA binding motif protein 7 10179 NM 016090 _ (_ RBM7), mRNA A:03840 RNA binding motif protein 5 '10181 - 005778 (RBM5), mRNA M-phase phosphoprotein 9 10198 NM_022782 I MPHOSPH9 A:09658 M-phase phosphoprotein 6 10200 NM 005792 (MPHOSPH6), mRNA ___ K:04009 ret finger protein 2 (RFP2), 10206 NM_005798 transcript variant 1, mRNA - -- -- A:03270 proteogiycan 4 (PRG4, mRNA 10216 - NM 005807 A:01614 A kinase (PRKA) anchor protein 8 10270 NM_005858 (AKAP8), mRNA 8:5575 stromal antigen 1 (STAG1), mRNA 10274 NM 005862 8:8332 aortic preferentially expressed gene 10290 XM 001131579 1 APEGI XM 001128413 A:04828 DnaJ (Hsp40) homolog, subfamily 10294 N 005880 A, member 2 (DNAJA2), mRNA 12N_08 B:0667 katanin p80 (WD repeat containing) 10300 NM 005886 subunit B 1 (KATNBI), mRNA A:04635 deleted in lymphocytic leukaemia, 1 10301 NR_002605 (DLEUI) on chromosome 13 B:2626 uracil-DNA glycosylase 2 (UNG2), 10309 NM 021147 transcript variant 1, mRNA A:09675 T-cell, immune regulator 1, ATPase, 10312 NM_006019 H+ transporting, lysosomal VO protein a isoform 3 (TCIRG1), __ _ racip t vriant 1, mRNA __ A 09047 nucleophosmin/nucleoplasmin 3 10361 NM_006993 ___ M3),mRNA _ _I _ _ _____ 
A:04517 synaptonemal complex protein 2 10388 NM 014258 ____ (SYCP2,mTRNA I A:06405 anaphase promoting complex 10393 NM_014885 subunit 10 (ANAPCIO mRNA A:04338 phosphatidylethanolamine N- 10400 NM_007169 methyltransferase (PEMT), nuclear gene ending mitochondrial _ t] 58 WO 2009/045115 PCT/NZ2008/000260 protein, transcript variant 2, mRNA 7 -_-------- A:10053 kinetochore associated 2 (KNTC2), 10403 NM_006101 mRNA A:08539 Rap guanine nucleotide exchange 10411 NM 06105 factor (GEF) 3 (RAPGEF3), mRNA I A:01717 SKBI homolog (S. pombe) (SKBI), 10419 TNM 006109 mRNA B:6182 RNA binding motif protein 14 10432 NM_006328 . .(RBM14), mRNA BA641 glycoprotein (transmembrane) nmb 10457 NM_001005340 GPNMB NM 002510 A:10829 MAD2 mitotic arrest deficient-like 2 10459 NM_006341 (yeast) (MAD2L2), mRNA transcriptional adaptor 3 (NGGI 10474 NM_006354 homolog, yeast)--like (TADA3L) transcript variant 1, mRNA A:00010 vesicle transport through interaction 10490 NM 006370 with t-SNAREs homolog 1B (yeast) (VTI1B). mRNA B:1984 cartilage associated protein 10491 NM_006371 (CRTAP), mRNA A:07616 Sjogren's syndrome/scleroderma 10534 NM 006396 autoantigen I (SSSCA1), mRNA A:04760 ribonuclease H2, large subunit 10535 NM 006397 RNASEH2A), mRNA A:10701 dynactin 2 (p50) (DCTN2), mRNA 10540 NM_ 006400 A:04950 chaperonin containing TCP1 10574 NM_006429 subunit 7 (eta) (CCT7), transcript variant 1, mRNA _____ A:04081 chaperonin containing TCPI 10575 NM 006430 - ------- subunit 4 (delta) (CCT4), mRNA

----------

A:09500 chaperonin containing TCPI, 10576 NM_006431 --------- subunit 2 (beta) (CCT2), mRNA A-09726 chromosome 6 open reading frame 10591 NM_006443 108 (C6orfl 08), transcript variant 1, mRNA A:10196 SMC2 structural maintenance of 10592 NM 006444 chromosomes 2-iike I (yeast) (SMC2L1 mRNA B:1048 ubiquitin specific peptidase 16 10600 NM 006447 (USP16), transcript variant 1, mRNA A:08296 MAX dimerization protein 4 (MXD4), 10608 NM_006454 __mRNA ______ j____ ____ A-5163 synaptonemal complex protein 10609 NM 006455 SC6SC65C65), mRNA __ A:04356 STAM binding protein (STAMBP), 10617 NM 006463 transcript variant 1, mRNA B:3717 growth arrest-specific 2 like 1 10634 NM 006478 (GAS2L1), transcript variant 1, mRNA A:01918 S-phase response (cyclin-related) 10638 NM 006542 (SPHAR), mRNA IA:04374 KH domain containing, RNA 10657 NM_006559 binding, signal transduction associated I (KHDRBS)mRNA : ---- - -- - A:08738 CCCTC-binding factor (zinc finger 10664 NM006565 _protein) (CTCF), mRNA - - 59 WO 2009/045115 PCT/NZ2008/000260 A:08733 cell growth regulator with ring finger 10668 NM 006568 ____ domain I (CGRRFI), mRNA______ _____ A:07876 cell growth regulator with EF-hand 10669 NM_006569 _______jdomain I (OGIREFI). mnRNAI 6 3 NM057 At05572 tumor ecoss igad) 10673 NM_006573 superfamily, member 13b (TNFSF13B), rRNA B:4752 polymerase (DNA-directed) delta 3, 10714 NM 006591 ------- accessory subunit (POLD3), mRNA B:3500d polymerase (DNA directed), theta 10721 NqM_1994-20 ___ A03035 nuclear dt A sibution gene C homolog 10726 4 NM 006600 (A. nidulans) NUDC, mRNA A:00069 transcription factor-like 5 (basic 10732 NM_006602 helix-loop-helix) (TCFL5), mRNA :7543 polo-like kinase 4 (Drosophila) 10733 NM_014264 _______(PLK4),_mRNA _______ ____ ____ B:2404 stromal antigen 3 (STAG3). 
mRNA 10734 --- ____NM_012447 A:10760 Istmi- antigen 2 (STAG2), mRNA 10735 NM 006603 B:5933 1 transducer of ERBB2, 2 (TOB2), 10766 NM_016272 I mRNA A:02195 polo-like kinase 2 (Drosophila) 10769 NM_006622 -(PLK2, mRNA - --- A:04982 zinc finger, MYND domain 10771 NM 006624 containing 11 (ZMYND11), _transcript variant 1, mRNA B:2320 septin 9 (SEPT9), rnRNA 10801 NM 006640 A:07660 thioredoxin-like 4A (TXNL4A), NM_006701 mRNA ~ B:9218 SGT1, suppressor of G2 allele of 10910 NM_006704 SKPI (S. cerevisiae) (SUGTI), _____ mRNA ___ _ ___ A:08320 DBF4 homolog (S. cerevisiae) 10926 NM 06716 _ __(DBF4), mRNA A:08852 spindlin (SPIN), mRNA 10927 NM 006717 A:00006 BTG family, member 3 (BTG3), 10950 NM 006806 mRNA A:01860 cytoskeleton-associated protein 4 10971 NM 006825 _ (CKAP4), mRNA A:01595 microtubule-associated protein, 10982 NM_014268 RP/EB family, member 2 (MAPRE2), transcript variant 5, mRNA A:05220 cyclin I (CCNI), mRNA 10983 NM 006835 B:4359 kinesin family member 2C (KIF2C), 11004 NM_006845 rnRNA A09969 tousled-like kinase 2 (TLK2), mRNA 11011 NM 006852 -__---- A:04957 polymerase (DNA directed) sigma 11044 NM_006999 (POLS), mRNA A:01776 ubiquitin-conjugating enzyme E2C 11065 NM_007019 (UBE2C), transcript variant 1, mRNA A:09200 cytochrome b-561 domain 11068 NM_007022 containing 2 (CYB561D2), rRNA topoisomerase (DNA) 11 binding 11073 NM 007027 protein 1 (TOPBPI) mRNA 60 WO 2009/045115 PCT/NZ2008/000260 B:1407 ADAM metallopeptidase with 11095 NM_007037 thrombospondin type I motif, 8 8-- - (ADAMTS8), rnRNA A:09918 katanin p60 (ATPase-containing) 11104 NM_007044 subunit A(KATNA), mRNA ~ A:09825 PR domain containing 4 (PRDM411), NM_012406 mRNA B:7528 FGFRI oncogene partner 11116 NM_ 07045 (FGFRI OP), transcript variant 1, rmRNA A:04279 CD160 antigen (CD160), mRNA 11126 NM 007053 ----- C:4275 TBC1 domain family, member 8 11138 NM 007063 (with GRAM domain) (TBCID8), mRNA A:03486 CDC37 cell division cycle 37 11140 NM O-7065 _____ _____ _______ ___N007065 homolog (S. 
cerevisiae) (CDC37), mRNA A:06143 MY57 histone acetyltransferase 2 11143 NM 007067 (MYST2),mrRNA ____ ___ ___ ___ A:06472 DMC1 dosage suppressor of mck1 11144 NM_007068 homolog, meiosis-specific homologous recombination (yeast) -_ ---- ___ (DMC1), mRNA A:07181 coronin, actin binding protein, 1A 11151 NM_007074 - ------ (CORO IA), m RNA -- -5-- 0 A:04421 Huntingtin interacting proteinE 11153 NM_007076 (HYPE), mRNA A:03200 PC4 and SFRS1 interacting protein 11168 NM_3 1 (PSIP1), transcript variant 2, rmRNA C:0370 centrosomal protein 2 (CEP2), 11190 NM_007186 Transcript vriant 1, mRNA _ __9_ -0370 centrosomal protein 2 (CEP2), 11191 NM 007186 transcript variant 1, mRNA ----- A:02177 CHK2 checkpoint homolog (S. 11200 NM_07194 A:02177 Jpombe) (CHEK2), transcript variant 1 A09335 polymerase (DNA directed) gamma 11232 NM_007215 2, accessory subunit (POLG2), m RNA - ---- A:0808 dynactin 3 (p22) (DCtN3), 11258 NM_024348 transcript variant 2, mRNA B:7247 three prime repair exonuclease 1 11277 NM 033627 (TREXI), transcript variant 2, mRNA A:03276 polynucleotide kinase 3- 11284 NM_b7254 phosphatase (PNKP), mRNA A:01322 Parkinson disease (autosomal 1155 INM 007262 recessive, early onset) 7 (PARK7), mRNA B:5525 PDGFA associated protein 1 11333 NM_014891 . ..... (PDAPI), rnRNA A:05117 tumour suppressor candidate 2 11334 NM-007275 (TUSC2), mRNA A:08584 activating transcription factor 5 22809 NM_012068 -..... (ATF5), mRNA 9A:10029 KIAA097IdXXA971) mRNA 22868 NM 014929 61 WO 2009/045115 PCT/NZ2008/000260 C:4180 DENN/MADD domain containing 22898 NM_014957 (DENND3),_mRNA _____ W _0232- A:076 microtubule-associated protein, 22919 NM_012325 RP/EB family, member 1 (MAPRE1), mRNA A:02013 sirtuin (silent mating type 22933 NM_030593 information regulation 2 homolog) 2 ~23 3 3 - n e a 2 ---- g(S. 
cerevisiae) (SIRT2), transcript ___ variant 2, inRNA_________ A:7965 TPX2, microtubule-associated, 22974 NM_012112 homolog (Xenopus laevis) (TPX2), mRNA B:1032 apoptotic chroratin condensation 22985 NM 14977 inducer 1 ACIN1 A:10375 androgen-induced proliferation 2304 NM 015O32 inhibitor (APRIN), transcript variant 1, mRNA A04696 nuclear receptor coactivator 6 23054 NM _14071 (NCOA6), mRNA A:09165 KIAA0676 protein (KIAA0676), 23061 NM_198868 transcript yariant 1, mRNA B:4976 KIAA0261 (KIAA0261), mRNA 23063 NM 015045 :8950 KIAA0241 protein (KIAAO241), 23080 N 015060 mRNA N 056 Cp p53-associatedrkike 23113 NM_015089 cytopasmic protein (PARC), mRNA B:9549 SMC5 structural maintenance of 23137 NM_015110 chromosomes 5-like I (yeast) (SMC5LI), mRNA BA428 septin 6 (SEPT6), transcript variant 23157 NM 145799 1, mRNA B:6 7A 8 protein (KIA 0882), 23158 ____$ NM_015130 mRNA B-1443 septin 8 (SEPT8), mRN 23176 XM 034872 B:8136 ankyrin repeat domain 15 23189 NM 158 (ANKRD15), transcript variant I mRNA B:4969 KIAAI 086 (KIAA1086), mRNA 23217 XM 001130130 -336XM 001130674 A:10369 phospholipase C, beta 1 23236 NM_182734 (phosphoinositide-specific) (PLCBI), transcript variant 2, mRNA B:0524 RAB6 interacting protein 1 23258 - .NM_015213 fB:0230 in cable cel co stimult igand t 23308 I NM 015259 ICOSLG _ B:0327 SAM and SH3 domain containing 1 23328 NM_015278 S(SASHI, mRNA B3:5714 KIAA0650 protein (KIAn0650) 23347 XM 113962 mRNA XM 938891 8:8897 formin binding protein 4 (FNiP4l) 23360 NM_015308 mRNA NM0-_ _08 B:8228 barren homolog 1 (Drosophila) 23397 NM _015341 .... . (BRRN1), m RNA ------ --- - - 8:9601 ATPase type 13A2 (ATP1 3A2), 23401 NM 022089 mRNA 62 WO 2009/045115 PCT/NZ2008/000260 B:7418 TAR DNA binding protein 23435 NM 007375 j(TARDBP , mRNA B:7878 microtubule-actin crosslinking factor 23499 NM0129 1 (MACF1), transcript variant 1, mRNA

K:

09 1 05 RNA binding motif protein 9 23543 NM 014309 i:1165-- -JRBM9), transcript variant 2, mRNA B: 1165 origin recognition complex, subunit 23594 14321 6 homolog-like (yeast) (ORC6L), ___ mRNA ___ __ B:3180 origin recognition complex, subunit 23595 NM_012381 3-like (yeast) (ORC3L), transcript variant 2, mRNA A:00473 SPO1 I meiotic protein covalently 23626 NM 012444 bound to DSB-like (S. cerevisiae) (SPO1 1), transcript variant 1, mRNA A:02179 RAB GTPase activ2ting protein 1 23637 NM_012197 ---- _ (RABGAPI),_mRNA A:06494 leucine zipper, down-regulated in 23641 NM_012317 cancer 1 (LDOC1), mRNA B:2198 protein phosphatase 1, regulatory 23645 NM 014330 (inhibitor) subunit 15A (PPPIR15A), mRNA C:3173 polymerase (DNA directed), alpha 223649 NM 002689 ... _ ( 70kD subunit) (POLA2, mRNA A:03098 SH3-domain binding protein 4 23677 NM 014521 .... . (SH3BP4),mRNA

.

_____ - - -- C:1904 N-acetyltransferase 6 (NAT6), 24142 NM 012191 mRNA C:2118 unc-84 homolog B (C. elegans) 25777 NM_015374 (UNC84B), mRNA A:05344 RAD54 homolog B (S. cerevisiae) 25788 NM_012415 (RAD54B), transcript variant 1N mRNA A:06762 CDKN1A interacting zinc finger 25792 NM 012127 C ---- protein 1 (CZ1), nRNA C:4297 Nipped-B homolog (Drosophila) 25636 -N -015384 (NIPBL), transcript variant B, mRNA -- - ---- -- A:09401 preimplantation protein 3 (PREI3), 25843 NM_015387 B:3103 breast cancer metastasis 25855 NM015399 suppressor 1 (BRMSI), transcript variant 1, mRNA A:01151 protein kinase D2 (PRKD2), mRNA 25869 NM 016457 A:07688 EGF-like-domain, multiple 6 I 25975 NM 015507 (EGFL6), mRNA B:6248 ankyrin repeat domain 17 26057 NM 032217 (ANKRDI7), transcript variant 1. mRNA A:02605 adaptor protein containing pH 26060 NM 012096 domain, PTB domain and leucine - ------ jzipper motif 1 APPL), mRNA - -- -_----- A:02500 ets homologous factor (EHF), 26298 NM012153 L__-------- __-- mRNA_ _ _ _ A:09724 mutL homolog 3 (E. 
coli) (MLH3), 27030 NM 014381 mRNA __~ 63 WO 2009/045115 PCT/NZ2008/000260 A:06200 lysosomal-associated membrane 27074 NM _014398 j-- protein 3 (LAMP3), mRNA A:00686 tetraspanin 13 (TSPANI3), mRNA 1 27075 NM~014399 A:02984 calcyclin binding protein (CACY 2P) 27101 NM_0 14412 ____ ranscript variant 1, mRNA _____ A:0045 eukaryotic translation initiation 27104 NM 014413 factor 2-alpha kinase 1 (EIF2AK1i), mRNA C:8169 SMCI structural maintenance of 27127 NM 148674 chromosomes 1-like 2 (yeast) (SMC1L2), mRNA A:00927 sestrin 1 (SESN1), mRNA 27244 NM 0 14454 A:01831 RNA binding motif, single stranded 27303 NM 014483 interacting protein (RBMS3), transcript variant 2, mRNA A:06053 zinc finger protein 330 (ZNF330), 27309 NM 014487 mRNA A:03501 down-regulated in metastasis 27340 NM 014503 ______(DRlM),mRNAK B:3842 polymerase (DNA directed), lambda 27343 NM_013274 0 (POLL), mRNA B5 polymerase (DNA directed), mu 27434 NM 013284 81 (POLM), mRNA :4351 echinoderm microtubule associated 27436 1 NM 019063 protein like 4 (EML4), mRNA B:1612 cat eye syndrome chromosome 27443 AF307448 ------ _ * ion, candidate 4 CECR4 _____ _____ A:08058 protein phosphatase 2 (formerly 28227 NM_013239 2A), regulatory subunit B", beta (PPP2R3B), transcript variant 1, mRNA A:09647 response gene to compilement 32 28984 - NM_014059 (RGC32). mRNA A:09821 malignant T cell amplified sequence 28985 NM 014060 ------- 1 (MCTSI), mRNA B:6485 HSPC135 protein (HSPC135), 29083 NM 014170 transcript variant 1, mRNA 9945 PYD and CARD domain containing 29108 3258 (PYCARD), transcript variant 1, mRNA C:1944 lectin, galactoside-binding, soluble, 29124 NM 013268 ----- -13 (gaiectin 13) (LGALS13), mRNA A:02160 CD274 antigen (CD274), mRNA 29126 NM 014143 A:08075 replication initiator 1 (REPINI), 29803 NM_013400 transcript variant 1, mRNA B:1479 . 
anaphase promoting complex 29882 NM_013366 subunit 2 (ANAPC2), mRNA A:08657 protein predicted by clone 23882 29903 NM 013301 A -- (HSU79303), mRNA23 A:10453 replication protein A4, 34kDa 2993 NM_013347 (RPA4), mRNA A:02862 anaphase promoting complex 29945 NM_013367 subunit 4 ANAPC4), mRNA A:10100 SERTA domain containing 1 29950 NM 013376 (SERTAD1), mRNA _ ---- A:05316 striatin, calmoduiin bindIng protein 3 29966 NM_014574 --- (STRN3), mRNA A:06440 G0/G1 switch 2 ( mRNA 50486 _ 64 WO 2009/045115 PCT/NZ2008/000260 A:08113 deleted in esophageal cancer 1 50514 NM 017418 --- (DECI),mRNA 8:7919 hepatoma-derived growth factor, 50810 NM016073 related protein 3 (HDGFRP3), mRNA A:07482 par-6 partitioning defective 6 50855 N -- 0 16948 homolog alpha (C. elegans) (PARD6A), transcript variant 1, mRNA A:03435 geminin, DNA replication inhibitor 51053 NM_015895 (GMNN), mRNA A:00171 ribosomal protein S27-like 51065 NM015920 -:15 (RPS27L, mRNA B:1459 EGF-like-domain, multiple 7 51162 NM_016215 (EGFL7), transcript variant 1, mRNA A-09081 tubulin, epsilon 1 (TUBE1), mRNA 51175 NM 016262 -_--- A:08522 hect domain and RLD 5 (HERC5), 51191 NM 016323 mRNA M_0163~ 3 A05174 phospholipase C epsilon 1 51196 NM_016341 A.353 . 
(PLCE1 , mRNA NM016341 B:3533 dual specificity phosphatase 13 51207 NM 001007271, DUSP13 NM 001007272, NM 001007273, NM_001007274, NM_001007275, NM 016364 A:06537 ABI gene family, member 3 (AB13), 51225 NM 016428 ____ RNA____ A:03107 transcription factor Dp family 51270 NM_016521 _ _ r member 3 (TFDP3), mRNA A:09430 SCAN domain containing 1 51282 NM_016558 (SCANDI), transcript variant 1, mRNA B:9657 CD320 antigen (CD320), mRNA 51293 NM 016579 A:07215 fizzy/cell division cycle 20 related 1 51343 NM 016263 ----- ----- (-Drosophila) (FZIR1),_mRNA A:06101 Wilms tumour upstream neighbor 1 51352 NM 01555 (WITI , mRNA A:10614 E3 ubiquitin protein ligase, HECT 51366 NM _015902 domain containing, I (EDD1), mRNA 3:9794 anaphase promoting complex 51433 NM )16237 subunit 5 ANAPC5), mRNA 14 81 anaphase promoting complex t51434 NM 016238 subunit 7 (ANAPC7),mRNA! A:08459 G-2 and S--phase expres sed 1 51512 N----- M16426 S- GTSE1), mRNA . APC1 I anaphase promoting 51529 0164760 complex subunit 11 homolog (yeast) (ANAPCI 1), transcript ------- _--variant 2, mRNA B:2670 histone deacetylase 7A HDAC7A 56NM 015401 7 ubiquitin-conjugating enzyme E2D 4 51619 NM 015983 (putative) (UBE2D4), mRNA CDK5 regulatory subunit associated 51654 NM_016082 protein I (CDK5RAP1), transcript variant 2, mRNA 65 WO 2009/045115 PCT/NZ2008/000260 B:1035 DNA replication complex GINS 51659 NM 016095 protein PSF2 (Pfs2), mRNA - - ------- B:9464 sterile alpha motif and leucine 51776 i-M 133646 zipper containing kinase AZK (ZAK), transcript variant 2, mRNA B:787 1 ZWI0 interactor antisense 53588 X98261 ZWINTAS B:3431 RNA binding motif protein 11 54033 NM_144770 (RBM11), mRNA A:02209 polymerase (DNA directed), epsilon 54107 NM 017443 4 (p17 subunit (POLE3) mRNA A:04070 DKFZp434AO131 protein 54441 NM_018991 DKFZP434AO131 5280 anillin, actin binding protein (scraps 54443 NM_018685 homolog, Drosophila) (ANLN), mRNA A:0 spindlin family, member 2 (SPIN2). 
54466 NM_019003 ___mRNA __ A:03960 cyclin J (CCNJ), mRNA 54619 lN -- 019084 B:3841 M-phase phosphoprotein, mpp8 54737 NM 017520 (HSMPP8), mRNA B:8673 ropporin, rhophilin associated 54763 NM 017578 protein 1 (ROPNI), mRNA 01757 _ A:02474 B-cell translocation gene 4 (BTG4), 54766 NM_017589 mRNA B:2084 G patch domain containing 4 54865 NM_182679 (GPATC4), transcript variant 2, mRNA :06639 hypothetical protein FJ20422 54929 NM 07814 (FLJ20422), mRNA ~ C:2265 thioredoxin-like 4B (TXNL4B)154957 NM 017853 rnRNA B:7809 PIN2-interacting protein 1 (PINX1), 54984 NM_017884 I mRNA 204 polybromo 1 (PB1), transcript 55193 NM_018313 variant 2, mRNA , __--- A:05321 hypothetical protein FUi0781 55228 NM_018215 (-FLJ10781), mRNA B:2270 MOBi Mps One Binder kinase 55233 NM 018221 activator-like 1B (yeast) MOBK1B A:08002 signal-regulatory protein beta 2 55423 NM.018556 (SIRPB2), transcript variant 1, mRNA A:03524 tripartite motif-containing 36 55522 NM_018700 (TRIM36), transcript variant 1, I mRNA IA:0947 chrNom_ __osome_ openrai__ngframe 7M 29 C2orf29), mRNA hypothetical protein H41 (H41), 55573 NM _017548 :65 14 mRNA 3N 0 7 4 B:2133 CDC37 cell division cycle 37 55664 NM 017913 homolog (S. cerevisiae)-like 1 -841 (CDC37L1L mRNA 413 Nedd4 binding protein 2 (N4BP2) 55728 NM_018177 mRNA

---

A:02898 checkpoint with forkhead and ring 55743 NM 018223 Singer domains (CHFR), mRNA A:07468 septin 11 (SEPT11), mRNA 1.55752 NM01243 66 WO 2009/045115 PCT/NZ2008/000260 B3:2252 chondroitin betal,4 N- 55790 NM_018371 acetylgalactosaminyltransferase ---- _ (ChGn), mRNA C:0033 B double prime 1, subunit of RNA 55814 NM_018429 polymerase Ill transcription initiation factor 1IIB BDP1 A:03912 PDZ binding kinase (PBK), mRNA 55872 NM 018492 A:10308 unc-45 homolog A (C. eiegans) 55898 NM_017979 (UNC45A), transcript variant 1, mRNA A:02027 bridging integrator 3 (BIN3), mRNA 55909 NM_018688 C:0655 erbb2 interacting protein ERBB2iP 55914 NM_001OO6600 ------ NM 018695 8:1503 -septin 3 (SEPT3), transcript variant 55964 NM_145734 C, mnRNA B:8446 gastrokine 1 (GKN1), mRNA 56287 NM 019617 A:00073 par-3 partitioning defective 3 56288 NM_019619 homolog (C. elegans) (PARD3), rmRNA A:03990 CTP synthase 11 (CTPS2), transcript 56475 NM 19857 variant 1, mRNA

___

B:8449 BRCA2 and CDKNiA interacting 56647 NM_078468 protein (BCCIP), transcript variant 2:123 BmRNA ___ ___ ___ :~1203 interferon, kappa ( -FNK), mnRNA 56832 ---- NM 020124 B:1205 SLAM family member 8 (SLAMF) 7 56833 NM_020125 mRNA A:00149 sphingosine kinase 2 (SPHK2), 56848 NM_020126 mRNA A:0420 Werner helicase interacting protein 6- __897 -- -- NM_20135 I (WRNIP1), transcript variant 1, mRNA A:09095 latexin (LXN),rRNA 56925 NM 020169 A:02450 dual specificity phosphatase 22 56940 NM 020185 (DUSP22), mRNA ____ __ ___ C:0975 DC13 protein (DC13), mRNA 56942 NM 020188 A:04008 5',3-nucleotidase, mitochondrial 56953 NM 020201 (NT5M), nuclear gene encoding _______mitochondrial protein, mRNA____ 51586 knesin family member KIF15), 56992 NM 020242 rmRNA S:0396 catenin, beta interacting protein I 56998 - NM_ 20248 (CTNNBIP1), transcript variant 1, rnRNA B:508 cyclin L1 (CCNLI ) mRNA 57018 NM 020307 A:06501 cholineraic receptor, nicotinic, alpha 57053 NM_020402 polypeptide 10 (CHRNA10), mnRNA - - --

---

B:7311 poly(rC) binding protein 4 (PCBP4), 57060 NM_020418 transcript variant 1, mRNA * A:08184 chromosome 1 open reading frame 57095 NM_020362 _ J 128 (C1orf28, mRNA .3:3446 S100 calcium binding protein A14 57402 NM_020672 (S100A14),mrRNA __ __ ___ C:5669 odz, odd Oz/ten-m homolog 2 57451 XM_047995 (Drosophila) (ODZ2), mRNA XM_931456 XM_942208, XM_945786, ....... - ----- XM_94 5788 67 WO 2009/045115 PCT/NZ2008/000260 HC4 ~~57 7 ---------- __ _ _ __ __ T:8403 membrane-associated ring finger -57574 NM_020814 (C3HC4)4 (MARCH4), mRNA B:1442 polymerase (DNA-directed), delta 4 57804d NM 021173 (POLD4), mRNA _ B:1448 prokineticin 2 (PROK2), mRNA 60675 NM 021935 B:4091 CTF1 8, chromosome transmission 63922 NM 02209 fidelity factor 18 homolog (S. cerevisiae) (CHTF18)r mRNA C:0644 2 (TY2(TSPYL2), mRNA 64061 --- NM 022117 B809 chromsome 10 open reading 64115 NM_022153 frame 54 (C10gOorf4) RNA A: 10488 chiromOsom e condenstii-on protein 64151 --- N M_--0 2 2346-- - __8 G (HCAP-G), m6RNA 022354 A:10186 spermatogenesis associated 1 64173 NM022354 A:29 _(SPATA1), mRNA A:02978 DNA cross-link repair IC (PSO2 64421 NM_022487 homolog, S. cerevisiae) (DCLRE1C), transcript variant b, mRNA A:10112 anaphase promoting complex 64682 NM 022662 I subunit 1 (ANAPC1), mRNA6 A:10470 FLJ20859 gene (FLJ20859), 64745 NM 010299 _transcript variant 1, mRNA B:3988 interferon stimulated exonuclease 64782 NM 022767 gene 20kDa-like I (ISG20LI), mRNA A:06358 DNA cross-link repair 18 (PS02 64858 NM homolog, S. cerevisiae) (DCLRE1B), mRNA A: 10073 centromere protein H (CENNH) 64946 NM_022909 mRNA A:05903 chromosome readingNM023933 frame 24 (C16orf24), mRNA A:07975 spermatogenesis associated 5ke --- 79029NM_024063 1 (SPATA5L1), mRNA A:01368 hypothetical protein MGC5297 79072 -NM 024091 _(MGC5297), mRNA C:1382 basic helix-loop-helix domain 79365

NM_

3 07 62 containing, class B, 3 (BHLHB3), mRNA A:00699 NADPH oxidase, EF-hand calcium 79400 NM 024505 binding domain 5 (NOX5), rnRNA A-05363 SMC6 structural maintenance of 79677 NM_024624 chromosomes 6-like 1 (yeast) (SMC6LI), mRNA A:09775 V-set domain containing T cell 79679 NM_024626 activation inhibitor 1 (VTCN1), m RNA p o enF J 1 2 -- 79--6--8--0---

-

- - 8:6021 hypothetical protein FLJ21125 79680 NM_024627 --- (7FLJ21I25), mRNA A:06447 Sin3A associated protein p30-like 79685 NM_024632 ______(SAP3OL, mrnNA __ __ sp08767 ressor of variegation 3-9 79723 NM_024670 homolog 2 (Drosophila) (SUV39H2)I, mRNA [A:0116 chromosome 15 open reading - NM 024713 A:OT1156 '- (SV3H 7961

-----------

___ _NM0271

----

frame 29 (C15orf29), mRNA 68 WO 2009/045115 PCT/NZ2008/000260 A:03654 hypothetical protein FLJ13273 79807 NM 001031720 (FLJ 13273), transcript variant 1. mRNA A:10726 hypothetical proteiFLJ 13265 79935 I NM 024877 (FLJ 13265), mRNA B:2392 Dbf4-related factor 1 (DRF1) 80174 NM 025104 transcript variant 2, mRNA B:2358 SMP3 mannosyltransferase 80235 NM 025163 - SM---- m NA____* A:02900 CDK5 regulatory subunit associated 80279 NM 025197 protein 3 (CDK5RAP3), transcript variant 2, mRNA NM026 leucine rich repeat containing 27 80313 N -30626 ---- (LRRC27), mRNA B:9631 ADAM metallopeptidase domain 33 80332 NM 025220 (ADAM33), transcript variant 1, mRNA B:6501 CD276 antigen (CD276) transcript 80381 NM 025240 variant 2, mRNA A:05386 hypothetical protein MGC10334 80772 NM001029885 (MGC10334), mRNA A:08918 collagen, type XVIII, alpha 1 80781 NM_030582 (COL18AI), transcript variant 1, rn" RNA C:0358 EGF-like-domai ultie8 80864 NM 030652 (EGFL8), mRNA B:1020 C/EBP-induced protein 81558 NM 030802 (LOC81558), mRNA B :3550 DNA replication factor (CDT1), 81620 NM_030928 mRNA B:5661 cycling L2 (CCNL2), rnRNA 81669 NM 030937 B:1735 exonuclease NEF-sp (LOC81691) 81691 NM 030941 mRNA ~ B:2768 ring finger protein 146 (RNFI46), 81847 M_030963 mRNA B:2350 interferon stimulated exonuclease 81875 NM 030980 gene 20kDa-like 2 (ISG2OL2), mRNA B:3823 Cdk5 and Abl enzyme substrate 2 81928 NM_ 031215 S(CABLS2), mRNA B:8839 leucine rich repeat containing 48 83450 NM 031294 (LRRC48, mRNA B:9709 katanin p60 subunit A-like 2 83473 NM 031303 _ (KATNAL2), m RNA ~............._. B:8709 sestrin 2 (SESN2), mRNA 83667 _ NM 031459 B:8721 CD99 antigenlike 2 (CD99L2), 83692 NM_031462 0:0--- transcript variant 1, mRNA C:0565 regenerating islet-derived family 83998 NM_032044 member 4 (REG4), mRNA 8:3599 katanin p60 subunit A-like 1 84056 NM_032116 (KATNALI), transcript variant 1 rnRNA B:3492 GAJ protein (GAJ), mRNA 84057 NM 032117 -_- A:00224 IQ motif containing G (IQCG). 
84223 NM 032263 mRNA C:1051 hypothetical protein MGC10911 84262 NM_032302 (MGC1O911), mRNA B:1756 prokineticin I (PROKI), mRNA 84432 NM 032414 69 WO 2009/045115 PCT/NZ2008/000260 B:3029 MCM8 minichromosome 84515 NM_03Z4--5 maintenance deficient 8 (S. cerevisiae) (MCM8), transcript variant 1, mRNA C :06555 RfN Abinding motif protein 13 8 NM _ -32509 ___ vriat , mNA8455,2 NM_032509 (RBM13) mRNA C:1586 par-6 partitioning defective 6 84612 NM 032521 homolog beta (C. elegans) .... . .(PARD6B), mRNA I C:1872 resistin like beta (RETNLB), mRNA 84666 NM 032579 B:9569 protein phosphatase 1, regulatory 84687 NM_-359 subunit 98, spinophilin (PPPIR9B), mRNA B:3610 hepatoma-derived growth factor- - 84717 NM_032631 related protein 2 (HDGF2), transcript variant 2mRNA B:4127 lamin B2 (LMNB2), mRNA 84823 ------ NM 032737 B:2733 apoptosis-inducing factor (AIF)-like - 84883 NM_032797 mitochondrion-associated inducer of death (AMID, mRNA ...... :4273 RAS-like, estrogen-regulated, 85004 NM 032918 growth inhibitor (RERG), mRNA B:9560 cyclin B3 (CCNB3), transcript 85417 NM 033670 variant 1, mRNA C:0075 leucine rich repeat and coiled-coil 85444 NM-633402 domain containing I (LRRCC1), mRNA B:8110 tripartite motif-containing 4 (TRIM4). 89765 NM 033017 transcript variant alpha, mRNA B:6017 hypothetical gene CGO18, CGO18 90634 NM 052818 C:0238 NIMA (never in mitosis gene a)- 91754 NM_033116 related kinase 9 (NEK9), mRNA B:3862 Cdk5 and Abi enzyme substrate 1 91768 NM 138375 ...... _ (CABLESI), m RNA ----- -_------- _----- B:3802 chordin-like 1 (CRDL1), mRNA 91860 NM 145234 B:3730 family with sequence similarity 58, 92002 NM_152274 member A (FAM58A), mRNA B:762 secretoglobin, family 3A. 
member 1 92304 NM_052863 -SCGB3A), mRNA 1:4458 membrane-associated ring finger 92979 NM_138396 (CHC4) 9 MARCH9 --- B:9351 immunoglobulin superfamily, 93185 NM-052868 member 8 (IGSF8), mRNA 1687 acid phosphatase, testicular 93650 NM 033068 -ACPT), transcript variant A, mRNA B:3540 RAS guanyl releasing protein 4 115727 NM-170603 (RASGRP4), transcript variant 1, mRNA C:4836 topoisomerase (DNA) I, -116447 NM_052963 mitochondrial (TOPIMT), nuclear gene encoding mitochondrial __ protein, mRNA ----- -- - - - - B:9435 mediator of RNA polymerase 11 116931 NM_053002 transcription, subunit 12 homolog (y like (MED12L), mRNA C:3793 amyotrophic lateral sclerosis 2 117583 NM 152526 cuvenie) chromosome region, candidate 19 (ALS2CR19), transcript variant b, mRNA - -- _------ 70 WO 2009/045115 PCT/NZ2008/000260 0:3467 KIAA1977 protein (KIAA1977), 124404 NM 133450 mRNA C:3112 ubiquitin specific protease 43 124817 XM945578 0:52 . (USP43), mRNA C:5265 hypothetical protein BC009732 133396 NM_178833 (LC33308) mRNA A:07401 myosin light chain I slow a 140466 NM 002475 _____(MLC1SA), mRNA _ ___ ____ C:1334. CCCTC-binding factor (zinc finger 140690 NM_080618 protein)-like (CTCFL), mRNA B:5293 chromosome 20 open reading 140849 U63828 frame 181 C20or181 B:9316 hypothetical protein MGC20470 14366 NM_145053 S (MGC20470) mRNA B:9599 septin 10 (SEPT10), transcript 151011 NM 144710 variant 1, mRNA patocellularcarcinoma- 151195 NM 145280 associated antigen HCA557b (LOC151194), mRNA sim~~~arto- -eaclu cacnoa

--

C:1752 | connexin40 (CX40), mRNA | 219771 | NM_153368
3031 | kinesin family member 6 (KIF6), mRNA | 221527 | NM_145027
B:1737 | chromosome Y open reading frame 15A (CYorf15A), mRNA | 246176 | NM_001005852
B:8632 | DNA directed RNA polymerase II polypeptide J-related gene (POLR2J2), transcript variant 3, mRNA | 246778 | NM_032959
A:08544 | zinc finger, DHHC-type containing 24 (ZDHHC24), mRNA | 254394 | NM_207340
3659 | growth arrest-specific 2 like 3 (GAS2L3), mRNA | 283431 | NM_174942
B:5467 | laminin, alpha 1 (LAMA1), mRNA | 284217 | NM_005559
- | hypothetical protein MGC26694 (MGC26694), mRNA | 284439 | NM_178526
5315 | cation channel, sperm associated 3 (CATSPER3), mRNA | 347733 | NM_178019
- | polymerase (DNA directed) nu (POLN), mRNA | 353497 | NM_181808

Table B: Known cell proliferation-related genes. All genes categorized as cell proliferation-related by gene ontology analysis and present on the Affymetrix HG U133 platform.

General Approaches to Prognostic Marker Detection

The following approaches are non-limiting methods that can be used to detect the proliferation markers, including GCPM family members: microarray approaches using oligonucleotide probes selective for a GCPM; real-time qPCR on tumour samples using GCPM-specific primers and probes; real-time qPCR on lymph node, blood, serum, faecal, or urine samples using GCPM-specific primers and probes; enzyme-linked immunosorbent assays (ELISA); immunohistochemistry using anti-marker antibodies; and computer-based analysis of array or qPCR data.
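Real-time qPCR, as listed above, usually reports a cycle threshold (Ct) for each gene in each sample. The text does not specify how Ct values are turned into relative expression; a common choice, assumed here, is the comparative 2^-ΔΔCt method, which normalises the marker to a reference (housekeeping) gene and then to a comparison sample. The function name and the Ct values below are hypothetical, and the method assumes close to 100% amplification efficiency for both the marker and the reference gene.

```python
def relative_expression_ddct(ct_marker_sample: float, ct_ref_sample: float,
                             ct_marker_control: float, ct_ref_control: float) -> float:
    """Fold change of a marker in a sample vs. a control by the 2^-ddCt method.

    Assumes both the marker and the reference gene amplify with ~100% efficiency.
    """
    d_ct_sample = ct_marker_sample - ct_ref_sample      # normalise to reference gene
    d_ct_control = ct_marker_control - ct_ref_control
    dd_ct = d_ct_sample - d_ct_control                  # compare with the control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: after normalisation the marker crosses the threshold
# 2 cycles earlier in the tumour sample, i.e. roughly 4-fold higher expression.
print(relative_expression_ddct(24.0, 18.0, 26.0, 18.0))  # 4.0
```

Working on the Ct scale keeps the arithmetic additive; only the final step converts back to a linear fold change that can be compared with fold-change thresholds.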
Other useful methods include northern blotting and in situ hybridization (Parker and Barnes, Methods in Molecular Biology 106: 247-283 (1999)); RNase protection assays (Hod, BioTechniques 13: 852-854 (1992)); reverse transcription polymerase chain reaction (RT-PCR; Weis et al., Trends in Genetics 8: 263-264 (1992)); serial analysis of gene expression (SAGE; Velculescu et al., Science 270: 484-487 (1995); and Velculescu et al., Cell 88: 243-51 (1997)); MassARRAY technology (Sequenom, San Diego, CA); and gene expression analysis by massively parallel signature sequencing (MPSS; Brenner et al., Nature Biotechnology 18: 630-634 (2000)). Alternatively, antibodies may be employed that can recognize specific complexes, including DNA duplexes, RNA duplexes, DNA-RNA hybrid duplexes, or DNA-protein duplexes.

Primary data can be collected and fold change analysis can be performed, for example, by comparison of marker expression levels in tumour tissue and non-tumour tissue; by comparison of marker expression levels to levels determined in recurring tumours and non-recurring tumours; by comparison of marker expression levels to levels determined in tumours with or without metastasis; by comparison of marker expression levels to levels determined in differently staged tumours; or by comparison of marker expression levels to levels determined in cells with different levels of proliferation. A negative or positive prognosis is determined based on this analysis. Further analysis of tumour marker expression includes matching those markers exhibiting increased or decreased expression with expression profiles of known gastrointestinal tumours to provide a prognosis.

A threshold for concluding that expression is increased is provided as, for example, at least a 1.5-fold or 2-fold increase, and in alternative embodiments, at least a 3-fold increase, 4-fold increase, or 5-fold increase.
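The fold-change thresholds described above can be applied marker by marker. The sketch below assumes normalised expression values on a linear (not log) scale and reads an n-fold decrease as a tumour/reference ratio of at most 1/n, mirroring the n-fold increase rule. The function names, the default 2-fold threshold and the marker values are hypothetical illustrations, not data from this study.

```python
import math


def fold_change(tumour: float, reference: float) -> float:
    """Ratio of tumour to reference expression (both on a linear scale)."""
    if tumour <= 0 or reference <= 0:
        raise ValueError("expression values must be positive")
    return tumour / reference


def call_change(tumour: float, reference: float, threshold: float = 2.0) -> str:
    """Label a marker 'increased', 'decreased' or 'unchanged'.

    An n-fold increase is a ratio >= n; an n-fold decrease is a ratio <= 1/n.
    """
    if threshold <= 1:
        raise ValueError("threshold must exceed 1")
    fc = fold_change(tumour, reference)
    if fc >= threshold:
        return "increased"
    if fc <= 1.0 / threshold:
        return "decreased"
    return "unchanged"


# Hypothetical normalised values: (tumour, non-tumour) for each marker.
markers = {"MARKER_A": (840.0, 300.0), "MARKER_B": (95.0, 410.0), "MARKER_C": (220.0, 180.0)}
for name, (t, n) in markers.items():
    # log2 fold change is printed because it is symmetric for up/down changes
    print(name, round(math.log2(fold_change(t, n)), 2), call_change(t, n))
```

Reporting the log2 fold change alongside the call is a design choice: a 2-fold increase and a 2-fold decrease then sit at +1 and -1, which makes thresholds for the two directions easy to compare.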
A threshold for concluding that expression is decreased is provided as, for example, at least a 1.5-fold or 2-fold decrease, and in alternative embodiments, at least a 3-fold decrease, 4-fold decrease, or 5-fold decrease. It can be appreciated that other thresholds for concluding that increased or decreased expression has occurred can be selected without departing from the scope of this invention.

It will also be appreciated that a threshold for concluding that expression is increased will be dependent on the particular marker and also the particular predictive model that is to be applied. The threshold is generally set to achieve the highest sensitivity and selectivity with the lowest error rate, although variations may be desirable for a particular clinical situation. The desired threshold is determined by analysing a population of sufficient size, taking into account the statistical variability of any predictive model, and is calculated from the size of the sample used to produce the predictive model. The same applies for the determination of a threshold for concluding that expression is decreased. It can be appreciated that other thresholds, or methods for establishing a threshold, for concluding that increased or decreased expression has occurred can be selected without departing from the scope of this invention.

It is also possible that a prediction model may produce as its output a numerical value, for example a score, likelihood value or probability. In these instances, it is possible to apply thresholds to the results produced by prediction models, and in these cases similar principles apply as those used to set thresholds for expression values.

Once the expression level of one or more proliferation markers in a tumour sample has been obtained, the likelihood of the cancer recurring can then be determined.
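One way to set a threshold on a prediction model's numerical output, so that sensitivity and specificity are jointly as high as possible, is to maximise Youden's J statistic (sensitivity + specificity - 1) over a labelled training set. This criterion is an assumption chosen for illustration; the document does not prescribe it, and a cost-weighted criterion may suit a particular clinical situation better. The scores and outcomes below are hypothetical, and a threshold chosen this way should itself be checked on independent samples.

```python
from typing import Sequence, Tuple


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, J) maximising sensitivity + specificity - 1.

    labels: 1 = event of interest (e.g. recurrence), 0 = no event.
    A sample is called positive when its score >= threshold.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have equal length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are needed to estimate a threshold")
    best_t, best_j = scores[0], -1.0
    for t in sorted(set(scores)):  # every observed score is a candidate cut-off
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical model outputs (e.g. predicted probability of recurrence) and outcomes.
scores = [0.2, 0.4, 0.5, 0.6, 0.9]
labels = [0, 1, 0, 1, 1]
threshold, j = youden_threshold(scores, labels)
calls = ["high risk" if s >= threshold else "low risk" for s in scores]
print(threshold, round(j, 3), calls)
```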
In accordance with the invention, a negative prognosis is associated with decreased expression of at least one proliferation marker, while a positive prognosis is associated with increased expression of at least one proliferation marker. In various aspects, an increase in expression is shown by at least 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, or 75 of the markers disclosed herein. In other aspects, a decrease in expression is shown by at least 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, or 75 of the markers disclosed herein.

From the genes identified, proliferation signatures comprising one or more GCPMs can be used to determine the prognosis of a cancer by comparing the expression level of the one or more genes to the disclosed proliferation signature. By comparing the expression of one or more of the GCPMs in a tumour sample with the disclosed proliferation signature, the likelihood of the cancer recurring can be determined. The comparison of expression levels of the prognostic signature to establish a prognosis can be done by applying a predictive model as described previously.

Determining the likelihood of the cancer recurring is of great value to the medical practitioner. A high likelihood of recurrence means that a longer or higher dose treatment should be given, and the patient should be more closely monitored for signs of recurrence of the cancer. An accurate prognosis is also of benefit to the patient. It allows the patient, along with their partners, family, and friends, to also make decisions about treatment, as well as decisions about their future and lifestyle changes. Therefore, the invention also provides for a method of establishing a treatment regime for a particular cancer based on the prognosis established by matching the expression of the markers in a tumour sample with the differential proliferation signature.
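The association described above (increased expression of at least a given number of markers with a positive prognosis, decreased expression with a negative one) can be written as a simple counting rule. This is a hypothetical sketch: the minimum marker count (`min_markers`), the 2-fold threshold and the "indeterminate" outcome for mixed or insufficient evidence are assumptions, not the predictive model used in this study.

```python
from typing import Dict


def prognosis_call(fold_changes: Dict[str, float], min_markers: int = 3,
                   threshold: float = 2.0) -> str:
    """Hypothetical counting rule over signature markers.

    fold_changes maps marker name -> tumour/reference expression ratio.
    'positive'      : >= min_markers increased and fewer than min_markers decreased
    'negative'      : >= min_markers decreased and fewer than min_markers increased
    'indeterminate' : otherwise (too few changed markers, or conflicting evidence)
    """
    up = sum(1 for fc in fold_changes.values() if fc >= threshold)
    down = sum(1 for fc in fold_changes.values() if fc <= 1.0 / threshold)
    if up >= min_markers and down < min_markers:
        return "positive"
    if down >= min_markers and up < min_markers:
        return "negative"
    return "indeterminate"


# Hypothetical fold changes for four signature markers in one tumour.
print(prognosis_call({"GCPM_1": 2.5, "GCPM_2": 3.0, "GCPM_3": 2.1, "GCPM_4": 0.9}))
```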
It will be appreciated that the marker selection, or construction of a proliferation signature, does not have to be restricted to the GCPMs disclosed in Table A, Table B, Table C or Table D herein, but could involve the use of one or more GCPMs from the disclosed signature, or a new signature may be established using GCPMs selected from the disclosed marker lists. The requirement of any signature is that it predicts the likelihood of recurrence with enough accuracy to assist a medical practitioner to establish a treatment regime.

Surprisingly, it was discovered that many of the GCPMs were associated with increased levels of cell proliferation, and were also associated with a positive prognosis. It has similarly been found that there is a close correlation between the decreased expression level of GCPMs and a negative prognosis, e.g., an increased likelihood of gastrointestinal cancer recurring. Therefore, the present invention also provides for the use of a marker associated with cell proliferation, e.g., a cell cycle component, as a GCPM.

As described herein, determination of the likelihood of a cancer recurring can be accomplished by measuring expression of one or more proliferation-specific markers. The methods provided herein also include assays of high sensitivity. In particular, qPCR is extremely sensitive, and can be used to detect markers in very low copy number (e.g., 1-100) in a sample. With such sensitivity, prognosis of gastrointestinal cancer is made reliable, accurate, and easily tested.

Reverse Transcription PCR (RT-PCR)

Of the techniques listed above, the most sensitive and most flexible quantitative method is RT-PCR, which can be used to compare RNA levels in different sample populations, in normal and tumour tissues, with or without drug treatment, to characterize patterns of expression, to discriminate between closely related RNAs, and to analyze RNA structure.
For RT-PCR, the first step is the isolation of RNA from a target sample. The starting material is typically total RNA isolated from human tumours or tumour cell lines, and corresponding normal tissues or cell lines, respectively. RNA can be isolated from a variety of samples, such as tumour samples from breast, lung, colon (e.g., large bowel or small bowel), colorectal, gastric, esophageal, anal, rectal, prostate, brain, liver, kidney, pancreas, spleen, thymus, testis, ovary, uterus, etc., tissues, from primary tumours, or tumour cell lines, and from pooled samples from healthy donors. If the source of RNA is a tumour, RNA can be extracted, for example, from frozen or archived paraffin-embedded and fixed (e.g., formalin-fixed) tissue samples.

The first step in gene expression profiling by RT-PCR is the reverse transcription of the RNA template into cDNA, followed by its exponential amplification in a PCR reaction. The two most commonly used reverse transcriptases are avian myeloblastosis virus reverse transcriptase (AMV-RT) and Moloney murine leukaemia virus reverse transcriptase (MMLV-RT). The reverse transcription step is typically primed using specific primers, random hexamers, or oligo-dT primers, depending on the circumstances and the goal of expression profiling. For example, extracted RNA can be reverse-transcribed using a GeneAmp RNA PCR kit (Perkin Elmer, CA, USA), following the manufacturer's instructions. The derived cDNA can then be used as a template in the subsequent PCR reaction.

Although the PCR step can use a variety of thermostable DNA-dependent DNA polymerases, it typically employs the Taq DNA polymerase, which has a 5'-3' nuclease activity but lacks a 3'-5' proofreading exonuclease activity.
Thus, TaqMan® PCR typically utilizes the 5' nuclease activity of Taq or Tth polymerase to hydrolyze a hybridization probe bound to its target amplicon, but any enzyme with equivalent 5' nuclease activity can be used.

Two oligonucleotide primers are used to generate an amplicon typical of a PCR reaction. A third oligonucleotide, or probe, is designed to detect a nucleotide sequence located between the two PCR primers. The probe is non-extendible by the Taq DNA polymerase enzyme, and is labeled with a reporter fluorescent dye and a quencher fluorescent dye. Any laser-induced emission from the reporter dye is quenched by the quenching dye when the two dyes are located close together, as they are on the probe. During the amplification reaction, the Taq DNA polymerase enzyme cleaves the probe in a template-dependent manner. The resultant probe fragments dissociate in solution, and signal from the released reporter dye is free from the quenching effect of the second fluorophore. One molecule of reporter dye is liberated for each new molecule synthesized, and detection of the unquenched reporter dye provides the basis for quantitative interpretation of the data.

TaqMan RT-PCR can be performed using commercially available equipment, such as, for example, the ABI PRISM 7700™ Sequence Detection System (Perkin-Elmer-Applied Biosystems, Foster City, CA, USA), or the LightCycler (Roche Molecular Biochemicals, Mannheim, Germany). In a preferred embodiment, the 5' nuclease procedure is run on a real-time quantitative PCR device such as the ABI PRISM 7700™ Sequence Detection System. The system consists of a thermocycler, laser, charge-coupled device (CCD) camera, and computer. The system amplifies samples in a 96-well format on a thermocycler. During amplification, laser-induced fluorescent signal is collected in real time through fibre-optic cables for all 96 wells, and detected at the CCD.
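Each well of such an instrument ultimately yields a threshold cycle (Ct), and the Ct of a target gene is usually normalised to a housekeeping gene such as GAPDH, as discussed below. A widely used way to combine the two is the comparative Ct (2^-ΔΔCt) method. It is not specified in this document and is shown here only as an assumed illustration: it presumes roughly 100% amplification efficiency for both target and reference genes. The function name and Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_expression(target_ct_sample: Sequence[float], ref_ct_sample: Sequence[float],
                        target_ct_calibrator: Sequence[float],
                        ref_ct_calibrator: Sequence[float]) -> float:
    """Fold change of a target gene in a sample relative to a calibrator (2^-ddCt).

    Replicate Ct values are averaged. A lower Ct means more starting template,
    so each cycle of difference corresponds to a factor of ~2 (assumed 100% efficiency).
    """
    d_ct_sample = mean(target_ct_sample) - mean(ref_ct_sample)              # normalise to housekeeping gene
    d_ct_calibrator = mean(target_ct_calibrator) - mean(ref_ct_calibrator)
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


# Hypothetical duplicate Ct values: tumour vs matched normal tissue, GAPDH as reference.
fold = relative_expression([24.0, 24.5], [18.0, 18.5], [26.0, 26.5], [18.25, 18.25])
print(fold)
```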
The system includes software for running the instrument and for analyzing the data. 5' nuclease assay data are initially expressed as Ct, or the threshold cycle. As discussed above, fluorescence values are recorded during every cycle and represent the amount of product amplified to that point in the amplification reaction. The point when the fluorescent signal is first recorded as statistically significant is the threshold cycle.

To minimize errors and the effect of sample-to-sample variation, RT-PCR is usually performed using an internal standard. The ideal internal standard is expressed at a constant level among different tissues, and is unaffected by the experimental treatment. RNAs most frequently used to normalize patterns of gene expression are mRNAs for the housekeeping genes glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and β-actin.

Real-time quantitative PCR (qPCR)

A more recent variation of the RT-PCR technique is real-time quantitative PCR, which measures PCR product accumulation through a dual-labeled fluorigenic probe (i.e., a TaqMan® probe). Real-time PCR is compatible both with quantitative competitive PCR and with quantitative comparative PCR. The former uses an internal competitor for each target sequence for normalization, while the latter uses a normalization gene contained within the sample, or a housekeeping gene for RT-PCR. For further details see, e.g., Heid et al., Genome Research 6: 986-994 (1996).

Expression levels can be determined using fixed, paraffin-embedded tissues as the RNA source. According to one aspect of the present invention, PCR primers and probes are designed based upon intron sequences present in the gene to be amplified. In this embodiment, the first step in the primer/probe design is the delineation of intron sequences within the genes. This can be done by publicly available software, such as the DNA BLAT software developed by Kent, W. J., Genome Res.
12 (4): 656-64 (2002), or by the BLAST software including its variations. Subsequent steps follow well-established methods of PCR primer and probe design.

In order to avoid non-specific signals, it is useful to mask repetitive sequences within the introns when designing the primers and probes. This can be easily accomplished by using the Repeat Masker program available on-line through the Baylor College of Medicine, which screens DNA sequences against a library of repetitive elements and returns a query sequence in which the repetitive elements are masked. The masked sequences can then be used to design primer and probe sequences using any commercially or otherwise publicly available primer/probe design packages, such as Primer Express (Applied Biosystems); MGB assay-by-design (Applied Biosystems); Primer3 (Steve Rozen and Helen J. Skaletsky (2000) Primer3 on the WWW for general users and for biologist programmers in: Krawetz S, Misener S (eds) Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, NJ, pp 365-386).

The most important factors considered in PCR primer design include primer length, melting temperature (Tm), G/C content, specificity, complementary primer sequences, and 3' end sequence. In general, optimal PCR primers are 17-30 bases in length, and contain about 20-80%, such as, for example, about 50-60%, G+C bases. Tm values between 50 and 80 °C, e.g., about 50 to 70 °C, are typically preferred. For further guidelines for PCR primer and probe design see, e.g., Dieffenbach, C. W. et al., General Concepts for PCR Primer Design in: PCR Primer, A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York, 1995, pp. 133-155; Innis and Gelfand, Optimization of PCRs in: PCR Protocols, A Guide to Methods and Applications, CRC Press, London, 1994, pp. 5-11; and Plasterer, T. N. PrimerSelect: Primer and probe design. Methods Mol. Biol.
70: 520-527 (1997), the entire disclosures of which are hereby expressly incorporated by reference.

Microarray analysis

Differential gene expression can also be identified or confirmed using the microarray technique. Thus, the expression profile of GCPMs can be measured in either fresh or paraffin-embedded tumour tissue, using microarray technology. In this method, polynucleotide sequences of interest (including cDNAs and oligonucleotides) are plated, or arrayed, on a microchip substrate. The arrayed sequences (i.e., capture probes) are then hybridized with specific polynucleotides from cells or tissues of interest (i.e., targets). Just as in the RT-PCR method, the source of RNA typically is total RNA isolated from human tumours or tumour cell lines, and corresponding normal tissues or cell lines. Thus RNA can be isolated from a variety of primary tumours or tumour cell lines. If the source of RNA is a primary tumour, RNA can be extracted, for example, from frozen or archived paraffin-embedded and fixed (e.g., formalin-fixed) tissue samples, which are routinely prepared and preserved in everyday clinical practice.

In a specific embodiment of the microarray technique, PCR-amplified inserts of cDNA clones are applied to a substrate. The substrate can include up to 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, or 75 nucleotide sequences. In other aspects, the substrate can include at least 10,000 nucleotide sequences. The microarrayed sequences, immobilized on the microchip, are suitable for hybridization under stringent conditions. As other embodiments, the targets for the microarrays can be at least 50, 100, 200, 400, 500, 1000, or 2000 bases in length; or 50-100, 100-200, 100-500, 100-1000, 100-2000, or 500-5000 bases in length.
As further embodiments, the capture probes for the microarrays can be at least 10, 15, 20, 25, 50, 75, 80, or 100 bases in length; or 10-15, 10-20, 10-25, 10-50, 10-75, 10-80, or 20-80 bases in length.

Fluorescently labeled cDNA probes may be generated through incorporation of fluorescent nucleotides by reverse transcription of RNA extracted from tissues of interest. Labeled cDNA probes applied to the chip hybridize with specificity to each spot of DNA on the array. After stringent washing to remove non-specifically bound probes, the chip is scanned by confocal laser microscopy or by another detection method, such as a CCD camera. Quantitation of hybridization of each arrayed element allows for assessment of corresponding mRNA abundance. With dual colour fluorescence, separately labeled cDNA probes generated from two sources of RNA are hybridized pairwise to the array. The relative abundance of the transcripts from the two sources corresponding to each specified gene is thus determined simultaneously.

The miniaturized scale of the hybridization affords a convenient and rapid evaluation of the expression pattern for large numbers of genes. Such methods have been shown to have the sensitivity required to detect rare transcripts, which are expressed at a few copies per cell, and to reproducibly detect at least approximately two-fold differences in the expression levels (Schena et al., Proc. Natl. Acad. Sci. USA 93 (20): 10614-10619 (1996)). Microarray analysis can be performed by commercially available equipment, following the manufacturer's protocols, such as by using the Affymetrix GeneChip technology, or Incyte's microarray technology. The development of microarray methods for large-scale analysis of gene expression makes it possible to search systematically for molecular markers of cancer classification and outcome prediction in a variety of tumour types.
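With dual colour fluorescence, each spot yields two intensities, and the relative abundance of a transcript in the two RNA sources is usually expressed as a log2 ratio. The sketch below assumes background-subtracted intensities and applies a simple global median normalisation, so that a typical gene sits at a log ratio of zero. This normalisation scheme is one assumed choice among several and is not prescribed by this document; gene names and intensities are hypothetical.

```python
import math
from statistics import median
from typing import Dict, Tuple


def normalised_log_ratios(spots: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """spots maps gene -> (red intensity, green intensity), background-subtracted.

    Returns log2(red / green) per gene minus the array-wide median log ratio.
    """
    raw = {}
    for gene, (red, green) in spots.items():
        if red <= 0 or green <= 0:
            continue  # spots without a positive signal in both channels are skipped
        raw[gene] = math.log2(red / green)
    if not raw:
        return {}
    centre = median(raw.values())  # global median normalisation
    return {gene: lr - centre for gene, lr in raw.items()}


# Hypothetical spots: red = tumour RNA, green = reference RNA.
print(normalised_log_ratios({"g1": (200.0, 100.0), "g2": (400.0, 100.0), "g3": (100.0, 100.0)}))
```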
RNA isolation, purification, and amplification

General methods for mRNA extraction are well known in the art and are disclosed in standard textbooks of molecular biology, including Ausubel et al., Current Protocols in Molecular Biology, John Wiley and Sons (1997). Methods for RNA extraction from paraffin-embedded tissues are disclosed, for example, in Rupp and Locker, Lab Invest. 56: A67 (1987), and De Andres et al., BioTechniques 18: 42-44 (1995). In particular, RNA isolation can be performed using a purification kit, buffer set, and protease from commercial manufacturers, such as Qiagen, according to the manufacturer's instructions. For example, total RNA from cells in culture can be isolated using Qiagen RNeasy mini columns. Other commercially available RNA isolation kits include the MasterPure Complete DNA and RNA Purification Kit (EPICENTRE®, Madison, WI), and the Paraffin Block RNA Isolation Kit (Ambion, Inc.). Total RNA from tissue samples can be isolated using RNA Stat-60 (Tel-Test). RNA prepared from tumour can be isolated, for example, by cesium chloride density gradient centrifugation.

The steps of a representative protocol for profiling gene expression using fixed, paraffin-embedded tissues as the RNA source, including mRNA isolation, purification, primer extension and amplification, are given in various published journal articles (for example: T. E. Godfrey et al., J. Molec. Diagnostics 2: 84-91 (2000); K. Specht et al., Am. J. Pathol. 158: 419-29 (2001)). Briefly, a representative process starts with cutting sections about 10 µm thick from paraffin-embedded tumour tissue samples. The RNA is then extracted, and protein and DNA are removed. After analysis of the RNA concentration, RNA repair and/or amplification steps may be included, if necessary, and RNA is reverse transcribed using gene-specific primers followed by RT-PCR.
Finally, the data are analyzed to identify the best treatment option(s) available to the patient on the basis of the characteristic gene expression pattern identified in the tumour sample examined.

Immunohistochemistry and proteomics

Immunohistochemistry methods are also suitable for detecting the expression levels of the proliferation markers of the present invention. Thus, antibodies or antisera, preferably polyclonal antisera, and most preferably monoclonal antibodies specific for each marker, are used to detect expression. The antibodies can be detected by direct labeling of the antibodies themselves, for example, with radioactive labels, fluorescent labels, hapten labels such as biotin, or an enzyme such as horseradish peroxidase or alkaline phosphatase. Alternatively, unlabeled primary antibody is used in conjunction with a labeled secondary antibody, comprising antisera, polyclonal antisera or a monoclonal antibody specific for the primary antibody. Immunohistochemistry protocols and kits are well known in the art and are commercially available.

Proteomics can be used to analyze the polypeptides present in a sample (e.g., tissue, organism, or cell culture) at a given point in time. In particular, proteomic techniques can be used to assess the global changes of protein expression in a sample (also referred to as expression proteomics). Proteomic analysis typically includes: (1) separation of individual proteins in a sample by 2-D gel electrophoresis (2-D PAGE); (2) identification of the individual proteins recovered from the gel, e.g., by mass spectrometry or N-terminal sequencing; and (3) analysis of the data using bioinformatics. Proteomics methods are valuable supplements to other methods of gene expression profiling, and can be used, alone or in combination with other methods, to detect the products of the proliferation markers of the present invention.

Selection of Differentially Expressed Genes.
An early approach to the selection of genes deemed significant involved simply looking at the "fold change" of a given gene between the two groups of interest. While this approach homes in on genes that seem to change the most spectacularly, consideration of basic statistics leads one to realize that if the variance (or noise level) is quite high (as is often seen in microarray experiments), then a seemingly large fold change can happen frequently by chance alone.

Microarray experiments, such as those described here, typically involve the simultaneous measurement of thousands of genes. If one is comparing the expression levels for a particular gene between two groups (for example recurrent and non-recurrent tumours), the typical tests for significance (such as the t-test) are not adequate. This is because, in an ensemble of thousands of experiments (in this context each gene constitutes an "experiment"), the probability of at least one experiment passing the usual criteria for significance by chance alone is essentially unity. In a test for significance, one typically calculates the probability of observing a difference at least as large as the one seen if the "null hypothesis" were correct. In the case of comparing two groups, the null hypothesis is that there is no difference between the two groups. If a statistical test produces a probability (p-value) below some threshold (usually 0.05 or 0.01), it is stated that we can reject the null hypothesis, and accept the hypothesis that the two groups are significantly different. Clearly, in such a test, a rejection of the null hypothesis by chance alone could be expected 1 in 20 times (or 1 in 100). The use of t-tests, or other similar statistical tests for significance, therefore fails in the context of microarrays, producing far too many false positives (or type I errors).

In this type of situation, where one is testing multiple hypotheses at the same time, one applies typical multiple comparison procedures, such as the Bonferroni Method (43). However, such tests are too conservative for most microarray experiments, resulting in too many false negative (type II) errors.

A more recent approach is to do away with attempting to apply a probability for a given test being significant, and instead to establish a means for selecting a subset of experiments such that the expected proportion of false positives (Type I errors) among those selected, the false discovery rate (47), is controlled. It is this approach that has been used in this investigation, through various implementations, namely the methods provided with BRB Array Tools (48), and the limma (11, 42) package of Bioconductor (that uses the R statistical environment; 10, 39).

General methodology for Data Mining: Generation of Prognostic Signatures

Data Mining is the term used to describe the extraction of "knowledge", in other words the "know-how", or predictive ability, from (usually) large volumes of data (the dataset). This is the approach used in this study to generate prognostic signatures. In the case of this study the "know-how" is the ability to accurately predict prognosis from a given set of gene expression measurements, or "signature" (as described generally in this section and in more detail in the examples section).

The specific details of the methods used in this study are described in Examples 17-20. However, application of any of the data mining methods (both those described in the Examples, and those described here) can follow this general protocol.
Data mining (49), and the related topic machine learning (40), is a complex, repetitive mathematical task that involves the use of one or more appropriate computer software packages (see below). The use of software is advantageous on the one hand, in that one does not need to be completely familiar with the intricacies of the theory behind each technique in order to successfully use data mining techniques, provided that one adheres to the correct methodology. The disadvantage is that the application of data mining can often be viewed as a "black box": one inserts the data and receives the answer. How this is achieved is often masked from the end-user (this is the case for many of the techniques described), and this can often influence the statistical method chosen for data mining. For example, neural networks and support vector machines have a particularly complex implementation that makes it very difficult for the end user to extract the "rules" used to produce the decision. On the other hand, k-nearest neighbours and linear discriminant analysis have a very transparent process for decision making that is not hidden from the user.

There are two types of approach used in data mining: supervised and unsupervised approaches. In the supervised approach, the information that is being linked to the data is known, such as categorical data (e.g. recurrent vs. non-recurrent tumours). What is required is the ability to link the observed response (e.g. recurrence vs. non-recurrence) to the input variables. In the unsupervised approach, the classes within the dataset are not known in advance, and data mining methodology is employed to attempt to find the classes or structure within the dataset.

In the present example the supervised approach was used and is discussed in detail here, although it will be appreciated that any of the other techniques could be used.
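A supervised workflow of this kind can be sketched end to end with k-nearest neighbours, one of the transparent methods named above, following the steps listed next: log transformation (data representation), ranking genes by a Welch t statistic computed on the training samples only (feature selection), storing the reduced training set (training for k-NN), and scoring held-out samples that took no part in feature selection or training (validation). The Welch t ranking, the 30% hold-out, k = 3, the function names and the toy data are all illustrative assumptions, not the procedures of Examples 17-20. `math.dist` requires Python 3.8 or later.

```python
import math
import random
from statistics import mean, stdev
from typing import List, Sequence

Sample = List[float]  # one tumour: an expression value per gene


def log2_transform(X: List[Sample]) -> List[Sample]:
    """Data representation: log2 compresses a wide dynamic range (values must be > 0)."""
    return [[math.log2(v) for v in x] for x in X]


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch t statistic; each group needs at least two samples."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def select_features(X: List[Sample], y: List[int], n_features: int) -> List[int]:
    """Feature selection: indices of the genes with the largest |t| between classes."""
    ranked = []
    for g in range(len(X[0])):
        a = [x[g] for x, lab in zip(X, y) if lab == 1]
        b = [x[g] for x, lab in zip(X, y) if lab == 0]
        ranked.append((abs(welch_t(a, b)), g))
    ranked.sort(reverse=True)
    return [g for _, g in ranked[:n_features]]


def knn_predict(train_X: List[Sample], train_y: List[int], x: Sample,
                genes: List[int], k: int = 3) -> int:
    """k-NN 'training' is storing the reduced training set; predict by majority vote."""
    dists = sorted(
        (math.dist([t[g] for g in genes], [x[g] for g in genes]), lab)
        for t, lab in zip(train_X, train_y)
    )
    votes = [lab for _, lab in dists[:k]]
    return 1 if 2 * sum(votes) > len(votes) else 0


def holdout_accuracy(X: List[Sample], y: List[int], test_fraction: float = 0.3,
                     n_features: int = 5, k: int = 3, seed: int = 0) -> float:
    """Validation: samples are set aside BEFORE feature selection and training."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    train_X = [X[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    genes = select_features(train_X, train_y, n_features)  # test samples never seen here
    correct = sum(knn_predict(train_X, train_y, X[i], genes, k) == y[i] for i in test_idx)
    return correct / n_test


# Hypothetical toy data: 20 tumours x 10 genes, label 1 = recurrence; only gene 0 is informative.
labels = [i % 2 for i in range(20)]
raw = [[(1000.0 + i) if lab else (30.0 + i)] + [100.0 + (7 * i + 3 * g) % 11 for g in range(1, 10)]
       for i, lab in enumerate(labels)]
X = log2_transform(raw)
print(holdout_accuracy(X, labels, n_features=1))
```

Keeping feature selection inside the training split is the design point: selecting genes on the full dataset first and then validating would let information from the "held-out" samples leak into the model.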
The overall protocol involves the following steps:

• Data representation. This involves transformation of the data into a form that is most likely to work successfully with the chosen data mining technique. Where the data is numerical, such as in this study where the data being investigated represents relative levels of gene expression, this is fairly simple. If the data covers a large dynamic range (i.e. many orders of magnitude), often the log of the data is taken. If the data covers many measurements of separate samples on separate days by separate investigators, particular care has to be taken to ensure systematic error is minimised. The minimisation of systematic error (i.e. errors resulting from protocol differences, machine differences, operator differences and other quantifiable factors) is the process referred to here as "normalisation".

• Feature Selection. Typically the dataset contains many more data elements than would be practical to measure on a day-to-day basis, and additionally many elements that do not provide the information needed to produce a prediction model. The actual ability of a prediction model to describe a dataset is derived from some subset of the full dimensionality of the dataset. These dimensions are the most important components (or features) of the dataset. Note that in the context of microarray data, the dimensions of the dataset are the individual genes. Feature selection, in the context described here, involves finding those genes which are most "differentially expressed". In a more general sense, it involves identifying those variables which pass some statistical test for significance, i.e. whether the level of a particular variable is consistently higher or lower in one or other of the groups being investigated. Sometimes the features are those variables (or dimensions) which exhibit the greatest variance.
The application of feature selection is completely independent of the method used to create a prediction model, and involves a great deal of experimentation to achieve the desired results. Within this invention, the selection of significant genes, and those which correlated with the earlier successful model (the NZ classifier), entailed feature selection. In addition, methods of data reduction (such as principal component analysis) can be applied to the dataset.

• Training. Once the classes (e.g. recurrence/non-recurrence) and the features of the dataset have been established, and the data is represented in a form that is acceptable as input for data mining, the reduced dataset (as described by the features) is applied to the prediction model of choice. The input for this model is usually in the form of a multi-dimensional numerical input (known as a vector), with associated output information (a class label or a response). In the training process, selected data is input into the prediction model, either sequentially (in techniques such as neural networks) or as a whole (in techniques that apply some form of regression, such as linear models, linear discriminant analysis, support vector machines). In some instances (e.g. k-nearest neighbours) the dataset (or subset of the dataset obtained after feature selection) is itself the model. As discussed, effective models can be established with minimal understanding of the detailed mathematics, through the use of various software packages where the parameters of the model have been pre-determined by expert analysts as most likely to lead to successful results.

• Validation. This is a key component of the data-mining protocol, and its incorrect application frequently leads to errors. Portions of the dataset are to be set aside, apart from feature selection and training, to test the success of the prediction model.
Furthermore, if the results of validation are used to effect feature selection and training of the model, then a further validation set must be obtained to test the model before it is applied to real-life situations. If this process is not strictly adhered to, the model is likely to fail in real-world situations. The methods of validation are described in more detail below.

• Application. Once the model has been constructed and validated, it must be packaged in some way so that it is accessible to end users. This often involves implementing some form of spreadsheet application into which the model has been embedded, scripting a statistical software package, or refactoring the model into a hard-coded application by information technology staff.

Examples of software packages that are frequently used are:
- Spreadsheet plugins, obtained from multiple vendors.
- The R statistical environment.
- The commercial packages MatLab, S-plus, SAS, SPSS and STATA.
- Free open-source software such as Octave (a MatLab clone).
- Many and varied C++ libraries, which can be used to implement prediction models in a commercial, closed-source setting.

Examples of Data Mining Methods

The methods can be implemented by first performing the steps of the data mining process (above), and then applying the appropriate known software packages. The process of data mining is described in detail in many well-written texts (49).

• Linear models (49, 50): The data is treated as the input of a linear regression model, of which the class labels or response variables are the output. Class labels, or other categorical data, must be transformed into numerical values (usually integers). In generalised linear models, the class labels or response variables are not themselves linearly related to the input data, but are transformed through the use of a "link function". Logistic regression is the most common form of generalised linear model.
• Linear discriminant analysis (49, 51, 52): Provided the data is linearly separable (i.e. the groups or classes of data can be separated by a hyperplane, which is an n-dimensional extension of a threshold), this technique can be applied. A combination of variables is used to separate the classes, such that the between-group variance is maximised and the within-group variance is minimised. The by-product of this is the formation of a classification rule. Application of this rule to samples of unknown class allows predictions or classifications of class membership to be made for those samples. There are variations of linear discriminant analysis, such as nearest shrunken centroids, which are commonly used for microarray analysis.

• Support vector machines (53): A collection of variables is used in conjunction with a collection of weights to determine a model that maximises the separation between classes in terms of those weighted variables. Application of this model to a sample then produces a classification or prediction of class membership for that sample.

• Neural networks (52): The data is treated as input into a network of nodes, which superficially resemble biological neurons; each node receives the input from all the nodes to which it is connected and transforms that input into an output. Commonly, neural networks use the "multiply and sum" algorithm to transform the inputs from multiple connected input nodes into a single output. A node may not necessarily produce an output unless the inputs to that node exceed a certain threshold. Each node has as its input the output from several other nodes, with the final output node usually being linked to a categorical variable. The number of nodes, and the topology of the nodes, can be varied in almost infinite ways, providing the ability to classify extremely noisy data that may not be possible to categorise in other ways.
The most common implementation of neural networks is the multi-layer perceptron.

• Classification and regression trees (54): In these, variables are used to define a hierarchy of rules that can be followed in a stepwise manner to determine the class of a sample. The typical process creates a set of rules which lead to a specific class output, or a specific statement of the inability to discriminate. An example classification tree is an implementation of an algorithm such as:

    if gene A > x and gene Y > x and gene Z = z
        then class A
    else if gene A = q
        then class B

• Nearest neighbour methods (51, 52): Predictions or classifications are made by comparing a sample (of unknown class) to those around it (of known class), with closeness defined by a distance function. Many different distance functions can be defined. Commonly used distance functions are the Euclidean distance (an extension of the Pythagorean distance, as in triangulation, to n dimensions) and various forms of correlation (including the Pearson correlation coefficient). There are also transformation functions that convert data points that would not normally be interconnected by a meaningful distance metric into Euclidean space, so that Euclidean distance can then be applied (e.g. Mahalanobis distance). Although the distance metric can be quite complex, the basic premise of k-nearest neighbours is quite simple, essentially being a restatement of "find the k data vectors that are most similar to the unknown input, find out which class they correspond to, and vote as to which class the unknown input is".

• Other methods:
- Bayesian networks. A directed acyclic graph is used to represent a collection of variables in conjunction with their joint probability distribution, which is then used to determine the probability of class membership for a sample.
- Independent components analysis, in which independent signals (e.g., class membership) are isolated (into components) from a collection of variables. These components can then be used to produce a classification or prediction of class membership for a sample.
- Ensemble learning methods, in which a collection of prediction methods are combined to produce a joint classification or prediction of class membership for a sample.

There are many variations of these methodologies that can be explored (49), and many new methodologies are constantly being defined and developed. It will be appreciated that any one of these methodologies can be applied in order to obtain an acceptable result. Particular care must be taken to avoid overfitting, by ensuring that all results are tested via a comprehensive validation scheme.

Validation

Application of any of the prediction methods described involves both training and cross-validation (43, 55) before the method can be applied to new datasets (such as data from a clinical trial). Training involves taking a subset of the dataset of interest (in this case gene expression measurements from colorectal tumours), such that it is stratified across the classes that are being tested for (in this case recurrent and non-recurrent tumours). This training set is used to generate a prediction model (defined above), which is tested on the remainder of the data (the testing set).

It is possible to alter the parameters of the prediction model so as to obtain better performance in the testing set; however, this can lead to the situation known as overfitting, where the prediction model works on the training dataset but not on any external dataset. In order to circumvent this, the process of validation is followed. There are two major types of validation typically applied. The first (hold-out validation) involves partitioning the dataset into three groups: testing, training, and validation.
The validation set has no input into the training process whatsoever, so that any adjustment of parameters or other refinements must take place during application to the testing set (but not the validation set). The second major type is cross-validation, which can be applied in several different ways, described below.

There are two main sub-types of cross-validation: K-fold cross-validation and leave-one-out cross-validation.

K-fold cross-validation: The dataset is divided into K subsamples, each subsample containing approximately the same proportions of the class groups as the original. In each round of validation, one of the K subsamples is set aside, and training is accomplished using the remainder of the dataset. The effectiveness of the training for that round is gauged by how accurately the left-out subsample is classified. This procedure is repeated K times, and the overall effectiveness is ascertained by comparison of the predicted class with the known class.

Leave-one-out cross-validation: A commonly used variation of K-fold cross-validation, in which K = n, where n is the number of samples.

Combinations of GCPMs, such as those described above in Tables 1 and 2, can be used to construct predictive models for prognosis.

Prognostic Signatures

Prognostic signatures, comprising one or more of these markers, can be used to determine the outcome of a patient, through application of one or more predictive models derived from the signature. In particular, a clinician or researcher can determine the differential expression (e.g., increased or decreased expression) of the one or more markers in the signature, apply a predictive model, and thereby predict the negative prognosis (e.g., likelihood of disease relapse) of a patient, or alternatively the likelihood of a positive prognosis (continued remission).
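The K-fold and leave-one-out schemes described under Validation, applied to the k-nearest-neighbour classifier described earlier, can be sketched in Python as follows. This is a minimal illustration only: the function names, the Euclidean distance, k = 3 neighbours and the round-robin stratification are assumptions chosen for clarity, not the procedure used to build the NZ classifier, and a real analysis would normally rely on an established statistics package.

```python
import math
import random
from collections import Counter, defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def stratified_folds(y, n_folds, seed=0):
    """Deal sample indices into folds class by class (round-robin), so each
    fold keeps roughly the class proportions of the whole dataset."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(n_folds)]
    position = 0  # running counter shared across classes
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for i in indices:
            folds[position % n_folds].append(i)
            position += 1
    return folds


def cross_validated_accuracy(X, y, n_folds=5, k=3, seed=0):
    """Each fold is held out once; training uses the remaining samples.
    Setting n_folds = len(y) gives leave-one-out cross-validation."""
    correct = 0
    for held_out in stratified_folds(y, n_folds, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in held_out:
            if knn_predict(train_X, train_y, X[i], k) == y[i]:
                correct += 1
    return correct / len(y)
```

Because every sample is predicted exactly once while held out, the returned accuracy compares predicted with known class across the whole dataset, as the K-fold description requires.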
In still further aspects, the invention includes a method of determining a treatment regime for a cancer comprising: (a) providing a sample of the cancer; (b) detecting the expression level of a GCPM family member in said sample; (c) determining the prognosis of the cancer based on the expression level of a GCPM family member; and (d) determining the treatment regime according to the prognosis.

In still further aspects, the invention includes a device for detecting a GCPM, comprising: a substrate having a GCPM capture reagent thereon; and a detector associated with said substrate, said detector capable of detecting a GCPM associated with said capture reagent. Additional aspects include kits for detecting cancer, comprising: a substrate; a GCPM capture reagent; and instructions for use. Yet further aspects of the invention include a method for detecting a GCPM using qPCR, comprising: a forward primer specific for said GCPM; a reverse primer specific for said GCPM; PCR reagents; a reaction vial; and instructions for use.

Additional aspects of this invention comprise a kit for detecting the presence of a GCPM polypeptide or peptide, comprising: a substrate having a capture agent for said GCPM polypeptide or peptide; an antibody specific for said GCPM polypeptide or peptide; a reagent capable of labeling bound antibody for said GCPM polypeptide or peptide; and instructions for use.

In yet further aspects, this invention includes a method for determining the prognosis of colorectal cancer, comprising the steps of: providing a tumour sample from a patient suspected of having colorectal cancer; and measuring the presence of a GCPM polypeptide using an ELISA method. In specific aspects of this invention, the GCPM of the invention is selected from the markers set forth in Table A, Table B, Table C or Table D.
In still further aspects, the GCPM is included in a prognostic signature.

While exemplified herein for gastrointestinal cancer, e.g., gastric and colorectal cancer, the GCPMs of the invention also find use for the prognosis of other cancers, e.g., breast cancers, prostate cancers, ovarian cancers, lung cancers (such as adenocarcinoma and, particularly, small cell lung cancer), lymphomas, gliomas, blastomas (e.g., medulloblastomas), and mesothelioma, where decreased or low expression is associated with a positive prognosis, while increased or high expression is associated with a negative prognosis.

EXAMPLES

The examples described herein are for purposes of illustrating embodiments of the invention. Other embodiments, methods, and types of analyses are within the scope of persons of ordinary skill in the molecular diagnostic arts and need not be described in detail herein. Other embodiments within the scope of the art are considered to be part of this invention.

EXAMPLE 1: Cell cultures

The experimental scheme is shown in FIG. 1. Ten colorectal cell lines were cultured and harvested at semi- and full-confluence. Gene expression profiles of the two growth stages were analyzed on 30,000-oligonucleotide arrays, and a gene proliferation signature (GPS; Table C) was identified by gene ontology analysis of differentially expressed genes. Unsupervised clustering was then used to independently dichotomize two cohorts of clinical colorectal samples (Cohort A: 73 stage I-IV tumours on oligonucleotide arrays; Cohort B: 55 stage II tumours on Affymetrix chips) based on the similarities of GPS expression. Ki-67 immunostaining was also performed on tissue sections from Cohort A tumours. Following this, the correlation between proliferation activity and clinico-pathologic parameters was investigated.
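Example 1 identifies genes differentially expressed between semi- and full-confluent cultures; Example 7 uses SAM for this step. As a simpler stand-in, and explicitly not SAM, the Python sketch below ranks genes by a Welch t-statistic computed on log2 expression values and reports each gene's mean log2 fold change. The function names and the toy gene values in the check are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic comparing two groups of log2 expression values.

    Each group needs at least two values (sample variance is used).
    """
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # No within-group spread at all: report 0 or a signed infinity.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rank_genes(semi, full):
    """Rank genes by Welch t (semi-confluent versus full-confluent).

    semi, full: dict mapping gene symbol -> list of log2 values, one per
    culture. Returns (gene, mean log2 fold change, t) tuples with the genes
    most up-regulated in cycling (semi-confluent) cells first.
    """
    rows = []
    for gene in semi:
        log2_fc = mean(semi[gene]) - mean(full[gene])
        rows.append((gene, log2_fc, welch_t(semi[gene], full[gene])))
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

A mean log2 fold change of 0.9, for instance, corresponds to a linear ratio of about 2^0.9 ≈ 1.87; SAM differs mainly in adding a small constant to the denominator and estimating the false discovery rate by permutation.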
Ten colorectal cancer cell lines derived from different disease stages were included in this study: DLD-1, HCT-8, HCT-116, HT-29, LoVo, Ls174T, SK-CO-1, SW48, SW480, and SW620 (ATCC, Manassas, VA). Cells were cultivated in a 5% CO2 humidified atmosphere at 37°C in alpha minimum essential medium supplemented with 10% fetal bovine serum, 100 IU/ml penicillin and 100 µg/ml streptomycin (GIBCO-Invitrogen, CA). Two cell cultures were established for each cell line. The first culture was harvested upon reaching semi-confluence (50-60%). When cells in the second culture reached full confluence (determined both microscopically and macroscopically), the medium was replaced, and cells were harvested twenty-four hours later to prepare RNA from the growth-inhibited cells. Array experiments were carried out on RNA extracted from each cell culture. In addition, a second culturing experiment was done following the same procedure, and the extracted RNA was used for dye-reversed hybridizations.

EXAMPLE 2: Patients

Two cohorts of patients were analysed. Cohort A included 73 New Zealand colorectal cancer patients who underwent surgery at Dunedin and Auckland hospitals between 1995 and 2000. These patients were part of a prospective cohort study and included all disease stages. Tumour samples were collected fresh from the operating theatre, snap frozen in liquid nitrogen and stored at -80°C. Specimens were reviewed by a single pathologist (H-S Y), and tumours were staged according to the TNM system (34). Of the 73 patients, 32 developed disease recurrence and 41 remained recurrence-free after a minimum of five years of follow-up. The median overall survival was 29.5 and 66 months for recurrent and recurrence-free patients, respectively. Twenty patients received 5-FU-based post-operative adjuvant chemotherapy, and 12 patients received radiotherapy (7 pre-operative and 5 post-operative).
Cohort B included a group of 55 German colorectal cancer patients who underwent surgery at the Technical University of Munich between 1995 and 2001 and had fresh frozen samples stored in a tissue bank. All 55 had stage II disease; 26 developed disease recurrence (median survival 47 months) and 29 remained recurrence-free (median survival 82 months). None of the patients received chemotherapy or radiotherapy. Clinico-pathologic variables of both cohorts are summarised as part of Table 2.

Table 2: Clinico-pathologic parameters and their association with GPS expression and Ki-67 PI

Parameter | Patients, cohort A | Patients, cohort B | GPS p-value§, cohort A | GPS p-value§, cohort B | Ki-67 PI* (%), mean ± SD | Ki-67 PI* p-value§
Age†: < mean | 34 | 31 | 1 | 0.79 | 74.4 ± 17.9 | [illegible]
  ≥ mean | 39 | 24 | | | 77.9 ± 17.3 |
Sex: male | 35 | 33 | 0.16 | 1 | 77.3 ± 15.3 | 1
  female | 38 | 22 | | | 75.3 ± 19.5 |
Site‡: right side | 30 | 12 | 1 | 0.2 | 80.4 ± 13.3 | 0.2
  left side | 43 | 43 | | | 73.1 ± 19.1 |
Grade: well | 9 | 0 | 0.22 | 0.2 | 75.6 ± 18.1 | 0.98
  moderate | 50 | 33 | | | 73.9 ± 18.9 |
  poor | 14 | 22 | | | 84.3 ± 9.3 |
Dukes stage: A | 10 | 0 | 0.006 | NA | 78.8 ± 17.3 | 0.73
  B | 27 | 55 | | | 75.7 ± 18.4 |
  C | 28 | 0 | | | 76 ± 16.1 |
  D | 8 | 0 | | | 71.9 ± 22 |
T stage: T1 | 5 | 0 | 0.16 | 0.62 | 71.3 ± 22.4 | 0.16
  T2 | 11 | 11 | | | 85.4 ± 7.4 |
  T3 | 50 | 41 | | | 76 ± 17 |
  T4 | 7 | 3 | | | 66.2 ± 26.3 |
N stage: N0 | 38 | 55 | 0.03 | NA | 76.5 ± 17.9 | 1
  N1+N2 | 35 | 0 | | | 76 ± 17.4 |
Vascular invasion: yes | 5 | 1 | 0.67 | NA | 54.4 ± 31.5 | 0.32
  no | 68 | 54 | | | 78 ± 15 |
Lymphatic invasion: yes | 32 | 5 | 0.06 | 0.35 | [illegible] | [illegible]
  no | 41 | 50 | | | 75.1 ± 17.3 |
Lymphocyte infiltration: mild | 35 | 15 | 0.89 | 1 | 75 ± 18.6 | 0.85
  moderate | 27 | 25 | | | 79.4 ± 16.5 |
  prominent | 11 | 15 | | | 73.5 ± 18.3 |
Margin: infiltrative | 45 | NA | 0.47 | NA | 75.8 ± 18.9 | 1
  expansive | 28 | NA | | | 77.1 ± 15.7 |
Recurrence: yes | 32 | 26 | 0.03 | <0.001 | 71.6 ± 19 | 0.79
  no | 41 | 29 | | | 76.8 ± 16.2 |
Total | 73 | 55 | | | 76.3 ± 17.5 |

§ Fisher's exact test or the Kruskal-Wallis test was used, as appropriate, for testing association between clinico-pathologic parameters and GPS expression or Ki-67 PI.
* Ki-67 immunostaining was performed on tumour sections from cohort A patients.
‡ Proximal and distal to the splenic flexure, respectively.
† Average age 68 and 63 years for cohort A and B patients, respectively.
NA: not applicable.

EXAMPLE 3: Array preparation and gene expression analysis

Cohort A tumours and cell lines: Tissue samples and cell lines were homogenised, and RNA was extracted using Tri-Reagent (Progenz, Auckland, NZ). The RNA was then purified using an RNeasy mini column (Qiagen, Victoria, Australia) according to the manufacturer's protocol. Ten micrograms of total RNA extracted from each culture or tumour sample was oligo-dT primed, and cDNA synthesis was carried out in the presence of aa-dUTP and Superscript I RNase H- Reverse Transcriptase (Invitrogen). Cy dyes were incorporated into cDNA using the indirect amino-allyl cDNA labelling method. cDNA derived from a pool of 12 different cell lines was used as the reference for all hybridizations. The Cy5-dUTP-tagged cDNA from an individual colorectal cell line or tissue sample was combined with Cy3-dUTP-tagged cDNA from the reference sample. The mixture was then purified using a QiaQuick PCR Purification Kit (Qiagen, Victoria, Australia) and co-hybridized to a microarray spotted with the MWG 30K Oligo Set (MWG Biotech, NC). cDNA samples from the second culturing experiment were additionally analysed on microarrays using reverse labelling.

Arrays were scanned with a GenePix 4000B Microarray Scanner, and data were analysed using GenePix Pro 4.1 Microarray Acquisition and Analysis Software (Axon, CA). The foreground intensities from each channel were log2-transformed and normalised using the SNOMAD software (35). Normalised values were collated and filtered using BRB-ArrayTools Version 3.2 (developed by Dr. Richard Simon and Amy Peng Lam, Biometric Research Branch, National Cancer Institute). Low-intensity genes, and genes for which over 20% of measurements across tissue samples or cell lines were missing, were excluded from further analysis.
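The transformation and filtering steps just described (log2 transformation, then exclusion of genes with more than 20% of measurements missing) can be sketched in Python as follows. This is not SNOMAD or BRB-ArrayTools; the data layout, the function names and the optional `min_intensity` cut-off are hypothetical, since the text does not state the low-intensity threshold that was used.

```python
import math


def log2_transform(intensities):
    """log2-transform raw intensities; missing (None) or non-positive
    values become None."""
    return [math.log2(v) if v is not None and v > 0 else None for v in intensities]


def filter_genes(matrix, max_missing=0.20, min_intensity=None):
    """Drop genes with more than `max_missing` of their measurements missing.

    matrix: dict mapping gene -> list of log2 values (None = missing).
    min_intensity: hypothetical cut-off on the mean log2 value of the
    non-missing measurements; None disables intensity filtering.
    """
    kept = {}
    for gene, values in matrix.items():
        present = [v for v in values if v is not None]
        n_missing = len(values) - len(present)
        # Compare counts, so a gene with exactly 20% missing is kept.
        if n_missing > max_missing * len(values):
            continue
        if min_intensity is not None and (
            not present or sum(present) / len(present) < min_intensity
        ):
            continue
        kept[gene] = values
    return kept
```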
Cohort B tumours: Total RNA was extracted from each tumour using the RNeasy Mini Kit and purified on RNeasy columns (Qiagen, Hilden, Germany). Ten micrograms of total RNA was used to synthesize double-stranded cDNA with SuperScript II reverse transcriptase (GIBCO-Invitrogen, NY) and an oligo-dT-T7 primer (Eurogentec, Koeln, Germany). Biotinylated cRNA was synthesized from the double-stranded cDNA using the Promega RiboMax T7 kit (Promega, Madison, WI) and Biotin-NTP labelling mix (Loxo, Dossenheim, Germany). The biotinylated cRNA was then purified and fragmented. The fragmented cRNA was hybridized to Affymetrix HG-U133A GeneChips (Affymetrix, Santa Clara, CA) and stained with streptavidin-phycoerythrin. The arrays were then scanned with an HP argon-ion laser confocal microscope, and the digitized image data were processed using the Affymetrix® Microarray Suite 5.0 software. All Affymetrix U133A GeneChips passed quality control to eliminate scans with abnormal characteristics. Background correction and normalization were performed in the R computing environment using the robust multi-array average (RMA) function implemented in the Bioconductor package affy.

EXAMPLE 4: Quantitative real-time PCR (QPCR)

The expression of eleven genes (MAD2L1, POLE2, CDC2, MCM6, MCM7, RNASEH2A, TOPK, KPNA2, G22P1, PCNA, and GMNN) was validated using the cDNA from the cell cultures. Total RNA (2 µg) was reverse transcribed using the SuperScript II RNase H- Reverse Transcriptase kit (Invitrogen) and oligo-dT primer (Invitrogen). QPCR was performed on an ABI Prism 7900HT Sequence Detection System (Applied Biosystems) using TaqMan Gene Expression Assays (Applied Biosystems). Relative fold changes were calculated using the 2^-ΔΔCT method (36), with Topoisomerase 3A as the internal control. Reference RNA was used as the calibrator to enable comparison between different experiments.
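The 2^-ΔΔCT calculation referred to above normalises the target gene's threshold cycle (CT) first to the internal control gene (Topoisomerase 3A in this study) and then to the calibrator (reference RNA): ΔCT = CT(target) - CT(control), ΔΔCT = ΔCT(sample) - ΔCT(calibrator), and relative expression = 2^-ΔΔCT. A minimal Python sketch, assuming roughly 100% amplification efficiency (an assumption built into the method); the function name and the CT values in the check are made up.

```python
def relative_expression(ct_target, ct_control, ct_target_cal, ct_control_cal):
    """Relative expression of a target gene by the 2^-ΔΔCT method.

    ct_target, ct_control: CT values of the target gene and the internal
    control gene in the sample; ct_target_cal, ct_control_cal: the same two
    genes in the calibrator (reference RNA). Assumes ~100% PCR efficiency,
    i.e. the product doubles every cycle.
    """
    delta_ct_sample = ct_target - ct_control
    delta_ct_calibrator = ct_target_cal - ct_control_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)


# Hypothetical CT values: the same calibrator is used for both cultures,
# so dividing their relative expressions gives the semi/full fold change.
semi = relative_expression(23.0, 20.0, 26.0, 20.0)
full = relative_expression(24.0, 20.0, 26.0, 20.0)
fold_change = semi / full
```

Because each culture is expressed relative to the same reference RNA, the calibrator terms cancel when two cultures are compared, which is what allows comparison between separate QPCR experiments.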
EXAMPLE 5: Immunohistochemical analysis

Immunohistochemical expression of the Ki-67 antigen (MIB-1; DakoCytomation, Denmark) was investigated on 4 µm sections of 73 paraffin-embedded primary colorectal tumours from Cohort A. Endogenous peroxidase activity was blocked with 0.3% hydrogen peroxide in methanol, and antigens were retrieved in boiling citrate buffer (pH 6). Non-specific binding sites were blocked with 5% normal goat serum containing 1% BSA. Primary antibody (dilution 1:50) was detected using the EnVision system (Dako EnVision, CA) and the DAB substrate kit (Vector Laboratories, CA). Five high-power fields were selected using a 10 × 10 microscope grid, and cell counts were performed manually in a blinded fashion without knowledge of the clinico-pathologic data. The Ki-67 proliferation index (PI) was expressed as the percentage of positively stained nuclei for each tumour.

EXAMPLE 6: Statistical analysis

Statistical analyses were performed using SPSS® version 14.0.0 (SPSS Inc., Chicago, IL). Ki-67 proliferation indices were presented as mean ± SD. Fisher's exact test or the Kruskal-Wallis test was used to evaluate the differences between groups categorised by GPS expression or Ki-67 PI with respect to the clinico-pathologic parameters. A P value ≤ 0.05 was considered significant. Overall survival (OS) and recurrence-free survival (RFS) were plotted using the method of Kaplan and Meier (37). A log-rank test was used to test for differences in survival time between the categorised groups. Relative risk and associated confidence intervals were also estimated for each variable using the Cox univariate model, and a multivariate Cox proportional hazards model was developed using forward stepwise regression with the predictive variables that were significant in the univariate analysis. The K-means clustering method was used to classify clinical samples based on the expression level of the GPS.
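The Kaplan-Meier estimator named in Example 6 multiplies, at each distinct event time t, the running survival probability by (1 - d/n), where d is the number of events at t and n the number of patients still at risk; censored patients leave the risk set without contributing an event. A minimal Python sketch (not the SPSS implementation), together with the 60-month administrative censoring applied in Example 10; the function names and follow-up data in the check are made up.

```python
def censor_at(times, events, horizon=60.0):
    """Administratively censor follow-up at `horizon` months: events after
    the horizon are treated as censored at the horizon."""
    new_times = [min(t, horizon) for t in times]
    new_events = [e if t <= horizon else 0 for t, e in zip(times, events)]
    return new_times, new_events


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up in months; events: 1 = event (death or recurrence),
    0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_censored = 0
        # Group every patient who shares this follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after t.
        at_risk -= n_events + n_censored
    return curve
```

Patients censored at the same time as an event are still counted as at risk at that time, which is the usual convention for the estimator.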
EXAMPLE 7: Identification of a gene proliferation signature (GPS) using a colorectal cell line model

An overview of the approach used to derive and apply a gene proliferation signature (GPS) is summarised in FIG. 1. The GPS, comprising 38 mitotic cell cycle genes (Table C), was relatively over-expressed in cycling cells in semi-confluent cultures. Low proliferation, defined by low GPS expression, was associated with unfavourable clinico-pathologic variables and with shorter overall and recurrence-free survival (p<0.05). No association was found between the Ki-67 proliferation index and clinico-pathologic variables or clinical outcome.

Table C: GCPMs for cell proliferation signature

Unique ID | Average fold change (EP/SP) | Gene symbol | Gene name | GenBank Acc. No. | Gene aliases
A:05382 | 1.91 | CDC2 | cell division cycle 2, G1 to S and G2 to M | NM_001786, NM_033379 | CDK1; MGC111195; DKFZp686L20222
[illegible] | 1.89 | MCM6 | MCM6 minichromosome maintenance deficient 6 (MIS5 homolog, S. pombe) (S. cerevisiae) | NM_005915 | Mis5; P105MCM; MCG40308
A:00231 | 1.75 | RPA3 | replication protein A3, 14kDa | NM_002947 | REPA3
B:7620 | 1.69 | MCM7 | MCM7 minichromosome maintenance deficient 7 (S. cerevisiae) | NM_005916, NM_182776 | MCM2; CDC47; P85MCM; P1CDC47; PNAS-146; CDABP0042; P1.1-MCM3
A:03715 | 1.68 | PCNA | proliferating cell nuclear antigen | NM_002592, NM_182649 | MGC8367
B:9714 | 1.59 | XRCC6 | X-ray repair complementing defective repair in Chinese hamster cells 6 (Ku autoantigen, 70kDa) | NM_001469 | ML8; KU70; TLAA; CTC75; CTCBF; G22P1
B:4036 | 1.56 | KPNA2 | karyopherin alpha 2 (RAG cohort 1, importin alpha 1) | NM_002266 | QIP2; RCH1; IPOA1; SRP1alpha
A:05280 | 1.56 | ANLN | anillin, actin binding protein | NM_018685 | scra; Scraps; ANILLIN; DKFZp779A055
A:04760 | 1.52 | APG7L | ATG7 autophagy related 7 homolog (S. cerevisiae) | NM_006395 | GSA7; APG7L; DKFZp434N0735; ATG7
A:03912 | 1.52 | PBK | PDZ binding kinase | NM_018492 | SPK; TOPK; Nori-3; FLJ14385
A:03435 | 1.51 | GMNN | geminin, DNA replication inhibitor | NM_015895 | Gem; RP3-369A17.3
A:09802 | 1.51 | RRM1 | ribonucleotide reductase M1 polypeptide | NM_001033 | R1; RR1; RIR1
A:09331 | 1.49 | CDC45L | CDC45 cell division cycle 45-like (S. cerevisiae) | NM_003504 | CDC45; CDC45L2; PORC-PI-1
A:06387 | 1.46 | MAD2L1 | MAD2 mitotic arrest deficient-like 1 (yeast) | NM_002358 | MAD2; HSMAD2
[illegible] | 1.45 | RAN | RAN, member RAS oncogene family | NM_006325 | TC4; Gsp1; ARA24
A:07296 | 1.43 | DUT | dUTP pyrophosphatase | NM_001025248, NM_001025249, NM_001948 | dUTPase; FLJ20622
B:3501 | 1.42 | RRM2 | ribonucleotide reductase M2 polypeptide | NM_001034 | R2; RR2M
A:09842 | 1.41 | CDK7 | cyclin-dependent kinase 7 (MO15 homolog, Xenopus laevis, cdk-activating kinase) | NM_001799 | CAK1; STK1; CDKN7; p39MO15
A:09724 | 1.40 | MLH3 | mutL homolog 3 (E. coli) | NM_001040108, NM_014381 | HNPCC7; MGC138372
A:05648 | 1.39 | SMC4 | structural maintenance of chromosomes 4 | NM_001002799, NM_001002800, NM_005496 | CAPC; SMC4L1; hCAP-C
A:09436 | 1.39 | SMC3 | structural maintenance of chromosomes 3 | NM_005445 | BAM; BMH; HCAP; CSPG6; SMC3L1
A:02929 | 1.39 | POLD2 | polymerase (DNA directed), delta 2, regulatory subunit 50kDa | NM_006230 | None
A:04680 | 1.38 | POLE2 | polymerase (DNA directed), epsilon 2 (p59 subunit) | NM_002692 | DPE2
B:8449 | 1.38 | BCCIP | BRCA2 and CDKN1A interacting protein | NM_016567, NM_078468, NM_078469 | TOK-1
B:1035 | 1.37 | GINS2 | GINS complex subunit 2 (Psf2 homolog) | NM_016095 | PSF2; Pfs2; HSPC037
B:7247 | 1.37 | TREX1 | three prime repair exonuclease 1 | NM_016381, NM_032166, NM_033627, NM_033628, NM_033629, NM_130384 | AGS1; DRN3; ATRIP; FLJ12343; DKFZp434J0310
A:09747 | 1.35 | BUB3 | BUB3 budding uninhibited by benzimidazoles 3 homolog (yeast) | NM_001007793, NM_004725 | BUB3L; hBUB3
B:9065 | 1.32 | FEN1 | flap structure-specific endonuclease 1 | NM_004111 | MF1; RAD2; FEN-1
B:2392 | 1.32 | DBF4B | DBF4 homolog B (S. cerevisiae) | NM_025104, NM_145663 | DRF1; ASKL1; FLJ13087; MGC15009
A:09401 | 1.31 | PREI3 | preimplantation protein 3 | NM_015387, NM_199482 | 2C4D; MOB1; MOB3; CGI-95; MGC12264
[illegible] | 1.30 | CCNE1 | cyclin E1 | NM_001238, NM_057182 | CCNE
A:10597 | 1.30 | RPA1 | replication protein A1, 70kDa | NM_002945 | HSSB; RF-A; RP-A; REPA1; RPA70
A:02209 | 1.29 | POLE3 | polymerase (DNA directed), epsilon 3 (p17 subunit) | NM_017443 | p17; YBL1; CHRAC17; CHARAC17
A:09921 | 1.26 | RFC4 | replication factor C (activator 1) 4, 37kDa | NM_002916, NM_181573 | A1; RFC37; MGC27291
A:08668 | 1.26 | MCM3 | MCM3 minichromosome maintenance deficient 3 (S. cerevisiae) | NM_002388 | HCC5; P1.h; RLFB; MGC1157; P1-MCM3
[illegible] | 1.25 | CHEK1 | CHK1 checkpoint homolog (S. pombe) | NM_001274 | CHK1
A:09020 | 1.22 | CCND1 | cyclin D1 | NM_053056 | BCL1; PRAD1; U21B31; D11S287E
A:03486 | 1.22 | CDC37 | CDC37 cell division cycle 37 homolog (S. cerevisiae) | NM_007065 | P50CDC37

The GPS was identified as a subset of genes whose expression correlates with CRC cell proliferation rate. Significance Analysis of Microarrays (SAM; Reference 38) was used to identify genes differentially expressed (DE) between exponentially growing (semi-confluent) and non-cycling (fully confluent) CRC cell lines (FIG. 1, stage 1). To adjust for gene-specific dye bias and other sources of variation, each culture set was analysed independently. Analyses were limited to 502 DE genes for which a significant expression difference was observed between the two growth stages in both sets of cultures (false discovery rate < 1%). Gene Ontology (GO) analysis was carried out using EASE (39) to identify the biological process categories that were significantly reflected in the DE genes. Cell-proliferation-related categories were over-represented, mainly due to genes up-regulated in exponentially growing cells. The mitotic cell cycle category (GO:0000278) was defined as the GPS because (i) this biological process was the most over-represented GO term (EASE score = 5.52 × 10^-11); and (ii) all 38 mitotic cell cycle genes (Table C) were expressed at higher levels in rapidly growing compared with growth-inhibited cells. The expression of eleven genes from the GPS was assessed by QPCR and correlated with the corresponding values obtained from the array data. QPCR thus confirmed that elevated expression of the proliferation signature genes correlates with increased proliferation in CRC cell lines (FIG. 5).
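The GO over-representation test behind the EASE analysis is, at its core, a one-sided Fisher exact (hypergeometric) test: with N genes on the array, K of them annotated to a category, and n differentially expressed genes of which k fall in that category, the p-value is P(X ≥ k). EASE additionally applies a conservative jackknife-style penalty. The Python sketch below approximates that penalty by recomputing the tail with one fewer category hit; this is an assumption for illustration, not a reproduction of the EASE software, and the function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K category genes, n drawn).

    This equals the one-sided Fisher exact p-value for over-representation.
    """
    upper = min(K, n)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / comb(N, n)


def ease_like_score(k, N, K, n):
    """Conservative score: the same tail probability computed with one
    fewer category hit, so it is never smaller than the plain p-value."""
    return hypergeom_upper_tail(max(k - 1, 0), N, K, n)
```

For example, with 10 genes on a toy array, 4 in the category and 3 differentially expressed genes all in the category, the plain p-value is 4/120 while the penalised score is 40/120, showing how strongly the penalty discounts categories supported by only a few genes.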
EXAMPLE 8: Classification of CRC samples according to the expression level of the gene proliferation signature

In order to examine the relative proliferation state of CRC tumours and the utility of the GPS for clinical application, CRC tumours from two cohorts were stratified into two clusters based on the expression of the GPS (FIG. 1, stage 2). Expression values of the 38 genes defining the GPS were first obtained from the microarray-generated expression profiles of the tumours. Tumours from each cohort were then separately classified into two clusters (K=2) based on the similarity of their GPS expression levels, using K-means unsupervised clustering. Analysis of DE genes between the two defined clusters using all filtered genes revealed that the GPS was contained within the list of genes up-regulated in cluster 1 (FIG. 2A, upper panel) relative to cluster 2 (lower panel) in both cohorts. Thus, the tumours in cluster 1 are characterised by high GPS expression, while the tumours in cluster 2 are characterised by low GPS expression.

EXAMPLE 9: Low gene proliferation signature is associated with unfavourable clinico-pathologic variables

Table 2 summarises the association between GPS expression levels and clinico-pathologic variables. An association was observed between low proliferation activity, defined by low GPS expression, and an increased risk of recurrence in both cohorts (P=0.03 and <0.001 for Cohort A and B, respectively). In Cohort A, low GPS expression was also associated with higher disease stage and lymph node metastasis (P=0.006 and 0.03, respectively). In addition, tumours with lymphatic invasion from Cohort A tended to be less proliferative than tumours without lymphatic invasion, albeit without reaching statistical significance (P=0.06). No association was found between GPS expression level and tumour site, age, sex, degree of differentiation, T-stage, vascular invasion, degree of lymphocyte infiltration or tumour margin.
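Example 8 dichotomises tumours by K-means (K = 2) on the expression values of the 38 GPS genes and then labels the cluster in which the GPS is up-regulated as "high GPS". A minimal Python sketch of that idea uses plain Euclidean K-means (Lloyd's algorithm). The random initialisation, the iteration cap and the rule of naming the cluster with the higher mean signature expression as "high" are assumptions; the document identifies the high-GPS cluster by differential expression analysis instead.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(X, k=2, max_iter=100, seed=0):
    """Lloyd's K-means; returns one cluster label per sample."""
    rng = random.Random(seed)
    centroids = [list(x) for x in rng.sample(X, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(x, centroids[c]))
            for x in X
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def gps_groups(X, seed=0):
    """Label each tumour 'high' or 'low' GPS.

    X: one row per tumour, one column per signature gene (e.g. log2 ratios).
    """
    labels = kmeans(X, 2, seed=seed)
    cluster_means = {}
    for c in (0, 1):
        row_means = [sum(x) / len(x) for x, lab in zip(X, labels) if lab == c]
        cluster_means[c] = sum(row_means) / len(row_means) if row_means else float("-inf")
    high = max(cluster_means, key=cluster_means.get)
    return ["high" if lab == high else "low" for lab in labels]
```

K-means cluster numbers are arbitrary, so some rule for naming the clusters is always needed after clustering; comparing mean signature expression is simply the most direct one.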
EXAMPLE 10: Gene proliferation signature predicts clinical outcome

To examine the performance of the GPS in predicting patient outcome, Kaplan-Meier survival analysis was used to compare RFS and OS between low- and high-GPS tumours (FIG. 3). All patients were censored at 60 months post-operation. In colorectal cancer Cohort A, OS and RFS were shorter in patients with low GPS expression (log-rank test P=0.04 and P=0.01, respectively). In colorectal cancer Cohort B, low GPS expression was also associated with decreased OS (P=0.0004) and RFS (P=0.0002). The parameters that predicted OS and RFS in univariate analysis were then investigated in a multivariate model. In Cohort A, disease stage was the only independent predictor of 5-year OS, while disease stage and T-stage were independent predictors of RFS. In Cohort B, low GPS expression and lymphatic invasion each contributed independently to both OS and RFS. When survival analysis was limited to Cohort B patients without lymphatic invasion, low GPS expression was still associated with shorter OS and RFS, confirming the independence of the GPS as a predictor. Analyses of single- and multiple-variable associations with survival are summarised in Table 3.

Low GPS expression was also associated with decreased 5-year overall survival in patients with gastric cancer (P=0.008). A Kaplan-Meier survival plot comparing the overall survival of low- and high-GPS gastric tumours is shown in FIG. 4.

Table 3: Uni- and multivariate analysis of prognostic factors for OS and RFS in both cohorts

Parameter | OS, univariate: HR* (95% CI), p | OS, multivariate§: HR* (95% CI), p | RFS, univariate: HR* (95% CI), p | RFS, multivariate§: HR* (95% CI), p
Cohort A
Dukes stage | [?] | [?] | [?] | [?]
T-stage | [?] | [?] | [?] | [?]
N stage | [?] | [?] | [?] | [?]
Lymphatic invasion (+ vs. -) | 0.16 (0.07-0.36), <0.001 | - | 0.2 (0.09-0.43), 0.001 | -
Margin (infiltrative vs. expansive) | [?] (1.7-[?]), 0.002 | - | [?] (1.4-[?]), 0.008 | -
GPS expression (low vs. high) | 0.46 (0.2-0.9), 0.037 | - | 0.33 (0.14-0.78), 0.01 | -
Cohort B
Lymphatic invasion (+ vs. -) | [?] ([?]-0.78), 0.006 | [?] ([?]-0.9), 0.037 | [?] (0.08-0.63), 0.005 | [?] (0.1-0.77), 0.014
GPS expression (low vs. high) | 0.23 (0.06-0.81), 0.022 | [?] (0.07-0.89), 0.032 | [?] (0.09-0.67), 0.006 | [?] (0.1-0.73), 0.010

* Hazard ratio (HR) determined by Cox regression model; confidence interval (CI) = 95%.
§ Final results of Cox regression analysis using a forward stepwise method (enter limit = 0.05, remove limit = 0.10).
[?] Value illegible in the source; - no value reported.

EXAMPLE 11: Ki-67 is not associated with clinico-pathologic variables or survival

Ki-67 immunostaining was performed on tissue sections from Cohort A tumours only, because paraffin-embedded samples were unavailable for Cohort B (FIG. 1, stage 3). Nuclear staining was detected in all 73 CRC tumours. Ki-67 PI ranged from 25% to 96%, with a mean value of 76.3±17.5%. Using the mean Ki-67 value as a cut-off point, tumours were assigned to two groups with low or high PI. Ki-67 PI was associated neither with clinico-pathologic variables (Table 2) nor with survival (FIG. 3). When the survival analysis was limited to the patients with the highest and lowest Ki-67 values, no statistical difference was observed (data not shown). Taken together, these results indicate that low expression of growth-related genes is associated with poor outcome in colorectal cancer, and that Ki-67 was not sensitive enough to detect this association. These findings can be used as additional criteria for identifying patients at high risk of early death from cancer.
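The survival comparisons in Examples 10 and 11 rest on three standard steps. The first is administrative censoring at 60 months. The second is the Kaplan-Meier product-limit estimate, and the third is a two-group log-rank test. The sketch below implements them from first principles on hypothetical toy data. It is illustrative only and is not the software used in the study. It also shows the mean cut-off used to split tumours by Ki-67 PI, assuming (the text does not say) that values strictly above the mean count as "high".

```python
import math

def censor_at(times, events, limit=60.0):
    """Censor follow-up at `limit` months; events after the limit become censored."""
    capped = [(min(t, limit), e if t <= limit else 0) for t, e in zip(times, events)]
    return [t for t, _ in capped], [e for _, e in capped]

def kaplan_meier(times, events):
    """Product-limit survival estimate as a list of (event_time, S(t))."""
    curve, surv = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve

def logrank(t1, e1, t2, e2):
    """Two-group log-rank test; returns (chi-square, p) with 1 degree of freedom."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(t1 + t2, e1 + e2) if e}):
        n1 = sum(1 for u in t1 if u >= t)
        n2 = sum(1 for u in t2 if u >= t)
        d1 = sum(1 for u, e in zip(t1, e1) if u == t and e)
        d = d1 + sum(1 for u, e in zip(t2, e2) if u == t and e)
        n = n1 + n2
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            var += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    if var == 0:  # no events, so no evidence of a difference
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))

def above_mean(values):
    """Mean cut-off split, e.g. for Ki-67 PI; True means above the mean (assumed 'high')."""
    cut = sum(values) / len(values)
    return [v > cut for v in values]

# Hypothetical toy data: months of follow-up and event flags (1 = recurrence).
low_t, low_e = censor_at([5, 10, 12, 20, 70], [1, 1, 1, 0, 1])
high_t, high_e = censor_at([15, 30, 65, 80, 90], [1, 0, 1, 0, 1])
low_curve = kaplan_meier(low_t, low_e)
chi2, p = logrank(low_t, low_e, high_t, high_e)
ki67_high = above_mean([25, 60, 80, 90, 96])
```

Censoring before estimation matters here. An event recorded after 60 months is treated as censored at 60 months, so it contributes to the risk set but not to the event count.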
EXAMPLE 12: Selection of correlated cell proliferation genes

Tumours from Cohort B (55 German CRC patients; Table 2) were first classified into low and high proliferation groups using the 38-gene cell proliferation signature (Table C) and the K-means clustering method (Pearson uncentered, 1000 permutations, threshold of occurrence in the same cluster set at 80%). Significance Analysis of Microarrays (SAM) was then applied, with all filtered genes (16041 genes) included in the analysis, to identify genes differentially expressed between the low and high proliferation groups (FDR=0). 754 genes were found to be over-expressed in the high proliferation group. The GATHER gene ontology program was then used to identify the most over-represented gene ontology categories within the list of differentially expressed genes; cell cycle was the most over-represented category. The 102 cell cycle genes that are differentially expressed between the low and high proliferation groups (in addition to the original 38-gene signature) are shown in Table D.
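SAM ranks genes by a moderated two-sample statistic, d_i = (mean_high,i - mean_low,i) / (s_i + s0), and estimates the false discovery rate by permuting the group labels (38). The sketch below is a simplified version under stated assumptions. The fudge factor s0 is taken as the median of the s_i, whereas SAM itself chooses s0 to minimise the coefficient of variation of d. Only up-regulated genes are called, using a fixed cut-off. The FDR is estimated as the median number of permuted calls divided by the observed number, with no correction for the proportion of true nulls. Data and names are hypothetical, and this is not the SAM software used in Example 12.

```python
import math
import random
import statistics

def _diff_and_s(row, labels):
    """Mean difference (group 1 - group 0) and pooled gene-specific scatter s_i."""
    g1 = [x for x, lab in zip(row, labels) if lab == 1]
    g0 = [x for x, lab in zip(row, labels) if lab == 0]
    n1, n0 = len(g1), len(g0)
    m1, m0 = statistics.fmean(g1), statistics.fmean(g0)
    ss = sum((x - m1) ** 2 for x in g1) + sum((x - m0) ** 2 for x in g0)
    return m1 - m0, math.sqrt((1 / n1 + 1 / n0) / (n1 + n0 - 2) * ss)

def sam_d(expr, labels, s0=None):
    """Relative difference d_i per gene (row); s0 defaults to the median s_i (assumption)."""
    stats = [_diff_and_s(row, labels) for row in expr]
    if s0 is None:
        s0 = statistics.median(s for _, s in stats)
    return [diff / (s + s0) for diff, s in stats], s0

def call_upregulated(expr, labels, cutoff, n_perm=1000, seed=0):
    """Indices of genes with d_i >= cutoff and a permutation-based FDR estimate."""
    d_obs, s0 = sam_d(expr, labels)
    called = [i for i, d in enumerate(d_obs) if d >= cutoff]
    rng = random.Random(seed)
    perm = list(labels)
    false_calls = []
    for _ in range(n_perm):
        rng.shuffle(perm)  # group sizes stay fixed
        d_perm, _ = sam_d(expr, perm, s0)  # s0 held at its observed value
        false_calls.append(sum(1 for d in d_perm if d >= cutoff))
    fdr = statistics.median(false_calls) / len(called) if called else 0.0
    return called, fdr

# Hypothetical toy data: 4 genes x 6 tumours; label 1 = high, 0 = low proliferation.
labels = [1, 1, 1, 0, 0, 0]
expr = [
    [3.1, 2.9, 3.0, 1.0, 1.1, 0.9],  # strongly up in the high group
    [1.0, 1.2, 0.8, 1.1, 0.9, 1.0],  # no difference
    [0.5, 0.7, 0.6, 0.6, 0.4, 0.8],  # no difference
    [2.0, 2.2, 2.1, 1.0, 1.2, 1.1],  # up in the high group
]
called, fdr = call_upregulated(expr, labels, cutoff=3.0)
```

Adding s0 to the denominator keeps genes with a tiny variance, and therefore an inflated t-like ratio, from dominating the ranking. This is the main difference from a plain two-sample t statistic.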
Table D: Cell Cycle Genes that are Differentially Expressed in Low and High Proliferation

Gene title | Gene symbol | Chromosomal location | Probe set ID | Representative public ID
asp (abnormal spindle) homolog, microcephaly associated (Drosophila) | ASPM | chr1q31 | 219918_s_at | NM_018123
aurora kinase A | AURKA | chr20q13.2-q13.3 | 204092_s_at; 208079_s_at | NM_003600; NM_003158
aurora kinase B | AURKB | chr17p13.1 | 209464_at | AB011446
baculoviral IAP repeat-containing 5 (survivin) | BIRC5 | chr17q25 | 202094_at; 202095_s_at; 210334_x_at | AA648913; NM_001168; AB028869
Bloom syndrome | BLM | chr15q26.1 | 205733_at | NM_000057
breast cancer 1, early onset | BRCA1 | chr17q21 | 204531_s_at; 211851_x_at | NM_007295; AF005068
BUB1 budding uninhibited by benzimidazoles 1 homolog (yeast) | BUB1 | chr2q14 | 209642_at; 215509_s_at | AF043294; AL137654
BUB1 budding uninhibited by benzimidazoles 1 homolog beta (yeast) | BUB1B | chr15q15 | 203755_at | NM_001211
cyclin A2 | CCNA2 | chr4q25-q31 | 203418_at; 213226_at | NM_001237; AI346350
cyclin B1 | CCNB1 | chr5q12 | 214710_s_at | BE407516
cyclin B2 | CCNB2 | chr15q22.2 | 202705_at | NM_00471
cyclin E2 | CCNE2 | chr8q22.1 | 205034_at; 211814_s_at | NM_004702; AF112857
cyclin F | CCNF | chr16p13.3 | 204826_at; 204827_s_at | NM_001761; U17105
cyclin J | CCNJ | chr10pter-q26.12 | 219470_x_at | NM_9084
cyclin T2 | CCNT2 | chr2q21.3 | 204645_at | NM_001241
chaperonin containing TCP1, subunit 2 (beta) | CCT2 | chr12q15 | 201946_s_at | AL545982
cell division cycle 20 homolog (S. cerevisiae) | CDC20 | chr1p34.1 | 202870_s_at | NM_001255
cell division cycle 25 homolog A (S. pombe) | CDC25A | chr3p21 | 204695_at | AI343459
cell division cycle 25 homolog C (S. pombe) | CDC25C | chr5q31 | 205167_s_at; 217010_s_at | NM_001790; AF277724
cell division cycle 27 homolog (S. cerevisiae) | CDC27 | chr17q12-q23.2 | 217879_at | AL566824
cell division cycle 6 homolog (S. cerevisiae) | CDC6 | chr17q21.3 | 203968_s_at | NM_001254
cyclin-dependent kinase 2 | CDK2 | chr12q13 | 204252_at; 211804_s_at | M68520; AB012305
cyclin-dependent kinase 4 | CDK4 | chr12q14 | 202246_s_at | NM_000075
cyclin-dependent kinase inhibitor 3 (CDK2-associated dual specificity phosphatase) | CDKN3 | chr14q22 | 209714_s_at | AF213033
chromatin licensing and DNA replication factor 1 | CDT1 | chr16q24.3 | 209832_s_at | AF321125
centromere protein E, 312kDa | CENPE | chr4q24-q25 | 205046_at | NM_001813
centromere protein F, 350/400ka (mitosin) | CENPF | chr1q32-q41 | 207828_s_at; 209172_s_at | NM_005196; U30872
chromatin assembly factor 1, subunit A (p150) | CHAF1A | chr19p13.3 | 203975_s_at; 203976_s_at; 214426_x_at | BF000239; NM_005483; BF062223
CHK2 checkpoint homolog (S. pombe) | CHEK2 | chr22q11, 22q12.1 | 210416_s_at | BC004207
CDC28 protein kinase regulatory subunit 1B | CKS1B | chr1q21.2 | 201897_s_at | NM_001826
CDC28 protein kinase regulatory subunit 2 | CKS2 | chr9q22 | 204170_s_at | NM_001827
DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 11 (CHL1-like helicase homolog, S. cerevisiae) | DDX11 | chr12p11 | 210206_s_at | U33833
extra spindle pole bodies homolog 1 (S. cerevisiae) | ESPL1 | chr2q | 38158_at | D79987
exonuclease 1 | EXO1 | chr1q42-q43 | 204603_at | NM_003686
fumarate hydratase | FH | chr1q42.1 | 203032_s_at | AI363836
fyn-related kinase | FRK | chr6q21-q22.3 | 207178_s_at | NM_002031
G-2 and S-phase expressed 1 | GTSE1 | chr22q13.2-q13.3 | 204318_s_at; 215942_s_at | NM_016426; BF973178
high mobility group AT-hook 1 | HMGA1 | chr6p21 | 206074_s_at | NM_002131
high-mobility group box 2 | HMGB2 | chr4q31 | 208808_s_at | BC000903
interleukin enhancer binding factor 3, 90kDa | ILF3 | chr19p13.2 | 208931_s_at; 211375_s_at | AF147209; AF141870
kinesin family member 11 | KIF11 | chr10q24.1 | 204444_at | NM_004523
kinesin family member 22 | KIF22 | chr16p11.2 | 202183_s_at; 216969_s_at | NM_007317; AC002301
kinesin family member 23 | KIF23 | chr15q23 | 204709_s_at | NM_004856
kinesin family member 2C | KIF2C | chr1p34.1 | 209408_at; 211519_s_at | U63743; AY026505
kinesin family member C1 | KIFC1 | chr6p21.3 | 209680_s_at | BC000712
kinetochore associated 1 | KNTC1 | chr12q24.31 | 206316_s_at | NM_014708
ligase I, DNA, ATP-dependent | LIG1 | chr19q13.2-q13.3 | 202726_at | NM_000234
mitogen-activated protein kinase 1 | MAPK1 | chr22q11.2, 22q11.21 | 208351_s_at | NM_002745
minichromosome maintenance complex component 2 | MCM2 | chr3q21 | 202107_s_at | NM_004526
minichromosome maintenance complex component 4 | MCM4 | chr8q11.2 | 212141_at; 212142_at; 222036_s_at; 222037_at | AA604621; AI936566; AI859865; AI859865
minichromosome maintenance complex component 5 | MCM5 | chr22q13.1 | 201755_at; 216237_s_at | NM_006739; AA807529
antigen identified by monoclonal antibody Ki-67 | MKI67 | chr10q25-qter | 212020_s_at; 212021_s_at; 212022_s_at; 212023_s_at | AU152107; AU132185; BF001806; AU147044
M-phase phosphoprotein 1 | MPHOSPH1 | chr10q23.31 | 205235_s_at | NM_016195
M-phase phosphoprotein 9 | MPHOSPH9 | chr12q24.31 | 206205_at | NM_022782
mutS homolog 6 (E. coli) | MSH6 | chr2p16 | 202911_at; 211450_at | NM_000179; D89646
non-SMC condensin I complex, subunit D2 | NCAPD2 | chr12p13.3 | 201774_s_at | AK022511
non-SMC condensin I complex, subunit G | NCAPG | chr4p15.33 | 218662_s_at; 218663_at | NM_022346; NM_022346
non-SMC condensin I complex, subunit H | NCAPH | chr2q11.2 | 212949_at | D38553
NDC80 homolog, kinetochore complex component (S. cerevisiae) | NDC80 | chr18p11.32 | 204162_at | NM_006101
NIMA (never in mitosis gene a)-related kinase 2 | NEK2 | chr1q32.2-q41 | 204641_at; 211080_s_at | NM_002497; Z25425
NIMA (never in mitosis gene a)-related kinase 4 | NEK4 | chr3p21.1 | 204634_at | NM_003157
non-metastatic cells 1, protein (NM23A) expressed in | NME1 | chr17q21.3 | 201577_at | NM_000269
nucleolar and coiled-body phosphoprotein 1 | NOLC1 | chr10q24.32 | 205895_s_at | NM_004741
nucleophosmin (nucleolar phosphoprotein B23, numatrin) | NPM1 | chr5q35 | 221691_x_at; 221923_s_at | AB042278; AA191576
nucleoporin 98kDa | NUP98 | chr11p15.5 | 203194_s_at | AA527238
origin recognition complex, subunit 1-like (yeast) | ORC1L | chr1p32 | 205085_at | NM_004153
origin recognition complex, subunit 4-like (yeast) | ORC4L | chr2q22-q23 | 203351_s_at | AF047598
origin recognition complex, subunit 6 like (yeast) | ORC6L | chr16q12 | 219105_x_at | NM_014321
protein kinase, membrane associated tyrosine/threonine 1 | PKMYT1 | chr16p13.3 | 204267_x_at | NM_004203
polo-like kinase 1 (Drosophila) | PLK1 | chr16p12.1 | 202240_at | NM_005030
polo-like kinase 4 (Drosophila) | PLK4 | chr4q28 | 204886_at; 204887_s_at; 211088_s_at | AL043646; NM_014264; Z25433
PMS1 postmeiotic segregation increased 1 (S. cerevisiae) | PMS1 | chr2q31-q33, 2q31.1 | 213677_s_at | BG434893
polymerase (DNA directed), theta | POLQ | chr3q13.33 | 219510_at | NM_006596
protein phosphatase 1D magnesium-dependent, delta isoform | PPM1D | chr17q23.2 | 204566_at | NM_003620
protein phosphatase 2 (formerly 2A), regulatory subunit A, beta isoform | PPP2R1B | chr11q23.2 | 202886_s_at | M65254
protein phosphatase 6, catalytic subunit | PPP6C | chr9q33.3 | 206174_s_at | NM_002721
protein regulator of cytokinesis 1 | PRC1 | chr15q26.1 | 218009_s_at | NM_003981
primase, DNA, polypeptide 1 (49kDa) | PRIM1 | chr12q13 | 205053_at | NM_000946
primase, DNA, polypeptide 2 (58kDa) | PRIM2 | chr6p12-p11.1 | 205628_at | NM_000947
protein arginine methyltransferase 5 | PRMT5 | chr14q11.2-q21 | 217786_at | NM_006109
pituitary tumor-transforming 1 | PTTG1 | chr5q35.1 | 20355_x_at | NM_004219
pituitary tumor-transforming 3 | PTTG3 | chr8q13.1 | 208511_at | NM_021000
RAD51 homolog (RecA homolog, E. coli) (S. cerevisiae) | RAD51 | chr15q5.1 | 205024_s_at | NM_002875
RAD54 homolog B (S. cerevisiae) | RAD54B | chr8q21.3-q22 | 219494_at | NM_012415
Ras association (RalGDS/AF-6) domain family member 1 | RASSF1 | chr3p21.3 | 204346_s_at | NM_007182
replication factor C (activator 1) 2, 40kDa | RFC2 | chr7q11.23 | 1053_at; 203696_s_at | M87338; NM_002914
replication factor C (activator 1) 3, 38kDa | RFC3 | chr13q12.3-q13 | 204128_s_at | NM_002915
replication factor C (activator 1) 5, 36.5kDa | RFC5 | chr12q24.2-q24.3 | 203209_at; 203210_s_at | BC001866; NM_007370
ribonuclease H2, subunit A | RNASEH2A | chr19p13.13 | 203022_at | NM_006397
SET nuclear oncogene | SET | chr9q34 | 213047_x_at | AI278616
S-phase kinase-associated protein 2 (p45) | SKP2 | chr5p13 | 210567_s_at | BC001441
structural maintenance of chromosomes 2 | SMC2 | chr9q31.1 | 204240_s_at; 213253_at | NM_006444; AU154486
sperm associated antigen 5 | SPAG5 | chr17q11.2 | 203145_at | NM_006461
SFRS protein kinase 1 | SRPK1 | chr6p21.3-p21.2 | 202199_s_at | AW082913
signal transducer and activator of transcription 1, 91kDa | STAT1 | chr2q32.2 | AFFX-HUMISGF3A/M97935_5_at | AFFX-HUMISGF3A/M97935_5
suppressor of variegation 3-9 homolog 2 (Drosophila) | SUV39H2 | chr10p13 | 219262_at | NM_024670
TAR DNA binding protein | TARDBP | chr1p36.22 | 200020_at | NM_007375
transcription factor A, mitochondrial | TFAM | chr10q21 | 203177_x_at | NM_003201
topoisomerase (DNA) II binding protein 1 | TOPBP1 | chr3q22 | 202633_at | NM_007027
TPX2, microtubule-associated, homolog (Xenopus laevis) | TPX2 | chr20q11.2 | 210052_s_at | AF098158
TTK protein kinase | TTK | chr6q13-q21 | 204822_at | NM_003318
tubulin, gamma 1 | TUBG1 | chr17q21 | 201714_at | NM_001070

Conclusions

The present invention is the first to report an association between a gene proliferation signature and both major clinico-pathologic variables and outcome in colorectal cancer. The disclosed study investigated the proliferation state of tumours using an in vitro-derived multi-gene proliferation signature and by Ki-67 immunostaining. According to the results herein, low expression of the GPS in tumours was associated with a higher risk of recurrence and shorter survival in two independent cohorts of patients. In contrast, the Ki-67 proliferation index was not associated with any clinically relevant endpoint.

The colorectal GPS encompasses 38 mitotic cell cycle genes and includes a core set of genes (CDC2, RFC4, PCNA, CCNE1, CDK7, MCM genes, FEN1, MAD2L1, MYBL2, RRM2 and BUB3) that are part of proliferation signatures defined for tumours of the breast (40),(41), ovary (42), liver (43), acute lymphoblastic leukaemia (44), neuroblastoma (45), lung squamous cell carcinoma (46), head and neck (47), prostate (48), and stomach (49). This represents a conserved pattern of expression, as most of these genes have been found to be highly overexpressed in fast-growing tumours and to reflect a high proportion of rapidly cycling cells (50). Therefore, the expression level of the colorectal GPS provides a measure of the proliferative state of a tumour.

In this study, several clinico-pathologic variables related to poor outcome (disease stage, lymph node metastasis and lymphatic invasion) were associated with low GPS expression in Cohort A patients. In Cohort B, which consisted entirely of stage II tumours, the study assessed the association between the GPS and lymphatic invasion.
The association failed to reach statistical significance owing to the small number of tumours with lymphatic invasion in this cohort (5/55). Without being bound by theory, the low GPS expression in more advanced tumours may indicate that CRC progression is not driven by enhanced proliferation. Accelerated proliferation may still be an important driving force during the initial phases of tumourigenesis. However, more advanced disease may depend more on processes such as genetic instability that allow continuous selection. Consistent with our finding, two large-scale studies reported an association between decreased expression of CDK2, cyclin E and cyclin A and advanced stage, deep infiltration and lymph node metastasis (51),(52).

The relationship between low GPS and unfavourable clinico-pathologic variables suggested that the GPS should also predict patient outcome. Indeed, in both Cohorts A and B, low GPS expression was associated with a higher risk of recurrence and shorter overall and recurrence-free survival. In Cohort B, where all patients had stage II tumours, the association remained in multivariate analysis. However, in Cohort A, where patients had stage I-IV disease, the association was not independent of tumour stage. The number of patients with and without recurrence within each stage of disease in Cohort A was probably insufficient to demonstrate an independent association between the GPS and survival. In Cohort B, low GPS expression and lymphatic invasion remained independent predictors in multivariate analysis. This suggests that the GPS may improve the prediction of CRC patient outcome within the same disease stage. Not surprisingly, lymph node and distant organ involvement were the most powerful predictors of outcome, as these are direct manifestations of tumour metastasis.
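The hazard ratios discussed above and reported in Table 3 come from Cox regression. The sketch below illustrates what a single-covariate (univariate) Cox model computes. It maximises the Breslow partial likelihood by Newton-Raphson and reports the hazard ratio together with a Wald 95% confidence interval and p-value. It does not reproduce the forward stepwise multivariate procedure described under Table 3. The 0/1 coding of the covariate (here 1 = low GPS) and the toy data are assumptions. The choice of reference category determines whether the hazard ratio comes out above or below 1.

```python
import math

def cox_univariate(times, events, x, max_iter=50, tol=1e-10):
    """Fit h(t | x) = h0(t) * exp(beta * x) by Newton-Raphson (Breslow handling of ties).

    Returns (hazard_ratio, (ci_low, ci_high), wald_p) using a 95% Wald interval.
    Assumes at least one event and no perfect separation, so that beta is finite.
    """
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for ti, ei, xi in zip(times, events, x):
            if not ei:
                continue
            # Risk set: everyone still under observation at this event time.
            risk = [xj for tj, xj in zip(times, x) if tj >= ti]
            w = [math.exp(beta * xj) for xj in risk]
            s0 = sum(w)
            s1 = sum(wj * xj for wj, xj in zip(w, risk))
            s2 = sum(wj * xj * xj for wj, xj in zip(w, risk))
            score += xi - s1 / s0                  # first derivative of log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2       # observed information (negative 2nd derivative)
        step = score / info
        beta += step
        if abs(step) < tol:  # converged; info is then effectively evaluated at beta
            break
    se = 1 / math.sqrt(info)
    ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
    return math.exp(beta), ci, math.erfc(abs(beta / se) / math.sqrt(2))

# Hypothetical toy data: follow-up in months, recurrence flags, and 1 = low GPS.
times = [5, 10, 12, 20, 60, 8, 15, 30, 60, 60]
events = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]
low_gps = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
hr, ci, p = cox_univariate(times, events, low_gps)
```

Recoding the covariate as 1 - x (so that 1 = high GPS) gives the reciprocal hazard ratio. This is why a factor can be reported as below 1 in one table and above 1 in another without any contradiction.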
Treatment with radiotherapy or chemotherapy, used in 18% and 27% of Cohort A patients respectively, was a possible confounding factor in this study. In theory, the improved survival associated with elevated GPS expression might reflect a better response of fast-proliferating tumours to cancer treatment (53),(54). However, no correlation was found between treatment and GPS expression. Furthermore, no patients in Cohort B received adjuvant therapy, indicating that the association between GPS and survival is independent of treatment. It should be noted that this study was not designed to investigate the relationship between tumour proliferation and response to chemotherapy or radiotherapy.

The sample size may also explain why Ki-67 PI was not associated with clinico-pathologic variables or survival in the present study. As mentioned above, other studies on Ki-67 and CRC outcome have reported inconsistent findings. However, in the three other CRC studies with the largest sample sizes, a low Ki-67 PI was associated with a worse prognosis (27),(29),(30). We reached the same conclusion using the GPS, but with a much smaller sample size. The multi-gene expression analysis was therefore a more sensitive tool than the Ki-67 PI for assessing the relationship between proliferation and prognosis.

The biological reason for an unfavourable prognosis in tumours with a low GPS requires further investigation.
Mechanisms that could potentially contribute to a worse clinical outcome in low-GPS tumours include: (i) a more effective immune response to rapidly proliferating tumours; (ii) a higher level of genetic damage that may render cancer cells more resistant to apoptosis and increase invasiveness, but also disrupt the normal functioning of the replication machinery; (iii) an increased number of cancer stem cells that divide slowly, like normal stem cells, but have a high metastatic potential; and (iv) conversely, a higher proportion of microsatellite-unstable tumours among high-GPS tumours, since such tumours have a high proliferation rate but a relatively good prognosis.

In sum, the present invention has clarified the previous, conflicting results relating to the prognostic role of cell proliferation in colorectal cancer. A GPS was developed using CRC cell lines and applied to two independent patient cohorts. Low expression of growth-related genes in CRC was found to be associated with more advanced tumour stage (Cohort A) and with poor clinical outcome within the same stage (Cohort B). Multi-gene expression analysis proved to be a more powerful indicator for predicting outcome than the long-established proliferation marker, Ki-67. Future studies should determine why CRC differs from other common epithelial cancers, such as breast and lung cancers (e.g., with respect to Ki-67). This will likely provide insights into important underlying biological mechanisms. From a practical viewpoint, the ability to stratify recurrence risk within a given pathological stage could enable adjuvant therapy to be targeted more accurately. Thus, GPS expression can be used as an adjunct to conventional staging for identifying patients at high risk of recurrence and death from colorectal cancer.

All publications and patents mentioned in the above specification are herein incorporated by reference.
Where in the foregoing description reference has been made to integers or components having known equivalents, such equivalents are herein incorporated as if individually set forth.

Although the invention has been described by way of example and with reference to possible embodiments thereof, it is to be appreciated that improvements and/or modifications may be made without departing from the scope or the spirit thereof.

References:
1. Evan GI, Vousden KH: Proliferation, cell cycle and apoptosis in cancer. Nature 411:342-8, 2001
2. Whitfield ML, George LK, Grant GD, et al: Common markers of proliferation. Nat Rev Cancer 6:99-106, 2006
3. Rew DA, Wilson GD: Cell production rates in human tissues and tumours and their significance. Part 1: an introduction to the techniques of measurement and their limitations. Eur J Surg Oncol 26:227-38, 2000
4. Endl E, Gerdes J: The Ki-67 protein: fascinating forms and an unknown function. Exp Cell Res 257:231-7, 2000
5. Brown DC, Gatter KC: Ki67 protein: the immaculate deception. Histopathology 40:2-11, 2002
6. Paik S, Shak S, Tang G, et al: A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med 351:2817-26, 2004
7. Ofner D, Grothaus A, Riedmann B, et al: MIB1 in colorectal carcinomas: its evaluation by three different methods reveals lack of prognostic significance. Anal Cell Pathol 12:61-70, 1996
8. Ihmann T, Liu J, Schwabe W, et al: High-level mRNA quantification of proliferation marker pKi-67 is correlated with favorable prognosis in colorectal carcinoma. J Cancer Res Clin Oncol 130:749-756, 2004
9. Van Oijen MG, Medema RH, Slootweg PJ, et al: Positivity of the proliferation marker pKi-67 in non-cycling cells. Am J Clin Pathol 110:24-31, 1998
10. Duchrow M, Ziemann T, Windhövel U, et al: Colorectal carcinomas with high MIB-1 labelling indices but low pKi67 mRNA levels correlate with better prognostic outcome. Histopathology 42:566-574, 2003
11. Evans C, Morrison I, Heriot AG, et al: The correlation between colorectal cancer rates of proliferation and apoptosis and systemic cytokine levels; plus their influence upon survival. Br J Cancer 94:1412-9, 2006
12. Rosati G, Chiacchio R, Reggiardo G, et al: Thymidylate synthase expression, p53, bcl-2, Ki-67 and p27 in colorectal cancer: relationships with tumour recurrence and survival. Tumour Biol 25:258-63, 2004
13. Ishida H, Miwa H, Tatsuta M, et al: Ki-67 and CEA expression as prognostic markers in Dukes' C colorectal cancer. Cancer Lett 207:109-115, 2004
14. Buglioni S, D'Agnano I, Cosimelli M, et al: Evaluation of multiple bio-pathological factors in colorectal adenocarcinomas: independent prognostic role of p53 and bcl-2. Int J Cancer 84:545-52, 1999
15. Guerra A, Borda F, Javier Jimenez F, et al: Multivariate analysis of prognostic factors in resected colorectal cancer: a new prognostic index. Eur J Gastroenterol Hepatol 10:51-8, 1998
16. Kyzer S, Gordon PH: Determination of proliferative activity in colorectal carcinoma using monoclonal antibody Ki67. Dis Colon Rectum 40:322-5, 1997
17. Jansson A, Sun XF: Ki-67 expression in relation to clinicopathological variables and prognosis in colorectal adenocarcinomas. APMIS 105:730-4, 1997
18. Baretton GB, Diebold J, Christoforis G, et al: Apoptosis and immunohistochemical bcl-2 expression in colorectal adenomas and carcinomas. Aspects of carcinogenesis and prognostic significance. Cancer 77:255-64, 1996
19. Sun XF, Carstensen JM, Stål O, et al: Proliferating cell nuclear antigen (PCNA) in relation to ras, c-erbB-2, p53, clinico-pathological variables and prognosis in colorectal adenocarcinoma. Int J Cancer 69:5-8, 1996
20. Kubota Y, Petras RE, Easley KA, et al: Ki-67-determined growth fraction versus standard staging and grading parameters in colorectal carcinoma. A multivariate analysis. Cancer 70:2602-9, 1992
21. Valera V, Yokoyama N, Walter B, et al: Clinical significance of Ki-67 proliferation index in disease progression and prognosis of patients with resected colorectal carcinoma. Br J Surg 92:1002-7, 2005
22. Dziegiel P, Forgacz J, Suder E, et al: Prognostic significance of metallothionein expression in correlation with Ki-67 expression in adenocarcinomas of large intestine. Histol Histopathol 18:401-7, 2003
23. Scopa CD, Tsamandas AC, Zolata V, et al: Potential role of bcl-2 and Ki-67 expression and apoptosis in colorectal carcinoma: a clinicopathologic study. Dig Dis Sci 48:1990-7, 2003
24. Bhatavdekar JM, Patel DD, Chikhlikar PR, et al: Molecular markers are predictors of recurrence and survival in patients with Dukes B and Dukes C colorectal adenocarcinoma. Dis Colon Rectum 44:523-33, 2001
25. Chen YT, Henk MJ, Carney KJ, et al: Prognostic significance of tumor markers in colorectal cancer patients: DNA index, S-phase fraction, p53 expression, and Ki-67 index. J Gastrointest Surg 1:266-273, 1997
26. Choi HJ, Jung IK, Kim SS, et al: Proliferating cell nuclear antigen expression and its relationship to malignancy potential in invasive colorectal carcinomas. Dis Colon Rectum 40:51-9, 1997
27. Hilska M, Collan YU, O Laine VJ, et al: The significance of tumour markers for proliferation and apoptosis in predicting survival in colorectal cancer. Dis Colon Rectum 48:2197-208, 2005
28. Salminen E, Palmu S, Vahlberg T, et al: Increased proliferation activity measured by immunoreactive Ki67 is associated with survival improvement in rectal/recto sigmoid cancer. World J Gastroenterol 11:3245-9, 2005
29. Garrity MM, Burgart LJ, Mahoney MR, et al: Prognostic value of proliferation, apoptosis, defective DNA mismatch repair, and p53 overexpression in patients with resected Dukes' B2 or C colon cancer: a North Central Cancer Treatment Group Study. J Clin Oncol 22:1572-82, 2004
30. Allegra CJ, Paik S, Colangelo LH, et al: Prognostic value of thymidylate synthase, Ki-67, and p53 in patients with Dukes' B and C colon cancer: a National Cancer Institute-National Surgical Adjuvant Breast and Bowel Project collaborative study. J Clin Oncol 21:241-50, 2003
31. Palmqvist R, Sellberg P, Oberg A, et al: Low tumour cell proliferation at the invasive margin is associated with a poor prognosis in Dukes' stage B colorectal cancers. Br J Cancer 79:577-81, 1999
32. Paradiso A, Rabinovich M, Vallejo C, et al: p53 and PCNA expression in advanced colorectal cancer: response to chemotherapy and long-term prognosis. Int J Cancer 69:437-41, 1996
33. Neoptolemos JP, Oates GD, Newbold KM, et al: Cyclin/proliferation cell nuclear antigen immunohistochemistry does not improve the prognostic power of Dukes' or Jass' classifications for colorectal cancer. Br J Surg 82:184-7, 1995
34. Compton C, Fenoglio-Preiser CM, Pettigrew N, et al: American Joint Committee on Cancer prognostic factors consensus conference. Colorectal working group. Cancer 88:1739-1757, 2000
35. Colantuoni C, Henry G, Zeger S, et al: SNOMAD (Standardization and NOrmalization of MicroArray Data): web-accessible gene expression data analysis. Bioinformatics 18:1540-1541, 2002
36. Livak KJ, Schmittgen TD: Analysis of relative gene expression data using real-time quantitative PCR and the 2^(-ΔΔCT) method. Methods 25:402-408, 2001
37. Pocock SJ, Clayton TC, Altman DG: Survival plots of time-to-event outcomes in clinical trials: good practice and pitfalls. Lancet 359:1686-89, 2002
38. Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 98:5116-21, 2001
39. Hosack DA, Dennis G, Sherman BT, et al: Identifying biological themes within lists of genes with EASE. Genome Biol 4:R70, 2003
40. Perou CM, Jeffrey SS, van de Rijn M, et al: Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc Natl Acad Sci USA 96:9212-17, 1999
41. Perou CM: Molecular portraits of human breast tumours. Nature 406:747-752, 2000
42. Welsh JB, Zarrinkar PP, Sapinoso LM, et al: Analysis of gene expression profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular markers of epithelial ovarian cancer. Proc Natl Acad Sci USA 98:1176-1181, 2001
43. Chen X, Cheung ST, So S, et al: Gene expression patterns in human liver cancers. Mol Biol Cell 13:1929-1939, 2002
44. Kirschner-Schwabe R, Lottaz C, Todling J, et al: Expression of late cell cycle genes and an increased proliferative capacity characterize very early relapse of childhood acute lymphoblastic leukemia. Clin Cancer Res 12:4553-61, 2006
45. Krasnoselsky AL, Whiteford CC, Wei JS, et al: Altered expression of cell cycle genes distinguishes aggressive neuroblastoma. Oncogene 24:1533-1541, 2005
46. Inamura K, Fujiwara T, Hoshida Y, et al: Two subclasses of lung squamous cell carcinoma with different gene expression profiles and prognosis identified by hierarchical clustering and non-negative matrix factorization. Oncogene 24:7105-13, 2005
47. Chung CH, Parker JS, Karaca G, et al: Molecular classification of head and neck squamous cell carcinomas using patterns of gene expression. Cancer Cell 5:489-500, 2004
48. LaTulippe E, Satagopan J, Smith A, et al: Comprehensive gene expression analysis of prostate cancer reveals distinct transcriptional programs associated with metastatic disease. Cancer Res 62:4499-4506, 2002
49. Hippo Y, Taniguchi H, Tsutsumi S, et al: Global gene expression analysis of gastric cancer by oligonucleotide microarrays. Cancer Res 62:233-40, 2002
50. Whitfield ML, Sherlock G, Saldanha AJ, et al: Identification of genes periodically expressed in the human cell cycle and their expression in tumours. Mol Biol Cell 13:1977-2000, 2002
51. Li JQ, Miki H, Ohmori M, et al: Expression of cyclin E and cyclin-dependent kinase 2 correlates with metastasis and prognosis in colorectal carcinoma. Hum Pathol 32:945-53, 2001
52. Li JQ, Miki H, Wu F, et al: Cyclin A correlates with carcinogenesis and metastasis, and p27(kip1) correlates with lymphatic invasion, in colorectal neoplasms. Hum Pathol 33:1006-15, 2002
53. Itamochi H, Kigawa J, Sugiyama T, et al: Low proliferation activity may be associated with chemoresistance in clear cell carcinoma of the ovary. Obstet Gynecol 100:281-287, 2002
54. Imdahl A, Jenkner J, Ihling C, et al: Is MIB-1 proliferation index a predictor for response to neoadjuvant therapy in patients with esophageal cancer? Am J Surg 179:514-520, 2000

Claims

1. A prognostic signature for determining progression of gastrointestinal cancer in a patient, comprising one or more genes selected from Table A, Table B, Table C or Table D.

2. The signature of claim 1, wherein the signature comprises one or more genes selected from any one of CDC2, MCM6, RPA3, MCM7, PCNA, G22P1, KPNA2, ANLN, APG7L, TOPK, GMNN, RRM1, CDC45L, MAD2L1, RAN, DUT, RRM2, CDK7, MLH3, SMC4L1, CSPG6, POLD2, POLE2, BCCIP, Pfs2, TREX1, BUB3, FEN1, DRF1, PREI3, CCNE1, RPA1, POLE3, RFC4, MCM3, CHEK1, CCND1, and CDC37.

3. A method of predicting the likelihood of long-term survival of a gastrointestinal cancer patient without the recurrence of gastrointestinal cancer, comprising determining the expression level of one or more prognostic RNA transcripts or their expression products in a gastrointestinal sample obtained from the patient, normalized against the expression level of all RNA transcripts or their products in the gastrointestinal cancer tissue sample, or of a reference set of RNA transcripts or their expression products; wherein the prognostic RNA transcript is the transcript of one or more genes selected from Table A, Table B, Table C or Table D; and establishing likelihood of long-term survival without gastrointestinal cancer recurrence.

4. The method of claim 3, wherein at least one prognostic RNA transcript or its expression product is selected from any one of CDC2, MCM6, RPA3, MCM7, PCNA, G22P1, KPNA2, ANLN, APG7L, TOPK, GMNN, RRM1, CDC45L, MAD2L1, RAN, DUT, RRM2, CDK7, MLH3, SMC4L1, CSPG6, POLD2, POLE2, BCCIP, Pfs2, TREX1, BUB3, FEN1, DRF1, PREI3, CCNE1, RPA1, POLE3, RFC4, MCM3, CHEK1, CCND1, and CDC37.

5. The method of claim 3 or claim 4 comprising determining the expression level of at least two, at least five, at least 10, or at least 15 of the prognostic RNA transcripts or their expression products.

6. The method according to any one of claims 3 to 5, wherein increased expression of the one or more prognostic RNA transcripts or their expression products indicates an increased likelihood of long-term survival without gastrointestinal cancer recurrence.

7. The method according to any one of claims 3 to 5, wherein a predictive model, established by applying a predictive method to expression levels of the predictive signature in recurrent and non-recurrent tumour samples, is applied to establish likelihood of long-term survival without gastrointestinal cancer recurrence.

8. The method of claim 7, wherein said predictive method is selected from the group consisting of linear models, support vector machines, neural networks, classification and regression trees, ensemble learning methods, discriminant analysis, nearest neighbor methods, Bayesian networks, and independent component analysis.
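Claims 7 and 8 leave the choice of predictive method open. As one hedged illustration of the nearest-neighbor option listed in claim 8, the sketch below labels a new tumour by majority vote among the k training tumours whose normalized signature profiles are closest in Euclidean distance. The training profiles, the labels, the three-gene profile length, and k = 3 are all hypothetical; this is not the specific model used in the patent.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Profile = Sequence[float]


def knn_predict(train: List[Tuple[Profile, str]], profile: Profile,
                k: int = 3) -> str:
    """Return the majority label among the k training profiles nearest to `profile`."""
    # Rank training tumours by Euclidean distance to the new profile.
    nearest = sorted(train, key=lambda item: math.dist(item[0], profile))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical normalized expression of three signature genes per tumour.
training_set = [
    ([1.2, 0.9, 1.1], "non-recurrent"),
    ([0.8, 1.0, 0.7], "non-recurrent"),
    ([1.0, 1.3, 0.9], "non-recurrent"),
    ([-0.9, -1.1, -0.6], "recurrent"),
    ([-1.2, -0.4, -0.8], "recurrent"),
    ([-0.5, -0.9, -1.0], "recurrent"),
]
print(knn_predict(training_set, [0.9, 1.1, 1.0]))
print(knn_predict(training_set, [-0.8, -0.8, -0.8]))
```

An odd k avoids tied votes when there are two classes. The toy data follow the direction stated in claim 6, where higher signature expression goes with survival without recurrence.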

9. The method of any one of claims 3 to 8 wherein the gastrointestinal cancer is gastric cancer or colorectal cancer.

10. The method of any one of claims 3 to 9 wherein the expression level of one or more prognostic RNA transcripts is determined.

11. The method of any one of claims 3 to 10 wherein the RNA is isolated from a fixed, wax-embedded gastrointestinal cancer tissue specimen of the patient.

12. The method of any one of claims 3 to 10 wherein the RNA is isolated from core biopsy tissue or fine-needle aspirate cells.

13. An array comprising polynucleotides hybridizing to two or more genes selected from Table A, Table B, Table C or Table D.

14. The array of claim 13 comprising polynucleotides hybridizing to two or more of the following genes: CDC2, MCM6, RPA3, MCM7, PCNA, G22P1, KPNA2, ANLN, APG7L, TOPK, GMNN, RRM1, CDC45L, MAD2L1, RAN, DUT, RRM2, CDK7, MLH3, SMC4L1, CSPG6, POLD2, POLE2, BCCIP, Pfs2, TREX1, BUB3, FEN1, DRF1, PREI3, CCNE1, RPA1, POLE, RFC4, MCM3, CHEK1, CCND1, and CDC37.

15. The array of claim 13 or claim 14 comprising polynucleotides hybridizing to at least 3, at least five, at least 10 or at least 15 of the genes.

16. The array of claim 13 comprising polynucleotides hybridizing to the following genes: CDC2, MCM6, RPA3, MCM7, PCNA, G22P1, KPNA2, ANLN, APG7L, TOPK, GMNN, RRM1, CDC45L, MAD2L1, RAN, DUT, RRM2, CDK7, MLH3, SMC4L1, CSPG6, POLD2, POLE2, BCCIP, Pfs2, TREX1, BUB3, FEN1, DRF1, PREI3, CCNE1, RPA1, POLE, RFC4, MCM3, CHEK1, CCND1, and CDC37.

17. The array of any one of claims 13 to 16 wherein the polynucleotides are cDNAs.

18. The array of claim 17 wherein the cDNAs are about 500 to 5000 bases long.

19. The array of any one of claims 13 to 16 wherein the polynucleotides are oligonucleotides.

20. The array of claim 19 wherein the oligonucleotides are about 20 to 80 bases long.

21. The array of any one of claims 13 to 20 wherein the solid surface is glass.

22. A method of predicting the likelihood of long-term survival of a patient diagnosed with gastrointestinal cancer, without the recurrence of gastrointestinal cancer, comprising the steps of:
(1) determining the expression levels of the RNA transcripts or the expression products of a gene or genes selected from Table A, Table B, Table C or Table D, in a gastrointestinal cancer tissue sample obtained from the patient, normalized against the expression levels of all RNA transcripts or their expression products in the gastrointestinal cancer tissue sample, or of a reference set of RNA transcripts or their products;
(2) subjecting the data obtained in step (1) to statistical analysis; and
(3) determining whether the likelihood of long-term survival has increased or decreased;
and establishing the likelihood of long-term survival without gastrointestinal cancer recurrence.

23. The method of claim 22, wherein at least one prognostic RNA transcript or its expression product is selected from CDC2, MCM6, RPA3, MCM7, PCNA, G22P1, KPNA2, ANLN, APG7L, TOPK, GMNN, RRM1, CDC45L, MAD2L1, RAN, DUT, RRM2, CDK7, MLH3, SMC4L1, CSPG6, POLD2, POLE2, BCCIP, Pfs2, TREX1, BUB3, FEN1, DRF1, PREI3, CCNE1, RPA1, POLE, RFC4, MCM3, CHEK1, CCND1, and CDC37.

24. The method of claim 22 or claim 23 wherein the statistical analysis is performed by using the Cox Proportional Hazards model.
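Claim 24 names the Cox proportional hazards model but does not detail the fit. The sketch below is a minimal single-covariate version under stated assumptions: one summary signature score per patient (hypothetical), right-censored follow-up times, and the Breslow treatment of tied event times. It maximizes the partial log-likelihood by Newton-Raphson with step halving. A hazard ratio exp(beta) below 1 would mean that higher scores go with a lower hazard of recurrence. The cohort data and function names are illustrative only; in practice an established implementation such as the R survival package or the Python lifelines library would normally be used.

```python
import math
from typing import List, Tuple


def cox_terms(beta: float, times: List[float], events: List[int],
              x: List[float]) -> Tuple[float, float, float]:
    """Log partial likelihood, score and information for one covariate.

    Tied event times use the Breslow approximation: each event's risk set is
    every subject still under observation (time >= the event time).
    """
    loglik = score = info = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue  # censored patients only enter through risk sets
        s0 = s1 = s2 = 0.0
        for t_j, x_j in zip(times, x):
            if t_j >= t_i:
                w = math.exp(beta * x_j)
                s0 += w
                s1 += w * x_j
                s2 += w * x_j * x_j
        risk_mean = s1 / s0
        loglik += beta * x_i - math.log(s0)
        score += x_i - risk_mean
        info += s2 / s0 - risk_mean ** 2
    return loglik, score, info


def fit_cox_1d(times, events, x, tol=1e-10, max_iter=100) -> float:
    """Newton-Raphson estimate of the log hazard ratio per unit of x."""
    beta = 0.0
    loglik, score, info = cox_terms(beta, times, events, x)
    for _ in range(max_iter):
        step = score / info
        # Halve the step until the partial log-likelihood does not decrease.
        while True:
            cand = cox_terms(beta + step, times, events, x)
            if cand[0] >= loglik or abs(step) < tol:
                break
            step /= 2.0
        beta += step
        loglik, score, info = cand
        if abs(step) < tol:
            break
    return beta


# Hypothetical cohort: months of follow-up, 1 = recurrence, signature score.
times = [60, 50, 55, 20, 30, 10, 40, 15]
events = [0, 1, 0, 1, 1, 1, 1, 1]
signature_score = [2.0, 1.5, 1.0, 0.5, -0.5, -1.0, 0.0, -1.5]
beta = fit_cox_1d(times, events, signature_score)
print(f"log HR = {beta:.3f}, HR = {math.exp(beta):.3f}")
```

Because the partial log-likelihood is concave in beta, the step-halving guard only protects against overshooting; it does not change the estimate the iteration converges to.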

25. A method of preparing a personalized genomics profile for a cancer patient, comprising the steps of:
(a) subjecting RNA extracted from a gastrointestinal tissue obtained from the patient to gene expression analysis;
(b) determining the expression level of one or more genes selected from the gastrointestinal cancer gene set listed in any one of Table A, Table B, Table C or Table D, wherein the expression level is normalized against a control gene or genes and optionally is compared to the amount found in a gastrointestinal cancer reference tissue set; and
(c) creating a report summarizing the data obtained by the gene expression analysis.

25. The method of claim 24, wherein the gastrointestinal tissue comprises gastrointestinal cancer cells.

26. The method of claim 24 wherein the gastrointestinal tissue is obtained from a fixed, paraffin-embedded biopsy sample.

27. The method of claim 26 wherein the RNA is fragmented.

28. The method of any one of claims 22 to 27 wherein the report includes a prediction of the likelihood of long-term survival of the patient.

29. The method of any one of claims 22 to 29 wherein the report includes a recommendation for a treatment modality of the patient.

30. A prognostic method comprising:
(a) subjecting a sample comprising gastrointestinal cancer cells obtained from a patient to quantitative analysis of the levels of RNA transcripts of at least one gene selected from any one of Table A, Table B, Table C or Table D, or its product; and
(b) identifying the patient as having an increased likelihood of long-term survival without gastrointestinal cancer recurrence if normalized expression levels of the gene or genes, or their products, are elevated above a defined expression threshold.

31. The method of claim 30, wherein at least one prognostic RNA transcript or its expression product is selected from CDC2, MCM6, RPA3, MCM7, PCNA, G22P1, KPNA2, ANLN, APG7L, TOPK, GMNN, RRM1, CDC45L, MAD2L1, RAN, DUT, RRM2, CDK7, MLH3, SMC4L1, CSPG6, POLD2, POLE2, BCCIP, Pfs2, TREX1, BUB3, FEN1, DRF1, PREI3, CCNE1, RPA1, POLE3, RFC4, MCM3, CHEK1, CCND1, and CDC37.

32. The method of claim 30 or claim 31, wherein the levels of the RNA transcripts of the genes are normalized relative to the mean level of the RNA transcripts or the products of two or more housekeeping genes.
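Claims 30 to 32 combine housekeeping-gene normalization with a threshold test. A minimal sketch, assuming the measurements are quantitative RT-PCR Ct values (the signal named in claim 36), where a lower Ct means more starting template: each gene's Ct is subtracted from the mean Ct of the housekeeping genes, so larger normalized values mean higher expression. Summarizing the signature by the mean of its normalized genes is an added assumption. The gene panel, Ct values, and threshold are hypothetical; the claims do not define the threshold.

```python
from statistics import mean
from typing import Dict, Iterable


def delta_ct(ct: Dict[str, float], housekeeping: Iterable[str]) -> Dict[str, float]:
    """Normalize Ct values against the mean Ct of the housekeeping genes.

    Returns mean(Ct_housekeeping) - Ct_gene for every non-housekeeping gene,
    so a larger value indicates higher expression.
    """
    hk = list(housekeeping)
    reference = mean(ct[g] for g in hk)
    return {g: reference - v for g, v in ct.items() if g not in hk}


def above_threshold(normalized: Dict[str, float], genes: Iterable[str],
                    threshold: float) -> bool:
    """True when the mean normalized level of `genes` exceeds `threshold`."""
    return mean(normalized[g] for g in genes) > threshold


# Hypothetical Ct values for one tumour sample.
ct = {"GAPDH": 18.0, "ACTB": 20.0, "CDC2": 24.0, "MCM6": 23.0, "PCNA": 22.0}
norm = delta_ct(ct, ["GAPDH", "ACTB"])
print(norm)
print(above_threshold(norm, ["CDC2", "MCM6", "PCNA"], threshold=-4.5))
```

A sample for which `above_threshold` returns True would, under claim 30, be identified as having an increased likelihood of long-term survival without recurrence.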

33. The method of claim 32 wherein the housekeeping genes are selected from the group consisting of glyceraldehyde-3-phosphate dehydrogenase (GAPDH), Cyp1, albumin, actins, tubulins, cyclophilin, hypoxanthine phosphoribosyltransferase (HPRT), L32, 28S, and 18S.

34. The method of any one of claims 30 to 33 wherein the sample is subjected to global gene expression analysis of all genes present above the limit of detection.

35. The method of any one of claims 30 to 34 wherein the levels of RNA transcripts of the genes are normalized relative to the mean signal of the RNA transcripts or the products of all assayed genes or a subset thereof.

36. The method of any one of claims 30 to 35 wherein the levels of RNA transcripts are determined by quantitative RT-PCR, and the signal is a Ct value.

37. The method of claim 35 wherein the assayed genes include at least 50 or at least 100 cancer-related genes.

38. The method of any one of claims 30 to 37 wherein the patient is human.

39. The method of any one of claims 30 to 38 wherein the sample is a fixed, paraffin-embedded tissue (FPET) sample, or a fresh or frozen tissue sample.

40. The method of any one of claims 30 to 38 wherein the sample is a tissue sample from a fine-needle, core, or other type of biopsy.

41. The method of any one of claims 30 to 40 wherein the quantitative analysis is performed by quantitative RT-PCR.

42. The method of any one of claims 30 to 40 wherein the quantitative analysis is performed by quantifying the products of the genes.

43. The method of any one of claims 30 to 40 wherein the products are quantified by immunohistochemistry or by proteomics technology.

44. The method of any one of claims 30 to 43 further comprising the step of preparing a report indicating that the patient has an increased likelihood of long-term survival without gastrointestinal cancer recurrence.

45. A kit comprising one or more of (1) extraction buffer/reagents and protocol; (2) reverse transcription buffer/reagents and protocol; and (3) quantitative RT-PCR buffer/reagents and protocol suitable for performing the method of any one of claims 3, 25, and 30.