AU2004219989B2 - Expression profiling of tumours - Google Patents

Info

Publication number
AU2004219989B2
Authority
AU
Australia
Prior art keywords
tumour
primary
gene expression
sample
origin
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2004219989A
Other versions
AU2004219989A1 (en
Inventor
David Bowtell
Andrew Holloway
Adam Kowalczyk
Richard Tothill
Ryan Van Laar
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peter MacCallum Cancer Institute
Original Assignee
Peter MacCallum Cancer Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2003901177A (external priority, AU2003901177A0)
Priority claimed from AU2003907084A (external priority, AU2003907084A0)
Application filed by Peter MacCallum Cancer Institute
Priority to AU2004219989A (AU2004219989B2)
Priority claimed from PCT/AU2004/000299 (external priority, WO2004081564A1)
Publication of AU2004219989A1
Application granted
Publication of AU2004219989B2
Anticipated expiration
Ceased

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Description

WO 2004/081564 PCT/AU2004/000299
Expression Profiling Of Tumors
The present invention relates to methods of profiling tumours and characterisation of the tissue types associated with the tumour. The present invention also relates to a method of analysing gene expression data. Also provided is a means to identify primary tumours and to further determine the identity of a tumour of unknown primary. The invention also provides a method of treatment of a tumour by diagnosis of primary tumours identified by the methods described.
BACKGROUND
Advances in the treatment of cancer have resulted in significant improvements in median survival times for patients with many forms of the disease. These improvements have been the result of tailoring treatments to specific types of tumours based on tissue specific molecular targets, for example hormone treatments for ovarian and breast cancers. However, if a tumour is misdiagnosed, inappropriate treatment may delay recovery or have no effect on the disease. Therefore, there remains a need to correctly and reliably identify the source tissue of a tumour.
Despite the enormous amount of information regarding cancer and its diagnosis, there remains a significant proportion of new cancer cases that present with atypical symptoms. It has become apparent that the site of a tumour might belie its true origin. Metastatic tumours especially are in this class because the primary tumour may be small or undetectable; consequently, a large metastasis may be misdiagnosed as the primary tumour. Carcinomas of unknown origin account for between 3 and 5% of carcinomas. For example, a so-called Krukenberg tumour is a metastatic secondary carcinoma in the ovary and represents 6% of ovarian tumours. The primary tumour is usually a mucinous carcinoma of the stomach. The definition is also broadly applied to tumours of the breast, pancreas and bowel metastatic to the ovary. Overall, approximately 20% of cancers in the ovary are thought to be of non-ovarian origin. Figure 1 shows the most common sites of the primary in carcinoma of unknown primary in the cases where a primary is identified.
Current diagnostic techniques to identify the primary tumour in a patient with multiple disseminated metastases include morphological assessment, molecular pathological analyses including immunohistochemical staining, imaging techniques (CT, PET, mammography), and endoscopic techniques (gastroscopy, bronchoscopy, colonoscopy). While representing a small fraction of all patients, carcinoma of unknown primary is the fourth most common cause of cancer death, mostly because the prognosis for these patients is bleak, with a median survival of eleven months. Carcinoma of unknown primary presents clinicians with a dilemma, namely how far to take investigation given that the survival of these patients is so poor. There is considerable debate concerning the value of detailed investigation to determine site of origin. Oncologists have been reluctant to perform low-yield investigations because of the unacceptable cost-effectiveness ratio. The cost of these investigations is not only monetary; it also includes the impact on the patient's quality of life and the morbidity arising from invasive diagnostic procedures. Whether patients benefit from a more definitive diagnosis is unclear; however, treatment approaches can vary significantly depending on cellular origin. For example, the drug therapies used for metastatic adenocarcinoma of the lung are significantly different to those used for metastatic adenocarcinoma of the pancreas, which, in turn, are different to the therapeutic approach to metastatic colorectal adenocarcinoma.
Accordingly, it is desirable to provide a method to identify the origin of a tumour or the primary site in a carcinoma of an unknown primary so that effective treatment can be administered.
High-throughput expression analysis has recently been employed to great effect in the sub-classification of many tumour types. Since cancer is a disease of aberrant gene regulation, our ability to use microarrays to profile gene expression on a massively parallel level has begun to unravel the molecular mechanisms behind tumour initiation, progression and response to therapy.
The power of large-scale genetic analysis lies in the fact that the expression levels of thousands of genes are used to characterise the tumour, rather than just several markers. Many examples now exist where formerly homogeneous groupings of tumours based on conventional histopathological techniques have been subdivided into groups based on molecular profiling. Not surprisingly, the wealth of gene expression data in several diseases has begun to support the hypothesis that morphologically indistinguishable tumours may be molecularly distinguishable. This has potentially widespread use in the clinical application of technologies aimed at refining diagnosis and prognostication in cancer.
However, with the complex data derived from expression analysis, it is difficult to discern a meaningful result to fully diagnose and identify the primary tumour.
Accordingly, it is an aspect of the invention to provide a method to identify a primary site of a tumour.
A further difficulty encountered by those trying to identify a tumour's origin occurs when a patient develops a new tumour following an earlier disease.
Typically such earlier disease will have been treated and a sample of the diseased tissue may have been stored by standard techniques such as paraffin embedding. In order to determine whether the new disease is related to the earlier disease it may be necessary to analyse gene expression in that archived sample. Conventional methods of gene expression analysis require high quality nucleic acid to be isolated, which is not possible from, for example, paraffin embedded tissue.
Thus, it is a further aspect of the present invention to provide a method of identifying a primary tumour from an archived or preserved sample.
SUMMARY OF THE INVENTION
In one aspect of the present invention, there is provided a method of profiling a biological sample, said method including: obtaining a gene expression profile from the biological sample; obtaining a gene expression database from one or more biological samples; identifying different patterns of gene expression between the biological samples; identifying genes that comprise the different patterns of gene expression; and correlating the genes that comprise the different patterns of gene expression of the gene expression profile of the biological sample and the gene expression database to provide a profile of the biological sample.
In another aspect there is provided a method of identifying an origin of an unknown tumour sample, said method including: obtaining a gene expression profile of the unknown tumour sample; comparing the gene expression profile of the unknown tumour sample to a predictive model for tumours established from a gene expression database said database including gene expression profiles from known tumour samples and wherein the model has been validated for tumour identification, and identifying the origin of the unknown tumour sample when a gene expression profile from the predictive model correlates with the gene expression profile of the unknown tumour sample.
In another aspect of the present invention, there is provided a method of analysing gene expression data to generate a gene expression profile or a gene expression database for use in diagnosing tumours. Preferably the method allows comparison of data obtained from different experiments.
In yet another aspect of the present invention, there is provided a gene expression database generated using a method described herein.
In a further aspect there is provided a predictive model for identifying an origin of an unknown tumour established from a gene expression database, said database including gene expression profiles from known tumour samples, and wherein the model has been validated for tumour identification.
In a further aspect of the present invention, there is provided an expression-based diagnostic evaluation of the tissue of origin of a tumour. Preferably the expression-based evaluation is based on comparing a gene expression profile of a tumour with a gene expression database representing one or more tumour or tissue types.
In another aspect of the present invention, there is provided a method of treatment of a patient having a tumour of unknown origin including the steps of: identifying the tissue of origin of the tumour of unknown origin; and treating the patient in a manner appropriate for treating a tumour originating from that tissue.
An alternative gene expression profiling platform to cDNA microarray analysis is proposed using a system of high throughput RT-PCR (real time PCR). Key cancer class specific markers, identified through microarray analysis, can be easily translated to the RT-PCR method, allowing utilization of a more robust and reproducible platform that could be integrated into a conventional pathology laboratory. Additionally, using the RankLevels method, it has been shown that microarray and RT-PCR datasets can be used for building integrated SVM predictor algorithms. This allows the utilization of datasets from both platforms for training and building such predictors. The RankLevels method can also be applied to cross-platform meta-analysis to use or mine pre-existing gene expression datasets.
BRIEF DESCRIPTION OF THE FIGURES
Figure 1 shows the most common sites of the primary in carcinoma of unknown primary.
Figure 2 shows the results of unsupervised hierarchical clustering of gene expression data from 121 primary tumours from a diverse range of human tumours.
Figure 3 shows a subset of genes which describes differences between tumour types.
Figure 4 shows a graph indicating the results from the ranking of genes in order to identify a subset with the highest predictive strength.
Figure 5 shows a confusion matrix constructed to show predictor accuracy as determined using the proportions of correct classifications from a leave-one-out cross validation in conjunction with a k-nearest neighbours algorithm.
Figure 6 shows the validity of the predictor algorithm by using it to identify the origin of twelve samples of metastatic tumour of unknown primary.
Figure 7 shows hierarchical clustering of ovarian (blue) and colorectal (red) primary tumours with Krukenberg-like tumours (green). All Krukenberg tumours co-cluster with colorectal primary tumour.
Figure 8 shows that support vector machine analysis with twelve tumour types identifies a colorectal source for the five Krukenberg-like tumours shown in Figure 7. The Y-axis represents a confidence measure of the prediction.
Figure 9 shows a heat map alignment of data generated using cDNA microarray and RT-PCR.
Figure 10 shows a hierarchical cluster analysis of RT-PCR data.
Figure 11 shows the performance of RankLevels in the classification of microarray data. The experiments present the accuracy of LOO (leave-one-out) cross validation on a set of 133 cancer samples divided into 16 classes. For Figure A, the full precision of pin-group normalised expressions was used; Figures B and C used RankLevels with 3 and 5 levels, respectively.
Figure 12 demonstrates the effect of dataset size and complexity on distribution of predictions within the three confidence levels and their relative accuracies.
The Complete dataset represents LOOCV results from the complete dataset (n=229). Training/Test represents LOOCV results from the Training set only (n=167). LSO represents the accumulated results from iteratively leaving subtypes out of training. LCO represents the accumulated results from iteratively leaving site of origin classes out of training (n=229).
DETAILED DESCRIPTION OF THE INVENTION
In one aspect of the present invention, there is provided a method of profiling a biological sample, said method including: obtaining a gene expression profile from the biological sample; obtaining a gene expression database from one or more biological samples; identifying different patterns of gene expression between the biological samples; identifying genes that comprise the different patterns of gene expression; and correlating the genes that comprise the different patterns of gene expression of the gene expression profile of the biological sample and the gene expression database to provide a profile of the biological sample.
Applicants have used molecular profiling techniques to characterise tumours and various tissues of biological samples based on their gene expression profile. The underlying principle of this work is that an individual cell type only expresses a subset of the total number of genes present in the genome. The fraction of genes expressed reflects and determines the biological state of the cell and provides a molecular snapshot of the cellular phenotype.
As used herein the term "gene expression profile" includes information on the expression levels of a plurality of genes within a biological sample. A biological sample within the scope of the present invention may be any biological sample that includes cellular material from which DNA, RNA or protein may be isolated.
The expression level of a gene may be determined by the amount of DNA, RNA or protein present in the sample which corresponds with the gene. The gene expression profile therefore, may include levels of DNA, RNA and/or protein correlated to specific genes within the biological sample.
Gene expression levels may be obtained in a variety of ways including, but not limited to, analysing DNA levels, mRNA levels or protein levels, and determining transcription initiation rates. Preferably gene expression levels are determined by analysis of mRNA levels. More preferably mRNA levels are determined by a hybridisation-based method or a PCR-based method.
A variety of different biological samples may be used to generate a gene expression profile. For example, the biological sample may be a tissue sample and the tissue may be normal or diseased. A diseased tissue sample may include a pre-cancerous tissue, a cancerous tissue, a tumour, a primary tumour, a metastatic tumour, or cells collected from a pleural effusion. A pre-cancerous tissue includes a tissue which may become cancerous. The biological sample may include freshly collected tissue, frozen tissue or archived tissue. In the case of archived tissue the sample may be a paraffin-embedded sample.
A gene expression profile may be established by hybridising a labelled nucleic acid sample from a biological sample to a plurality of target nucleic acids, and detecting to which of the plurality of target nucleic acids the labelled nucleic acid has bound, thereby determining which of the plurality of target nucleic acids are expressed in the biological sample and establishing a gene expression profile for the biological sample. An exemplary method of gene expression analysis by a hybridisation-based technology includes the use of a microarray. In this example, mRNA from a sample may be labelled either directly or through the synthesis of labelled cDNA. The labelled nucleic acid may then be hybridised to the microarray and expression levels determined by detecting the amount of labelled nucleic acid bound at particular positions on the microarray.
Alternatively or additionally, a PCR-based method of gene expression analysis may be used, for example a quantitative RT-PCR technique. In this method, RNA from a biological sample may be reverse transcribed to generate segments of cDNA which may then be amplified by gene-specific quantitative PCR. The rate of accumulation of specific PCR products can be correlated to the abundance of the corresponding RNA species in the original sample and thereby provide an indication of gene expression levels. An RT-PCR method of gene expression analysis provides a robust method for obtaining expression data in a short time, compared with hybridisation-based techniques.
Both of the aforementioned techniques determine the expression of a gene by measuring the amount of mRNA corresponding to the gene.
Protein expression data may also be included in a gene expression profile since the level of a protein product generally represents the functional expression level of a gene. Protein expression levels may be determined by a hybridisation assay such as binding to an antibody or other ligand, or a functional assay where a specific protein function or activity may be measured directly.
Although less suited to high throughput or rapid analysis, transcription initiation rates may also provide an indication of gene expression levels. Such analyses require the use of a living sample in which nascent RNA transcripts are pulse labelled in vivo and analysed in a gene specific manner, generally involving hybridisation to unlabelled target nucleic acid representing the gene of interest.
The labelled RNA only represents genes being actively transcribed and gives an indication of the rate of transcription initiation of a gene.
Hence a gene expression profile provides information on the expression level of a plurality of genes within a biological sample. Preferably the biological sample is a tissue sample. More preferably the biological sample is a tumour sample.
The tumour sample may be of known origin or of unknown origin.
In particular embodiments of the present invention a plurality of gene expression profiles may be used to generate a gene expression database.
As used herein the term "gene expression database" refers to the expression profiles for a given sample type or types. A plurality of gene expression profiles may be used to generate the gene expression database. The gene expression profiles are statistically analysed to identify gene expression levels that characterise particular sample types. The gene expression database may also be established for a given tissue type or plurality of tissue types, and thus, in particular embodiments of the present invention, may allow the identification of the tissue from which a tumour was originally derived, by comparing the tumour's gene expression profile to the gene expression database.
Hence a gene expression database establishes a "fingerprint" of the expression profiles for a given sample type. Preferably the sample is a tissue sample.
More preferably the sample is a tumour sample. In particular embodiments of the present invention a gene expression database includes gene expression information for one or more sample types, including but not limited to any one or more of the following tumours: gastric, colorectal, pancreatic, breast and ovarian.
Patterns of gene expression may be determined by statistical analysis of a gene expression profile or a gene expression database. Preferably the analysis employs an algorithm which utilises a number of informatic tools including k-nearest neighbours and a support vector machine (SVM) approach. In analyzing gene expression data, the first stage is to reduce the number of genes analysed to an optimal subset, capable of reliably describing differences between tumour types. This step is necessary as microarray-derived gene expression profiles may include data from the many thousands of genes represented on the array. Preferably, an initial step of normalizing the data is employed. Depending on the method by which the expression data is obtained, the normalization procedure may be accompanied by a Ranking System, described below. Generally, with microarray data, the number of data points is large and normalization is needed to exclude noise and aberrant data and reduce the number of data points to a manageable level. However, when using RT-PCR to generate the gene expression data, the number of data points is much smaller and hence more manageable. Therefore, these data points may undergo a Ranking process at this stage as described below.
The optimal number and selection of genes for classification of tumours and biological samples from a range of primary origins is determined by using an iterative signal to noise ratio algorithm. This method ranks genes according to the difference of their mean expression values for each class of tumour, divided by the sum of the standard deviations, i.e. $(m_1 - m_2)/(s_1 + s_2)$. This effectively identifies those genes that have a consistently different expression measurement within a given class of tumours, relative to the values of that gene across all other tumour types present. This method may also be employed when RT-PCR is used to validate the gene expression profiles of the samples.
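The signal to noise ranking described above can be illustrated with a minimal sketch. It assumes a one-versus-rest comparison (one tumour class against all other samples), sample standard deviations, and ranking by the absolute value of the score; the function names, the gene names and the toy expression values are hypothetical and are not taken from the specification, and the iterative aspect of the algorithm is not modelled.

```python
from statistics import mean, stdev

def signal_to_noise(in_class, out_class):
    """Return (m1 - m2) / (s1 + s2) for one gene.

    m1, s1: mean and standard deviation within the class of interest.
    m2, s2: mean and standard deviation across all other samples.
    """
    m1, m2 = mean(in_class), mean(out_class)
    s1, s2 = stdev(in_class), stdev(out_class)
    denominator = s1 + s2
    if denominator == 0:
        # Assumption: a gene with no variation in either group gets score 0.
        return 0.0
    return (m1 - m2) / denominator

def rank_genes(expression, labels, target_class):
    """Rank genes by |signal to noise| for target_class versus the rest.

    expression: dict mapping gene name -> list of values, one per sample.
    labels: list of class labels, in the same sample order.
    """
    scores = {}
    for gene, values in expression.items():
        inside = [v for v, lab in zip(values, labels) if lab == target_class]
        outside = [v for v, lab in zip(values, labels) if lab != target_class]
        scores[gene] = signal_to_noise(inside, outside)
    # Assumption: strongly over- and under-expressed genes are both useful,
    # so genes are ordered by the absolute value of the score.
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)

# Hypothetical toy data: three breast and three colorectal samples.
labels = ["breast"] * 3 + ["colorectal"] * 3
expression = {
    "GENE_A": [5.0, 5.2, 4.8, 1.0, 1.1, 0.9],  # clearly class specific
    "GENE_B": [2.0, 3.0, 1.0, 2.5, 1.5, 2.0],  # same mean in both groups
}
ranking = rank_genes(expression, labels, "breast")
```

In this toy example GENE_A separates the two classes and is ranked first, while GENE_B has equal means in both groups and scores zero.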
For instance, a microarray may be used to initially test a large number of genes, from which a reduced set of expressed genes indicative of the sample may then be applied to an assay such as RT-PCR, which requires fewer (but more specific) genes and generates fewer data points.
To select and test subsets of genes, a leave-one-out (LOO) cross validation in conjunction with the k-nearest neighbours algorithm can be used. Briefly, this algorithm seeks to classify an unknown sample by comparing it to samples of known class by using a distance metric. The class of the closest samples is assigned to the sample being tested. LOO involves permutations of the dataset whereby each sample is held out separately and a class assigned to it by using the remaining samples. This is repeated until each sample has been left out of the training set once and been assigned to a class. The proportion of correct classifications is used as a measure of predictor accuracy. By plotting the actual tumour classes on one axis and the predicted classes on the other, a histogram-type view of the overall success or failure of the classification approach can be achieved. This representation (see for example Figure 5) also allows identification of any particular classes with more incorrect predictions relative to other tumour types. The average prediction accuracy in LOO analysis in one particular training set is approximately 97%.
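The leave-one-out procedure with a k-nearest neighbours classifier can be sketched as follows. This is an illustration only: the Euclidean distance, the value k=3, the majority vote and the toy two-gene samples are assumptions, since the specification does not fix the distance metric or k, and all function names are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two expression vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_samples, train_labels, query, k=3):
    """Assign the majority class among the k closest training samples."""
    ordered = sorted(zip(train_samples, train_labels),
                     key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ordered[:k])
    return votes.most_common(1)[0][0]

def leave_one_out(samples, labels, k=3):
    """Hold each sample out once, classify it using the remaining samples.

    Returns the proportion of correct classifications and a confusion
    count keyed by (actual class, predicted class).
    """
    confusion = Counter()
    correct = 0
    for i in range(len(samples)):
        train_samples = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        predicted = knn_predict(train_samples, train_labels, samples[i], k)
        confusion[(labels[i], predicted)] += 1
        if predicted == labels[i]:
            correct += 1
    return correct / len(samples), confusion

# Hypothetical toy data: each sample is a vector of two gene measurements.
samples = [[5.0, 1.0], [5.2, 0.9], [4.8, 1.1],
           [1.0, 4.0], [1.1, 4.2], [0.9, 3.9]]
labels = ["breast"] * 3 + ["colorectal"] * 3
accuracy, confusion = leave_one_out(samples, labels, k=3)
```

The `confusion` counts hold the same information as a confusion matrix of actual versus predicted classes, so any class with a disproportionate number of wrong predictions shows up as off-diagonal entries.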
The applicants have generated a training set of over 120 primary tumours from a diverse range of human tumours, representing the major tumour types accounting for carcinoma of unknown primary (see Table 1). Unsupervised hierarchical clustering of gene expression data from these tumours results in a near perfect segregation of different tumour types. Figure 2 shows the results of such a cluster, with approximately 500 genes selected on the basis of at least three samples with an expression ratio greater than or equal to 2.7.
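The gene selection rule used before clustering (at least three samples with an expression ratio greater than or equal to 2.7) can be written as a simple filter. The sketch below reads the rule literally and counts only ratios at or above the threshold; whether equally strong down-regulation should also count is not stated in the text and is not assumed here. The function name and the toy ratios are hypothetical.

```python
def select_variable_genes(ratios, min_ratio=2.7, min_samples=3):
    """Keep genes with at least min_samples ratios >= min_ratio.

    ratios: dict mapping gene name -> list of expression ratios,
    one per sample.
    """
    selected = []
    for gene, values in ratios.items():
        n_high = sum(1 for v in values if v >= min_ratio)
        if n_high >= min_samples:
            selected.append(gene)
    return selected

# Hypothetical toy ratios for five samples.
ratios = {
    "GENE_A": [3.1, 2.7, 4.0, 1.0, 0.9],  # three samples at or above 2.7
    "GENE_B": [2.9, 3.5, 1.2, 1.1, 0.8],  # only two samples above 2.7
    "GENE_C": [1.0, 1.1, 0.9, 1.2, 1.0],  # no variation of interest
}
selected = select_variable_genes(ratios)
```

Only GENE_A passes the filter in this example; the retained genes would then be passed to the hierarchical clustering step.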
WO 2004/081564 PCT/AU2004/000299 Table 1. Summary of tumour samples used in training set.
Key: test: samples processed on MFC, trai: samples processed by microarray not by MFC trai2: samples not used for MFC Patient ID CancerType DataClass Comment P00030 breast test Primary |ER Positive P00640 breast test Metastasis IER Positive P00734 breast test Primary I P00743 breast test Primary |ER Positive P01026 breast test Primary IER Negative P01212 breast test Primary [ER Positive P01374 breast test Primary |ER Positive P01398 breast test Primary |ER Positive P01696 breast test Primary |ER Positive P02274 breast test Primary |ER Positive P02288 breast test Primary |ER Positive P00541 colorectal test Primary I P00617 colorectal test Primary I P01740 colorectal test Primary P01757 colorectal test Primary P01840 colorectal test Primary P02225 colorectal test Primary I P02553 colorectal test Primary I P02740 colorectal test Primary P00448 Gastric test Primary Intestinal P00514 Gastric test Primary Intestinal P00553 Gastric test Primary |Diffuse P00559 Gastric test Primary Intestinal P00628 Gastric test Primary Intestinal P00661 Gastric test Primary ISignet ring P02173 Gastric test Primary Diffuse P02176 Gastric test Primary |Diffuse P02318 Gastric test Primary |Diffuse P00195 ovarian test Primary Iserous P00446 ovarian test Primary Iserous P00633 ovarian test Primary Iserous P00756 ovarian test Primary Iserous P00772 ovarian test Primary Iserous P01164 ovarian test Metastasis Iserous P01246 ovarian test Primary serous P01428 ovarian test Primary Iserous P01436 ovarian test Primary Iserous P02244 pancreas test Primary I P02245 pancreas test Primary I P02246 pancreas test Primary I P02248 pancreas test Primary I P02249 pancreas test Primary I P02250 pancreas test Primary I P03056 pancreas test Primary P00006 breast trai Primary |ER Positive P00009 breast trai Primary IER Positive P00066 breast trai Primary IER Negative WO 2004/081564 P00442 breast P00467 breast P00469 breast P00478 breast P00504 breast P00546 breast P00572 breast P00621 breast P00746 breast P00776 
breast P00786 breast P00905 breast P00993 breast P01289 breast P01292 breast P01579 breast P01843 breast P01944 breast P00002 colorectal P00049 colorectal P00578 colorectal P00587 colorectal P00721 colorectal P00759 colorectal P00896 colorectal P00961 colorectal P00967 colorectal P00974 colorectal P01016 colorectal P01060 colorectal P01838 colorectal P01844 colorectal P01905 colorectal P00035 Gastric P00048 Gastric P00051 Gastric P00433 Gastric P00483 Gastric P00503 Gastric P00536 Gastric P00551 Gastric P00109 ovarian P00130 ovarian P00151 ovarian P00155 ovarian P00160 ovarian P00164 ovarian P00165 ovarian P00169 ovarian P00188 ovarian P00488 ovarian P00496 ovarian P00505 ovarian PCT/AU2004/000299 Primary IER Negative Primary IER Positive Primary IER Negative Primary IER Positive Primary IER Positive Primary JER Negative Primary IER Positive Primary IER Positive Primary IER Negative Primary IER Negative Primary IER Positive Primary IER Negative Primary IER Negative Metastasis IER Positive Primary |ER Negative Metastasis ]ER Positive Primary |ER Negative Metastasis ]ER Negative Primary I Primary I Primary I Primary I Metastasis Metastasis Metastasis I Primary I Metastasis ]Sigmoid Primary I Metastasis Metastasis I Primary I Metastasis Primary I Primary IDiffuse Metastasis ]Signet ring Primary ISignet ring Primary IDiffuse Primary IDiffuse Metastasis IDiffuse Primary |Diffuse Primary |Diffuse Primary lendometriod Primary Iserous Primary |endometriod Primary Iserous Primary [serous Primary lendometriod Primary Iserous Primary Imucinous Primary lendometriod Primary Imucinous Metastasis Primary lendometriod WO 2004/081564 PCT/AU20041000299 P00506 ovarian tral Primary lendometriod P00511 ovarian tral Primary Iserous P00627 ovarian trai Primary Imucinous P00706 ovarian trai Metastasis Iserous P00718 ovarian trai Primary Imucinous P00784 ovarian tral Primary Imucinous P00807 ovarian trai Primary Imucinous P00809 ovarian trai Primary Iserous P00933 ovarian tral Primary 
Iserous P00935 ovarian trai Primary Imucinous P01348 ovarian trai Primary Iserous P01563 ovarian trai Primary Iserous RBH 91 ovarian trai Primary lendometriod I RBH 91.033 RBH 92 ovarian trai Primary Imucinous I RBH 92.011 RBH 93 ovarian trai Primary lendometriod I RBH 93.118 RBH 93 ovarian trai Primary lendometriod I RBH 93.061 RBH 93 ovarian trai Primary mucinous I RBH 93.002 RBH 93 ovarian trai Primary mucinous I RBH 93.085 RBH 94 ovarian trai Primary endometriod I RBH 94.037 RBH 94 ovarian trai Primary lendometriod I RBH 94.120 RBH 94 ovarian trai Primary lendometriod I RBH 94.020 RBH 94 ovarian trai Primary Imucinous I RBH 94.030 RBH 94 ovarian trai Primary Imucinous I RBH 94.072 RBH 94 ovarian trai Primary Imucinous I RBH 94.080 WM 090 ovarian trai Primary Imalignant mucinous WM 223 ovarian trai Primary Imucinous WM 438 ovarian trai Primary Imucinous WM 439 ovarian trai Primary Imucinous WM 454 ovarian trai Primary |malignant mucinous P02078 pancreas trai Primary I P02247 pancreas trai Primary I P00815 Lung trai2 Primary |scc P00817 Lung trai2 Primary Isce P00925 Lung trai2 Primary |scc P01323 Lung trai2 Primary |scc P01400 Lung trai2 Primary |scc P01759 Lung trai2 Primary ladenocarcinoma P01770 Lung trai2 Primary ladenocarcinoma P01907 Lung trai2 Primary Isco P01909 Lung trai2 Primary ladenocarcinoma P02021 Lung trai2 Primary Iscc P02023 Lung trai2 Primary ladenocarcinoma P02024 Lung trai2 Primary |large cell P02025 Lung trai2 Primary ladenocarcinoma P02026 Lung trai2 Primary ladenocarcinoma P02028 Lung trai2 Primary |large cell P02029 Lung trai2 Primary |scc P02030 Lung trai2 Primary |large cell P02031 Lung trai2 Primary Isce P02032 Lung trai2 Primary ladenocarcinoma P02033 Lung trai2 Primary ladenocarcinoma P02034 Lung trai2 Primary [large cell P02035 Lung trai2 Primary jadenocarcinoma WO 2004/081564 WO 204/01564PCT/A1J20041000299 P02037 Lung P02038 Lung P02039 Lung P02040 Lung P02041 Lung P02042 Lung P02043 Lung P02044 Lung P02045 Lung P02090 Lung P00508 
melanoma P00576 melanoma P00761 melanoma P00825 melanoma P00833 melanoma P00923 melanoma P00977 melanoma P00979 melanoma P01537 melanoma P01861 melanoma trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 tral2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 P01726 mesotheliomna trai2 P01728 mesotheliomna trai2 P01729 mesotheliomna trai2 P01730 mesotheliomna trai2 P01731 mesotheliomna trai2 P01733 mesotheliomna trai2 P00050 Oesophageal trai2 P00450 Oesophageal trai2 Primary ladenocarcinomna Primary jadenocarcinoma Primary Isc Primary large cell Primary Ilarge cell Primary Ilarge cell Primary Iscc Primary Isoc Primary Ilarge cell Primary Ilarge cell Metastasis I Metastasis I MetastasisI MetastasisI MetastasisI MetastasisI Metastasis Metastasis Metastasis Metastasis Primary I Primary I Primary I Primary I Primary I Primary I Primary IMixed Primary lDiffuse PrimaryI Primary I Primary I Primary I Primary I Primary I Primary!I Primary I Metastasis I Primary I Primary I Primary Primary Primary IClear cell Primary I Primary I Primary IClear cell Primary I Metastasis I Primary I MetastasisI Primary [Larynx, NOS Primary [Tongue, NOS Primary [Tongue, NOS Primary [Pharynx, NOS P00032 prostate P00880 prostate P00890 prostate P00954 prostate P01I09 prostate P01421 prostate P01653 prostate P01813 prostate P00916 renal P00998 renal P01020 renal P01038 renal P01043 renal P01048 renal P01098 renal P01218 renal P01270 renal P01278 renal P01574 renal P01817 renal P01908 renal P01093 SCCother P01158 SCCother P01308 SCCother P01343 SCCother trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 tra12 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 WO 2004/081564 PCT/AU2004/000299 P01394 P01472 P01749 P01273 P01341 P01371 P01402 P01633 P01660 P01766 P01832 P00876 P01124 P02345 P00635 P00724 P00741 P00742 P00848 P00909 P00940 P00943 P01872 SCCother SCCother SCCother SCCother Skin SCCother Skin SCCother Skin SCCother Skin SCCother 
Skin SCCother Skin SCCother Skin SCCother Skin testicular testicular testicular uterine uterine uterine uterine uterine uterine uterine uterine uterine trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 trai2 Primary | Pyriform sinus Primary | Larynx, NOS Primary | Larynx, NOS Primary | Skin of lip, NOS Unknown | Unknown primary site Primary | Skin, NOS Primary | Skin of scalp and neck Primary | Skin of other and unspecified parts of face Primary | Skin, NOS Primary | Skin of other and unspecified parts of face Unknown | Unknown primary site Primary Primary Primary Primary | endometriod Primary | endometriod Primary | endometriod Primary | endometriod Primary | endometriod Primary | endometriod Primary | endometriod Primary | endometriod Primary | endometriod
In a preferred embodiment of the present invention there is provided a set of approximately 90 genes (see Table 2 below), many of which may be used for discriminating between a plurality of sample types, including but not limited to any one or more of the following tumours: gastric, colorectal, pancreatic, breast and ovarian.
Table 2. A set of genes useful in discriminating gastric, colorectal, pancreatic, breast and ovarian tumours.
Genbank | RefSeq | Name | Symbol | Class
AA291749 | NM_000125 | estrogen receptor 1 | ESR1 | brea
AA479494 | NM_020423 | ezrin-binding partner PACE-1 | PACE-1 | brea
AA479888 | NM_004703 | rabaptin-5 | RAB5EP | brea
AA482035 | NM_014804 | KIAA0753 gene product | KIAA0753 | brea
AA489647 | NM_004354 | cyclin G2 | CCNG2 | brea
AI362703 | NM_007255 | xylosylprotein beta 1,4-galactosyltransferase, polypeptide 7 (galactosyltransferase I) | B4GALT7 | brea
AI635773 | NM_025202 | likely ortholog of neuronally expressed calcium binding protein | FLJ13612 | brea
AI669721 | NM_014112 | trichorhinophalangeal syndrome I | TRPS1 | brea
AI972286 | NM_002652 | prolactin-induced protein | PIP | brea
H10045 | NM_006113 | vav 3 oncogene | VAV3 | brea
H29315 | NM_012319 | LIV-1 protein, estrogen regulated | LIV-1 | brea
H72875 | NM_002051 | GATA binding protein 3 | GATA3 | brea
N23299 | NM_014674 | ER degradation enhancing alpha mannosidase-like | EDEM | brea
N49284 | NM_005375 | v-myb myeloblastosis viral oncogene homolog (avian) | MYB | brea
R06567 | NM_003629 | phosphoinositide-3-kinase, regulatory subunit, polypeptide 3 (p55, gamma) | PIK3R3 | brea
R63647 | NM_000949 | prolactin receptor | PRLR | brea
H95792 | NM_001609 | acyl-Coenzyme A dehydrogenase, short/branched chain | ACADSB | brea
AA088420 | NM_015869 | peroxisome proliferative activated receptor, gamma | PPARG | Colo
AA099136 | NM_002296 | lamin B receptor | LBR | Colo
AA130579 | NM_006149 | lectin, galactoside-binding, soluble, 4 (galectin 4) | LGALS4 | Colo
AA130584 | NM_004363 | carcinoembryonic antigen-related cell adhesion molecule 5 | CEACAM5 | Colo
AA262074 | NM_147130 | natural cytotoxicity triggering receptor 3 | NCR3 | Colo
AA279081 | NM_015250 | coiled-coil protein BICD2 | BICD2 | Colo
AA284184 | NM_018438 | F-box only protein 6 | FBXO6 | Colo
AA406571 | NM_001712 | carcinoembryonic antigen-related cell adhesion molecule 1 (biliary glycoprotein) | CEACAM1 | Colo
AA465495 | NM_016234 | fatty-acid-Coenzyme A ligase, long-chain 5 | FACL5 | Colo
AA699679 | NM_003889 | nuclear receptor subfamily 1, group I, member 2 | NR1I2 | Colo
AA975612 | NM_012396 | pleckstrin homology-like domain, family A, member 3 | PHLDA3 | Colo
AI433336 | NM_007127 | villin 1 | VIL1 | Colo
AI681730 | NM_007052 | NADPH oxidase 1 | NOX1 | Colo
AW009320 | NM_001804 | caudal type homeo box transcription factor 1 | CDX1 | Colo
N74131 | NM_003226 | trefoil factor 3 (intestinal) | TFF3 | Colo
W72792 | NM_004442 | EphB2 | EPHB2 | Colo
AA490044 | NM_006933 | solute carrier family 5 (inositol transporters), member 3 | SLC5A3 | gast
AA664101 | NM_000689 | aldehyde dehydrogenase 1 family, member A1 | ALDH1A1 | gast
AA702350 | NM_015570 | autism susceptibility candidate 2 | AUTS2 | gast
AA702640 | NM_000790 | dopa decarboxylase (aromatic L-amino acid decarboxylase) | DDC | gast
AA845156 | NM_003122 | serine protease inhibitor, Kazal type 1 | SPINK1 | gast
AI090702 | NM_014970 | kinesin-associated protein 3 | KIFAP3 | gast
AI333599 | NM_019617 | 18 kDa antrum mucosa protein | AMP18 | gast
AW009769 | NM_003225 | trefoil factor 1 (breast cancer, estrogen-inducible sequence expressed in) | TFF1 | gast
AW029441 | NM_002630 | progastricsin (pepsinogen C) | PGC | gast
AW058221 | NM_004190 | lipase, gastric | LIPF | gast
H23187 | NM_000067 | carbonic anhydrase II | CA2 | gast
H94487 | NM_001910 | cathepsin E | CTSE | gast
N63943 | NM_000239 | lysozyme (renal amyloidosis) | LYZ | gast
R32848 | NM_005980 | S100 calcium binding protein P | S100P | gast
R39069 | NM_003558 | phosphatidylinositol-4-phosphate 5-kinase, type I, beta | PIP5K1B | gast
T60861 | NM_017846 | tRNA selenocysteine associated protein | SECP43 | gast
AA405767 | NM_013952 | paired box gene 8 | PAX8 | ovar
AA419229 | NM_144586 | hypothetical protein MGC29643 | MGC29643 | ovar
AA453742 | NM_004172 | solute carrier family 1 (glial high affinity glutamate transporter), member 3 | SLC1A3 | ovar
AA459363 | NM_017495 | RNA-binding region (RNP1, RRM) containing 1 | RNPC1 | ovar
AA621342 | NM_015415 | DKFZP564B167 protein | DKFZP564B167 | ovar
AA683520 | NM_003064 | secretory leukocyte protease inhibitor (antileukoproteinase) | SLPI | ovar
AI139437 | NM_005046 | kallikrein 7 (chymotryptic, stratum corneum) | KLK7 | ovar
AI963941 | NM_144505 | kallikrein 8 (neuropsin/ovasin) | KLK8 | ovar
N52450 | NM_033624 | F-box only protein 21 | FBXO21 | ovar
R24530 | NM_016730 | folate receptor 1 (adult) | FOLR1 | ovar
AA122287 | NM_005512 | glycoprotein A repetitions predominant | GARP | panc
AA450265 | NM_002592 | proliferating cell nuclear antigen | PCNA | panc
AA454651 | NM_020831 | megakaryoblastic leukemia (translocation) 1 | MKL1 | panc
AA670378 | NM_014504 | putative Rab5 GDP/GTP exchange factor homologue | RABEX5 | panc
AA670429 | NM_003020 | secretory granule, neuroendocrine protein 1 (7B2 protein) | SGNE1 | panc
AA844864 | NM_006507 | regenerating islet-derived 1 beta (pancreatic stone protein, pancreatic thread protein) | REG1B | panc
AA845178 | NM_001868 | carboxypeptidase A1 (pancreatic) | CPA1 | panc
AA894687 | NM_004515 | interleukin enhancer binding factor 2, 45kDa | ILF2 | panc
AI651194 | NM_015089 | p53-associated parkin-like cytoplasmic protein | PARC | panc
AI669320 | NM_006418 | differentially expressed in hematopoietic lineages | GW112 | panc
AI685081 | NM_000207 | insulin | INS | panc
AI829222 | NM_000371 | transthyretin (prealbumin, amyloidosis type I) | TTR | panc
T54662 | NM_001832 | colipase, pancreatic | CLPS | panc
W45219 | NM_006229 | pancreatic lipase-related protein 1 | PNLIPRP1 | panc
W72322 | NM_001419 | ELAV (embryonic lethal, abnormal vision, Drosophila)-like 1 (Hu antigen R) | ELAVL1 | panc
AA400464 | NM_000346 | SRY (sex determining region Y)-box 9 (campomelic dysplasia, autosomal sex-reversal) | SOX9 | oth
AA402040 | NM_014428 | tight junction protein 3 (zona occludens 3) | TJP3 | oth
AA430565 | NM_001305 | claudin 4 | CLDN4 | oth
AA443558 | NM_032420 | protocadherin 1 (cadherin-like 1) | PCDH1 | oth
AA477165 | NM_002906 | radixin | RDX | oth
AA676466 | NM_054012 | argininosuccinate synthetase | ASS | oth
AA872020 | NM_002773 | protease, serine, 8 (prostasin) | PRSS8 | oth
AA972350 | NM_000542 | surfactant, pulmonary-associated protein B | SFTPB | oth
AI002217 | NM_003019 | surfactant, pulmonary-associated protein D | SFTPD | oth
N58558 | NM_006215 | serine (or cysteine) proteinase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 4 | SERPINA4 | oth
N68998 | NM_014382 | ATPase, Ca++ transporting, type 2C, member 1 | ATP2C1 | oth
R99562 | NM_004497 | forkhead box A3 | FOXA3 | oth
AA001444 | NM_002398 | Meis1, myeloid ecotropic viral integration site 1 homolog (mouse) | MEIS1 | ovar
AA130187 | NM_024426 | Wilms tumour 1 | WT1 | ovar
AA142980 | NM_015470 | gamma-SNAP-associated factor 1 | GAF1 | ovar
H89996 | NM_006565 | CCCTC-binding factor (zinc finger protein) | CTCF | control
AA430524 | NM_004930 | capping protein (actin filament) muscle Z-line, beta | CAPZB | control
AA078976 | NM_004785 | thioredoxin-like, 32kDa | TXNL | control
AA421230 | NM_012433 | splicing factor 3b, subunit 1, 155kDa | SF3B1 | control
AA456028 | NM_004582 | Rab geranylgeranyltransferase, beta subunit | RABGGTB | control
AA419281 | NM_002046 | glyceraldehyde-3-phosphate dehydrogenase | GAPD | control
An alternative or complementary method for analysis of a gene expression database uses analyses similar to those described above to identify a subset of informative genes which may be used to discriminate between various sample types. For example, a subset of approximately 90 genes may be used to discriminate between five classes of tumours: gastric, colorectal, pancreatic, breast and ovarian. Expression levels of each of those genes may then be ranked within each sample type, thus resulting in an ordered list of genes that may be used to discriminate between different samples based on the relative expression levels of specific genes. This is known as Ranking, as herein described. This method has particular application and utility as it provides a method by which a sample may be identified without reference to a database.
In a preferred embodiment a sample may be analyzed for expression levels of a specific set of genes, the relative expression levels of those genes may then be determined and ranked, then compared to a listing generated from different samples on the same set of genes, thereby providing a simple method of identifying the sample. This ranking procedure allows for meta-analysis which provides for cross-platform comparisons of gene expression profiles and databases.
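By way of illustration only, the comparison of a ranked sample against reference listings might be sketched as follows. The distance measure used here (the sum of absolute differences in rank position, sometimes called the Spearman footrule) is an assumption chosen for simplicity and is not stated in this specification; the function names and the expression values are likewise hypothetical.

```python
def rank_order(expression):
    """Return gene symbols ordered from lowest to highest expression."""
    return sorted(expression, key=expression.get)


def footrule_distance(order_a, order_b):
    """Sum of absolute differences in position between two orderings
    of the same set of genes (assumed measure, not from the source)."""
    position_b = {gene: i for i, gene in enumerate(order_b)}
    return sum(abs(i - position_b[gene]) for i, gene in enumerate(order_a))


def closest_reference(sample_expression, reference_orders):
    """Return the label of the reference listing closest to the sample."""
    sample_order = rank_order(sample_expression)
    return min(
        reference_orders,
        key=lambda label: footrule_distance(sample_order, reference_orders[label]),
    )


# Hypothetical expression ratios and reference listings, for illustration only.
sample = {"ESR1": 8.2, "GATA3": 6.9, "CDX1": 0.4, "VIL1": 0.7}
references = {
    "breast": ["CDX1", "VIL1", "GATA3", "ESR1"],
    "colorectal": ["ESR1", "GATA3", "VIL1", "CDX1"],
}
print(closest_reference(sample, references))
```

Because only the order of the genes within the sample is used, the same comparison can in principle be applied to data from different measurement platforms.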
In another aspect of the present invention, there is provided a method of analysing gene expression data to generate a gene expression profile or a gene expression database for use in diagnosing tumours. Preferably the method includes normalising the gene expression data, which allows comparison of data obtained from different experiments.
The methods described above relating to the generation and analysis of a gene expression profile or a gene expression database will now be described in more detail in this specific example. However, this application is not limited to this description and should not limit the generality of this invention.
As the present invention may use data generated from a variety of gene expression analysis methods including, but not limited to, microarray analysis and RT-PCR, a statistical method is required which facilitates amalgamation of these data into a form which allows comparison of these different data.
Applicants have also developed a Ranking System which is a surprisingly straightforward and robust approach to gene expression analysis.
Current microarray based measurements of gene expression are very noisy.
This applies in particular to the spotted array technologies used for development of this invention. The current dogma is that the raw measurement values have to be normalised accordingly, after which various machine learning techniques should or could be applied to the normalised expression levels. A particular aim of the normalisation is to combat the noise and some innate biases of the technology, such as the non-linear dependence between intensity and level of hybridisation of the Cy3 and Cy5 channels, wherein Cy3 and Cy5 are fluorescent dyes used to label probes for the detection of nucleic acid hybridisation to microarrays. A number of sophisticated statistical normalisation techniques have been custom designed to suit various microarray platforms, and results of their experimental evaluations can be found in the literature. These include the intensity dependent loess pin group normalisation for spotted arrays of Yang et al (2002, Nucleic Acids Res 30(4): e15), the SNOMAD algorithm of Colantuoni et al (2002, Biotechniques 32(6): 1316-20) for spatial normalisation of spotted arrays, standardisation of gene expression values for Affymetrix array data to zero mean and unit standard deviation, and the universally applied log transformation, which alleviates the routinely observed large dynamic ranges of expression values.
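Two of the simpler conventional normalisations mentioned above, log transformation and standardisation to zero mean and unit standard deviation, can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, the base-2 logarithm and the use of the population standard deviation are assumptions, and the loess and SNOMAD methods are not shown.

```python
import math
import statistics


def log2_transform(ratios):
    """Log-transform expression ratios to compress their dynamic range."""
    return [math.log2(r) for r in ratios]


def standardise(values):
    """Rescale values to zero mean and unit standard deviation.

    The population standard deviation is used here (an assumption);
    the values must not all be identical.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical expression ratios spanning a 16-fold range.
ratios = [0.25, 0.5, 1.0, 2.0, 4.0]
logged = log2_transform(ratios)
print(logged)
print(standardise(logged))
```

Both steps operate on the measured values themselves, which is the point of contrast with the rank-based approach introduced next.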
The present invention introduces, in particular, a novel normalisation technique based on ranking. Applicants propose to rank all genes according to their expression levels, then allocate to each gene a rough level of its rank (RankLevel). The RankLevels are then used for statistical analysis and predictive modelling instead of using normalised expression levels.
Effectively, raw expression data is obtained which provides a gene expression profile. This raw expression data may be obtained by microarray or RT-PCR analysis or any means that provides gene expression data. This data is preferably normalized and reduced to a manageable level before processing through a k-nearest neighbours or SVM procedure or any learning algorithm process which is trained from the data in a gene expression database. The ranking system, described herein, ranks the expression levels of the various data points within a sample.
Each data point represents the expression of a gene and is measured by the relative abundance of mRNA species in the sample compared to expression of that gene in a reference sample or median expression across many samples or genes. Each measurement is assigned an intensity level which is determined relative to a reference point such as the background. Hence an intensity ratio or expression ratio may be obtained which represents the data point. This intensity or expression ratio is then ranked along with other data points within the sample, ranging from the lowest intensity to the highest intensity. Each data point is then allocated a rank number. It is this rank number which is used to determine the rank level.
More specifically, if the expression ratio of a gene A is, say, 1.883 and that of gene B is 10.34, these values may be ranked as 5405 and 7283 among all 8378 genes of the array ordered from the lowest to the highest gene expression. Therefore, they are the 5405th and the 7283rd genes in order of increasing intensity among the 8378 genes analysed. An arbitrary number of rank levels may be assigned. Preferably, these are ranked from 1 to 10. However, any number may be assigned. Useful ranks are between 2 to 10, most preferably 4 to 10, more preferably, 5 to 10. Assuming 5 RankLevels are used, then these genes are allocated RankLevels according to the following formula: RankLevel(A) = ceil(5 × 5405/8378) = ceil(3.22) = 4, RankLevel(B) = ceil(5 × 7283/8378) = ceil(4.34) = 5, where ceil(x) denotes the smallest integer ≥ x. Thus, for building a predictive model based on RankLevels the values 4 and 5 can be used instead of the original values 1.883 and 10.34 for the expression of gene A and B, respectively. Surprisingly, in spite of its crudeness, the RankLevel value allows development of very accurate predictive models (Figure 11). The intuitive explanation is that for successful predictive modelling the consistency of the features used to represent measurement is paramount. Thus although RankLevels may lose accuracy of expression level, they gain in stability, since crude RankLevels are more likely to be unchanged or not changed significantly between various experiments. The RankLevels naturally eliminate the issue of the huge dynamic range of expression values and of global variations between average expression levels of different microarray slides.
Hence a general formula for determining a RankLevel is as follows: RankLevel = ceil(number of rank levels × rank/number of genes assayed), wherein ceil(x) = smallest integer ≥ x.
Another important property of RankLevels is that models built on them can be easily transferred across various technologies for measuring gene expression levels, as long as the monotonicity of measurement across the technological platforms is roughly preserved. In various experiments applicants have very successfully transferred predictive models developed for spotted microarrays to RT-PCR and vice versa (Example 7 and Figure). The same transfer can be done between other platforms, for instance, the spotted arrays and Affymetrix arrays. Note also that RankLevels are readily applicable to a mixture of arrays with different total numbers of genes, which paves the way for practical statistical analysis and modelling across large amounts of data from a variety of studies developed by different laboratories using various technologies.
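The general RankLevel formula can be sketched in a few lines. This is an illustrative sketch rather than the applicants' implementation: the function names are hypothetical, and the handling of tied expression values (broken by input order) is an assumption, since ties are not discussed in this specification.

```python
import math


def rank_level(rank, n_genes, n_levels=5):
    """RankLevel = ceil(number of rank levels x rank / number of genes assayed)."""
    return math.ceil(n_levels * rank / n_genes)


def rank_levels(expression, n_levels=5):
    """Map each gene to its RankLevel within one sample.

    `expression` maps gene -> expression ratio. Genes are ranked from
    lowest (rank 1) to highest; ties keep their input order (assumption).
    """
    ordered = sorted(expression, key=expression.get)
    n_genes = len(ordered)
    return {
        gene: rank_level(rank, n_genes, n_levels)
        for rank, gene in enumerate(ordered, start=1)
    }


# The worked example from the text: ranks 5405 and 7283 of 8378 genes, 5 levels.
print(rank_level(5405, 8378), rank_level(7283, 8378))
```

Since only the rank of each gene enters the calculation, any monotonic rescaling of the raw expression values leaves the RankLevels unchanged, which is consistent with the cross-platform property described above.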
RankLevel based models (using a small number of rank levels, say 2-5) are also very amenable to human comprehension and rationalisation, which can be readily carried across a range of technological platforms. In this context RankLevel normalization is an especially attractive proposition for emerging applications of microarray and RT-PCR technology and for other high throughput genetic experiments and their applications.
High-throughput expression analysis can be employed to great effect in the subclassification of many tumour types. The wealth of gene expression data in several diseases has begun to support the hypothesis that morphologically indistinguishable tumours may be molecularly distinguishable. This has potentially widespread application in the clinical use of technologies aimed at refining diagnosis and prognostication in cancer.
In yet another aspect of the present invention, there is provided a gene expression database generated using a method described herein. Preferably, the gene expression database includes a subset of genes selected to demonstrate differences in expression between sample types, and arranged according to their RankLevel as described above.
In a preferred embodiment the present invention provides a gene expression database which includes gene expression data from a variety of tumour types.
Preferably the tumour types include at least one of the following: gastric, colorectal, pancreatic, breast and ovarian.
In a further aspect of the present invention, there is provided an expression-based diagnostic evaluation of the tissue of origin of a tumour.
Preferably the expression-based evaluation is based on comparing a gene expression profile of a tumour with a gene expression database representing one or more tumour or tissue types. More preferably, the comparison is based on comparing RankLevels between the gene expression profile and the gene expression database.
Being able to provide disease appropriate treatment is essential in order to provide the best level of care for a patient. Given that different tumour types respond differently to different treatment regimens, it is therefore beneficial to be able to correctly diagnose a patient's tumour. At present in medicine, the ability to classify tumours is based upon the use of a limited number of markers, which are often thought to be "tumour specific" in expression but in practice may produce equivocal results regarding the tissue of origin of a tumour sample. For instance, although the estrogen receptor is employed as a diagnostic marker for breast cancer, the molecule is expressed in only a small percentage of clinically identifiable breast cancer samples. To further complicate the analysis, the estrogen receptor is also expressed in various other tumour types. Thus present diagnosis is based on a limited set of imperfect predictors.
As stated above, the fraction of genes expressed in a cell reflects and determines the biological state of that cell and provides a molecular snapshot of the cellular phenotype. Despite being propagated for many years in vitro, cell lines retain some level of lineage specific expression. This has the effect of allowing cell lines of similar origin to co-cluster following gene expression analyses. In addition, expression profiles of tumour cells in vivo or in vitro may group the cells according to their presumptive tissue of origin. Our ability to rapidly profile the expression of many thousands of genes simultaneously, and use that information to diagnose the origin of a tumour has as yet not been reflected in modern diagnostics. The power of molecular profiling as an approach to diagnostic evaluation of tumours lies in the fact that instead of deriving information about a tumour from a handful of markers, the expression of thousands of genes contributes to an overall picture of the tumour cells. The present invention confirms the diagnostic utility of such an approach, and foreshadows an expanding use of this technology. Preferably the expression-based evaluation uses expression data generated by the use of microarray technology to determine RNA expression levels in a sample.
Alternatively or additionally, the expression-based evaluation uses expression data generated by the use of quantitative RT-PCR technology to determine RNA expression levels in a sample.
The use of microarrays and quantitative RT-PCR generates a large amount of data and requires considerable analysis to identify an optimal subset of genes, as discussed above. Once an optimal subset of genes has been identified, it is only necessary to investigate those genes in the optimal subset in order to perform identification according to the present invention.
In a particular embodiment of the present invention there is provided a method by which a tissue of origin or a tumour of origin may be assigned to a biological sample, the method including the steps of: obtaining a gene expression profile of the biological sample; and comparing the gene expression profile to a gene expression database; wherein the gene expression database includes gene expression data relating to various tissue types or tumour types; wherein similarities and differences between the gene expression profile and the gene expression database allow assignment of the tissue of origin or the tumour of origin to the biological sample.
In a preferred embodiment the biological sample is a tumour sample. More preferably the tumour sample is an unidentified adenocarcinoma. Preferably the gene expression database includes gene expression data relating to any one or more of the following tumour types: gastric, colorectal, pancreatic, breast and ovarian.
Thus the present invention provides a method of diagnosing a patient's tumour by comparing a gene expression profile of the patient's tumour with a gene expression database generated from known tumour types.
In a particular embodiment of the present invention the methods of the invention can be used to identify a tumour of unknown origin.
In a specific, but non-limiting example, the present application illustrates the process in the identification of tumours found in the ovary, but suspected to be extra-ovarian in origin. Approximately 10-20% of patients presenting with ovarian malignancies have tumours suspected to be of extra-ovarian origin, rather than primary ovarian cancers. Tumours that metastasise from the stomach to the ovary and present as primary ovarian cancer are typically referred to as Krukenberg tumours but the term has also been more broadly applied to colon, breast and pancreatic secondaries to the ovary. Combining the data from a number of studies, in a total of 68 Krukenberg tumours, approximately 40% are metastatic from the stomach, 25% are colorectal in origin, 10% arise in the breast, and 25% arise elsewhere or do not have a primary site diagnosed. Prior to surgery many of these patients have clinical and CT findings consistent with a diagnosis of ovarian cancer, and hence undergo a laparotomy. In many of these patients no evidence of another primary is found at operation, and subsequent investigations often do not reveal a primary. The pathologist may suspect that such a tumour is of non-ovarian origin based on the morphologic appearance and immunohistochemical profile, but is generally not able to exclude the possibility that it could be a primary ovarian cancer, nor suggest a more likely origin. Generally these patients are given the benefit of the doubt and are treated with platinum based chemotherapy as per standard management of ovarian cancer. They usually respond poorly, and in some instances an extra-ovarian primary becomes apparent at a later date.
The present invention also provides a method of using a gene expression database according to the present invention for prognosis and/or diagnosis of a patient.
Conventional methods for treatment of cancer rely upon clinical parameters relating to anatomical site of origin, grade and spread of disease. These observations today are essentially made through such modalities as intraoperative assessment, conventional pathology through light microscopy and a suite of imaging techniques. For a proportion of tumours several molecular markers can also be used to predict the behaviour of the disease or to assess the suitability of a patient for specific treatment. One example is breast tumours that express the cell surface estrogen receptor (ESR). Such patients are known to respond to treatment with the ESR antagonist tamoxifen, which is commonly used as an adjuvant therapy for low grade breast cancers. For a large proportion of tumours, however, there are currently no methods for assessing such prognostic factors. Two cancer cases that may appear identical in their pathological and clinical profiles may respond differently to chemotherapy or radiotherapy; they may also show a different propensity to recur and may or may not metastasise. Underlying this phenotypic behaviour are the molecular mechanisms relating to tumour development, its cellular functioning and the relationship it has with the rest of the body.
Using gene expression microarray analysis the activity of thousands of genes can be used to identify expression patterns related to the phenotypic behaviour.
A gene expression dataset of samples that have been clinically annotated to study specific prognostic factors, relating to treatment suitability or recurrence, can be used to identify the associated molecular markers or molecular pathways. Therefore, similar to the application for identifying site of origin, where tissue differentiation markers may allude to the identity of a primary tumour, markers relating to cell survival, angiogenesis, metastasis or T-cell infiltration may be associated with tumour behaviour, patient survival or other prognostic factors.
Identification of such markers or expression profiles can be translated to clinically viable tests using methods similar to those discussed here, allowing better cancer patient management.
In another aspect of the present invention, there is provided a method of treatment of a patient having a tumour of unknown origin including the steps of: identifying the tissue of origin of the tumour of unknown origin; and treating the patient in a manner appropriate for treating a tumour originating from that tissue site.
Identification of the tissue of origin permits disease-appropriate therapy to be given to a patient and thereby give the patient the best chance of receiving an effective treatment. Such treatments are known to those skilled in the art and vary between different tumour origins.
Preferably, the step of identifying a tissue of origin of the tumour of unknown origin is as described herein. However, this aspect of the invention is based on the underlying principle that an individual cell type only expresses a subset of the total number of genes present in the genome. The fraction of genes expressed reflects and determines the biological state of the cell and provides a molecular snapshot of the cellular phenotype. This is carried through to the secondary or metastatic tumours and provides an identification system of their origin, which allows for appropriate treatment that may not coincide with the surrounding tissue type and the treatment of tumours of that tissue type.
Throughout the description and claims of this specification, the word "comprise" and variations of the word, such as "comprising" and "comprises", is not intended to exclude other additives, components, integers or steps.
The discussion of documents, acts, materials, devices, articles and the like is included in this specification solely for the purpose of providing a context for the present invention. It is not suggested or represented that any or all of these matters formed part of the prior art base or were common general knowledge in the field relevant to the present invention as it existed in Australia before the priority date of each claim of this application.
Examples of the procedures used in the present invention will now be more fully described. It should be understood, however, that the following description is illustrative only and should not be taken in any way as a restriction on the generality of the invention described above.
EXAMPLES
Example 1: Creating a Gene Expression Database.
A training dataset containing the gene expression measures of approximately 10,000 genes in a wide range of human tumour types was created. To develop the dataset, and also to ensure its usefulness for diagnosing tumour type from small biopsies, a protocol incorporating an amplification step in preparation of labelled cDNA for hybridisation was used. The protocol reliably produced expression data from 3 µg of starting total RNA. Amplification was an important approach to take, as the amount of tissue available is often limited to small amounts in excess of the tissue required for other diagnostic purposes. In particular, the approach allows utilising small biopsies (for example core biopsy or fine needle aspirate) of tissue collected from metastatic deposits that would otherwise not be collectable by excision biopsy.
a) Collection of tissue samples
All human tumour material was collected and used in accordance with the Ethical Principles as described in the Australian National Health and Medical Research Council National Statement on Ethical Conduct in Research Involving Humans. Histopathology of the tumour samples was reviewed to ensure an unequivocal clinical diagnosis. Metastatic tumours arising from a known primary tumour were obtained from patients with a clear clinical history of metastatic disease. Pathology review of these samples unequivocally identified the primary site. Metastatic tumour arising from an unknown primary tumour was submitted after substantial clinical workup. Immunohistochemical and morphological staining and review were carried out according to standard protocols.
b) Total RNA preparation and labelling
Tissue samples were homogenised in Trizol reagent (Invitrogen) followed by phase separation and subsequent purification of total RNA using an RNeasy column (Qiagen) according to the manufacturers' protocols. mRNA was then amplified using standard techniques. Briefly, mRNA was reverse transcribed to cDNA using a T7 promoter tagged anchored PolyT primer. A second strand was synthesized in the presence of RNaseH and Klenow. The resulting double stranded molecules were used as template in an in vitro transcription reaction using a T7 Megascript kit (Ambion), according to the manufacturer's protocol, and purified using an RNeasy column. Amplified RNA was indirectly labeled by incorporation of amino-allyl dUTP (Sigma) during reverse transcription followed by coupling of cyanine-5 fluorophore (Amersham). A common reference RNA containing eleven human tumour cell line RNAs was used in all hybridisations.
Reference total RNA was isolated, amplified and labeled with cyanine-3 fluorophore (Amersham) in an identical manner to the tumour samples. Samples of labeled cDNA were cohybridised to spotted cDNA microarrays containing approximately 10,500 elements representing 9,389 unique cDNAs (UniGene build 144), washed and scanned (Scanarray 5000, Perkin Elmer) according to standard protocols. Data was extracted from scanned images using the Quantarray program (GSI Lumonics).
Example 2: Profiling a tumour sample.
Samples of RNA from 121 well characterized tumour samples were analysed.
To ensure the authenticity of the gene expression profiles and not to introduce errors into the class prediction algorithm, the diagnosis of these samples was verified by histopathology prior to inclusion in the study. RNA from tumour samples was isolated, amplified, and labelled, and the resulting labelled cDNA was hybridised to a spotted cDNA microarray containing 9,389 unique genes (UniGene build 144). After filtering to remove unusable spots, the data were normalized. Unsupervised hierarchical clustering using all genes in the filtered and normalized dataset showed the tumours grouped into their tissue of origin (Figure), although not perfectly. This is not an unexpected observation and is in agreement with other studies of a similar type. A list of genes that were significantly different in expression (p<0.05) between all the different tumour groups was then identified using the normalization technique and informatic tools such as k-nearest neighbours and SVM. Hierarchical clustering of the samples using these genes showed significant clustering of most members of the tumour groups (Figure). Some tumour groups were distinct from every other tumour type (for example prostate), while others were initially more difficult to separate (lung, breast, ovarian). This most likely reflects the heterogeneity of the samples, and is overcome by increasing the representation of these tumour types.
An algorithm for identifying the origin of carcinoma of unknown primary was implemented, which utilises a number of informatic tools including k-nearest neighbours and a support vector machine approach. The first stage is to reduce the number of genes from the approximately 9,389 unique genes on the microarray to an optimal subset, capable of reliably describing differences between tumour types. The optimal number and selection of genes for classification of tumours from a range of primary origins is determined by using an iterative signal to noise ratio algorithm. This method ranks genes according to the difference of their mean expression values for each class of tumour, divided by the sum of the standard deviations, i.e. $(m_1 - m_2)/(s_1 + s_2)$. This effectively identifies those genes that have a consistently different expression measurement within a given class of tumours, relative to the values of that gene across all other tumour types present. A subset of such genes is shown in Figure 3. Genes are ranked according to this measurement and varying numbers of genes are tested to identify a subset with the highest predictive strength (Figure 4).
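The signal to noise ranking described above can be illustrated with a minimal Python sketch. It is not the applicants' implementation: the gene names and expression values are invented, the function names are hypothetical, and two choices are assumptions not stated in the text, namely the use of sample standard deviations and ranking by the absolute value of the score.

```python
from statistics import mean, stdev


def signal_to_noise(in_class, out_class):
    """Return (m1 - m2) / (s1 + s2) for a single gene.

    Assumption: sample standard deviations are used and s1 + s2 is non-zero.
    """
    m1, m2 = mean(in_class), mean(out_class)
    s1, s2 = stdev(in_class), stdev(out_class)
    return (m1 - m2) / (s1 + s2)


def rank_genes(expression, labels, target):
    """Rank genes for one tumour class against all other classes.

    expression: dict mapping gene name -> list of values, one per sample
    labels:     list of class labels, one per sample
    target:     the class being contrasted with the rest
    """
    scores = {}
    for gene, values in expression.items():
        inside = [v for v, lab in zip(values, labels) if lab == target]
        outside = [v for v, lab in zip(values, labels) if lab != target]
        scores[gene] = signal_to_noise(inside, outside)
    # Assumption: genes are ordered by the magnitude of the score, so that
    # strongly over- and under-expressed genes both rank highly.
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return ranked, scores


# Invented toy data: three prostate and three lung samples.
labels = ["prostate", "prostate", "prostate", "lung", "lung", "lung"]
expression = {
    "GENE_A": [5.0, 5.2, 4.8, 1.0, 1.2, 0.8],
    "GENE_B": [2.0, 3.0, 1.0, 2.5, 1.5, 2.0],
    "GENE_C": [1.0, 1.5, 0.5, 3.0, 3.5, 2.5],
}
ranked, scores = rank_genes(expression, labels, "prostate")
print(ranked)
```

In this toy set GENE_A separates the classes with little within-class spread and so ranks first, while GENE_B has equal class means and ranks last.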
A leave-one-out (LOO) cross validation was then used in conjunction with the k-nearest neighbours algorithm to select and test subsets of genes. Briefly, this algorithm seeks to classify an unknown sample by comparing it to samples of known class by using a distance metric. The class of the closest 'k' samples is assigned to the sample being tested. LOO involves permutations of the dataset whereby each sample is held out separately and a class assigned to it by using the remaining samples. This is repeated until each sample has been left out of the training set once and has been assigned to a class. The proportion of correct classifications is used as a measure of predictor accuracy.
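By way of illustration only, the leave-one-out procedure with a k-nearest neighbours classifier might be sketched as follows. The text does not name the distance metric or the value of k, so Euclidean distance, k = 3 and majority voting are assumptions here, and the sample vectors are invented.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(training, query, k=3):
    """Assign the majority class among the k training samples nearest to query.

    training: list of (vector, label) pairs
    """
    nearest = sorted(training, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(samples, k=3):
    """Hold each sample out once, classify it from the rest, and report
    the proportion of correct classifications with the predictions."""
    predictions = []
    for i, (vector, _) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        predictions.append(knn_predict(training, vector, k))
    correct = sum(p == label for p, (_, label) in zip(predictions, samples))
    return correct / len(samples), predictions


# Invented two-gene expression vectors for two tumour classes.
samples = [
    ((1.0, 1.0), "breast"), ((1.2, 0.9), "breast"), ((0.9, 1.1), "breast"),
    ((5.0, 5.0), "ovarian"), ((5.1, 4.8), "ovarian"), ((4.9, 5.2), "ovarian"),
]
accuracy, predictions = loo_accuracy(samples, k=3)
print(accuracy, predictions)
```

Because each held-out sample is never part of its own training set, the resulting accuracy is an estimate of performance on unseen samples rather than a measure of fit.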
From these predictions a confusion matrix can be constructed, as shown in Figure 5. By plotting the actual tumour classes on one axis and the predicted classes on the other, a histogram-type view of the overall success or failure of the classification approach can be achieved. This representation also allows identification of any particular classes with more incorrect predictions relative to other tumour types. The average prediction accuracy in LOO analysis in our training set is approximately 97%.
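A confusion matrix of this kind can be tabulated in a few lines of Python. The sketch below uses invented actual and predicted labels, not data from the study, and the helper names are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Rows are actual classes, columns are predicted classes."""
    classes = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in classes] for a in classes]
    return classes, matrix


def per_class_accuracy(classes, matrix):
    """Fraction of each actual class that was predicted correctly."""
    result = {}
    for i, (name, row) in enumerate(zip(classes, matrix)):
        total = sum(row)
        result[name] = row[i] / total if total else 0.0
    return result


# Invented labels: one lung sample is misclassified as ovarian.
actual = ["breast", "breast", "lung", "lung", "ovarian", "ovarian"]
predicted = ["breast", "breast", "lung", "ovarian", "ovarian", "ovarian"]

classes, matrix = confusion_matrix(actual, predicted)
accuracy_by_class = per_class_accuracy(classes, matrix)
for name, row in zip(classes, matrix):
    print(name, row)
print(accuracy_by_class)
```

Off-diagonal counts show which classes are confused with which, which is how a class with disproportionately many incorrect predictions is identified.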
Techniques such as clustering and class-prediction algorithms are sensitive to systematic differences between samples, for example the quality of RNA, or the preparation method. To verify that amplification did not introduce errors into the dataset, the fidelity of amplification was determined by comparing results derived from amplified and unamplified starting material. The correlation between amplified and unamplified results was typically greater than 0.85.
Between amplified samples, the correlation was greater, with a correlation coefficient of at least 0.97 (data not shown). We believe therefore that the class predictions made by this algorithm are unlikely to be influenced by amplification of mRNA derived from samples.
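The comparison of amplified and unamplified material can be sketched as a correlation between paired measurements. The text does not say which correlation coefficient was used; Pearson's is assumed here, and the expression values are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented values for five genes measured from the same RNA,
# with and without amplification.
unamplified = [1.0, 2.0, 3.0, 4.0, 5.0]
amplified = [1.1, 1.9, 3.2, 3.8, 5.1]

r = pearson(unamplified, amplified)
print(round(r, 3))
```

A coefficient near 1 indicates that amplification preserved the relative expression levels of the genes.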
To test the validity of the prediction algorithm we used it to identify the origin of twelve samples of metastatic tumour from a known primary. All metastases were assigned to their correct class, i.e. known site of origin. P-values were significant in 10 cases (p<0.05) and bordering on significant (p=0.057 and p=0.058) in the remaining two (see Figure These specimens were not involved in any way in the construction of the prediction algorithm, and demonstrate that the prediction method is not specific (or 'over fitted') to samples contained in the training set of tumours and reflects classifications based on gene expression inherent to the tumour types.
Example 3: Diagnosis of metastatic tumour in the ovary and identification of extra-ovarian origin.
To demonstrate the wider utility of this approach to diagnosing metastatic tumour in the ovary, we analysed three samples of tumours from the ovary which were atypical presentations suggestive of an extra-ovarian origin for the tumour. Expression data from these samples strongly suggested a colorectal origin for these tumours (p<0.001 in all cases). Using only the unequivocally diagnosed ovarian and colorectal tumours in the training dataset, we identified a list of 55 genes which were significantly different between the ovarian and colorectal tumours. Importantly, several genes already known to be discriminators between these tumour types were included in the list. Using just these 55 genes, the five cases described above were clearly identified as colorectal in origin, and not unexpectedly, all ovarian and colorectal tumours were correctly segregated. We suggest that these genes are likely to be extremely useful as discriminators between colorectal and ovarian tumour in cases where the diagnosis is unclear or uncertain.
Example 4: Case studies and diagnosis of primary tumours
a) Colorectal primary and ovarian secondary
A patient (P00819, Figure 7) presented with a large left ovarian mass. While the clinical picture was thought to be consistent with a possible primary ovarian cancer, this patient had presented with a Dukes' C colon carcinoma one year previously. She underwent surgery and the histology was initially reported as a moderately differentiated mucinous adenocarcinoma with light microscopic appearances favouring a primary left ovarian cancer with omental involvement.
Immunohistochemical analysis revealed a phenotype more consistent with a colonic metastasis, as the tumour was found to be CK 7 negative and CK positive. This illustrates the scenario that we expect to frequently encounter where the clinical picture, diagnostic pathology and PET imaging suggest a primary tumour location, but without a high degree of certainty and with some conflicting data regarding the origin of the tumour. In this case, molecular profiling was of immediate applicability, and supported the immunophenotyping of this tumour as colorectal.
b) Colorectal primary and pelvic and peritoneal secondary
The patient (P00644, Figure 7) presented with a pelvic tumour mass and widespread peritoneal metastases. Surgical notes from the time of operation indicated it was unclear whether the patient had a primary ovarian or a primary colorectal cancer. Histology of the tumour was reported as a moderately differentiated endometrioid adenocarcinoma with some focal mucinous differentiation, with the light microscopic appearances favouring ovary as the primary. Immunohistochemical staining with CK7 and CK20 monoclonal antibodies showed tumour cells variably co-expressing the two markers, which was thought to support an ovarian origin, although without a high degree of certainty. In this case, our molecular profiling suggested that the likely true origin of the tumour was colorectal.
c) Colorectal primary and ovarian secondary
Further samples of tumours isolated from the ovary, where an extra-ovarian origin for the tumour was likely, were examined. The first (P00482, Figure 7) was a sample collected from the ovary of a woman with abdominal metastases at the time of left hemicolectomy, total abdominal hysterectomy and bilateral salpingo-oophorectomy. Molecular profiling identified a colorectal origin for the tumour, which was confirmed by histological analysis of sections of the colon, which showed that the patient had a Dukes' stage D moderately differentiated adenocarcinoma of the sigmoid colon. The second patient (P00493, Figure 7) presented with tumour present in both ovaries, and omentum. She had previously been treated for carcinoma of the sigmoid colon, and clinicians queried whether the tumour was a recurrence from the colorectal tumour, or an ovarian primary. In this case, microarray analysis indicated colorectal as the likely source of the tumour, which was confirmed by immunohistochemical staining which showed negative staining for cytokeratin 7, and positive staining for cytokeratin 20. The third patient (P00206, Figure 7) was never diagnosed as a colorectal tumour metastatic to the ovary, although at the time of surgery, the pathologist noted, "although the (histologic) appearances would be consistent with mucinous ovarian carcinoma the appearances nevertheless raise the possibility of this representing spread from a colorectal primary tumour."
Example 5: Various uses of the microassay.
We expect that this test will be useful in a number of clinical situations. The first involves a patient presenting with no previous history of cancer, with extensive undifferentiated carcinoma. This is the classical presentation of carcinoma of unknown primary. In one such case (P00459), we analysed a sample of a carcinoma taken from a forty year-old non-smoker who presented with cough and dyspnoea. The patient was subsequently found to have multiple lung, supraclavicular, mediastinal and liver metastases. Histology review described the metastatic tumour as an undifferentiated carcinoma. There was a larger lesion in the right lung on CT that may have been consistent with a primary. A PET scan did not reveal a definite primary, although a questionable abnormality in the lower oesophagus was noted. Subsequent gastroscopy was normal. Although the clinical picture was consistent with a diagnosis of a non-small cell lung cancer, there remained considerable uncertainty about the primary origin of this cancer in a young non-smoker. Expression profiling of this sample, and subsequent comparison with the training dataset, determined that this sample had an expression profile most consistent with the tumour being lung in origin, with a significant p-value of 0.027. This case illustrates the scenario where the clinical picture, diagnostic pathology and imaging suggested a primary tumour location, but with some remaining doubt. Array analysis subsequently confirmed the clinical observations.
The second scenario we expect to encounter frequently is the unusual presentation of a common tumour, when that patient has a clinical history of a previous cancer. One of the patients in our study (P00563), a thirty-one year old woman with a past history of a stage I high-grade mucinous borderline ovarian tumour six years previously, presented with a twelve month history of left pelvic pain and was found to have a sclerotic abnormality involving the left ilium and left upper femur. Bone scan revealed multiple bone metastases.
Biopsy from the left ilium revealed adenocarcinoma. The patient underwent CT scan of the chest/abdomen/pelvis, a PET scan, a thallium scan and a mammogram without any evidence of a primary being found. Pathology review suggested that the histology was consistent with a previous ovarian malignancy, but could not exclude a carcinoma arising in the breast, lung or gastrointestinal tract. The presentation with bone metastases was thought to be most unusual for recurrent ovarian cancer and the treating clinician thought that it was more likely that the cancer had arisen from another site. The patient was treated as an unknown primary with a combination of epirubicin, cisplatin and fluorouracil. Our array analysis confirmed the possible diagnosis of a relapse from the ovarian primary, and it is possible that information such as this may have altered the management of this patient.
The third scenario involves a patient with a clear history of malignancy, but with metastatic tumour where it is unclear whether the metastatic tumour has arisen from the first, or a new, primary tumour. In some cases, we expect that array analysis would be able to confirm the identification of a relapsed primary tumour, and in others to suggest a new primary site. Both of these scenarios were encountered during this work. The first was a patient (P00563) diagnosed in February 1994 with Stage IIC endometrioid carcinoma of the ovary. CA125 was elevated at 327 pre-operatively and was still elevated post-operatively at 80. She underwent a total abdominal hysterectomy with bilateral salpingo-oophorectomy and omentectomy. She was then treated with six cycles of carboplatin and cyclophosphamide. She remained well until May 1998 when she developed back pain and was found to have sclerotic bone metastases (investigations included plain x-ray, CT and bone scan). Mammogram at this time was normal. The CT scan did not reveal any other evidence of metastatic disease. T9 metastasis was biopsied and revealed a poorly differentiated adenocarcinoma, which was oestrogen and progesterone receptor negative.
There was no clinical evidence of another primary. Although the development of sclerotic bone metastases was thought to be an unusual pattern of relapse for ovarian cancer, the decision was made to treat her as ovarian cancer. In addition to the CT she also had a PET scan, which was unhelpful. Following palliative radiotherapy to thoracic spine, she went on to receive six cycles of carboplatin and taxol. The response was difficult to assess though there was some slight improvement on bone scan. The CA125 tumour marker was not elevated and never rose subsequently. By October 1999 there was a definite mass in the left neck and initial attempt to biopsy this mass in January 2000 did not reveal any malignant cells. In April 2000, due to progressive growth and symptoms from the neck mass the patient received palliative radiotherapy, and commencing in June 2000 received four cycles of carboplatin with the best response of stable disease. In November 2000, there was an impression of a mass in the outer left quadrant of the left breast with some suspicious changes of malignancy on mammogram in the same area. However, biopsy of the breast mass was negative. The patient also had a repeat biopsy of the neck mass, which revealed an undifferentiated carcinoma. At this time she was also thrombocytopaenic and had developed liver metastases. Bone marrow examination revealed a similar undifferentiated carcinoma. She was commenced on capecitabine but tolerated this poorly and this treatment was ceased. She subsequently went on to receive weekly taxotere with improvement in her thrombocytopaenia but progressive liver metastases. She was subsequently treated palliatively, and died in May 2001. At the time she was initially found to have bone metastases she was regarded as an atypical relapse of ovarian cancer but there was always concern that she may have had another primary, in particular, a breast primary in view of the sclerotic bone metastases.
By November 2000, there was a strong clinical suspicion that she may have had breast cancer, and this influenced the decision to use capecitabine and docetaxel chemotherapy. Although the biopsy of the breast mass was negative, this was a poor sample and may have been a false negative. The clinical and mammographic appearances of the breast lesion were consistent with a breast primary, as was the pattern of metastases with liver, bone marrow, left supraclavicular and sclerotic bone metastases. Analysis of this sample by microarray, subsequent to the patient's death, confirmed the suspicion that the metastatic cancer was not a relapse of the patient's initial ovarian cancer, but a new breast primary.
The alternative scenario, where a relapse was suggested, involved a seventy year old man (P01242) who was diagnosed with prostate cancer ten years previously, was treated by transurethral resection of the prostate and radiotherapy, and had remained on Zoladex and flutamide. He presented with a painful lesion on his left ear and a lump in the left upper neck. Biopsy of the ear lesion revealed no evidence of malignancy, but initial core biopsy of the hard 1.5 cm left upper neck mass was reported as poorly differentiated metastatic carcinoma, and immunohistochemistry for PSA was negative. Serum PSA was also normal. He was referred to another hospital for further investigation. A repeat biopsy was performed and our analysis of a sample by molecular profiling identified prostate as the likely source of the tumour. This biopsy was initially reported as metastatic adenocarcinoma with focal neuroendocrine differentiation, and the pathologist recommended that a lung primary should be excluded. Repeat immunohistochemical staining for PSA on this biopsy was requested after the array result was already known, and this was positive, consistent with metastatic prostate cancer.
The data presented show that the use of expression profiling is able to contribute to the management of cancer patients. This work demonstrates that whilst expression changes may occur in some genes as a result of tumour development, or admixing of cells with other cell types such as stroma or vascular elements, the mass effect of measuring the expression patterns of thousands of genes means that distinctive patterns of tumour types are identifiable. Further, expression profiles are shown to be sufficient to classify tumour samples according to tissue of origin. It has been demonstrated that, not only can tumours be partitioned with respect to tissue of origin using microarray analysis, but additionally the expression patterns can be used to positively identify samples which were previously unknown.
Example 6: Use of RT-PCR.
a) Extraction of RNA from Paraffin Embedded Formalin Fixed Tissue (FFPET)
Extraction of RNA was performed using a modification to the protocol described by Specht et al (2001, Am J Pathol 158(2): 419-29). Briefly, paraffin was removed from microtome sectioned material by incubating in xylene, repeating the procedure twice and then sequentially washing with 100%, 90% and ethanol. Samples were then dried before the addition of Proteinase K digestion buffer (10mM Tris-HCl (pH 0.1mM EDTA (pH8.0), 2% SDS), and 100 mg of Proteinase K, followed by incubation at 60°C for 16 hours or overnight.
Following the initial incubation period an additional 100 mg of proteinase K was added and samples were digested for a further 3-4 hours at 60°C. RNA was purified from the tissue lysate by column chromatography (RNeasy, Qiagen) using a modification to the manufacturer's protocol. This involved sequentially adding 440 µL of 100% ethanol and 660 µL of RLT buffer to the tissue lysate. The sample was then briefly mixed before passing it through the column by centrifugation (RNeasy mini). Subsequent washes were applied as described by the manufacturer, followed by elution in an appropriate volume of RNAse free deionised water.
b) Quantitative real time PCR
Total RNA was reverse transcribed by priming with random hexamers. Success of the reverse transcription and relative quantification of the cDNA was interpreted using 5 endogenous control genes, analysed by real time PCR using SYBR green chemistry (ABI Prism 7000). The endogenous control genes CTCF, CAPZB, TXNL, SF3B1, RABGGTB and PGK were chosen from microarray experiments based on the criteria of low variability across multiple cancer classes and a minimum expression level in excess of three times that of background. All primer pairs were designed across exon boundaries to prevent amplification of genomic DNA. An average of the Ct values for the endogenous controls was used to assess the quantity of cDNA present. A maximum average Ct threshold was set to exclude samples not suitable for further analysis on the micro fluidics card.
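The quality control step described above, averaging the Ct values of the endogenous controls and rejecting samples above a maximum, might be sketched as follows. The threshold value is not given in the text, so the cut-off of 30 cycles below is purely hypothetical, as are the Ct values and function names; the gene names are those listed above.

```python
from statistics import mean

# Hypothetical cut-off: the text states that a maximum average Ct was set
# but does not give its value.
MAX_MEAN_CT = 30.0


def mean_control_ct(control_cts):
    """Average Ct across the endogenous control genes of one sample."""
    return mean(control_cts.values())


def passes_qc(control_cts, max_mean_ct=MAX_MEAN_CT):
    """A lower mean Ct means more template; reject samples above the cut-off."""
    return mean_control_ct(control_cts) <= max_mean_ct


# Invented Ct values for two samples.
good_sample = {"CTCF": 24.0, "CAPZB": 25.0, "TXNL": 26.0,
               "SF3B1": 25.0, "RABGGTB": 25.0, "PGK": 25.0}
degraded_sample = {"CTCF": 33.0, "CAPZB": 34.0, "TXNL": 35.0,
                   "SF3B1": 34.0, "RABGGTB": 34.0, "PGK": 34.0}

print(mean_control_ct(good_sample), passes_qc(good_sample))
print(mean_control_ct(degraded_sample), passes_qc(degraded_sample))
```

Averaging over several controls rather than relying on one makes the check less sensitive to variation in any single gene.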
c) Micro Fluidics Card
A set of 89 genes was chosen by signal to noise gene selection using a 6 class training set of breast, colorectal, ovarian, gastric, pancreas and a combined class (others) representing other sites of origin (i.e. lung, melanoma, prostate, renal, mesothelioma, testicular, SCC). The genes represent the top ranked 12 to 17 markers for each respective class by signal to noise gene selection. All genes were chosen from Applied Biosystems Assay on Demand (AoD) prevalidated primer probe sets. If a gene marker selected by the signal to noise metric was not available from the AoD set then the next highest ranking gene was selected. Additionally, seven endogenous controls were added to the assay set including the 5 genes previously described for cDNA quality control and mandatory controls 18S rRNA and GAPDH. Custom microfluidics cards were designed in a configuration allowing the processing of 4 samples and 96 assays on a single card.
A master mix of reagent was prepared from TaqMan® Universal PCR Master Mix and sample cDNA template. The volumetric amount of template used was proportional to that used for quality control with no attempt to standardise the absolute amount of template added between samples. Reactions were run according to the manufacturer's protocol with data collection based on absolute Ct values. Normalisation of RT-PCR assays was conducted using an average Ct value for all endogenous controls excluding GAPDH. Samples were then converted to a fold ratio relative to the endogenous controls described, using the standard delta Ct formula.
i.e. $X = 2^{-\Delta C_t}$, where $\Delta C_t = C_t(\text{target}) - C_t(\text{average endogenous controls})$
Example 7: Generation of gene expression database validation of RT-PCR results.
A cohort of 42 samples spanning five anatomical sites of origin (breast, colorectal, gastric, pancreas, ovarian) was profiled using RT-PCR by custom micro fluidics cards. All reactions were performed using cDNA generated from RNA extracted from fresh frozen tissue. These samples had been previously analysed using cDNA microarrays. A comparison of median normalised data by heat map alignment shows the consistency between the two platforms (Figure 9).
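The RT-PCR values compared here are fold ratios obtained with the standard delta Ct formula of Example 6, in which the average Ct of the endogenous controls (excluding GAPDH) is subtracted from the target Ct. A minimal sketch of that conversion, with invented Ct values, placeholder assay names and hypothetical function names, is:

```python
from statistics import mean


def fold_ratio(ct_target, control_cts):
    """Standard delta Ct conversion: X = 2 ** -(Ct_target - mean control Ct)."""
    delta_ct = ct_target - mean(control_cts)
    return 2.0 ** (-delta_ct)


def normalise_sample(ct_by_assay, control_names, excluded=("GAPDH",)):
    """Convert every non-control assay of one sample to a fold ratio.

    Controls named in `excluded` are left out of the average, following the
    statement that GAPDH was not used for normalisation.
    """
    used = [ct_by_assay[name] for name in control_names if name not in excluded]
    return {assay: fold_ratio(ct, used)
            for assay, ct in ct_by_assay.items()
            if assay not in control_names}


# Invented Ct values; TARGET_1 and TARGET_2 are placeholder assay names.
sample = {"TARGET_1": 24.0, "TARGET_2": 29.0,
          "CTCF": 25.0, "PGK": 27.0, "GAPDH": 18.0}
controls = ["CTCF", "PGK", "GAPDH"]

ratios = normalise_sample(sample, controls)
print(ratios)
```

A target that crosses threshold two cycles before the control average is reported as four-fold more abundant, since each cycle corresponds to a doubling.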
The chemistry used for RT-PCR analysis allows the utilisation of nucleic acids that may be partially degraded or fragmented, as opposed to microarray analysis where high quality intact mRNA is required. Formalin fixation of tissue is routinely used in conventional pathology to conserve tissue architecture and preserve protein complexes that may be targeted by immunohistochemical detection as cancer specific markers. The cross-linking events that allow this preservation, however, are detrimental to RNA and DNA integrity. Nucleic acids extracted from such material are therefore composed of short fragments, typically of around 300 bp in length. RT-PCR requires the amplification of only short lengths of DNA. Amplicon lengths generated from AoD primer sets are approximately 60 bp in length.
Applicants have used RNA extracts from FFPET for expression profiling by RT-PCR in the micro fluidics format. A total of 13 samples from 5 sites of origin were processed providing high quality data. Clustering of samples processed from both fresh frozen tissue and FFPET shows that samples can accurately be grouped into respective tumour classes regardless of the tissue processing method used prior to RNA extraction (Figure Similar to microarray data, data generated from RT-PCR can be used for machine learning and creating class predictor models. All RT-PCR data was used for generating an SVM predictor model of 5 classes (breast, gastric, ovarian, colorectal and pancreas) using the method of ranking. Using RankLevels applicants achieved a LOO cross validation accuracy of 100%.
The versatility of a rank method for cross platform meta-analysis was also applied to both microarray and RT-PCR datasets. By training solely on data generated by cDNA microarray, SVM models were generated that could be tested upon similar samples profiled using RT-PCR. Using this cross platform meta-analysis a high prediction accuracy of 93% was obtained in the independent test.
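The rank method referred to above corresponds to the RankLevel formula recited in the claims, RankLevel = ceil(number of rank levels x rank of the gene / number of genes assayed). The sketch below is one possible reading of it: ranking from lowest to highest expression and breaking ties by position are assumptions, and the expression values are invented.

```python
def rank_levels(values, n_levels):
    """Map each gene's expression value within one sample to a rank level.

    RankLevel = ceil(n_levels * rank / n_genes), with rank 1 assigned to the
    lowest value (an assumption; the direction is not stated in the text).
    """
    n_genes = len(values)
    order = sorted(range(n_genes), key=lambda i: values[i])
    levels = [0] * n_genes
    for rank, index in enumerate(order, start=1):
        # Integer ceiling division avoids floating point rounding.
        levels[index] = -(-n_levels * rank // n_genes)
    return levels


# Invented expression values for six genes in one sample.
microarray_like = [0.5, 3.2, 1.1, 9.0, 2.0, 7.5]
# The same sample on a different scale, standing in for another platform.
rescaled = [10 * v + 3 for v in microarray_like]

levels_a = rank_levels(microarray_like, n_levels=3)
levels_b = rank_levels(rescaled, n_levels=3)
print(levels_a, levels_b)
```

Because only the ordering of genes within a sample is used, any platform that preserves that ordering yields the same rank levels, which is one reason a rank representation may suit comparison between microarray and RT-PCR data.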
Example 8: Testing Strengths of Predictions
The strength of the prediction capability for a carcinoma of unknown primary (CUP) was tested. This test is indicative of whether a prediction of the tissue of origin for a carcinoma of unknown primary is correct. When a class or histological subclass is left out of a training set used to establish the gene database, the prediction accuracy of the test is compromised. This demonstrates the importance of having all classes or subclasses present when establishing a training set.
The present example tests the veracity of the prediction strength algorithm, and associates a confidence with the prediction.
Figure 12 shows that data set size has an impact on the confidence of the prediction. By changing the number of samples in the dataset available for comparison, the degree of confidence is affected. Lowering the number or leaving out data sets reduces the confidence level.
Finally, it is to be understood that various other modifications and/or alterations may be made without departing from the spirit of the present invention as outlined herein.

Claims (12)

1. A method of identifying an origin of an unknown tumour sample, said method including: obtaining a gene expression profile of the unknown tumour sample; comparing the gene expression profile of the unknown tumour sample to a predictive model for tumours established from a gene expression database, said database including gene expression profiles from known tumour samples and wherein the model has been validated for tumour identification; and identifying the origin of the unknown tumour sample when a gene expression profile from the predictive model correlates with the gene expression profile of the unknown tumour sample.
2. A method according to claim 1 wherein the predictive model is established from tumour samples selected from Table 1.
3. A method according to claim 1 or 2 wherein the predictive model is established from a gene expression profile comprising genes selected from Table 2.
4. A method according to any one of claims 1 to 3 wherein the gene expression profiles and the gene expression databases include gene expression data that is processed by ranking genes according to their expression levels within a sample and allocating a rank to the gene such that the rank of the gene identifies different patterns of gene expression between the biological samples.
5. A method according to any one of claims 1 to 4 wherein the gene expression database is normalised by ranking gene expression to a rank level using the formula: RankLevel = ceil(number of rank levels × rank of the gene / number of genes assayed), wherein ceil(x) = smallest integer ≥ x.
6. A method according to claim 5 wherein the origin of the unknown tumour sample is identified when a rank level for gene expression from the predictive model correlates to a rank level from the gene expression profile of the unknown tumour sample.
7. A method according to any one of claims 1 to 6 wherein the tumour sample presents as a tumour type selected from the group including gastric, colorectal, pancreatic, breast and ovarian.
8. A method according to any one of claims 1 to 7 wherein the tumour sample presents as a tumour selected from Table 1.
9. A use of a method for identifying an origin of an unknown tumour sample according to any one of claims 1 to 8 for the preparation of a medicament to treat a tumour of unknown origin wherein the medicament is appropriate for treating a tumour identified as the origin.
10. A predictive model for identifying an origin of an unknown tumour established from a gene expression database, said database including gene expression profiles from known tumour samples and wherein the model has been validated for tumour identification.
11. A predictive model according to claim 10 comprising a gene expression database said database including gene expression profiles from genes selected from Table 2.
12. A predictive model according to claim 10 or 11 wherein the known tumour samples are selected from Table 1.
AU2004219989A 2003-03-14 2004-03-12 Expression profiling of tumours Ceased AU2004219989B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2004219989A AU2004219989B2 (en) 2003-03-14 2004-03-12 Expression profiling of tumours

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
AU2003901177 2003-03-14
AU2003901177A AU2003901177A0 (en) 2003-03-14 2003-03-14 Profiling of tumours
AU2003907084A AU2003907084A0 (en) 2003-12-22 Profiling of tumours (2)
AU2003907084 2003-12-22
AU2004219989A AU2004219989B2 (en) 2003-03-14 2004-03-12 Expression profiling of tumours
PCT/AU2004/000299 WO2004081564A1 (en) 2003-03-14 2004-03-12 Expression profiling of tumours

Publications (2)

Publication Number Publication Date
AU2004219989A1 AU2004219989A1 (en) 2004-09-23
AU2004219989B2 true AU2004219989B2 (en) 2009-01-15

Family

ID=35116202

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2004219989A Ceased AU2004219989B2 (en) 2003-03-14 2004-03-12 Expression profiling of tumours

Country Status (1)

Country Link
AU (1) AU2004219989B2 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001051667A2 (en) * 2000-01-14 2001-07-19 Integriderm, L.L.C. Informative nucleic acid arrays and methods for making same

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Ben-Dor A et al, Computational Biology, 2000, 7(3-4):559-583 *
Ghosh D, Pacific Symposium on Biocomputing, 2002, pages 18-29 *
Hess KR et al, International Journal of Molecular Medicine, *
Ooi C et al, Bioinformatics, January 2003, 19(1):37-44 *
Yeang C-H et al, Bioinformatics, 2001, 17 Supplement 1:S316-S322 *

Also Published As

Publication number Publication date
AU2004219989A1 (en) 2004-09-23

Legal Events

Date Code Title Description
FGA Letters patent sealed or granted (standard patent)
MK14 Patent ceased section 143(a) (annual fees not paid) or expired