KR101725985B1 - Prognostic Genes for Early Breast Cancer and Prognostic Model for Early Breast Cancer Patients - Google Patents


Info

Publication number
KR101725985B1
Authority
KR
South Korea
Prior art keywords
breast cancer
prognostic
genes
prognosis
gene
Prior art date
Application number
KR1020130009394A
Other languages
Korean (ko)
Other versions
KR20130023312A (en)
Inventor
신영기
김영덕
오은설
김시은
Original Assignee
주식회사 젠큐릭스
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 주식회사 젠큐릭스
Priority to KR1020130009394A
Publication of KR20130023312A
Application granted
Publication of KR101725985B1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B35/00 ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B99/00 Subject matter not provided for in other groups of this subclass
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/60 In silico combinatorial chemistry
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/112 Disease subtyping, staging or classification
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/118 Prognosis of disease development

Abstract

The present invention relates to a method for selecting genes for predicting cancer prognosis, to the prognostic genes so selected, and to a kit that uses them to predict metastasis in breast cancer patients.
By analyzing the genetic characteristics of early breast cancer, the invention predicts a patient's prognosis simply and with high reliability, and is therefore useful for prognostic diagnosis that can reduce unnecessary chemotherapy.

Description

Prognostic Genes for Early Breast Cancer and Prognostic Model for Early Breast Cancer Patients

The present invention relates to the discovery of genes for predicting the prognosis of early breast cancer and to a method for predicting the prognosis of early breast cancer using those genes.

As human genome information comes into active use, cancer research is moving toward elucidating mechanisms at the genome level. In particular, microarrays make it possible to characterize cancer cells from a macroscopic perspective, based on the expression patterns of tens of thousands of genes or on increases and decreases in gene copy number. Analyzing such genome-level information is a groundbreaking way to understand organic, complex biological phenomena, and its use will continue to grow. For complex diseases such as cancer in particular, analyzing a small number of specific genes tends to yield narrow results, and capturing the broad behavioral patterns of cancer occurrence and development is important, so analysis of genomic information is essential. Most of the genomic information underlying cancer research is generated with genome chips such as microarrays. The technology for obtaining information on tens of thousands of genes at once is advancing day by day, and, despite the drawback of high cost, microarray-based research is being pursued so actively that the volume of related information is growing explosively. Since the mid-2000s, this genomic information has been collected into databases, and secondary and tertiary analyses of the collected information are becoming a focal point of life science research.

A typical expression gene chip carries tens of thousands of probes representing about 20,000 to 30,000 genes, and microarrays that measure fine-grained information such as SNPs may carry more than one million probes. Microarray experiments are relatively simple and standardized, and they are highly efficient because they yield a large amount of information in a short time; analyzing the results, however, has become both the key step and a difficult bottleneck. Comprehensive analysis of tens of thousands of genes, which is not comparable to the conventional analysis of a handful of genes, can extract useful information only when it is backed by both statistical analysis techniques and thorough knowledge of the genome. High-performance computing equipment capable of storing and analyzing large volumes of data is also required, as is the associated computing expertise. Because such analysis is difficult for researchers familiar only with the traditional scope and experimental methods of biology, the reality in Korea is that genomic information is not being put to good use even though it is increasing at a tremendous rate. Given that Korea has less capital and research capability than North America or Europe, actively exploiting publicly available genomic information is precisely the area of bioinformatics in which it should take the lead. Cancer research in particular has adopted genome analysis most actively, and a considerable amount of related information has accumulated.

Breast cancer can be detected by self-examination, and because the importance of self-examination has been widely publicized, it is often found at an early stage. For these early breast cancer patients, it has been difficult to decide whether to give chemotherapy after surgery. Pathological observation allows a rough prediction of prognosis, but the observations are hard to standardize and quantify and the resulting predictions have low reliability, so in clinical practice chemotherapy is recommended to most early breast cancer patients. Chemotherapy causes patients great suffering and entails considerable expense, yet it is estimated that more than half of early breast cancer patients do not need it. Predicting a patient's prognosis from the characteristics of early breast cancer, and thereby reducing unnecessary chemotherapy, would therefore greatly improve patients' quality of life. Since microarrays made it possible to obtain expression levels for tens of thousands of genes in breast cancer at once, studies that classify breast cancer at the molecular level and seek to clarify the mechanisms of its occurrence and development have been actively pursued. Predicting the prognosis of early breast cancer patients is clinically important, and the search for prognostic genes using microarrays began in the early 2000s. Although microarray studies are expensive, expression profiles for a substantial number of breast cancer tissues have been produced and made available to researchers. In 2002, 70 prognostic genes were identified by analyzing 78 early breast cancer tissues together with survival information for patients followed for more than 10 years; since then, a dozen or so sets of prognostic genes have been published, and some of them have already been commercialized and are used in clinical practice (1-13). Representative examples are MammaPrint (Agendia) and Oncotype DX (Genomic Health); although they are currently in clinical use, in many cases they still serve only as one reference for prognosis (2, 7).

Throughout this specification, a number of papers and patent documents are referenced and their citations are indicated. The disclosures of the cited papers and patent documents are incorporated herein by reference in their entirety, so that the state of the art to which the present invention belongs and the content of the present invention are described more clearly.

The present inventors made intensive research efforts to develop a reliable genetic diagnosis system that predicts the prognosis of breast cancer, in order to decide whether early breast cancer patients should receive chemotherapy. As a result, they collected and analyzed microarray data and clinical information obtained from early breast cancer tissue, identified genes associated with prognosis, and used these genes to develop a prognostic model for early breast cancer patients.

Accordingly, an object of the present invention is to provide a method for selecting genes for predicting cancer prognosis.

Another object of the present invention is to provide genes identified for predicting cancer prognosis.

Other objects and advantages of the present invention will become more apparent from the following detailed description, claims and drawings.

According to one aspect of the present invention, there is provided a method for selecting genes for predicting cancer prognosis, comprising the following steps:

(a) collecting cancer tissue from a group of patients whose clinical information is known;

(b) classifying, within the patient group, patients in whom metastasis occurred before a reference time point elapsed as a poor-prognosis group, and patients in whom no metastasis had occurred after the reference time point elapsed as a good-prognosis group;

(c) collecting gene expression profiles from the collected cancer tissue;

(d) selecting genes that show a difference in expression level between the poor-prognosis group and the good-prognosis group;

(e) classifying the selected genes by expression pattern through cluster analysis of their expression patterns;

(f) performing functional analysis on the gene clusters classified by expression pattern, and selecting expression patterns significantly associated with a specific function; and

(g) selecting, from the genes belonging to the selected expression patterns, genes that have a high expression level and a large difference in expression level between the poor-prognosis group and the good-prognosis group.

The present inventors made intensive research efforts to develop a reliable genetic diagnosis system that predicts the prognosis of breast cancer, in order to decide whether early breast cancer patients should receive chemotherapy. As a result, they collected and analyzed microarray data and clinical information obtained from cancer tissue, identified genes associated with prognosis, and used these genes to develop a prognostic model for cancer patients.

In this specification, the term "prognosis" refers to the outlook for future symptoms or the future course of a disease, as judged from its diagnosis. In cancer patients, prognosis usually means whether metastasis occurs within a certain period after cancer onset or surgery, or the length of survival. Predicting prognosis is a very important clinical task because it offers guidance on the future direction of breast cancer treatment, including in particular whether early breast cancer patients should receive chemotherapy.

According to a preferred embodiment of the present invention, the clinical information in step (a) includes information on the metastatic status of the cancer.

In this specification, the term "metastasis" refers to a state in which a tumor spreads from its primary site along various routes to another part of the body, where it settles and proliferates. Whether a cancer metastasizes is determined by the intrinsic characteristics of that cancer and is also the most important clue in determining its prognosis, so it is treated as the most important clinical information related to the survival of cancer patients. According to the present invention, when metastasis information is available for the patients from whom cancer tissue was collected, genes that serve as markers for predicting prognosis can be selected by analyzing differences in gene expression profiles between groups that differ in metastasis status.

In step (b) of the present invention, the reference time point is the period conventionally used in the art as the criterion for judging the prognosis of cancer patients, and refers to the time elapsed from onset until metastasis occurs. The reference time point is preferably 3-12 years after onset, more preferably 5-10 years. The reference time point for assigning patients to the poor-prognosis group and the one for assigning patients to the good-prognosis group may be the same or different. Most preferably, patients in the patient group who develop metastasis within 5 years of onset are classified as the poor-prognosis group, and patients who remain free of metastasis for 10 years or more after onset are classified as the good-prognosis group.
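The grouping rule in step (b) can be illustrated with a minimal sketch. It assumes the most preferred cutoffs stated above (5 years and 10 years); the function name, its parameters, and the decision to leave all other patients unassigned are illustrative assumptions, not part of the specification.

```python
from typing import Optional


def classify_prognosis(metastasis: bool, years: float,
                       poor_cutoff: float = 5.0,
                       good_cutoff: float = 10.0) -> Optional[str]:
    """Assign a patient to a prognosis group (hypothetical helper).

    years is the time from onset to metastasis when metastasis is True,
    otherwise the metastasis-free follow-up time.
    """
    if metastasis and years <= poor_cutoff:
        return "poor"   # metastasis within the poor-prognosis cutoff
    if not metastasis and years >= good_cutoff:
        return "good"   # metastasis-free for at least the good-prognosis cutoff
    # Assumption: everyone else (short follow-up, or metastasis after the
    # poor-prognosis cutoff) is left out of both groups.
    return None


print(classify_prognosis(True, 3.2))    # poor
print(classify_prognosis(False, 11.0))  # good
print(classify_prognosis(False, 7.0))   # None
```

Leaving intermediate cases unassigned keeps the two groups clearly separated, which is one plausible reading of using two different reference time points.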

In step (c) of the present invention, the term "expression profile" refers to the simultaneous measurement of the activity of many genes in order to obtain overall information about the function of living cells, tissues or organs. Gene activity includes transcriptional activity, translational activity, the expression level of the resulting protein, and its in vivo activity.

Gene expression profiles can be collected by, for example, microarray analysis, multiplex PCR (multiplex polymerase chain reaction), quantitative RT-PCR (quantitative reverse transcription polymerase chain reaction), transcriptome analysis using tiling arrays, or short read sequencing, but collection is not limited to these and may use any of the various methods known in the art. Microarray analysis is preferred. To analyze the collected microarray expression profiles statistically, any of the normalization methods commonly used in the art may be applied, but RMA (Robust Multi-array Average) normalization is preferred.

In step (d) of the present invention, the term "difference in expression level" means that, when the analyzed microarray expression profiles are compared, the expression level of a specific gene differs between the prognosis groups with statistical significance (FDR < 0.01).

Differences in expression level may be analyzed by any of the various methods commonly used in the art, and are preferably analyzed by SAM (Significance Analysis of Microarrays).

SAM analysis uses SAM, a microarray analysis algorithm: it computes the difference in expression level between groups by a method similar to the t-test and expresses the significance of that difference as an FDR (false discovery rate, q-value). The smaller the q-value, the more significant the difference in gene expression.
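To make the q-value threshold concrete, the sketch below converts per-gene p-values into q-values and keeps genes with q < 0.01. It is not SAM: SAM estimates the FDR by permutation, whereas this example substitutes the Benjamini-Hochberg procedure as a simpler stand-in. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as p_values."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical p-values for five genes (illustrative only).
p = [0.001, 0.008, 0.039, 0.041, 0.6]
q = benjamini_hochberg(p)
selected = [i for i, value in enumerate(q) if value < 0.01]
print(selected)
```

With these illustrative inputs only the first gene passes the FDR < 0.01 criterion used in step (d).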

According to a preferred embodiment of the present invention, the cancer is breast cancer, more preferably early breast cancer.

According to a more preferred embodiment of the present invention, the method further comprises, between steps (b) and (c), a step of classifying the patient group into patients whose estrogen receptor (ER) expression is below a reference expression level and patients whose expression is at or above it.

Estrogen receptor expression is the most commonly used criterion for classifying breast cancer patients into subtypes, and lower estrogen receptor expression is known to be associated with a higher risk of breast cancer metastasis. In clinical practice, patients are usually divided into estrogen receptor positive (ER+) or negative (ER-) according to a pathologist's reading of ER IHC (immunohistochemistry). According to the present invention, the patient group is classified by estrogen receptor expression level, and the poor-prognosis and good-prognosis groups are each divided into estrogen receptor-positive and estrogen receptor-negative groups before analysis, so that genes showing significant differences between prognosis groups can be selected more reliably.

Most preferably, the reference expression level for classifying the estrogen receptor type (ER+ or ER-) is determined by ROC (receiver operating characteristic) analysis of the expression level of ESR1 (estrogen receptor 1) mRNA, using the collected ER IHC (estrogen receptor immunohistochemistry) readings as the reference.
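One way such an ROC-based cutoff could be chosen is sketched below. The specification does not state which point on the ROC curve is used; maximizing Youden's J (sensitivity + specificity - 1) is an assumption made here for illustration, and the function name and expression values are hypothetical.

```python
def roc_cutoff(expression, ihc_positive):
    """Pick the ESR1 cutoff that maximizes Youden's J (an assumed criterion).

    A sample is called ER+ when its expression is >= the cutoff.
    Returns (cutoff, J).
    """
    n_pos = sum(1 for y in ihc_positive if y)
    n_neg = len(ihc_positive) - n_pos
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(expression)):
        true_pos = sum(1 for x, y in zip(expression, ihc_positive) if y and x >= c)
        true_neg = sum(1 for x, y in zip(expression, ihc_positive) if not y and x < c)
        j = true_pos / n_pos + true_neg / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


# Hypothetical log-scale ESR1 values with matching IHC calls.
esr1 = [6.1, 6.8, 7.2, 9.5, 10.1, 11.0]
ihc = [False, False, False, True, True, True]
print(roc_cutoff(esr1, ihc))
```

Each observed expression value is tried as a candidate cutoff, which is equivalent to walking along the empirical ROC curve point by point.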

In this specification, the term "clustering analysis" refers to a multivariate analysis method that groups the objects under analysis into clusters in order to identify the structural relationships among them.

In step (e) of the present invention, the cluster analysis may use any of the various methods commonly used in the art, and is preferably performed by principal component analysis (PCA). PCA linearly combines the information in many gene variables to generate a small number of new, recombined gene variables (super-genes). In other words, it reduces dimensionality by reducing the number of variables while limiting the loss of information in the original data.
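The "super-gene" idea can be shown with a small dependency-free sketch that extracts only the first principal component by power iteration on the covariance matrix. The function name, the starting vector, and the toy data are illustrative assumptions; a real analysis would normally use an established numerical library.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) for the first principal component.

    rows is a samples x genes matrix given as a list of lists.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix between genes.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Arbitrary non-uniform start vector (assumption) for power iteration.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # The super-gene value of each sample is its projection onto the loadings.
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores


# Toy data: genes 1 and 2 move together, gene 3 is nearly constant.
data = [[1, 2, 0.1], [2, 4, 0.0], [3, 6, 0.1], [4, 8, 0.0]]
loadings, scores = first_principal_component(data)
print(loadings)
```

In the toy data the first component loads almost entirely on the two co-varying genes, so a single score per sample summarizes both of them.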

In this specification, the term "function analysis" means examining the biological functions of the genes that are strongly associated with the principal components selected in step (e).

In step (f) of the present invention, the functional analysis may use any of the various methods commonly used in the art, and is preferably performed by GO (Gene Ontology) analysis.

In step (g) of the present invention, prognostic genes may be selected according to statistical significance; preferably, in addition to the difference in mean expression level between prognosis groups, the correlation with the selected principal component, the mean expression level, and the interquartile range are also taken into account. In the present invention, "high expression level" refers to genes, among those belonging to the selected expression pattern, whose mean expression level is high enough for statistical analysis to be straightforward; preferably, genes are selected in descending order of expression level within the selected gene group. "Large difference in expression level" refers to genes, among those belonging to the selected expression pattern, whose difference in mean expression level between prognosis groups is distinct enough for experimental analysis to be straightforward; preferably, genes are selected in descending order of that difference within the selected gene group. Most preferably, genes are selected in order of those ranked highest on both criteria, expression level and difference in mean expression level between prognosis groups.
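The two ranking criteria in step (g) can be combined in several ways; the sketch below sums the two ranks, which is an assumption rather than the method stated in the specification. The gene names and expression values are hypothetical.

```python
from statistics import mean


def rank_candidates(poor, good):
    """Order genes by mean expression rank plus between-group difference rank.

    poor and good map each gene name to its expression values in the
    poor-prognosis and good-prognosis groups respectively.
    """
    genes = list(poor)
    average = {g: mean(poor[g] + good[g]) for g in genes}
    difference = {g: abs(mean(poor[g]) - mean(good[g])) for g in genes}
    by_average = sorted(genes, key=lambda g: -average[g])
    by_difference = sorted(genes, key=lambda g: -difference[g])
    # Assumption: a lower summed rank means a better candidate.
    combined = {g: by_average.index(g) + by_difference.index(g) for g in genes}
    return sorted(genes, key=lambda g: combined[g])


poor_group = {"GENE_A": [10, 11], "GENE_B": [5, 5], "GENE_C": [9, 9]}
good_group = {"GENE_A": [7, 8], "GENE_B": [5, 6], "GENE_C": [9, 8.5]}
print(rank_candidates(poor_group, good_group))
```

The correlation with the selected principal component and the interquartile range, which the text also mentions, could be added as further rank terms in the same way.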

Preferably, after step (g), the method may further include a step of developing a mathematical model of survival probability using the selected prognostic genes. The model can be developed by survival regression with the selected prognostic genes as covariates, which expresses as a formula the relationship between the time to metastasis and the prognostic genes. Various survival models can be used to characterize this relationship; preferably, prognostic modeling is performed with the accelerated failure time (AFT) model, a parametric survival analysis. Preferably, the survival model developed with the selected prognostic genes is validated on an independent dataset. Validation can compare the predicted survival probability with the survival probability actually observed, or can assess the model's accuracy by using it to assign patients to prognosis groups (good or poor) and comparing those assignments with the prognosis groups actually observed.
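As an illustration of how an AFT model turns gene expression into a survival probability, the sketch below assumes a Weibull AFT model, log T = intercept + sum(beta_i * x_i) + scale * W, where W follows the standard extreme value distribution. The specification does not name the distribution, and every coefficient below is made up for the example.

```python
import math


def aft_survival(t, x, beta, intercept, scale):
    """Metastasis-free probability at time t under an assumed Weibull AFT model.

    S(t | x) = exp(-(t / exp(intercept + beta . x)) ** (1 / scale))
    """
    linear = intercept + sum(b * xi for b, xi in zip(beta, x))
    return math.exp(-((t / math.exp(linear)) ** (1.0 / scale)))


# Hypothetical coefficients for two prognostic genes (illustrative only).
beta = [-0.4, 0.3]
intercept = 3.0
scale = 0.8

low_risk = aft_survival(10.0, [1.0, 2.0], beta, intercept, scale)
high_risk = aft_survival(10.0, [3.0, 0.5], beta, intercept, scale)
print(low_risk, high_risk)
```

In an AFT model a negative coefficient shortens the expected time to metastasis as the gene's expression rises, which is why the second patient in the example has the lower 10-year metastasis-free probability.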

According to another aspect of the present invention, there is provided a kit for predicting the risk of metastasis in breast cancer patients, comprising a primer or probe that binds specifically to a nucleotide sequence selected from the group consisting of SEQ ID NOs: 1 to 9 of the sequence listing.

According to the present invention, the kit for predicting the risk of metastasis in a breast cancer patient may be a microarray using a probe that specifically binds to the nucleotide of the present invention, or a gene amplification kit using a primer that specifically binds to the nucleotide of the present invention.

In the present specification, the term “nucleotide” refers to a deoxyribonucleotide or ribonucleotide present in single-stranded or double-stranded form and, unless specifically stated otherwise, includes analogs of natural nucleotides (Scheit, Nucleotide Analogs, John Wiley, New York (1980); Uhlman and Peyman, Chemical Reviews, 90:543-584 (1990)).

As used herein, the term “primer” refers to an oligonucleotide that can act as a starting point for synthesis under conditions in which the synthesis of a primer extension product complementary to a nucleic acid strand (template) is induced, that is, in the presence of nucleotides and a polymerizing agent such as DNA polymerase, and at a suitable temperature and pH. Preferably, the primer is a deoxyribonucleotide and is single-stranded. Primers used in the present invention may include naturally occurring dNMPs (i.e., dAMP, dGMP, dCMP and dTMP), modified nucleotides or non-natural nucleotides. The primer may also include ribonucleotides.

The primer of the present invention may be an extension primer that anneals to the target nucleic acid and is extended by a template-dependent nucleic acid polymerase to form a sequence complementary to the target nucleic acid; it is extended up to the position where the immobilized probe is annealed and occupies the site where the probe was annealed.

The extension primer used in the present invention comprises a hybridizing nucleotide sequence complementary to a first position of the target nucleic acid. The term “complementary” means that a primer or probe is sufficiently complementary to hybridize selectively to a target nucleic acid sequence under given annealing or hybridization conditions; it encompasses both substantially complementary and perfectly complementary, and preferably means perfectly complementary. As used herein in relation to a primer sequence, the term “substantially complementary sequence” covers not only a perfectly matching sequence but also a sequence that partially mismatches the sequence being compared, within the range in which it can anneal to the specific sequence and serve as a primer.

The primer must be long enough to prime the synthesis of the extension product in the presence of a polymerizing agent. The suitable length of a primer depends on a number of factors, such as temperature, field of application and the source of the primer, but is typically 15-30 nucleotides. Short primer molecules generally require lower temperatures to form sufficiently stable hybrid complexes with the template. The term “annealing” or “priming” means the apposition of an oligodeoxynucleotide or nucleic acid to a template nucleic acid, whereby the apposition enables a polymerase to polymerize nucleotides into a nucleic acid molecule complementary to the template nucleic acid or a portion thereof.

The primer sequence need not be perfectly complementary to a partial sequence of the template; it suffices that the primer has enough complementarity to hybridize with the template and perform its function as a primer. Accordingly, the primer of the present invention need not be perfectly complementary to the above-described nucleotide sequence serving as the template, provided it is sufficiently complementary to hybridize to this gene sequence and act as a primer. Such primers can be readily designed by a person skilled in the art with reference to the above-described nucleotide sequence, for example by using a primer design program (e.g., the PRIMER 3 program).
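The notions of typical primer length (15-30 nucleotides) and of perfect versus substantial complementarity can be illustrated with a small check. This is only a sketch: the function names and the example template site are hypothetical, it counts mismatches position by position, and it does not replace a primer design program, which also considers melting temperature and secondary structure.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def primer_report(primer: str, template_site: str) -> dict:
    """Compare a primer with the template site it is meant to anneal to.

    Both sequences are written 5'->3'; template_site is the template-strand
    region covered by the primer and must have the same length.
    """
    if len(primer) != len(template_site):
        raise ValueError("primer and template site must have equal length")
    ideal = reverse_complement(template_site)
    mismatches = sum(1 for p, q in zip(primer.upper(), ideal) if p != q)
    return {
        "length_ok": 15 <= len(primer) <= 30,  # typical range given in the text
        "mismatches": mismatches,
        "perfectly_complementary": mismatches == 0,
    }


# Hypothetical 20-nt template site (not taken from the sequence listing)
site = "ATGCGTACCTGAAGCTTGCA"
perfect_primer = reverse_complement(site)
# Same primer with its first base changed, i.e. a single mismatch
one_mismatch = ("A" if perfect_primer[0] != "A" else "C") + perfect_primer[1:]
print(primer_report(perfect_primer, site))
print(primer_report(one_mismatch, site))
```

Whether a primer with a given number of mismatches still counts as substantially complementary depends on the annealing conditions and on where the mismatches fall, so no threshold is hard-coded here.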

In the present specification, the term “nucleic acid molecule” comprehensively includes DNA (gDNA and cDNA) and RNA molecules, and the nucleotides that are the basic building blocks of nucleic acid molecules include not only natural nucleotides but also analogues in which the sugar or base moiety is modified (Scheit, Nucleotide Analogs, John Wiley, New York (1980); Uhlman and Peyman, Chemical Reviews, 90:543-584 (1990)).

When the starting material in the kit of the present invention is gDNA, the gDNA can be isolated by a conventional method known in the art (see: Rogers & Bendich (1994)).

When the starting material is mRNA, total RNA is isolated by a conventional method known in the art (see: Sambrook, J. et al., Molecular Cloning. A Laboratory Manual, 3rd ed., Cold Spring Harbor Press (2001); Tesniere, C. et al., Plant Mol. Biol. Rep., 9:242 (1991); Ausubel, F.M. et al., Current Protocols in Molecular Biology, John Wiley & Sons (1987); and Chomczynski, P. et al., Anal. Biochem., 162:156 (1987)). The isolated total RNA is synthesized into cDNA using reverse transcriptase. Because the total RNA is isolated from a human (e.g., an obese or diabetic patient), the mRNA has a poly-A tail at its end, and cDNA can be readily synthesized using an oligo dT primer, which exploits this sequence feature, and reverse transcriptase (see: PNAS USA, 85:8998 (1988); Libert F, et al., Science, 244:569 (1989); and Sambrook, J. et al., Molecular Cloning. A Laboratory Manual, 3rd ed., Cold Spring Harbor Press (2001)).

In the kit of the present invention, the specific sequence can be identified by applying various methods known in the art. Techniques applicable to the present invention include, but are not limited to, fluorescence in situ hybridization (FISH), direct DNA sequencing, PFGE analysis, Southern blot analysis, single-strand conformation analysis (SSCA, Orita et al., PNAS, USA 86:2776 (1989)), RNase protection assay (Finkelstein et al., Genomics, 7:167 (1990)), dot blot analysis, denaturing gradient gel electrophoresis (DGGE, Wartell et al., Nucl. Acids Res., 18:2699 (1990)), methods using a protein that recognizes nucleotide mismatches (e.g., the mutS protein of E. coli) (Modrich, Ann. Rev. Genet., 25:229-253 (1991)), and allele-specific PCR.

A sequence change alters intramolecular base pairing within the single strand, giving rise to bands of different mobility, and SSCA detects these bands. DGGE analysis uses a denaturing gradient gel to detect sequences whose mobility differs from that of the wild-type sequence.

Other techniques generally use a probe or primer complementary to a sequence comprising the nucleotides of the present invention.

For example, the RNase protection assay uses a riboprobe complementary to a sequence comprising the nucleotides of the present invention. The riboprobe is hybridized with DNA or mRNA isolated from a human and then cleaved with RNase A, an enzyme that can detect mismatches. If a mismatch is present and is recognized by RNase A, a smaller band is observed.

Assays that rely on a hybridization signal use a probe complementary to the nucleotide sequence of the present invention. In this technique, the hybridization signal between the probe and the target sequence is detected to determine DM or MS directly.

In the present specification, the term “probe” refers to a linear oligomer with natural or modified monomers or linkages, including deoxyribonucleotides and ribonucleotides, that can hybridize to a specific nucleotide sequence. Preferably, the probe is single-stranded for maximum hybridization efficiency. The probe is preferably a deoxyribonucleotide.

As the probe used in the present invention, a sequence perfectly complementary to the nucleotide sequence may be used, but a substantially complementary sequence may also be used as long as it does not interfere with specific hybridization. In general, because the stability of a duplex formed by hybridization tends to be determined by sequence matching at the ends, if the terminal portion of a probe carrying a base complementary to the nucleotide sequence of the present invention at its 3'-end or 5'-end fails to hybridize, the duplex can dissociate under stringent conditions.

Conditions suitable for hybridization can be determined with reference to Joseph Sambrook, et al., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2001) and Haymes, B. D., et al., Nucleic Acid Hybridization, A Practical Approach, IRL Press, Washington, D.C. (1985). The stringent conditions used for hybridization can be set by adjusting the temperature, the ionic strength (buffer concentration) and the presence of compounds such as organic solvents. These stringent conditions may differ depending on the sequence to be hybridized.

According to a preferred embodiment of the present invention, a nucleotide sequence selected from the group consisting of the first to fourth sequences of the sequence listing is highly expressed in patients at high risk of breast cancer metastasis, and a nucleotide sequence selected from the group consisting of the fifth to ninth sequences of the sequence listing is expressed at a low level in patients at high risk of breast cancer metastasis who show no significant difference in the expression level of a nucleotide sequence selected from the group consisting of the first to fifth sequences of the sequence listing.

According to the present invention, functional analysis of each gene whose expression level differed between prognostic groups showed that the nucleotides of the first to fourth sequences of the sequence listing are genes involved in the proliferation of cancer cells, and the nucleotides of the fifth to ninth sequences of the sequence listing are genes involved in the immune response.

The features and advantages of the present invention are summarized as follows:

(a) The present invention provides a method of selecting genes for predicting cancer prognosis, the selected cancer prognostic genes, and a kit for predicting metastasis in breast cancer patients using the same.

(b) By analyzing the genetic characteristics of early breast cancer to predict a patient's prognosis, the present invention can be usefully applied to prognostic diagnosis that reduces unnecessary chemotherapy.

FIG. 1 describes the microarray data sets collected for prognostic gene discovery, model development and validation.
FIG. 2a is a schematic diagram of the normalization process by curation and pre-processing of microarray data from breast cancer tissue. FIG. 2b shows the process of discovering prognostic genes from the discovery data set.
FIG. 3 shows the distribution of the time to distant metastasis among patients with metastasis in the discovery data set.
FIG. 4a shows the result of principal component analysis of the 302 genes whose expression levels differed significantly between prognostic groups. FIG. 4b shows the expression patterns of the 70 genes most highly correlated with principal component 1 and the 70 genes most highly correlated with principal component 2.
FIG. 5a shows the result of GO functional analysis of the 70 genes most highly correlated with principal component 1. FIG. 5b shows the result of GO functional analysis of the 70 genes most highly correlated with principal component 2.
FIG. 6a is a schematic diagram comparing the degree of proliferation and of immune response after classifying breast cancers into ER+ and ER- using the selected prognostic genes. FIG. 6b shows that, when the degrees of proliferation and immune response are each divided into three intervals, the immune response increases as proliferation increases.
FIG. 7a gives an approximate view of the shape of the hazard function calculated from the life table of the discovery data set. FIG. 7b shows the linearity and parallelism of the survival probability under the assumption of a lognormal distribution.
FIG. 8 shows the results of fitting the prognostic model to three distributions.
FIG. 9 shows the result of validating the developed prognostic model in the discovery data set. In 9a, the prognostic indices of all patients computed with the prognostic model were divided into quartiles to form four prognostic groups, and the separation of the observed survival probabilities of the groups was examined; the observed and predicted survival probabilities were also compared. 9b compares the observed survival probability of all patients with the survival probability predicted by the prognostic model. 9c shows, after dividing all patients into four groups by p.mean, the most influential variable, whether the observed survival probability of each group agrees well with the survival probability predicted by the prognostic model. 9d shows how well the observed and predicted survival probabilities agree for 5-year survival.
FIG. 10 shows the result of validating the developed prognostic model in validation set 1, using the same method as for the discovery data set. 10a is the validation result for discrimination, 10b is the validation result for calibration over the entire observed time, and 10c is the validation result for calibration of 5-year survival.
FIG. 11 shows the result of validating the developed prognostic model in validation set 2, using the same method as for the discovery data set. 11a is the validation result for discrimination, 11b is the validation result for calibration over the entire observed time, and 11c is the validation result for calibration of 5-year survival.
FIG. 12 shows the result of validating the developed prognostic model in validation set 3, using the same method as for the discovery data set.

Hereinafter, the present invention will be described in more detail through examples. These examples are intended only to illustrate the present invention more specifically, and it will be apparent to those of ordinary skill in the art that, in accordance with the gist of the present invention, its scope is not limited by these examples.

Examples

Experimental Methods

Collection of expression profiles of early breast cancer tissue

Expression profiles obtained from frozen tumor tissue of early breast cancer patients, together with clinical information, were collected from the public database GEO (http://www.ncbi.nlm.nih.gov/geo). A total of nine independent expression profile sets were collected; each is a relatively large data set of 100 or more samples, and all were generated for studies on the prognosis of early breast cancer patients (2, 4, 9, 10, 13, 25, 32, 33). Eight of these data sets were generated on the Affymetrix U133A microarray platform and the remaining one on Agilent Hu25K. In most cases, key clinical information (age, sex, tumor size, metastasis status and degree of tumor differentiation) was collected together with survival information. Of the eight Affymetrix U133A data sets, six carried survival information on distant-metastasis-free survival and the other two on overall survival. The Agilent data set carried survival information on distant metastasis. Survival analysis was based on distant metastasis for three reasons: distant metastasis is the most decisive event in determining prognosis, it is determined by the intrinsic characteristics of the cancer, and the largest number of patients in the collected data had distant metastasis information. After comparing the information of all collected patients, the expression profiles of 186 duplicated patients were removed, and the study was carried out on a total of 1,861 unique patients. For the 7 data sets generated on the same platform (Affymetrix U133A), the original files (.CEL) of the expression profiles of all corresponding patients were pooled and normalized together. Normalization was performed with the rma method (background correction: rma; normalization: quantile; summarization: medianpolish), using the custom CDF (http://brainarray.mbni.med.umich.edu/Brainarray/) ENTREZG version 13 developed by Manhong Dai et al. (34). After normalization, the expression level of each probe was converted from a 1-color expression value into the same form as a 2-color expression value by subtracting the probe-wise mean within the discovery data set. Of the total of eight normalized data sets, five were combined and used as the discovery data set, two were combined separately as validation data set 1, and the remaining one was used as validation data set 2. The Agilent data set was used as validation data set 3.
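The probe-wise mean subtraction described above can be sketched as follows. This is a minimal illustration that assumes log-scale expression values stored per probe; the function names and the example values are hypothetical, and whether validation samples were centered with the discovery-set means is an assumption not stated in the text.

```python
def probe_means(discovery: dict) -> dict:
    """Mean expression of each probe across the discovery samples.

    discovery maps a probe (gene) identifier to a list of normalized
    log-scale expression values, one per discovery sample.
    """
    return {probe: sum(values) / len(values) for probe, values in discovery.items()}


def center_profile(profile: dict, means: dict) -> dict:
    """Subtract the discovery-set probe mean from one sample's profile.

    The result is a relative (ratio-like) value per probe, comparable in
    form to a two-color log-ratio. Probes without a reference mean are skipped.
    """
    return {probe: value - means[probe] for probe, value in profile.items() if probe in means}


# Hypothetical values for two probes (illustration only)
discovery = {"ESR1": [8.0, 10.0, 12.0], "TOP2A": [5.0, 7.0]}
means = probe_means(discovery)
centered = center_profile({"ESR1": 11.0, "TOP2A": 5.5}, means)
print(centered)
```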

Definition of patient prognosis and ER status

To discover genes related to patient prognosis, the collected patients were classified into a good-prognosis group and a poor-prognosis group. In clinical practice, patients are generally classified using 5-year survival or metastasis information: the prognosis is regarded as poor if metastasis or death occurs within 5 years, and as good if the patient survives or remains metastasis-free for 5 years or longer. The distribution of survival time among patients who developed metastasis was examined using the patient information of the discovery data set. In more than 73% of the patients with metastasis, metastasis occurred within 5 years, and metastasis was observed after 10 years in fewer than 7%. On this basis, among the patients in the discovery data set, the 217 patients who developed metastasis within 5 years were classified as the ‘poor-prognosis group’ and the 281 patients who remained metastasis-free for 10 years or longer as the ‘good-prognosis group’. After classification, the median survival time was 12.9 years in the good-prognosis group and 2.4 years in the poor-prognosis group. Clearly separating the poor-prognosis and good-prognosis groups minimized errors due to uncertain survival information. Expression of the estrogen receptor (ER) is the criterion most commonly used to classify breast cancer patients into subtypes. In clinical practice, patients are usually divided into ER+ or ER- according to a pathologist's reading of ER IHC (immunohistochemistry). Because about 200 patients in the collected discovery data set had no ER IHC information, and because ER IHC status had been determined independently in each of the five data sets making up the discovery data set, ER status was determined from the mRNA expression level of the ESR1 gene in each patient's expression profile. For patients with ER IHC information, ROC (receiver operating characteristic) analysis was performed using the ER IHC information and the ESR1 mRNA expression level. By comparing the ER IHC results with the ESR1 mRNA expression level, the expression level giving the highest accuracy (0.88) was taken as the cut-off; patients with expression above the cut-off were classified as ER+ and those with expression below the cut-off as ER-. In the discovery data set, 864 patients were assigned to ER+ and 240 to ER-.
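The two rules in this section, prognosis grouping by time to metastasis and selection of an ESR1 cut-off by accuracy against IHC, can be sketched as below. The function names and toy data are hypothetical; treating a value exactly equal to the cut-off as ER+ and counting exactly 5 years as "within 5 years" are assumptions, since the text does not specify boundary handling.

```python
def prognosis_group(time_years: float, metastasis: bool):
    """Assign 'poor', 'good', or None (excluded from gene discovery).

    poor: distant metastasis observed within 5 years
    good: no metastasis observed during at least 10 years of follow-up
    """
    if metastasis and time_years <= 5:
        return "poor"
    if not metastasis and time_years >= 10:
        return "good"
    return None


def best_accuracy_cutoff(expression, ihc_positive):
    """Scan candidate cut-offs and return (cutoff, accuracy) with highest accuracy.

    A sample is called ER+ when its ESR1 expression is >= cutoff; accuracy is
    the fraction of samples whose call agrees with the IHC result.
    """
    best_cutoff, best_accuracy = None, -1.0
    for cutoff in sorted(set(expression)):
        agree = sum((x >= cutoff) == positive for x, positive in zip(expression, ihc_positive))
        accuracy = agree / len(expression)
        if accuracy > best_accuracy:  # keep the lowest cut-off on ties
            best_cutoff, best_accuracy = cutoff, accuracy
    return best_cutoff, best_accuracy


# Hypothetical ESR1 expression values and IHC calls (illustration only)
esr1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ihc = [False, False, True, False, True, True]
cutoff, accuracy = best_accuracy_cutoff(esr1, ihc)
print(cutoff, round(accuracy, 3))
```

Patients who fall between the two prognosis definitions (for example, metastasis at 7 years, or metastasis-free follow-up shorter than 10 years) are left unlabeled, which mirrors the intent of keeping only clearly separated groups.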

Selection of prognostic genes

In the discovery data set, the good-prognosis and poor-prognosis groups were divided into ER+ and ER- cases. In total, there were 275 patients with a good prognosis and 218 with a poor prognosis. Genes whose expression levels differed between the prognostic groups were identified by SAM (Significance Analysis of Microarrays) analysis. Using the q-values from the SAM results, 182 genes overexpressed in the good-prognosis group and 120 genes overexpressed in the poor-prognosis group were selected. Combining the selected genes yielded a set of 302 non-redundant genes, and cluster analysis by principal component analysis (PCA) was performed to examine the expression patterns of these genes. Two principal components were selected, and GO functional analysis was performed for each cluster to identify the biological functions associated with each principal component (Tables 1 to 3).
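The core quantity behind a SAM-style comparison is a moderated two-group statistic, d = (mean_a − mean_b) / (s + s0), where s is the pooled standard error and s0 is a small stabilizing constant. The sketch below computes only this statistic for one gene; the function name, the fixed s0 and the toy values are assumptions, and the permutation step that yields the q-values used in the text is omitted.

```python
import math


def sam_relative_difference(group_a, group_b, s0=0.1):
    """SAM-style relative difference for one gene between two groups.

    d = (mean_a - mean_b) / (s + s0), with s the pooled standard error.
    s0 is fixed here for illustration; SAM itself chooses it from the data.
    """
    n1, n2 = len(group_a), len(group_b)
    mean_a = sum(group_a) / n1
    mean_b = sum(group_b) / n2
    sum_sq = sum((x - mean_a) ** 2 for x in group_a) + sum((x - mean_b) ** 2 for x in group_b)
    s = math.sqrt((1.0 / n1 + 1.0 / n2) * sum_sq / (n1 + n2 - 2))
    return (mean_a - mean_b) / (s + s0)


# Hypothetical centered expression values of one gene (illustration only)
poor_group = [2.0, 3.0, 4.0]
good_group = [0.0, 1.0, 2.0]
d = sam_relative_difference(poor_group, good_group)
print(f"relative difference d = {d:.3f}")
```

A positive d here indicates higher expression in the first group; genes would then be ranked by d and filtered by q-value, which requires the permutation procedure not shown.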

Genes overexpressed in the poor-prognosis group

Gene symbol | Gene name
PRC1 | protein regulator of cytokinesis 1
CCNB2 | cyclin B2
UBE2C | ubiquitin-conjugating enzyme E2C
CDC20 | cell division cycle 20 homolog (S. cerevisiae)
KIF4A | kinesin family member 4A
TOP2A | topoisomerase (DNA) II alpha 170kDa
RACGAP1 | Rac GTPase activating protein 1
ASPM | asp (abnormal spindle) homolog, microcephaly associated (Drosophila)
BUB1B | budding uninhibited by benzimidazoles 1 homolog beta (yeast)
CDC45 | cell division cycle 45 homolog (S. cerevisiae)
PTTG1 | pituitary tumor-transforming 1
CENPF | centromere protein F, 350/400kDa (mitosin)
FOXM1 | forkhead box M1
KIF11 | kinesin family member 11
BLM | Bloom syndrome, RecQ helicase-like
ZWINT | ZW10 interactor
CDC7 | cell division cycle 7 homolog (S. cerevisiae)
KIF20A | kinesin family member 20A
TRIP13 | thyroid hormone receptor interactor 13
FANCI | Fanconi anemia, complementation group I
MAD2L1 | MAD2 mitotic arrest deficient-like 1 (yeast)
MCM2 | minichromosome maintenance complex component 2
RRM2 | ribonucleotide reductase M2
NCAPG | non-SMC condensin I complex, subunit G
KIF15 | kinesin family member 15
MLF1IP | MLF1 interacting protein
GINS1 | GINS complex subunit 1 (Psf1 homolog)
OIP5 | Opa interacting protein 5
NUSAP1 | nucleolar and spindle associated protein 1
ADM | adrenomedullin
HMMR | hyaluronan-mediated motility receptor (RHAMM)
AURKA | aurora kinase A
CCNA2 | cyclin A2
NME1 | non-metastatic cells 1, protein (NM23A) expressed in
DLGAP5 | discs, large (Drosophila) homolog-associated protein 5
ZDHHC13 | zinc finger, DHHC-type containing 13
HMGB3 | high-mobility group box 3
TMED9 | transmembrane emp24 protein transport domain containing 9
MT1H | metallothionein 1H
MMP11 | matrix metallopeptidase 11 (stromelysin 3)
TTK | TTK protein kinase
ENO2 | enolase 2 (gamma, neuronal)
GPR56 | G protein-coupled receptor 56
SPAG5 | sperm associated antigen 5
PBK | PDZ binding kinase
MMP1 | matrix metallopeptidase 1 (interstitial collagenase)
MST4 | serine/threonine protein kinase MST4
EZH2 | enhancer of zeste homolog 2 (Drosophila)
CDC25B | cell division cycle 25 homolog B (S. pombe)
DSCC1 | defective in sister chromatid cohesion 1 homolog (S. cerevisiae)
CDCA8 | cell division cycle associated 8
CEP55 | centrosomal protein 55kDa
HPSE | heparanase
CENPM | centromere protein M
CDK1 | cyclin-dependent kinase 1
EYA2 | eyes absent homolog 2 (Drosophila)
TMSB15B | thymosin beta 15B
GGH | gamma-glutamyl hydrolase (conjugase, folylpolygammaglutamyl hydrolase)
PSMD3 | proteasome (prosome, macropain) 26S subunit, non-ATPase, 3
FGD1 | FYVE, RhoGEF and PH domain containing 1
ASF1B | ASF1 anti-silencing function 1 homolog B (S. cerevisiae)
SPAG16 | sperm associated antigen 16
SMC4 | structural maintenance of chromosomes 4
C11orf80 | chromosome 11 open reading frame 80
LSM1 | LSM1 homolog, U6 small nuclear RNA associated (S. cerevisiae)
PMEPA1 | prostate transmembrane protein, androgen induced 1
CDKN3 | cyclin-dependent kinase inhibitor 3
TOPBP1 | topoisomerase (DNA) II binding protein 1
CCT5 | chaperonin containing TCP1, subunit 5 (epsilon)
RAD51AP1 | RAD51 associated protein 1
GPSM2 | G-protein signaling modulator 2
LIG1 | ligase I, DNA, ATP-dependent
NMU | neuromedin U
KIAA1199 | KIAA1199
DTL | denticleless homolog (Drosophila)
KIF2C | kinesin family member 2C
WDR45L | WDR45-like
SLC16A3 | solute carrier family 16, member 3 (monocarboxylic acid transporter 4)
MT1F | metallothionein 1F
C18orf8 | chromosome 18 open reading frame 8
STMN1 | stathmin 1
HSPA1A | heat shock 70kDa protein 1A
PUS7 | pseudouridylate synthase 7 homolog (S. cerevisiae)
GPR172A | G protein-coupled receptor 172A
SCRN1 | secernin 1
AURKB | aurora kinase B
GALNT14 | UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 14 (GalNAc-T14)
SPP1 | secreted phosphoprotein 1
NUP107 | nucleoporin 107kDa
C21orf45 | chromosome 21 open reading frame 45
CTPS | CTP synthase
GINS2 | GINS complex subunit 2 (Psf2 homolog)
CCNE2 | cyclin E2
GSDMB | gasdermin B
RIPK4 | receptor-interacting serine-threonine kinase 4
TMSB15A | thymosin beta 15a
MYBL1 | v-myb myeloblastosis viral oncogene homolog (avian)-like 1
KIF14 | kinesin family member 14
TK1 | thymidine kinase 1, soluble
ABCC10 | ATP-binding cassette, sub-family C (CFTR/MRP), member 10
CIAPIN1 | cytokine induced apoptosis inhibitor 1
TXNRD1 | thioredoxin reductase 1
GLDC | glycine dehydrogenase (decarboxylating)
SAP30 | Sin3A-associated protein, 30kDa
TYMS | thymidylate synthetase
LLGL2 | lethal giant larvae homolog 2 (Drosophila)
EPN3 | epsin 3
DONSON | downstream neighbor of SON
NCAPG2 | non-SMC condensin II complex, subunit G2
C1orf135 | chromosome 1 open reading frame 135
CDCA3 | cell division cycle associated 3
MKI67 | antigen identified by monoclonal antibody Ki-67
F12 | coagulation factor XII (Hageman factor)
ELMO3 | engulfment and cell motility 3
TMEM132A | transmembrane protein 132A
SCRIB | scribbled homolog (Drosophila)
EXO1 | exonuclease 1
AP3M2 | adaptor-related protein complex 3, mu 2 subunit
CYCS | cytochrome c, somatic
NPM3 | nucleophosmin/nucleoplasmin 3

Genes overexpressed in the good-prognosis group

Gene symbol | Gene name
TRBV20-1 | T cell receptor beta variable 20-1
CCL19 | chemokine (C-C motif) ligand 19
CD52 | CD52 molecule
SRGN | serglycin
CD3D | CD3d molecule, delta (CD3-TCR complex)
IGJ | immunoglobulin J polypeptide, linker protein for immunoglobulin alpha and mu polypeptides
HLA-DRA | major histocompatibility complex, class II, DR alpha
LOC91316 | glucuronidase, beta/immunoglobulin lambda-like polypeptide 1 pseudogene
IGF1 | insulin-like growth factor 1 (somatomedin C)
CYBRD1 | cytochrome b reductase 1
TMC5 | transmembrane channel-like 5
ALDH1A1 | aldehyde dehydrogenase 1 family, member A1
OGN | osteoglycin
PDCD4 | programmed cell death 4 (neoplastic transformation inhibitor)
FRZB | frizzled-related protein
CX3CR1 | chemokine (C-X3-C motif) receptor 1
IGFBP6 | insulin-like growth factor binding protein 6
GLA | galactosidase, alpha
LOC96610 | BMS1 homolog, ribosome assembly protein (yeast) pseudogene
IGLL3 | immunoglobulin lambda-like polypeptide 3
ITPR1 | inositol 1,4,5-triphosphate receptor, type 1
SERPINA1 | serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 1
EPHX2 | epoxide hydrolase 2, cytoplasmic
MFAP4 | microfibrillar-associated protein 4
RNASET2 | ribonuclease T2
CCNG1 | cyclin G1
FBLN5 | fibulin 5
SORBS2 | sorbin and SH3 domain containing 2
CCBL2 | cysteine conjugate-beta lyase 2
BTN3A2 | butyrophilin, subfamily 3, member A2
TFAP2B | transcription factor AP-2 beta (activating enhancer binding protein 2 beta)
LTF | lactotransferrin
ITM2A | integral membrane protein 2A
HLA-DPB1 | major histocompatibility complex, class II, DP beta 1
HLA-DMA | major histocompatibility complex, class II, DM alpha
RPL3 | ribosomal protein L3
LOC100130100 | similar to hCG26659
FAM129A | family with sequence similarity 129, member A
ELOVL5 | ELOVL family member 5, elongation of long chain fatty acids (FEN1/Elo2, SUR4/Elo3-like, yeast)
GBP2 | guanylate binding protein 2, interferon-inducible
RARRES3 | retinoic acid receptor responder (tazarotene induced) 3
GOLM1 | golgi membrane protein 1
RTN1 | reticulon 1
ICAM3 | intercellular adhesion molecule 3
LAMA2 | laminin, alpha 2
CXCL13 | chemokine (C-X-C motif) ligand 13
ZCCHC24 | zinc finger, CCHC domain containing 24
CD37 | CD37 molecule
VTCN1 | V-set domain containing T cell activation inhibitor 1
PYCARD | PYD and CARD domain containing
CORO1A | coronin, actin binding protein, 1A
SH3BGRL | SH3 domain binding glutamic acid-rich protein like
TPSAB1 | tryptase alpha/beta 1
TNFSF10 | tumor necrosis factor (ligand) superfamily, member 10
ACSF2 | acyl-CoA synthetase family member 2
TGFBR2 | transforming growth factor, beta receptor II (70/80kDa)
DUSP4 | dual specificity phosphatase 4
ARHGDIB | Rho GDP dissociation inhibitor (GDI) beta
TMPRSS3 | transmembrane protease, serine 3
DCN | decorin
LRIG1 | leucine-rich repeats and immunoglobulin-like domains 1
FMOD | fibromodulin
ZNF423 | zinc finger protein 423
SQRDL | sulfide quinone reductase-like (yeast)
TPST2 | tyrosylprotein sulfotransferase 2
CD44 | CD44 molecule (Indian blood group)
MREG | melanoregulin
GIMAP6 | GTPase, IMAP family member 6
GJA1 | gap junction protein, alpha 1, 43kDa
IFITM3 | interferon induced transmembrane protein 3 (1-8U)
BTG2 | BTG family, member 2
PIP | prolactin-induced protein
RPS9 | ribosomal protein S9
HLA-DPA1 | major histocompatibility complex, class II, DP alpha 1
IMPDH2 | IMP (inosine 5'-monophosphate) dehydrogenase 2
TNFRSF17 | tumor necrosis factor receptor superfamily, member 17
C14orf139 | chromosome 14 open reading frame 139
SPRY2 | sprouty homolog 2 (Drosophila)
XBP1 | X-box binding protein 1
THYN1 | thymocyte nuclear protein 1
APOD | apolipoprotein D
C10orf116 | chromosome 10 open reading frame 116
VAV3 | vav 3 guanine nucleotide exchange factor
FAS | Fas (TNF receptor superfamily, member 6)
MYBPC1 | myosin binding protein C, slow type
CFB | complement factor B
TRIM22 | tripartite motif-containing 22
ARID5B | AT rich interactive domain 5B (MRF1-like)
PTGDS | prostaglandin D2 synthase 21kDa (brain)
TGFBR3 | transforming growth factor, beta receptor III
TNFAIP8 | tumor necrosis factor, alpha-induced protein 8
SEMA3C | sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3C
TMEM135 | transmembrane protein 135
ARHGEF3 | Rho guanine nucleotide exchange factor (GEF) 3
PTGER4 | prostaglandin E receptor 4 (subtype EP4)
ABCA8 | ATP-binding cassette, sub-family A (ABC1), member 8
ICAM2 | intercellular adhesion molecule 2
HLA-DQB1 | major histocompatibility complex, class II, DQ beta 1
HSPA2 | heat shock 70kDa protein 2
CD27 | CD27 molecule
ARMCX1 | armadillo repeat containing, X-linked 1
POU2AF1 | POU class 2 associating factor 1
IGBP1 | immunoglobulin (CD79A) binding protein 1
PDE4B | phosphodiesterase 4B, cAMP-specific
ADH1B | alcohol dehydrogenase 1B (class I), beta polypeptide
WLS | wntless homolog (Drosophila)
SUCLG2 | succinate-CoA ligase, GDP-forming, beta subunit
PGR | progesterone receptor
STARD13 | StAR-related lipid transfer (START) domain containing 13
SORL1 | sortilin-related receptor, L(DLR class) A repeats-containing
ATP1B1 | ATPase, Na+/K+ transporting, beta 1 polypeptide
IFT46 | intraflagellar transport 46 homolog (Chlamydomonas)
SIK3 | SIK family kinase 3
LIPT1 | lipoyltransferase 1
OMD | osteomodulin
HBB | hemoglobin, beta
C3 | complement component 3
FGL2 | fibrinogen-like 2
PECI | peroxisomal D3,D2-enoyl-CoA isomerase
RAC2 | ras-related C3 botulinum toxin substrate 2 (rho family, small GTP binding protein Rac2)
PDZRN3 | PDZ domain containing ring finger 3
CXCL12 | chemokine (C-X-C motif) ligand 12
DPYD | dihydropyrimidine dehydrogenase
TXNDC15 | thioredoxin domain containing 15
STOM | stomatin
EMCN | endomucin
SCGB2A2 | secretoglobin, family 2A, member 2
FAM176B | family with sequence similarity 176, member B
HIGD1A | HIG1 hypoxia inducible domain family, member 1A
ACSL5 | acyl-CoA synthetase long-chain family member 5
RPS24 | ribosomal protein S24
RGS10 | regulator of G-protein signaling 10
RAI2 | retinoic acid induced 2
CNN3 | calponin 3, acidic
FBXW4 | F-box and WD repeat domain containing 4
SEPP1 | selenoprotein P, plasma, 1
SLC44A4 | solute carrier family 44, member 4
MGP | matrix Gla protein
ABCD3 | ATP-binding cassette, sub-family D (ALD), member 3
SETBP1 | SET binding protein 1
APOBEC3G | apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3G
LCP2 | lymphocyte cytosolic protein 2 (SH2 domain containing leukocyte protein of 76kDa)
HLA-DRB1 | major histocompatibility complex, class II, DR beta 1
SCUBE2 | signal peptide, CUB domain, EGF-like 2
DEPDC6 | DEP domain containing 6
RPL15 | ribosomal protein L15
SH3BP4 | SH3-domain binding protein 4
MSX2 | msh homeobox 2
CLU | clusterin
DPT | dermatopontin
ZNF238 | zinc finger protein 238
HBP1 | HMG-box transcription factor 1
GSTK1 | glutathione S-transferase kappa 1
ZBTB16 | zinc finger and BTB domain containing 16
CCDC69 | coiled-coil domain containing 69
ALDH2 | aldehyde dehydrogenase 2 family (mitochondrial)
SLC1A1 | solute carrier family 1 (neuronal/epithelial high affinity glutamate transporter, system Xag), member 1
ARMCX2 | armadillo repeat containing, X-linked 2
HMGCS2 | 3-hydroxy-3-methylglutaryl-CoA synthase 2 (mitochondrial)
TSPAN3 | tetraspanin 3
FTO | fat mass and obesity associated
PON2 | paraoxonase 2
C16orf62 | chromosome 16 open reading frame 62
QDPR | quinoid dihydropteridine reductase
LRP2 | low density lipoprotein receptor-related protein 2
PSMB8 | proteasome (prosome, macropain) subunit, beta type, 8 (large multifunctional peptidase 7)
HCLS1 | hematopoietic cell-specific Lyn substrate 1
FXYD1 | FXYD domain containing ion transport regulator 1
OAT | ornithine aminotransferase
SLC38A1 | solute carrier family 38, member 1
MAOA | monoamine oxidase A
LPL | lipoprotein lipase
C10orf57 | chromosome 10 open reading frame 57
SPARCL1 | SPARC-like 1 (hevin)
ERAP2 | endoplasmic reticulum aminopeptidase 2
PDGFRL | platelet-derived growth factor receptor-like
RBP4 | retinol binding protein 4, plasma
LRRC17 | leucine rich repeat containing 17
LHFP | lipoma HMGIC fusion partner
BLNK | B-cell linker
HBA2 | hemoglobin, alpha 2
CST7 | cystatin F (leukocystatin)

The GO analysis showed that principal component 1 is concentrated in proliferation and principal component 2 in immune response. Among the genes belonging to these two principal components, the four proliferation genes and the five immune response genes with the largest expression differences between the prognostic groups were selected. The genes in the set representing the proliferation expression pattern were named p-genes, and those in the set representing the immune response expression pattern were named i-genes.

Constructing a prognostic model using parametric survival analysis

Among parametric survival models, the accelerated failure time (AFT) model was used to perform regression analysis with the expression levels of the p-genes and i-genes as covariates. For each patient, the expression values of the four p-genes were averaged into p.mean, and those of the five i-genes were likewise averaged into i.mean. The accelerated failure time model is

$$ T_i = T_0 \exp(\beta_1 \chi_1 + \beta_2 \chi_2 + \dots + \beta_q \chi_q) \cdot \varepsilon_i \qquad (1) $$

where $T_i$ is the survival time of the i-th individual, $T_0$ is the baseline survival time, $\chi_j$ (j = 1, 2, ..., q) are the covariates, $\beta_j$ is the coefficient of the corresponding covariate, and $\varepsilon$ is the error term. In this model the covariates act multiplicatively on the baseline survival time, which is why industry, where the model is frequently used, calls it the accelerated failure time model. The effect that acts multiplicatively on survival time, $\Phi = \beta_1 \chi_1 + \beta_2 \chi_2 + \dots + \beta_q \chi_q$, is called the acceleration factor. Taking the natural logarithm of Equation (1) gives

$$ \log T_i = \log T_0 + \beta_1 \chi_1 + \beta_2 \chi_2 + \dots + \beta_q \chi_q + \varepsilon_i^{*} \qquad (2) $$

so the AFT model takes the same form as an ordinary linear regression model. However, the dependent variable log T is not normally distributed, and survival data contain censored observations, which a linear regression model cannot accommodate, so Equation (2) cannot be handled like a linear regression model. In addition, whereas ordinary linear regression assumes a normally distributed error, the distribution of ε* in Equation (2) can differ from one data set to another, which makes practical statistical treatment cumbersome. To overcome this, log T_0 and ε* are transformed and expressed as follows.

$$ \log T = \beta_0 + \beta_1 \chi_1 + \beta_2 \chi_2 + \dots + \beta_q \chi_q + \sigma W \qquad (3) $$

Here, W follows the distribution of log T, and its variance is fixed at the value of the standardized distribution. σ is the scale parameter, a constant whose value is determined by the data set being analyzed.
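As an illustration of Equation (3), the following minimal sketch computes p.mean and i.mean for one patient and evaluates the survival probability under the assumption that W is standard normal, that is, a log-normal AFT model. The function names, expression values, coefficients, and scale parameter are all hypothetical and are not taken from the fitted model described here.

```python
import math

def mean_expression(values):
    """Average the expression values of one gene set for a single patient."""
    return sum(values) / len(values)

def aft_lognormal_survival(t, covariates, betas, beta0, sigma):
    """S(t | x) for Equation (3) when W is standard normal (log-normal AFT)."""
    linear = beta0 + sum(b * x for b, x in zip(betas, covariates))
    z = (math.log(t) - linear) / sigma
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 1.0 - cdf

def aft_median_time(covariates, betas, beta0):
    """Median survival time: W has median 0, so log T = beta0 + sum(beta_j * x_j)."""
    return math.exp(beta0 + sum(b * x for b, x in zip(betas, covariates)))

# Hypothetical expression values and coefficients, for illustration only.
p_mean = mean_expression([9.1, 8.7, 9.4, 8.8])       # four p-genes
i_mean = mean_expression([7.2, 6.9, 7.5, 7.0, 7.4])  # five i-genes
betas = [-0.6, 0.4]      # assumed signs: proliferation shortens, immunity lengthens
beta0, sigma = 5.0, 1.2  # assumed intercept and scale parameter

# Predicted probability of remaining event-free at 5 years for this patient.
s5 = aft_lognormal_survival(5.0, [p_mean, i_mean], betas, beta0, sigma)
```

With a Weibull or log-logistic model, only the distribution of W changes; the linear predictor and the role of σ stay the same.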

Using the AFT model, the various candidate prognostic models were fitted to the Weibull, log-logistic, and log-normal distributions, and the best-fitting model was selected. The hazard distribution to be fitted by the AFT model was chosen using the hazard function obtained from a cohort life table constructed from the survival information of the discovery data set. Because the hazard function obtained from the cohort life table was unimodal, the Weibull, log-logistic, and log-normal distributions were expected to fit well. The final model was selected on the basis of Akaike's information criterion (AIC) and R-squared (R²).
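The AIC comparison among candidate distributions can be sketched as follows. AIC is computed as 2k - 2 ln L, where k is the number of estimated parameters and L the maximized likelihood, and the model with the lowest value is preferred. The log-likelihood values below are hypothetical placeholders, not results from the discovery data set.

```python
def aic(log_likelihood, n_params):
    """Akaike's information criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical maximized log-likelihoods, for illustration only.
# Each model is assumed to estimate 2 covariate coefficients,
# an intercept, and a scale parameter (k = 4).
candidates = {
    "weibull": {"loglik": -512.3, "k": 4},
    "loglogistic": {"loglik": -508.9, "k": 4},
    "lognormal": {"loglik": -507.4, "k": 4},
}

scores = {name: aic(m["loglik"], m["k"]) for name, m in candidates.items()}
best = min(scores, key=scores.get)  # name of the model with the lowest AIC
```

Because all three candidates here have the same number of parameters, the ranking by AIC coincides with the ranking by log-likelihood; AIC matters more when models of different size are compared.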

Validation of the prognostic model

The selected model was validated for 'calibration' and 'discrimination'. Calibration assesses how closely the survival probability predicted by the prognostic model agrees with the survival probability actually observed, and discrimination assesses how well the groups are separated when a given patient population is classified into prognostic groups by the model. Here, the actually observed survival probability is the value obtained by the Kaplan-Meier method. The AFT-based prognostic model can provide a survival probability for each patient at any time point. The survival probability predicted by the model was compared with the survival probability estimated by the Kaplan-Meier method. To obtain a predicted survival probability over the whole time range, as the Kaplan-Meier method does, the survival probability curve of every patient was computed from 0 to 25 years in steps of 0.1 year, and the mean survival probability at each time point was calculated. In addition to the comparison over the entire survival time, the 5-year survival probability was also compared: for a given data set, the 5-year survival probability predicted by the prognostic model was compared with the 5-year survival probability calculated by hazard regression (HARE), which was taken as the observed value.
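The two quantities compared in the calibration step can be sketched as below: a Kaplan-Meier estimate of the observed survival probability, and the mean of per-patient predicted survival curves on a 0 to 25 year grid in steps of 0.1 year. This is a minimal pure-Python sketch; the function names and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time per patient (years)
    events -- 1 if the event (metastasis) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        removed = 0
        # Group all patients sharing this time point.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            removed += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

def mean_predicted_curve(patient_curves):
    """Average per-patient predicted survival probabilities at each time point."""
    n = len(patient_curves)
    return [sum(column) / n for column in zip(*patient_curves)]

# Time grid from 0 to 25 years in steps of 0.1 year.
grid = [round(0.1 * k, 1) for k in range(251)]

# Hypothetical follow-up data, for illustration only.
km = kaplan_meier([2.0, 3.5, 3.5, 6.0, 8.0], [1, 1, 0, 1, 0])
```

In practice each row passed to `mean_predicted_curve` would hold one patient's model-predicted survival probabilities evaluated on `grid`, and the averaged curve would be plotted against the Kaplan-Meier curve.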

For discrimination, the prognostic indices of all patients in a given data set were divided into four intervals, and the survival probabilities of the patients in each interval were compared using Kaplan-Meier (KM) curves. The prognostic index is the dependent variable of the survival model. The more clearly the KM curves of the four predicted prognostic groups separate, the better the model discriminates.
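The grouping used for discrimination can be sketched as follows, under the assumption that the four intervals are equal-sized quartiles of the prognostic index; the text does not state how the cut points were chosen. The function name is hypothetical.

```python
import statistics

def quartile_groups(prognostic_index):
    """Assign each patient to group 1-4 by quartile of the prognostic index."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, q2, q3 = statistics.quantiles(prognostic_index, n=4)
    groups = []
    for value in prognostic_index:
        if value <= q1:
            groups.append(1)
        elif value <= q2:
            groups.append(2)
        elif value <= q3:
            groups.append(3)
        else:
            groups.append(4)
    return groups
```

A KM curve would then be estimated separately for the patients in each of the four groups.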

Calibration and discrimination were examined for both the discovery data set and three independent validation data sets.

The important R packages used for statistical analysis are:

affy: pre-processing of .CEL files using the RMA algorithm.

samr: identification of genes with differences in expression levels between prognostic groups.

GOstats: identification of the functions related to the selected gene set.

KMsurv: construction of a life table from the survival data of the discovery data set.

rma: estimation of the coefficients of the prognostic model using the AFT model, and calibration of the model.

Experimental results

Selection of prognostic genes for the prognostic model

Five data sets consisting of expression profiles of early breast cancer tissue were combined into a discovery data set of 1,104 samples. None of the patients had received chemotherapy, and almost all had no axillary lymph node metastasis (N0 or N-) or had early-stage breast cancer (stage 1 or 2). Statistical analysis was performed on the 1,072 patients for whom survival information on distant metastasis was available. To find prognosis-related genes, the expression profiles of the good-prognosis group (no metastasis for 10 years or more) and the poor-prognosis group (metastasis within 5 years) were compared. 182 genes highly expressed in the good-prognosis group and 120 genes highly expressed in the poor-prognosis group were selected (FDR < 0.001).
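The group definitions above can be written as a small rule. This sketch assumes each patient is described by a time to metastasis or last follow-up (in years) and an event flag; how boundary cases and patients with metastasis after 10 years were actually handled is not stated, so the treatment below is an assumption, and the function name is hypothetical.

```python
def prognosis_group(years, metastasis):
    """Label a patient 'good', 'poor', or None (not used for gene discovery).

    years      -- time to metastasis, or to last follow-up if none occurred
    metastasis -- True if metastasis was observed at `years`
    """
    if metastasis and years <= 5:
        return "poor"   # metastasis within 5 years
    if years >= 10:
        return "good"   # metastasis-free for 10 years or more
    return None         # intermediate or insufficient follow-up
```

Patients labelled `None` would be excluded from the comparison between prognostic groups but could still contribute to the survival analysis.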

Principal component analysis was performed on the expression levels of the 302 selected genes, and GO functional analysis was performed on principal components 1 and 2. Principal component 1 was very clearly associated with proliferation, and principal component 2 was strongly associated with the immune response. On this basis, four genes belonging to principal component 1 and five genes belonging to principal component 2 were selected so that both expression patterns would be reflected in the prognostic prediction model.

The nine selected genes were chosen because they are related to prognosis and also show the largest expression differences between the prognostic groups. The four genes selected from principal component 1, which represents proliferation, were named p-genes, and the five genes selected from principal component 2, which represents the immune response, were named i-genes.

Comparison of ER+ and ER- breast cancer

The presence or absence of estrogen receptor (ER) expression is known to be closely related to the onset and development of breast cancer. The two functions represented by the genes selected for prognosis, proliferation and the immune response, are both of interest in the mechanism of cancer. ER- and ER+ breast cancers were compared using the nine selected genes (p-genes and i-genes). To express the strength of each function, the p-genes and the i-genes were each stratified into three levels according to mean expression (p1, p2, p3 and i1, i2, i3). p1 is the group with the lowest p-gene expression and was assumed to proliferate most slowly. p3 is the group with the highest p-gene expression and was assumed to proliferate most actively. p2 shows intermediate expression and was assumed to have an intermediate level of proliferation. i1 is the group with the lowest i-gene expression and was assumed to have a weak immune response. i3 is the group with the highest i-gene expression and was considered to have a very strong immune response. i2 was considered to show an intermediate level of expression and activity.
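The three-level stratification can be sketched as follows. The source does not state the cut-off values used, so the tertile cut points below are an assumption made only for illustration, as are the function name and the expression values.

```python
from statistics import mean, quantiles

def stratify(mean_expr, prefix):
    """Split patients into three levels (e.g. p1, p2, p3) by mean expression.

    Tertile cut points are an assumed choice; the source gives no cut-offs.
    """
    low_cut, high_cut = quantiles(mean_expr, n=3)
    levels = []
    for value in mean_expr:
        if value <= low_cut:
            levels.append(prefix + "1")   # lowest expression
        elif value <= high_cut:
            levels.append(prefix + "2")   # intermediate expression
        else:
            levels.append(prefix + "3")   # highest expression
    return levels

# Hypothetical expression of four p-genes in six patients (made-up values)
p_expr = [
    [1.0, 1.0, 1.0, 1.0],
    [2.0, 2.0, 2.0, 2.0],
    [3.0, 3.0, 3.0, 3.0],
    [4.0, 4.0, 4.0, 4.0],
    [5.0, 5.0, 5.0, 5.0],
    [6.0, 6.0, 6.0, 6.0],
]
p_mean = [mean(row) for row in p_expr]
p_level = stratify(p_mean, "p")
print(p_level)
```

The same function applied to the mean i-gene expression with prefix `"i"` would give the i1, i2, and i3 levels.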

The 1,072 patients in the discovery data set were classified by p-gene and i-gene expression, and the distribution of the strength of each function was examined by ER status. ER- breast cancer had a much higher proportion of the actively proliferating p3 type than ER+ breast cancer. About 62% of ER- breast cancers showed very high p-gene expression (p3), whereas only 18% of ER+ breast cancers did, which is consistent with the known tendency of ER- breast cancer to be far more aggressive than ER+ breast cancer. About 35% of ER+ breast cancers showed low p-gene expression (p1), compared with only 9% of ER- breast cancers. An active immune response was another characteristic of ER- breast cancer: more than 38% of ER- breast cancers showed very high i-gene expression (i3), whereas about 21% of ER+ breast cancers did. In both ER+ and ER- breast cancer, the immune response became more active as proliferation became more active, but the immune response was more pronounced in ER- breast cancer.

In addition, the histologic grade of breast cancer was found to be closely related to proliferation. Poorly differentiated breast cancers (G3) showed faster proliferation, whereas most well-differentiated breast cancers (G1) showed weak proliferation. Patient prognosis was also correlated with proliferation: patients with a poor prognosis, in whom metastasis occurred within 5 years, were concentrated in the rapidly proliferating group.

Overall, both proliferation and the immune response were considerably more active in ER- breast cancer than in ER+ breast cancer, which suggests that the level of ER expression influences the mechanisms by which breast cancer arises and develops.

Establishment of the prognostic prediction model

An AFT prognostic prediction model for metastasis in early breast cancer patients was built from the survival information of the discovery data set and the selected p-genes and i-genes. A cohort life table with one-year intervals was first constructed from the survival information of the discovery data set to obtain an approximate estimate of risk.
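A life table with one-year intervals can be sketched in Python as below. The source built its table with the KMsurv R package; this stand-in uses the common actuarial convention of counting censored patients as at risk for half of the interval, which is an assumption here, and the follow-up data are made up.

```python
import math

def life_table(times, events, width=1.0):
    """Actuarial life table with fixed-width intervals.

    times:  follow-up time of each patient (years)
    events: True if the event (e.g. distant metastasis) occurred at that time
    Returns rows of (start, end, at_risk, n_events, n_censored, event_probability).
    """
    n_intervals = int(math.floor(max(times) / width)) + 1
    at_risk = len(times)
    rows = []
    for k in range(n_intervals):
        start, end = k * width, (k + 1) * width
        in_interval = [e for t, e in zip(times, events) if start <= t < end]
        n_events = sum(1 for e in in_interval if e)
        n_censored = len(in_interval) - n_events
        # Censored patients are assumed to be at risk for half of the interval.
        effective = at_risk - n_censored / 2.0
        prob = n_events / effective if effective > 0 else 0.0
        rows.append((start, end, at_risk, n_events, n_censored, prob))
        at_risk -= n_events + n_censored
    return rows

# Hypothetical follow-up data (made-up values)
times = [0.5, 1.2, 1.7, 2.3, 2.9, 3.5]
events = [True, False, True, True, False, False]
table = life_table(times, events)
for row in table:
    print(row)
```

Plotting the last column against the interval start gives the kind of rough risk profile from which a candidate distribution family can be judged.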

Because the probability of death obtained from the cohort life table was unimodal, the Weibull, log-logistic, and log-normal distributions were expected to fit well. The covariates included in the prognostic prediction model are p.mean and i.mean, where p.mean is the mean expression of the four p-genes and i.mean is the mean expression of the five i-genes.

Three models were each fitted with the Weibull, log-logistic, and log-normal distributions, and the log-normal distribution gave the best fit. The final model, model ③, was selected using Akaike's information criterion (AIC).
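Selection by AIC can be illustrated with the short sketch below. AIC is computed as 2k - 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood, and the candidate with the smallest value is preferred. The log-likelihood values and parameter counts are placeholders invented for this illustration; they are not results from the source.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood

# Placeholder (log-likelihood, number of parameters) pairs, NOT values from the source.
candidate_fits = {
    "weibull": (-1510.2, 4),
    "log-logistic": (-1504.8, 4),
    "log-normal": (-1501.3, 4),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidate_fits.items()}
best = min(scores, key=scores.get)
print(scores, best)
```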

log(T) = -0.689 × p.mean + 0.274 × i.mean + 3.219

According to the fitted model, p.mean, that is, proliferation, is negatively associated with survival time T (coefficient -0.689, p = 2.47 × 10^-17), so survival time becomes shorter as proliferation becomes more active. Conversely, i.mean is positively associated with survival time (coefficient 0.274, p = 3.69 × 10^-11), which means that survival time becomes longer as the immune response becomes more active. From these estimates it can be concluded that proliferation plays a decisive role in the prognosis of breast cancer, with more active proliferation giving a worse prognosis, while the immune response acts as a defense mechanism against rapid proliferation.
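The fitted equation can be evaluated as in the following sketch. The three coefficients are taken from the model above. The scale parameter `sigma` of the log-normal distribution is not reported in the source, so the value used here is a placeholder, and the use of the natural logarithm is likewise an assumption. Under a log-normal AFT model the survival function is S(t) = 1 - Φ((log t - μ) / σ), where μ is the predicted log(T).

```python
import math

# Coefficients reported in the fitted model
INTERCEPT = 3.219
COEF_P = -0.689   # proliferation (p.mean)
COEF_I = 0.274    # immune response (i.mean)

def predicted_log_time(p_mean: float, i_mean: float) -> float:
    """Linear predictor: log(T) = -0.689*p.mean + 0.274*i.mean + 3.219."""
    return COEF_P * p_mean + COEF_I * i_mean + INTERCEPT

def survival_probability(t: float, p_mean: float, i_mean: float, sigma: float) -> float:
    """Log-normal AFT survival: S(t) = 1 - Phi((log t - mu) / sigma).

    sigma is NOT given in the source; callers must supply an assumed value.
    """
    z = (math.log(t) - predicted_log_time(p_mean, i_mean)) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))   # equals 1 - Phi(z)

ASSUMED_SIGMA = 1.0  # placeholder only
low_proliferation = survival_probability(5.0, 0.0, 0.0, ASSUMED_SIGMA)
high_proliferation = survival_probability(5.0, 2.0, 0.0, ASSUMED_SIGMA)
print(low_proliferation, high_proliferation)
```

With the immune term held fixed, a larger p.mean lowers the predicted log(T) and therefore the survival probability at any given time, which matches the sign interpretation given above.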

Validation of the prognostic prediction model

The prognostic prediction model built from the expression profiles of the 1,072 early breast cancer patients in the discovery data set was validated for calibration and discrimination. Calibration assesses how closely the survival probability predicted by the model matches the observed survival probability, where the observed survival probability is the one obtained by the Kaplan-Meier method. Discrimination assesses how well the model classifies patients into prognostic groups. Both properties were validated on the discovery data set used to develop the model and on three independent validation data sets.
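The observed survival probability used as the reference for calibration can be obtained with the Kaplan-Meier product-limit estimator. The sketch below is a minimal stdlib-only version for illustration; the function name and the example data are assumptions, and the source performed this step in R.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times:  follow-up time of each patient
    events: True if the event occurred at that time, False if censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_censored = 0
        # Collect all patients sharing this time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_events + n_censored
    return curve

# Hypothetical follow-up data (made-up values)
km_curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
print(km_curve)
```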

For the discovery data set on which the prognostic prediction model was developed, patients were divided into four prognostic groups by quartiles of the prognostic index (PI). The four prognostic groups were compared using Kaplan-Meier (KM) curves of the observed survival probability. The four groups were very well separated, and the predicted and observed survival probabilities of each group agreed well.

The KM survival probability and the survival probability predicted by the prognostic model were compared graphically. Because the model yields a survival probability for every patient at every time point, a probability curve over the whole survival time comparable to the KM survival curve was obtained by averaging the patients' predicted survival probabilities at each time point (0 to 25 years, in steps of 0.1). The predicted survival probability was slightly higher than the KM estimate but was similar overall. In addition to the comparison over the whole survival time, the 5-year survival probability was compared. The 5-year survival probability given by the model was similar to the observed 5-year survival probability, and the agreement between predicted and observed probabilities was best where the predicted 5-year survival probability was high.
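The averaging step can be sketched as follows: each patient's predicted survival probability is evaluated on a grid from 0 to 25 years in steps of 0.1, and the mean over patients is taken at each time point. The log-normal survival function, the per-patient linear predictors `mus`, and the scale `sigma` are assumptions made for illustration; the source does not report the scale parameter.

```python
import math

def lognormal_survival(t: float, mu: float, sigma: float) -> float:
    """S(t) = 1 - Phi((log t - mu) / sigma); S(t) = 1 for t <= 0."""
    if t <= 0.0:
        return 1.0
    return 0.5 * math.erfc((math.log(t) - mu) / (sigma * math.sqrt(2.0)))

def mean_survival_curve(mus, sigma, t_max=25.0, step=0.1):
    """Average the patients' predicted survival probabilities on a time grid."""
    n_steps = int(round(t_max / step))
    curve = []
    for k in range(n_steps + 1):
        t = k * step
        mean_s = sum(lognormal_survival(t, mu, sigma) for mu in mus) / len(mus)
        curve.append((t, mean_s))
    return curve

# Hypothetical predicted log(T) values for three patients and a placeholder sigma
mean_curve = mean_survival_curve([2.0, 3.0, 4.0], sigma=1.0)
print(mean_curve[50])   # averaged predicted survival near 5 years
```

The resulting curve can be drawn on the same axes as a KM curve, and its value near 5 years corresponds to the averaged predicted 5-year survival probability.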

For a more objective validation, the prognostic prediction model was validated on three independent validation data sets. The first validation data set combines two data sets generated on the Affymetrix U133A platform. The second validation data set was also generated on the Affymetrix U133A platform and consists entirely of ER+ patients who took tamoxifen for 5 years. The third validation data set, generated on the Agilent Hu25K platform, is the data set used to discover and validate the 70 prognostic genes now commercialized as MammaPrint. Because validation data sets 1 and 2 were produced on the same Affymetrix U133A platform as the discovery data set, their expression levels were normalized together with the discovery data set. Calibration and discrimination were evaluated on validation data sets 1 and 2; only discrimination was evaluated on validation data set 3 because of the expression normalization problem.

In validation data set 1, the four prognostic groups were clearly separated, and the observed survival probability of each group agreed fairly well with the survival probability predicted by the model. The predicted survival probability over the whole time range agreed well with the observed KM curve, and the predicted 5-year survival probability was about 2% higher than the observed one.

In validation data set 2, the four prognostic groups were not clearly separated, but overall the observed survival probability was higher where the predicted survival probability was higher. Over the whole time range, the predicted survival curve agreed well with the observed KM curve. The predicted 5-year survival probability was about 2% higher than the observed one.

Specific parts of the present invention have been described in detail above. It will be apparent to those of ordinary skill in the art that this specific description is merely a preferred embodiment and does not limit the scope of the present invention. Accordingly, the substantial scope of the present invention is defined by the appended claims and their equivalents.

References

1. Chang, H.Y., et al., Gene expression signature of fibroblast serum response predicts human cancer progression: similarities between tumors and wounds. PLoS Biol 2(2):E7 (2004).

2. van de Vijver, M.J., et al., A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 347(25):1999-2009 (2002).

3. van 't Veer, L.J., et al., Gene expression profiling predicts clinical outcome of breast cancer. Nature 415(6871):530-536 (2002).

4. Wang, Y., et al., Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365(9460):671-679 (2005).

5. Buyse, M., et al., Validation and clinical utility of a 70-gene prognostic signature for women with node-negative breast cancer. J Natl Cancer Inst 98(17):1183-92 (2006).

6. Paik, S., Development and clinical utility of a 21-gene recurrence score prognostic assay in patients with early breast cancer treated with tamoxifen. Oncologist 12(6):631-635 (2007).

7. Paik, S., et al., A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med 351(27):2817-2826 (2004).

8. Sotiriou, C., et al., Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst 98(4):262-72 (2006).

9. Pawitan, Y., et al., Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast Cancer Res 7(6):R953-964 (2005).

10. Miller, L.D., et al., An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proc Natl Acad Sci USA 102(38):13550-13555 (2005).

11. Bild, A.H., et al., Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature 439(7074):353-357 (2006).

12. Teschendorff, A.E., et al., A consensus prognostic gene expression classifier for ER positive breast cancer. Genome Biol 7(10):R101 (2006).

13. Desmedt, C., et al., Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series. Clin Cancer Res 13(11):3207-3214 (2007).

14. Kim, S.Y., Effects of sample size on robustness and prediction accuracy of a prognostic gene signature. BMC Bioinformatics 10:147 (2009).

15. Hummel, M., et al., Association between a prognostic gene signature and functional gene sets. Bioinform Biol Insights 2:329-341 (2008).

16. Pfeffer, U., et al., Prediction of breast cancer metastasis by genomic profiling: where do we stand? Clin Exp Metastasis 26(6):547-558 (2009).

17. Ein-Dor, L., O. Zuk, and E. Domany, Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proc Natl Acad Sci USA 103(15):5923-5928 (2006).

18. van Vliet, M.H., et al., Pooling breast cancer datasets has a synergetic effect on classification performance and improves signature stability. BMC Genomics 9:375 (2008).

19. Yasrebi, H., et al., Can survival prediction be improved by merging gene expression data sets? PLoS One 4(10):e7431 (2009).

20. Fan, C., et al., Concordance among gene-expression-based predictors for breast cancer. N Engl J Med 355(6):560-569 (2006).

21. Reyal, F., et al., A comprehensive analysis of prognostic signatures reveals the high predictive capacity of the proliferation, immune response and RNA splicing modules in breast cancer. Breast Cancer Res 10(6):R93 (2008).

22. Yu, J.X., et al., Pathway analysis of gene signatures predicting metastasis of node-negative primary breast cancer. BMC Cancer 7:182 (2007).

23. Kim, S.Y. and Y.S. Kim, A gene sets approach for identifying prognostic gene signatures for outcome prediction. BMC Genomics 9:177 (2008).

24. Thomassen, M., Q. Tan, and T.A. Kruse, Gene expression meta-analysis identifies metastatic pathways and transcription factors in breast cancer. BMC Cancer 8:394 (2008).

25. Schmidt, M., et al., The humoral immune system has a key prognostic impact in node-negative breast cancer. Cancer Res 68(13):5405-13 (2008).

26. Schmidt, M., et al., Coordinates in the universe of node-negative breast cancer revisited. Cancer Res 69(7):2695-2698 (2009).

27. Calabro, A., et al., Effects of infiltrating lymphocytes and estrogen receptor on gene expression and prognosis in breast cancer. Breast Cancer Res Treat 116(1):69-77 (2009).

28. Finak, G., et al., Stromal gene expression predicts clinical outcome in breast cancer. Nat Med 14(5):518-27 (2008).

29. Ma, X.J., et al., Gene expression profiling of the tumor microenvironment during breast cancer progression. Breast Cancer Res 11(1):R7 (2009).

30. Rutqvist, L.E., A. Wallgren, and B. Nilsson, Is breast cancer a curable disease? A study of 14,731 women with breast cancer from the Cancer Registry of Norway. Cancer 53(8):1793-1800 (1984).

31. Mould, R.F. and J.W. Boag, A test of several parametric statistical models for estimating success rate in the treatment of carcinoma cervix uteri. Br J Cancer 32(5):529-550 (1975).

32. Loi, S., et al., Predicting prognosis using molecular profiling in estrogen receptor-positive breast cancer treated with tamoxifen. BMC Genomics 9:239 (2008).

33. Zhang, Y., et al., The 76-gene signature defines high-risk patients that benefit from adjuvant tamoxifen therapy. Breast Cancer Res Treat 116(2):303-309 (2009).

34. Dai, M., et al., Evolving gene/transcript definitions significantly alter the interpretation of Gene Chip data. Nucleic Acids Res 33(20):e175 (2005).

35. Tusher, V.G., R. Tibshirani, and G. Chu, Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 98(9):5116-21 (2001).

<110> REFERENCE BIOLABS Inc. <120> Prognostic Genes for Early Breast Cancer and Prognostic Model for Early Breast Cancer Patients <160> 9 <170> KopatentIn 1.71 <210> 1 <211> 3128 <212> DNA <213> Homo sapiens <400> 1 gcttcgcccc gtggcgcggt ttgaaatttt gcggggctca acggctcgcg gagcggctac 60 gcggagtgac atcgccggtg tttgcgggtg gttgttgctc tcggggccgt gtggagtagg 120 tctggacctg gactcacggc tgcttggagc gtccgccatg aggagaagtg aggtgctggc 180 ggaggagtcc atagtatgtc tgcagaaagc cctaaatcac cttcgggaaa tatgggagct 240 aattgggatt ccagaggacc agcggttaca aagaactgag gtggtaaaga agcatatcaa 300 ggaactcctg gatatgatga ttgctgaaga ggaaagcctg aaggaaagac tcatcaaaag 360 catatccgtc tgtcagaaag agctgaacac tctgtgcagc gagttacatg ttgagccatt 420 tcaggaagaa ggagagacga ccatcttgca actagaaaaa gatttgcgca cccaagtgga 480 attgatgcga aaacagaaaa aggagagaaa acaggaactg aagctacttc aagagcaaga 540 tcaagaactg tgcgaaattc tttgtatgcc ccactatgat attgacagtg cctcagtgcc 600 cagcttagaa gagctgaacc agttcaggca acatgtgaca actttgaggg aaacaaaggc 660 ttctaggcgt gaggagtttg tcagtataaa gagacagatc atactgtgta tggaagaatt 720 agaccacacc ccagacacaa gctttgaaag agatgtggtg tgtgaagacg aagatgcctt 780 ttgtttgtct ttggagaata ttgcaacact acaaaagttg ctacggcagc tggaaatgca 840 gaaatcacaa aatgaagcag tgtgtgaggg gctgcgtact caaatccgag agctctggga 900 caggttgcaa atacctgaag aagaaagaga agctgtggcc accattatgt ctgggtcaaa 960 ggccaaggtc cggaaagcgc tgcaattaga agtggatcgg ttggaagaac tgaaaatgca 1020 aaacatgaag aaagtgattg aggcaattcg agtggagctg gttcagtact gggaccagtg 1080 cttttatagc caggagcaga gacaagcttt tgcccctttc tgtgctgagg actacacaga 1140 aagtctgctc cagctccacg atgctgagat tgtgcggtta aaaaactact atgaagttca 1200 caaggaactc tttgaaggtg tccagaagtg ggaagaaacc tggaggcttt tcttagagtt 1260 tgagagaaaa gcttcagatc caaatcgatt tacaaaccga ggaggaaatc ttctaaaaga 1320 agaaaaacaa cgagccaagc tccagaaaat gctgcccaag ctggaagaag agttgaaggc 1380 acgaattgaa ttgtgggaac aggaacattc aaaggcattt atggtgaatg ggcagaaatt 1440 catggagtat gtggcagaac aatgggagat gcatcgattg gagaaagaga gagccaagca 1500 ggaaagacaa ctgaagaaca 
aaaaacagac agagacagag atgctgtatg gcagcgctcc 1560 tcgaacacct agcaagcggc gaggactggc tcccaataca ccgggcaaag cacgtaagct 1620 gaacactacc accatgtcca atgctacggc caatagtagc attcggccta tctttggagg 1680 gacagtctac cactcccccg tgtctcgact tcctccttct ggcagcaagc cagtcgctgc 1740 ttccacctgt tcagggaaga aaacaccccg tactggcagg catggagcca acaaggagaa 1800 cctggagctc aacggcagca tcctgagtgg tgggtaccct ggctcggccc ccctccagcg 1860 caacttcagc attaattctg ttgccagcac ctattctgag tttgcgaagg atccgtccct 1920 ctctgacagt tccactgttg ggcttcagcg agaactttca aaggcttcca aatctgatgc 1980 tacttctgga atcctcaatt caaccaacat ccagtcctga gaagccctga tcagtcaacc 2040 agctgtggct tcctgtgcct agactggacc taattatatg ggggtgactt tagtttttct 2100 tcagcttagg cgtgcttgaa accttggcca ggttccatga ccatgggcct aacttaaaga 2160 tgtgaatgag tgttacagtt gaaagcccat cataggttta gtggtcctag gagacttggt 2220 tttgacttat atacatgaaa agtttatggc aagaagtgca aattttagca tatggggcct 2280 gacttctcta ccacataatt ctacttgctg aagcatgatc aaagcttgtt ttatttcacc 2340 actgtaggaa aatgattgac tatgcccatc cctgggggta attttggcat gtatacctgt 2400 aactagtaat taacatcttt tttgtttagg catgttcaat taatgctgta gctatcatag 2460 ctttgctctt acctgaagcc ttgtccccac cacacaggac agccttcctc ctgaagagaa 2520 tgtctttgtg tgtccgaagt tgagatggcc tgccctactg ccaaagaggt gacaggaagg 2580 ctgggagcag ctttgttaaa ttgtgttcag ttctgttaca cagtgcattg ccctttgttg 2640 ggggtatgca tgtatgaaca cacatgcttg tcggaacgct ttctcggcgt ttgtcccttg 2700 gctctcatct cccccattcc tgtgcctact ttgcctgagt tcttctaccc ccgcagttgc 2760 cagccacatt gggagtctgt ttgttccaat gggttgagct gtctttgtcg tggagatctg 2820 gaactttgca catgtcacta ctggggaggt gttcctgctc tagcttccac gatgaggcgc 2880 cctctttacc tatcctctca atcactactc ttcttgaagc actattattt attcttccgc 2940 tgtctgcctg cagcagtact actgtcaaca tagtgtaaat ggttctcaaa agcttaccag 3000 tgtggacttg gtgttagcca cgctgtttac tcatacagta cgtgtcctgt ttttaaaata 3060 tacaattatt cttaaaaata aattaaaatc tgtatactta catttcaaaa agaaaaaaaa 3120 aaaaaaaa 3128 <210> 2 <211> 823 <212> DNA <213> Homo sapiens <400> 2 aaacgcgggc gggcgggccc gcagtcctgc 
agttgcagtc gtgttctccg agttcctgtc 60 tctctgccaa cgccgcccgg atggcttccc aaaaccgcga cccagccgcc actagcgtcg 120 ccgccgcccg taaaggagct gagccgagcg ggggcgccgc ccggggtccg gtgggcaaaa 180 ggctacagca ggagctgatg accctcatga tgtctggcga taaagggatt tctgccttcc 240 ctgaatcaga caaccttttc aaatgggtag ggaccatcca tggagcagct ggaacagtat 300 atgaagacct gaggtataag ctctcgctag agttccccag tggctaccct tacaatgcgc 360 ccacagtgaa gttcctcacg ccctgctatc accccaacgt ggacacccag ggtaacatat 420 gcctggacat cctgaaggaa aagtggtctg ccctgtatga tgtcaggacc attctgctct 480 ccatccagag ccttctagga gaacccaaca ttgatagtcc cttgaacaca catgctgccg 540 agctctggaa aaaccccaca gcttttaaga agtacctgca agaaacctac tcaaagcagg 600 tcaccagcca ggagccctga cccaggctgc ccagcctgtc cttgtgtcgt ctttttaatt 660 tttccttaga tggtctgtcc tttttgtgat ttctgtatag gactctttat cttgagctgt 720 ggtatttttg ttttgttttt gtcttttaaa ttaagcctcg gttgagccct tgtatattaa 780 ataaatgcat ttttgtcctt ttttagacaa aaaaaaaaaa aaa 823 <210> 3 <211> 1530 <212> DNA <213> Homo sapiens <400> 3 aatcctggaa caaggctaca gcgtcgaaga tccccagcgc tgcgggctcg gagagcagtc 60 ctaacggcgc ctcgtacgct agtgtcctcc cttttcagtc cgcgtccctc cctgggccgg 120 gctggcactc ttgccttccc cgtccctcat ggcgctgctc cgacgcccga cggtgtccag 180 tgatttggag aatattgaca caggagttaa ttctaaagtt aagagtcatg tgactattag 240 gcgaactgtt ttagaagaaa ttggaaatag agttacaacc agagcagcac aagtagctaa 300 gaaagctcag aacaccaaag ttccagttca acccaccaaa acaacaaatg tcaacaaaca 360 actgaaacct actgcttctg tcaaaccagt acagatggaa aagttggctc caaagggtcc 420 ttctcccaca cctgaggatg tctccatgaa ggaagagaat ctctgccaag ctttttctga 480 tgccttgctc tgcaaaatcg aggacattga taacgaagat tgggagaacc ctcagctctg 540 cagtgactac gttaaggata tctatcagta tctcaggcag ctggaggttt tgcagtccat 600 aaacccacat ttcttagatg gaagagatat aaatggacgc atgcgtgcca tcctagtgga 660 ttggctggta caagtccact ccaagtttag gcttctgcag gagactctgt acatgtgcgt 720 tggcattatg gatcgatttt tacaggttca gccagtttcc cggaagaagc ttcaattagt 780 tgggattact gctctgctct tggcttccaa gtatgaggag atgttttctc caaatattga 840 agactttgtt tacatcacag acaatgctta 
taccagttcc caaatccgag aaatggaaac 900 tctaattttg aaagaattga aatttgagtt gggtcgaccc ttgccactac acttcttaag 960 gcgagcatca aaagccgggg aggttgatgt tgaacagcac actttagcca agtatttgat 1020 ggagctgact ctcatcgact atgatatggt gcattatcat ccttctaagg tagcagcagc 1080 tgcttcctgc ttgtctcaga aggttctagg acaaggaaaa tggaacttaa agcagcagta 1140 ttacacagga tacacagaga atgaagtatt ggaagtcatg cagcacatgg ccaagaatgt 1200 ggtgaaagta aatgaaaact taactaaatt catcgccatc aagaataagt atgcaagcag 1260 caaactcctg aagatcagca tgatccctca gctgaactca aaagccgtca aagaccttgc 1320 ctccccactg ataggaaggt cctaggctgc cgtgggccct ggggatgtgt gcttcattgt 1380 gccctttttc ttattggttt agaactcttg attttgtaca tagtcctctg gtctatctca 1440 tgaaacctct tctcagacca gttttctaaa catatattga ggaaaaataa agcgattggt 1500 ttttcttaag gtaaaaaaaa aaaaaaaaaa 1530 <210> 4 <211> 1697 <212> DNA <213> Homo sapiens <400> 4 gaggcgtaag ccaggcgtgt taaagccggt cggaactgct ccggagggca cgggctccgt 60 aggcaccaac tgcaaggacc cctccccctg cgggcgctcc catggcacag ttcgcgttcg 120 agagtgacct gcactcgctg cttcagctgg atgcacccat ccccaatgca ccccctgcgc 180 gctggcagcg caaagccaag gaagccgcag gcccggcccc ctcacccatg cgggccgcca 240 accgatccca cagcgccggc aggactccgg gccgaactcc tggcaaatcc agttccaagg 300 ttcagaccac tcctagcaaa cctggcggtg accgctatat cccccatcgc agtgctgccc 360 agatggaggt ggccagcttc ctcctgagca aggagaacca gcctgaaaac agccagacgc 420 ccaccaagaa ggaacatcag aaagcctggg ctttgaacct gaacggtttt gatgtagagg 480 aagccaagat ccttcggctc agtggaaaac cacaaaatgc gccagagggt tatcagaaca 540 gactgaaagt actctacagc caaaaggcca ctcctggctc cagccggaag acctgccgtt 600 acattccttc cctgccagac cgtatcctgg atgcgcctga aatccgaaat gactattacc 660 tgaaccttgt ggattggagt tctgggaatg tactggccgt ggcactggac aacagtgtgt 720 acctgtggag tgcaagctct ggtgacatcc tgcagctttt gcaaatggag cagcctgggg 780 aatatatatc ctctgtggcc tggatcaaag agggcaacta cttggctgtg ggcaccagca 840 gtgctgaggt gcagctatgg gatgtgcagc agcagaaacg gcttcgaaat atgaccagtc 900 actctgcccg agtgggctcc ctaagctgga acagctatat cctgtccagt ggttcacgtt 960 ctggccacat ccaccaccat gatgttcggg 
tagcagaaca ccatgtggcc acactgagtg 1020 gccacagcca ggaagtgtgt gggctgcgct gggccccaga tggacgacat ttggccagtg 1080 gtggtaatga taacttggtc aatgtgtggc ctagtgctcc tggagagggt ggctgggttc 1140 ctctgcagac attcacccag catcaagggg ctgtcaaggc cgtagcatgg tgtccctggc 1200 agtccaatgt cctggcaaca ggagggggca ccagtgatcg acacattcgc atctggaatg 1260 tgtgctctgg ggcctgtctg agtgccgtgg atgcccattc ccaggtgtgc tccatcctct 1320 ggtctcccca ttacaaggag ctcatctcag gccatggctt tgcacagaac cagctagtta 1380 tttggaagta cccaaccatg gccaaggtgg ctgaactcaa aggtcacaca tcccgggtcc 1440 tgagtctgac catgagccca gatggggcca cagtggcatc cgcagcagca gatgagaccc 1500 tgaggctatg gcgctgtttt gagttggacc ctgcgcggcg gcgggagcgg gagaaggcca 1560 gtgcagccaa aagcagcctc atccaccaag gcatccgctg aagaccaacc catcacctca 1620 gttgtttttt atttttctaa taaagtcatg tctcccttca tgtttttttt ttaaaaaaaa 1680 aaaaaaaaaa aaaaaaa 1697 <210> 5 <211> 771 <212> DNA <213> Homo sapiens <400> 5 agagaagcag acatcttcta gttcctcccc cactctcctc tttccggtac ctgtgagtca 60 gctaggggag ggcagctctc acccaggctg atagttcggt gacctggctt tatctactgg 120 atgagttccg ctgggagatg gaacatagca cgtttctctc tggcctggta ctggctaccc 180 ttctctcgca agtgagcccc ttcaagatac ctatagagga acttgaggac agagtgtttg 240 tgaattgcaa taccagcatc acatgggtag agggaacggt gggaacactg ctctcagaca 300 ttacaagact ggacctggga aaacgcatcc tggacccacg aggaatatat aggtgtaatg 360 ggacagatat atacaaggac aaagaatcta ccgtgcaagt tcattatcga atgtgccaga 420 gctgtgtgga gctggatcca gccaccgtgg ctggcatcat tgtcactgat gtcattgcca 480 ctctgctcct tgctttggga gtcttctgct ttgctggaca tgagactgga aggctgtctg 540 gggctgccga cacacaagct ctgttgagga atgaccaggt ctatcagccc ctccgagatc 600 gagatgatgc tcagtacagc caccttggag gaaactgggc tcggaacaag tgaacctgag 660 actggtggct tctagaagca gccattacca actgtacctt cccttcttgc tcagccaata 720 aatatatcct ctttcactca gaaaaaaaaa aaaaaaaaaa aaaaaaaaaa a 771 <210> 6 <211> 1270 <212> DNA <213> Homo sapiens <400> 6 attttctaaa agggacagag agcaccctgc tacatttcct aatcaagaag ttggcgtgca 60 gctgggagag ctagactaag ttggtcatga tgcagaagct actcaaatgc agtcggcttg 120 
tcctggctct tgccctcatc ctggttctgg aatcctcagt tcaaggttat cctacgcgga 180 gagccaggta ccaatgggtg cgctgcaatc cagacagtaa ttctgcaaac tgccttgaag 240 aaaaaggacc aatgttcgaa ctacttccag gtgaatccaa caagatcccc cgtctgagga 300 ctgacctttt tccaaagacg agaatccagg acttgaatcg tatcttccca ctttctgagg 360 actactctgg atcaggcttc ggctccggct ccggctctgg atcaggatct gggagtggct 420 tcctaacgga aatggaacag gattaccaac tagtagacga aagtgatgct ttccatgaca 480 accttaggtc tcttgacagg aatctgccct cagacagcca ggacttgggt caacatggat 540 tagaagagga ttttatgtta taaaagagga ttttcccacc ttgacaccag gcaatgtagt 600 tagcatattt tatgtaccat ggttatatga ttaatcttgg gacaaagaat tttatagaaa 660 tttttaaaca tctgaaaaag aagcttaagt tttatcatcc ttttttttct catgaattct 720 taaaggatta tgctttaatg ctgttatcta ttttattgtt cttgaaaata cctgcatttt 780 ttggtatcat gttcaaccaa catcattatg aaattaatta gattcccatg gccataaaat 840 ggctttaaag aatatatata tatttttaaa gtagcttgag aagcaaattg gcaggtaata 900 tttcatacct aaattaagac tctgacttgg attgtgaatt ataatgatat gccccttttc 960 ttataaaaac aaaaaaaaaa ataatgaaac acagtgaatt tgtagagtgg gggtatttga 1020 catattttac agggtggagt gtactatata ctattacctt tgaatgtgtt tgcagagcta 1080 gtggatgtgt ttgtctacaa gtatgattgc tgttacataa caccccaaat taactcccaa 1140 attaaaacac agttgtgctg tcaatacctc atactgcttt accttttttt cctggatatc 1200 tgtgtatttt caaatgttac tatatattaa agcagaaata taaccaaagg ttaaaaaaaa 1260 aaaaaaaaaa 1270 <210> 7 <211> 523 <212> DNA <213> Homo sapiens <400> 7 ctcctggttc aaaagcagct aaaccaaaag aagcctccag acagccctga gatcacctaa 60 aaagctgcta ccaagacagc cacgaagatc ctaccaaaat gaagcgcttc ctcttcctcc 120 tactcaccat cagcctcctg gttatggtac agatacaaac tggactctca ggacaaaacg 180 acaccagcca aaccagcagc ccctcagcat ccagcaacat aagcggaggc attttccttt 240 tcttcgtggc caatgccata atccacctct tctgcttcag ttgaggtgac acgtctcagc 300 cttagccctg tgccccctga aacagctgcc accatcactc gcaagagaat cccctccatc 360 tttgggaggg gttgatgcca gacatcacca ggttgtagaa gttgacaggc agtgccatgg 420 gggcaacagc caaaataggg gggtaatgat gtaggggcca agcagtgccc agctgggggt 480 caataaagtt acccttgtac ttgcaaaaaa 
aaaaaaaaaa aaa 523 <210> 8 <211> 684 <212> DNA <213> Homo sapiens <400> 8 cattcccagc ctcacatcac tcacaccttg catttcaccc ctgcatccca gtcgccctgc 60 agcctcacac agatcctgca cacacccaga cagctggcgc tcacacattc accgttggcc 120 tgcctctgtt caccctccat ggccctgcta ctggccctca gcctgctggt tctctggact 180 tccccagccc caactctgag tggcaccaat gatgctgaag actgctgcct gtctgtgacc 240 cagaaaccca tccctgggta catcgtgagg aacttccact accttctcat caaggatggc 300 tgcagggtgc ctgctgtagt gttcaccaca ctgaggggcc gccagctctg tgcaccccca 360 gaccagccct gggtagaacg catcatccag agactgcaga ggacctcagc caagatgaag 420 cgccgcagca gttaacctat gaccgtgcag agggagcccg gagtccgagt caagcattgt 480 gaattattac ctaacctggg gaaccgagga ccagaaggaa ggaccaggct tccagctcct 540 ctgcaccaga cctgaccagc caggacaggg cctggggtgt gtgtgagtgt gagtgtgagc 600 gagagggtga gtgtggtcag agtaaagctg ctccaccccc agattgcaat gctaccaata 660 aagccgcctg gtgtttacaa ctaa 684 <210> 9 <211> 293 <212> DNA <213> Homo sapiens <400> 9 ggtgctgtcg tctctcaaca tccgagctgg gttatctgta agagtggaac ctctgtgaag 60 atcgagtgcc gttccctgga ctttcaggcc acaactatgt tttggtatcg tcagttcccg 120 aaacagagtc tcatgctgat ggcaacttcc aatgagggct ccaaggccac atacgagcaa 180 ggcgtcgaga aggacaagtt tctcatcaac catgcaagcc tgaccttgtc cactctgaca 240 gtgaccagtg cccatcctga agacagcagc ttctacatct gcagtgctag aga 293
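
The sequence listing above follows the numeric-identifier layout of WIPO Standard ST.25: `<210>` gives the SEQ ID NO, `<211>` the declared length, `<212>` the molecule type, `<213>` the organism, and `<400>` introduces the residues, which are printed in blocks of ten with a running position count at the end of each row. As a minimal sketch, assuming the flattened single-line form seen on this page, a listing of this kind could be parsed and its declared lengths checked as follows. The function names `parse_listing` and `length_mismatches` and the toy `sample` record are illustrative assumptions, not part of the patent.

```python
import re

# One record: <210> id, <211> length, <212> type, <213> organism, <400> id, residues.
# The residue body may contain spaces, line breaks and running position counters.
_RECORD = re.compile(
    r"<210>\s*(\d+)\s*<211>\s*(\d+)\s*<212>\s*(\S+)\s*"
    r"<213>\s*(.+?)\s*<400>\s*\d+\s*([acgtn\d\s]+)"
)

def parse_listing(text):
    """Parse flattened ST.25-style records into {seq_id: record}."""
    records = {}
    for seq_id, length, mol_type, organism, body in _RECORD.findall(text):
        # Keep residue letters only; drop position counters and whitespace.
        residues = re.sub(r"[^acgtn]", "", body)
        records[int(seq_id)] = {
            "declared_length": int(length),
            "type": mol_type,
            "organism": organism,
            "sequence": residues,
        }
    return records

def length_mismatches(records):
    """Return the SEQ ID NOs whose residue count differs from <211>."""
    return [
        seq_id
        for seq_id, rec in sorted(records.items())
        if len(rec["sequence"]) != rec["declared_length"]
    ]

# Toy input (made up for illustration, not taken from the patent).
sample = (
    "<210> 1 <211> 12 <212> DNA <213> Homo sapiens <400> 1 acgtacgtac gt 12 "
    "<210> 2 <211> 5 <212> DNA <213> Homo sapiens <400> 2 ggcca 5"
)
records = parse_listing(sample)
```

Running the same check over the full listing would flag any record whose residues were truncated or duplicated during extraction.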

Claims (10)

1. (Deleted)
2. (Deleted)
3. (Deleted)
4. (Deleted)
5. (Deleted)
6. (Deleted)
7. A kit for predicting the risk of metastasis in a breast cancer patient, comprising a primer or a probe that specifically binds to the nucleotide sequence of SEQ ID NO: 9.
8. (Deleted)
9. A composition for predicting the risk of metastasis in a breast cancer patient, comprising the nucleotide sequence of SEQ ID NO: 9, a sequence complementary to said nucleotide sequence, or a fragment of said nucleotide sequence.
10. (Deleted)
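
The claims refer to a sequence complementary to SEQ ID NO: 9, to fragments of it, and to primers or probes that bind it. The sketch below illustrates those three notions on the first 60 residues of SEQ ID NO: 9 as printed in the sequence listing. It assumes exact string complementarity as a stand-in for binding, which is a simplification: real hybridization specificity depends on experimental conditions that this code does not model. The helper names and the 20-mer `probe` are hypothetical and are not taken from the patent.

```python
# First 60 residues of SEQ ID NO: 9, copied from the sequence listing.
SEQ9_PREFIX = (
    "ggtgctgtcg tctctcaaca tccgagctgg "
    "gttatctgta agagtggaac ctctgtgaag"
).replace(" ", "")

_COMPLEMENT = str.maketrans("acgt", "tgca")

def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def is_fragment(oligo, target):
    """True if the oligo occurs verbatim in the target strand."""
    return oligo in target

def is_perfect_complement(oligo, target):
    """True if the oligo is exactly complementary to some stretch of the target."""
    return reverse_complement(oligo) in target

# Hypothetical 20-mer probe designed against residues 1-20 of the prefix.
probe = reverse_complement(SEQ9_PREFIX[:20])
```

Here `is_perfect_complement(probe, SEQ9_PREFIX)` holds by construction, while `is_fragment(probe, SEQ9_PREFIX)` does not, which mirrors the distinction the claims draw between a fragment of the sequence and a sequence complementary to it.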

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020130009394A KR101725985B1 (en) 2013-01-28 2013-01-28 Prognostic Genes for Early Breast Cancer and Prognostic Model for Early Breast Cancer Patients

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020130009394A KR101725985B1 (en) 2013-01-28 2013-01-28 Prognostic Genes for Early Breast Cancer and Prognostic Model for Early Breast Cancer Patients

Related Parent Applications (1)

Application Number Priority Date Filing Date Title
KR1020110000521A (Division, KR101287600B1 (en)) 2011-01-04 2011-01-04 Prognostic Genes for Early Breast Cancer and Prognostic Model for Early Breast Cancer Patients

Publications (2)

Publication Number Publication Date
KR20130023312A (en) 2013-03-07
KR101725985B1 (en) 2017-04-13

Family

ID=48175624

Family Applications (1)

Application Number Priority Date Filing Date Title
KR1020130009394A 2013-01-28 2013-01-28 Prognostic Genes for Early Breast Cancer and Prognostic Model for Early Breast Cancer Patients KR101725985B1 (en)

Country Status (1)

Country Link
KR (1) KR101725985B1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101950717B1 (en) * 2016-11-23 2019-02-21 주식회사 젠큐릭스 Methods for predicting effectiveness of chemotherapy for breast cancer patients
SG11202111940XA (en) * 2019-05-03 2021-12-30 Dcgen Co Ltd Method of predicting cancer prognosis and composition for same
CN112735529A (en) * 2021-01-18 2021-04-30 中国医学科学院肿瘤医院 Breast cancer prognosis model construction method, application method and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Gastroenterology, Vol. 137, No. 1, pp. 165-175 (2009.03.26.)*

Also Published As

Publication number Publication date
KR20130023312A (en) 2013-03-07

Similar Documents

Publication Publication Date Title
US10428386B2 (en) Gene for predicting the prognosis for early-stage breast cancer, and a method for predicting the prognosis for early-stage breast cancer by using the same
US11913078B2 (en) Method for breast cancer recurrence prediction under endocrine treatment
US20210062275A1 (en) Methods to predict clinical outcome of cancer
US20190249260A1 (en) Method for Using Gene Expression to Determine Prognosis of Prostate Cancer
EP2668296B1 (en) Colon cancer gene expression signatures and methods of use
EP2591126B1 (en) Gene signatures for cancer prognosis
US20120245235A1 (en) Classification of cancers
AU2008294687A1 (en) Methods and tools for prognosis of cancer in ER- patients
US20230323464A1 (en) Method for predicting prognosis of patients having early breast cancer
CA2695814A1 (en) Methods and tools for prognosis of cancer in her2+ patients
KR101725985B1 (en) Prognostic Genes for Early Breast Cancer and Prognostic Model for Early Breast Cancer Patients
KR101748867B1 (en) Automated system for prognosing or predicting early stage breast cancer

Legal Events

Date Code Title Description
A107 Divisional application of patent
A201 Request for examination
AMND Amendment
E902 Notification of reason for refusal
AMND Amendment
E601 Decision to refuse application
AMND Amendment
X701 Decision to grant (after re-examination)
GRNT Written decision to grant