KR102657185B1

KR102657185B1 - Method for detecting cellular senescence using cellular senescence biomarkers GAS2L3 and WDR76 from isolated cells

Info

Publication number: KR102657185B1
Application number: KR1020220155842A
Authority: KR
Inventors: 황소현; 신동준; 민요석; 백광현
Original assignee: 의료법인 성광의료재단 (Sungkwang Medical Foundation); 차의과학대학교 산학협력단 (CHA University Industry-Academic Cooperation Foundation)
Priority date: 2021-11-18
Filing date: 2022-11-18
Publication date: 2024-04-16
Also published as: KR20230074040A; KR102657190B1; KR102614665B1; KR20230074022A; KR102613068B1; KR102613065B1; KR102657184B1; KR20230074041A; KR20230074024A; KR20230074043A; KR20230074023A; KR20230074026A; KR20230074028A; KR102639818B1; KR20230074046A; KR20230074020A; KR102639826B1; KR102639819B1; KR20230074048A; KR102639822B1

Abstract

The present disclosure relates to a method for selecting markers of cellular senescence using machine learning, to cellular senescence biomarkers, and to a method for screening senolytic agents using them. In the machine-learning-based marker selection method of one aspect, an analysis pipeline performs a meta-analysis of RNA-seq data from senescent cells. Pooling the data secures enough samples to select statistically more significant genes, and a machine learning methodology then identifies candidate markers characteristic of senescent cells among many variables. The senescence markers derived by this method form a more meaningful gene signature than those of previous studies because they identify cellular senescence across more diverse cell types. They are therefore useful both for identifying or detecting senescent cells and for selecting senolytic agents.

Description

Method for detecting cellular senescence using cellular senescence biomarkers GAS2L3 and WDR76 from isolated cells

The present disclosure relates to a method for selecting markers of cellular senescence using machine learning, to cellular senescence biomarkers, and to a method for screening senolytic agents using them.

Cellular senescence refers to a permanent cell-cycle arrest triggered by internal or external stimuli. Senescent cells are known to secrete factors of the senescence-associated secretory phenotype (SASP). These factors provoke various inflammatory responses and thereby modulate age-related diseases, including cancer. Senolysis, which aims to treat a range of diseases by selectively eliminating senescent cells, has recently attracted considerable attention. For example, research targeting senescent cells was named one of the top 10 technologies selected by MIT in 2020.

Previous studies have shown that treating mice with senolytic drugs extends their lifespan and reduces age-related diseases. However, senescent cells are heterogeneous, so markers that specifically identify them are lacking, and the senescent-cell markers known to date are also expressed in non-senescent cells. A small number of drugs, including senolytics developed at the Mayo Clinic in the United States, have recently entered clinical trials. Most, however, have either failed to show a clear effect or caused side effects, and development has stalled.

Meanwhile, the characteristics of cellular senescence are known to vary widely with the type of cell undergoing senescence and with the type of inducer that triggers it. Few studies, however, have examined these factors individually. An approach is therefore needed that discovers markers by characterizing the molecular features of each type of senescent cell.

Cellular senescence is triggered by various inducers:

- replicative senescence, caused by telomere attrition;
- oncogene-induced senescence, caused by oncogene expression;
- therapy-induced senescence, caused by drugs or irradiation during disease treatment.

Previous studies indicate that senescent cells differ more by cell type than by inducer type. There are also reports, however, that the upstream pathways of the factors secreted by senescent cells differ with the inducer. Moreover, existing expression studies of cellular senescence have been limited by the small number of samples per experiment, which makes it difficult to select genes with sufficient statistical significance. An analysis method that addresses these problems is therefore needed.

One aspect provides a method for selecting markers of cellular senescence using machine learning. The method comprises:

1. acquiring a plurality of different sets of experimental data comprising RNA sequencing data of senescent cell populations with a plurality of different causes and of control cells;
2. quantifying the transcriptomes of the acquired experimental data through an analysis pipeline and reducing the batch effects arising from the different experiments, to obtain batch-corrected data;
3. selecting, for each senescent cell population with a different cause in a training set of the batch-corrected data, genes that are differentially expressed relative to the control cells;
4. selecting, among the selected genes, genes common to the senescent cell populations with different causes; and
5. selecting markers of cellular senescence among the selected genes using a regression model capable of supervised learning, with the expression values of the selected genes as independent variables.

Another aspect provides a computer-readable recording medium on which a program for executing the method on a computer is recorded.

Another aspect provides a device for detecting markers of cellular senescence that performs the method.

Another aspect provides markers of cellular senescence derived by the method.

Another aspect provides a cellular senescence prediction model generated using the markers of cellular senescence derived by the method.

Another aspect provides a composition for detecting cellular senescence. The composition comprises an agent capable of measuring the expression or activity level of one or more genes or proteins selected from the group consisting of RRM2B, DUSP6, GSAP, AKAP6, C2orf92, AC008012.1, CLDN11, HMGN2, CHAF1B, KLHL13, MXD3, AUTS2, NAV2, NCAPD2, and PXMP2.

Another aspect provides a composition for detecting cellular senescence. The composition comprises an agent capable of measuring the expression or activity level of one or more genes or proteins selected from the group consisting of EZH2, TMPO, AMOT, GAS2L3, SYNE2, HMGB2, CDCA7L, WDR76, H2BC8, MMP15, ACSL5, CD36, PLD1, CYP26B1, GSAP, H2BC12, and H2AC6.

Another aspect provides a composition for detecting cellular senescence. The composition comprises an agent capable of measuring the expression or activity level of one or more genes or proteins selected from the group consisting of GAS2L3, AMOT, and WDR76.

Another aspect provides a kit for detecting cellular senescence comprising the composition.

Another aspect provides a method for detecting cellular senescence, comprising measuring the expression or activity level of the gene or protein in an isolated biological sample.

Another aspect provides a method for screening senolytic drugs, comprising:

1. contacting the gene or protein with a test substance; and
2. selecting, as a senolytic drug, a test substance that changes the expression or activity level of the gene or protein relative to an untreated control.
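The selection step of the screening method can be sketched as a minimal Python comparison. The sketch makes three assumptions that the text does not state:

- One marker level is measured per test substance.
- A "change" means the level moves at least two-fold in either direction. The text gives no threshold, so this cut-off is hypothetical.
- The substance names and values are invented for illustration.

```python
from typing import Dict, List


def screen_senolytics(
    treated: Dict[str, float],
    untreated: float,
    min_fold_change: float = 2.0,
) -> List[str]:
    """Return test substances whose marker level differs from the untreated control.

    `treated` maps a hypothetical substance name to the expression or activity
    level of one marker measured after contact with that substance; `untreated`
    is the level in the untreated control. `min_fold_change` is an assumed
    cut-off applied in both directions (up- or down-regulation).
    """
    if untreated <= 0 or min_fold_change < 1:
        raise ValueError("untreated must be > 0 and min_fold_change >= 1")
    hits = []
    for name, level in treated.items():
        ratio = level / untreated
        if ratio >= min_fold_change or ratio <= 1.0 / min_fold_change:
            hits.append(name)
    return sorted(hits)


levels = {"compound_A": 0.4, "compound_B": 1.1, "compound_C": 2.6}
print(screen_senolytics(levels, untreated=1.0))  # ['compound_A', 'compound_C']
```

In practice, replicate measurements and a statistical test would normally replace a single-ratio cut-off.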

One aspect provides a method for selecting markers of cellular senescence.

As used herein, the term “cell” includes individual cells and may also refer to the specific tissue or organ from which they originate.

The cells may include:

- endothelial cells;
- smooth muscle cells;
- macrophages;
- fibroblasts;
- retinal pigment epithelial cells;
- other epithelial cells (e.g., lung or renal epithelial cells);
- immune cells (e.g., macrophages);
- chondrocytes;
- stem cells (e.g., mesenchymal stem cells); or
- nerve cells (e.g., neurons).

The term “cellular senescence” may refer to a cell that meets any of the following:

- it has exited the cell cycle;
- it exhibits epigenetic markers consistent with senescence; or
- it expresses senescent-cell markers (e.g., senescence-associated β-galactosidase or inflammatory cytokines).

Cellular senescence may be partial or complete.

The term “epigenome” or “epigenetics” refers to modifications and structural changes within a cell that control the expression of nucleic acids (e.g., engineered nucleic acids) or genomic information in that cell. Changes to the epigenome occur during, and drive, embryonic development, disease progression, and aging.

The term “epigenetic clock” may refer to an age estimator or to an innate biological process.

- In some embodiments, the age estimator is an epigenetic age estimator. For example, it may be a set of CpG dinucleotides that, combined with a mathematical algorithm, can be used to estimate the age of a DNA source such as a cell, organ, or tissue.
- In some embodiments, the age estimator is a DNA methylation-based (DNAm) age estimator. Its performance may be expressed as an age correlation, calculated as the Pearson correlation coefficient r between DNAm age (also called estimated age) and chronological age.
- The DNAm age estimator may be a single-tissue or a multi-tissue estimator.
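The "age correlation" above is the Pearson correlation coefficient r between estimated DNAm age and chronological age, which can be sketched in Python. The sketch only computes r. It does not implement any age estimator, and the age values are invented toy numbers.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy values (not from the patent): estimated DNAm age vs. chronological age.
dnam_age = [24.0, 35.5, 41.0, 58.2, 66.9]
chronological_age = [25, 33, 44, 55, 70]
print(round(pearson_r(dnam_age, chronological_age), 3))
```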

Senescent cells may exhibit any one or more of the following seven characteristics.
(1) Senescent growth arrest is essentially permanent and cannot be reversed by known physiological stimuli.
(2) Senescent cells increase in size, becoming more than twice as large as their non-senescent counterparts.
(3) Senescent cells express senescence-associated β-galactosidase (SA-β-gal), which partly reflects an increase in lysosomal mass.
(4) Most senescent cells express p16INK4a, which is generally not expressed by quiescent or terminally differentiated cells.
(5) Cells that senesce as a result of persistent DNA damage response (DDR) signaling harbor persistent nuclear foci. These foci are termed DNA segments with chromatin alterations reinforcing senescence (DNA-SCARS). They contain activated DDR proteins and are distinguishable from transient damage foci. DNA-SCARS include dysfunctional telomeres or telomere dysfunction-induced foci (TIF).
(6) Senescent cells can express and secrete senescence-associated molecules. In certain cases this is observed in the presence of persistent DDR signaling, and in certain cases their expression may depend on persistent DDR signaling.
(7) The nuclei of senescent cells lack structural proteins, such as lamin B1, or chromatin-binding proteins, such as histones and HMGB1.
See, e.g., Freund et al., Mol. Biol. Cell 23:2066-75 (2012); Davalos et al., J. Cell Biol. 201:613-29 (2013); Ivanov et al., J. Cell Biol. DOI: 10.1083/jcb.201212110, pp. 1-15 (published online July 1, 2013); Funayama et al., J. Cell Biol. 175:869-80 (2006).

Senescent cells and senescent-cell-associated molecules can be detected by techniques and methods described in the art:

- The presence of senescent cells in a tissue can be assessed by histochemistry or immunohistochemistry that detects the senescence marker SA-β-galactosidase (SA-β-gal) (see, e.g., Dimri et al., Proc. Natl. Acad. Sci. USA 92:9363-9367 (1995)).
- The presence of the senescent-cell-associated polypeptide p16 can be determined by any of several immunochemical methods practiced in the art, such as immunoblot analysis.
- p16 mRNA expression in cells can be measured by various techniques, including quantitative PCR.
- The presence and levels of senescent-cell-associated polypeptides (e.g., SASP polypeptides) can be measured with automated, high-throughput assays, such as the automated Luminex array assay described in the art (see, e.g., Coppe et al., PLoS Biol. 6:2853-68 (2008)).

As used herein, the term “marker” refers to a substance that distinguishes a senescent cell population from a normal cell population. It includes all organic biomolecules whose levels increase or decrease in the senescent cell population, such as polypeptides, proteins, nucleic acids, genes, lipids, glycolipids, glycoproteins, and sugars.

Referring to FIG. 1, the method may include:

acquiring a plurality of different sets of experimental data comprising RNA sequencing data of senescent cell populations with a plurality of different senescence-inducing causes and of control cells (S10);

quantifying the transcriptomes of the acquired experimental data through an analysis pipeline and reducing the batch effects arising from the different experiments, to obtain batch-corrected data (S20);

selecting, for each senescent cell population with a different senescence-inducing cause in the training set of the batch-corrected data, genes differentially expressed relative to the control cells (S30);

selecting, among the selected genes, genes commonly present in the senescent cell populations with different senescence-inducing causes (S40); and

selecting markers of cellular senescence among the selected genes using a regression model capable of supervised learning, with the expression values of the selected genes as independent variables (S50).
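Step S40 keeps only the genes that are differentially expressed in every senescence type, which amounts to a set intersection. The following minimal Python sketch assumes the per-inducer DEG lists from step S30 are already available as sets. The gene names and inducer keys are placeholders, not results from the patent.

```python
from typing import Dict, Set


def common_degs(degs_by_inducer: Dict[str, Set[str]]) -> Set[str]:
    """Genes differentially expressed in every senescence type (step S40)."""
    if not degs_by_inducer:
        return set()
    gene_sets = iter(degs_by_inducer.values())
    common = set(next(gene_sets))
    for genes in gene_sets:
        common &= genes  # keep only genes shared with this inducer type
    return common


# Placeholder DEG sets per senescence inducer (illustrative only).
degs = {
    "replicative": {"GENE_A", "GENE_B", "GENE_C"},
    "oncogene_induced": {"GENE_A", "GENE_B", "GENE_D"},
    "therapy_induced": {"GENE_A", "GENE_B", "GENE_E"},
}
print(sorted(common_degs(degs)))  # ['GENE_A', 'GENE_B']
```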

In one embodiment, the plurality of different sets of experimental data may include senescent cell populations and a control (proliferating) cell population. The senescent cell populations may have at least two different causes of senescence. Each cause is selected from the group consisting of replicative senescence, oncogene-induced senescence, and therapy-induced senescence. For example, the experimental data may include a replicative senescent cell population, an oncogene-induced senescent cell population, and a control cell population. The data may additionally include a therapy-induced senescent cell population. Each cell population in the experimental data may originate from the same laboratory or from different laboratories.

In one embodiment, the experimental data may include data measured experimentally from biological samples, data obtained from published literature, or data stored in a database. The database may include the National Center for Biotechnology Information (NCBI), Gene Expression Omnibus (GEO), European Bioinformatics Institute databases, or the European Nucleotide Archive. The RNA sequencing data may be in FASTQ format.

In one embodiment, the method may further include dividing the acquired experimental data into a training set and a test set. This division may be performed either after acquiring the experimental data or after obtaining the batch-corrected data. The data may be divided between the training and test sets at a predetermined ratio, e.g., about 90:10, 80:20, 75:25, 60:40, 50:50, 40:60, 25:75, 20:80, or 10:90. The training and test sets may each contain every senescent cell population with a different senescence-inducing cause as well as the control population. For example, each set may contain a replicative senescent cell population, an oncogene-induced senescent cell population, and a control cell population, optionally together with a therapy-induced senescent cell population. Each cell population in the training or test set may originate from the same laboratory or from different laboratories.
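The requirement that both sets contain every senescence type and the control can be met with a stratified split. In this minimal Python sketch, each group is shuffled with a fixed seed and split at the chosen ratio, which here is 80:20. The group labels "RS", "OIS", "TIS", and "control" are hypothetical abbreviations, and the rule that keeps at least one sample per group on each side is an added assumption.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    sample_ids: Sequence[str],
    groups: Sequence[str],
    train_fraction: float = 0.8,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Split samples into training and test sets group by group.

    Every group (e.g. replicative, oncogene-induced, therapy-induced, control)
    is split at roughly `train_fraction`, so both sets contain all groups.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    by_group: Dict[str, List[str]] = defaultdict(list)
    for sample_id, group in zip(sample_ids, groups):
        by_group[group].append(sample_id)
    rng = random.Random(seed)  # fixed seed -> reproducible split
    train: List[str] = []
    test: List[str] = []
    for group in sorted(by_group):
        members = list(by_group[group])
        rng.shuffle(members)
        n_train = round(len(members) * train_fraction)
        if len(members) >= 2:
            # Keep at least one sample of the group on each side.
            n_train = min(max(n_train, 1), len(members) - 1)
        train.extend(members[:n_train])
        test.extend(members[n_train:])
    return train, test


ids = [f"s{i}" for i in range(20)]
groups = ["RS"] * 5 + ["OIS"] * 5 + ["TIS"] * 5 + ["control"] * 5
train, test = stratified_split(ids, groups, train_fraction=0.8)
print(len(train), len(test))  # 16 4
```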

In one embodiment, obtaining the batch-corrected data may include:

1. providing the RNA sequencing data as FASTQ data;
2. performing quality control and pre-processing on the input FASTQ data to obtain clean reads;
3. aligning the clean reads to the corresponding genome or transcriptome, and/or assembling the transcripts of the genome or transcriptome that correspond to the clean reads; and
4. quantifying the assembled transcripts.

In one embodiment, each step of the pipeline may use the following tools:

- Quality control: FastQC, NGSQC, or RNA-SeQC.
- Pre-processing to obtain clean reads: Trimmomatic, PRINSEQ, or SOAPnuke.
- Alignment: Salmon, Ensemble, TopHat2, HISAT2, STAR, BWA, or Bowtie.
- Transcript assembly: Tximeta, Cufflinks, StringTie, Trinity, SOAPdenovo-Trans, or Trans-ABySS.
- Quantification: limma, featureCounts, HTSeq-count, Cufflinks, eXpress, RSEM, DEXSeq, Kallisto, Sailfish, or Salmon.

In one embodiment, the method may further include removing low-expression genes from the senescence-marker candidates. These are genes in the training set of the batch-corrected data whose expression is significantly low in all cell populations. They may be removed with the edgeR package.
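edgeR provides `filterByExpr` for this step, and it applies a more elaborate rule based on library sizes and the design. The following Python function is a simplified stand-in, not edgeR's algorithm. It computes counts per million (CPM) from raw counts and keeps a gene only if it reaches an assumed CPM threshold in at least an assumed number of samples. The counts are toy numbers chosen so that every library totals one million reads. In the toy data, "OTHER" stands for all remaining genes.

```python
from typing import Dict, List

Counts = Dict[str, List[int]]  # gene -> raw read counts, one entry per sample


def cpm(counts: Counts) -> Dict[str, List[float]]:
    """Counts per million, using each sample's total count as its library size."""
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    lib_sizes = [sum(counts[g][j] for g in genes) for j in range(n_samples)]
    return {g: [c * 1e6 / lib_sizes[j] for j, c in enumerate(counts[g])] for g in genes}


def filter_low_expression(counts: Counts, min_cpm: float = 1.0, min_samples: int = 2) -> List[str]:
    """Keep genes whose CPM is >= `min_cpm` in at least `min_samples` samples."""
    values = cpm(counts)
    return [g for g, row in values.items() if sum(v >= min_cpm for v in row) >= min_samples]


toy_counts = {
    "GENE_A": [500, 800, 650],
    "GENE_B": [0, 1, 0],  # expressed in a single sample only -> removed
    "GENE_C": [300, 0, 400],
    "OTHER": [999_200, 999_199, 998_950],
}
print(filter_low_expression(toy_counts))  # ['GENE_A', 'GENE_C', 'OTHER']
```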

In one embodiment, differentially expressed genes are selected among the genes whose expression in a senescent cell population differs from the control. A gene is selected if its p value is 0.05 or less, or if its expression changes by about 2-fold or more or by about 1/2-fold or less. This step may further include verifying the differentially expressed genes by gene ontology (GO) analysis. The differential expression analysis may be performed with the R package edgeR (v3.10). After TMM normalization, an analysis design is constructed from each sample's senescence status and batch information.
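The thresholding rule can be illustrated in Python. The sketch assumes that a differential-expression tool such as edgeR has already produced a log2 fold change and a p-value for each gene. A 2-fold or 1/2-fold change corresponds to |log2FC| ≥ 1. The text joins the two criteria with "or", so that is the default here. The `require_both` flag is an added option for the stricter "and" rule that many analyses use. Gene names and statistics are toy values.

```python
import math
from typing import Dict, List, Tuple


def select_degs(
    stats: Dict[str, Tuple[float, float]],
    p_cutoff: float = 0.05,
    fold_change: float = 2.0,
    require_both: bool = False,
) -> List[str]:
    """Select DEGs from per-gene (log2 fold change, p-value) pairs.

    A gene passes the fold-change test when |log2FC| >= log2(fold_change),
    i.e. a change of >= fold_change or <= 1/fold_change.
    """
    lfc_cutoff = math.log2(fold_change)
    selected = []
    for gene, (log2fc, p_value) in stats.items():
        passes_p = p_value <= p_cutoff
        passes_fc = abs(log2fc) >= lfc_cutoff
        keep = (passes_p and passes_fc) if require_both else (passes_p or passes_fc)
        if keep:
            selected.append(gene)
    return selected


toy_stats = {
    "GENE_A": (1.5, 0.001),
    "GENE_B": (-1.2, 0.20),
    "GENE_C": (0.3, 0.01),
    "GENE_D": (0.1, 0.60),
}
print(select_degs(toy_stats))                     # ['GENE_A', 'GENE_B', 'GENE_C']
print(select_degs(toy_stats, require_both=True))  # ['GENE_A']
```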

In one embodiment, the regression model may include a machine learning model, for example a supervised learning model. In one example, the regression model may include a LASSO model, a linear regression model, and/or another supervised learning model. The independent variables (features) of the regression model may be the expression values of the selected genes, expressed as TMM-normalized log₂ CPM (counts per million). The dependent variable may be assigned 0 for samples from proliferating controls and 1 for samples with senescent status.
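A LASSO model can be fitted to 0/1 senescence labels with log2 CPM features. The following minimal pure-Python sketch does this by cyclic coordinate descent and keeps the genes with non-zero coefficients as marker candidates. The sketch makes three assumptions:

- It uses the squared-error LASSO objective (1/(2n))·||y − Xb||² + λ·||b||₁. This is the same objective minimized by scikit-learn's `Lasso`. The text does not specify the loss or the penalty strength, so the loss and λ = 0.1 are assumptions.
- The intercept is absorbed by mean-centering.
- The gene names and data are toy values.

```python
from typing import List, Sequence


def soft_threshold(value: float, lam: float) -> float:
    """Soft-thresholding operator S(value, lam) used by the LASSO update."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    lam: float,
    n_iter: int = 200,
) -> List[float]:
    """Minimise (1/2n)*||yc - Xc b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X: one row per sample, one column per candidate gene (e.g. log2 CPM).
    y: 0 for proliferating controls, 1 for senescent samples.
    Columns of X and y are mean-centred (Xc, yc), which absorbs the intercept.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    residual = [v - y_mean for v in y]  # yc - Xc b, with b = 0 initially
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue  # a constant gene carries no information
            # Correlation of gene j with the partial residual (its own term added back).
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / z
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                beta[j] = new_bj
    return beta


def selected_markers(genes: Sequence[str], beta: Sequence[float]) -> List[str]:
    """Genes with a non-zero coefficient are retained as marker candidates."""
    return [g for g, b in zip(genes, beta) if b != 0.0]


genes = ["GENE_A", "GENE_B"]  # hypothetical candidate genes
X = [[1.0, 0.5], [1.2, -0.3], [0.9, 0.1], [5.0, 0.4], [5.3, -0.2], [4.8, 0.0]]  # toy log2 CPM
y = [0, 0, 0, 1, 1, 1]
beta = lasso_coordinate_descent(X, y, lam=0.1)
print(selected_markers(genes, beta))  # ['GENE_A']
```

In practice, λ would be tuned, for example with the cross-validation described next.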

The regression model may also use leave-one-out cross-validation (LOOCV). LOOCV builds N models in total, one per sample, each time leaving out a single sample. It computes the test-set performance on the left-out sample and then averages the N performance values.
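The LOOCV loop can be written in Python independently of the model. This minimal sketch takes hypothetical `fit`/`predict` callables and uses 0/1 accuracy as the per-sample performance measure, which is an assumption because the text does not name a metric. A nearest-centroid classifier serves as a simple stand-in model. The LASSO or SVM models described in this document could be plugged in instead.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

Model = TypeVar("Model")


def loocv_accuracy(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    fit: Callable[[List[Sequence[float]], List[int]], Model],
    predict: Callable[[Model, Sequence[float]], int],
) -> float:
    """Leave-one-out CV: fit N models, each without one sample, score the
    held-out sample, and average the N scores (here 0/1 accuracy)."""
    n = len(X)
    correct = 0
    for i in range(n):
        X_train = [X[k] for k in range(n) if k != i]
        y_train = [y[k] for k in range(n) if k != i]
        model = fit(X_train, y_train)
        correct += int(predict(model, X[i]) == y[i])
    return correct / n


def fit_centroids(X: List[Sequence[float]], y: List[int]) -> Dict[int, List[float]]:
    """Stand-in classifier: mean expression profile per class."""
    centroids: Dict[int, List[float]] = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids: Dict[int, List[float]], x: Sequence[float]) -> int:
    """Assign x to the class with the nearest centroid (squared Euclidean distance)."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


X = [[1.0], [1.2], [0.9], [5.0], [5.3], [4.8]]  # toy expression values
y = [0, 0, 0, 1, 1, 1]
print(loocv_accuracy(X, y, fit_centroids, predict_centroid))  # 1.0
```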

In one embodiment, the method may further include validating the selected senescent-cell markers.

The validation may include one or both of the following:

(i) using at least part of the experimental data as a test set and checking whether the senescent-cell markers, selected by the same method as for the training set, are expressed in the test set; and/or
(ii) applying the selected markers of cellular senescence to the training set to generate a senescence prediction model, and using that model to evaluate the test set.

In one embodiment, step (ii) may further include a comparison with markers obtained independently. Markers of cellular senescence obtained by a route other than the acquired experimental data are applied to the training set to generate a second senescence prediction model. That model is used to evaluate an additional test set, and the result is compared with the evaluation result of step (ii).

In one embodiment, the senescence prediction model may include a machine learning model, for example a supervised learning model. In one example, it may be generated with a support vector machine.
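The text does not say how the support vector machine is trained. As an illustration only, a linear SVM can be trained with Pegasos-style stochastic subgradient descent in pure Python. The sketch makes three assumptions:

- The linear kernel, λ, the epoch count, and Pegasos itself are choices made here. In practice a library implementation such as scikit-learn's `SVC` or `LinearSVC` would typically be used.
- Features are mean-centered expression values of the selected markers (for example GAS2L3, AMOT, and WDR76).
- The bias is handled as an extra constant feature, which means it is also regularized. This is a common simplification.

The toy data has a single feature.

```python
import random
from typing import List, Sequence


def _augment(x: Sequence[float]) -> List[float]:
    # Append a constant 1.0 so the last weight acts as a (regularised) bias.
    return list(x) + [1.0]


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    epochs: int = 100,
    seed: int = 0,
) -> List[float]:
    """Pegasos subgradient descent for a linear soft-margin SVM.

    y uses the document's coding (0 = proliferating control, 1 = senescent),
    mapped internally to -1/+1. Returns weights; the last entry is the bias.
    """
    rng = random.Random(seed)
    data = [_augment(x) for x in X]
    labels = [1.0 if t == 1 else -1.0 for t in y]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = labels[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrinkage from the L2 penalty
            if margin < 1.0:  # hinge loss active for this sample
                w = [wj + eta * labels[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict_svm(w: Sequence[float], x: Sequence[float]) -> int:
    """Return 1 (senescent) if the decision value is positive, else 0."""
    score = sum(wj * xj for wj, xj in zip(w, _augment(x)))
    return 1 if score > 0 else 0


X = [[-2.0], [-1.5], [-2.5], [2.0], [1.5], [2.5]]  # toy centred expression values
y = [0, 0, 0, 1, 1, 1]
w = train_linear_svm(X, y)
print([predict_svm(w, x) for x in X])  # [0, 0, 0, 1, 1, 1]
```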

The senescence prediction model may be generated by training a machine learning model, for example a support vector machine, on one or more genes selected from the group consisting of RRM2B, DUSP6, GSAP, AKAP6, C2orf92, AC008012.1, CLDN11, HMGN2, CHAF1B, KLHL13, MXD3, AUTS2, NAV2, NCAPD2, and PXMP2.

The senescence prediction model may be generated by training a machine learning model, for example a support vector machine, on one or more genes selected from the group consisting of EZH2, TMPO, AMOT, GAS2L3, SYNE2, HMGB2, CDCA7L, WDR76, H2BC8, MMP15, ACSL5, CD36, PLD1, CYP26B1, GSAP, H2BC12, and H2AC6.

The senescence prediction model may be generated by training a machine learning model, for example a support vector machine, on one or more genes selected from the group consisting of the selected genes GAS2L3, AMOT, and WDR76.
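The option (ii) workflow above can be illustrated in dependency-free Python. The workflow trains a support vector machine on selected marker genes and then scores a held-out test set, as in the ROC analyses of Figures 13 and 21.

This is only a minimal sketch under stated assumptions, none of which come from the specification:
- The classifier is a linear (not kernel) SVM fitted by full-batch subgradient descent on the hinge loss.
- Genes are z-scored with training-set statistics.
- ROC AUC is computed as the Mann-Whitney statistic.
- The expression values for GAS2L3, AMOT, and WDR76 are hypothetical.
- The helper names (`standardize`, `train_linear_svm`, `decision_scores`, `roc_auc`) are illustrative.

A library implementation such as scikit-learn's `SVC` could be used instead.

```python
import math


def standardize(train, test):
    """Z-score every gene column using the training-set mean and SD only."""
    n_genes = len(train[0])
    means = [sum(row[j] for row in train) / len(train) for j in range(n_genes)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in train) / len(train)) or 1.0
           for j in range(n_genes)]  # "or 1.0" guards against constant genes

    def scale(rows):
        return [[(r[j] - means[j]) / sds[j] for j in range(n_genes)] for r in rows]

    return scale(train), scale(test)


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Linear SVM: full-batch subgradient descent on the L2-regularised hinge loss.

    Labels must be +1 (senescent) or -1 (control); the bias is not regularised.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:  # margin violated
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def decision_scores(w, b, X):
    """Decision function w.x + b; larger values are more senescent-like."""
    return [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]


def roc_auc(scores, labels):
    """ROC AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab != 1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical log-expression of GAS2L3, AMOT, WDR76 (columns); not data from the patent.
train_X = [[5.0, 5.2, 4.8], [5.1, 4.9, 5.0], [4.9, 5.1, 5.2],  # controls
           [3.0, 3.2, 2.9], [3.1, 2.8, 3.0], [2.9, 3.0, 3.1]]  # senescent
train_y = [-1, -1, -1, 1, 1, 1]
test_X = [[5.0, 5.0, 5.1], [4.8, 5.2, 4.9], [3.2, 3.0, 3.0], [2.8, 3.1, 2.9]]
test_y = [-1, -1, 1, 1]

z_train, z_test = standardize(train_X, test_X)
w, b = train_linear_svm(z_train, train_y)
auc = roc_auc(decision_scores(w, b, z_test), test_y)
print(f"weights={[round(v, 2) for v in w]}, test AUC={auc:.2f}")
```

Because all three toy markers are lower in senescent samples, the fitted weights come out negative. A lower marker level therefore pushes the score toward the senescent class.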

Another aspect of the present specification provides the cellular senescence prediction model generated as described above.

In another embodiment, the method may further include verifying the genes as they are selected at each step. The selected genes may be verified by Gene Ontology (GO) analysis, principal component analysis (PCA), or clustering.
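The PCA-based check can be illustrated with a short sketch. It projects samples onto the first principal component, found by power iteration on the sample covariance matrix, and asks whether senescent and control samples fall on opposite sides.

The following are assumptions for illustration only:
- The helper names.
- The toy matrix.
- The "strict separation" criterion.

The specification does not state how the PCA plots (Figures 3, 7, 10, 14, and 18) were computed or judged. GO analysis, which needs external gene annotations, is not covered.

```python
import math


def first_pc_scores(X, iters=200):
    """Scores of each sample (row) on the first principal component.

    PC1 is the leading eigenvector of the sample covariance matrix, found by power iteration.
    """
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in X]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-zero starting vector
    for _ in range(iters):
        u = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            return [0.0] * n  # no variance at all
        v = [x / norm for x in u]
    return [sum(r[j] * v[j] for j in range(d)) for r in centred]


def separated_on_pc1(scores, labels):
    """True if every senescent sample lies strictly on one side of every control sample."""
    sen = [s for s, lab in zip(scores, labels) if lab == "senescent"]
    ctl = [s for s, lab in zip(scores, labels) if lab == "control"]
    return max(sen) < min(ctl) or max(ctl) < min(sen)


# Hypothetical log-expression of three selected genes (columns) in six samples (rows).
X = [[5.0, 5.2, 4.8], [5.1, 4.9, 5.0], [4.9, 5.1, 5.2],
     [3.0, 3.2, 2.9], [3.1, 2.8, 3.0], [2.9, 3.0, 3.1]]
labels = ["control"] * 3 + ["senescent"] * 3
pc1 = first_pc_scores(X)
print(separated_on_pc1(pc1, labels))
```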

Another aspect provides a computer-readable recording medium on which a program for executing the method on a computer is recorded.

Another aspect provides a device for detecting markers of cellular senescence.

The device may include: an acquisition unit that acquires a plurality of different experimental data, including RNA sequencing data of senescent cell groups having a plurality of different senescence-inducing causes and of control cells;

a batch-corrected data acquisition unit that quantifies the transcriptomes of the acquired experimental data through an analysis pipeline, reducing the batch effects arising from the different experiments, and thereby obtains batch-corrected data;

a first selection unit that selects, for each of the senescent cell groups having different senescence-inducing causes in a training set of the batch-corrected data, the genes differentially expressed relative to the control cells;

a second selection unit that selects, from the genes so selected, the genes common to the senescent cell groups having the different senescence-inducing causes; and

a third selection unit that selects cellular senescence markers from the genes so selected, using a regression model capable of supervised learning.
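The three selection units can be sketched in dependency-free Python under several labelled assumptions:
- Batch correction is assumed to have been done upstream.
- Differential expression is approximated by a mean log2 fold-change threshold. A real analysis would normally also apply a statistical test.
- "Common" genes are those changed in the same direction for every inducer type.
- The supervised regression model is a LASSO fitted by coordinate descent, chosen because Figures 9 and 17 report LASSO coefficients.

The function names, inducer labels, threshold of 1.0, pseudocount, and penalty `alpha` are all hypothetical.

```python
import math


def log2_fold_changes(samples, groups, control="control", pseudocount=1.0):
    """First selection unit (sketch): mean log2 expression of each inducer group minus controls.

    samples: list of {gene: batch-corrected expression}; groups: parallel list of labels.
    """
    genes = list(samples[0])

    def mean_log(label, gene):
        vals = [math.log2(s[gene] + pseudocount) for s, g in zip(samples, groups) if g == label]
        return sum(vals) / len(vals)

    inducers = sorted(set(groups) - {control})
    return {ind: {g: mean_log(ind, g) - mean_log(control, g) for g in genes} for ind in inducers}


def common_genes(lfc, threshold=1.0):
    """Second selection unit (sketch): genes changed in the same direction for every inducer."""
    up = set.intersection(*({g for g, v in d.items() if v >= threshold} for d in lfc.values()))
    down = set.intersection(*({g for g, v in d.items() if v <= -threshold} for d in lfc.values()))
    return up, down


def lasso(X, y, alpha, sweeps=100):
    """Third selection unit (sketch): LASSO fitted by cyclic coordinate descent.

    Minimises (1/2n)*||y - Xw||^2 + alpha*||w||_1. X columns should be standardised
    and y centred, so no intercept is fitted. Genes with w != 0 are kept as markers.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            # Correlation of feature j with the residual that leaves feature j out.
            rho = sum(X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(d) if k != j))
                      for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            # Soft-thresholding sets weak coefficients exactly to zero.
            w[j] = math.copysign(max(abs(rho) - alpha, 0.0), rho) / z if z else 0.0
    return w


# Hypothetical batch-corrected expression of genes A, B, C; two samples per group.
samples = ([{"A": 15, "B": 15, "C": 15}] * 2
           + [{"A": 63, "B": 3, "C": 15}] * 2
           + [{"A": 31, "B": 7, "C": 31}] * 2)
groups = ["control"] * 2 + ["replicative"] * 2 + ["irradiation"] * 2
up, down = common_genes(log2_fold_changes(samples, groups))
print(up, down)  # A is up and B is down for both inducers; C changes only under irradiation

# Standardised toy features: feature 0 tracks the label, feature 1 is uncorrelated noise.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [1, 1, -1, -1]
print(lasso(X, y, alpha=0.1))  # the noise feature's coefficient is shrunk to 0
```

The L1 penalty is what makes LASSO usable as a selector here: coefficients of uninformative genes become exactly zero rather than merely small.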

Another aspect provides a composition for detecting cellular senescence.

The composition may include an agent capable of measuring the expression or activity level of one or more genes or proteins selected from the group consisting of RRM2B, DUSP6, GSAP, AKAP6, C2orf92, AC008012.1, CLDN11, HMGN2, CHAF1B, KLHL13, MXD3, AUTS2, NAV2, NCAPD2, and PXMP2.

The composition may include an agent capable of measuring the expression or activity level of one or more genes or proteins selected from the group consisting of EZH2, TMPO, AMOT, GAS2L3, SYNE2, HMGB2, CDCA7L, WDR76, H2BC8, MMP15, ACSL5, CD36, PLD1, CYP26B1, GSAP, H2BC12, and H2AC6.

The composition may include an agent capable of measuring the expression or activity level of one or more genes or proteins selected from the group consisting of GAS2L3, AMOT, and WDR76.

In one embodiment, measuring the expression level of the gene is a process of confirming whether the mRNA of the genes is present in a biological sample and to what degree it is expressed, and may include measuring the amount of mRNA. Analysis methods for this purpose include reverse transcription polymerase chain reaction (RT-PCR), competitive RT-PCR, real-time RT-PCR, RNase protection assay (RPA), Northern blotting, and DNA chips.
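Real-time RT-PCR results are often reduced to a relative expression value. The sketch below uses the comparative Ct (2^-ΔΔCt) method. That choice is an assumption: the specification does not say how the qPCR data in Figure 24 were normalized.

The method assumes near-100% amplification efficiency for both the target gene and a reference gene. The reference gene, the function name `fold_change_ddct`, and the Ct values are hypothetical.

```python
from statistics import mean


def fold_change_ddct(target_sample, ref_sample, target_control, ref_control):
    """Relative expression of a target gene (sample vs control) by the 2^-ddCt method.

    Each argument is a list of replicate Ct values; assumes ~100% PCR efficiency.
    """
    d_ct_sample = mean(target_sample) - mean(ref_sample)  # normalise to the reference gene
    d_ct_control = mean(target_control) - mean(ref_control)
    return 2.0 ** -(d_ct_sample - d_ct_control)


# Hypothetical Ct values: target gene in senescence-induced vs control cells,
# each normalised to the same (hypothetical) reference gene.
fc = fold_change_ddct([26.0, 26.1, 25.9], [18.0, 18.1, 17.9],
                      [24.0, 24.1, 23.9], [18.0, 18.1, 17.9])
print(f"fold change = {fc:.2f}")  # a value below 1 indicates reduced expression
```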

Agents capable of measuring the expression level of the gene may include primers, probes, or antisense oligonucleotides. Because the nucleic acid sequences of these genes are available in GenBank and similar databases, a person skilled in the art can design primers or probes that specifically amplify or detect particular regions of these genes on the basis of those sequences.

As used herein, the term "primer" includes every combination of primer pairs consisting of forward and reverse primers that recognize the target gene sequence; specifically, it refers to a primer pair that yields analysis results with specificity and sensitivity.
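When primer pairs are designed from public sequences, simple sequence-level checks are usually applied first. The sketch below is illustrative only:
- Tm is estimated with the Wallace rule (Tm ≈ 2·(A+T) + 4·(G+C) °C), a rough formula intended for short oligonucleotides.
- The 10 to 30 nt length window follows the preferred primer length stated for the RT-PCR kit in this specification.
- The 40 to 60% GC window and the 5 °C Tm-gap limit are common rules of thumb assumed here, not taken from the source.

The function names and primer sequences are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq):
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def check_primer_pair(fwd, rev, length=(10, 30), gc=(0.40, 0.60), max_tm_gap=5):
    """Return a list of problems found; an empty list means the pair passes every check."""
    problems = []
    for name, primer in (("forward", fwd), ("reverse", rev)):
        if not length[0] <= len(primer) <= length[1]:
            problems.append(f"{name}: length {len(primer)} nt outside {length}")
        if not gc[0] <= gc_fraction(primer) <= gc[1]:
            problems.append(f"{name}: GC fraction {gc_fraction(primer):.2f} outside {gc}")
    gap = abs(wallace_tm(fwd) - wallace_tm(rev))
    if gap > max_tm_gap:
        problems.append(f"Tm difference {gap} C exceeds {max_tm_gap} C")
    return problems


# Hypothetical primer sequences (not designed against any real marker gene).
print(check_primer_pair("ACGTACGTACGTACGTACGT", "TGCATGCATGCATGCATGCA"))
print(check_primer_pair("ATATATATATAT", "TGCATGCATGCATGCATGCA"))
```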

As used herein, the term "probe" refers to a substance that can specifically bind to a target substance to be detected in a sample and, through that binding, specifically confirm the presence of the target substance in the sample. The type of probe molecule is not limited as long as it is a material commonly used in the art. It may preferably be a peptide nucleic acid (PNA), a locked nucleic acid (LNA), a peptide, a polypeptide, a protein, RNA, or DNA. More specifically, the probe is a biomaterial, which may be derived from a living organism, resemble such a material, or be produced in vitro. Examples include enzymes, proteins, antibodies, microorganisms, animal and plant cells and organs, nerve cells, DNA, and RNA. DNA includes cDNA, genomic DNA, and oligonucleotides; RNA includes genomic RNA, mRNA, and oligonucleotides; and proteins include antibodies, antigens, enzymes, and peptides.

As used herein, the term "antisense oligonucleotide" refers to DNA, RNA, or a derivative thereof that contains a nucleic acid sequence complementary to the sequence of a specific gene (for example, the mRNA of that gene). It binds to the complementary sequence in the mRNA and thereby inhibits translation of the mRNA into protein. An antisense oligonucleotide sequence is a DNA or RNA sequence that is complementary to the mRNA of the genes and can bind to that mRNA. It may inhibit the translation of the mRNA, its translocation into the cytoplasm, its maturation, or any other activity essential to its overall biological function. The antisense oligonucleotide may be 6 to 100 bases long, preferably 8 to 60 bases, and more preferably 10 to 40 bases.
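The definition above implies a simple sequence rule: an antisense DNA oligonucleotide is the reverse complement of its mRNA target region. The sketch below derives such a sequence and classifies its length against the 6 to 100, 8 to 60, and 10 to 40 base ranges stated above.

It ignores oligonucleotide chemistry (for example, backbone modifications) and target-site selection. The function names and the 20-nt target fragment are hypothetical; the fragment is not a sequence from GAS2L3, AMOT, or WDR76.

```python
# Base pairing of an mRNA base (A, U, G, C) with the complementary DNA base.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def antisense_dna(mrna_region):
    """5'->3' DNA sequence that pairs antiparallel with a 5'->3' mRNA region."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna_region.upper()))


def length_tier(oligo):
    """Classify oligo length against the ranges given in the definition above."""
    n = len(oligo)
    if 10 <= n <= 40:
        return "more preferred (10-40)"
    if 8 <= n <= 60:
        return "preferred (8-60)"
    if 6 <= n <= 100:
        return "permitted (6-100)"
    return "outside stated range"


target = "AUGGCUACGUUAGCAUCGGA"  # hypothetical 20-nt mRNA fragment
aso = antisense_dna(target)
print(aso, length_tier(aso))
```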

In one embodiment, measuring the expression or activity level of the protein may include confirming whether the protein is present in a biological sample and its level of expression (or activity). Analysis methods for this purpose include protein chip analysis, immunoassays, ligand binding assays, and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF). They also include surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF), radioimmunoassay, radial immunodiffusion, Ouchterlony immunodiffusion, rocket immunoelectrophoresis, tissue immunostaining, complement fixation assay, and two-dimensional electrophoresis. Further options are liquid chromatography-mass spectrometry (LC-MS), liquid chromatography-tandem mass spectrometry (LC-MS/MS), Western blotting, and enzyme-linked immunosorbent assay (ELISA).

Agents capable of measuring the activity level of the protein may include antibodies, aptamers, avimers (avidity multimers), or peptidomimetics.

As used herein, the term "antibody" may refer to a specific protein molecule directed against an antigenic site. For the purposes of the present invention, an antibody means an antibody that specifically binds to the marker proteins described above, and includes polyclonal, monoclonal, and recombinant antibodies. Antibodies can be readily produced using techniques well known in the art. The antibodies herein include not only intact forms having two full-length light chains and two full-length heavy chains but also functional fragments of antibody molecules. A functional fragment of an antibody molecule is a fragment that retains at least the antigen-binding function, such as Fab, F(ab'), F(ab')2, or Fv.

Another aspect provides a kit for detecting cellular senescence, comprising an agent capable of measuring the expression or activity level of one or more genes or proteins selected from the group consisting of RRM2B, DUSP6, GSAP, AKAP6, C2orf92, AC008012.1, CLDN11, HMGN2, CHAF1B, KLHL13, MXD3, AUTS2, NAV2, NCAPD2, and PXMP2.

Another aspect provides a kit for detecting cellular senescence, comprising an agent capable of measuring the expression or activity level of one or more genes or proteins selected from the group consisting of EZH2, TMPO, AMOT, GAS2L3, SYNE2, HMGB2, CDCA7L, WDR76, H2BC8, MMP15, ACSL5, CD36, PLD1, CYP26B1, GSAP, H2BC12, and H2AC6.

Another aspect provides a kit for detecting cellular senescence, comprising an agent capable of measuring the expression or activity level of one or more genes or proteins selected from the group consisting of GAS2L3, AMOT, and WDR76.

In one embodiment, the kit may be a reverse transcription polymerase chain reaction (RT-PCR) kit, a DNA chip kit, a microarray kit, an enzyme-linked immunosorbent assay (ELISA) kit, a protein chip kit, a rapid kit, or a multiple reaction monitoring (MRM) kit.

The kit may further include one or more other component compositions, solutions, or devices suitable for the analysis method.

For example, the kit may contain the essential elements required to perform RT-PCR. An RT-PCR kit includes a primer pair specific to each marker gene. Each primer is a nucleotide with a sequence specific to the nucleic acid sequence of its gene, about 7 to 50 bp long and more preferably about 10 to 30 bp long. The kit may also include primers specific to the nucleic acid sequence of a control gene. An RT-PCR kit may further include:
- test tubes or other suitable containers;
- reaction buffers of varying pH and magnesium concentration;
- deoxynucleotides (dNTPs);
- enzymes such as Taq polymerase and reverse transcriptase;
- DNase and an RNase inhibitor;
- DEPC-treated water and sterile water.

As another example, a DNA chip kit may include a substrate to which cDNA or oligonucleotides corresponding to the genes or their fragments are attached, together with reagents, agents, and enzymes for preparing fluorescently labeled probes. The substrate may also carry cDNA or oligonucleotides corresponding to a control gene or a fragment thereof.

As another example, the kit may be a diagnostic kit containing the essential elements required to perform ELISA. An ELISA kit includes antibodies specific to the marker proteins. These antibodies have high specificity and affinity for each marker protein and almost no cross-reactivity with other proteins, and may be monoclonal, polyclonal, or recombinant antibodies. An ELISA kit may also include antibodies specific to a control protein. It may further include reagents for detecting bound antibodies, such as labeled secondary antibodies, chromophores, enzymes (for example, conjugated to an antibody) and their substrates, or other substances that can bind to the antibodies.

As another example, the kit may be a rapid kit containing the essential elements required for a rapid test that yields analysis results quickly. A rapid kit includes antibodies specific to the marker proteins. These antibodies have high specificity and affinity for each marker protein and almost no cross-reactivity with other proteins, and may be monoclonal, polyclonal, or recombinant antibodies. A rapid kit may also include antibodies specific to a control protein. It may further include other materials for detecting bound antibodies, such as a nitrocellulose membrane on which specific and secondary antibodies are immobilized, a membrane bound to antibody-conjugated beads, an absorbent pad, and a sample pad.

As another example, the kit may be a multiple reaction monitoring (MRM) kit, operating in MS/MS mode, that contains the essential elements required to perform mass spectrometry. Selected ion monitoring (SIM) uses the ions produced by a single collision in the source region of the mass spectrometer. MRM, by contrast, selects a specific ion from among those once-fragmented ions and passes it through the source of a second, serially connected mass spectrometer, where it collides again; the ions obtained from this second collision are then used. With MRM analysis, the protein expression level of a normal control group can be compared with that of a senescent cell group. Whether a sample is a senescent cell group can then be determined by assessing whether the protein expressed from each marker gene is significantly increased or decreased.
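For the ELISA and rapid kits described above, absorbance readings are converted to concentrations through a standard curve before samples are compared with controls. The sketch below uses piecewise-linear interpolation between blank-corrected standards. This is an assumption made for brevity; four-parameter logistic fits are also common. The function name, standard concentrations, and optical densities are hypothetical.

```python
def od_to_concentration(od, standards, blank=0.0):
    """Interpolate a concentration from a blank-corrected optical density (OD).

    standards: (concentration, raw OD) pairs whose OD rises with concentration.
    """
    pts = sorted((conc, raw - blank) for conc, raw in standards)
    y = od - blank
    for (c0, y0), (c1, y1) in zip(pts, pts[1:]):
        if y0 <= y <= y1:
            return c0 + (y - y0) * (c1 - c0) / (y1 - y0)
    raise ValueError("OD lies outside the standard-curve range; dilute and re-run")


# Hypothetical standards (ng/mL, raw OD) and a blank-well OD of 0.10.
standards = [(0, 0.10), (10, 0.30), (20, 0.50), (40, 0.90)]
sample = od_to_concentration(0.40, standards, blank=0.10)
control = od_to_concentration(0.70, standards, blank=0.10)
print(f"sample {sample:.1f} ng/mL, control {control:.1f} ng/mL, ratio {sample / control:.2f}")
```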

Another aspect provides a method for detecting cellular senescence, comprising measuring, in an isolated biological sample, the expression or activity level of one or more genes or proteins selected from the group consisting of RRM2B, DUSP6, GSAP, AKAP6, C2orf92, AC008012.1, CLDN11, HMGN2, CHAF1B, KLHL13, MXD3, AUTS2, NAV2, NCAPD2, and PXMP2.

In one embodiment, the method may include determining that cellular senescence is present in either of two cases. The first is when the expression or activity level of one or more genes or proteins selected from the group consisting of RRM2B, DUSP6, GSAP, AKAP6, C2orf92, and AC008012.1 is higher than that in a normal control sample. The second is when the expression or activity level of one or more genes or proteins selected from the group consisting of CLDN11, HMGN2, CHAF1B, KLHL13, MXD3, AUTS2, NAV2, NCAPD2, and PXMP2 is lower than that in a normal control sample.

Another aspect provides a method for detecting cellular senescence, comprising measuring, in an isolated biological sample, the expression or activity level of one or more genes or proteins selected from the group consisting of EZH2, TMPO, AMOT, GAS2L3, SYNE2, HMGB2, CDCA7L, WDR76, H2BC8, MMP15, ACSL5, CD36, PLD1, CYP26B1, GSAP, H2BC12, and H2AC6.

In one embodiment, the method may include determining that cellular senescence is present in either of two cases. The first is when the expression or activity level of one or more genes or proteins selected from the group consisting of H2BC8, MMP15, ACSL5, CD36, PLD1, CYP26B1, GSAP, H2BC12, and H2AC6 is higher than that in a normal control sample. The second is when the expression or activity level of one or more genes or proteins selected from the group consisting of EZH2, TMPO, AMOT, GAS2L3, SYNE2, HMGB2, CDCA7L, and WDR76 is lower than that in a normal control sample.

Another aspect provides a method for detecting cellular senescence, comprising measuring, in an isolated biological sample, the expression or activity level of one or more genes or proteins selected from the group consisting of GAS2L3, AMOT, and WDR76.

In one embodiment, the method may include determining that cellular senescence is present when the expression or activity level of one or more genes or proteins selected from the group consisting of GAS2L3, AMOT, and WDR76 is lower than that in a normal control sample.
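The determination rules in the three embodiments above compare each marker with a normal control in a marker-specific direction. The sketch below assumes expression values already normalized so that sample and control are directly comparable.

The gene-direction sets are copied from the text. The default of one concordant marker mirrors the "one or more" wording. A practical assay would add replicate statistics or a fold-change cutoff, which the source does not specify. The function names and example values are hypothetical.

```python
# Marker directions in senescent cells relative to a normal control, as stated above.
PANELS = {
    "15-gene": {
        "up": {"RRM2B", "DUSP6", "GSAP", "AKAP6", "C2orf92", "AC008012.1"},
        "down": {"CLDN11", "HMGN2", "CHAF1B", "KLHL13", "MXD3", "AUTS2",
                 "NAV2", "NCAPD2", "PXMP2"},
    },
    "17-gene": {
        "up": {"H2BC8", "MMP15", "ACSL5", "CD36", "PLD1", "CYP26B1", "GSAP",
               "H2BC12", "H2AC6"},
        "down": {"EZH2", "TMPO", "AMOT", "GAS2L3", "SYNE2", "HMGB2", "CDCA7L", "WDR76"},
    },
    "3-gene": {"up": set(), "down": {"GAS2L3", "AMOT", "WDR76"}},
}


def concordant_markers(sample, control, panel):
    """Measured markers whose level moves in the senescence-associated direction."""
    hits = [g for g in panel["up"] if g in sample and sample[g] > control[g]]
    hits += [g for g in panel["down"] if g in sample and sample[g] < control[g]]
    return sorted(hits)


def is_senescent(sample, control, panel, min_markers=1):
    """min_markers=1 mirrors the 'one or more' wording of the text."""
    return len(concordant_markers(sample, control, panel)) >= min_markers


# Hypothetical relative expression (e.g. qPCR fold values); control set to 1.0.
control = {"GAS2L3": 1.0, "AMOT": 1.0, "WDR76": 1.0}
sample = {"GAS2L3": 0.3, "AMOT": 0.5, "WDR76": 1.2}
print(concordant_markers(sample, control, PANELS["3-gene"]))
print(is_senescent(sample, control, PANELS["3-gene"]))
```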

In one embodiment, the method for measuring the expression level of the gene may include RT-PCR, competitive RT-PCR, real-time RT-PCR, RNase protection assay, Northern blotting, or a DNA chip.

In one embodiment, the method for measuring the activity level of the protein may include Western blotting, ELISA, radioimmunoassay, radioimmunodiffusion, Ouchterlony immunodiffusion, rocket immunoelectrophoresis, immunohistochemical staining, immunoprecipitation assay, complement fixation assay, FACS, or a protein chip.

Another aspect provides a method for screening a senolytic drug, comprising: contacting a test substance with one or more genes or proteins selected from the group consisting of RRM2B, DUSP6, GSAP, AKAP6, C2orf92, AC008012.1, CLDN11, HMGN2, CHAF1B, KLHL13, MXD3, AUTS2, NAV2, NCAPD2, and PXMP2; and

selecting, as a senolytic drug, a test substance that changes the expression or activity level of the gene or protein compared with an untreated control group.

In one embodiment, the method may include selecting as a senolytic drug a test substance that does either of the following, relative to an untreated control group:
- decreases the expression or activity level of one or more genes or proteins selected from the group consisting of RRM2B, DUSP6, GSAP, AKAP6, C2orf92, and AC008012.1; or
- increases the expression or activity level of one or more genes or proteins selected from the group consisting of CLDN11, HMGN2, CHAF1B, KLHL13, MXD3, AUTS2, NAV2, NCAPD2, and PXMP2.

Another aspect provides a method for screening a senolytic drug, comprising: contacting a test substance with one or more genes or proteins selected from the group consisting of EZH2, TMPO, AMOT, GAS2L3, SYNE2, HMGB2, CDCA7L, WDR76, H2BC8, MMP15, ACSL5, CD36, PLD1, CYP26B1, GSAP, H2BC12, and H2AC6; and

selecting, as a senolytic drug, a test substance that changes the expression or activity level of the gene or protein compared with an untreated control group.

In one embodiment, the method may include selecting as a senolytic drug a test substance that does either of the following, relative to an untreated control group:
- decreases the expression or activity level of one or more genes or proteins selected from the group consisting of EZH2, TMPO, AMOT, GAS2L3, SYNE2, HMGB2, CDCA7L, and WDR76; or
- increases the expression or activity level of one or more genes or proteins selected from the group consisting of H2BC8, MMP15, ACSL5, CD36, PLD1, CYP26B1, GSAP, H2BC12, and H2AC6.

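Another aspect provides a method for screening a senolytic drug, comprising: contacting a test substance with one or more genes or proteins selected from the group consisting of GAS2L3, AMOT, and WDR76; and selecting, as a senolytic drug, a test substance that changes the expression or activity level of the gene or protein compared with an untreated control group.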

In one embodiment, the method may include selecting, as a senolytic drug, a test substance that increases the expression or activity level of one or more genes or proteins selected from the group consisting of GAS2L3, AMOT, and WDR76 relative to an untreated control group.
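The selection step for the GAS2L3, AMOT, and WDR76 panel can be sketched as a filter over screening read-outs. The following are hypothetical assumptions for illustration:
- The compound names, expression values, and the function name `select_candidates`.
- The 1.5-fold cutoff. The specification only requires an increase relative to an untreated control.
- That each level was measured in cells treated or not treated with the test substance.

```python
MARKERS = ("GAS2L3", "AMOT", "WDR76")


def select_candidates(treated, untreated, markers=MARKERS, min_fold=1.5):
    """Return (substance, raised markers) for substances that raise at least one marker.

    treated: {substance: {marker: level}}; untreated: {marker: level} of the untreated control.
    min_fold is a hypothetical cutoff for what counts as an increase.
    """
    selected = []
    for substance, levels in sorted(treated.items()):
        raised = [m for m in markers if levels[m] / untreated[m] >= min_fold]
        if raised:
            selected.append((substance, raised))
    return selected


# Hypothetical screening read-out (relative expression; untreated control = 1.0).
untreated = {"GAS2L3": 1.0, "AMOT": 1.0, "WDR76": 1.0}
treated = {
    "compound_A": {"GAS2L3": 2.1, "AMOT": 1.6, "WDR76": 1.1},
    "compound_B": {"GAS2L3": 1.0, "AMOT": 0.9, "WDR76": 1.2},
}
print(select_candidates(treated, untreated))
```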

The test substance (for example, a drug candidate, test compound, or test composition) may include a small-molecule compound, an antibody, an antisense nucleotide, a short interfering RNA (siRNA), a short hairpin RNA (shRNA), a nucleic acid, a protein, a peptide, or another extract or natural product.

The contacting may be performed in vitro. For example, it may include transforming cells with a vector that expresses the genes or proteins, or with a vector that knocks them out, and then treating the transformed cells with a drug candidate.

A senolytic drug may refer to an agent that can be used to selectively (preferentially, or to a greater degree) kill or destroy senescent cells in a clinically or biologically significant manner. Senolytic agents may include agents that can selectively kill one or more types of senescent cells. Examples are senescent preadipocytes, endothelial cells, fibroblasts, neurons, epithelial cells, mesenchymal cells, smooth muscle cells, macrophages, or chondrocytes.

One aspect is a method for selecting markers of cellular senescence using machine learning. It builds an analysis pipeline that performs a meta-analysis of senescent-cell RNA-seq data, which secures a sufficient number of samples so that statistically more significant genes can be selected. Machine learning methods can then identify candidate markers characteristic of senescent cells among many variables.

The cellular senescence markers derived by this method form a more significant gene signature that can identify cellular senescence in a wider variety of cell types than previous studies. They can therefore be useful not only for identifying or detecting senescent cells but also for screening senolytic agents.

Figure 1 is a flowchart of a method for detecting markers of cellular senescence according to one embodiment.
Figure 2 is a schematic diagram of the procedure for selecting cellular senescence biomarkers according to one example.
Figure 3 shows the results of principal component analysis (PCA) of the approximately 14,000 genes that remained after removing genes with low expression across all samples.
Figure 4 shows the results of clustering analysis of the approximately 14,000 genes that remained after removing genes with low expression across all samples.
Figure 5 is a Venn diagram showing how many of the differentially expressed genes for each inducer type are shared by all inducer types; a: up-regulated genes, b: down-regulated genes.
Figure 6 shows the gene ontology (GO) analysis results for the selected shared differentially expressed genes; a: up-regulated genes, b: down-regulated genes.
Figure 7 shows the PCA results for the 363 selected shared differentially expressed genes.
Figure 8 shows the clustering analysis results for the 363 selected shared differentially expressed genes.
Figure 9 is a bar plot of the coefficient values of the 15 genes selected by LASSO.
Figure 10 shows the PCA results for the 15 genes selected by LASSO.
Figure 11 shows the clustering analysis results for the 15 genes selected by LASSO.
Figure 12 shows expression boxplots of the 15 LASSO-selected genes in the test set (n = 84); a: distribution in the test set of the 6 genes with positive coefficients, b: distribution in the test set of the 9 genes with negative coefficients.
Figure 13 shows ROC curves used to evaluate the performance of two cellular senescence classification models; a: test data of one embodiment, b: additional data from various senescent cell types.
Figure 14 shows the PCA results for the approximately 14,000 genes that remained after removing genes with low expression across all reselected samples (n = 147).
Figure 15 is a Venn diagram showing how many of the differentially expressed genes for each inducer type are shared by all inducer types; a: up-regulated genes, b: down-regulated genes.
Figure 16 shows the GO analysis results for the selected shared differentially expressed genes; a: up-regulated genes, b: down-regulated genes.
Figure 17 is a bar plot of the coefficient values of the 17 genes selected by LASSO.
Figure 18 shows the PCA results for the 17 genes selected by LASSO.
Figure 19 shows the clustering analysis results for the 17 genes selected by LASSO.
Figure 20 shows expression boxplots of the 17 LASSO-selected genes in the test set (n = 68); a: distribution in the test set of the 8 genes with positive coefficients, b: distribution in the test set of the 9 genes with negative coefficients.
Figure 21 shows ROC curves used to evaluate the performance of five cellular senescence classification models; a: test data of one embodiment, b: additional data from various senescent cell types.
Figure 22 shows HUVEC cells senesced by serial passaging, with senescence confirmed by SA-β-Gal staining.
Figure 23 shows the decreased expression of GAS2L3, AMOT, and WDR76 in senescence-induced cells, as confirmed by RT-PCR and electrophoresis.
Figure 24 shows the decreased expression of GAS2L3, AMOT, and WDR76 in senescence-induced cells, as quantified by qPCR.

The present invention is described in more detail below by way of examples. These examples are for illustration only, and the scope of the present invention is not limited to them.

Example 1. Identification of biomarkers of cellular senescence

Figure 2 is a schematic diagram of the procedure for selecting cellular senescence biomarkers according to one example.

The procedure was as follows.

1.1 Data set preparation and analysis

To identify markers associated with senescent cells, RNA-seq data were obtained from the public ENA (European Nucleotide Archive) database (http://www.ebi.ac.uk/ena/browser/home).

Thirteen independent experiments with RNA-seq expression data sets of senescent cells were selected: GSE63577, GSE72407, GSE113060, GSE70668, GSE132370, GSE130727, GSE98440, GSE109700, GSE130306, GSE94395, GSE168994, GSE72404, and GSE75643. From these, data for fibroblasts (BJ, HFF, IMR-90, WI-38, LF1, TIG3) senesced by various inducer types, together with the matching control data, were downloaded in FASTQ format. The data set contained 165 samples in total (73 controls and 92 senescent samples).

The samples were grouped by the inducer type used to induce senescence:
- replicative senescence (RS)
- oncogene-induced senescence (OIS)
- therapy-induced senescence (TIS)

So that the finally selected genes could be validated in later analyses, each GSE was assigned to the training set or the test set; GSE130727 was split between the two. The allocation followed the publications associated with each data set, giving training data and model-evaluation test data in a ratio of approximately 5:5. All data sets are FASTQ files generated on the Illumina HiSeq platform. Their characteristics are shown in Table 1 below.

Table 1. Sample information

GSE number | No. of samples (control : senescent) | Senescence type (inducer type) | Cell origin | Category (Tr: training set / Te: test set)
GSE63577 | 23 (12 : 11) | Replicative | BJ, HFF, IMR90, WI38 | Tr
GSE72407 | 22 (8 : 14) | OIS, TIS | IMR90 | Tr
GSE113060 | 10 (5 : 5) | OIS | IMR90 | Tr
GSE70668 | 6 (3 : 3) | OIS | IMR90 | Tr
GSE132370 | 6 (3 : 3) | TIS | IMR90 | Tr
GSE130727 | 26 (12 : 14) | Replicative, TIS, OIS | IMR90, WI38 | Tr/Te
GSE98440 | 8 (4 : 4) | Replicative | IMR90 | Te
GSE109700 | 9 (3 : 6) | Replicative | LF1 | Te
GSE130306 | 6 (3 : 3) | Replicative | WI38 | Te
GSE94395 | 6 (3 : 3) | TIS | IMR90 | Te
GSE168994 | 4 (2 : 2) | TIS | IMR90 | Te
GSE72404 | 24 (6 : 18) | OIS | IMR90 | Te
GSE75643 | 15 (9 : 6) | OIS | TIG3 | Te
Total | 165 (73 : 92) | - | - | -

The split into training and test sets is shown in Table 2 below.

Sample category | Control | RS | OIS | TIS | Total
Training set | 37 | 13 | 17 | 14 | 81
Test set | 36 | 15 | 24 | 9 | 84
Total | 73 | 28 | 41 | 23 | 165

The data were split approximately 50:50 into training and test sets (81 and 84 samples). Each inducer type was represented in roughly the same proportion in both sets, so that neither set was biased toward a particular inducer type.

1.2 Data preprocessing and meta-analysis

To minimize batch effects arising from the different experiments, all FASTQ files from Example 1.1 were quantified through one identical pipeline built around SALMON. As shown in Figure 2, the SALMON pipeline consists of the following steps: QC (fastQC, v0.11.9), trimming (Trimmomatic, v0.39), mapping (SALMON, v1.10), raw count quantification (tximeta, v1.4.5), and batch correction (limma, v3.42.2). The tximeta and limma steps were run in R v3.6.3.

1.3 Exploratory data analysis

The expression data of each GSE were TMM-normalized and converted to log2 CPM (counts per million). Batch effects were then assessed and removed with the removeBatchEffect function of limma (v3.42.2). All further analysis used the batch-corrected, TMM-normalized log2 CPM values. Genes with low expression across all samples were removed with the edgeR package, leaving approximately 14,000 genes. Principal component analysis (PCA) and Spearman correlation-based clustering were performed on these genes. The results are shown in Figures 3 and 4, respectively.
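The log2 CPM transformation and low-expression filter described above can be illustrated with a minimal Python sketch. The sketch assumes the TMM normalization factors have already been computed (for example by edgeR's calcNormFactors) and takes them as plain numbers. The filtering rule, CPM >= 1 in at least two samples, is a hypothetical classic threshold chosen for illustration. edgeR's filterByExpr applies a different, library-size-aware rule, and the source does not say which rule was used.

```python
import math

def effective_library_sizes(counts, norm_factors=None):
    """Column sums of a genes x samples count matrix, times optional TMM factors."""
    n_samples = len(counts[0])
    factors = norm_factors if norm_factors is not None else [1.0] * n_samples
    return [sum(row[j] for row in counts) * factors[j] for j in range(n_samples)]

def log2_cpm(counts, norm_factors=None):
    """Voom-style log2 counts per million: log2((count + 0.5) / (lib + 1) * 1e6).

    The small offsets keep zero counts finite; edgeR's cpm(log=TRUE) uses a
    different prior count, so values will not match edgeR exactly.
    """
    lib = effective_library_sizes(counts, norm_factors)
    return [[math.log2((c + 0.5) / (lib[j] + 1.0) * 1e6) for j, c in enumerate(row)]
            for row in counts]

def filter_low_expression(counts, min_cpm=1.0, min_samples=2, norm_factors=None):
    """Indices of genes with CPM >= min_cpm in at least min_samples samples."""
    lib = effective_library_sizes(counts, norm_factors)
    keep = []
    for i, row in enumerate(counts):
        n_expressed = sum(1 for j, c in enumerate(row) if c / lib[j] * 1e6 >= min_cpm)
        if n_expressed >= min_samples:
            keep.append(i)
    return keep

# Toy matrix: 3 genes x 2 samples (hypothetical counts)
counts = [[100, 200], [0, 0], [900, 800]]
kept = filter_low_expression(counts)      # gene 1 (all zeros) is dropped
values = log2_cpm(counts)
logcpm = [values[i] for i in kept]
```

In the source's workflow the filtered log2 CPM matrix would next pass through batch correction and then PCA and clustering. This sketch stops at the filtering step.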

Figure 3 shows the PCA results for the approximately 14,000 genes that remained after removing genes with low expression across all samples.

Figure 4 shows the clustering results for the approximately 14,000 genes that remained after removing genes with low expression across all samples.

1.4 Differentially expressed gene analysis (DEG analysis)

Differential expression analysis was performed separately for each inducer-type sample group in the training data. Its purpose was to select genes whose expression increased or decreased in the senescent cell group compared with the normal (control) cell group. The analysis used the R package edgeR (v3.10). After TMM normalization, the design was constructed from each sample's senescence status and batch information. Adjusted p-values were calculated to address the multiple comparison problem that arises when many hypotheses are tested. A gene was selected only if it met both criteria: an adjusted p-value of 0.05 or less, and a fold change of at least 2 (up-regulated) or at most 1/2 (down-regulated). The numbers of differentially expressed genes selected for each inducer type were:
- RS: 1,444 genes (877 up-regulated, 567 down-regulated)
- OIS: 1,647 genes (602 up-regulated, 1,045 down-regulated)
- TIS: 1,405 genes (802 up-regulated, 603 down-regulated)

These results are shown in Table 3 below.
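The selection rule above can be sketched as follows. The sketch assumes the Benjamini-Hochberg procedure, which is the default adjustment in edgeR's topTags; the source says only "adjusted p-values". The fold-change thresholds of 2 and 1/2 correspond to |log2 FC| >= 1. The gene names and values in the example are hypothetical.

```python
import math

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_degs(genes, log2fc, pvalues, alpha=0.05, min_fold=2.0):
    """Split genes into up/down lists: adjusted p <= alpha and fold change >= min_fold."""
    padj = bh_adjust(pvalues)
    cutoff = math.log2(min_fold)  # fold change 2 -> |log2 FC| >= 1
    up, down = [], []
    for gene, lfc, q in zip(genes, log2fc, padj):
        if q <= alpha and lfc >= cutoff:
            up.append(gene)
        elif q <= alpha and lfc <= -cutoff:
            down.append(gene)
    return up, down

# Hypothetical genes: C passes the p-value filter but changes less than 2-fold.
up, down = select_degs(["A", "B", "C", "D"], [1.5, -2.0, 0.5, 3.0],
                       [0.001, 0.002, 0.03, 0.5])
```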

Table 3. Differential expression analysis results

Inducer type | Comparison (control : senescent) | Up-regulated genes | Down-regulated genes | Total DEGs
RS | 13 : 13 | 877 | 567 | 1,444
OIS | 16 : 17 | 602 | 1,045 | 1,647
TIS | 14 : 14 | 802 | 603 | 1,405

The next step was to select genes characteristic of cellular senescence that change regardless of inducer type. The DEG results for each inducer type were divided into up-regulated and down-regulated genes, and the intersection across all inducer types was taken for each direction. The Venn diagrams are shown in Figure 5. Figure 5 is a Venn diagram showing how many of the differentially expressed genes for each inducer type are shared by all inducer types; a: up-regulated genes, b: down-regulated genes.

As shown in Figure 5, 363 genes were selected: 85 up-regulated and 278 down-regulated. These genes change in expression in senescent cells regardless of the inducer type.
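The Venn-diagram intersection step amounts to a set intersection per direction. The minimal sketch below uses hypothetical gene IDs. Under this rule, a gene that is up-regulated for one inducer and down-regulated for another falls into neither shared set.

```python
def common_degs(degs_by_inducer):
    """Genes up-regulated (or down-regulated) in EVERY inducer type.

    degs_by_inducer maps an inducer name such as "RS" to a pair (up_genes, down_genes).
    """
    up_sets = [set(up) for up, _ in degs_by_inducer.values()]
    down_sets = [set(down) for _, down in degs_by_inducer.values()]
    return set.intersection(*up_sets), set.intersection(*down_sets)

# Hypothetical gene IDs, for illustration only
degs = {
    "RS": (["G1", "G2", "G3"], ["G4", "G5"]),
    "OIS": (["G1", "G3"], ["G4"]),
    "TIS": (["G1", "G3", "G9"], ["G4", "G5"]),
}
shared_up, shared_down = common_degs(degs)
```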

The next question was whether the selected shared genes form a gene set that reflects the characteristics of cellular senescence. To answer it, gene ontology (GO) analysis was performed with gProfiler (https://biit.cs.ut.ee/gprofiler/) to explore the biological pathways associated with the 363 genes. Up-regulated and down-regulated genes were analyzed separately. Only GO terms with an adjusted p-value of 0.05 or less were retained, and the results are shown in Figure 6.

Figure 6 shows the GO analysis results for the selected shared differentially expressed genes; a: up-regulated genes, b: down-regulated genes.

As shown in Figure 6, the 85 genes commonly up-regulated in senescent cells were significantly associated with known features of cellular senescence:
- the DNA damage response
- the SASP (senescence-associated secretory phenotype)
- inhibition of DNA recombination at telomeres
- the immune response
- various cellular senescence pathways

Likewise, the 278 genes commonly down-regulated in senescent cells were associated with known features of cellular senescence:
- cell cycle-related pathways
- DNA replication and DNA repair
- telomere maintenance
- chromosome organization and segregation

These results confirm that the set of 363 intersection genes reflects the characteristics of cellular senescence well.

Next, PCA and clustering analysis were performed on the 363 selected shared genes in the same manner as in 1.3. The results are shown in Figures 7 and 8, respectively.

Figure 7 shows the PCA results for the 363 selected shared genes.

Figure 8 shows the clustering results for the 363 selected shared genes.

As shown in Figure 7, PC1 explains the structure of the data better for the 363 shared genes than for the approximately 14,000 genes in Figure 3. However, senescent and control samples are still not clearly separated.
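A claim that PC1 "explains the data better" can be made concrete as the fraction of total variance captured by PC1. The sketch below illustrates this in plain Python; it is not the R routine used in the source. It estimates PC1 by power iteration on a small samples x genes matrix. For about 14,000 genes, a library SVD would be the practical choice.

```python
import math

def first_principal_component(rows, n_iter=200):
    """PC1 of a samples x genes matrix by power iteration.

    Returns (unit loading vector, fraction of total variance explained by PC1).
    The covariance matrix is never formed: each step computes X^T (X v) / (n - 1).
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in rows]  # column-centred data

    def cov_times(v):
        scores = [sum(x[j] * v[j] for j in range(p)) for x in X]  # X v
        return [sum(scores[i] * X[i][j] for i in range(n)) / (n - 1) for j in range(p)]

    v = [1.0 / math.sqrt(p)] * p  # simple start; assumes it is not orthogonal to PC1
    for _ in range(n_iter):
        w = cov_times(v)
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    eigenvalue = sum(a * b for a, b in zip(v, cov_times(v)))  # Rayleigh quotient
    total_variance = sum(sum(x[j] ** 2 for x in X) / (n - 1) for j in range(p))
    return v, eigenvalue / total_variance
```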

Figure 8 compares the clustering of the 363 shared genes with the clustering of the approximately 14,000 genes in Figure 4. The expression patterns of the 363 shared genes differ between senescent and control cells, but the clustering still does not clearly separate senescent from control samples.

1.5 Variable selection using LASSO (least absolute shrinkage and selection operator)

LASSO regression was used for variable selection, to find the smallest subset of the 363 genes that best represents the characteristics of cellular senescence.

Specifically, LASSO regression was performed with the R package glmnet 4.0-2. The inputs were the TMM-normalized log2 CPM (counts per million) values of the 81 training samples. Each sample was labeled Control (= 0) or Senescent (= 1) according to its state, and the parameters of a binary classification model were tuned. The optimal lambda (λ = 0.006) was chosen by leave-one-out cross-validation, and the model fitted with this value retained 15 genes:
- ENSG00000048392 (RRM2B)
- ENSG00000139318 (DUSP6)
- ENSG00000186088 (GSAP)
- ENSG00000151320 (AKAP6)
- ENSG00000228486 (C2orf92)
- ENSG00000285901 (AC008012.1)
- ENSG00000013297 (CLDN11)
- ENSG00000198830 (HMGN2)
- ENSG00000159259 (CHAF1B)
- ENSG00000003096 (KLHL13)
- ENSG00000213347 (MXD3)
- ENSG00000158321 (AUTS2)
- ENSG00000166833 (NAV2)
- ENSG00000010292 (NCAPD2)
- ENSG00000176894 (PXMP2)

Each gene's coefficient gives the direction (+/-) and strength of its association with cellular senescence. Six genes have positive coefficients, indicating increased expression in senescent cells: RRM2B, DUSP6, GSAP, AKAP6, C2orf92, and AC008012.1. The remaining nine genes have negative coefficients, indicating decreased expression in senescent cells: CLDN11, HMGN2, CHAF1B, KLHL13, MXD3, AUTS2, NAV2, NCAPD2, and PXMP2.
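The sketch below illustrates L1-penalized logistic regression with a lambda chosen by LOOCV; it is not a reimplementation of glmnet. glmnet minimizes the mean negative log-likelihood plus lambda times the L1 norm of the coefficients (the intercept is unpenalized). It solves this by coordinate descent along a lambda path and standardizes predictors by default. This pure-Python sketch instead uses proximal gradient descent (ISTA) on the same objective, without standardization. LOOCV here picks lambda by held-out log-loss; cv.glmnet's default for binomial models is deviance, which is twice the log-loss. The learning rate, iteration count, lambda grid, and toy data are all hypothetical.

```python
import math

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def _soft_threshold(v, t):
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit_lasso_logistic(X, y, lam, lr=0.5, n_iter=3000):
    """Minimize mean log-loss + lam * sum(|beta_j|) by proximal gradient (ISTA).

    X: list of samples (each a list of gene values); y: 0 = control, 1 = senescent.
    Returns (intercept, coefficient list); the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = _sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi
            g0 += r
            for j in range(p):
                grad[j] += r * xi[j]
        b0 -= lr * g0 / n
        # Gradient step, then soft-thresholding: this is what zeroes out genes.
        beta = [_soft_threshold(beta[j] - lr * grad[j] / n, lr * lam) for j in range(p)]
    return b0, beta

def predict_proba(model, x):
    b0, beta = model
    return _sigmoid(b0 + sum(b * v for b, v in zip(beta, x)))

def loocv_select_lambda(X, y, lambdas, **fit_kwargs):
    """Return the lambda with the lowest leave-one-out mean log-loss."""
    best_lam, best_loss = None, float("inf")
    for lam in lambdas:
        loss = 0.0
        for i in range(len(X)):
            model = fit_lasso_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam, **fit_kwargs)
            prob = min(max(predict_proba(model, X[i]), 1e-12), 1 - 1e-12)
            loss -= y[i] * math.log(prob) + (1 - y[i]) * math.log(1 - prob)
        if loss / len(X) < best_loss:
            best_lam, best_loss = lam, loss / len(X)
    return best_lam

# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X = [[-2.0, 0.1], [-1.0, -0.1], [-1.5, 0.05], [1.0, 0.1], [2.0, -0.1], [1.5, -0.05]]
y = [0, 0, 0, 1, 1, 1]
b0, beta = fit_lasso_logistic(X, y, lam=0.05)  # the noise coefficient ends up exactly 0
```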

The results of LASSO variable selection are shown in Table 4 and Figure 9 below.

Table 4. Cellular senescence signature genes selected by LASSO regression

Ensembl gene ID | Symbol | Coefficient | Description
ENSG00000048392 | RRM2B | 1.85 | Ribonucleotide Reductase Regulatory TP53 Inducible Subunit M2B
ENSG00000139318 | DUSP6 | 0.45 | Dual Specificity Phosphatase 6
ENSG00000186088 | GSAP | 0.45 | Gamma-Secretase Activating Protein
ENSG00000151320 | AKAP6 | 0.28 | A-Kinase Anchoring Protein 6
ENSG00000228486 | C2orf92 | 0.16 | Chromosome 2 Open Reading Frame 92
ENSG00000285901 | AC008012.1 | 0.13 | Novel Transcript
ENSG00000013297 | CLDN11 | -0.02 | Claudin 11
ENSG00000198830 | HMGN2 | -0.03 | High Mobility Group Nucleosomal Binding Domain 2
ENSG00000159259 | CHAF1B | -0.14 | Chromatin Assembly Factor 1 Subunit B
ENSG00000003096 | KLHL13 | -0.26 | Kelch Like Family Member 13
ENSG00000213347 | MXD3 | -0.32 | MAX Dimerization Protein 3
ENSG00000158321 | AUTS2 | -0.41 | Activator Of Transcription And Developmental Regulator AUTS2
ENSG00000166833 | NAV2 | -0.51 | Neuron Navigator 2
ENSG00000010292 | NCAPD2 | -0.94 | Non-SMC Condensin I Complex Subunit D2
ENSG00000176894 | PXMP2 | -1.19 | Peroxisomal Membrane Protein 2

Figure 9 is a bar plot of the coefficient values of the 15 genes selected by LASSO.

Next, PCA and clustering analysis were performed on the 81 training samples using the 15 LASSO-selected genes, in the same manner as in 1.3. The results are shown in Figures 10 and 11, respectively.

Figure 10 shows the PCA results for the 15 genes selected by LASSO.

Figure 11 shows the clustering results for the 15 genes selected by LASSO.

As shown in Figure 10, PC1 clearly separates senescent cells from control samples for the 15 LASSO-selected genes, unlike in the PCA results of Figures 3 and 7. PC1 also accounts for a larger share of the separation between the groups.

Figure 11 compares the clustering of the 15 LASSO-selected genes with the clustering results in Figures 4 and 8. The expression patterns of the 15 genes differ clearly between senescent and control cells, and the clustering clearly separates senescent from control samples.

These results indicate that, within the training set, the 15 genes selected by LASSO reflect the characteristics of senescent cells well.

Example 2. Validation of the selected senescent cell marker candidate genes

2.1 Differences in gene expression between the control and senescent groups in the test data

The 84 test samples were not used when the 15 senescent cell marker candidate genes were selected in Example 1. It was examined whether the 15 genes also show significant expression differences in these samples.

Specifically, the 84 test samples were processed exactly like the training data. They were TMM-normalized, batch effects were removed with limma's removeBatchEffect function, and the values were transformed to log2 CPM. These values were used to test whether the expression of the 15 genes differed significantly between the control group (n = 36) and the senescent cell group (n = 48). The results are shown in Figure 12.
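The effect of removeBatchEffect on one gene can be sketched in the simplest case: a single batch factor (here, the GSE) with limma's default intercept-only design. In this case limma's sum-to-zero batch coefficients shift each batch so that its mean equals the unweighted average of the batch means. The source does not say whether a design matrix protecting senescence status was also supplied. With such a design, the correction would differ from this sketch, which could otherwise remove biological signal when batches contain different proportions of senescent samples. The values and batch labels below are hypothetical.

```python
from collections import defaultdict

def remove_batch_means(values, batches):
    """Shift each batch of one gene's log-expression values so that every batch
    mean equals the unweighted average of the batch means (intercept-only design)."""
    groups = defaultdict(list)
    for v, b in zip(values, batches):
        groups[b].append(v)
    batch_mean = {b: sum(vs) / len(vs) for b, vs in groups.items()}
    grand = sum(batch_mean.values()) / len(batch_mean)
    return [v - (batch_mean[b] - grand) for v, b in zip(values, batches)]

# Hypothetical log2 CPM values for one gene in two GSE batches
corrected = remove_batch_means([1.0, 3.0, 10.0, 12.0], ["A", "A", "B", "B"])
```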

Figure 12 shows expression boxplots of the 15 LASSO-selected genes in the test set (n = 84); a: distribution in the test set of the 6 genes with positive coefficients, b: distribution in the test set of the 9 genes with negative coefficients.

2.2 Construction of cellular senescence prediction models

To verify the usefulness of the 15 selected genes, two models that predict the senescence state of cells were built from the training data set and compared.

Specifically, the first model was a support vector machine (SVM) with an RBF kernel, built with the R package e1071 (version 1.7-3). It was trained on the expression values of the 15 genes in the 81 training samples, and the optimal classification model was selected by LOOCV (leave-one-out cross-validation). The second model was an SVM built in the same way as the first, but trained on the expression values of 55 cellular senescence genes reported in another study (Hernandez-Segura et al. 2017). It served as a baseline for comparing senescent cell classification performance with the first model.
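The source does not give the SVM's cost or gamma values, only that the final model was chosen by LOOCV. An RBF-kernel SVM solver is too long for a short example. The sketch below therefore pairs the RBF kernel exp(-gamma * ||u - v||^2), the radial form documented for e1071, with a deliberately simple stand-in classifier. This stand-in is a kernel mean-similarity rule and NOT an SVM; its purpose is to show how leave-one-out cross-validation selects a hyperparameter. The gamma grid and toy data are hypothetical. Each class must keep at least one sample in every fold.

```python
import math

def rbf_kernel(u, v, gamma):
    """Radial basis kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def kernel_mean_classifier(train_X, train_y, gamma):
    """Stand-in classifier (NOT an SVM): predict 1 (senescent) if the mean kernel
    similarity to senescent training samples exceeds that to control samples."""
    pos = [x for x, t in zip(train_X, train_y) if t == 1]
    neg = [x for x, t in zip(train_X, train_y) if t == 0]

    def predict(x):
        s_pos = sum(rbf_kernel(x, p, gamma) for p in pos) / len(pos)
        s_neg = sum(rbf_kernel(x, q, gamma) for q in neg) / len(neg)
        return 1 if s_pos > s_neg else 0

    return predict

def loocv_accuracy(X, y, fit):
    """Leave-one-out accuracy: train on n - 1 samples, predict the held-out one."""
    correct = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += model(X[i]) == y[i]
    return correct / len(X)

def select_gamma(X, y, gammas):
    """Gamma with the best LOOCV accuracy; the first one wins ties."""
    return max(gammas, key=lambda g: loocv_accuracy(
        X, y, lambda tx, ty: kernel_mean_classifier(tx, ty, g)))

# Hypothetical two-gene data: controls near (0, 0), senescent samples near (3, 3)
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [3.0, 3.0], [3.2, 2.9], [2.8, 3.1]]
y = [0, 0, 0, 1, 1, 1]
best_gamma = select_gamma(X, y, [0.5, 1000.0])
```

In the source the same LOOCV loop would wrap an actual SVM fit, and the chosen hyperparameters would then be refitted on all 81 training samples.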

2.3 Performance evaluation of the cellular senescence prediction models

The two models built in 2.2 were used to classify the 84 senescent fibroblast test samples (36 controls, 48 senescent cells). The results are shown in Figure 13a. The two models were also applied to 42 samples from various non-fibroblast cell types (18 controls, 24 senescent cells; HUVEC, HAEC, MSC, LS8817, and Ovcar3), with the results shown in Figure 13b. Before classification, the log2 CPM values of the test data were scaled using the mean and standard deviation of the expression values in the training data.
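The scaling step above uses the training statistics, not the test data's own. A minimal sketch follows, assuming the sample standard deviation (n - 1 denominator, as R's sd and scale use). A gene with zero variance in the training data would need special handling, because the division would fail. The numbers are hypothetical.

```python
import statistics

def fit_scaler(train_rows):
    """Per-gene mean and sample standard deviation from the training matrix
    (rows = samples, columns = genes)."""
    columns = list(zip(*train_rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return means, sds

def apply_scaler(rows, means, sds):
    """Scale new samples with the TRAINING statistics, never their own."""
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

train = [[1.0, 2.0], [3.0, 4.0]]
test = [[2.0, 3.0], [4.0, 5.0]]
means, sds = fit_scaler(train)
scaled_test = apply_scaler(test, means, sds)
```

Reusing the training statistics ensures that the test samples are placed on the same scale the classifier was trained on. It also prevents information from the test set from leaking into the model.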

Figure 13 shows ROC curves used to evaluate the performance of the two cellular senescence classification models; a: test data of one embodiment, b: additional data from various senescent cell types.

As shown in Figure 13, the 15-gene model selected by LASSO outperformed the comparison model in both evaluations.

a: LASSO 15-gene model, AUC 98.9% > Hernandez-Segura et al. 55-gene model, AUC 97.6%

b: LASSO 15-gene model, AUC 86.6% > Hernandez-Segura et al. 55-gene model, AUC 75.9%
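The AUC values above are areas under the ROC curve. The sketch below illustrates the computation; it is not the tool used in the source. The AUC equals the probability that a randomly chosen senescent sample (label 1) receives a higher classifier score than a randomly chosen control (label 0), with ties counted as one half. The scores in the example are hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison (Mann-Whitney form); ties count 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for two controls and two senescent samples
auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])  # 3 of 4 pairs ordered correctly
```

The pairwise loop is quadratic, which is negligible for test sets of the size used here (84 and 42 samples).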

The results of Example 2 show that the expression of the 15 genes selected according to one embodiment differed significantly between the normal and senescent cell groups. The direction of each difference matched the sign of the corresponding LASSO coefficient. The classification model built with the 15 genes also represented the characteristics of cellular senescence better than the model built with the 55 senescence-related genes from a previous study. The previous study (Hernandez-Segura et al. 2017) derived its signature genes from various cell types, including senescent fibroblasts. The method according to one embodiment, in contrast, selected genes using only senescent fibroblasts as training data. Nevertheless, it achieved better classification performance than the previous study across various cell types as well.

These results indicate that the method according to one embodiment can select statistically more significant markers in various cell types, including fibroblasts, and that the 15 genes it yields can be useful for identifying or detecting senescent cells.

Example 3. Additional Biomarkers of Cellular Senescence

The same method as in Example 1 (Figure 2) was used, except that 18 samples with divergent senescence characteristics were first removed from the samples listed in Table 1 (section 1.1, Data Set Preparation and Analysis). An additional 17 genes whose expression changes in senescent cells were then selected, yielding a more refined gene set.

3.1 Data set preparation and analysis

Eighteen of the 165 fibroblast samples used in Example 1 were additionally removed, leaving a total of 147 fibroblast samples. The removed samples are described below.

From GSE72404, the 12 Notch-induced senescence samples among the 24 samples used previously were removed, because Notch-induced senescence is known to differ in its characteristics from primary senescence (RS, OIS, and TIS). From GSE130727, 4 control samples and 2 OIS samples were removed from the original 26 samples because they appeared as outliers in the PCA results. The characteristics of the retained data are shown in Table 5 below.

Table 5. Reselected sample information

GSE number | Number of samples (control : senescent) | Senescence type (inducer) | Cell origin | Category (Tr: training set / Te: test set)
---|---|---|---|---
GSE63577 | 23 (12 : 11) | Replicative | BJ, HFF, IMR90, WI38 | Tr
GSE72407 | 22 (8 : 14) | OIS, TIS | IMR90 | Tr
GSE113060 | 10 (5 : 5) | OIS | IMR90 | Tr
GSE70668 | 6 (3 : 3) | OIS | IMR90 | Tr
GSE132370 | 6 (3 : 3) | TIS | IMR90 | Tr
GSE130727 | 20 (8 : 12) | Replicative, TIS, OIS | IMR90, WI38 | Tr/Te
GSE98440 | 8 (4 : 4) | Replicative | IMR90 | Te
GSE109700 | 9 (3 : 6) | Replicative | LF1 | Te
GSE130306 | 6 (3 : 3) | Replicative | WI38 | Te
GSE94395 | 6 (3 : 3) | TIS | IMR90 | Te
GSE168994 | 4 (2 : 2) | TIS | IMR90 | Te
GSE72404 | 12 (6 : 6) | OIS | IMR90 | Te
GSE75643 | 15 (9 : 6) | OIS | TIG3 | Te
Total | 147 (69 : 78) | - | - |

The partition into training and test sets is shown in Table 6 below.

Table 6. Partition of the training and test sets

Sample category | Cell type | Control | RS | OIS | TIS | Total
---|---|---|---|---|---|---
Training set | Fibroblasts | 37 | 13 | 15 | 14 | 79
Test set 1 | Fibroblasts | 32 | 15 | 12 | 9 | 68
Test set 2 | Other cell types | 15 | 7 | - | 11 | 33
Total | - | 84 | 35 | 27 | 34 | 180
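Tables 5 and 6 describe the same 147 fibroblast samples from two angles (per GEO series, and per split and inducer). The short cross-check below simply re-adds the counts transcribed from both tables; it is a bookkeeping sketch, not part of the described method.

```python
# Counts transcribed from Table 5: GSE number -> (control, senescent)
table5 = {
    "GSE63577": (12, 11), "GSE72407": (8, 14), "GSE113060": (5, 5),
    "GSE70668": (3, 3), "GSE132370": (3, 3), "GSE130727": (8, 12),
    "GSE98440": (4, 4), "GSE109700": (3, 6), "GSE130306": (3, 3),
    "GSE94395": (3, 3), "GSE168994": (2, 2), "GSE72404": (6, 6),
    "GSE75643": (9, 6),
}
# Counts transcribed from Table 6: split -> (control, RS, OIS, TIS)
table6 = {
    "training set": (37, 13, 15, 14),
    "test set 1": (32, 15, 12, 9),
    "test set 2": (15, 7, 0, 11),
}

controls = sum(c for c, _ in table5.values())
senescent = sum(s for _, s in table5.values())
print(controls, senescent, controls + senescent)  # 69 78 147

split_totals = {name: sum(row) for name, row in table6.items()}
print(split_totals)  # {'training set': 79, 'test set 1': 68, 'test set 2': 33}

# The fibroblast splits (training set + test set 1) should reproduce Table 5.
fibro = ("training set", "test set 1")
fibro_controls = sum(table6[k][0] for k in fibro)
fibro_senescent = sum(sum(table6[k][1:]) for k in fibro)
print(fibro_controls, fibro_senescent)  # 69 78
```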

3.2 Data preprocessing and meta-analysis

Data preprocessing and meta-analysis were performed in the same manner as in Example 1.2.
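Example 1.2 is not part of this excerpt, but later steps refer to TMM-normalized log₂ CPM values. A minimal sketch of the log₂ CPM step, assuming TMM normalization factors have already been computed elsewhere (for example by edgeR); the offsets follow the limma-voom convention log₂((count + 0.5) / (library size + 1) × 10⁶), which may differ slightly from the exact transformation the authors used.

```python
import numpy as np


def log2_cpm(counts, norm_factors=None):
    """log2 counts-per-million for a genes x samples count matrix.

    Effective library size = column total x normalization factor
    (e.g. TMM factors computed elsewhere; defaults to 1 for every sample).
    Offsets follow the voom convention to avoid log2(0).
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)
    if norm_factors is not None:
        lib_size = lib_size * np.asarray(norm_factors, dtype=float)
    return np.log2((counts + 0.5) / (lib_size + 1.0) * 1e6)


# Toy 2-gene x 2-sample count matrix (not data from this document).
counts = np.array([[10, 20], [990, 980]])
print(log2_cpm(counts).round(2))
```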

3.3 Exploratory data analysis

Exploratory data analysis was performed in the same manner as in Example 1.3. Principal component analysis (PCA) was performed on about 14,000 genes using Spearman correlation, and the results are shown in Figure 14.

Figure 14 shows the PCA results for the approximately 14,000 genes that remained after removing genes with low expression across all reselected samples.
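The filter-then-PCA step can be sketched as follows. This is an illustration under assumptions: the low-expression cut-off (CPM > 1 in at least 3 samples) is hypothetical because the excerpt does not state it, and the sketch runs ordinary PCA (SVD of gene-centred log₂ CPM) rather than any Spearman-based variant.

```python
import numpy as np


def filter_and_pca(logcpm, cpm_threshold=1.0, min_samples=3, n_components=2):
    """Drop lowly expressed genes, then run PCA on the samples.

    logcpm: genes x samples matrix of log2 CPM values.
    A gene is kept if CPM > cpm_threshold in at least min_samples samples
    (hypothetical thresholds; the document does not state its cut-off).
    Returns (kept-gene mask, sample scores, variance fraction per PC).
    """
    logcpm = np.asarray(logcpm, dtype=float)
    keep = (logcpm > np.log2(cpm_threshold)).sum(axis=1) >= min_samples
    x = logcpm[keep].T                 # samples x genes
    x = x - x.mean(axis=0)             # centre each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return keep, scores, explained[:n_components]


# Toy data: 50 genes x 8 samples, the first 5 genes effectively silent.
rng = np.random.default_rng(0)
data = rng.normal(5, 1, size=(50, 8))
data[:5] = -3.0
keep, scores, ev = filter_and_pca(data)
print(keep.sum(), scores.shape, ev.round(3))
```

The "variance explained by PC1" discussed later for Figure 18 corresponds to `ev[0]` in this sketch.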

3.4 Differentially expressed gene (DEG) analysis

Differentially expressed genes (DEGs) were selected for each inducer type in the same manner as in Example 1.4. By inducer type, 1,463 genes (881 up-regulated, 582 down-regulated) were selected as DEGs in RS, 2,039 genes (791 up-regulated, 1,248 down-regulated) in OIS, and 1,903 genes (1,238 up-regulated, 665 down-regulated) in TIS, as shown in Table 7 below.

Table 7. Differentially expressed gene analysis

Inducer type | Comparison (control : senescent) | Up-regulated genes | Down-regulated genes | Total DEGs
---|---|---|---|---
RS | 13 : 13 | 881 | 582 | 1,463
OIS | 14 : 15 | 791 | 1,248 | 2,039
TIS | 14 : 14 | 1,238 | 665 | 1,903

To select genes characteristic of cellular senescence regardless of inducer type, the DEG results for each inducer were split into up-regulated and down-regulated genes, and the genes common to all inducers were identified from Venn diagrams, which are shown in Figure 15.

Figure 15 shows Venn diagrams of the number of DEGs shared across all inducer types; a: up-regulated genes, b: down-regulated genes.

As shown in Figure 15, 465 genes were selected in total (181 up-regulated and 284 down-regulated), indicating that these genes are differentially expressed in senescent cells regardless of the inducer type.
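The Venn-diagram step amounts to a set intersection taken separately for each direction of change. A minimal sketch with hypothetical gene identifiers (g1, g2, ...), not the document's gene lists:

```python
def common_degs(degs_by_inducer):
    """Genes called DEGs in the same direction under every inducer.

    degs_by_inducer maps inducer name -> {"up": set, "down": set}.
    A gene that is up under one inducer and down under another ends up
    in neither result.
    """
    ups = [d["up"] for d in degs_by_inducer.values()]
    downs = [d["down"] for d in degs_by_inducer.values()]
    return set.intersection(*ups), set.intersection(*downs)


# Hypothetical DEG calls for the three inducer types.
toy = {
    "RS":  {"up": {"g1", "g2", "g3"}, "down": {"g7", "g8"}},
    "OIS": {"up": {"g1", "g2"},       "down": {"g7", "g9"}},
    "TIS": {"up": {"g1", "g2", "g5"}, "down": {"g7"}},
}
up, down = common_degs(toy)
print(sorted(up), sorted(down))  # ['g1', 'g2'] ['g7']
```

Applied to the Table 7 DEG lists, the two returned sets would correspond to the 181 up-regulated and 284 down-regulated genes.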

Next, to confirm that the selected common genes form a gene set that reflects the characteristics of cellular senescence, gene ontology (GO) analysis was performed with gProfiler (https://biit.cs.ut.ee/gprofiler/) to explore the biological pathways associated with the 465 genes. GO analysis was run separately for the up-regulated and down-regulated genes, only terms with an adjusted p-value of 0.05 or less were retained, and the results are shown in Figure 16.
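The filter above keeps GO terms whose adjusted p-value is at most 0.05. The document does not say which multiple-testing correction was applied (g:Profiler's default is its own g:SCS method), so the sketch below uses the familiar Benjamini-Hochberg adjustment purely as an example, on hypothetical term names and p-values:

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up, kept monotone)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):        # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical enrichment p-values for four GO terms.
terms = {"term_A": 0.001, "term_B": 0.02, "term_C": 0.03, "term_D": 0.5}
adj = bh_adjust(list(terms.values()))
significant = [t for t, q in zip(terms, adj) if q <= 0.05]
print(significant)  # ['term_A', 'term_B', 'term_C']
```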

Figure 16 shows the GO analysis results for the selected common genes; a: up-regulated genes, b: down-regulated genes.

As shown in Figure 16, the 181 genes commonly up-regulated in senescent cells were significantly associated with known hallmarks of cellular senescence, including the senescence-associated secretory phenotype (SASP), chromatin organization, immune response, and several cellular senescence pathways. The 284 genes commonly down-regulated in senescent cells were associated with pathways related to the cell cycle, DNA replication, DNA repair, and telomere maintenance and organization, which are likewise known features of cellular senescence. Together, these results confirm that the set of 465 intersecting genes reflects the characteristics of cellular senescence well.

3.5 Variable selection using LASSO (least absolute shrinkage and selection operator)

To select, from the 465 genes, the minimal set that best represents the characteristics of cellular senescence, the same LASSO-based variable selection as in Example 1.5 was used.

Specifically, LASSO regression was performed with the R package glmnet 4.0-2 on the TMM-normalized log₂ CPM (counts per million) values of the 79 training samples. Each sample was labeled Control (= 0) or Senescent (= 1) according to its state, and the parameters of the binary classification model were tuned. The optimal lambda (λ) was chosen by leave-one-out cross-validation, the model was fitted, and 17 genes were finally selected: ENSG00000106462 (EZH2), ENSG00000120802 (TMPO), ENSG00000126016 (AMOT), ENSG00000139354 (GAS2L3), ENSG00000054654 (SYNE2), ENSG00000164104 (HMGB2), ENSG00000164649 (CDCA7L), ENSG00000092470 (WDR76), ENSG00000273802 (H2BC8), ENSG00000102996 (MMP15), ENSG00000197142 (ACSL5), ENSG00000135218 (CD36), ENSG00000075651 (PLD1), ENSG00000003137 (CYP26B1), ENSG00000186088 (GSAP), ENSG00000197903 (H2BC12), and ENSG00000180573 (H2AC6). The coefficient of each gene indicates the direction (+/-) and strength of its association with cellular senescence. Nine genes (H2BC8, MMP15, ACSL5, CD36, PLD1, CYP26B1, GSAP, H2BC12, H2AC6) have positive coefficients, indicating increased expression in senescent cells, and the remaining eight genes (EZH2, TMPO, AMOT, GAS2L3, SYNE2, HMGB2, CDCA7L, WDR76) have negative coefficients, indicating decreased expression in senescent cells.
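The selection step can be sketched in Python for readers without an R setup. This is an illustration under stated assumptions, not the authors' code: they used glmnet 4.0-2, whereas this sketch uses scikit-learn's L1-penalized logistic regression, in which C acts as an inverse penalty strength and is not numerically interchangeable with glmnet's λ. The candidate C grid, the feature standardization, and the use of leave-one-out accuracy as the selection criterion are all hypothetical choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.preprocessing import StandardScaler


def lasso_select(X, y, gene_names, Cs=(0.01, 0.05, 0.1, 0.5, 1.0, 5.0)):
    """L1-penalized logistic regression with C chosen by leave-one-out accuracy.

    X: samples x genes (e.g. log2 CPM), y: 0 = control, 1 = senescent.
    Returns {gene: coefficient} for genes with non-zero coefficients.
    """
    Xs = StandardScaler().fit_transform(X)

    def loo_accuracy(C):
        model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        return cross_val_score(model, Xs, y, cv=LeaveOneOut()).mean()

    # Cs is ascending, so max() breaks ties toward the sparser (smaller C) model.
    best_C = max(Cs, key=loo_accuracy)
    final = LogisticRegression(penalty="l1", solver="liblinear", C=best_C)
    final.fit(Xs, y)
    coefs = final.coef_.ravel()
    return {g: float(c) for g, c in zip(gene_names, coefs) if c != 0.0}


# Synthetic data: gene0 rises and gene1 falls in "senescent" samples.
rng = np.random.default_rng(1)
n, p = 40, 30
y = np.array([0] * 20 + [1] * 20)
X = rng.normal(size=(n, p))
X[:, 0] += 3 * y
X[:, 1] -= 3 * y
genes = [f"gene{i}" for i in range(p)]
sel = lasso_select(X, y, genes)
print(sel)
```

As in the document, the sign of each surviving coefficient reads as the direction of change in senescent samples.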

The results of the LASSO-based variable selection are shown in Table 8 and Figure 17.

Table 8. Cellular senescence signature genes selected by LASSO regression

Ensembl gene ID | Symbol | Coefficient | Description
---|---|---|---
ENSG00000106462 | EZH2 | -0.26 | Enhancer Of Zeste 2 Polycomb Repressive Complex 2 Subunit
ENSG00000120802 | TMPO | -0.26 | Thymopoietin
ENSG00000126016 | AMOT | -0.17 | Angiomotin
ENSG00000139354 | GAS2L3 | -0.09 | Growth Arrest Specific 2 Like 3
ENSG00000054654 | SYNE2 | -0.09 | Spectrin Repeat Containing Nuclear Envelope Protein 2
ENSG00000164104 | HMGB2 | -0.08 | High Mobility Group Box 2
ENSG00000164649 | CDCA7L | -0.05 | Cell Division Cycle Associated 7 Like
ENSG00000092470 | WDR76 | -0.001 | WD Repeat Domain 76
ENSG00000273802 | H2BC8 | 0.005 | H2B Clustered Histone 8
ENSG00000102996 | MMP15 | 0.04 | Matrix Metallopeptidase 15
ENSG00000197142 | ACSL5 | 0.06 | Acyl-CoA Synthetase Long Chain Family Member 5
ENSG00000135218 | CD36 | 0.09 | CD36 Molecule
ENSG00000075651 | PLD1 | 0.11 | Phospholipase D1
ENSG00000003137 | CYP26B1 | 0.14 | Cytochrome P450 Family 26 Subfamily B Member 1
ENSG00000186088 | GSAP | 0.22 | Gamma-Secretase Activating Protein
ENSG00000197903 | H2BC12 | 0.24 | H2B Clustered Histone 12
ENSG00000180573 | H2AC6 | 0.78 | H2A Clustered Histone 6
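Reading Table 8 by coefficient sign gives the up- and down-regulated groups stated in the text, and ranking by absolute value gives the relative influence of each gene. A short sketch using the coefficients transcribed from Table 8:

```python
# Coefficients transcribed from Table 8.
coefficients = {
    "EZH2": -0.26, "TMPO": -0.26, "AMOT": -0.17, "GAS2L3": -0.09,
    "SYNE2": -0.09, "HMGB2": -0.08, "CDCA7L": -0.05, "WDR76": -0.001,
    "H2BC8": 0.005, "MMP15": 0.04, "ACSL5": 0.06, "CD36": 0.09,
    "PLD1": 0.11, "CYP26B1": 0.14, "GSAP": 0.22, "H2BC12": 0.24,
    "H2AC6": 0.78,
}

# Positive coefficient: higher expression pushes toward "senescent".
up = sorted((g for g, c in coefficients.items() if c > 0), key=coefficients.get)
# Negative coefficient: lower expression pushes toward "senescent".
down = sorted((g for g, c in coefficients.items() if c < 0), key=coefficients.get)
print(len(up), len(down))  # 9 8

# Rank genes by absolute coefficient (largest influence first).
by_influence = sorted(coefficients, key=lambda g: abs(coefficients[g]), reverse=True)
print(by_influence[0])  # H2AC6
```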

Figure 17 is a bar plot of the coefficient values of the 17 genes selected by LASSO.

Next, PCA was performed on the 79 training samples using the 17 LASSO-selected genes, as in section 3.3, and the samples were also clustered; the results are shown in Figures 18 and 19, respectively.

Figure 18 shows the PCA results for the 17 genes selected by LASSO.

Figure 19 shows the clustering results for the 17 genes selected by LASSO.

As shown in Figure 18, compared with the PCA results in Figure 14, PCA on the 17 LASSO-selected genes clearly separates senescent and control samples along PC1, and the proportion of variance explained by PC1 also increased.

As shown in Figure 19, clustering on the 17 LASSO-selected genes shows expression patterns that differ clearly between senescent and control samples and cleanly separates the two groups.
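The excerpt does not state the clustering method behind Figure 19. As one plausible sketch (an assumption, not the authors' procedure), samples can be clustered hierarchically on per-gene z-scores with average linkage and Euclidean distance, then cut into two clusters to compare against the control/senescent labels:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_samples(expr, n_clusters=2, method="average"):
    """Hierarchically cluster samples on per-gene z-scores.

    expr: samples x genes matrix (e.g. log2 CPM of the selected genes).
    Returns integer cluster labels (1..n_clusters), one per sample.
    """
    expr = np.asarray(expr, dtype=float)
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    tree = linkage(z, method=method, metric="euclidean")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Synthetic example: 6 "control" and 6 "senescent" samples over 17 genes.
rng = np.random.default_rng(2)
ctrl = rng.normal(0, 0.5, size=(6, 17))
sen = rng.normal(4, 0.5, size=(6, 17))
labels = cluster_samples(np.vstack([ctrl, sen]))
print(labels)
```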

These results indicate that the 17 genes selected by LASSO reflect the characteristics of senescent cells well within the training data.

Example 4. Validation of the Selected Senescent Cell Marker Candidate Genes

4.1 Differences in gene expression between the control and senescent groups in the test data

We examined whether the 17 senescent cell marker candidate genes selected in Example 3 also showed significant expression differences in the 68 test samples, which were not used to select the 17 genes.

Specifically, the 68 test samples were processed in the same way as the training data: TMM normalization, batch-effect removal with the limma removeBatchEffect function, and log₂ CPM transformation. The transformed values were used to test whether the expression of the 17 genes differed significantly between the control group (n = 32) and the senescent group (n = 36); the results are shown in Figure 20.
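The batch-correction step can be illustrated with a simplified additive model: each gene is shifted so that every batch (here, e.g., each GEO series) has the same mean. To our understanding this corresponds to limma's removeBatchEffect called with a single batch factor and no design matrix; it is a sketch, not a reimplementation, and it does not protect a biological design (control vs. senescent), which the document does not say whether the authors supplied.

```python
import numpy as np


def remove_batch_means(expr, batches):
    """Remove additive batch effects from a genes x samples matrix.

    Each gene is shifted so that every batch mean equals the (unweighted)
    average of that gene's batch means. No biological covariates are kept.
    """
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    levels = np.unique(batches)
    batch_means = np.stack(
        [expr[:, batches == b].mean(axis=1) for b in levels], axis=1
    )
    target = batch_means.mean(axis=1, keepdims=True)
    out = expr.copy()
    for j, b in enumerate(levels):
        out[:, batches == b] -= batch_means[:, [j]] - target
    return out


# Toy example: batch B is shifted up by 5 units in every gene.
rng = np.random.default_rng(3)
x = rng.normal(size=(4, 6))
batches = ["A", "A", "A", "B", "B", "B"]
x[:, 3:] += 5.0
corrected = remove_batch_means(x, batches)
print(corrected.round(2))
```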

Figure 20 shows box plots of the expression of the 17 LASSO-selected genes in the test set (n = 68); a: distribution in the test set of the 9 genes with positive coefficients, b: distribution in the test set of the 8 genes with negative coefficients.

4.2 Construction of cellular senescence prediction models

To verify the usefulness of the 17 selected genes, five models that predict the senescent state of cells were built on the training data set: one using the 17 selected genes and four using gene sets from previous studies for comparison.

Specifically, the first model was a support vector machine (SVM) with an RBF kernel, built with the R package e1071 (version 1.7-3). It was trained on the expression values of the 17 genes in the 79 training samples, and the optimal classifier was selected by leave-one-out cross-validation (LOOCV). The second to fifth models were SVMs built in the same way as the first, but trained on the expression values of senescence-associated genes reported in other studies of cellular senescence [Hernandez-Segura et al. 2017, Kiss et al. 2020, Park et al. 2021, Casella et al. 2019]. These four models were built to compare their senescent cell classification performance with that of the first model.
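A Python analogue of the first model, assuming scikit-learn in place of R's e1071: an RBF-kernel SVM whose C and gamma are chosen by leave-one-out accuracy. The grid values are hypothetical, since the excerpt does not list the tuned parameters.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.svm import SVC


def fit_rbf_svm(X_train, y_train):
    """RBF-kernel SVM with C and gamma chosen by leave-one-out accuracy.

    X_train: samples x genes, y_train: 0 = control, 1 = senescent.
    The best parameter pair is refit on all training samples.
    """
    grid = {"C": [0.1, 1, 10, 100], "gamma": ["scale", 0.01, 0.1, 1.0]}
    search = GridSearchCV(
        SVC(kernel="rbf"), grid, cv=LeaveOneOut(), scoring="accuracy"
    )
    search.fit(X_train, y_train)
    return search.best_estimator_


# Synthetic training data: 15 controls and 15 senescent samples, 17 genes.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-2, 1, size=(15, 17)), rng.normal(2, 1, size=(15, 17))])
y = np.array([0] * 15 + [1] * 15)
model = fit_rbf_svm(X, y)
print(model.get_params()["C"], model.get_params()["gamma"])
```

The continuous output of `model.decision_function(...)` is what an ROC curve such as those in Figure 21 would be drawn from.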

4.3 Performance evaluation of the cellular senescence prediction models

The five models built in section 4.2 were used to classify the 68 senescent fibroblast test samples (32 control, 36 senescent), and the results are shown in Figure 21a. The five models were also used to classify 33 additional samples from various non-fibroblast cell types (HUVEC, HAEC, MSC, LS8817, and Ovcar3; 15 control, 18 senescent), and the results are shown in Figure 21b. Before testing, the log₂ CPM values of the test data were scaled using the mean and standard deviation of the corresponding expression values in the training data.
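Scaling the test data with training statistics, rather than with its own, keeps the test samples on the scale the model was fitted on. A minimal sketch; the sample standard deviation (ddof=1) matches R's sd(), and leaving constant genes unscaled is an assumption added to avoid division by zero:

```python
import numpy as np


def scale_with_training_stats(train, test):
    """Z-score train and test (samples x genes) using training mean/SD only."""
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    mu = train.mean(axis=0)
    sd = train.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant genes unscaled instead of dividing by 0
    return (train - mu) / sd, (test - mu) / sd


# Toy example: the test sample sits exactly at the training mean.
train = np.array([[1.0, 2.0], [3.0, 4.0]])
test = np.array([[2.0, 3.0]])
train_s, test_s = scale_with_training_stats(train, test)
print(test_s)  # [[0. 0.]]
```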

Figure 21 is a graph showing ROC curves for evaluating the performance of the five cellular senescence classification models; a: test data of one embodiment, b: additional data from various senescent cell types.

As shown in Figure 21, the 17-gene model selected by LASSO performed as well as or better than the other models in both evaluations: it tied with the Casella et al. model on the fibroblast test data and achieved the highest AUC on the data from other cell types.

a: LASSO 17-gene model AUC: 100% = Casella et al. model AUC: 100% > Hernandez et al. model AUC: 99.7% > Kiss et al. model AUC: 99.5% > Park et al. model AUC: 98%

b: LASSO 17-gene model AUC: 98.9% > Casella et al. model AUC: 98.1% > Kiss et al. model AUC: 86.3% > Hernandez et al. model AUC: 82.6% > Park et al. model AUC: 70.7%

Therefore, the results of Example 4 confirm that the 17 genes selected according to one embodiment differ significantly in expression between the normal and senescent cell groups, in the direction given by the sign of their LASSO coefficients. The senescent cell classification model built on these 17 genes also captured the characteristics of cellular senescence better than models built on the senescence-associated genes reported in previous studies.

Although the method according to one embodiment selected genes using only senescent fibroblasts as training data, it showed better classification performance than previous studies across various cell types.

These results indicate that the method according to one embodiment can select statistically more significant markers in various cell types, including fibroblasts, and that the 17 genes it yields can be useful for identifying or detecting senescent cells.

Example 5. Expression Changes of the Selected Biomarkers in Senescence-Induced Cells

We examined whether the expression of GAS2L3, AMOT, and WDR76, three of the 17 genes selected in Example 3, actually changes in cells in which senescence has been induced.

5.1 Induction and confirmation of cellular senescence

Human umbilical vein endothelial cells (HUVECs; LONZA, HUVEC, C2519A) were cultured in EGM-2 medium (LONZA, EGM-2 Endothelial Cell Growth Medium-2 BulletKit, CC-3162). Senescence was induced by serial passaging, comparing cells at different passage numbers. HUVECs at passage 3 (P3) and passage 7 (P7) were cultured and stained for senescence-associated β-galactosidase (SA-β-Gal; Cell Signaling, Senescence β-Galactosidase Staining Kit, #9860) to assess their degree of senescence by passage. The staining results are shown in Figure 22.

As shown in Figure 22, P7 cells stained a darker and more extensive blue than P3 cells, indicating that they were more senescent.

5.2 Expression changes of GAS2L3, AMOT, and WDR76

RNA was extracted from P3 and P7 cells with TRIzol (Invitrogen, TRIzol Reagent, 15596018), and complementary DNA (cDNA) was synthesized with a cDNA synthesis kit (LaboPass, cDNA synthesis kit, SMRTK002). The GAS2L3, AMOT, and WDR76 genes were then specifically amplified with the primers in Table 9 below and a PCR premix (SolGent, Solg 2X Multiplex PCR Smart mix, SMP01-M25P), and GAPDH was used to normalize the amount of cellular cDNA. A thermal cycler (Applied Biosystems, MiniAmp Plus Thermal Cycler, A37835) was used for these reactions.

After PCR, the expression levels of GAS2L3, AMOT, and WDR76 were checked by electrophoresis; the results are shown in Figure 23.

To compare expression levels quantitatively, quantitative PCR (qPCR) was also performed on the cDNA obtained above, using the primers in Table 9, a SYBR Green qPCR reagent (Applied Biosystems, SYBR Green PCR Master Mix, 2109533), and a real-time PCR system (Applied Biosystems, StepOne Real-Time PCR System, 4376357). The qPCR results are shown in Figure 24.

Table 9. Primer sequences

Gene | Primer direction | Sequence (5' to 3') | SEQ ID NO
---|---|---|---
AMOT | Forward | AAA CGT GAG CAG CTA GAG CAC | 1
AMOT | Reverse | TGT GTC CCT CTG AGC AGC CA | 2
GAS2L3 | Forward | ATC TGG GCT CTA TGT CAG TCC G | 3
GAS2L3 | Reverse | TTG GCA AAA GTG TGG ACT CGG | 4
WDR76 | Forward | CTG CTG CAA GAC TCC GTG AA | 5
WDR76 | Reverse | GCC CAG GAG GTA ACA AAG GAG | 6
GAPDH | Forward | ATC CCA TCA CCA TCT TCC | 7
GAPDH | Reverse | CCA TCA CGC CAC GAT TTC | 8

As shown in Figure 23, the bands for GAS2L3, AMOT, and WDR76 were stronger in P3 than in P7, confirming that the more senescent P7 cells expressed less GAS2L3, AMOT, and WDR76 than P3 cells.

As shown in Figure 24, the qPCR results likewise confirmed that GAS2L3, AMOT, and WDR76 expression decreased in P7 cells.
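The excerpt does not state how the qPCR signal was quantified. A common approach is the 2^-ΔΔCt method, here assumed with GAPDH as the reference gene and P3 cells as the calibrator; it presumes roughly equal, near-100% amplification efficiency for all primer pairs. The Ct values below are hypothetical, and the final line applies the "both markers lower than control" comparison only as an illustration:

```python
def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """2^-ddCt fold change of a target gene in a sample vs. a calibrator.

    ct_ref / ct_ref_cal are the reference-gene (e.g. GAPDH) Ct values.
    Assumes ~100% amplification efficiency for both primer pairs.
    """
    d_ct_sample = ct_target - ct_ref
    d_ct_cal = ct_target_cal - ct_ref_cal
    return 2.0 ** -(d_ct_sample - d_ct_cal)


# Hypothetical Ct values (not from this document): P7 sample vs. P3 calibrator.
fold = {
    "GAS2L3": relative_expression(27.0, 18.0, 25.0, 18.0),
    "WDR76": relative_expression(26.5, 18.0, 25.5, 18.0),
}
senescent_call = all(f < 1.0 for f in fold.values())
print(fold, senescent_call)  # {'GAS2L3': 0.25, 'WDR76': 0.5} True
```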

These results show that GAS2L3, AMOT, and WDR76 expression decreases in cells that have actually become senescent, and that the GAS2L3, AMOT, and WDR76 genes can be used as biomarkers of senescent cells.

Claims

1. A method for detecting cellular senescence, comprising measuring the expression or activity levels of the genes or proteins of GAS2L3 and WDR76 in isolated endothelial cells.

2. (Deleted)

3. The method of claim 1, further comprising determining that cellular senescence has occurred when the expression or activity levels of the GAS2L3 and WDR76 genes or proteins are lower than those of a normal control sample.