CN115862876B

CN115862876B - Device for predicting prognosis of lung adenocarcinoma patient based on immune microenvironment gene group

Info

Publication number: CN115862876B
Application number: CN202310186412.8A
Authority: CN
Inventors: 樊小龙; 朱明瑾; 李玖一; 张韵秋
Original assignee: Beijing Normal University
Current assignee: Beijing Tongcheng Rongxin Technology Co.,Ltd.
Priority date: 2023-03-02
Filing date: 2023-03-02
Publication date: 2023-05-26
Anticipated expiration: 2043-03-02
Also published as: CN115862876A

Abstract

The invention discloses a device and a computer-readable storage medium, in the field of diagnostics, for predicting prognosis and guiding medication of lung adenocarcinoma patients based on an immune microenvironment gene group. The technical problem to be solved is how to predict the prognosis of a lung cancer patient or guide the patient's medication. The invention provides a device for predicting the prognosis of a lung adenocarcinoma patient to be tested, comprising an immune subtype model construction module, an immune subtyping module and a prognosis prediction module. The patient to be tested is assigned an immune subtype based on 75 genes of the immune microenvironment gene group, and the prognosis of the patient is predicted, or the patient's clinical medication is guided, according to that immune subtype. The device can effectively predict the prognosis of lung adenocarcinoma patients and can avoid the overuse of chemotherapy, which spares patients the serious side effects of chemotherapy and also reduces the economic burden on patients and society.

Description

Device for predicting prognosis of lung adenocarcinoma patient based on immune microenvironment gene group

Technical Field

The invention relates to a device and a computer-readable storage medium, in the field of diagnostics, for predicting prognosis and guiding medication of lung adenocarcinoma patients based on an immune microenvironment gene group.

Background

Non-small cell lung cancer accounts for 80% of all lung cancers. Of non-small cell lung cancers, 50-60% are lung adenocarcinoma, and about 30% of newly diagnosed non-small cell lung cancers are early-to-mid-stage (I-IIIB) adenocarcinoma. Owing to population aging and air pollution, the incidence of every lung cancer subtype is still rising, which seriously endangers public health. Lung adenocarcinoma originates from bronchial mucosal epithelial cells, a few mucinous glands, originating from large bronchi, are well developed in the elderly.

Owing to the spread of next-generation sequencing and in-depth study of the pathogenesis of lung adenocarcinoma, DNA-level detection of genomic variation is widely applied in the diagnosis and treatment of advanced lung adenocarcinoma, and a relatively standardized treatment strategy has been established. Lung cancer patients carrying genetic variations such as EGFR mutation, ALK fusion, ROS1 fusion, MET mutation or BRAF/RAS/MEK1 mutation can receive targeted therapy. Advanced lung cancer patients who cannot receive targeted therapy are treated mainly with platinum-based chemotherapy, but its effect is limited: the 5-year survival rate improves by only 5%, and the side effects are severe, often including serious myelosuppression that leads to neutropenia and serious infection. In recent years, immune checkpoint inhibitors have brought good news for lung cancer patients, but individual responses vary widely and fewer than 25% of patients benefit.

Early detection and early diagnosis of lung cancer can significantly prolong patient survival. At the population level, the 5-year survival rate after surgery for stage I lung cancer is between 58% and 73%. With the spread of physical examinations and early screening, lung cancer diagnosed in recent years is increasingly early-to-mid-stage disease. However, conventional diagnosis and molecular pathology testing in the prior art cannot provide an objective and accurate individualized risk assessment for early-to-mid-stage lung cancer, nor can they predict its response to chemotherapy. Existing molecular typing studies have found gene networks related to cell proliferation and the extracellular matrix that correlate directly with prognosis, but these have not been effectively applied to individualized risk assessment or to prediction of chemotherapy response in early-to-mid-stage lung adenocarcinoma.

Disclosure of Invention

The technical problem to be solved by the invention is how to classify lung cancer patients, and/or how to guide their medication, and/or how to predict their prognosis, and/or how to assess their risk.

To solve the above technical problems, the invention first provides a device for predicting the prognosis of a lung adenocarcinoma patient to be tested, which may include the following modules:

M1) immunotyping model building module: for obtaining an immunotyping model that predicts the immune subtype of a single-sample lung adenocarcinoma patient, based on expression profile data of 75 genes of a known lung adenocarcinoma sample set, hierarchical clustering and an SVM algorithm;

M2) immunotyping module: for predicting the immune subtype of the lung adenocarcinoma patient to be tested using the immunotyping model, based on expression profile data of the 75 genes of the lung adenocarcinoma patient to be tested;

M3) prognosis prediction module: for predicting the prognosis of the lung adenocarcinoma patient to be tested based on the immune subtype.

The 75 genes may be genes as shown below:

ACKR1, AIF1, ALOX5, C1QB, CAPG, CCL5, CD14, CD2, CD247, CD37, CD3D, CD72, CD8A, CFD, CORO1A, CST7, CTSW, CYBA, DENND3, FCER1G, GMFG, GZMA, GZMB, HCK, IL21R, IL2RG, LCK, LDLRAP1, LST1, LTB, MAFB, MZB1, NCF1, NCF4, NKG7, PILRA, PSMB10, PTPN7, PYCARD, RAC2, RRAS, SH3BGRL3, SIGLEC7, SLAMF8, STARD5, THEMIS2, TNFRSF1B, TYROBP, BRIX1, BZW2, CABYR, CKS1B, CTBP2, DARS2, DDX1, ECT2, EEF1E1, EIF3J, EIF5B, HSPD1, HSPE1, MRPL19, MRPS16, MTFR1, MTMR2, NDUFS1, NOL7, NOLC1, NUDT15, SLC7A11, SNRPE, SRPK1, TFB2M, UGDH, YWHAE.

The immune subtypes may include a high-immunity subtype (Immune High) and a low-immunity subtype (Immune Low). The Immune High subtype may be an immune subtype with high immune activity; the Immune Low subtype may be an immune subtype with low immune activity.

The prognosis of a lung adenocarcinoma patient to be tested whose immune subtype is Immune High may be better than that of a patient whose immune subtype is Immune Low.

The 75 genes may specifically be the following genes: ACKR1 (GenBank No. 2532, update date: 2022-12-5), AIF1 (GenBank No. 199, update date: 2022-12-4), ALOX5 (GenBank No. 240, update date: 2022-12-8), BRIX1 (GenBank No. 55299, update date: 2022-12-4), BZW2 (GenBank No. 28969, update date: 2022-8-12), C1QB (GenBank No. 713, update date: 2022-12-13), CABYR (GenBank No. 26256, update date: 2022-8-12), CAPG (GenBank No. 822, update date: 2022-12-8), CCL5 (GenBank No. 6352, update date: 2022-12-13), CD14 (GenBank No. 929, update date: 2022-12-8), CD2 (GenBank No. 914, update date: 2022-12-8), CD247 (GenBank No. 919, update date: 2022-12-21), CD37 (GenBank No. 951, update date: 2022-9-22), CD3D (GenBank No. 915, update date: 2022-12-8), CD72 (GenBank No. 971), CD8A, CFD, CKS1B, CORO1A, CST7, CTBP2 (GenBank No. 1488, update date: 2022-12-4), CTSW (GenBank No. 1521, update date: 2022-9-18), CYBA (GenBank No. 1535, update date: 2022-12-13), DARS2 (GenBank No. 55157, update date: 2022-11-6), DDX1 (GenBank No. 1653, update date: 2022-12-8), DENND3 (GenBank No. 22898, update date: 2022-12-8), ECT2 (GenBank No. 1894, update date: 2022-12-13), EEF1E1 (GenBank No. 9521, update date: 2022-12-4), EIF3J (GenBank No. 8669, update date: 2022-12-8), EIF5B (GenBank No. 9669, update date: 2022-12-8), FCER1G (GenBank No. 2207, update date: 2022-9-22), GMFG (GenBank No. 9535, update date: 2022-12-8), GZMA (GenBank No. 3001, update date: 2022-9-9), GZMB (GenBank No. 3002, update date: 2022-12-13), HCK, HSPD1, HSPE1, IL21R, IL2RG, LCK, LDLRAP1 (GenBank No. 26119, update date: 2022-11-5), LST1 (GenBank No. 7940, update date: 2022-9-22), LTB (GenBank No. 4050, update date: 2022-12-9), MAFB (GenBank No. 9935, update date: 2022-12-8), MRPL19 (GenBank No. 9801, update date: 2022-12-8), MRPS16 (GenBank No. 51021, update date: 2022-12-21), MTFR1 (GenBank No. 9650, update date: 2022-12-17), MTMR2 (GenBank No. 8898, update date: 2022-12-8), MZB1 (GenBank No. 51237, update date: 2022-8-12), NCF1 (GenBank No. 653361, update date: 2022-12-8), NCF4 (GenBank No. 4689, update date: 2022-12-8), NDUFS1 (GenBank No. 4719, update date: 2022-12-8), NKG7 (GenBank No. 4818, update date: 2022-9-22), NOL7, NOLC1 (GenBank No. 9221, update date: 2022-12-4), NUDT15, PILRA, PSMB10, PTPN7, PYCARD, RAC2 (GenBank No. 5880, update date: 2022-12-8), RRAS (GenBank No. 6237, update date: 2022-12-8), SH3BGRL3 (GenBank No. 83442, update date: 2022-12-4), SIGLEC7 (GenBank No. 27036, update date: 2022-12-18), SLAMF8 (GenBank No. 56833, update date: 2022-9-22), SLC7A11 (GenBank No. 23657, update date: 2022-12-13), SNRPE (GenBank No. 6635, update date: 2022-12-8), SRPK1 (GenBank No. 6732, update date: 2022-12-8), STARD5 (GenBank No. 80765, update date: 2022-12-8), TFB2M (GenBank No. 64216, update date: 2022-12-8), THEMIS2 (GenBank No. 9473, update date: 2022-12-8), TNFRSF1B (GenBank No. 7133, update date: 2022-11-6), TYROBP (GenBank No. 7305, update date: 2022-10-9), UGDH (GenBank No. 7358, update date: 2022-12-4) and YWHAE (GenBank No. 7531).

The immunotyping model building module of M1) may be established by a method comprising the following steps:

M1-1) performing hierarchical clustering on the expression profile data of the 75 genes of the known lung adenocarcinoma sample set to obtain immunotyping results for the known lung adenocarcinoma samples;

M1-2) using an SVM algorithm, based on the expression profile data of the 75 genes of the known lung adenocarcinoma sample set and the immunotyping results, to obtain an immunotyping model that predicts the immune subtype of a single-sample lung adenocarcinoma patient.

The SVM algorithm may be the one provided by sklearn (scikit-learn), a machine learning package for the Python language. The number of samples in the known lung adenocarcinoma sample set may be greater than 100.
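Step M1-1 can be illustrated with a minimal Python sketch. It assumes SciPy's `linkage` and `fcluster` for the hierarchical clustering, Ward linkage on per-gene z-scores, and synthetic expression values. The source does not state the linkage method, normalization or distance metric, so those choices, the function name `cluster_two_subtypes` and the simulated data are illustrative assumptions only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def cluster_two_subtypes(expr, n_high_genes):
    """Split samples into two groups by hierarchical clustering.

    expr: samples x genes matrix. The first n_high_genes columns are assumed
    to be the genes expected to be highly expressed in Immune High samples.
    """
    # Standardize each gene so that no single gene dominates the distances.
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    tree = linkage(z, method="ward")  # assumption: Ward linkage
    groups = fcluster(tree, t=2, criterion="maxclust")  # cluster ids 1 and 2
    # Name the cluster with the higher mean of the "high" genes Immune High.
    mean_1 = z[groups == 1, :n_high_genes].mean()
    mean_2 = z[groups == 2, :n_high_genes].mean()
    high_group = 1 if mean_1 > mean_2 else 2
    return np.where(groups == high_group, "Immune High", "Immune Low")


# Synthetic stand-in for a known sample set (more than 100 samples, 75 genes).
rng = np.random.default_rng(0)
n_per_group, n_high, n_low = 60, 48, 27
high = np.hstack([rng.normal(2.0, 0.5, (n_per_group, n_high)),
                  rng.normal(-2.0, 0.5, (n_per_group, n_low))])
low = np.hstack([rng.normal(-2.0, 0.5, (n_per_group, n_high)),
                 rng.normal(2.0, 0.5, (n_per_group, n_low))])
expr = np.vstack([high, low])

labels = cluster_two_subtypes(expr, n_high)
```

The resulting labels are what step M1-2 would pass to the SVM as training targets.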

In the above device, the lung adenocarcinoma patient may be an early-to-mid-stage lung adenocarcinoma patient.

The prognosis may be overall survival, such as overall survival of more than 5 years.

In the above device, the lung adenocarcinoma patient may be a lung adenocarcinoma patient with a smoking history.

The lung adenocarcinoma patient with a smoking history may be a stage I, stage II or stage IIIA lung adenocarcinoma patient.

To solve the above technical problems, the invention also provides a device for guiding the medication of a lung adenocarcinoma patient to be tested, which may include the following modules:

N1) immunotyping model building module: for obtaining an immunotyping model that predicts the immune subtype of a single-sample lung adenocarcinoma patient, based on expression profile data of 75 genes of a known lung adenocarcinoma sample set, hierarchical clustering and an SVM algorithm;

N2) immunotyping module: for predicting the immune subtype of the lung adenocarcinoma patient to be tested using the immunotyping model, based on expression profile data of the 75 genes of the lung adenocarcinoma patient to be tested;

N3) medication guidance output module: for determining, based on the immune subtype, whether the lung adenocarcinoma patient to be tested would benefit from chemotherapy with cisplatin combined with vinorelbine.

In the above device, the immune subtypes may include an Immune High subtype and an Immune Low subtype. The Immune High subtype may be an immune subtype with high immune activity; the Immune Low subtype may be an immune subtype with low immune activity.

In N3), the medication guidance for a lung adenocarcinoma patient to be tested whose immune subtype is Immune Low may be that the patient can benefit from chemotherapy with cisplatin combined with vinorelbine. The medication guidance for a patient whose immune subtype is Immune High may be that the patient cannot benefit from chemotherapy with cisplatin combined with vinorelbine.
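The rule applied by module N3) amounts to a lookup from predicted subtype to guidance. A minimal sketch in Python follows; the function name `guide_medication` and the wording of the returned strings are hypothetical, and only the two subtype names and the direction of the rule come from the text.

```python
def guide_medication(subtype: str) -> str:
    """Map a predicted immune subtype to the guidance described for N3)."""
    guidance = {
        "Immune Low": "may benefit from cisplatin plus vinorelbine chemotherapy",
        "Immune High": "not expected to benefit from cisplatin plus vinorelbine chemotherapy",
    }
    if subtype not in guidance:
        # Anything other than the two named subtypes is rejected, not guessed.
        raise ValueError(f"unknown immune subtype: {subtype!r}")
    return guidance[subtype]


low_advice = guide_medication("Immune Low")
high_advice = guide_medication("Immune High")
```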

The 75 genes may be the same 75 genes, with the same GenBank information, as listed above for the device for predicting prognosis.

The immunotyping model building module of N1) may be established by a method comprising the following steps:

N1-1) performing hierarchical clustering on the expression profile data of the 75 genes of the known lung adenocarcinoma sample set to obtain immunotyping results for the known lung adenocarcinoma samples;

N1-2) using an SVM algorithm, based on the expression profile data of the 75 genes of the known lung adenocarcinoma sample set and the immunotyping results, to obtain an immunotyping model that predicts the immune subtype of a single-sample lung adenocarcinoma patient.

To solve the above technical problems, the invention further provides a computer-readable storage medium for predicting the prognosis of a lung adenocarcinoma patient to be tested, which causes a computer to execute the following steps:

C1) obtaining an immunotyping model that predicts the immune subtype of a single-sample lung adenocarcinoma patient, based on expression profile data of 75 genes of a known lung adenocarcinoma sample set, hierarchical clustering and an SVM algorithm;

C2) predicting the immune subtype of the lung adenocarcinoma patient to be tested using the immunotyping model, based on expression profile data of the 75 genes of the lung adenocarcinoma patient to be tested;

C3) predicting the prognosis of the lung adenocarcinoma patient to be tested based on the immune subtype.

The 75 genes may be the 75 genes listed above.

The immune subtypes may include an Immune High subtype and an Immune Low subtype. The Immune High subtype may be an immune subtype with high immune activity; the Immune Low subtype may be an immune subtype with low immune activity.

In the above computer-readable storage medium, the lung adenocarcinoma patient may be an early-to-mid-stage lung adenocarcinoma patient. The prognosis may be overall survival of more than 5 years.

The lung adenocarcinoma patient may be a lung adenocarcinoma patient with a smoking history. The lung adenocarcinoma patient with a smoking history may be a stage I, stage II or stage IIIA lung adenocarcinoma patient.

The immunotyping model of C1) may be established by a method comprising the following steps:

C1-1) performing hierarchical clustering on the expression profile data of the 75 genes of the known lung adenocarcinoma sample set to obtain immunotyping results for the known lung adenocarcinoma samples;

C1-2) using an SVM algorithm, based on the expression profile data of the 75 genes of the known lung adenocarcinoma sample set and the immunotyping results, to obtain an immunotyping model that predicts the immune subtype of a single-sample lung adenocarcinoma patient.

To solve the above technical problems, the invention also provides a computer-readable storage medium for guiding the medication of a lung adenocarcinoma patient to be tested, which causes a computer to execute the following steps:

D1) immunotyping model building: obtaining an immunotyping model that predicts the immune subtype of a single-sample lung adenocarcinoma patient, based on expression profile data of 75 genes of a known lung adenocarcinoma sample set, hierarchical clustering and an SVM algorithm;

D2) immunotyping: predicting the immune subtype of the lung adenocarcinoma patient to be tested using the immunotyping model, based on expression profile data of the 75 genes of the lung adenocarcinoma patient to be tested;

D3) medication guidance output: determining, based on the immune subtype, whether the lung adenocarcinoma patient to be tested would benefit from chemotherapy with cisplatin combined with vinorelbine.

The 75 genes may be the 75 genes listed above.

In the above computer-readable storage medium, the immune subtypes may include an Immune High subtype and an Immune Low subtype. The Immune High subtype may be an immune subtype with high immune activity; the Immune Low subtype may be an immune subtype with low immune activity.

In D3), the medication guidance for a lung adenocarcinoma patient to be tested whose immune subtype is Immune Low may be that the patient can benefit from chemotherapy with cisplatin combined with vinorelbine; the medication guidance for a patient whose immune subtype is Immune High may be that the patient cannot benefit from chemotherapy with cisplatin combined with vinorelbine.

The immunotyping model of D1) may be established by a method comprising the following steps:

D1-1) performing hierarchical clustering on the expression profile data of the 75 genes of the known lung adenocarcinoma sample set to obtain immunotyping results for the known lung adenocarcinoma samples;

D1-2) using an SVM algorithm, based on the expression profile data of the 75 genes of the known lung adenocarcinoma sample set and the immunotyping results, to obtain an immunotyping model that predicts the immune subtype of a single-sample lung adenocarcinoma patient.

The use of a substance or device for detecting the expression profile of 75 genes of the human genome in the preparation of a product for predicting the prognosis and/or guiding the medication of a lung adenocarcinoma patient to be tested also falls within the scope of the invention.

The 75 genes may be the 75 genes listed above.

In the above use, the lung adenocarcinoma patient may be an early-to-mid-stage lung adenocarcinoma patient. The prognosis may be overall survival of more than 5 years.

The use of the apparatus described above or the computer readable storage medium described above for developing and/or preparing a product for preventing and/or treating lung adenocarcinoma also falls within the scope of the present invention.

Environmental factors such as smoking are core drivers of lung cancer. To improve precision treatment for early-to-mid-stage lung adenocarcinoma patients with a smoking history, and to provide effective risk assessment and treatment response prediction for them, the invention provides an SVM single-sample subtype prediction model built on the expression profile of 75 genes of the immune microenvironment gene group of early-to-mid-stage lung adenocarcinoma patients. Assigning an immune molecular subtype to such a patient allows the patient's prognosis to be predicted: the prognosis of a patient whose immune subtype is Immune High may be better than that of a patient whose immune subtype is Immune Low. The 75 genes and the SVM single-sample subtype prediction model can also guide clinical medication: a patient whose immune subtype is Immune Low may benefit from chemotherapy with cisplatin combined with vinorelbine, whereas a patient whose immune subtype is Immune High may not. The 75 genes of the immune microenvironment gene group and the established SVM single-sample subtype prediction model can therefore avoid the overuse of chemotherapy in lung adenocarcinoma patients, sparing patients the serious side effects of chemotherapy and reducing the economic burden on patients and society.

Drawings

FIG. 1 is the receiver operating characteristic (ROC) curve of the classification results of the SVM single-sample subtype prediction model in the TCGA dataset. The ordinate is the true positive rate and the abscissa is the false positive rate.

FIG. 2 is the ROC curve of the classification results of the SVM single-sample subtype prediction model in the GSE81089 dataset. The ordinate is the true positive rate and the abscissa is the false positive rate.

FIG. 3 is the ROC curve of the classification results of the SVM single-sample subtype prediction model in the GSE68465 dataset. The ordinate is the true positive rate and the abscissa is the false positive rate.

FIG. 4 is the ROC curve of the classification results of the SVM single-sample subtype prediction model in the GSE14814 dataset. The ordinate is the true positive rate and the abscissa is the false positive rate.

FIG. 5 shows the overall survival curves of the Immune High and Immune Low molecular subtypes of early-to-mid-stage lung adenocarcinoma smoking patients in TCGA (TCGA-LUAD). The ordinate is the survival rate and the abscissa is the overall survival time (years).

FIG. 6 shows the overall survival curves of the Immune High and Immune Low molecular subtypes of stage II and stage IIIA lung adenocarcinoma smoking patients in TCGA. The ordinate is the survival rate and the abscissa is the overall survival time (years).

FIG. 7 shows the overall survival curves of the Immune High and Immune Low molecular subtypes of early-to-mid-stage lung adenocarcinoma smoking patients in GSE81089. The ordinate is the survival rate and the abscissa is the overall survival time (years).

FIG. 8 shows the overall survival curves of the Immune High and Immune Low molecular subtypes of early-to-mid-stage lung adenocarcinoma smoking patients in GSE68465. The ordinate is the survival rate and the abscissa is the overall survival time (years).

FIG. 9 shows the overall survival curves of the Immune High and Immune Low molecular subtypes of stage II and stage IIIA lung adenocarcinoma smoking patients in GSE68465. The ordinate is the survival rate and the abscissa is the overall survival time (years).

FIG. 10 shows the overall survival curves of the Immune High and Immune Low molecular subtypes of lung adenocarcinoma patients in GSE14814. The ordinate is the survival rate and the abscissa is the overall survival time (years).

FIG. 11 is an analysis of the subtype-specific chemotherapy response of the Immune High and Immune Low subtypes in the GSE14814 dataset. t denotes time; the landmark analysis assesses survival at 1 year or at 2-5 years after treatment. A: survival curves of the chemotherapy group (ACT) and the observation group (OBS); B: survival curves of chemotherapy (ACT) patients versus non-chemotherapy (OBS) patients in the Immune High group; C: survival curves of chemotherapy (ACT) patients versus non-chemotherapy (OBS) patients in the Immune Low group.

FIG. 12 shows the SVM single-sample subtype prediction models built using different datasets and the verification of their accuracy. A: accuracy, verified in the TCGA dataset, of the model built using the TCGA training set; B: accuracy, verified in the GSE81089 dataset, of the model built using the TCGA dataset; C: accuracy, verified in the GSE68465 dataset, of the model built using the GSE68465 dataset; D: accuracy, verified in the GSE14814 dataset, of the model built using the GSE68465 dataset.

Detailed Description

The following detailed description of the invention is provided in connection with the accompanying drawings that are presented to illustrate the invention and not to limit the scope thereof. The examples provided below are intended as guidelines for further modifications by one of ordinary skill in the art and are not to be construed as limiting the invention in any way.

The experimental methods in the following examples, unless otherwise specified, are conventional methods, and are carried out according to techniques or conditions described in the literature in the field or according to the product specifications. Materials, reagents and the like used in the examples described below are commercially available unless otherwise specified.

Example 1. Screening and molecular typing of molecular diagnostic markers for early-to-mid-stage lung adenocarcinoma.

1. Screening of molecular diagnostic markers for early-to-mid-stage lung adenocarcinoma.

In earlier research on big transcriptome data of human glioma, the inventors identified 250 immunity-related genes. Using the human lung adenocarcinoma RNA-seq expression dataset provided by the TCGA public database (TCGA-LUAD) (portal.gdc.cancer.gov) and the GEO public database dataset GSE14814 (www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE14814), the early-to-mid-stage lung adenocarcinoma samples in the two datasets were automatically divided, by hierarchical clustering and differential expression analysis of the expression profiles of the 250 immunity-related genes, into an Immune High group (high immune subtype group) with high expression of the 250 genes and an Immune Low group (low immune subtype group) with low expression of the 250 genes.

The Immune High and Immune Low sample groups in the GSE14814 dataset were compared. From the 250 genes, 87 genes differentially expressed between the two groups were selected; in addition, 48 genes more highly expressed in the Immune Low group than in the Immune High group were selected. In total, 135 genes related to the lung adenocarcinoma immune subtype groups (Immune High and Immune Low) were selected at this stage. The TCGA-LUAD and GSE14814 datasets were regrouped using the 135 genes, a t test was performed on the regrouped samples, and 60 genes with smaller expression differences between the Immune High and Immune Low groups were removed, leaving 75 genes as lung adenocarcinoma immune subtype classifier genes. Of the 75 classifier genes, 48 were highly expressed only in the Immune High group and the other 27 were highly expressed only in the Immune Low group.
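The gene counts and the t-test filter described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the inventors' procedure: the source says only that a t test was used, so the Welch form of the statistic, the cutoff `min_abs_t`, the placeholder gene names `GENE_A` to `GENE_C` and the toy expression values are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Gene counts stated in the text.
assert 87 + 48 == 135   # genes selected in the first screening stage
assert 135 - 60 == 75   # classifier genes left after the t-test filter
assert 48 + 27 == 75    # Immune High markers plus Immune Low markers


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed variant)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def keep_discriminating_genes(expr_high, expr_low, min_abs_t):
    """Keep genes whose expression differs strongly between the two groups."""
    kept = []
    for gene in expr_high:
        if abs(welch_t(expr_high[gene], expr_low[gene])) >= min_abs_t:
            kept.append(gene)
    return kept


# Toy expression values per gene in each group (placeholders, not real data).
expr_high = {
    "GENE_A": [8.1, 7.9, 8.3, 8.0],
    "GENE_B": [3.0, 3.2, 2.9, 3.1],
    "GENE_C": [5.0, 5.3, 4.8, 5.1],
}
expr_low = {
    "GENE_A": [5.0, 5.2, 4.9, 5.1],
    "GENE_B": [6.0, 6.1, 5.8, 6.2],
    "GENE_C": [5.1, 4.9, 5.2, 5.0],
}

kept_genes = keep_discriminating_genes(expr_high, expr_low, min_abs_t=4.0)
```

In this toy input, `GENE_C` has nearly identical means in both groups, so it is the one removed, mirroring how genes with small between-group differences were dropped.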

The names of the 75 classifier genes and the corresponding GenBank information on NCBI are as follows:

2. Construction and validation of a molecular typing model for early- and mid-stage lung adenocarcinoma samples using the 75 classifier genes.

2.1 Data sources.

To build and test the single-sample immune subtype prediction model for early- and mid-stage lung adenocarcinoma, the following data sets were downloaded: from the RNA-seq platform, the TCGA-LUAD data set (portal.gdc.cancer.gov) and the GSE81089 data set (www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE81089); from the Affymetrix U133A chip platform, the GSE68465 data set (www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE68465) and the GSE14814 data set (www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE14814).

2.2 Model creation and model verification.

Expression data for the 75 classifier genes in 469 stage I-III (early- and mid-stage) lung adenocarcinoma samples of the TCGA-LUAD data set were used to classify the 469 samples by hierarchical clustering into two immune subtypes, Immune High and Immune Low. Clustering parameters: fc = 1.2, p = 1.46e-5, q = 1.45e-5. Cluster-based classification, however, cannot assign a subtype to an individual sample. To enable personalized diagnosis, the 469 samples were randomly divided at a ratio of approximately 2:1 into a training set (312 cases) and a validation set (157 cases). The training set was used to build the single-sample subtype prediction model for early- and mid-stage lung adenocarcinoma, and the validation set was used to evaluate the model independently. Stratified sampling was used so that the proportions of Immune High and Immune Low samples in the training and validation sets match those in the source data set.
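The stratified 2:1 split described above can be sketched in plain Python. The function name `stratified_split`, the seed, and the subtype counts in the toy label vector are assumptions for illustration; the source gives only the total number of samples (469) and the resulting set sizes.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction, seed):
    """Return (train_idx, valid_idx), splitting each label group at train_fraction."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, valid_idx = [], []
    for label in sorted(by_label):
        members = by_label[label]
        rng.shuffle(members)  # random assignment within each subtype
        n_train = round(len(members) * train_fraction)
        train_idx.extend(members[:n_train])
        valid_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(valid_idx)


# Hypothetical label vector: 469 samples; the per-subtype counts are invented.
labels = ["High"] * 210 + ["Low"] * 259
train_idx, valid_idx = stratified_split(labels, train_fraction=2 / 3, seed=0)

share_all = labels.count("High") / len(labels)
share_train = sum(labels[i] == "High" for i in train_idx) / len(train_idx)
print(len(train_idx), len(valid_idx), round(share_all, 3), round(share_train, 3))
```

Because the split is made inside each subtype group, the share of Immune High samples in the training set stays close to the share in the full data set regardless of the random seed.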

The training inputs were the expression data of the 75 classifier genes in the 312 training-set samples (48 classifier genes are highly expressed in Immune High samples and lowly expressed in Immune Low samples, while the other 27 are highly expressed in Immune Low samples and lowly expressed in Immune High samples), together with the Immune High or Immune Low label assigned to each of the 312 samples by hierarchical clustering of those data. From these inputs, the SVM algorithm provided by the Python machine learning package sklearn (scikit-learn.org/stable/index.html) was used to build a model that recognizes, within a single sample, which of the two mutually exclusive classifier-gene expression profiles is present, and thereby predicts the Immune High or Immune Low subtype of a single early- or mid-stage lung adenocarcinoma patient sample. This is the single-sample subtype prediction model.

The model was used to predict the immune subtypes of the validation-set samples in the TCGA-LUAD data set and of the validation samples in the GSE81089 data set. A receiver operating characteristic (ROC) curve was drawn to verify the typing accuracy of the model.

The code is as follows:

Data normalization and establishment of standard typing labels, run in R (download site: https://cran.r-project.org/mirrors.html):

library(sva)        # provides ComBat
library(ggfortify)  # provides autoplot() for prcomp objects

setwd("SVM/TCGA-LUAD")

# Read in the normalized expression profile matrices for all variables
GSE81089 <- read.csv("GSE81089.csv")
LUAD <- read.csv("New LUAD.csv")
gene <- read.csv("gene.csv")

# PCA plot to determine whether there is a batch effect
RNA_Seq <- merge(GSE81089, LUAD, by = "ID")
data <- t(RNA_Seq)
data <- data.frame(data)
colnames(data) = data[1,]
data1 = data[-1,]
data2 = as.data.frame(lapply(data1, as.numeric))
out_pca <- prcomp(data2[,-9418])
autoplot(out_pca, data = data2, colour = 'Lable', size = 1, label = FALSE)

# Eliminate the batch effect using ComBat
pheno <- data2
row.names(RNA_Seq) <- RNA_Seq[,1]
RNA_Seq1 <- RNA_Seq[,-1]
combat_edata <- ComBat(dat = RNA_Seq1, batch = pheno$Lable)

# Repeat the PCA on the batch-corrected data
dat5 = as.data.frame(t(combat_edata))
out_pca <- prcomp(dat5[,-9418])
autoplot(out_pca, data = dat5, colour = 'Lable', size = 1, label = FALSE)

# Take the expression profile matrix of the 75 genes after elimination of the batch effect
dat4 = as.data.frame(combat_edata)
write.csv(dat4, file = "RNA seq_batch.csv")
datt <- read.csv("RNA seq_batch.csv")
GENE <- read.csv("GENE_75.csv")
RNA_seq_75 <- merge(datt, GENE, by = "X.2")
write.csv(RNA_seq_75, file = "RNA seq_74batch.csv")

# Standardize the data (column-wise scaling)
training_data = read.table("LUAD_batch.csv", sep = ",", header = T, row.names = 1)
dim(training_data)
LUAD_USE <- apply(training_data, 2, scale)

The SVM algorithm code, run in Python, is as follows:

# Load the necessary modules
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import svm, datasets, preprocessing
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier

# Read the expression data and the subtype labels
DATA_LUAD = pd.read_csv(r"SVM\LUAD_73batch_ex.csv")
xunlian_y = pd.read_csv(r"SVM\TCGA-LUAD\LUAD-class-469-hc2.csv")

# Convert the labels to a one-dimensional integer array named xunlian_y
xunlian_y = np.array(xunlian_y).transpose().ravel().astype(int)

# Convert the data to a float matrix named xunlian_x (transposed so that each row is one sample)
xunlian_x = np.array(DATA_LUAD).transpose().astype(float)

# Split the data proportionally into training and test subsets
x_train, x_test, y_train, y_test = train_test_split(xunlian_x, xunlian_y, test_size=0.33, random_state=12)

y_train = y_train.ravel()

# Modeling: linear-kernel SVM with balanced class weights
model = svm.SVC(kernel='linear', probability=True, class_weight='balanced', C=15)
model.fit(x_train, y_train.astype('int'))

# Prediction on the held-out samples
predict_test = model.predict(x_test)
y_score = model.decision_function(x_test)

# Generate the confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, predict_test)

# Plot the ROC curve
fpr, tpr, threshold = roc_curve(y_test.astype('float'), y_score)
roc_auc = auc(fpr, tpr)
print(roc_auc)
lw = 2
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=lw, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([-0.025, 1.025])
plt.ylim([-0.025, 1.025])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('TCGA-LUAD')
plt.legend(loc="lower right")
plt.savefig('LUAD.pdf')
plt.show()

The code for subtype prediction of the GSE81089 data set samples, which are also from the RNA-seq platform, using the early- and mid-stage lung adenocarcinoma single-sample subtype prediction model established above in step 2.2 is as follows:

# Read the GSE81089 expression data and subtype labels
DATA = pd.read_csv(r"SVM\GSE81089_73batch_ex.csv")
test_y = pd.read_csv(r"SVM\GSE81089_class.csv")

# Convert the labels to a one-dimensional integer array named test_y
test_y = np.array(test_y).transpose().ravel().astype(int)

# Convert the data to a float matrix named test_x (transposed so that each row is one sample)
test_x = np.array(DATA).transpose().astype(float)

# Fit the model on the TCGA-LUAD training data and predict the GSE81089 samples
model = svm.SVC(kernel='linear', probability=True, class_weight='balanced', C=15)
model.fit(x_train, y_train.astype('int'))
predict_test = model.predict(test_x)
y_score = model.decision_function(test_x)

The model was iteratively optimized according to the resulting ROC curve and the accuracy derived from the confusion matrix, finally yielding an early- and mid-stage lung adenocarcinoma single-sample immune subtype prediction model with an accuracy exceeding 90%. This single-sample immune subtype prediction model is the SVM prediction model.
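One way such iterative optimization could be carried out is a small search over the SVM penalty parameter C with cross-validation on the training set. The sketch below uses synthetic data in place of the real expression matrix; the candidate C values, the 5-fold setting, the use of `stratify`, and the label encoding (1 for Immune High, 0 for Immune Low) are assumptions, not the procedure actually followed in this example.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_per_class, n_genes = 60, 75

# Synthetic stand-in for the 75-gene matrix: the first 48 columns are higher
# in class 1 and the remaining 27 are higher in class 0.
shift = np.concatenate([np.ones(48), -np.ones(27)])
x_high = rng.normal(0.0, 1.0, (n_per_class, n_genes)) + shift
x_low = rng.normal(0.0, 1.0, (n_per_class, n_genes)) - shift
X = np.vstack([x_high, x_low])
y = np.array([1] * n_per_class + [0] * n_per_class)

# Stratified split of roughly 2:1 into training and validation samples.
x_train, x_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=12
)

# Choose C by mean 5-fold cross-validated accuracy on the training set only.
best_c, best_score = None, -1.0
for c in (0.01, 0.1, 1, 15):
    candidate = SVC(kernel="linear", class_weight="balanced", C=c)
    score = cross_val_score(candidate, x_train, y_train, cv=5).mean()
    if score > best_score:
        best_c, best_score = c, score

# Refit with the selected C and evaluate once on the held-out samples.
final_model = SVC(kernel="linear", class_weight="balanced", C=best_c)
final_model.fit(x_train, y_train)
valid_accuracy = final_model.score(x_valid, y_valid)
print(best_c, valid_accuracy)
```

Selecting C on the training folds and touching the validation samples only once keeps the reported validation accuracy from being inflated by the tuning itself.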

Each sample in the TCGA-LUAD validation set was immune-typed with the constructed model, and the result was compared with the sample's immune subtype obtained under the typing standard established by hierarchical clustering. The accuracy is shown in panel A of Fig. 12 (more than 94% in each mode); the ROC curve is shown in Fig. 1 (AUC = 0.98).
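For reference, the two reported quantities can be computed by hand as in the following sketch. Accuracy is the fraction of predictions that agree with the clustering-based labels, and the AUC is computed here through its rank interpretation (the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one). The labels and scores are invented toy values, and treating a positive decision value as class 1 is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy reference labels and SVM decision values (invented for illustration).
y_true = [1, 1, 1, 0, 0, 0]
scores = [2.1, 0.4, 1.3, -0.2, 0.9, -1.5]
y_pred = [1 if s > 0 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
auc_value = roc_auc(y_true, scores)
print(acc, auc_value)
```

Accuracy depends on the decision threshold, whereas the AUC summarizes the ranking over all thresholds, which is why the two figures are reported separately.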

The single-sample subtype prediction model (the SVM prediction model) established on the TCGA data set was then used to predict the immune subtype of each early- and mid-stage lung adenocarcinoma sample in the GSE81089 data set, which is also from the RNA-seq platform, and the result was compared with the immune subtype obtained under the typing standard established by hierarchical clustering. The accuracy is shown in panel B of Fig. 12. Because of its limited sample size, the GSE81089 data set is susceptible to noise, but the accuracy of the immune subtype predicted by the single-sample model from the 75 classifier genes still remains above 85%; the ROC curve is shown in Fig. 2 (AUC = 0.90).

2.3 Establishing and verifying a single-sample subtype prediction model for early- and mid-stage lung adenocarcinoma based on chip data sets.

To determine whether the 75 classifier genes can be applied to Affymetrix chip data, an SVM single-sample subtype prediction model for the U133A platform was established on GEO lung adenocarcinoma data using the same method as in step 2.2.

For the U133A chip platform data sets GSE68465 and GSE14814, the expression data of the 75 classifier genes in 439 stage I-III lung adenocarcinoma samples of the GSE68465 data set were first used to classify the 439 early- and mid-stage samples by hierarchical clustering into the two subtypes Immune High and Immune Low. Clustering parameters: p = 0.002, q = 2.4e-4. To enable personalized typing prediction, the 439 samples were randomly divided at a ratio of approximately 2:1 into a training set (293 cases) and a validation set (146 cases), used respectively to build and to evaluate the classifier for personalized typing prediction constructed from the 75 classifier genes. Stratified sampling was used so that the proportions of Immune High and Immune Low samples in the training and validation sets match those in the source data set.

Using the same method as in step 2.2, the expression data of the 75 classifier genes in the 293 training-set samples of the GSE68465 data set, together with the Immune High or Immune Low label assigned to each sample by hierarchical clustering, were used with the SVM algorithm provided by the Python machine learning package sklearn to build a U133A chip platform model that predicts the Immune High or Immune Low subtype of a single early- or mid-stage lung adenocarcinoma patient sample. The model was used to predict the immune subtypes of the validation-set samples in the GSE68465 data set and of the samples in the GSE14814 data set, and an ROC curve was drawn to verify the typing accuracy of the model.

The code is as follows:

# Read the GSE68465 expression data and the subtype labels
DATA_LUAD = pd.read_csv(r"SVM\GSE68465_batch_ex.csv")
xunlian_y = pd.read_csv(r"SVM\GSE68564-class2.csv")

# Convert the labels to a one-dimensional integer array named xunlian_y
xunlian_y = np.array(xunlian_y).transpose().ravel().astype(int)

# Convert the data to a float matrix named xunlian_x (transposed so that each row is one sample)
xunlian_x = np.array(DATA_LUAD).transpose().astype(float)

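# Split the data proportionally into training and test subsets
# (the split call itself is absent from this listing; the call from step 2.2 is assumed here)
x_train, x_test, y_train, y_test = train_test_split(xunlian_x, xunlian_y, test_size=0.33, random_state=12)
y_train = y_train.ravel()

# Modeling: linear-kernel SVM with balanced class weights
model = svm.SVC(kernel='linear', probability=True, class_weight='balanced', C=0.01)
model.fit(x_train, y_train.astype('int'))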

# Prediction on the held-out samples
predict_test = model.predict(x_test)
y_score = model.decision_function(x_test)

# Generate the confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, predict_test)

# Plot the ROC curve
fpr, tpr, threshold = roc_curve(y_test.astype('float'), y_score)
roc_auc = auc(fpr, tpr)
print(roc_auc)
lw = 2
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=lw, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([-0.025, 1.025])
plt.ylim([-0.025, 1.025])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('TCGA-LUAD')
plt.legend(loc="lower right")
plt.savefig('LUAD.pdf')
plt.show()

The code for subtype prediction of the GSE14814 data set, which is also from the U133A chip platform, using the established single-sample subtype prediction model is as follows:

# Read the GSE14814 expression data and subtype labels
DATA = pd.read_csv(r"SVM\GSE14814batch_ex.csv")
test_y = pd.read_csv(r"SVM\GSE14814_class.csv")

# Convert the labels to a one-dimensional integer array named test_y
test_y = np.array(test_y).transpose().ravel().astype(int)

# Convert the data to a float matrix named test_x (transposed so that each row is one sample)
test_x = np.array(DATA).transpose().astype(float)

# Fit the model on the GSE68465 training data and predict the GSE14814 samples
model = svm.SVC(kernel='linear', probability=True, class_weight='balanced', C=0.01)
model.fit(x_train, y_train.astype('int'))
predict_test = model.predict(test_x)
y_score = model.decision_function(test_x)

Each sample in the GSE68465 validation set was immune-typed with the constructed model (the SVM prediction model), and the result was compared with the immune subtype obtained under the typing standard established by hierarchical clustering. The prediction results are shown in panel C of Fig. 12 (accuracy above 95%); the ROC curve is shown in Fig. 3 (AUC = 0.99).

The GSE14814 data set contains 71 early- and mid-stage lung adenocarcinoma samples. Each sample in this chip data set was immune-typed with the single-sample immune subtype prediction model (the SVM prediction model) established on the GSE68465 data set, and the result was compared with the immune subtype obtained under the typing standard established by hierarchical clustering. The prediction results are shown in panel D of Fig. 12 (accuracy above 89%); the ROC curve is shown in Fig. 4 (AUC = 0.98).

Example 2. Application of the early- and mid-stage lung adenocarcinoma SVM single-sample subtype prediction model to predicting the prognosis and survival of lung adenocarcinoma patients who smoke.

1. Using the model to predict the immune subtype and prognosis of early- and mid-stage lung adenocarcinoma patients who smoke in the TCGA-LUAD data set.

Based on the expression data of the 75 classifier genes in the 469 early- and mid-stage lung adenocarcinoma patient samples (collected before treatment) of the TCGA-LUAD data set, the immune subtypes of the 319 smoking patients among the 469 were predicted with the early- and mid-stage lung adenocarcinoma single-sample SVM immune typing model established in Example 1. Of these smoking patients, 141 were predicted to be of the Immune High subtype and 178 of the Immune Low subtype.

2. Prognosis analysis of early- and mid-stage lung adenocarcinoma patients who smoke.

Based on the follow-up results of the 319 early- and mid-stage lung adenocarcinoma patients who smoke, a 5-year survival analysis (Kaplan-Meier curves and log-rank test) was performed. As the survival curves in Fig. 5 show, the two groups of smoking patients classified in step 1 as Immune High and Immune Low have significantly different prognoses: the 5-year overall survival rate of the Immune High subtype (high immune subtype) is significantly higher than that of the Immune Low subtype (low immune subtype) (log-rank test, p = 0.0005; hazard ratio 0.4828). As shown in Fig. 6, even among patients at the same stages, stage II and stage IIIA (173 of the 319 smoking patients), the immune subtype (Immune Signature) still separates the patients into an Immune High group with a better prognosis (high immune subtype group, 62 cases) and an Immune Low group with a worse prognosis (low immune subtype group, 111 cases); the 5-year overall survival rate of the Immune High subtype is significantly higher than that of the Immune Low subtype (log-rank test, p = 0.0454; hazard ratio 0.5905).

Therefore, the SVM single-sample immune subtype prediction model established in Example 1 from the 75 immune classifier genes can predict the prognosis of early- and mid-stage lung adenocarcinoma patients who smoke, especially stage II or stage IIIA patients.
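The log-rank comparison used in this analysis can be sketched without any external library. The function below is a standard two-group log-rank test; the function name `logrank_test` and the follow-up times and event indicators are invented for illustration and are not the patient data of this example.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) with 1 degree of freedom.

    times_*  : follow-up time of each patient
    events_* : 1 if the patient died at that time, 0 if censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp, var = 0.0, 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)                 # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)    # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e)           # deaths at t, both groups
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    chi2 = obs_minus_exp ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 d.f.
    return chi2, p_value


# Toy follow-up in months (invented): group A stands in for Immune High.
high_t, high_e = [60, 60, 55, 60, 48, 60], [0, 0, 1, 0, 1, 0]
low_t, low_e = [12, 20, 25, 31, 40, 60], [1, 1, 1, 1, 1, 0]

chi2, p_value = logrank_test(high_t, high_e, low_t, low_e)
print(round(chi2, 3), round(p_value, 4))
```

The test compares, at every death time, the observed deaths in one group with the number expected if both groups shared the same hazard; the statistic is symmetric in the two groups.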

3. Using the model to predict the immune subtype and prognosis of early- and mid-stage lung adenocarcinoma patients who smoke in the GSE81089 data set.

Based on the expression data of the 75 immune classifier genes in the 103 early- and mid-stage lung adenocarcinoma patient samples (collected before treatment) in the GSE81089 data set, the immune subtypes of the 43 smoking patients among the 103 were classified with the lung adenocarcinoma SVM single-sample subtype prediction model obtained in Example 1. Of these, 22 patients were of the Immune High subtype and 21 were of the Immune Low subtype.

The longest follow-up time for the 43 smoking patients was 84 months. Based on the follow-up results, a survival analysis (Kaplan-Meier curves and log-rank test) was performed. As Fig. 7 shows, in the GSE81089 data set the Immune High and Immune Low subtypes of early- and mid-stage lung adenocarcinoma patients who smoke also have significantly different prognoses: the 7-year overall survival rate of Immune High (high immune subtype) smokers is significantly higher than that of Immune Low (low immune subtype) smokers (log-rank test, p = 0.042; hazard ratio 0.3556).

4. Using the model to predict the immune subtype and prognosis of early- and mid-stage lung adenocarcinoma patients who smoke in the GSE68465 data set.

Based on the expression data of the 75 immune classifier genes in the 439 early- and mid-stage lung adenocarcinoma patient samples (collected before treatment) in the GSE68465 data set, the 439 samples were divided into the Immune High (234 cases) and Immune Low (205 cases) subtypes with the early- and mid-stage lung adenocarcinoma SVM single-sample subtype prediction model obtained in Example 1. Of the 439 patients, 298 were smokers: 145 of the Immune High subtype and 153 of the Immune Low subtype.

The longest follow-up time for the 298 smokers was 60 months. Based on the follow-up results, a survival analysis (Kaplan-Meier curves and log-rank test) was performed. As Fig. 8 shows, the Immune High and Immune Low subtypes also have significantly different prognoses: the 5-year overall survival rate of the Immune High subtype (high immune subtype) is significantly higher than that of the Immune Low subtype (low immune subtype) (log-rank test, p = 0.0007; hazard ratio 0.5447). As shown in Fig. 9, even among patients at the same stages, stage II and stage IIIA (219 of the 298 smoking patients), the immune subtype (Immune Signature) still separates the patients into an Immune High group with a better prognosis (100 cases) and an Immune Low group with a worse prognosis (119 cases); the 5-year overall survival rate of the Immune High subtype is significantly higher than that of the Immune Low subtype (log-rank test, p = 0.0006; hazard ratio 0.5053).

Therefore, the SVM single-sample subtype prediction model established in Example 1 from the 75 immune classifier genes can also predict, from chip platform data, the prognosis of early- and mid-stage lung adenocarcinoma patients who smoke, especially stage II or stage IIIA patients.

5. Using the model to predict the immune subtype and prognosis of early- and mid-stage lung adenocarcinoma patients who smoke in the GSE14814 data set.

Based on the expression data of the 75 immune classifier genes in the 71 samples from lung adenocarcinoma patients who smoke (collected before treatment) in the GSE14814 validation data set, the 71 samples were divided into the Immune High (35 cases) and Immune Low (36 cases) subtypes with the SVM single-sample subtype prediction model obtained in Example 1.

The longest follow-up time for the 71 patients was 120 months. Based on the follow-up results, a survival analysis (Kaplan-Meier curves and log-rank test) was performed. As Fig. 10 shows, the Immune High and Immune Low subtypes in the GSE14814 data set have significantly different prognoses: the 5-year overall survival rate of the Immune High subtype (high immune subtype) is significantly higher than that of the Immune Low subtype (low immune subtype) (log-rank test, p = 0.0230; hazard ratio 0.3436).

Example 3. Use of the immune microenvironment gene group and its molecular typing of early- and mid-stage lung adenocarcinoma to predict whether a patient to be tested would benefit from cisplatin combined with vinorelbine therapy.

The expression data of the early- and mid-stage lung adenocarcinoma samples in the GSE14814 data set come from a prospective phase III clinical trial (JBR.10) and are accompanied by the patients' medication regimens. The trial randomized patients into a treatment group (ACT, chemotherapy with cisplatin combined with vinorelbine; 39 cases) and an observation group (OBS; 32 cases). Transcriptome expression data were collected before treatment. Because a treatment effect is hardly reflected within a short time, the difference in survival is not apparent within 1 year. As shown in panel A of Fig. 11, Landmark segmented analysis found no significant difference between the ACT and OBS groups in the first year after treatment (t ≤ 1): restricted mean survival time (RMST) analysis, p = 0.6913. In years 2-5 after treatment (t > 1), the ACT group showed a significant survival advantage, with significantly improved survival compared with the OBS group: RMST analysis, p = 0.014.
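To make the two techniques named above concrete, the following sketch computes the RMST as the area under a Kaplan-Meier curve up to a horizon tau, and splits follow-up at a landmark of 1 year so that the early and later periods can be analyzed separately. The helper names and the follow-up values are invented for illustration; the significance test behind the reported p values is not reproduced here.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (Kaplan-Meier estimate)."""
    curve, surv = [], 1.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


def rmst(times, events, tau):
    """Restricted mean survival time: area under the KM step curve on [0, tau]."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


def landmark_split(times, events, landmark):
    """Split follow-up at the landmark.

    Early period: everyone, censored at the landmark if still under follow-up.
    Late period: only patients followed beyond the landmark, with the clock
    restarted at the landmark.
    """
    early_t = [min(t, landmark) for t in times]
    early_e = [e if t <= landmark else 0 for t, e in zip(times, events)]
    late = [(t - landmark, e) for t, e in zip(times, events) if t > landmark]
    return (early_t, early_e), ([t for t, _ in late], [e for _, e in late])


# Toy follow-up in years (invented): 1 = death, 0 = censored.
times = [0.5, 2.0, 3.0, 5.0, 5.0]
events = [1, 1, 0, 0, 0]

(early_t, early_e), (late_t, late_e) = landmark_split(times, events, landmark=1.0)
rmst_first_year = rmst(early_t, early_e, tau=1.0)    # period t <= 1
rmst_later_years = rmst(late_t, late_e, tau=4.0)     # years 2-5, clock restarted
print(rmst_first_year, rmst_later_years)
```

Comparing the RMST of two treatment arms within each period then gives a difference in mean survival time over that period, which remains interpretable even when survival curves cross or the treatment effect appears late.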

The immune subtype of each sample in this data set was predicted with the SVM single-sample subtype prediction model established in Example 1 from the expression profiles of the 75 genes of the immune microenvironment gene group, yielding the two subtype groups Immune High and Immune Low. The Immune High subtype (23 cases in the ACT group, 12 cases in the OBS group) and the Immune Low subtype (16 cases in the ACT group, 20 cases in the OBS group) were then each grouped by treatment regimen and subjected to survival analysis (Landmark segmented analysis and RMST analysis).

The results are shown in Fig. 11, where panel B is the Landmark segmented and RMST analysis of the Immune High group and panel C is the Landmark segmented and RMST analysis of the Immune Low group. Compared with the observation group (OBS group), chemotherapy with cisplatin combined with vinorelbine (ACT group) prolonged survival only in the Immune Low group (panel C of Fig. 11) and not in the Immune High group (panel B of Fig. 11). Specifically, in the Immune High group, chemotherapy (ACT group) showed no survival advantage over observation (OBS group) either within 1 year (t ≤ 1) or in years 2-5 (t > 1) after treatment (panel B of Fig. 11: t ≤ 1, p = 0.5019; t > 1, p = 0.9456). In the Immune Low group, by contrast, the ACT regimen showed a significant survival advantage in years 2-5 (t > 1) after treatment, with improved survival compared with the OBS group: RMST analysis, p = 0.0011 (panel C of Fig. 11: t ≤ 1, p = 0.4251; t > 1, p = 0.0011). Thus, only early- and mid-stage lung adenocarcinoma patients of the Immune Low group benefit from treatment with cisplatin combined with vinorelbine, whereas those of the Immune High group do not.

In conclusion, immune subtype molecular typing of early- and mid-stage lung adenocarcinoma patients with the SVM single-sample subtype prediction model, established from the expression profiles of the 75 genes of the immune microenvironment gene group, can predict patient prognosis. It can also guide clinical medication: chemotherapy can be avoided in Immune High patients, which spares patients the serious side effects of chemotherapy and reduces the economic burden on patients and society.
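The final step, turning a predicted subtype into the outputs summarized above, can be sketched as a simple lookup. The function names, the output wording, and the convention that a positive SVM decision value corresponds to Immune High are assumptions for illustration; the source does not specify the label encoding.

```python
def subtype_from_decision_value(value):
    """Assumed convention: a positive SVM decision value means Immune High."""
    return "Immune High" if value > 0 else "Immune Low"


def summarize(subtype):
    """Map a predicted subtype to the prognosis and medication findings of Examples 2 and 3."""
    if subtype == "Immune High":
        return {
            "prognosis": "better",
            "cisplatin_vinorelbine": "no survival benefit observed",
        }
    if subtype == "Immune Low":
        return {
            "prognosis": "worse",
            "cisplatin_vinorelbine": "survival benefit observed",
        }
    raise ValueError(f"unknown subtype: {subtype!r}")


# Hypothetical decision value returned by the model for one patient sample.
report = summarize(subtype_from_decision_value(-0.8))
print(report)
```

Keeping the mapping separate from the classifier makes it possible to update the clinical wording without retraining the model.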

The immune microenvironment in a tumor sample is the body's integrated response to the various genomic variations carried by the tumor cells. DNA-level variations (that is, gene mutations) in tumor cells can give rise to proteins that the body never encountered during development; these are recognized by the body as "neoantigens" and activate an immune rejection response against the tumor. The chromosomal instability carried by tumor cells activates the innate immune cGAS-STING signaling pathway; the resulting activation of the type I interferon signaling pathway inhibits tumor growth, but the activation of the NF-κB signaling pathway caused by chromosomal instability induces the production of various inflammatory factors, establishes an infection-like microenvironment, and promotes tumor survival and evolution. Building on the earlier original discovery of cell-of-origin-based EM/PM molecular typing of brain glioma (Sun et al. A glioma classification scheme based on coexpression modules of EGFR and PDGFRA [J]. Proceedings of the National Academy of Sciences of the United States of America, 2014, volume 111, pages 3538-43.), repeated exploration in the present invention revealed that the transcriptome big data of multiple tumors contain a group of genes representing the different immune cell populations in the tumor microenvironment. This gene group can also divide IDH wild-type gliomas into molecular subtypes with high or low expression of the immune microenvironment gene group. However, IDH wild-type gliomas with high expression of the immune microenvironment gene group progress rapidly and have a significantly poorer prognosis than IDH wild-type gliomas with low expression of the gene group.
In contrast to the results of immune microenvironment molecular typing in gliomas, the present invention found that early- and mid-stage lung adenocarcinoma of the immune subtype with high expression of the immune microenvironment gene group (Immune High group) has a good prognosis but does not benefit from chemotherapy, whereas early- and mid-stage lung adenocarcinoma of the immune subtype with low expression of the gene group (Immune Low group) has a poor prognosis but benefits from chemotherapy.

The present invention is described in detail above. It will be apparent to those skilled in the art that the present invention can be practiced in a wide variety of equivalent parameters, experimental platforms and conditions without departing from the spirit and scope of the invention and without undue experimentation. While the invention has been described with respect to specific embodiments, it will be appreciated that the invention may be further modified. In general, this application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.

Claims

1. A device for predicting the prognosis of a lung adenocarcinoma patient to be tested, characterized in that the device comprises the following modules:

M1) an immunotyping model building module, the model being built by a method comprising the following steps:

M1-2) obtaining an immunotyping model for predicting an immunosubtype of a single sample lung adenocarcinoma patient using an SVM algorithm based on expression profile data of 75 genes of a known lung adenocarcinoma sample set and the immunotyping result;

the 75 genes are obtained by comparing two groups of sample data of Immune High and Immune Low in a GSE14814 data set;

M3) a prognosis prediction module: for predicting the prognosis of the lung adenocarcinoma patient to be tested based on said immune subtype; the 75 genes are as follows:

2. The apparatus according to claim 1, wherein: the lung adenocarcinoma patients are early and medium stage lung adenocarcinoma patients.

3. The apparatus according to claim 1 or 2, characterized in that: the lung adenocarcinoma patients are lung adenocarcinoma smoking patients.

4. The device for predicting and guiding the administration of the drug to the patient with the lung adenocarcinoma to be detected is characterized in that: the device comprises the following modules:

n1) the immunotyping model building block is built up by a method comprising the steps of:

n1-2) obtaining an immunotyping pattern predicting an immunosubtype of a single sample lung adenocarcinoma patient using an SVM algorithm based on the expression profile data of the 75 genes of the known lung adenocarcinoma sample set and the immunotyping result;

n3) instruction medication output module: for determining whether a patient with lung adenocarcinoma to be tested would benefit from cisplatin in combination with vinorelbine for chemotherapy based on the immunophenotyping subtype;

the 75 genes are as follows:

5. A computer readable storage medium for predicting prognosis of a patient with lung adenocarcinoma to be measured, characterized in that: the computer readable storage medium causes a computer to execute the steps of:

c1) constructing an immunotyping model by a method comprising the following steps:

c1-2) obtaining, using an SVM algorithm, an immunotyping model that predicts the immune subtype of a lung adenocarcinoma patient from a single sample, based on the expression profile data of 75 genes of a known lung adenocarcinoma sample set and the immunotyping result;

c3) predicting the prognosis of the lung adenocarcinoma patient to be tested based on said immune subtype;

the 75 genes are as follows:

6. The computer-readable storage medium according to claim 5, characterized in that: the lung adenocarcinoma patients are early- to mid-stage lung adenocarcinoma patients; the prognosis is an overall survival of more than 5 years.

7. A computer-readable storage medium for guiding the medication of a lung adenocarcinoma patient to be tested, characterized in that: the computer-readable storage medium causes a computer to execute the following steps:

d1) constructing an immunotyping model by a method comprising the following steps:

d1-2) obtaining, using an SVM algorithm, an immunotyping model that predicts the immune subtype of a lung adenocarcinoma patient from a single sample, based on the expression profile data of the 75 genes of the known lung adenocarcinoma sample set and the immunotyping result;

d3) outputting medication guidance: determining, based on the immune subtype, whether the lung adenocarcinoma patient to be tested would benefit from chemotherapy with cisplatin in combination with vinorelbine;

the 75 genes are as follows:

8. Use of a substance or device for detecting the expression profile of 75 human genes in the preparation of a product for predicting the prognosis and/or guiding the medication of a lung adenocarcinoma patient to be tested;

the 75 genes are as follows:

9. The use according to claim 8, characterized in that: the lung adenocarcinoma patients are early- to mid-stage lung adenocarcinoma patients; the prognosis is an overall survival of more than 5 years.

10. Use of the device of any one of claims 1-4 or the computer readable storage medium of any one of claims 5-7 for developing and/or preparing a product for preventing and/or treating lung adenocarcinoma.
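
For illustration only, the SVM-based model construction recited in steps M1-2, n1-2, c1-2 and d1-2 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the expression data are synthetic, the two class labels simply reuse the group names "Immune High" and "Immune Low" mentioned in claim 1, and the SVM is a linear one trained by Pegasos-style stochastic subgradient descent, swapped in so that the example needs only the Python standard library. The function names (`make_synthetic_cohort`, `train_linear_svm`, `predict_subtype`) and all parameter values are hypothetical.

```python
import random

N_GENES = 75  # size of the gene set named in the claims; the genes themselves are not modelled here


def make_synthetic_cohort(n_per_class, rng):
    """Build a synthetic labelled cohort (an assumption; NOT real expression data).

    Label +1 stands for "Immune High" and -1 for "Immune Low".
    """
    samples, labels = [], []
    for label, shift in ((+1, 1.0), (-1, -1.0)):
        for _ in range(n_per_class):
            samples.append([rng.gauss(shift, 1.0) for _ in range(N_GENES)])
            labels.append(label)
    return samples, labels


def train_linear_svm(samples, labels, lam=0.01, epochs=20, seed=0):
    """Train a linear SVM on the hinge loss with Pegasos-style updates.

    A constant feature 1.0 is appended to each profile so the last weight
    acts as a (regularised) bias term.
    """
    rng = random.Random(seed)
    w = [0.0] * (N_GENES + 1)
    order = list(range(len(samples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            x = samples[i] + [1.0]
            y = labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]   # regularisation shrink
            if margin < 1.0:                            # hinge-loss violation
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict_subtype(w, profile):
    """Assign a single 75-gene profile to one of the two hypothetical subtypes."""
    score = sum(wj * xj for wj, xj in zip(w, profile + [1.0]))
    return "Immune High" if score >= 0 else "Immune Low"


# Train on one synthetic cohort and classify an independent one.
train_x, train_y = make_synthetic_cohort(20, random.Random(1))
weights = train_linear_svm(train_x, train_y)
test_x, test_y = make_synthetic_cohort(10, random.Random(2))
predicted = [predict_subtype(weights, x) for x in test_x]
expected = ["Immune High" if y == 1 else "Immune Low" for y in test_y]
accuracy = sum(p == e for p, e in zip(predicted, expected)) / len(expected)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

In this sketch the trained weights play the role of the immunotyping model, and `predict_subtype` corresponds to typing a single patient sample. How the predicted subtype maps to a prognosis or to a medication recommendation is defined by the claims and the description, and is deliberately not encoded here.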