CN115478106B - LR-based method for typing triple negative breast cancer and application thereof - Google Patents

Publication number
CN115478106B
Authority
CN
China
Prior art keywords
breast cancer
typing
negative breast
triple negative
pairs
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210994924.2A
Other languages
Chinese (zh)
Other versions
CN115478106A (en)
Inventor
肖雨
朱孝辉
潘威君
王斐斐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southern Hospital Southern Medical University
Original Assignee
Southern Hospital Southern Medical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Southern Hospital Southern Medical University
Priority to CN202210994924.2A
Publication of CN115478106A
Application granted
Publication of CN115478106B

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10: Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/30: Unsupervised data analysis
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/112: Disease subtyping, staging or classification
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/118: Prognosis of disease development
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/158: Expression markers

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Zoology (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Analytical Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Immunology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Oncology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Hospice & Palliative Care (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method for typing triple negative breast cancer based on ligand-receptor (LR) pairs and an application thereof. The method achieves accurate typing of triple negative breast cancer by detecting, in a sample, 145 LR pairs related to triple negative breast cancer prognosis and applying an algorithm. Compared with the conventional typing scheme of 5 subtypes, the typing method provided by the invention is more targeted: it divides patients with triple negative breast cancer into 3 subtypes that differ markedly in survival rate, estimated prognosis, immune characteristics, clinical treatment effectiveness and the like, so that appropriate treatment can be given according to subtype, thereby improving treatment effectiveness and effectively reducing the mortality of triple negative breast cancer.

Description

LR-based method for typing triple negative breast cancer and application thereof
Technical Field
The invention relates to the field of molecular biology, and in particular to a method for typing triple negative breast cancer based on ligand-receptor (LR) pairs and an application thereof.
Background
Breast cancer is currently one of the most common cancers in women, accounting for 11.7% of all cancer cases. Clinically, breast cancer is divided into three major subtypes according to the expression of molecular markers, namely the estrogen receptor, the progesterone receptor and human epidermal growth factor receptor 2 (HER2): the hormone receptor positive/HER2 negative subtype (70%), the HER2 positive subtype (15%-20%) and the triple negative subtype (TNBC, i.e. tumors lacking all 3 of these standard molecular markers, 15%). Of the three subtypes, triple negative breast cancer (TNBC) is the most invasive and has the worst prognosis. As related studies have accumulated, analyses of various clinical, pathological and genetic factors have led researchers to regard TNBC as a heterogeneous group of tumors rather than a single breast cancer subtype. Multi-omics analyses have also provided new insight into the biological heterogeneity of TNBC, classifying these tumors into different molecular subtypes according to recurrent genetic aberrations, transcription patterns, tumor microenvironment characteristics and the like; accurate typing of these molecular subtypes, and prognosis prediction based on their genetic profiles, can help advance research on personalized treatment. However, multi-omics analysis involves complex steps, high detection and time costs and demanding personnel requirements, so it cannot be widely adopted, and the prior art offers no other effective means of typing the different molecular subtypes of triple negative breast cancer.
Tumors are heterogeneous mixtures of cancerous and non-cancerous cells. Intercellular communication mediated by ligand-receptor interactions in the tumor microenvironment (TME) has profound effects on tumor progression, and studies have demonstrated that communication between the cells within a tumor is critical for tumor progression. This communication is achieved through ligands (proteins, peptides, fatty acids, steroids, gases and other low molecular weight compounds) produced by cells; the ligands are either secreted or presented on the cell surface, and act on receptors on or within target cells. The literature indicates that most cells express tens to hundreds of ligands and receptors, forming a highly connected signaling network through multiple ligand-receptor pairs. Because of their biological importance and accessibility, receptors and their corresponding ligands are regarded as particularly useful clinical targets in cancer, but they have not yet been applied to triple negative breast cancer.
Therefore, developing an efficient and accurate subtype typing method for triple negative breast cancer is of great significance for its diagnosis and early treatment.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems in the prior art described above. To this end, the invention provides a method for typing triple negative breast cancer based on LR pairs and an application thereof. The typing method further subdivides triple negative breast cancer and, compared with conventional typing methods, reflects patient survival and predicts disease progression, thereby supporting the development of more effective treatments and effectively reducing the mortality of triple negative breast cancer.
In a first aspect of the present invention, there is provided a method for typing a triple negative breast cancer, comprising the steps of:
detecting the expression of triple negative breast cancer prognosis-related LR pairs in a sample, and typing the sample by using an algorithm, wherein the algorithm comprises a K-means algorithm, a 1 - Pearson correlation distance and a clustering algorithm.
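The "1 - Pearson" distance named in this step can be illustrated with a minimal sketch. It assumes each sample is represented by a vector of LR pair expression values; the function name `pearson_distance` is hypothetical and is not taken from the patent.

```python
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length profiles.

    0 means perfectly correlated profiles, 2 means perfectly anti-correlated.
    (Hypothetical helper for illustration only.)
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Numerator of the correlation: sum of co-deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return 1.0 - cov / (sd_x * sd_y)
```

Because this distance depends only on the shape of the expression profile, two samples whose LR pair expression rises and falls together are treated as close even if their absolute levels differ.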
According to the first aspect of the invention, in some embodiments, the triple negative breast cancer prognosis-related LR pairs comprise: APOB-ENO1, CXCL12-ITGA4, GPI-AMFR, MUC7-SELL, SELPLG-SELL, PODXL-SELL, BSG-SLC16A7, CD22-PTPRC, PTPRC-CD22, PPBP-CXCR1, IL7-IL7R, CCL19-CCR7, SERPING1-SELP, CCL16-CCR2, PLG-F2RL1, CXCL13-ACKR4, IL11-IL6ST, ICOS-ICOSLG, ICOSLG-ICOS, CCL19-ACKR4, CXCL3-CXCR1, ICAM4-ITGA4, CCL16-CCR5, CD2-CD48, CD48-CD2, TNFSF13B-TNFRSF13B, ADAM-ITGA 4, HBEGF-ERBB2, CALR-SCARF1, CXCL13-CXCR5, LGI3-STX1A, CD244-CD48, CD48-CD244, HLA-A-KIR3DL1, HLA-B-KIR3DL1, HLA-F-KIR3DL1, LGI3-FLOT1, VEGFA-GPC1, EFNA4-EPHA5, QRFP-P2RY14, FGF4-FGFRL1, CXCL8-CXCR1, NMS-NMUR2, GLG1-SELE, LY9-LY9, EDN3-KEL, CXCL13-CCR10, CCL19-CXCR3, CCL21-CXCR3, ADM-RAMP2, CFH-SELL, CXCL12-ITGB1, CD34-SELL, NMB-BRS3, CCL21-CCR7, PODXL2-SELL, SERPING1-SELE, CLEC2B-KLRF1, KLRF1-CLEC2B, CXCL-CXCR 3, SEMA4D-MET, ADM-MRGPRX2, EBI3-IL6ST, TSLP-IL7R, B2M-CD1B, ADAM-ITGA 4, WNT8A-FZD5, ADAM7-ITGB7, GNRH1-GNRHR, ADM-CALCR, LTB-LTBR, COL14A1-CD44, F2-F2RL2, DEFB103A-CCR6, SLIT3-ROBO1, VTN-PLAUR, CCL25-ACKR2, CCL19-CCR10, LRFN3-LRFN3, CXCL9-CCR3, MRC1-PTPRC, PTPRC-MRC1, CD58-CD2, CYTL1-CCR2, DKK2-LRP6, SERPINE1-LRP1, EFNA4-EPHA6, NMB-NMBR, MMP7-ERBB4, POMC-OPRD1, CLEC2D-KLRB1, KLRB1-CLEC2D, GDF-TDGF 1, GUCA2A-GUCY2C, IL-KCNJ 10, FGF3-FGFRL1, LRPAP1-SORT1, CXCL11-CXCR3, WNT9A-FZD10, APOB-LSR, CD70-CD27, ANGPTL1-TEK, TNF-TNFRSF1B, CXCL-CD 4, NECTIN2-TIGIT, NECTIN4-TIGIT, TIGIT-NECTIN2, FASLG-FAS, POMC-MC3R, VCAM-ITGB 7, POMC-APP 1, IL 22-PLIL 22RA1, FASLG-FAS, SEL 21-KR 4, TNFSF13-FAS, CD160-TNFRSF14, TNFRSF14-CD160, FGF6-FGFR1, SPP1-ITGB1, MADCAM1-ITGA4, OXT-AVPR1A, CXCL9-CXCR3, CXCL5-ACKR1, BTLA-TNFRSF14, TNFRSF14-BTLA, CCL5-CCR3, TAC3-TACR1, CXCL10-SDC4, VEGFA-KDR, EFNA1-EPHA5, CCL5-ACKR1, EFNA1-EPHA2, DKK4-KREMEN2, POMC-MC2R, GNAS-PTGDR, CCL13-CCR2, IL18-IL18R1, MYL9-CD69, CXCL6-CXCR1, RSPO3-LRP6, AMBN-CD63, CALR-TSHR, ICAM3-ITGAL, NOTU-NMUR 1 and NOT 4-CH 3.
In the present invention, by analyzing known triple negative breast cancer (TNBC)-related LR pairs, the inventors screened out a total of 145 LR pairs associated with TNBC prognosis, of which 44 pairs correspond to poor prognosis and 101 pairs correspond to good prognosis.
Among them, the 44 LR pairs associated with poor prognosis are: APOB-ENO1, GPI-AMFR, BSG-SLC16A7, PPBP-CXCR1, PLG-F2RL1, CXCL3-CXCR1, HBEGF-ERBB2, CALR-SCARF1, LGI3-STX1A, LGI3-FLOT1, VEGFA-GPC1, EFNA4-EPHA5, FGF4-FGFRL1, CXCL8-CXCR1, NMS-NMUR2, ADM-RAMP2, NMB-BRS3, ADM-MRGPRX2, WNT8A-FZD5, ADM-CALCR, VTN-PLAUR, LRFN3-LRFN3, SERPINE1-LRP1, EFNA4-EPHA6, LGI3-STX 1-A, LGI-FLT 1, VEGFA-GPC1, EFNA4-EPHA5, FGF4-FGFR 1, FGFR 22, FGFR 1-LR 22, FGFR1, NMLR-LR 1, NMLR-4-BRR 1, NMLR-BRR 1, KTU-4-BRR 1, KTU-35, and so on-4-BRL 1.
The 101 LR pairs associated with good prognosis are: CXCL12-ITGA4, MUC7-SELL, SELPLG-SELL, PODXL-SELL, CD22-PTPRC, PTPRC-CD22, IL7-IL7R, CCL19-CCR7, SERPING1-SELP, CCL16-CCR2, CXCL13-ACKR4, IL11-IL6ST, ICOS-ICOSLG, ICOSLG-ICOS, CCL19-ACKR4, ICAM4-ITGA4, CCL16-CCR5, CD2-CD48, CD48-CD2, TNFSF13B-TNFRSF13B, ADAM-ITGA 4, CXCL13-CXCR5, CD244-CD48, CD48-CD244, HLA-A-KIR3DL1, HLA-B-KIR3DL1, HLA-F-KIR3DL1, QRFP-P2RY14, GLG1-SELE, LY9-LY9, EDN3-KEL, CXCL13-CCR10, CCL19-CXCR3, CCL21-CXCR3, CFH-SELL, CXCL12-ITGB1, CD34-SELL, CCL21-CCR7, PODXL2-SELL, SERPING1-SELE, CLEC2B-KLRF1, KLRF1-CLEC2B, CXCL-CXCR 3, SEMA4D-MET, EBI3-IL6ST, TSLP-IL7R, B2M-CD1B, ADAM-ITGA 4, ADAM7-ITGB7, GNRH1-GNRHR, LTB-LTBR, COL14A1-CD44, F2-F2RL2, DEFB103A-CCR6, SLIT3-ROBO1, CCL25-ACKR2, CCL19-CCR10, CXCL9-CCR3, MRC1-PTPRC, PTPRC-MRC1, CD58-CD2, CYTL1-CCR2, DKK2-LRP6, MMP7-ERBB4, POMC-OPRD1, CLEC2D-KLRB1, KLRB1-CLEC2D, GUCA2A-GUCY2C, IL-KCNJ 10, CXCL11-CXCR3, WNT9A-FZD10, CD70-CD27, ANGPTL1-TEK, TNF-TNFRSF1B, CXCL-CD 4, NECTIN2-TIGIT, NECTIN4-TIGIT, TIGIT-NECTIN2, FASLG-FAS, POMC-MC3R, VCAM-ITGB 7, SELPLG-E, CCL21-ACKR4, TNFSF13-FAS, CD160-TNFRSF14, TNFRSF14-CD160, MADCAM1-ITGA4, CXCL9-CXCR3, CXCL5-ACKR1, BTLA-TNFRSF14, TNFRSF14-BTLA, CCL5-CCR3, CXCL10-SDC4, CCL5-ACKR1, POMC-MC2R, GNAS-PTGDR, CCL13-CCR2, IL18-IL18R1, MYL9-CD69, RSPO3-LRP6, ICAM3-ITGAL.
During cancer development, crosstalk between cancer cells and stromal cells is coordinated by numerous ligand-receptor interactions to produce a tumor microenvironment (TME) that favors tumor growth. In the tumor microenvironment, LR pair-based intercellular communication underlies the poor prognosis of various cancers (such as pancreatic ductal adenocarcinoma and colorectal carcinoma), so further research on receptors, ligands and their interactions is a focus of the field. In the present invention, the LR pairs are all taken from the literature-curated database connectomeDB2020, which integrates 2293 LR interaction pairs; the inventors screened these 2293 LR pairs in TNBC datasets to obtain the 145 LR pairs. However, those skilled in the art may select LR pairs from other database sources according to actual requirements.
In the present invention, enrichment analysis of the 145 LR pairs found that 10 pathways were most enriched: viral protein interaction with cytokine and cytokine receptor, cytokine-cytokine receptor interaction, cell adhesion molecules (CAMs), chemokine signaling pathway, intestinal immune network for IgA production, rheumatoid arthritis, proteoglycans in cancer, malaria, neuroactive ligand-receptor interaction, and hematopoietic cell lineage.
In some embodiments of the invention, the number of clusters K used by the algorithm is 3.
In some embodiments of the present invention, the number of clusters is determined from the consensus cumulative distribution function (CDF) plot and the delta area plot. The criteria are high within-cluster consistency, a low coefficient of variation and no significant further increase in the area under the CDF curve; the inventors' tests found that a stable clustering result is produced when K=3.
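The CDF and delta area criteria can be sketched as follows. This is an illustration under stated assumptions, not the inventors' implementation: it assumes the repeated clusterings on subsamples have already been run and are supplied as a list of `{sample_index: cluster_label}` dictionaries, and the function names are hypothetical. The delta area follows the common convention of reporting the area itself for the smallest K and the relative increase for each larger K.

```python
from itertools import combinations


def consensus_matrix(n_samples, runs):
    """Fraction of runs in which samples i and j (i < j) share a cluster.

    Only the upper triangle is filled. Pairs that were never drawn together
    in any subsample are given 0.0 (a simplifying assumption).
    """
    together = [[0] * n_samples for _ in range(n_samples)]
    sampled = [[0] * n_samples for _ in range(n_samples)]
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[i][j] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n_samples)] for i in range(n_samples)]


def cdf_area(matrix):
    """Area under the empirical CDF of the pairwise consensus values.

    For values in [0, 1], the integral of the CDF over [0, 1] equals
    1 - mean(values), so no histogram is needed.
    """
    n = len(matrix)
    values = [matrix[i][j] for i, j in combinations(range(n), 2)]
    return 1.0 - sum(values) / len(values)


def delta_area(areas):
    """Relative increase in CDF area between consecutive values of K."""
    deltas = [areas[0]]
    for prev, cur in zip(areas, areas[1:]):
        deltas.append((cur - prev) / prev)
    return deltas
```

In this sketch, K would be chosen at the point after which `delta_area` shows only small further increases, which corresponds to the criterion that the area under the CDF curve no longer increases significantly.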
In some embodiments of the invention, the triple negative breast cancer is classified into types C1, C2 and C3.
In some embodiments of the invention, the subtypes are defined according to survival time in the training set; specifically, survival time ranks C1 > C2 > C3. That is, the cluster whose patients have the longest overall survival in the training set is defined as C1, the cluster with the next longest as C2, and the cluster with the shortest as C3.
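The naming rule above can be sketched as a simple relabeling step. The sketch assumes that clusters are compared by the median of the recorded survival times and ignores censoring, which is a simplification introduced here; the patent does not state which summary of overall survival is used. The function name is hypothetical.

```python
from statistics import median


def name_clusters_by_survival(cluster_ids, survival_times):
    """Map raw cluster ids to C1, C2, ... in order of decreasing survival.

    Assumption for illustration: clusters are ranked by median survival
    time, without accounting for censored observations.
    """
    by_cluster = {}
    for cluster, time in zip(cluster_ids, survival_times):
        by_cluster.setdefault(cluster, []).append(time)
    ranked = sorted(by_cluster, key=lambda c: median(by_cluster[c]), reverse=True)
    return {cluster: f"C{rank}" for rank, cluster in enumerate(ranked, start=1)}
```

A survival analysis that handles censoring (for example Kaplan-Meier estimates, as in the figures) would be the more rigorous basis for this ranking.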
For the method of the invention, given that the training set, the clustering method and its conditions are fully disclosed, a person skilled in the art can reproduce the typing standard of the invention from them. Based on the reproduced typing standard, the expression profile obtained by actual detection of a subject can be placed into the matrix provided by the invention, and accurate typing can be achieved by comparing Euclidean distances.
In the present invention, the inventors applied the same molecular subtype determination method to three existing TNBC cohorts from different sources, obtaining three corresponding molecular subtypes in each, and observed in survival analysis that the prognosis of the three subtypes differed significantly and in a similar manner across cohorts.
In some embodiments of the invention, the criterion for typing is:
calculating, from the expression profile, the Euclidean distances between the subject sample and the three subtype cluster centers, and determining the subtype according to these distances: if the Euclidean distance from the subject sample to the C1 cluster center is shorter than its distances to the C2 and C3 cluster centers, the subject has triple negative breast cancer type C1; if the Euclidean distance from the subject sample to the C2 cluster center is shorter than its distances to the C1 and C3 cluster centers, the subject has triple negative breast cancer type C2; if the Euclidean distance from the subject sample to the C3 cluster center is shorter than its distances to the C1 and C2 cluster centers, the subject has triple negative breast cancer type C3.
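The nearest-center rule can be written in a few lines. This is a minimal sketch that assumes the three cluster centers have already been derived from the training set and that the subject's expression profile uses the same LR pairs in the same order; the function name and the toy two-dimensional centers in the usage note are hypothetical.

```python
from math import dist


def assign_subtype(profile, centers):
    """Return the name of the nearest cluster center and all distances.

    `centers` maps a subtype name (e.g. "C1") to its center vector;
    `profile` is the subject's expression vector over the same LR pairs.
    """
    distances = {name: dist(profile, center) for name, center in centers.items()}
    nearest = min(distances, key=distances.get)
    return nearest, distances
```

For example, with toy centers `{"C1": [0, 0], "C2": [5, 5], "C3": [10, 0]}`, the profile `[4, 6]` lies closest to the C2 center and would be typed as C2.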
In the present invention, the inventors analyzed mutations and copy number variation (CNV) among the three triple negative breast cancer subtypes obtained by the above method, and found no significant correlation between molecular subtype and clinical variables such as tumor stage, age and sex. A significant difference was also noted in the distribution of the 5 widely accepted intrinsic molecular subtypes of breast cancer (Luminal A, Luminal B, HER2-enriched, Basal-like and Claudin-low) among the three LR pair-based subtypes: Claudin-low samples account for a large proportion of the C3 subtype, whereas Basal-like samples account for a larger proportion of the C1 subtype. Mortality also differs significantly between C1 and C3: over 60% of the C1 samples died, whereas over 55% of the C3 samples survived. In another cohort, the age distribution trends of C1 and C3 are opposite. Survival status also differs with statistical significance among the three subtypes. The three subtypes of the present invention thus enable an evaluation of treatment and prognosis that the conventional 5-subtype typing cannot provide.
In the present invention, the inventors explored the molecular biological differences between the three LR pair-based molecular subtypes, and found that, between C1 and C3 in the three TNBC datasets, glycolysis, hypoxia and the early estrogen response were significantly up-regulated, while 10 pathways, including apoptosis, TNFA signaling via NF-κB and complement, were significantly down-regulated. On further comparing pathway activity between the C1 and C2 subtypes and between the C2 and C3 subtypes in the METABRIC cohort, 6 pathways were found to be consistently activated across the LR pair-based molecular subtypes, including glycolysis, hypoxia, epithelial-mesenchymal transition, MYC targets, myogenesis, and the early and late estrogen responses.
In the present invention, the inventors performed an immune analysis of the three molecular subtypes and found that, in the METABRIC cohort, the estimated proportions of most immune cell types (16 in total) differed among the three LR pair-based molecular subtypes, including naive B cells, memory B cells, CD8 T cells, naive CD4 T cells, activated CD4 memory T cells, gamma-delta T cells, resting and activated NK cells, M0 macrophages, M1 macrophages, M2 macrophages, resting dendritic cells, activated dendritic cells, resting and activated mast cells, and neutrophils. Across all three TNBC cohorts, the estimated proportions of naive B cells, naive CD4 T cells, activated CD4 memory T cells, gamma-delta T cells, activated NK cells, M0 macrophages, M1 macrophages, M2 macrophages and activated mast cells differed significantly among the LR pair-based molecular subtypes. The stromal score, immune score and ESTIMATE score were compared between subtypes using the Kruskal-Wallis test. The immune scores of the three molecular subtypes differed significantly in each cohort, with p values < 0.01. The immune/ESTIMATE scores of the three molecular subtypes also differed with high significance in each cohort, with p values < 0.0001. For all three scores, C3 > C2 > C1.
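The Kruskal-Wallis comparison of a score across the three subtypes can be sketched without external libraries. The sketch is restricted to exactly three groups, for which the chi-square approximation has 2 degrees of freedom and the p value reduces to exp(-H/2); this is an asymptotic approximation, and the function name is hypothetical. In practice a statistics package would normally be used.

```python
from math import exp


def kruskal_wallis_three_groups(groups):
    """Return (H, p) for three groups of numeric scores.

    H includes the standard tie correction; p uses the chi-square
    approximation with 2 degrees of freedom, i.e. p = exp(-H / 2).
    """
    if len(groups) != 3 or any(len(g) == 0 for g in groups):
        raise ValueError("expected three non-empty groups")
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        # Tied values share the mean of ranks i+1 .. j.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    rank_sums = [sum(rank_of[v] for v in g) for g in groups]
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    h /= correction
    return h, exp(-h / 2)
```

Each group would hold one score (for example the immune score) for the samples of one subtype, so that a small p value indicates that the score distribution differs among C1, C2 and C3.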
In a second aspect, the invention provides the use of the typing method of the first aspect for stratifying patients with triple negative breast cancer.
In the present invention, based on the typing method of the first aspect, the inventors successfully constructed an LR pair-based risk model. The model analyzes prognosis-related LR pairs using LASSO-penalized Cox regression, which eliminates unimportant LR pairs by shrinking the model coefficients and yields a preliminary set of LR pairs. The preliminary LR pairs are then filtered using the stepAIC strategy in the MASS software package. The LR pair scoring model is built from the features giving the lowest stepAIC value, and the coefficient of each is obtained by multivariate Cox regression analysis.
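Once the coefficients are fitted, a score of this kind is typically a weighted sum of expression values. The sketch below assumes that form; the pair names and coefficient values in the usage note are hypothetical placeholders, not the patent's fitted model, and the median split into high and low groups is likewise an assumption made for illustration.

```python
from statistics import median


def lr_pair_score(expression, coefficients):
    """Weighted sum of LR pair expression values.

    `coefficients` maps an LR pair name to its (multivariate Cox)
    coefficient; `expression` maps the same names to expression values.
    """
    return sum(coef * expression[name] for name, coef in coefficients.items())


def split_by_median(scores):
    """Label each sample "high" or "low" relative to the median score."""
    cut = median(scores.values())
    return {sample: ("high" if value > cut else "low")
            for sample, value in scores.items()}
```

With placeholder coefficients `{"pair_a": 0.5, "pair_b": -0.25}` and expression `{"pair_a": 2.0, "pair_b": 4.0}`, the score is 0.5 × 2.0 − 0.25 × 4.0 = 0.0. A positive coefficient means that higher expression of that pair raises the estimated risk.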
In some embodiments of the invention, the C1 population is superior to the C2 population in terms of clinical treatment effect.
In a third aspect of the present invention, there is provided the use of a detection product for detecting the expression levels of the following LR pairs in the preparation of a product for diagnosing and/or typing triple negative breast cancer;
wherein the LR pairs comprise: APOB-ENO1, CXCL12-ITGA4, GPI-AMFR, MUC7-SELL, SELPLG-SELL, PODXL-SELL, BSG-SLC16A7, CD22-PTPRC, PTPRC-CD22, PPBP-CXCR1, IL7-IL7R, CCL19-CCR7, SERPING1-SELP, CCL16-CCR2, PLG-F2RL1, CXCL13-ACKR4, IL11-IL6ST, ICOS-ICOSLG, ICOSLG-ICOS, CCL19-ACKR4, CXCL3-CXCR1, ICAM4-ITGA4, CCL16-CCR5, CD2-CD48, CD48-CD2, TNFSF13B-TNFRSF13B, ADAM-ITGA 4, HBEGF-ERBB2, CALR-SCARF1, CXCL13-CXCR5, LGI3-STX1A, CD244-CD48, CD48-CD244, HLA-A-KIR3DL1, HLA-B-KIR3DL1, HLA-F-KIR3DL1, LGI3-FLOT1, VEGFA-GPC1, EFNA4-EPHA5, QRFP-P2RY14, FGF4-FGFRL1, CXCL8-CXCR1, NMS-NMUR2, GLG1-SELE, LY9-LY9, EDN3-KEL, CXCL13-CCR10, CCL19-CXCR3, CCL21-CXCR3, ADM-RAMP2, CFH-SELL, CXCL12-ITGB1, CD34-SELL, NMB-BRS3, CCL21-CCR7, PODXL2-SELL, SERPING1-SELE, CLEC2B-KLRF1, KLRF1-CLEC2B, CXCL-CXCR 3, SEMA4D-MET, ADM-MRGPRX2, EBI3-IL6ST, TSLP-IL7R, B2M-CD1B, ADAM-ITGA 4, WNT8A-FZD5, ADAM7-ITGB7, GNRH1-GNRHR, ADM-CALCR, LTB-LTBR, COL14A1-CD44, F2-F2RL2, DEFB103A-CCR6, SLIT3-ROBO1, VTN-PLAUR, CCL25-ACKR2, CCL19-CCR10, LRFN3-LRFN3, CXCL9-CCR3, MRC1-PTPRC, PTPRC-MRC1, CD58-CD2, CYTL1-CCR2, DKK2-LRP6, SERPINE1-LRP1, EFNA4-EPHA6, NMB-NMBR, MMP7-ERBB4, POMC-OPRD1, CLEC2D-KLRB1, KLRB1-CLEC2D, GDF-TDGF 1, GUCA2A-GUCY2C, IL-KCNJ 10, FGF3-FGFRL1, LRPAP1-SORT1, CXCL11-CXCR3, WNT9A-FZD10, APOB-LSR, CD70-CD27, ANGPTL1-TEK, TNF-TNFRSF1B, CXCL-CD 4, NECTIN2-TIGIT, NECTIN4-TIGIT, TIGIT-NECTIN2, FASLG-FAS, POMC-MC3R, VCAM-ITGB 7, POMC-APP 1, IL 22-PLIL 22RA1, FASLG-FAS, SEL 21-KR 4, TNFSF13-FAS, CD160-TNFRSF14, TNFRSF14-CD160, FGF6-FGFR1, SPP1-ITGB1, MADCAM1-ITGA4, OXT-AVPR1A, CXCL9-CXCR3, CXCL5-ACKR1, BTLA-TNFRSF14, TNFRSF14-BTLA, CCL5-CCR3, TAC3-TACR1, CXCL10-SDC4, VEGFA-KDR, EFNA1-EPHA5, CCL5-ACKR1, EFNA1-EPHA2, DKK4-KREMEN2, POMC-MC2R, GNAS-PTGDR, CCL13-CCR2, IL18-IL18R1, MYL9-CD69, CXCL6-CXCR1, RSPO3-LRP6, AMBN-CD63, CALR-TSHR, ICAM3-ITGAL, NOTU-NMUR 1 and NOT 4-CH 3.
In the present invention, the LR pairs are all taken from the literature-curated database connectomeDB2020; those skilled in the art may select LR pairs from other database sources according to actual requirements.
In some embodiments of the invention, the detection products for detecting LR pair expression levels include, but are not limited to, detection products based on semi-quantitative RT-PCR, Northern blot, real-time fluorescent quantitative PCR and the like. The relevant specific primers, probes and the like can be obtained by routine methods in the art.
In some embodiments of the invention, the detection products include, but are not limited to, detection reagents, detection kits, gene chips.
In a fourth aspect of the invention, there is provided a detection system comprising:
a detection unit for detecting LR pairs; and
a typing unit;
the typing unit enters the data obtained by the detection unit into a matrix and performs typing according to the original modeling classification method and parameters to obtain a typing result.
In some embodiments of the invention, the matrix is obtained based on the training set in the examples, examples of which are shown in table 1 and fig. 24.
In some embodiments of the present invention, the typing unit further includes a computing device configured to calculate the Euclidean distances between the sample data to be tested and the three known cluster centers (C1 to C3) and to derive a typing result from these distances. If the Euclidean distance from the sample data to the C1 cluster center is shorter than its distances to the C2 and C3 cluster centers, the subject has triple negative breast cancer type C1; if the Euclidean distance from the sample data to the C2 cluster center is shorter than its distances to the C1 and C3 cluster centers, the subject has triple negative breast cancer type C2; if the Euclidean distance from the sample data to the C3 cluster center is shorter than its distances to the C1 and C2 cluster centers, the subject has triple negative breast cancer type C3.
The beneficial effects of the invention are as follows:
The invention provides a method for typing triple negative breast cancer in which the expression of LR pairs related to triple negative breast cancer prognosis is detected in a sample and an algorithm is used to type the sample. The typing is accurate and rapid. Compared with the conventional typing scheme of 5 subtypes, it divides triple negative breast cancer patients into 3 subtypes that differ markedly in survival rate, estimated prognosis, immune characteristics, clinical treatment effectiveness and the like, so that appropriate treatment can be given according to subtype, thereby improving treatment effectiveness and effectively reducing the mortality of triple negative breast cancer.
Drawings
FIG. 1 shows the screening of prognosis-related LR pairs in the present invention, wherein A is the screening flow chart; B is a volcano plot of the 145 LR pairs; C is an interaction network diagram of the 145 LR pairs.
FIG. 2 shows the 10 KEGG pathways most highly enriched among the 145 LR pairs.
FIG. 3 shows the identification of three TNBC subtypes based on LR pairs, wherein A is the consensus clustering cumulative distribution function (CDF) plot for K=2-9; B is the delta area plot of the consensus clustering of samples in METABRIC; C is the sample clustering heat map at consensus K=3.
FIG. 4 is a Kaplan-Meier analysis of OS for the three subtypes in the METABRIC dataset.
FIG. 5 is a Kaplan-Meier analysis of OS for the three molecular subtypes in the GSE58812 dataset (A) and the GSE21653 dataset (B).
FIG. 6 shows the clinical characteristics and genomic changes of the LR pair-based molecular subtypes, wherein A is the distribution of stage, grade, age, PAM50+Claudin-low molecular subtype and survival status for each subtype in the METABRIC database; B is the distribution of age and survival status for each subtype in the GSE58812 cohort.
FIG. 7 shows the age and survival status distribution of the three subtypes in the GSE21653 dataset.
FIG. 8 is a waterfall plot (chi-square test) of somatic mutations and CNV in the METABRIC database for the three subtypes of the present invention.
FIG. 9 is a functional analysis of the LR pair-based molecular subtypes, wherein A is a GSEA bubble plot for the C1 and C3 subtypes in the METABRIC cohort; B is a GSEA bubble plot for the C1 and C3 subtypes in the three cohorts; C is a heat map of GSEA normalized enrichment scores (NES) for C1 versus C2, C1 versus C3 and C2 versus C3, with the vertical axis representing the comparison groups and the horizontal axis representing pathway names; D is a radar plot of the pathways consistently activated from C1 to C2 and from C2 to C3 in the METABRIC database.
FIG. 10 shows the estimated proportions of 22 immune cell types in the LR pair-based molecular subtypes in the METABRIC (A), GSE58812 (B) and GSE21653 (C) cohorts; p-values were calculated by the Kruskal-Wallis test; ns: p > 0.05; *: p < 0.05; **: p < 0.01; ***: p < 0.001; ****: p < 0.0001.
FIG. 11 shows the stromal score, immune score and ESTIMATE score of the three LR pair-based molecular subtypes in the METABRIC (A), GSE58812 (B) and GSE21653 (C) cohorts; p-values were calculated by the Kruskal-Wallis test; ns: p > 0.05; *: p < 0.05; **: p < 0.01; ***: p < 0.001; ****: p < 0.0001.
FIG. 12 shows the LASSO-Cox regression analysis and the fitted LASSO-Cox regression model for 6 LR pairs.
FIG. 13 shows the coefficients of the predictors in the Cox regression model for the 4 LR pairs.
FIG. 14 shows the LR pair scores of the three LR pair-based subtypes in the METABRIC cohort (A), GSE58812 cohort (B) and GSE21653 cohort (C), Kruskal-Wallis test.
FIG. 15 shows Kaplan-Meier comparisons of the OS of samples with different LR pair scores in the METABRIC cohort (A), GSE58812 cohort (C) and GSE21653 cohort (D), log-rank test; the time-dependent ROC curves (B) show the predictive ability of the LR pair score in the METABRIC cohort; ns: p > 0.05; *: p < 0.05; **: p < 0.01; ***: p < 0.001; ****: p < 0.0001.
FIG. 16 shows time-dependent ROC curves for the predictive ability of the LR pair score in the GSE58812 cohort (A) and the GSE21653 cohort (B).
FIG. 17 is a forest plot of the coefficients and confidence intervals of the univariate (A) and multivariate (B) Cox regressions, including factors in the METABRIC cohort such as LR pair score, patient age, stage, grade, and patient outcome.
FIG. 18 shows the correlation between the LR pair score and immune composition and immune-related pathways, wherein A is the Pearson correlation analysis between the ssGSEA scores of KEGG pathways and the LR pair score in METABRIC, r > 0.4; B is the relative abundance of 22 immune cell types in the high and low LR pair score groups in the METABRIC cohort, Wilcoxon test; C is the ESTIMATE immune score of the high and low LR pair score groups in the METABRIC cohort, Wilcoxon test; and D is the Pearson correlation analysis between the LR pair score and immune cell components; ns: p > 0.05; *: p < 0.05; **: p < 0.01; ***: p < 0.001; ****: p < 0.0001.
FIG. 19 shows the correlation of the LR pair score with immune checkpoint gene expression, Wilcoxon test; ns: p > 0.05; *: p < 0.05; **: p < 0.01; ***: p < 0.001; ****: p < 0.0001.
FIG. 20 shows the correlation between the LR pair scoring model and the exclusion score (A), dysfunction score (B) and TIDE score (C).
FIG. 21 shows the difference in LR pair score between the complete response (CR)/partial response (PR) group and the stable disease (SD)/progressive disease (PD) group in the IMvigor210 cohort (A), and the survival curves of the different LR pair score groups in the IMvigor210 cohort (B); ns: p > 0.05; *: p < 0.05; **: p < 0.01; ***: p < 0.001; ****: p < 0.0001.
FIG. 22 shows the response of patients with different LR pair scores to anti-PD-L1 treatment in the IMvigor210 cohort, log-rank test.
FIG. 23 shows the relationship between the LR pair score and drug sensitivity, wherein A is the correlation of the LR pair score with the AUC of the drug sensitivity curve, Spearman correlation analysis; and B is the difference in estimated IC50 values of paclitaxel, veliparib, olaparib and talazoparib between the LR pair score groups, Wilcoxon test; ns: p > 0.05; *: p < 0.05; **: p < 0.01; ***: p < 0.001; ****: p < 0.0001.
FIG. 24 shows exemplary data of 4 samples for other LR pairs.
Detailed Description
In order to make the objects, technical solutions and technical effects of the present invention more clear, the present invention will be described in further detail with reference to the following specific embodiments. It should be understood that the detailed description is presented herein for purposes of illustration only and is not intended to limit the invention.
The experimental materials and reagents used, unless otherwise specified, are those conventionally available commercially.
In the embodiments of the invention, the TNBC data come mainly from the METABRIC dataset (a breast cancer database), obtained and curated through the cBio Cancer Genomics Portal (cBioPortal). The METABRIC dataset was downloaded from cBioPortal (http://cbioportal.org/) and filtered for availability. The final dataset used in the present invention comprised genomic variation data for 318 TNBC samples and gene expression profiles for 298 samples (all from the METABRIC dataset), together with microarray data for 107 and 83 TNBC samples collected from the GSE58812 and GSE21653 datasets, respectively, of the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/).
In the embodiments of the invention, all statistical analyses were performed with R 4.0.2. Kaplan-Meier survival curves and receiver operating characteristic (ROC) curves were visualized with the "survminer" and "timeROC" packages, respectively. LR pair scores and clinical parameters were included in Cox proportional hazards regression to identify independent factors for predicting TNBC prognosis. The p-value cutoff was set to 0.05.
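As an illustration of the Kaplan-Meier curves mentioned above, the following is a minimal sketch of the product-limit estimator in plain Python. It is not the R workflow used in the embodiments; the function name `kaplan_meier` and the follow-up values are hypothetical and chosen only for illustration.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator).

    times  -- follow-up time of each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)      # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]     # remove deaths and censored patients after time t
    return curve

# Hypothetical follow-up data (months); not taken from the patent.
example_curve = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
```

Censored patients reduce the size of the risk set without producing a step in the curve, which is why only three time points appear for the five hypothetical patients.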
In the present invention, the "expression level of an LR pair" refers to the sum of the expression levels of the two genes in the LR pair. For example, "CXCL9-CCR3 expression level" or "CXCL9-CCR3" refers to the expression level of CXCL9 + CCR3.
In the present invention, the detection of each LR pair or gene can be quantitatively detected based on the existing detection kit or detection product, or can be detected by using primers and/or probes by conventional means in the art, and the model construction and evaluation effect in the present invention is not limited to the selection of detection products.
Ligand receptor pair acquisition and screening
The inventors downloaded 2293 interacting ligand-receptor (LR) pairs from the literature-curated database connectomeDB2020 for LR pair screening. The screening criterion was as follows: a patient is defined as having high expression if the sum of the expression of the two genes in the LR pair is equal to or greater than the median of that sum across all patients; otherwise, the patient is defined as having low expression. The R package "survival" was used to analyze the correlation between each LR pair and the survival of TNBC patients in each cohort. Statistical significance was assessed by the Peto and Peto modification of the Gehan-Wilcoxon test, and the hazard ratio (HR) was calculated as the exponentiated coefficient of a Cox regression model. The "sump" function in the "metap" package was used to integrate the P values of the different cohorts based on the Edgington method, and multiple testing correction was performed based on the Storey method.
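The grouping rule described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the embodiments; the function names and the expression values are hypothetical.

```python
from statistics import median

def lr_pair_expression(ligand_expr, receptor_expr):
    """Expression of an LR pair per patient: ligand expression + receptor expression."""
    return [l + r for l, r in zip(ligand_expr, receptor_expr)]

def split_by_median(pair_expr):
    """Label each patient 'high' if the pair expression is >= the cohort median, else 'low'."""
    cutoff = median(pair_expr)
    return ["high" if value >= cutoff else "low" for value in pair_expr]

# Hypothetical expression values for four patients; not taken from the patent.
ligand = [2.0, 5.5, 3.1, 7.2]
receptor = [1.0, 4.0, 0.9, 6.3]
groups = split_by_median(lr_pair_expression(ligand, receptor))
```

The two resulting groups are the ones compared in the per-cohort survival analysis of each LR pair.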
The results are as follows.
The screening flow chart is shown in fig. 1A.
As described above, to screen for LR pairs associated with TNBC prognosis, the inventors performed survival analysis of LR pairs on METABRIC, GSE58812 and GSE21653, and combined the prognostic significance P-values of the LR pairs from the three cohorts in a meta-analysis: the P-values of the three cohorts were integrated based on the Edgington method using the "sump" function in the "metap" package, and multiple testing correction was performed based on the Storey method using the "qvalue" package. In total, 145 LR pairs associated with TNBC prognosis were screened, of which 44 were associated with poor prognosis and 101 with good prognosis (fig. 1B). An interaction network diagram was plotted for these prognosis-related LR pairs (fig. 1C).
KEGG enrichment analysis was then performed on the above LR pairs (fig. 2). The 10 pathways most enriched among the 145 LR pairs were viral protein interaction with cytokine and cytokine receptor, cytokine-cytokine receptor interaction, cell adhesion molecules (CAMs), chemokine signaling pathway, intestinal immune network for IgA production, rheumatoid arthritis, proteoglycans in cancer, malaria, neuroactive ligand-receptor interaction, and hematopoietic cell lineage.
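For readers unfamiliar with the Edgington method referred to above, the following sketch shows the underlying calculation: the combined P-value is the probability that the sum of k independent uniform(0, 1) variables is at most the observed sum of the k P-values. This is a plain-Python illustration, not the R implementation used in the embodiments; the function name `edgington_combine` is hypothetical.

```python
from math import comb, factorial, floor

def edgington_combine(p_values):
    """Combine independent P-values with Edgington's sum-of-p method.

    Uses the closed-form CDF of the sum of k uniform(0, 1) variables:
    P = (1 / k!) * sum_{j=0}^{floor(S)} (-1)^j * C(k, j) * (S - j)^k, where S = sum(p).
    """
    k = len(p_values)
    s = sum(p_values)
    total = 0.0
    for j in range(floor(s) + 1):
        total += (-1) ** j * comb(k, j) * (s - j) ** k
    # Clamp to [0, 1] to guard against floating-point round-off.
    return min(1.0, max(0.0, total / factorial(k)))

# Hypothetical P-values of one LR pair in three cohorts; not taken from the patent.
combined_p = edgington_combine([0.1, 0.1, 0.1])
```

The alternating sum can lose precision when many P-values are combined, but with three cohorts, as here, this is not a practical concern.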
LR subtype typing based on consensus clustering
Samples were clustered by consensus clustering according to the expression of the TNBC prognosis-related LR pairs. The K-means algorithm with "1 - Pearson correlation" as the distance measure was specified to divide the samples into K groups, with each bootstrap involving 80% of the samples, for a total of 500 repetitions. The consensus clustering heat map was generated with the R package "pheatmap". The number of clusters was determined from the consensus cumulative distribution function (CDF) plot and the delta area plot; the criteria were high within-cluster consistency, a low coefficient of variation, and no significant increase in the area under the CDF curve.
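Two ingredients of the procedure above, the "1 - Pearson correlation" dissimilarity and the 80% resampling step, can be sketched as follows. This is only an illustration in plain Python, not the consensus clustering implementation used in the embodiments; the function names are hypothetical, and the clustering step itself is omitted.

```python
import random
from math import sqrt

def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two expression profiles.

    Assumes neither profile is constant (otherwise the correlation is undefined).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)

def draw_subsample(sample_ids, fraction=0.8, rng=None):
    """Draw one resampling round: a random subset containing `fraction` of the samples."""
    rng = rng or random.Random()
    size = max(1, int(round(fraction * len(sample_ids))))
    return rng.sample(sample_ids, size)

# 500 resampling rounds over 298 hypothetical sample identifiers.
rounds = [draw_subsample(list(range(298)), 0.8, random.Random(seed)) for seed in range(500)]
```

In a full consensus clustering run, each round would be clustered into K groups, and the fraction of rounds in which two samples fall into the same group would populate the consensus matrix.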
The results are as follows.
Using the above method, the inventors examined whether TNBC samples can be divided into different subtypes (three TNBC subtypes) based on the diversity of the expression patterns of their prognosis-related LR pairs. The significant prognosis-related LR pairs were included in the clustering analysis, with the expression abundance of each LR pair represented by the sum of the expression of the ligand and receptor genes. In the METABRIC cohort, 298 TNBC samples were clustered by consensus clustering analysis. In optimizing the number of clusters K, the cumulative distribution function (CDF) curves showed that stable clustering results were produced at K = 3 (fig. 3A and 3B), so K = 3 was selected as the final choice (fig. 3C).
Further analysis of the prognostic characteristics indicated a significant difference in prognosis among the three subtypes. The overall survival (OS) of C1 was the least favorable, the OS of C3 was the longest of the three subtypes, and the OS of C2 was intermediate between the two (fig. 4). Furthermore, the inventors applied the same molecular subtyping method to the TNBC patient cohorts of GSE58812 and GSE21653, obtained the corresponding three molecular subtypes, and observed a significant and similar difference in prognosis among the three subtypes in the survival analysis (fig. 5A and 5B).
The matrix finally obtained is shown taking ADAM28-ITGA4 as an example (Table 1). The clustering data of a subject can be obtained by entering the subject's data into the matrix, and the subtype can be determined from the clustering data and the clustering and typing information of the validation set in the invention. Exemplary data for other LR pairs can be found in fig. 24.
[Table 1 is reproduced as images in the original publication: Figure BDA0003805230500000091, Figure BDA0003805230500000101, Figure BDA0003805230500000111]
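The typing rule stated in the claims, namely assigning a subject to the subtype whose cluster center is nearest in Euclidean distance, can be sketched as follows. This is a minimal illustration; the function name `assign_subtype` and the centroid values are hypothetical placeholders and are not the values of Table 1.

```python
from math import dist

def assign_subtype(sample_profile, centroids):
    """Return the subtype whose cluster center is nearest to the sample (Euclidean distance)."""
    return min(centroids, key=lambda name: dist(sample_profile, centroids[name]))

# Hypothetical cluster centers over three LR pairs; not the values of Table 1.
hypothetical_centroids = {
    "C1": [1.0, 5.0, 2.0],
    "C2": [3.0, 3.0, 3.0],
    "C3": [6.0, 1.0, 4.0],
}
subtype = assign_subtype([5.5, 1.2, 3.8], hypothetical_centroids)
```

In practice the profile would contain the expression of every LR pair used for clustering, in the same order and on the same scale as the cluster centers.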
Analysis of mutations and Copy Number Variation (CNV) between different triple negative breast cancer subtypes
The genomic data types integrated in cBioPortal include somatic mutations, copy number changes, gene expression, and DNA methylation. In this example, the inventors queried and downloaded somatic mutation and copy number change data directly from cBioPortal and analyzed them according to the cBioPortal procedure. The "maftools" package was used to visualize the mutation data. Differences among subtypes in CNV genes with significant gain and loss were compared using the chi-square test.
The results are as follows.
The inventors considered that different clinical features and genomic mutations may also contribute to the different prognostic outcomes, and therefore analyzed the clinical characteristics of each subtype in the three TNBC datasets. In the METABRIC database, no significant correlation was found between the molecular subtypes and clinical variables (such as tumor stage, age and sex), whereas the distribution of the five widely accepted breast cancer-specific molecular subtypes (Luminal A, Luminal B, HER2-enriched, Basal-like and Claudin-low) differed significantly among the three LR pair-based subtypes. Claudin-low samples accounted for a large proportion of the C3 subtype, whereas Basal-like samples accounted for a larger proportion of the C1 subtype. There was also a significant difference in mortality between C1 and C3: over 60% of the C1 samples died, and over 55% of the C3 samples survived (fig. 6A). In the GSE58812 cohort, the age distribution trends of C1 and C3 were opposite, and there was also a statistically significant difference in survival status among the three subtypes (fig. 6B). In the GSE21653 dataset, there was no significant difference in age distribution among the three subtypes. Even so, the proportion of surviving patients in C1 and C3 was large, and the proportion of surviving samples in C3 was high (FIG. 7).
The top 10 most frequently mutated genes in the three subtypes, together with the top 10 CNV-deleted and CNV-amplified genes, are further shown as a waterfall plot; in this plot, C1 and C2 showed relatively high mutation rates and mutation diversity (fig. 8).
Functional enrichment analysis
The Hallmark gene sets were retrieved and downloaded from the Molecular Signatures Database (MSigDB). Gene set enrichment analysis (GSEA) was performed on the screened LR sets using the GSEA software, and the most significantly enriched signaling pathways were selected according to the normalized enrichment score (NES), with a false discovery rate (FDR) < 0.05 as the screening criterion.
The results are as follows.
To further explore the molecular biological differences among the three LR pair-based molecular subtypes, the inventors performed GSEA on all three TNBC datasets. In the METABRIC dataset, the activity of 14 pathways was significantly increased in C1 compared with C3, mainly cell cycle-related signaling pathways such as MYC targets, E2F targets and G2M checkpoint, and cancer-related pathways such as glycolysis and hypoxia; the activity of 11 pathways was significantly reduced, mainly immune-related pathways such as complement, inflammatory response, interferon-α response, allograft rejection and interferon-γ response (fig. 9A). Across the three TNBC datasets, glycolysis, hypoxia and early estrogen response were significantly up-regulated in C1 compared with C3, while 10 pathways, including apoptosis, TNFA signaling via NF-κB and complement, were significantly down-regulated (fig. 9B). The inventors further compared pathway activities between C1 and C2 and between C2 and C3 in the METABRIC cohort, and found that 6 pathways were consistently activated across the LR pair-based molecular subtypes, including glycolysis, hypoxia, epithelial-mesenchymal transition, MYC targets, myogenesis, and early and late estrogen responses (fig. 9C, fig. 9D).
Immunoassay
The proportions of stromal and immune cells in the tumor samples were inferred from the expression profiles, and the R package "estimate" was used to calculate the immune score and the stromal score. The higher the immune score and the stromal score, the higher the content of the corresponding components in the tumor microenvironment (TME). The degree of infiltration of 22 immune cell types in TNBC was further quantified by the CIBERSORT algorithm.
The results are as follows.
After running CIBERSORT, the estimated proportions of 22 immune cell types in the LR pair-based molecular subtypes were obtained for the three TNBC cohorts. The Kruskal-Wallis test showed that, in the METABRIC cohort, the estimated proportions of most immune cell types (16 in total) differed among the three LR pair-based molecular subtypes, including naive B cells, memory B cells, CD8 T cells, naive CD4 T cells, activated CD4 memory T cells, gamma delta T cells, resting and activated NK cells, M0 macrophages, M1 macrophages, M2 macrophages, resting dendritic cells, activated dendritic cells, resting and activated mast cells, and neutrophils (fig. 10A). Among the LR pair-based molecular subtypes in all three TNBC cohorts, there were significant differences in the estimated proportions of naive B cells, naive CD4 T cells, activated CD4 memory T cells, gamma delta T cells, activated NK cells, M0 macrophages, M1 macrophages, M2 macrophages, and activated mast cells (fig. 10B and 10C). The stromal score, immune score and ESTIMATE score were compared among subtypes by the Kruskal-Wallis test. The immune scores of the three molecular subtypes in each cohort differed significantly, with p values < 0.01. The immune/ESTIMATE scores of the three molecular subtypes in each cohort also differed with high significance, with p values < 0.0001. For all three scores, the order was always C3 > C2 > C1 (fig. 11A, 11B, 11C).
Risk model construction based on LR pairs
Important genes are screened from LR pairs related to prognosis, and a risk model is constructed.
The specific screening steps are as follows:
The prognosis-related LR pairs were analyzed using LASSO-penalized Cox regression, in which unimportant LR pairs are eliminated by shrinking the model coefficients, yielding prescreened LR pairs. The prescreened LR pairs were then filtered with the stepAIC strategy in the MASS package. The LR pair scoring model was built from the combination with the lowest AIC value, and the coefficient of each LR pair was obtained by multivariate Cox regression analysis.
The results are as follows.
To select the LR pairs best suited for predicting TNBC prognosis, the inventors performed LASSO-Cox regression analysis on the 145 LR pairs screened in the above example, and 6 LR pairs were selected in a 10-fold cross-validation process because they had non-zero coefficients in the fitted LASSO-Cox regression model (fig. 12). By stepAIC multivariate regression analysis, which weighs the statistical fit of the model against the number of parameters used for the fit, 4 of these LR pairs (CXCL9-CCR3, GPI-AMFR, IL18-IL18R1, and PLG-F2RL1) were finally selected.
The model formula obtained is:
LR pair score = -0.08996361 × (CXCL9-CCR3 expression level) + 0.27093847 × (GPI-AMFR expression level) - 0.29143116 × (IL18-IL18R1 expression level) + 0.28034741 × (PLG-F2RL1 expression level).
Here, "CXCL9-CCR3" refers to the expression level of CXCL9 + CCR3, and so on for the other LR pairs.
Essentially, the above model formula for the risk score of each patient is based on LR pair score = Σ (βi × Expi), where Expi is the expression level of the i-th LR pair and βi is the coefficient of that LR pair in the multivariable Cox regression. After z-score transformation, patients can be divided into a high-risk group (high LR pair score group) and a low-risk group (low LR pair score group) with "0" as the threshold. When the method is used for prognosis analysis, a survival curve can further be drawn by the Kaplan-Meier method to show the prognostic risk intuitively, with the significance of the difference determined by the log-rank test.
It should be understood, of course, that the LR pair score used in embodiments of the present invention is essentially a continuous variable, and a cutoff value may be defined as appropriate when classification into ranked groups is desired. In the present embodiment, the threshold is set to "0".
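The scoring formula and the grouping at threshold "0" can be sketched as follows. The coefficients are those of the model formula above; everything else (function names, the use of the sample standard deviation for the z-score, and treating a z-score of exactly 0 as "low") is an assumption made for illustration.

```python
from statistics import mean, stdev

# Coefficients of the four LR pairs, as given in the model formula above.
COEFFICIENTS = {
    "CXCL9-CCR3": -0.08996361,
    "GPI-AMFR": 0.27093847,
    "IL18-IL18R1": -0.29143116,
    "PLG-F2RL1": 0.28034741,
}

def lr_pair_score(pair_expression):
    """Weighted sum of LR pair expression levels (each level = ligand + receptor expression)."""
    return sum(COEFFICIENTS[pair] * pair_expression[pair] for pair in COEFFICIENTS)

def classify_cohort(raw_scores):
    """Z-score the raw scores within a cohort and split at 0.

    Assumption: a z-score strictly above 0 is 'high'; the source does not say
    how a z-score of exactly 0 is handled.
    """
    mu = mean(raw_scores)
    sd = stdev(raw_scores)
    return ["high" if (score - mu) / sd > 0 else "low" for score in raw_scores]

# Hypothetical patient with every LR pair expression level equal to 1.0.
example_score = lr_pair_score({pair: 1.0 for pair in COEFFICIENTS})
```

Because the z-score is centered on the cohort mean, the threshold "0" is equivalent to comparing each raw score with the mean score of the cohort in which it was computed.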
The coefficients of these predictors in the Cox regression model are shown in fig. 13. Based on the 4 LR pairs described above, an LR pair scoring model and an LR pair score were constructed for quantitative analysis of the LR pair pattern of TNBC samples. The inventors also found that the LR pair score of the C1 subtype was significantly higher than that of the C2 and C3 subtypes in the METABRIC, GSE58812 and GSE21653 cohorts (fig. 14A, 14B, 14C). To analyze the clinical relevance of the LR pair score, the TNBC samples of each cohort were divided into two groups according to LR pair score. In the METABRIC cohort, patients with low LR pair scores showed significantly more favorable survival (fig. 15A). The areas under the curve (AUC) of the time-dependent ROC curves of the LR pair score were 0.72, 0.63, 0.65 and 0.66 at 1, 3, 5 and 10 years, respectively (fig. 15B). The reliability of the LR pair score was further verified using 107 samples from GSE58812 and 83 samples from GSE21653; in both validation sets, samples with high LR pair scores showed higher mortality and shorter survival times (fig. 15C, 15D). In the GSE58812 validation cohort, the AUC values of the LR pair scoring model were 0.72, 0.75 and 0.67 at 3, 5 and 10 years, respectively (fig. 16A). The LR pair scoring model performed better in the GSE21653 validation cohort, with AUCs for 1-, 3- and 5-year survival of 0.90, 0.87 and 0.78, respectively (fig. 16B).
Furthermore, univariate Cox regression analysis in METABRIC showed that stage, age, and LR pair score were significantly correlated with the prognosis of TNBC (fig. 17A). In the multivariate Cox regression model, these factors could all be regarded as independent prognostic factors for TNBC (fig. 17B).
Correlation between LR pair scores and immune composition and immune-related pathways
To find the pathways most relevant to the LR pair score, the inventors further analyzed the METABRIC samples using the R package "GSVA".
The results are as follows.
Single-sample GSEA (ssGSEA) scores for different functions were obtained for the METABRIC samples with "GSVA", and 30 pathways significantly correlated with the LR pair score were identified by Pearson correlation analysis. Of these, 2 pathways were positively correlated and 28 pathways were negatively correlated with the LR pair score. The ssGSEA scores of immune-related pathways such as chemokine signaling pathway, antigen processing and presentation, natural killer cell mediated cytotoxicity, Toll-like receptor signaling pathway and T cell receptor signaling pathway were significantly inversely correlated with the LR pair score (fig. 18A). Further analysis of the relationship between the LR pair score and tumor immune components found significant differences between the high and low LR pair score samples for most of the 22 immune cell types (fig. 18B).
Furthermore, Pearson correlation analysis between the LR pair score and immune cells showed that the LR pair score was significantly inversely correlated with CD8 T cells, activated CD4 memory T cells and macrophages, but positively correlated with M0 macrophages and M2 macrophages (fig. 18D), indicating a correlation between the LR pair score and tumor immunity.
Effect of the LR pair scoring model in predicting clinical treatment response
The relationship between the LR pair score and the expression levels of immune checkpoint genes was assessed by the Wilcoxon test and visualized as box plots. Tumor Immune Dysfunction and Exclusion (TIDE) predicts the response of a sample to immune checkpoint blockade (ICB) therapy by modeling the gene expression signatures of two tumor immune escape mechanisms.
The inventors downloaded drug sensitivity data for about 1000 cancer cell lines from Genomics of Drug Sensitivity in Cancer (GDSC) (http://www.cancerrxgene.org; GDSC is the largest public resource for drug sensitivity of cancer cells and molecular markers of drug response). Data were downloaded mainly for breast cell lines, giving data for 190 drugs in a total of 50 cell lines. Using the area under the curve (AUC) of the anti-tumor drugs in the tumor cell lines as the drug response index, the correlation between drug sensitivity and the LR pair score was calculated by Spearman correlation analysis. The adjusted FDR was calculated using the Benjamini-Hochberg method. Correlations with |Rs| > 0.2 and FDR < 0.05 were considered statistically significant. Furthermore, the half-maximal inhibitory concentrations (IC50) of paclitaxel, veliparib, olaparib and talazoparib, antitumor drugs recommended in TNBC treatment, were compared between the LR pair score groups using the pRRophetic package.
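The multiple-testing adjustment and the significance rule described above can be sketched as follows. This is a plain-Python illustration of the Benjamini-Hochberg step-up adjustment; the function names and the example P-values are hypothetical, and the Spearman correlation itself is assumed to have been computed elsewhere.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values (FDR), in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices by ascending P-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def is_significant(rs, fdr, rs_cutoff=0.2, fdr_cutoff=0.05):
    """Apply the criterion stated above: |Rs| > 0.2 and FDR < 0.05."""
    return abs(rs) > rs_cutoff and fdr < fdr_cutoff

# Hypothetical P-values for four drugs; not taken from the patent.
fdr_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A drug would then be reported only when both the correlation strength and the adjusted P-value pass their respective cutoffs.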
The results are as follows.
In view of the correlation between the LR pair score and tumor immunity disclosed in the above example, the inventors further analyzed the correlation between the LR pair score and immune checkpoint genes. In terms of expression level, 18 of the 19 immune checkpoint genes differed between the two LR pair score groups (fig. 19). Compared with the low LR pair score group, the high LR pair score group also showed a significantly up-regulated T cell exclusion score and a significantly down-regulated T cell dysfunction score, while the TIDE score did not differ significantly between the two groups (fig. 20).
Further, the inventors examined the ability of the LR pair score to predict the response to immune checkpoint inhibitor (ICI) treatment in the immunotherapy cohort IMvigor210 (anti-PD-L1). Samples with stable disease (SD) and progressive disease (PD) were found to have significantly higher LR pair scores than samples with complete response (CR) and partial response (PR) (fig. 21A). The anti-PD-L1 treated samples were divided into low and high LR pair score groups. In the IMvigor210 cohort, the prognosis of the samples with high LR pair scores remained significantly worse than that of the samples with low LR pair scores (fig. 21B). The proportion of patients responding to anti-PD-L1 treatment was significantly higher in the low LR pair score group than in the high LR pair score group (fig. 22).
The GDSC database stores treatment response data for various anticancer drugs, as well as gene expression profiles for a large number of cancer cell lines. By Spearman correlation analysis of the GDSC data, the inventors found that the LR pair score was significantly correlated with the response to 29 drugs, as measured by the area under the drug sensitivity curve (AUC). Of these correlations, 28 were positive, indicating that a high LR pair score in the tumor is associated with resistance to these drugs (fig. 23A). Furthermore, comparison of the estimated IC50 values of paclitaxel, veliparib, olaparib and talazoparib between the two LR pair score groups showed that the IC50 values of the four drugs were significantly lower in the low LR pair score group than in the high LR pair score group, indicating that the low LR pair score group may be more sensitive to treatment with these four drugs (fig. 23B).
In summary, by performing TNBC survival analysis on the 2293 LR pairs, the inventors screened a total of 145 LR pairs significantly correlated with TNBC prognosis, and then obtained three LR pair-based subtypes of TNBC by unsupervised clustering according to the expression of the 145 LR pairs. Of the three subtypes, the C1 subtype has the worst prognosis: the proportion of the most aggressive breast cancer subtype, the Basal-like subtype, is significantly higher in the C1 group than in the other two groups, and the proportion of deaths in the corresponding clinical profile is highest in this group. In addition, the C1 subtype group showed the lowest anti-tumor immune response, such as lower tumor-infiltrating lymphocytes (naive B cells, CD8 T cells, naive CD4 T cells), stromal score and immune score, which may be the cause of the poor prognosis of the C1 subtype.
In addition to typing TNBC based on the 145 LR pairs, LASSO regression and Cox analysis were performed on the 145 LR pairs, and 4 LR pairs were selected to construct an LR pair scoring model. The significance of this LR pair scoring model for prognostic evaluation was confirmed in the METABRIC dataset and both GEO datasets. In this model, samples with high LR pair scores show significantly shorter survival times than samples with low LR pair scores. It is well known in the art that chemokine signaling pathways promote the anti-tumor response of the immune system by recruiting immune cells; that antigen processing and presentation, as the initiation of adaptive immune responses, play a key role in anti-tumor immunity; that the intensity of T cell receptor signaling is a key determinant of T cell mediated anti-tumor responses; that natural killer cell mediated cytotoxicity is an important effector mechanism of the immune system against cancer; and that activation of Toll-like receptor signaling pathways can be used to enhance immune responses against malignant cells. The present invention demonstrates that the LR pair score is significantly inversely correlated not only with the chemokine signaling pathway, antigen processing and presentation, T cell receptor signaling pathway, natural killer cell mediated cytotoxicity and Toll-like receptor signaling pathway, but also with the stromal score, the immune score, and the infiltration of CD8 T cells, activated CD4 memory T cells and macrophages. Furthermore, the TIDE score did not differ significantly between the high and low LR pair score groups, so immune escape likely has no significant effect on the LR pair score. Taken together, these results suggest that TNBC samples with high LR pair scores do not have strong anti-tumor immunity.
Different ligands expressed by cancer cells bind to cell surface receptors on immune cells, triggering inhibitory pathways (e.g., PD-1/PD-L1) and promoting immune tolerance in immune cells. In the present invention, the inventors validated the ability of the score of the 4 LR pairs (based on the LR pair scoring model) to predict the response to immune checkpoint inhibitor (ICI) treatment using an anti-PD-L1 cohort. The LR pair score of patients with complete or partial response was found to be significantly lower than that of patients with stable or progressive disease. The clinical benefit of anti-PD-L1 treatment was significantly greater for patients with low LR pair scores than for those with high LR pair scores, which demonstrates the effectiveness of the LR pair scoring model in predicting the response to anti-PD-L1 treatment.
Some molecularly targeted antitumor drugs can prevent resistance to cancer immunotherapy, but single-drug treatment alone cannot achieve a stable therapeutic effect, whereas combining such antitumor drugs with ICI immunotherapy can greatly improve patient prognosis. In the examples of the present invention, the inventors identified 29 significant correlations between the LR pair score and drug sensitivity in the GDSC database by Spearman correlation analysis, of which 28 showed a significant positive correlation between the AUC of the drug sensitivity curve and the LR pair score (only Wnt-C59 showed sensitivity correlated with the LR pair score). This suggests that a high LR pair score is associated with resistance to these drugs, and that targeted drugs developed against these LR pairs may be useful against such drug resistance.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them. Any other change, modification, substitution, combination, or simplification that does not depart from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included in the protection scope of the present invention.

Claims (6)

  1. Use of a product for detecting the expression levels of LR pairs in the preparation of a product for typing triple negative breast cancer;
    the LR pairs were: APOB-ENO1, CXCL12-ITGA4, GPI-AMFR, MUC7-SELL, SELPLG-SELL, PODXL-SELL, BSG-SLC16A7, CD22-PTPRC, PTPRC-CD22, PPBP-CXCR1, IL7-IL7R, CCL-CCR 7, SERPING1-SELP, CCL16-CCR2, PLG-F2RL1, CXCL13-ACKR4, IL11-IL6ST, ICOS-ICOSLG, ICOSLG-ICOS, CCL19-ACKR4, CXCL3-CXCR1, ICAM4-ITGA4 CCL16-CCR5, CD2-CD48, CD48-CD2, TNFSF13B-TNFRSF13B, ADAM-ITGA 4, HBEGF-ERBB2, CALR-SCARF1, CXCL13-CXCR5, LGI3-STX1A, CD-CD 48, CD48-CD244, HLA-A-KIR3DL1, HLA-B-KIR3DL1, HLA-F-KIR3DL1, LGI3-FLOT1, VEGFA-GPC1, EFNA4-EPHA5, QRFP-P2RY14, FGF4-FGFRL1, CXCL8-CXCR1 NMS-NMUR2, GLG1-SELE, LY9-LY9, EDN3-KEL, CXCL13-CCR10, CCL19-CXCR3, CCL21-CXCR3, ADM-RAMP2, CFH-SELL, CXCL12-ITGB1, CD34-SELL, NMB-BRS3, CCL21-CCR7, PODXL2-SELL, SERPING1-SELE, CLEC2B-KLRF1, KLRF1-CLEC2B, CXCL-CXCR 3, SEMA4D-MET, ADM-MRGPRX2, EBI3-IL6ST TSLP-IL7R, B2M-CD1B, ADAM-ITGA 4, WNT8A-FZD5, ADAM7-ITGB7, GNRH1-GNRHR, ADM-CALCR, LTB-LTBR, COL14A1-CD44, F2-F2RL2, DEFB103A-CCR6, SLIT3-ROBO1, VTN-PLAUR, CCL25-ACKR2, CCL19-CCR10, LRFN3-LRFN3, CXCL9-CCR3, MRC1-PTPRC, PTPRC-MRC1, CD58-CD2, CYTL1-CCR2, DKK2-LRP6, SERPINE1-LRP1, EFNA4-EPHA6, NMB-NMBR, MMP7-ERBB4, POMC-OPRD1, CLEC2D-KLRB1, KLRB1-CLEC2D, GDF-TDGF 1, GUCA2A-GUCY2C, IL-KCNJ 10, FGF3-FGFRL1, LRPAP1-SORT1, CXCL11-CXCR3, WNT9A-FZD10, APOB-LSR, CD70-CD27, ANGPTL1-TEK, TNF-TNFRSF1B, CXCL-CD 4, NECTIN 2-TIT, NECTIN4-TIGIT, TIT-NECTIN 2, FASLG-FAS, POMC-MC3R, VCAM-ITGB 7, POMC-APP 1, IL 22-PLIL 22RA1, FASLG-FAS, SEL 21-KR 4, ASTRF 13-FAS CD160-TNFRSF14, TNFRSF14-CD160, FGF6-FGFR1, SPP1-ITGB1, MADCAM1-ITGA4, OXT-AVPR1A, CXCL-CXCR 3, CXCL5-ACKR1, BTLA-TNFRSF14, TNFRSF14-BTLA, CCL5-CCR3, TAC3-TACR1, CXCL10-SDC4, VEGFA-KDR, EFNA1-EPHA5, CCL5-ACKR1, EFNA1-EPHA2, DKK4-KREMEN2, POMC-MC2R, GNAS-PTGDR, CCL13-CCR2, IL18-IL18R1, MYL9-CD69, CXCL6-CXCR1, RSPO3-LRP6, AMBN-CD63, CALR-TSHR, ICAM3-ITGAL, NOTU-NMUR 1 and 4-CH 3.
  2. The use according to claim 1, wherein the detection product comprises a detection reagent.
  3. The use according to claim 1, wherein the method of using the detection product comprises the steps of:
    detecting the expression of the LR pairs in a sample, and typing the sample by using an algorithm, wherein the algorithm is a K-means algorithm, a 1-Pearson correlation algorithm and a clustering algorithm.
  4. The use according to claim 3, wherein the number of clusters K of the algorithm is 3.
  5. The use according to claim 3, wherein the triple negative breast cancer is typed into types C1, C2 and C3.
  6. The use according to claim 5, wherein the criterion for typing is:
    calculating, from the expression profile, the Euclidean distances between a subject sample and the three typing cluster centers, and determining the type according to the distances;
    wherein:
    if the Euclidean distance between the centroid of the subject sample and the C1 cluster center is shorter than the Euclidean distances between that centroid and the C2 and C3 cluster centers, the subject has triple negative breast cancer of type C1;
    if the Euclidean distance between the centroid of the subject sample and the C2 cluster center is shorter than the Euclidean distances between that centroid and the C1 and C3 cluster centers, the subject has triple negative breast cancer of type C2;
    if the Euclidean distance between the centroid of the subject sample and the C3 cluster center is shorter than the Euclidean distances between that centroid and the C1 and C2 cluster centers, the subject has triple negative breast cancer of type C3.
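The distance criterion of claim 6 can be illustrated with a minimal sketch. It is not the implementation used in the patent: the function name `assign_subtype`, the three-feature vectors, and all numeric values are hypothetical and chosen only for illustration. In practice each vector would hold the expression values of the claimed LR pairs, and the cluster centers would come from the K = 3 clustering of claims 3 and 4.

```python
import math


def assign_subtype(profile, centers):
    """Assign a sample to the cluster whose center is nearest in Euclidean distance.

    profile: list of expression values for one subject sample (hypothetical input).
    centers: dict mapping a type label ("C1", "C2", "C3") to its center vector.
    Returns the nearest label and the distance to every center.
    """
    # math.dist computes the Euclidean distance and raises ValueError
    # if the two vectors differ in length.
    distances = {label: math.dist(profile, center) for label, center in centers.items()}
    # The claim does not say how ties are handled; min() simply keeps
    # the first label that reaches the smallest distance.
    nearest = min(distances, key=distances.get)
    return nearest, distances


# Illustrative cluster centers over three features (assumed values, not patent data).
centers = {
    "C1": [1.0, 0.0, 0.0],
    "C2": [0.0, 1.0, 0.0],
    "C3": [0.0, 0.0, 1.0],
}
subject = [0.9, 0.2, 0.1]

label, distances = assign_subtype(subject, centers)
print(label, {k: round(v, 3) for k, v in distances.items()})
```

With these assumed values the subject lies closest to the C1 center, so the sketch reports type C1. Keeping the per-center distances alongside the label makes it possible to see how clear the assignment was.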
CN202210994924.2A 2022-08-18 2022-08-18 LR-based method for typing triple negative breast cancer and application thereof Active CN115478106B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210994924.2A CN115478106B (en) 2022-08-18 2022-08-18 LR-based method for typing triple negative breast cancer and application thereof

Publications (2)

Publication Number Publication Date
CN115478106A CN115478106A (en) 2022-12-16
CN115478106B CN115478106B (en) 2023-07-07

Family

ID=84422859

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210994924.2A Active CN115478106B (en) 2022-08-18 2022-08-18 LR-based method for typing triple negative breast cancer and application thereof

Country Status (1)

Country Link
CN (1) CN115478106B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1977052A (en) * 2004-03-23 2007-06-06 肿瘤疗法科学股份有限公司 Method for diagnosing non-small cell lung cancer
CN113195733A (en) * 2018-10-18 2021-07-30 新加坡科技研究局 Method for quantifying molecular activity in human tumor cancer cells

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2012340186A1 (en) * 2011-11-18 2014-06-19 Vanderbilt University Markers of triple-negative breast cancer and uses thereof
WO2015148971A2 (en) * 2014-03-27 2015-10-01 Research Foundation Of The City University Of New York Method for detecting or treating triple negative breast cancer
EP3194624B1 (en) * 2014-09-15 2022-02-16 Garvan Institute of Medical Research Methods for diagnosis, prognosis and monitoring of breast cancer and reagents therefor
CN114438209A (en) * 2022-02-08 2022-05-06 深圳市陆为生物技术有限公司 Marker and model for predicting overall survival of three-negative breast cancer clinical prognosis
CN114891887A (en) * 2022-05-13 2022-08-12 西安交通大学 Method for screening triple negative breast cancer prognosis gene marker

Also Published As

Publication number Publication date
CN115478106A (en) 2022-12-16

Similar Documents

Publication Publication Date Title
CN115424669B (en) Triple negative breast cancer curative effect and prognosis evaluation model based on LR (ligand-receptor) score
Zhang et al. Pan-cancer landscape of T-cell exhaustion heterogeneity within the tumor microenvironment revealed a progressive roadmap of hierarchical dysfunction associated with prognosis and therapeutic efficacy
Hu et al. Development and verification of the hypoxia-related and immune-associated prognosis signature for hepatocellular carcinoma
Wang et al. A novel tumor mutational burden-based risk model predicts prognosis and correlates with immune infiltration in ovarian cancer
Zhang et al. Development and validation of a fourteen-innate immunity-related gene pairs signature for predicting prognosis head and neck squamous cell carcinoma
Yu et al. Comprehensive analysis and establishment of a prediction model of alternative splicing events reveal the prognostic predictor and immune microenvironment signatures in triple negative breast cancer
Chen et al. Molecular subtyping of glioblastoma based on immune-related genes for prognosis
Zhao et al. Identification of hepatocellular carcinoma prognostic markers based on 10-immune gene signature
Zhang et al. Integrated multi-omics identified the novel intratumor microbiome-derived subtypes and signature to predict the outcome, tumor microenvironment heterogeneity, and immunotherapy response for pancreatic cancer patients
Liu et al. Molecular analysis of Chinese oesophageal squamous cell carcinoma identifies novel subtypes associated with distinct clinical outcomes
Dai et al. Multi-omics analyses of CD276 in pan-cancer reveals its clinical prognostic value in glioblastoma and other major cancer types
Zhou et al. Characterization of aging cancer-associated fibroblasts draws implications in prognosis and immunotherapy response in low-grade gliomas
Tan et al. Molecular subtypes based on the stemness index predict prognosis in glioma patients
Zhang et al. Hallmark guided identification and characterization of a novel immune-relevant signature for prognostication of recurrence in stage I–III lung adenocarcinoma
Xu et al. Comprehensive FGFR3 alteration-related transcriptomic characterization is involved in immune infiltration and correlated with prognosis and immunotherapy response of bladder cancer
Yang et al. Integrated transcriptome analyses and experimental verifications of mesenchymal-associated TNFRSF1A as a diagnostic and prognostic biomarker in gliomas
CN115478106B (en) LR-based method for typing triple negative breast cancer and application thereof
Jiang et al. Microenvironment-related gene TNFSF13B predicts poor prognosis in kidney renal clear cell carcinoma
Chen et al. Identification of a ZC3H12D-regulated competing endogenous RNA network for prognosis of lung adenocarcinoma at single-cell level
Pan et al. The molecular subtypes of triple negative breast cancer were defined and a ligand-receptor pair score model was constructed by comprehensive analysis of ligand-receptor pairs
Zhang et al. A novel model associated with tumor microenvironment on predicting prognosis and immunotherapy in triple negative breast cancer
Cai et al. Identification of a basement membrane-related gene signature for predicting prognosis and estimating the tumor immune microenvironment in breast cancer
Lu et al. Establishment and evaluation of module-based immune-associated gene signature to predict overall survival in patients of colon adenocarcinoma
Ji et al. Identification of necroptosis subtypes and development of necroptosis-related risk score model for in ovarian cancer
Xu Crosstalk of three novel types of programmed cell death defines distinct microenvironment characterization and pharmacogenomic landscape in breast cancer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant