WO2022141775A1 - Construction method for tumor immune checkpoint inhibitor therapy effectiveness evaluation model based on dna methylation spectrum - Google Patents


Info

Publication number
WO2022141775A1
Authority
WO
WIPO (PCT)
Prior art keywords
tumor
methylation
types
immune checkpoint
model
Application number
PCT/CN2021/076879
Other languages
French (fr)
Chinese (zh)
Inventor
郭昊
徐炳祥
葛明晖
颜林林
李诗濛
任用
Original Assignee
江苏先声医疗器械有限公司
江苏先声诊断技术有限公司
Application filed by 江苏先声医疗器械有限公司 and 江苏先声诊断技术有限公司
Publication of WO2022141775A1


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/154 Methylation markers

Definitions

  • The invention relates to the field of bioinformatics analysis, in particular to a method for constructing a model that evaluates the efficacy of tumor immune checkpoint inhibitor therapy based on DNA methylation profiles.
  • Tumor immunotherapies, represented by immune checkpoint inhibitor therapy, have been introduced into the treatment of many tumor types. They have shown significant curative effect in melanoma, renal cell carcinoma, non-small cell lung cancer, head and neck cancer, urothelial carcinoma, Hodgkin's lymphoma, Merkel cell carcinoma and many other advanced malignant tumors, greatly improving the survival time and quality of life of some patients with advanced tumors.
  • These immunotherapies do not target tumor cells directly; instead, they achieve their therapeutic effect by enhancing the body's immune response and improving the ability of immune cells to recognize, attack and eliminate tumor cells.
  • However, immune checkpoint inhibitor therapy is effective only in a subset of patients, and many patients still do not benefit from it.
  • Some patients even develop immune-related adverse events (irAEs), including skin, gastrointestinal, hepatic and endocrine toxicities, because their own non-specific immune responses are excessively enhanced; these events are sometimes life-threatening. Identifying the patients who are likely to respond well to immune checkpoint inhibitors is therefore an important means of improving the efficacy and safety of these drugs and of expanding the boundaries of their application.
  • irAEs: immune-related adverse events
  • TMB: tumor mutation burden
  • MSI: microsatellite instability
  • PD-L1: programmed death-ligand 1
  • At the pan-tumor level, existing models for evaluating the efficacy of immune checkpoint inhibitor therapy are generally based on gene expression profiling data. However, the low yield of extracted RNA and the inherent instability of RNA impose many limitations on gene expression profiling of tumor tissue. There is therefore an urgent need for a method of evaluating the efficacy of immune checkpoint inhibitor therapy at the pan-tumor level that is based on stable and easily detectable biomarkers.
  • The purpose of the present invention is to develop a method or model for assessing the therapeutic efficacy of immune checkpoint inhibitors at the pan-tumor level based on stable and easily detectable biomarkers.
  • The present invention provides the following technical solutions:
  • The present invention first provides a method for screening characteristic DNA methylation sites, characterized in that it comprises the following steps:
  • Step 1) in a given tumor cohort containing samples of multiple tumor types, perform immune infiltration analysis on each tumor sample based on the detected DNA methylation profile data, and calculate the relative infiltration content of each type of immune cell in each sample; then perform cluster analysis on the immune cell infiltration content of the samples in each tumor type cohort, with the number of clusters set to 2, to obtain sample cohorts with two types of immune cell infiltration patterns for each tumor type;
  • Step 2) according to indirect evaluation indices of the therapeutic effectiveness of immune checkpoint inhibitors, select the tumor types that are significantly associated with these indices;
  • Step 3) for the tumor types selected by the above indices, analyze the degree of difference in methylation rate at each methylation site between the sample cohorts of the two immune cell infiltration patterns, and construct a set of characteristic methylation sites.
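The clustering in Step 1 can be sketched as follows. The source does not name the clustering algorithm, so k-means with k = 2 is an assumption here, and the per-sample immune-cell fractions are assumed to have already been estimated from the methylation data by some deconvolution tool. The function name `kmeans_two_groups` is hypothetical; in practice a library implementation such as scikit-learn's `KMeans(n_clusters=2)` would normally be used instead of this hand-written loop.

```python
import random
from typing import List, Sequence


def kmeans_two_groups(fractions: Sequence[Sequence[float]], n_iter: int = 100,
                      seed: int = 0) -> List[int]:
    """Split samples into 2 clusters by their immune-cell infiltration vectors (k-means, k=2)."""
    rng = random.Random(seed)
    first, second = rng.sample(range(len(fractions)), 2)  # two distinct starting samples
    centroids = [list(fractions[first]), list(fractions[second])]
    labels = [-1] * len(fractions)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(2), key=lambda k: sum((x - c) ** 2 for x, c in zip(row, centroids[k])))
            for row in fractions
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(2):
            members = [row for row, lab in zip(fractions, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

The cluster numbers 0 and 1 are arbitrary, so before directions of methylation change are compared across tumor types the two groups would still need a consistent orientation, for example "high" versus "low" infiltration judged by their mean total infiltration.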
  • In step 2), tumor types are selected according to three indirect evaluation indices of the therapeutic effectiveness of immune checkpoint inhibitors;
  • the three indices are the prognostic overall survival (OS) index, the tumor mutation burden (TMB) index and the PD-L1 expression level index;
  • the tumor types significantly associated with each index are selected as follows: for the OS index, the log-rank test is used to screen tumor types in which survival time differs significantly between the two types of samples; for the TMB index, the Mann-Whitney U test is used to screen tumor types in which mutation burden differs significantly between the two types of samples; for the PD-L1 expression index, the DESeq2 package in R is used to characterize expression differences between the two types of samples, and tumor types with a significant difference in PD-L1 gene expression are selected;
  • OS: overall survival time
  • the p-values of all the above tests are FDR-corrected, and a difference is considered significant when the adjusted p-value (adj.p-value) is less than 0.05.
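The FDR step can be sketched as below. The source says only "FDR"; the Benjamini-Hochberg procedure is assumed here because it is the most common FDR correction (it is what R's `p.adjust(method = "BH")` and statsmodels' `multipletests(..., method="fdr_bh")` compute). Both helper names are hypothetical, and the p-values in the usage example are made up.

```python
from typing import Dict, List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_tumor_types(pvalues_by_type: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Tumor types whose FDR-adjusted p-value is below alpha (0.05 in the source)."""
    types = list(pvalues_by_type)
    adjusted = benjamini_hochberg([pvalues_by_type[t] for t in types])
    return [t for t, q in zip(types, adjusted) if q < alpha]
```

For example, `significant_tumor_types({"SKCM": 0.001, "LUAD": 0.2, "KIRC": 0.01})` keeps SKCM and KIRC, whose adjusted p-values are 0.003 and 0.015.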
  • Step 3) uses the missMethyl software package to analyze the degree of difference in methylation rate between the two types of tumor samples at each methylation site; sites with an FDR-corrected adj.p-value less than 0.05 are defined as significantly differentially methylated sites;
  • for each index, the methylation sites that are significantly differentially methylated, with the same direction of change in methylation rate, in more than half of the tumor types associated with that index are retained and defined as the characteristic methylation sites significantly associated with that index; the three characteristic methylation site sets are then merged into the final screened set of characteristic methylation sites;
  • methylation sites related to tumor immune infiltration that have been reported in public reports are added to this set to form the final set of characteristic methylation sites.
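One reading of the retention rule in step 3 is sketched below: a site is kept for an index if, among that index's associated tumor types, it is significant with the same sign of change in more than half of them. The data layout is hypothetical. Each tumor type's missMethyl result is assumed to be summarized as `(adj_p, delta_beta)` per site, where `delta_beta` is the high-infiltration cluster mean minus the low-infiltration cluster mean; this orientation is an assumption the source does not spell out. The site IDs in the usage note are illustrative.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical layout: results[tumor_type][site_id] = (adj_p, delta_beta), where
# delta_beta = mean beta (high-infiltration cluster) - mean beta (low-infiltration cluster).
Results = Dict[str, Dict[str, Tuple[float, float]]]


def consistent_sites(results: Results, alpha: float = 0.05) -> Set[str]:
    """Sites significant with the same direction in more than half of the tumor types."""
    n_types = len(results)
    up: Dict[str, int] = {}
    down: Dict[str, int] = {}
    for per_site in results.values():
        for site, (adj_p, delta) in per_site.items():
            if adj_p < alpha and delta != 0:
                counter = up if delta > 0 else down
                counter[site] = counter.get(site, 0) + 1
    candidates = set(up) | set(down)
    return {s for s in candidates if max(up.get(s, 0), down.get(s, 0)) > n_types / 2}


def final_site_set(per_index: Iterable[Results], published_sites: Iterable[str]) -> Set[str]:
    """Union of the per-index site sets (e.g. OS, TMB, PD-L1) plus literature sites."""
    selected: Set[str] = set()
    for results in per_index:
        selected |= consistent_sites(results)
    return selected | set(published_sites)
```

With three associated tumor types, a site must be significant in the same direction in at least two of them to be kept; a site that goes up in one type and down in another is discarded.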
  • Step 4) the effectiveness of immune checkpoint inhibitor therapy is defined indirectly using features that public reports have confirmed to be related to the efficacy of immunotherapy:
  • cases in the patient cohort that meet both of the following conditions are defined as effective cases of immune checkpoint inhibitor treatment: 1) the tumor mutational burden (TMB) value is higher than the upper quartile of all samples; 2) the TGF-β-related immune score from public reports (TGFB score 21050467) is lower than the median of all samples. By this definition, the data set is divided into groups in which immune checkpoint inhibitor therapy is effective and ineffective.
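The labelling rule of Step 4 can be written as a short function. The name `define_effectiveness` is hypothetical, and TMB and TGF-β scores are assumed to be precomputed per sample. The source does not say which quantile convention defines the "upper quartile"; the `"inclusive"` method of Python's `statistics.quantiles` is assumed here because it matches the linear interpolation used by default in R and NumPy.

```python
import statistics
from typing import List, Sequence


def define_effectiveness(tmb: Sequence[float], tgfb_score: Sequence[float]) -> List[int]:
    """Label a sample 1 (effective) if TMB > upper quartile and TGF-beta score < median."""
    # Upper quartile (third quartile) of TMB over all samples.
    tmb_q3 = statistics.quantiles(tmb, n=4, method="inclusive")[2]
    tgfb_median = statistics.median(tgfb_score)
    # Both conditions must hold at the same time.
    return [int(t > tmb_q3 and g < tgfb_median) for t, g in zip(tmb, tgfb_score)]
```

Because both thresholds are cohort-relative, at most a quarter of the samples can be labelled effective, which is why the later steps have to deal with class imbalance.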
  • The present invention also provides a method for constructing a model for evaluating the therapeutic effectiveness of tumor immune checkpoint inhibitors based on DNA methylation profiles, characterized in that the method includes:
  • model training is performed with the methylation rates of the final set of characteristic methylation sites obtained by the above method as independent variables, and the immune checkpoint inhibitor effectiveness defined by the above method as the dependent variable;
  • a support vector machine classifier can be used to construct the evaluation model, and its hyperparameters can be selected by cross-validation;
  • random oversampling is used during model training to address the severe class imbalance faced by the model;
  • the F1 score (F1) or the Matthews correlation coefficient (MCC) is used to measure prediction performance;
  • the original cohort is randomly divided into two subsets; the model is trained on the first subset with the hyperparameters obtained above, and its prediction performance is calculated on the second subset.
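Random oversampling, as mentioned above, can be sketched as duplicating randomly chosen minority-class samples (with replacement) until every class matches the majority class size. The function name is hypothetical; imbalanced-learn's `RandomOverSampler` provides an equivalent, well-tested implementation.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple


def random_oversample(X: Sequence, y: Sequence[Hashable],
                      seed: int = 0) -> Tuple[List, List]:
    """Duplicate random minority-class rows until all classes have the majority size."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List] = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out: List = []
    y_out: List = []
    for label, rows in by_class.items():
        # Draw extra rows with replacement until this class reaches the target size.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out
```

A design caveat: oversampling should be applied only to the training portion of each split. Duplicated samples that leak into the evaluation subset would inflate F1 and MCC.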
  • The present invention also provides a method for evaluating the therapeutic efficacy of a tumor immune checkpoint inhibitor based on DNA methylation profiles, which comprises the model construction method described above followed by evaluating a sample with the constructed model.
  • The present invention further provides a system for screening characteristic DNA methylation sites or for constructing a DNA methylation profile-based model for evaluating the effectiveness of tumor immune checkpoint inhibitor treatment, characterized in that it includes the following modules:
  • Immune infiltration analysis module: in a given tumor cohort containing samples of multiple tumor types, this module performs immune infiltration analysis on each tumor sample based on the detected DNA methylation profile data and calculates the relative infiltration content of each type of immune cell in each sample; it then performs cluster analysis on the immune cell infiltration content of the samples in each tumor type cohort, with the number of clusters set to 2, to obtain sample cohorts with two types of immune cell infiltration patterns for each tumor type.
  • Tumor type screening module: this module selects the tumor types that are significantly associated with indirect evaluation indices of the effectiveness of immune checkpoint inhibitor therapy;
  • Characteristic methylation site construction module: for the tumor types selected by the above indices, this module analyzes the difference in methylation rate between the two types of tumor samples at each methylation site and constructs a set of characteristic methylation sites.
  • Module 2) selects tumor types according to three indirect evaluation indices of the therapeutic effectiveness of immune checkpoint inhibitors;
  • the three indices are the prognostic overall survival (OS) index, the tumor mutation burden (TMB) index and the PD-L1 expression level index;
  • the tumor types significantly associated with each index are selected as follows: for the OS index, the log-rank test is used to screen tumor types in which survival time differs significantly between the two types of samples; for the TMB index, the Mann-Whitney U test is used to screen tumor types in which mutation burden differs significantly between the two types of samples; for the PD-L1 expression index, the DESeq2 package in R is used to characterize expression differences between the two types of samples, and tumor types with a significant difference in PD-L1 gene expression are selected;
  • the p-values of all the above tests are FDR-corrected, and a difference is considered significant when the adj.p-value is less than 0.05.
  • Module 3) uses the missMethyl software package to analyze the degree of difference in methylation rate between the two types of tumor samples at each methylation site; sites with an FDR-corrected adj.p-value less than 0.05 are defined as significantly differentially methylated sites;
  • for each index, the methylation sites that are significantly differentially methylated, with the same direction of change in methylation rate, in more than half of the tumor types associated with that index are retained and defined as the characteristic methylation sites significantly associated with that index; the three characteristic methylation site sets are then merged into the final screened set of characteristic methylation sites;
  • methylation sites related to tumor immune infiltration that have been reported in public reports are added to this set to form the final set of characteristic methylation sites.
  • Module 4) defines the effectiveness of immune checkpoint inhibitor therapy indirectly using features that public reports have confirmed to be related to the efficacy of immunotherapy:
  • cases in the patient cohort that meet both of the following conditions are defined as effective cases of immune checkpoint inhibitor treatment: 1) the tumor mutational burden (TMB) value is higher than the upper quartile of all samples; 2) the TGF-β-related immune score from public reports (TGFB score 21050467) is lower than the median of all samples. By this definition, the data set is divided into groups in which immune checkpoint inhibitor therapy is effective and ineffective.
  • The present invention also provides an apparatus, characterized by comprising: at least one memory for storing a program; and at least one processor for loading the program to execute the above method.
  • The present invention also provides a storage medium storing processor-executable instructions which, when executed by a processor, implement the above method.
  • The present invention also provides the use of the above apparatus or storage medium in constructing a model for evaluating the therapeutic efficacy of a tumor immune checkpoint inhibitor.
  • The DNA methylation signal used in the present invention is stable and has sufficient tumor-type and cell-type specificity. Moreover, compared with existing biomarkers of the efficacy of immune checkpoint inhibitor therapy, DNA methylation profiles are convenient and inexpensive to obtain: unlike tumor mutation burden (TMB) and microsatellite instability (MSI), they do not require costly, time-consuming high-throughput sequencing, and unlike gene expression profiles, they are not affected by the inherent instability of RNA or by problems such as environmental RNA contamination.
  • The model of the present invention is constructed on a pan-cancer scale: multiple tumor types are considered together in both feature selection and model construction, and performance validation showed high performance both at the pan-tumor level and in multiple individual tumor types.
  • The feature screening and model construction of the present invention can be carried out in tumor cohorts that have not actually received immunotherapy but have abundant high-throughput data. Because large tumor cohorts treated with immune checkpoint inhibitors are still relatively scarce, the present invention measures treatment effectiveness using indicators closely related to immune checkpoint inhibitor treatment. This allows models to be built in cohorts that have not undergone immunotherapy but have abundant high-throughput genomic, transcriptomic and epigenomic data, greatly expanding the possible scope of immunotherapy marker screening.
  • In an example applied to the TCGA cohort, a model for predicting the efficacy of tumor immune checkpoint inhibitor treatment based on DNA methylation levels was constructed; at the pan-tumor level its predictive performance was similar to that of other models.
  • It is complementary to a model constructed from gene expression profiles, which makes it possible to integrate other omics data to further improve predictive performance.
  • The evaluation model constructed with the TCGA pan-tumor cohort not only predicts well at the pan-tumor level but also shows high prediction performance at the level of individual tumor types.
  • The DNA methylation profile-based feature selection method can efficiently select DNA methylation signature sites closely related to the efficacy of immune checkpoint inhibitor therapy in the TCGA cohort;
  • Figure 3: On the TCGA cohort, the DNA methylation profile-based model for evaluating the efficacy of immune checkpoint inhibitor treatment performs similarly to, and is complementary with, the gene expression profile-based pan-tumor model;
  • Figure 4: The DNA methylation profile-based model for evaluating the efficacy of immune checkpoint inhibitor therapy shows high performance at the level of individual tumor types.
  • The terms “comprising”, “including”, “having”, “containing” and “involving” are inclusive or open-ended and do not exclude other unrecited elements or method steps.
  • The term “consisting of” is considered a preferred embodiment of the term “comprising”. If a group is defined below as comprising at least a certain number of embodiments, this should also be understood to disclose a group that preferably consists only of these embodiments.
  • The terms “approximately” and “substantially” in the present invention denote an accuracy interval that a person skilled in the art understands to still guarantee the technical effect of the feature in question.
  • The term generally means ±10%, preferably ±5%, of the indicated value.
  • The core of the present invention is a method for constructing a model for evaluating the efficacy of tumor immune checkpoint inhibitor therapy that is based on DNA methylation profile data and has high prediction accuracy at both the pan-tumor level and the individual tumor type level.
  • The present invention designs a method for screening characteristic methylation sites associated with the effectiveness of tumor immune checkpoint inhibitor treatment, based on the results of immune infiltration analysis of DNA methylation profile data, and constructs an evaluation model from the feature screening results and known immunotherapy markers.
  • The present invention designs a knowledge-based method for screening characteristic DNA methylation sites.
  • The method is based on the finding, stated in published reports, that immune infiltration analysis of DNA methylation profiles divides the tumor samples of most tumor types into two categories with high and low infiltration levels, and on evidence that, in some tumor types, the effectiveness of immune checkpoint inhibitor therapy differs significantly between samples with these two infiltration levels. By identifying these tumor types and analyzing the sites differentially methylated between the two types of samples, a set of characteristic methylation sites can be obtained.
  • The present invention specifically provides a method for screening characteristic DNA methylation sites and a method for constructing a model for evaluating the therapeutic effectiveness of tumor immune checkpoint inhibitors based on DNA methylation profiles, comprising the following steps:
  • Step 1) in a given tumor cohort containing samples of multiple tumor types, perform immune infiltration analysis on each tumor sample based on the detected DNA methylation profile data, and calculate the relative infiltration content of each type of immune cell in each sample;
  • perform cluster analysis on the immune cell infiltration content of the samples in each tumor type cohort; preferably, the number of clusters is set to 2, yielding sample cohorts with two types of immune cell infiltration patterns for each tumor type;
  • Step 2) according to indirect evaluation indices of the therapeutic effectiveness of immune checkpoint inhibitors, select the tumor types that are significantly associated with these indices;
  • Step 3) for the tumor types selected by the above indices, analyze the degree of difference in methylation rate at each methylation site between the sample cohorts of the two immune cell infiltration patterns, and construct a set of characteristic methylation sites.
  • In step 2), tumor types significantly associated with three indirect evaluation indices of the effectiveness of immune checkpoint inhibitor therapy are selected.
  • The three indices are the prognostic overall survival (OS) index, the tumor mutation burden (TMB) index and the PD-L1 expression level index. For the OS index, the log-rank test is used to screen tumor types in which survival time differs significantly between the two types of samples; for the TMB index, the Mann-Whitney U test is used to screen tumor types in which mutation burden differs significantly between the two types of samples; for the PD-L1 expression index, the DESeq2 package in R is used to characterize expression differences between the two types of samples, and tumor types with a significant difference in PD-L1 gene expression are selected.
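The TMB comparison in step 2 can be illustrated with a small Mann-Whitney U implementation that uses the tie-corrected normal approximation without continuity correction. The names are hypothetical. `scipy.stats.mannwhitneyu` is the usual choice; by default it applies a continuity correction and may use exact p-values for small samples, so its p-values can differ slightly from this sketch.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """U statistic for x and a two-sided p-value from the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = _average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sigma == 0:
        return u1, 1.0  # all values identical: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

Here `x` and `y` would be the TMB values of the two infiltration clusters within one tumor type; the resulting p-values across tumor types would then go through the FDR correction described in step 2.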
  • Step 3) uses the missMethyl software package to analyze the degree of difference in methylation rate between the two types of tumor samples at each methylation site, and sites with an FDR-corrected adj.p-value less than 0.05 are defined as significantly differentially methylated sites; preferably, for the screening results of each index, the methylation sites that are significantly differentially methylated, with the same direction of change in methylation rate, in more than half of the associated tumor types are defined as the characteristic methylation sites significantly associated with that index; the three characteristic methylation site sets are combined into the final screened set of characteristic methylation sites; more preferably, methylation sites related to tumor immune infiltration that have been reported in public reports are added to this set to form the final set of characteristic methylation sites.
  • The method further includes step 4), in which the effectiveness of immune checkpoint inhibitor therapy is defined indirectly using features that public reports have confirmed to correlate with the efficacy of immunotherapy:
  • cases in the patient cohort that simultaneously meet the following conditions are defined as effective cases of immune checkpoint inhibitor therapy: 1) the tumor mutational burden (TMB) value is higher than the upper quartile of all samples; 2) the publicly reported TGF-β-related immune score (TGFB score 21050467) is lower than the median of all samples.
  • On this basis, the dataset is divided into groups in which immune checkpoint inhibitors are effective and ineffective; this definition allows the model to be built on tumor cohorts that have not actually been treated with immune checkpoint inhibitors but have rich multi-omics data.
  • In the final model, the methylation rates of the final set of characteristic methylation sites are the independent variables and the immune checkpoint inhibitor effectiveness defined above is the dependent variable; a support vector machine (SVM) classifier is used for model training.
  • SVM: support vector machine classifier
  • A model for evaluating the efficacy of immune checkpoint inhibitor treatment was constructed, and its hyperparameters were selected by 5-fold cross-validation. Random oversampling was used during model training to address the severe class imbalance faced by the model. Prediction performance was measured with the F1 score (F1) and the Matthews correlation coefficient (MCC). After the hyperparameters were obtained, the original cohort was randomly divided into two subsets at a ratio of 8:2; the model was trained on the first subset with these hyperparameters, and its prediction performance was calculated on the second subset. This randomized evaluation was repeated 100 times to obtain a comprehensive assessment of the performance of the model construction method.
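The repeated 8:2 evaluation can be sketched as a loop over fresh random splits. `fit_predict` and `score` are hypothetical callables: in the described method, `fit_predict` would oversample the training subset and fit an SVM with the fixed hyperparameters, and `score` would be F1 or MCC. The source does not say whether the split is stratified by class, so a plain random split is assumed.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# fit_predict(train_X, train_y, test_X) -> predicted labels (hypothetical interface)
FitPredict = Callable[[list, list, list], list]
# score(y_true, y_pred) -> float, e.g. F1 or MCC
Score = Callable[[list, list], float]


def repeated_holdout(X: Sequence, y: Sequence[int], fit_predict: FitPredict, score: Score,
                     n_repeats: int = 100, test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[float, float, List[float]]:
    """Mean, standard deviation and all scores over repeated random 8:2 splits."""
    rng = random.Random(seed)
    indices = list(range(len(y)))
    n_test = max(1, round(len(y) * test_fraction))
    scores: List[float] = []
    for _ in range(n_repeats):
        rng.shuffle(indices)  # a fresh random split each repetition
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        y_pred = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                             [X[i] for i in test_idx])
        scores.append(score([y[i] for i in test_idx], y_pred))
    # statistics.stdev needs at least two repeats.
    return statistics.mean(scores), statistics.stdev(scores), scores
```

Reporting the spread across the 100 repeats, not just the mean, shows how sensitive the model construction method is to the particular split, which matters when the effective class is small.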
  • Following the above process, the method of the present invention was applied to a pan-tumor cohort of 6,381 samples from 22 TCGA tumor types, and 3,143 methylation sites were finally identified.
  • The present invention designs a method for screening characteristic methylation sites associated with the effectiveness of tumor immune checkpoint inhibitor treatment, based on the results of immune infiltration analysis of DNA methylation profile data, and constructs an evaluation model from the feature screening results and known immunotherapy markers.
  • DNA methylation chips measure methylation levels at a very large number of sites at once; the number of sites far exceeds the number of samples a clinical cohort can provide, and methylation levels at different sites are often strongly collinear. These characteristics make most existing data-driven feature selection methods inefficient. Therefore, this example designs a knowledge-based method for screening characteristic DNA methylation sites.
  • The present invention uses a series of relatively easy-to-obtain, evidence-supported biomarkers to indirectly evaluate the efficacy of tumor immune checkpoint inhibitor therapy.
  • In the first step, immune infiltration analysis was performed on each tumor sample based on the detected DNA methylation profile data, and the relative content of each type of immune cell in each sample was calculated.
  • Cluster analysis was then performed on the immune cell infiltration content of the samples in each tumor type cohort, with the number of clusters set to 2, yielding two types of immune cell infiltration patterns for each tumor type.
  • In the second step, the tumor types significantly associated with each index were selected as follows: for the OS index, the log-rank test was used to screen tumor types in which survival time differed significantly between the two types of samples; for the TMB index, the Mann-Whitney U test was used to screen tumor types in which mutation burden differed significantly between the two types of samples; for the PD-L1 expression index, the DESeq2 package in R was used to describe expression differences between the two types of samples, and tumor types with a significant difference in PD-L1 gene expression were selected.
  • The p-values of all the above tests were FDR-corrected, and differences with an adj.p-value less than 0.05 were considered significant.
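The OS comparison uses the log-rank test, which can be sketched for two groups as below: at each distinct event time it accumulates observed minus expected deaths in group A and the hypergeometric variance, then compares the resulting chi-square statistic with 1 degree of freedom. The function name and argument layout are hypothetical; in practice one would use R's `survival::survdiff` or `lifelines.statistics.logrank_test`.

```python
import math
from typing import Sequence, Tuple


def logrank_test(times_a: Sequence[float], events_a: Sequence[int],
                 times_b: Sequence[float], events_b: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test; events are 1 for death and 0 for censoring."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
           [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e, _ in data if e}):  # distinct event times
        at_risk = sum(1 for ti, _, _ in data if ti >= t)
        at_risk_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        deaths = sum(1 for ti, e, _ in data if ti == t and e)
        deaths_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        frac_a = at_risk_a / at_risk
        observed_minus_expected += deaths_a - deaths * frac_a
        if at_risk > 1:
            variance += deaths * frac_a * (1 - frac_a) * (at_risk - deaths) / (at_risk - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    # Upper tail of a chi-square with 1 df: P(X > c) = erfc(sqrt(c / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

The two groups would be the two infiltration clusters of one tumor type, with overall survival times and death/censoring indicators as inputs.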
  • In the third step, for the tumor types screened by the above three indices, the missMethyl software package was used to analyze the degree of difference in methylation rate between the two types of tumor samples at each methylation site, and sites with an FDR-corrected adj.p-value less than 0.05 were considered significantly differentially methylated.
  • For each index, the methylation sites that were significantly differentially methylated, with the same direction of change in methylation rate, in more than half of the associated tumor types were retained and defined as the characteristic methylation sites significantly associated with that index.
  • The three characteristic methylation site sets were combined into the final screened set. Finally, methylation sites reported in public reports to be related to tumor immune infiltration were added to form the final set of characteristic methylation sites.
  • TMB: tumor mutational burden
  • The TGF-β-related immune score reported in published work (TGFB score 21050467) is lower than the median value of all samples.
  • The dataset is thus divided into groups in which immune checkpoint inhibitors are effective and ineffective. This definition allows the model to be built on tumor cohorts that have not actually been treated with immune checkpoint inhibitors but have rich multi-omics data.
  • The final model uses the methylation rates of the final set of characteristic methylation sites as independent variables and the efficacy of immune checkpoint inhibitor therapy, as defined above, as the dependent variable. It is trained with a support vector machine classifier (SVM).
  • SVM: support vector machine classifier
  • Random oversampling is used during model training to address the severe class imbalance in the data.
  • Model predictive performance was measured using the F1 score (F1) and the Matthews correlation coefficient (MCC).
  • F1: F1 score
  • MCC: Matthews correlation coefficient
  • The present patent applies the above feature screening and model construction methods to a TCGA pan-tumor cohort of 22 tumor types and 6,381 patients and demonstrates their superiority on this cohort (Supplementary Table 1 lists all tumor types and the number of samples included for each).
  • The data for this cohort comprise four parts: the methylation rates (β values) of about 480,000 sites measured with the Illumina Infinium HumanMethylation450 BeadChip, gene expression profiles measured by RNA-seq, somatic mutation profiles obtained by genome sequencing, and survival data. Tumor mutational burden and TGF-β score were calculated for each sample according to commonly used definitions.
  • The final feature set efficiently discriminates samples that respond to immune checkpoint inhibitors from non-responsive samples. This is shown by the following:
  • The final feature set contains more methylation sites whose methylation levels differ between samples in which immune checkpoint inhibitor therapy is effective and samples in which it is ineffective.
  • A prediction model built on the methylation levels of the sites in the final feature set discriminates responsive from non-responsive samples at the pan-tumor level with high accuracy and sensitivity. This is shown by the following points:
  • The patent tested a series of common machine learning models: logistic regression with an L1 regularization term (LR), a support vector machine classifier (SVM), a random forest classifier (RF), and a k-nearest neighbor classifier (KNN).
  • LR: logistic regression
  • SVM: support vector machine classifier
  • RF: random forest classifier
  • KNN: k-nearest neighbor classifier
  • The SVM model significantly outperforms the other models whether performance is measured by F1 score or MCC (Fig. 2A).
  • All machine learning models performed significantly better than a background model. In each tumor type, the background model labels the cluster with the higher mean tumor mutational burden as responsive to immune checkpoint inhibitor therapy (Fig. 2A).
  • At the pan-tumor level, the DNA methylation-based model for evaluating the efficacy of immune checkpoint inhibitor therapy has prediction accuracy similar to that of a gene expression profiling-based model, and the two are complementary. This is shown by the following:
  • Models based on DNA methylation have similar predictive performance to models based on gene expression profiles.
  • The gene expression profile-based model used here was built with the SVM method. Its inputs are the expression levels (log2(FPKM+1)) of tumor immunity-related genes identified in published reports. Unlike in the published report, random oversampling was also used during model training to address class imbalance.
  • The model built in this way performed significantly better than the published model. In 100 randomized evaluations, its average MCC reached 0.463, versus 0.296 in the published report. In 100 randomized evaluations, the DNA methylation-based model significantly outperformed the gene expression profile-based model when predictive performance was measured by F1 score (Fig. …).
  • Thus, at the pan-tumor level, two models have similar prediction accuracy. The first is the DNA methylation-based model for evaluating the efficacy of tumor immune checkpoint inhibitor therapy, built on the TCGA pan-tumor cohort by the method of the present invention. The second is the gene expression profile-based model built from published reports.
  • The model built on the TCGA cohort by the method of the present invention also has high prediction accuracy at the level of individual tumor types. This was shown on the tumor types (10 in total) in which more than 5% of TCGA samples were labeled as responsive to immune checkpoint inhibitor therapy.
  • Published reports have shown some tumor types to be closely associated with immune escape and likely to benefit from immune checkpoint inhibitor therapy. On these tumor types, the prediction model based on the final feature set has higher predictive accuracy.
  • On 5 of these 10 tumor types, the model based on the final feature set consistently had significantly higher prediction accuracy. Differences in prediction accuracy were assessed with a paired-sample t-test at a significance threshold of 0.1.
  • The prediction accuracy of the model based on the final feature set was not significantly weaker than that of the model based on the random control feature set (Fig. 4B).
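The paired comparison of prediction accuracy described in the last two points can be sketched as below. This is a minimal illustration, not the authors' code. `paired_t_statistic` is a hypothetical helper that computes only the t statistic and degrees of freedom for per-repeat accuracy scores of two models. The p-value would then be read from a t distribution and compared with the 0.1 threshold stated above. For example, `scipy.stats.ttest_rel` performs the whole test.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Paired-sample t statistic and degrees of freedom for scores_a - scores_b.

    scores_a, scores_b: per-repeat accuracy of two models evaluated on the
    same random splits (hypothetical inputs; pairing is by position).
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

A positive t means the first model scored higher on average across the paired repeats.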

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Organic Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Biotechnology (AREA)
  • Medical Informatics (AREA)
  • Wood Science & Technology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Immunology (AREA)
  • Evolutionary Biology (AREA)
  • Pathology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Zoology (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Hospice & Palliative Care (AREA)
  • Epidemiology (AREA)
  • Oncology (AREA)
  • Data Mining & Analysis (AREA)
  • Microbiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

Provided are a DNA methylation feature site screening method and a construction method for a tumor immune checkpoint inhibitor therapy effectiveness evaluation model based on a DNA methylation spectrum.

Description

Method for constructing a DNA methylation profile-based model for evaluating the efficacy of tumor immune checkpoint inhibitor therapy
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese patent application No. CN202110005009.1. That application is entitled "Method for constructing a DNA methylation profile-based model for evaluating the efficacy of tumor immune checkpoint inhibitor therapy" and was filed with the China Patent Office on January 4, 2021. Its entire contents are incorporated herein by reference.
TECHNICAL FIELD
The invention relates to the field of bioinformatics analysis, and in particular to a method for constructing a DNA methylation profile-based model for evaluating the efficacy of tumor immune checkpoint inhibitor therapy.
BACKGROUND
In recent years, a series of tumor immunotherapies, represented by immune checkpoint inhibitor therapy, have been introduced into the treatment of various tumor types. They have shown significant efficacy in many advanced malignancies, including melanoma, renal cell carcinoma, non-small cell lung cancer, head and neck cancer, urothelial carcinoma, Hodgkin lymphoma, and Merkel cell carcinoma. They have greatly improved the survival time and quality of life of some patients with advanced tumors. These immunotherapies do not target tumor cells directly. Instead, they strengthen the body's immune response and improve the ability of immune cells to recognize tumor cells, so that the immune system attacks and eliminates the tumor.
However, clinical statistics show that immune checkpoint inhibitor therapy is effective only in some patients, and many patients do not benefit from it. Some patients even develop immune-related adverse events (irAEs) affecting the skin, gastrointestinal tract, liver, and endocrine system. These events are caused by excessive enhancement of the patient's own non-specific immune response and are sometimes life-threatening. Identifying patients who will respond well to immune checkpoint inhibitors is therefore an important means of improving the efficacy and safety of this therapy and of expanding its range of application.
Researchers have identified a variety of biomarkers for evaluating the efficacy of immune checkpoint inhibitors. These include tumor mutation burden (TMB), neoantigens, microsatellite instability (MSI), the expression level of programmed death-ligand 1 (PD-L1), and the degree of tumor immune infiltration. However, detecting these markers generally requires invasive surgical sampling, which makes them difficult to track efficiently. The markers are also tumor-type specific, so evaluation models must be designed and validated separately for each tumor type. Pan-tumor models for evaluating the efficacy of immune checkpoint inhibitor therapy are generally based on gene expression profiles. However, the low yield of RNA extraction and the inherent instability of RNA place many limitations on gene expression profiling of tumor tissue. There is therefore an urgent need for a pan-tumor method of evaluating the efficacy of immune checkpoint inhibitor therapy that is based on stable, easily detectable biomarkers.
In view of this, the present invention is proposed.
SUMMARY OF THE INVENTION
The purpose of the present invention is to provide a method or model for evaluating the efficacy of immune checkpoint inhibitor therapy at the pan-tumor level, based on stable and easily detectable biomarkers. To this end, the present invention provides the following technical solutions:
The present invention first provides a method for screening characteristic DNA methylation sites, comprising the following steps:
Step 1): take a given tumor cohort containing samples of multiple tumor types. Perform immune infiltration analysis on each tumor sample based on its measured DNA methylation profile, and calculate the relative infiltration level of each immune cell type in each sample. Then, within each tumor type, cluster the samples by their immune cell infiltration levels with the number of clusters set to 2. This yields, for each tumor type, two sample groups with distinct immune cell infiltration patterns;
Step 2): based on indirect indicators of the efficacy of immune checkpoint inhibitor therapy, select the tumor types significantly associated with each indicator;
Step 3): for the tumor types selected by these indicators, analyze how much the methylation rate at each methylation site differs between the two immune-infiltration-pattern groups, and construct a set of characteristic methylation sites.
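The clustering in Step 1) can be sketched in Python as follows. The source names neither the method used to estimate immune cell abundances from methylation data nor the clustering algorithm. This sketch therefore assumes the per-sample immune cell fractions are already available. It uses plain k-means with k = 2 and deterministic farthest-point seeding; `kmeans_two_clusters` is a hypothetical helper.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_two_clusters(profiles, max_iter=100):
    """Cluster samples into two groups by immune infiltration profile.

    profiles: one vector per sample, e.g. relative abundance of each immune
    cell type. Returns a 0/1 cluster label per sample.
    Assumption: plain k-means (k = 2) with farthest-point seeding; the
    source does not name the clustering algorithm.
    """
    if len(profiles) < 2:
        raise ValueError("need at least two samples to form two clusters")
    # Seed with the first sample and the sample farthest from it.
    first = profiles[0]
    second = max(profiles, key=lambda p: squared_distance(p, first))
    centroids = [list(first), list(second)]
    labels = None
    for _ in range(max_iter):
        # Assign each sample to its nearest centroid.
        new_labels = [
            0 if squared_distance(p, centroids[0]) <= squared_distance(p, centroids[1]) else 1
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Move each centroid to the mean of its member samples.
        for k in (0, 1):
            members = [p for p, lab in zip(profiles, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Run the function once per tumor type. The two resulting labels define the two sample groups compared in steps 2) and 3); which cluster is numbered 0 or 1 is arbitrary.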
Further, step 2) selects the tumor types significantly associated with three indirect indicators of the efficacy of immune checkpoint inhibitor therapy;
Still further, the three indirect indicators are overall survival (OS), tumor mutation burden (TMB), and PD-L1 expression level;
Preferably, the associated tumor types are selected as follows. For the OS indicator, a log-rank test selects tumor types in which overall survival differs significantly between the two groups of samples. For the TMB indicator, the Mann-Whitney U test selects tumor types in which mutation burden differs significantly between the two groups. For the PD-L1 indicator, the DESeq2 package for R characterizes expression differences between the two groups, and tumor types in which PD-L1 gene expression differs significantly are selected;
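The TMB comparison can be sketched with a two-sided Mann-Whitney U test. This minimal version uses the normal approximation with a tie correction and no continuity correction. It can therefore differ slightly from library implementations such as `scipy.stats.mannwhitneyu`, which would normally be used. The function names are illustrative.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    x, y: e.g. TMB values of the two immune-infiltration groups.
    Returns (U for sample x, p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: sum over tie groups of (t^3 - t).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    n = n1 + n2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
```

For small groups an exact test is preferable to this approximation.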
More preferably, the p-values of all the above tests are FDR-corrected, and a difference is considered significant when the adjusted p-value is less than 0.05.
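The source says only "FDR correction". Assuming the usual Benjamini-Hochberg procedure, the adjusted p-values can be computed as below, and a tumor type is kept when its adjusted value is below 0.05. `benjamini_hochberg` is a hypothetical helper intended to mirror R's `p.adjust(method = "BH")`.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, raw p-values of 0.01, 0.04, 0.03 and 0.20 for four tumor types give adjusted values of about 0.04, 0.053, 0.053 and 0.20, so only the first tumor type passes the 0.05 cut-off.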
Further, step 3) uses the missMethyl package to analyze, for each tumor type, how much the methylation rate at each methylation site differs between the two groups of tumor samples. Sites with an FDR-corrected adjusted p-value below 0.05 are defined as significantly differentially methylated sites;
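missMethyl is an R/Bioconductor package, and its differential methylation testing is not reproduced here. This sketch rests on stated assumptions and shows only two ingredients that the next step relies on. The first is the conversion of β values to M-values, log2(β/(1 - β)), which is commonly done before linear-model testing of methylation array data. The second is the sign of the mean difference between the two groups at a site. The helper names are hypothetical.

```python
import math


def beta_to_m(beta, offset=1e-6):
    """Convert a methylation beta value to an M-value, log2(beta / (1 - beta)).

    The small offset keeps beta values of exactly 0 or 1 finite.
    """
    b = min(max(beta, offset), 1 - offset)
    return math.log2(b / (1 - b))


def site_direction(group_a, group_b):
    """Mean M-value difference (group_a minus group_b) at one CpG site.

    Positive: higher methylation in group_a; negative: lower.
    """
    m_a = [beta_to_m(b) for b in group_a]
    m_b = [beta_to_m(b) for b in group_b]
    return sum(m_a) / len(m_a) - sum(m_b) / len(m_b)
```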
Preferably, the screening results for each indicator are filtered as follows. A methylation site is retained if it is significantly differentially methylated in all tumor types associated with that indicator and its methylation rate differs in the same direction in more than half of those tumor types. Retained sites are defined as the characteristic methylation sites significantly associated with that indicator. The three resulting sets of characteristic methylation sites are merged into the final screened set;
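The retention and merging rule can be sketched as below. The sketch assumes per-indicator results are stored as `{tumor_type: {site: (adj_p, delta)}}`, where `delta` is the signed methylation difference between the two groups. The data layout and function names are hypothetical. The rule follows the text: a site must be significant in every associated tumor type and differ in the same direction in more than half of them. The final set is the union of the OS, TMB and PD-L1 sets plus any published sites.

```python
def indicator_sites(results, alpha=0.05):
    """Characteristic sites for one indicator (e.g. OS, TMB or PD-L1).

    results: {tumor_type: {site: (adj_p, delta)}} for the tumor types
    associated with this indicator.
    """
    tumor_types = list(results)
    if not tumor_types:
        return set()
    # Only sites tested in every associated tumor type can qualify.
    common = set.intersection(*(set(results[t]) for t in tumor_types))
    kept = set()
    for site in common:
        stats = [results[t][site] for t in tumor_types]
        if not all(adj_p < alpha for adj_p, _ in stats):
            continue  # must be significant in every associated tumor type
        n_up = sum(1 for _, delta in stats if delta > 0)
        n_down = sum(1 for _, delta in stats if delta < 0)
        if max(n_up, n_down) > len(tumor_types) / 2:
            kept.add(site)  # same direction in more than half of them
    return kept


def final_site_set(per_indicator_results, published_sites=()):
    """Union of the per-indicator site sets plus published sites."""
    final = set()
    for results in per_indicator_results.values():
        final |= indicator_sites(results)
    return final | set(published_sites)
```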
More preferably, methylation sites associated with tumor immune infiltration in published reports are added to this set to form the final set of characteristic methylation sites.
Further, the above method also comprises step 4): indirectly defining the efficacy of immune checkpoint inhibitor therapy using features that published reports have confirmed to be associated with immunotherapy response:
Preferably, a case in the patient cohort is defined as responding to immune checkpoint inhibitor therapy if it meets both of the following conditions: 1) its tumor mutation burden (TMB) is higher than the upper quartile of all samples; 2) its TGF-β-related immune score from published reports (TGFB score 21050467) is lower than the median of all samples. By this definition, the dataset is divided into groups in which immune checkpoint inhibitors are effective and ineffective.
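The proxy response label can be computed as below. The source does not say how the upper quartile is interpolated, so the sketch assumes Python's `statistics.quantiles(..., method="inclusive")`. A different quantile method could shift borderline samples. `define_responders` is a hypothetical name.

```python
import statistics


def define_responders(tmb, tgfb_score):
    """Proxy label: 1 = ICI-responsive, 0 = non-responsive.

    A sample is labelled responsive when its TMB is above the cohort's upper
    quartile AND its TGF-beta score is below the cohort median. The quartile
    interpolation ("inclusive") is an assumption, not stated in the source.
    """
    if len(tmb) != len(tgfb_score):
        raise ValueError("tmb and tgfb_score must have the same length")
    upper_quartile = statistics.quantiles(tmb, n=4, method="inclusive")[2]
    tgfb_median = statistics.median(tgfb_score)
    return [
        1 if t > upper_quartile and s < tgfb_median else 0
        for t, s in zip(tmb, tgfb_score)
    ]
```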
The present invention also provides a method for constructing a DNA methylation profile-based model for evaluating the efficacy of tumor immune checkpoint inhibitor therapy, the method comprising:
training the model with the methylation rates of the final set of characteristic methylation sites obtained by the above method as independent variables, and the efficacy of immune checkpoint inhibitor therapy as defined in the above method as the dependent variable.
Further, a support vector machine classifier (SVM) can be used to construct the model for evaluating the efficacy of immune checkpoint inhibitor therapy, with the model's hyperparameters selected by cross-validation;
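SVM training with cross-validated hyperparameters can be sketched as below. The source does not specify the kernel, library, or search grid; in practice something like scikit-learn's `SVC` with `GridSearchCV` would typically be used. To keep the example dependency-free, this sketch substitutes a linear SVM trained with the Pegasos stochastic sub-gradient algorithm. It selects the regularization strength `lam` with a k-fold search. For brevity the search scores folds by accuracy; the F1 score or MCC named in the source could be swapped in. All function names are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, n_iter=2000, seed=0):
    """Pegasos sub-gradient training of a linear SVM (hinge loss, L2 penalty).

    X: feature vectors (e.g. beta values of the selected CpG sites);
    y: labels in {0, 1}. A constant 1.0 feature is appended as a bias term.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], 1 if label == 1 else -1) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    for t in range(1, n_iter + 1):
        x, s = data[rng.randrange(len(data))]
        eta = 1.0 / (lam * t)
        margin = s * sum(wi * xi for wi, xi in zip(w, x))
        # Regularization (shrink) step, then hinge sub-gradient if violated.
        w = [(1 - eta * lam) * wi for wi in w]
        if margin < 1:
            w = [wi + eta * s * xi for wi, xi in zip(w, x)]
    return w


def predict(w, X):
    """0/1 predictions for a weight vector returned by train_linear_svm."""
    return [1 if sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])) > 0 else 0 for x in X]


def select_lambda_by_cv(X, y, lambdas, k=5, seed=0):
    """Return (best lam, its k-fold cross-validated accuracy)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    best_lam, best_acc = None, -1.0
    for lam in lambdas:
        correct = 0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            w = train_linear_svm([X[i] for i in train], [y[i] for i in train], lam=lam)
            preds = predict(w, [X[i] for i in fold])
            correct += sum(p == y[i] for p, i in zip(preds, fold))
        acc = correct / len(X)
        if acc > best_acc:  # ties keep the earlier candidate
            best_lam, best_acc = lam, acc
    return best_lam, best_acc
```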
Preferably, random oversampling is used during model training to address the severe class imbalance in the data, and the F1 score (F1) or the Matthews correlation coefficient (MCC) is used to measure the model's predictive performance;
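Random oversampling, the F1 score and MCC can be sketched as below, with labels coded 1 = responsive and 0 = non-responsive (an assumed coding). Oversampling duplicates randomly chosen minority-class samples until both classes are the same size. The function names are illustrative.

```python
import math
import random


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class samples until the classes are equal."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        raise ValueError("cannot oversample: one class has no samples")
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(y_true, y_pred):
    tp, _, fp, fn = confusion(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def mcc(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Apply oversampling to the training portion only. Oversampling before splitting would let copies of the same sample appear in both the training and test sets and inflate the scores.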
More preferably, once the hyperparameters have been obtained through training, the original cohort is randomly divided into two subsets. The model is trained with these hyperparameters on the first subset, and its predictive performance is computed on the second.
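The repeated random-split evaluation can be sketched as below. The source does not give the split proportion, so `train_frac=0.5` is an assumption. `n_repeats=100` follows the 100 randomized evaluations reported for the TCGA cohort. `fit`, `predict` and `metric` are caller-supplied callables, so any classifier and either F1 or MCC can be plugged in. The function name is hypothetical.

```python
import random
import statistics


def repeated_split_score(X, y, fit, predict, metric, n_repeats=100, train_frac=0.5, seed=0):
    """Mean and SD of the test-set score over repeated random two-way splits.

    fit(X_train, y_train) -> model; predict(model, X_test) -> labels;
    metric(y_true, y_pred) -> float. Hyperparameters are assumed to be
    fixed beforehand (e.g. by cross-validation).
    """
    rng = random.Random(seed)
    n = len(X)
    n_train = int(round(n * train_frac))
    if not 0 < n_train < n:
        raise ValueError("train_frac leaves an empty subset")
    scores = []
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        scores.append(metric([y[i] for i in test], preds))
    sd = statistics.stdev(scores) if n_repeats > 1 else 0.0
    return statistics.mean(scores), sd
```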
The present invention also provides a DNA methylation profile-based method for evaluating the efficacy of tumor immune checkpoint inhibitor therapy. The method comprises any of the model construction methods described above, followed by evaluating samples with the resulting model.
The present invention further provides a system for screening characteristic DNA methylation sites, or for constructing a DNA methylation profile-based model for evaluating the efficacy of tumor immune checkpoint inhibitor therapy, comprising the following modules:
1) Immune infiltration analysis module: this module takes a given tumor cohort containing samples of multiple tumor types. It performs immune infiltration analysis on each tumor sample based on its measured DNA methylation profile and calculates the relative infiltration level of each immune cell type in each sample. Within each tumor type, it then clusters the samples by immune cell infiltration level with the number of clusters set to 2, yielding two sample groups with distinct immune cell infiltration patterns for each tumor type.
2) Tumor type screening module: based on indirect indicators of the efficacy of immune checkpoint inhibitor therapy, this module selects the tumor types significantly associated with each indicator;
3) Characteristic methylation site construction module: for the tumor types selected by these indicators, this module analyzes how much the methylation rate at each methylation site differs between the two groups of tumor samples and constructs a set of characteristic methylation sites.
Further, module 2) selects the tumor types significantly associated with three indirect indicators of the efficacy of immune checkpoint inhibitor therapy;
Still further, the three indirect indicators are overall survival (OS), tumor mutation burden (TMB), and PD-L1 expression level;
Preferably, the associated tumor types are selected as follows. For the OS indicator, a log-rank test selects tumor types in which overall survival differs significantly between the two groups of samples. For the TMB indicator, the Mann-Whitney U test selects tumor types in which mutation burden differs significantly between the two groups. For the PD-L1 indicator, the DESeq2 package for R characterizes expression differences between the two groups, and tumor types in which PD-L1 gene expression differs significantly are selected;
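The log-rank comparison of overall survival between the two immune-infiltration groups can be sketched as follows. This is a minimal two-group version with illustrative names, not the authors' code. In practice a survival library, for example R's `survival::survdiff`, would be used.

```python
import math


def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    times_*: follow-up times; events_*: 1 if death observed, 0 if censored.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + [
        (t, e, 1) for t, e in zip(times_b, events_b)
    ]
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0  # for group A
    variance = 0.0
    for t in event_times:
        at_risk = [g for time, _, g in data if time >= t]
        n = len(at_risk)
        n_a = at_risk.count(0)
        deaths = [g for time, e, g in data if time == t and e]
        d = len(deaths)
        d_a = deaths.count(0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of deaths in group A at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of chi-square with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```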
More preferably, the p-values of all the above tests are FDR-corrected, and a difference is considered significant when the adjusted p-value is less than 0.05.
Further, module 3) uses the missMethyl package to analyze, for each tumor type, how much the methylation rate at each methylation site differs between the two groups of tumor samples. Sites with an FDR-corrected adjusted p-value below 0.05 are defined as significantly differentially methylated sites;
Preferably, the screening results for each indicator are filtered as follows. A methylation site is retained if it is significantly differentially methylated in all tumor types associated with that indicator and its methylation rate differs in the same direction in more than half of those tumor types. Retained sites are defined as the characteristic methylation sites significantly associated with that indicator. The three resulting sets of characteristic methylation sites are merged into the final screened set;
More preferably, methylation sites associated with tumor immune infiltration in published reports are added to this set to form the final set of characteristic methylation sites.
Further, the above system also comprises module 4), which indirectly defines the efficacy of immune checkpoint inhibitor therapy using features that published reports have confirmed to be associated with immunotherapy response:
Preferably, a case in the patient cohort is defined as responding to immune checkpoint inhibitor therapy if it meets both of the following conditions: 1) its tumor mutation burden (TMB) is higher than the upper quartile of all samples; 2) its TGF-β-related immune score from published reports (TGFB score 21050467) is lower than the median of all samples. By this definition, the dataset is divided into groups in which immune checkpoint inhibitors are effective and ineffective.
The present invention also provides an apparatus, comprising: at least one memory for storing a program; and at least one processor for loading the program to execute the above method.
The present invention also provides a storage medium storing processor-executable instructions which, when executed by a processor, implement the above method.
The present invention also provides use of the above apparatus or storage medium in constructing a model for evaluating the efficacy of tumor immune checkpoint inhibitor therapy.
Beneficial technical effects of the present invention:
1) The DNA methylation signal used in the present invention is stable and sufficiently specific to tumor type and cell type. Compared with existing biomarkers of the efficacy of immune checkpoint inhibitor therapy, DNA methylation profiles are also convenient and inexpensive to obtain. Unlike tumor mutation burden (TMB) and microsatellite instability (MSI), they do not require costly, time-consuming high-throughput sequencing. Unlike gene expression profiles, they are not affected by the inherent instability of RNA or by environmental RNA contamination.
2) The model of the present invention is constructed at the pan-tumor (pan-cancer) scale. Multiple tumor types are taken into account in both feature selection and model construction. Performance validation shows that the model performs well both at the pan-tumor level and within multiple individual tumor types.
3) Feature screening and model construction according to the present invention can be performed on tumor cohorts that have not actually received immunotherapy but have abundant high-throughput data. Large tumor cohorts treated with immune checkpoint inhibitors are still scarce, so the present invention measures efficacy using multiple indicators closely related to immune checkpoint inhibitor therapy. The model can therefore be built on cohorts that have not received immunotherapy but have abundant high-throughput genomic, transcriptomic, and epigenetic data. This greatly expands the scope for screening immunotherapy markers.
4) Applying the feature screening and model construction method of the present invention to the TCGA cohort yielded a DNA methylation-based model for predicting the efficacy of tumor immune checkpoint inhibitor therapy. Its pan-tumor predictive performance is similar to that of other models. The model is also complementary to a model built on gene expression profiles, which makes it possible to integrate other omics data to further improve predictive performance.
5) A model for evaluating the efficacy of tumor immune checkpoint inhibitor therapy was built on the TCGA pan-tumor cohort with the method of the present invention. It predicts well at the pan-tumor level and also shows high predictive performance at the level of individual tumor types.
BRIEF DESCRIPTION OF THE DRAWINGS
The drawings used in describing the specific embodiments of the present invention and the prior art are briefly introduced below, to illustrate the technical solutions more clearly. These drawings show some embodiments of the present invention. A person of ordinary skill in the art can derive other drawings from them without creative effort.
Figure 1: In the TCGA cohort, the DNA methylation profile-based feature selection method efficiently selects DNA methylation sites closely related to the efficacy of immune checkpoint inhibitor therapy;
Figure 2: The model for evaluating the efficacy of immune checkpoint inhibitor therapy, built in the TCGA cohort with the model construction method described in this patent, performs well at the pan-tumor level;
Figure 3: On the TCGA cohort, the DNA methylation profile-based model for evaluating the efficacy of immune checkpoint inhibitor therapy performs similarly to a gene expression profile-based pan-tumor model, and the two are complementary;
Figure 4: The DNA methylation profile-based model for evaluating the efficacy of immune checkpoint inhibitor therapy performs well at the level of individual tumor types.
Detailed Description
The embodiments of the present invention are described in detail below with reference to examples. Those skilled in the art will understand that the following examples only illustrate the present invention and should not be regarded as limiting its scope; they are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
Definitions of Terms
Unless otherwise defined below, all technical and scientific terms used in the detailed description of the present invention have the meanings commonly understood by those skilled in the art. Although the following terms are believed to be well understood by those skilled in the art, the following definitions are given to better explain the present invention.
As used herein, the terms "comprising", "including", "having", "containing" and "involving" are inclusive, or open-ended, and do not exclude other unrecited elements or method steps. The term "consisting of" is considered a preferred embodiment of the term "comprising". If a group is defined below as comprising at least a certain number of embodiments, this should also be understood as disclosing a group that preferably consists only of these embodiments.
The terms "approximately" and "substantially" herein denote an interval of accuracy, understood by those skilled in the art, within which the technical effect of the feature in question is still ensured. They generally denote a deviation of ±10%, preferably ±5%, from the indicated value.
Where an indefinite or definite article such as "a", "an" or "the" is used with a singular noun, this includes the plural of that noun.
Furthermore, the terms first, second, third, (a), (b), (c) and the like in the description and claims are used to distinguish between similar elements and do not necessarily describe a sequential or chronological order. The terms so used are interchangeable under appropriate circumstances, and the embodiments described herein can be carried out in orders other than those described or illustrated herein.
The following description is provided only to aid understanding of the present invention and should not be construed as having a narrower scope than that understood by those skilled in the art.
The core aim of the present invention is a method for constructing a model that evaluates the efficacy of tumor immune checkpoint inhibitor therapy from DNA methylation profile data with high prediction accuracy at both the pan-tumor level and the level of individual tumor types. To this end, the present invention provides a method that screens methylation sites characteristic of immune checkpoint inhibitor therapy efficacy from the results of immune infiltration analysis of DNA methylation profile data, and builds the efficacy evaluation model from the screened features together with known immunotherapy markers.
Specifically, the present invention provides a knowledge-based method for screening DNA methylation signature sites. The method rests on the following findings from published reports: immune infiltration analysis based on DNA methylation profiles shows that, in most tumor types, tumor samples can be divided into a high-infiltration group and a low-infiltration group, and there is evidence that in some tumor types these two groups differ significantly in their response to immune checkpoint inhibitor therapy. By identifying these tumor types and analyzing the sites differentially methylated between the two groups, a set of signature methylation sites can be obtained.
In some embodiments, the present invention provides a method for screening DNA methylation signature sites and a method for constructing a DNA methylation profile-based model for evaluating the efficacy of tumor immune checkpoint inhibitor therapy, comprising the following steps:
Step 1) In a given tumor cohort containing samples of multiple tumor types, perform immune infiltration analysis on each tumor sample from its measured DNA methylation profile, compute the relative infiltration of each immune cell type in each sample, and cluster the samples of each tumor type by their immune cell infiltration; preferably, the number of clusters is set to 2, yielding, for each tumor type, two sample groups with distinct immune cell infiltration patterns;
Step 2) Using indirect indicators of immune checkpoint inhibitor therapy efficacy, select the tumor types in which the two groups differ significantly with respect to these indicators;
Step 3) For each tumor type selected with these indicators, analyze how much the methylation rate at each methylation site differs between the two sample groups with distinct immune cell infiltration patterns, and construct a set of signature methylation sites.
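Step 1 does not name a clustering algorithm. The minimal Python sketch below assumes k-means with k = 2, run separately within each tumor type on vectors of relative immune-cell infiltration; the deconvolution that produces those vectors is outside its scope, and the names `kmeans_two_groups` and `cluster_by_tumor_type` are hypothetical.

```python
from collections import defaultdict


def kmeans_two_groups(profiles, max_iter=100):
    """Split infiltration profiles (equal-length lists of floats) into 2 clusters.

    Deterministic start: the first profile and the profile farthest from it.
    Returns one label (0 or 1) per profile.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [list(profiles[0]), list(max(profiles, key=lambda p: dist2(p, profiles[0])))]
    labels = None
    for _ in range(max_iter):
        new_labels = [0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1 for p in profiles]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(profiles, labels) if lab == k]
            if members:  # keep the previous center if a cluster empties
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_by_tumor_type(samples):
    """samples: list of (sample_id, tumor_type, infiltration_vector).

    Returns {sample_id: (tumor_type, cluster_label)}; each tumor type is clustered on its own.
    """
    by_type = defaultdict(list)
    for sample_id, tumor_type, vector in samples:
        by_type[tumor_type].append((sample_id, vector))
    result = {}
    for tumor_type, rows in by_type.items():
        labels = kmeans_two_groups([vector for _, vector in rows])
        for (sample_id, _), label in zip(rows, labels):
            result[sample_id] = (tumor_type, label)
    return result
```

Cluster labels 0 and 1 are arbitrary within each tumor type, so before directions of methylation difference are compared across tumor types one would orient them, for example by naming the cluster with the higher total infiltration the high-infiltration group.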
In some embodiments, step 2) selects the associated tumor types using three indirect indicators of immune checkpoint inhibitor therapy efficacy: overall survival (OS), tumor mutational burden (TMB) and PD-L1 expression level. Preferably, the tumor types are selected as follows: for OS, the log-rank test is used to select tumor types in which survival differs significantly between the two groups; for TMB, the Mann-Whitney U test is used to select tumor types in which mutational burden differs significantly between the two groups; for PD-L1 expression, the DESeq2 package in R is used to characterize expression differences between the two groups, and tumor types in which PD-L1 gene expression differs significantly are selected. More preferably, all p-values from these tests are FDR-corrected, and a difference is considered significant when the adjusted p-value is below 0.05.
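For the TMB indicator, step 2) amounts to one two-group test per tumor type followed by FDR correction across tumor types. The sketch below assumes the Benjamini-Hochberg procedure for the FDR correction and uses the normal approximation of the Mann-Whitney U test with a tie correction but no continuity correction; all function names are hypothetical, and a production analysis would more likely call an established statistics library.

```python
import math


def mann_whitney_u_pvalue(x, y):
    """Two-sided Mann-Whitney U test, normal approximation with tie correction."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # 1-based average rank of the tied block
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return 1.0
    z = (u1 - n1 * n2 / 2) / sigma
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_tumor_types(tmb_by_type, alpha=0.05):
    """tmb_by_type: {tumor_type: (tmb_values_group0, tmb_values_group1)}."""
    types = sorted(tmb_by_type)
    raw = [mann_whitney_u_pvalue(*tmb_by_type[t]) for t in types]
    adjusted = benjamini_hochberg(raw)
    return [t for t, q in zip(types, adjusted) if q < alpha]
```

The OS and PD-L1 indicators would follow the same pattern, with the log-rank test and the DESeq2 output in place of the Mann-Whitney test.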
In some embodiments, step 3) uses the missMethyl package to analyze, site by site, how much the methylation rate differs between the two groups of tumor samples, and defines sites with an FDR-adjusted p-value below 0.05 as significantly differentially methylated sites. Preferably, for each indicator, a methylation site is retained as a signature site significantly associated with that indicator if it is significantly differentially methylated in the tumor types associated with that indicator and its direction of difference is consistent in more than half of those tumor types; the three signature site sets are then merged into the screened signature site set. More preferably, methylation sites associated with tumor immune infiltration that have been reported in the literature are added to this set to form the final signature methylation site set.
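The per-indicator filter can be sketched as follows. The source text can be read in more than one way; this minimal sketch assumes a site must be significant in every tumor type associated with the indicator and must change in the same direction in more than half of them. It also assumes `delta` is oriented as high-infiltration group minus low-infiltration group. The input layout and the function names are hypothetical, and the site identifiers in the usage check are made up.

```python
def indicator_signature_sites(site_results, alpha=0.05):
    """Filter methylation sites for one indicator.

    site_results: {site_id: {tumor_type: (adj_p, delta)}}, restricted to the tumor
    types associated with the indicator; delta = mean methylation rate of the
    high-infiltration group minus that of the low-infiltration group (assumed).
    """
    kept = set()
    for site, per_type in site_results.items():
        if not per_type:
            continue
        # Assumed reading: significant in every associated tumor type.
        if not all(adj_p < alpha for adj_p, _ in per_type.values()):
            continue
        n_up = sum(1 for _, d in per_type.values() if d > 0)
        n_down = sum(1 for _, d in per_type.values() if d < 0)
        # Same direction of difference in more than half of the tumor types.
        if max(n_up, n_down) > len(per_type) / 2:
            kept.add(site)
    return kept


def final_signature_set(os_sites, tmb_sites, pdl1_sites, literature_sites=()):
    """Union of the three indicator-specific sets plus literature-reported sites."""
    return set(os_sites) | set(tmb_sites) | set(pdl1_sites) | set(literature_sites)
```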
In some embodiments, because cohorts of immunotherapy-treated samples are difficult to obtain, the method further comprises step 4): defining immune checkpoint inhibitor therapy efficacy indirectly by features that published reports have shown to correlate with immunotherapy response. Preferably, a case in the patient cohort is defined as responding to immune checkpoint inhibitor therapy if it meets both of the following conditions: 1) its tumor mutational burden (TMB) is above the upper quartile of all samples; 2) its published TGF-β-related immune score (TGFB score 21050467) is below the median of all samples.
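The response definition of step 4) reduces to two cohort-wide thresholds. The sketch below uses Python's `statistics` module; the interpolation used for the upper quartile is not specified in the source, so `method="inclusive"` (linear interpolation between order statistics) is an assumption, and `label_responders` is a hypothetical name.

```python
import statistics


def label_responders(tmb, tgfb_score):
    """1 = responder, 0 = non-responder, per sample.

    Responder: TMB above the cohort upper quartile AND TGF-beta score below the
    cohort median. Both thresholds are computed over all samples.
    """
    q3 = statistics.quantiles(tmb, n=4, method="inclusive")[2]  # upper quartile (assumed interpolation)
    med = statistics.median(tgfb_score)
    return [1 if t > q3 and s < med else 0 for t, s in zip(tmb, tgfb_score)]
```

With TMB values 1 to 8 the inclusive upper quartile is 6.25, so only the samples with TMB 7 and 8 can qualify, and of those only the one whose TGF-β score is below the median is labeled a responder.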
This definition divides the dataset into groups that do and do not respond to immune checkpoint inhibitors, and it allows the model to be built on tumor cohorts that have not actually received immune checkpoint inhibitor therapy but have rich multi-omics data.
The final model is trained with the methylation rates of the final signature methylation site set as independent variables and the immune checkpoint inhibitor efficacy defined above as the dependent variable. A support vector machine classifier (SVM) is used to build the efficacy evaluation model, and its hyperparameters are selected by 5-fold cross-validation. Random oversampling is applied during training to address the severe class imbalance the model faces. Predictive performance is measured by the F1 score and the Matthews correlation coefficient (MCC). Once the hyperparameters have been obtained, the original cohort is randomly split into two subsets at a ratio of 8:2; the model is trained on the first subset with these hyperparameters, and its predictive performance is computed on the second. This randomized evaluation is repeated 100 times to obtain a comprehensive assessment of the performance of the model construction method.
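The evaluation protocol can be sketched as follows. This is a minimal illustration, not the patented implementation: the SVM and its 5-fold cross-validated hyperparameter search are replaced by a hypothetical nearest-centroid stand-in (`nearest_centroid`), and `score` is any metric callback, for example F1 or MCC. The sketch shows that the 8:2 split is redrawn on every repeat and that random oversampling is applied to the training portion only, so duplicated minority samples never leak into the test portion.

```python
import random


def random_oversample(X, y, rng):
    """Duplicate randomly chosen rows of smaller classes until all classes are equally large."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def nearest_centroid(X_tr, y_tr, X_te):
    """Hypothetical stand-in for the SVM: assign each test row to the closest class mean."""
    centroids = {}
    for label in set(y_tr):
        rows = [r for r, lab in zip(X_tr, y_tr) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return [min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(r, centroids[lab])))
            for r in X_te]


def repeated_holdout(X, y, fit_predict, score, n_repeats=100, test_frac=0.2, seed=0):
    """Repeat a random 8:2 split; oversample the training part only; return one score per repeat."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    n_test = round(len(y) * test_frac)
    scores = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        X_tr, y_tr = random_oversample([X[i] for i in train_idx], [y[i] for i in train_idx], rng)
        y_pred = fit_predict(X_tr, y_tr, [X[i] for i in test_idx])
        scores.append(score([y[i] for i in test_idx], y_pred))
    return scores
```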
In some embodiments, following the above procedure, the method of the present invention was applied to a TCGA pan-tumor cohort of 6,381 samples from 22 tumor types. A final feature set of 3,143 methylation sites was identified, and on this feature set a model for evaluating the efficacy of tumor immune checkpoint inhibitor therapy was built that performs well at both the pan-tumor level and the level of individual tumor types.
The present invention is further described by the accompanying drawings and the following examples, which only illustrate specific embodiments of the invention and should not be construed as limiting its scope in any way. Unless otherwise stated, the experimental methods disclosed herein use conventional techniques in the art, and the reagents and raw materials used in the examples are commercially available.
Example 1: Establishment of the Method
The present invention provides a method that screens methylation sites characteristic of immune checkpoint inhibitor therapy efficacy from the results of immune infiltration analysis of DNA methylation profile data, and builds the efficacy evaluation model from the screened features together with known immunotherapy markers.
1. Establishment of a knowledge-based screening method for DNA methylation signature sites
DNA methylation arrays measure methylation levels at a very large number of sites at once; the number of sites far exceeds the number of samples a clinical cohort can provide, and methylation levels at different sites are often strongly collinear. These properties make most existing data-driven model selection methods inefficient. This example therefore provides a knowledge-based screening method for DNA methylation signature sites.
The method rests on the following findings from published reports: immune infiltration analysis based on DNA methylation profiles shows that, in most tumor types, tumor samples can be divided into a high-infiltration group and a low-infiltration group, and there is evidence that in some tumor types these two groups differ significantly in their response to immune checkpoint inhibitor therapy. By identifying these tumor types and analyzing the sites differentially methylated between the two groups, a set of signature methylation sites can be obtained.
Because samples from patients who received immunotherapy are difficult to obtain, the present invention assesses the efficacy of tumor immune checkpoint inhibitor therapy indirectly, using biomarkers that are relatively easy to obtain and supported by evidence. Based on findings in published reports, overall survival (OS), tumor mutational burden (TMB) and the expression level of the programmed death-ligand 1 (PD-L1) gene were selected as indicators of immune checkpoint inhibitor therapy efficacy. The DNA methylation signature sites are screened as follows:
Step one: in a given tumor cohort containing samples of multiple tumor types, immune infiltration analysis is performed on each tumor sample from its measured DNA methylation profile, the relative infiltration of each immune cell type in each sample is computed, and the samples of each tumor type are clustered by immune cell infiltration with the number of clusters set to 2, yielding two sample groups with distinct immune cell infiltration patterns in each tumor type.
Step two: using the three indirect indicators of immune checkpoint inhibitor therapy efficacy, the associated tumor types are selected as follows: for OS, the log-rank test is used to select tumor types in which survival differs significantly between the two groups; for TMB, the Mann-Whitney U test is used to select tumor types in which mutational burden differs significantly between the two groups; for PD-L1 expression, the DESeq2 package in R is used to characterize expression differences between the two groups, and tumor types in which PD-L1 expression differs significantly are selected. All p-values from these tests are FDR-corrected, and differences with an adjusted p-value below 0.05 are considered significant.
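For the OS indicator, the two-group log-rank test can be written in a few lines. This sketch is an assumption about implementation detail: it uses the standard hypergeometric variance at each distinct event time and the chi-square distribution with one degree of freedom, whose survival function is erfc(sqrt(x / 2)). Censored samples carry event = 0, and `logrank_test` is a hypothetical name.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. events: 1 = death observed, 0 = censored. Returns (chi2, p)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
           [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    obs_minus_exp = 0.0
    var = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)                  # at risk, both groups
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)     # at risk, group a
        d = sum(1 for ti, e, _ in data if ti == t and e)            # deaths at t, both groups
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # chi-square (1 df) upper tail
```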
Step three: for the tumor types selected by each of the three indicators, the missMethyl package is used to analyze how much the methylation rate at each site differs between the two groups of tumor samples, and sites with an FDR-adjusted p-value below 0.05 are defined as significantly differentially methylated sites. For each indicator, a site is retained as a signature site significantly associated with that indicator if it is significantly differentially methylated in the tumor types associated with that indicator and its direction of difference is consistent in more than half of those tumor types. The three signature site sets are merged into the screened signature site set, and finally methylation sites associated with tumor immune infiltration that have been reported in the literature are added to form the final signature methylation site set.
Step four: because cohorts of immunotherapy-treated samples are difficult to obtain, immune checkpoint inhibitor therapy efficacy is defined indirectly by features that published reports have shown to correlate with immunotherapy response. A case in the patient cohort is defined as responding to immune checkpoint inhibitor therapy if it meets both of the following conditions:
1) its tumor mutational burden (TMB) is above the upper quartile of all samples;
2) its published TGF-β-related immune score (TGFB score 21050467) is below the median of all samples.
This definition divides the dataset into groups that do and do not respond to immune checkpoint inhibitors, and it allows the model to be built on tumor cohorts that have not actually received immune checkpoint inhibitor therapy but have rich multi-omics data.
2. Construction of the model for evaluating tumor immune checkpoint inhibitor efficacy
The final model is trained with the methylation rates of the final signature methylation site set screened above as independent variables and the immune checkpoint inhibitor efficacy defined above as the dependent variable. A support vector machine classifier (SVM) is used to build the efficacy evaluation model, and its hyperparameters are selected by 5-fold cross-validation. Random oversampling is applied during training to address the severe class imbalance the model faces. Predictive performance is measured by the F1 score and the Matthews correlation coefficient (MCC). Once the hyperparameters have been obtained, the original cohort is randomly split into two subsets at a ratio of 8:2; the model is trained on the first subset with these hyperparameters, and its predictive performance is computed on the second. This randomized evaluation is repeated 100 times to obtain a comprehensive assessment of the performance of the model construction method.
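Both metrics come from the 2x2 confusion matrix of true and predicted labels. The sketch below computes them from their standard definitions, F1 = 2TP / (2TP + FP + FN) and MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), treating responders (label 1) as the positive class; that choice of positive class is an assumption, and returning 0 for an empty denominator is a common convention rather than something the source specifies.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(y_true, y_pred, positive=1):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_corrcoef(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For true labels [1, 1, 0, 0, 0, 1] and predictions [1, 0, 0, 0, 1, 1] (TP = 2, FP = 1, FN = 1, TN = 2), F1 is 2/3 and MCC is 1/3. Because MCC also uses the true negatives, it stays informative when responders are a small minority, which fits the class imbalance described above.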
Example 2: Validation on Clinical Samples
The feature screening and model construction methods described above were applied to a TCGA pan-tumor cohort of 6,381 patients across 22 tumor types, and their advantages were demonstrated on this cohort (Table 1 lists all tumor types and the number of samples of each).
Table 1 (not reproduced; images PCTCN2021076879-appb-000001 and PCTCN2021076879-appb-000002): tumor types in the cohort and the number of samples of each tumor type.
The following data for this cohort were obtained from public sources: methylation rates (β values) at about 480,000 sites measured with the Illumina Infinium HumanMethylation450 BeadChip, gene expression profiles measured by RNA-seq, somatic mutation profiles obtained by genome sequencing, and survival data. The tumor mutational burden and TGF-β score of each sample were calculated according to commonly used definitions.
On this cohort, the feature selection algorithm yielded a final feature set of 2,083 DNA methylation sites; Table 2 lists examples of the methylation sites in the final feature set together with information from the screening process:
Table 2 (not reproduced; images PCTCN2021076879-appb-000003 and PCTCN2021076879-appb-000004): example methylation sites of the final feature set and information from the screening process.
This feature set efficiently distinguishes samples that respond to immune checkpoint inhibitor therapy from those that do not, as shown by the following:
a) The methylation rates at the sites of the final feature set differ significantly between responding and non-responding samples (Figure 1A). Moreover, the signature sites were compared with random negative controls, each consisting of the same number of methylation sites drawn at random from all measured sites. When the samples are clustered without supervision into two groups by the methylation rates of the signature sites, the resulting grouping is more closely related to response to immune checkpoint inhibitor therapy than the groupings obtained from the controls. Over 100 comparisons, clustering on the final feature set outperformed 96 of the control feature sets when the association was measured by F1, and 92 when it was measured by MCC (Figure 1B).
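The comparison against random control feature sets can be sketched generically. In this sketch `score_fn` is a hypothetical callback that would, for example, cluster the samples into two groups on the given sites and return the F1 or MCC against the response labels. The add-one empirical p-value is a common convention assumed here rather than taken from the source, and the site identifiers in the usage check are made up.

```python
import random


def compare_with_random_controls(all_sites, signature, score_fn, n_controls=100, seed=0):
    """Score the signature set against n_controls random site sets of the same size.

    all_sites: list of every measured site ID; signature: list of signature site IDs.
    Returns (observed_score, number_of_controls_beaten, empirical_p).
    """
    rng = random.Random(seed)
    observed = score_fn(list(signature))
    controls = [score_fn(rng.sample(all_sites, len(signature))) for _ in range(n_controls)]
    beaten = sum(1 for c in controls if observed > c)
    # Add-one empirical p-value: how often a random set scores at least as well.
    p = (1 + sum(1 for c in controls if c >= observed)) / (n_controls + 1)
    return observed, beaten, p
```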
b) Compared with negative control feature sets chosen at random with the same number of methylation sites, the final feature set contains more sites whose methylation levels differ between responding and non-responding samples. Even when tumor type is ignored and the methylation rates of all responding and non-responding samples are compared directly, 75.56% of the methylation sites in the final feature set are reported as differentially methylated, compared with at most 70.28% in the control feature sets (Figure 1C).
c) In gene function enrichment analyses based on Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG), the genes harboring the methylation sites of the final feature set are enriched in many immune- and tumor-related terms, for example "immune response", "immune system process" and "regulation of immune system process" in GO, and "Th17 cell differentiation" in KEGG (Figure 1D). These genes are also enriched in significantly more terms than the genes of the control feature sets: in the GO- and KEGG-based analyses, the genes of the final feature set were enriched in 347 and 9 terms respectively, whereas the genes of the 100 random control feature sets were enriched in at most 95 and 8 terms respectively.
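The source does not name the enrichment tool. Over-representation analysis of this kind is conventionally based on a one-sided hypergeometric test per GO or KEGG term, sketched below under that assumption; p-values across terms would then be FDR-corrected. A plain hypergeometric test ignores the fact that genes covered by more array probes are more likely to be selected, a bias that methylation-specific tools are designed to correct.

```python
from math import comb


def hypergeom_enrichment_p(k, n_draw, K, N):
    """One-sided P(X >= k), X ~ Hypergeometric(N, K, n_draw).

    N: genes in the background; K: background genes annotated to the term;
    n_draw: genes in the query list; k: query genes annotated to the term.
    """
    upper = min(K, n_draw)
    hits = sum(comb(K, i) * comb(N - K, n_draw - i) for i in range(k, upper + 1))
    return hits / comb(N, n_draw)
```

For example, drawing 5 genes from a background of 10, of which 5 carry the annotation, and finding all 5 annotated gives p = 1/252.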
A prediction model built on the methylation levels of the sites in the selected final feature set distinguishes responding from non-responding samples at the pan-tumor level with high accuracy and sensitivity, as the following points demonstrate:
a) On this cohort, a series of common machine learning models was tested with the final feature set above, including L1-regularized logistic regression (LR), a support vector machine classifier (SVM), a random forest classifier (RF) and a k-nearest neighbor classifier (KNN). Under the performance evaluation strategy described above, the SVM model clearly outperformed the other models whether performance was measured by F1 or by MCC (Figure 2A). In addition, all machine learning models performed significantly better than a background model that, within each tumor type, labels as responders the cluster with the higher mean tumor mutational burden (Figure 2A).
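The background model in point a) needs no training. The sketch below follows the description directly: within each tumor type it predicts "responder" for every sample of the cluster with the higher mean TMB. The input layout and the name `tmb_background_predictions` are hypothetical.

```python
from collections import defaultdict


def tmb_background_predictions(samples):
    """samples: list of (tumor_type, cluster_label, tmb).

    Returns one prediction per sample: 1 if the sample's cluster has the higher
    mean TMB within its tumor type, else 0.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (tumor_type, cluster) -> [TMB sum, count]
    for tumor_type, cluster, tmb in samples:
        acc = sums[(tumor_type, cluster)]
        acc[0] += tmb
        acc[1] += 1
    mean_tmb = {key: s / n for key, (s, n) in sums.items()}
    best = {}  # tumor_type -> cluster with the highest mean TMB
    for (tumor_type, cluster), m in mean_tmb.items():
        if tumor_type not in best or m > mean_tmb[(tumor_type, best[tumor_type])]:
            best[tumor_type] = cluster
    return [1 if cluster == best[tumor_type] else 0 for tumor_type, cluster, _ in samples]
```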
b) The predictive power of the SVM model above is significantly higher than that of models built on random control feature sets. This example compared the model built on the final feature set with 100 models built on different random control feature sets. Under most SVM hyperparameter settings, the model based on the final feature set consistently outperformed the models based on random control feature sets (Figure 2B).
c) The predictive power of the model based on the final feature set does not come from overfitting. To verify this, the treatment-efficacy labels of the samples were randomly permuted, and models predicting the permuted labels were built on the same final feature set. In all 100 such randomized evaluations, the permuted-label models performed markedly worse than the original model, only slightly better than random guessing. This indicates that, in this cohort, the final feature set is not so large as to cause overfitting (Figure 2C).
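The label-permutation control in point c) can be sketched as a loop around any evaluation routine. Here `evaluate(labels)` is a hypothetical callback that would train and score the model on the fixed feature matrix with the given labels. The add-one empirical p-value is an assumed convention that summarizes how often a permuted run matches the real one.

```python
import random


def label_permutation_scores(y, evaluate, n_permutations=100, seed=0):
    """Re-run an evaluation with randomly permuted labels.

    Returns (score_on_true_labels, permuted_scores, empirical_p).
    """
    rng = random.Random(seed)
    observed = evaluate(list(y))
    permuted_scores = []
    for _ in range(n_permutations):
        shuffled = list(y)  # copy so the caller's labels are left untouched
        rng.shuffle(shuffled)
        permuted_scores.append(evaluate(shuffled))
    p = (1 + sum(1 for s in permuted_scores if s >= observed)) / (n_permutations + 1)
    return observed, permuted_scores, p
```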
d) Many variables associated with immune checkpoint inhibitor therapy efficacy differ between the samples the model predicts as responders and those it predicts as non-responders. In all 100 randomized performance evaluations, samples predicted to respond had a higher tumor mutational burden than samples predicted not to respond (Figure 2D). In 99 of these evaluations, samples predicted to respond had a higher mean PD-L1 expression than those predicted not to respond (Figure 2E), and in 66 of them this difference in PD-L1 expression was significant (FDR-adjusted significance threshold of 0.01; Figure 2E).
At the pan-tumor level, the prediction accuracy of the DNA methylation-based model for evaluating the effectiveness of immune checkpoint inhibitor therapy is similar to that of a gene expression profile-based model, and the two models are complementary. This is demonstrated by the following:
a) The DNA methylation-based model has predictive performance similar to that of the gene expression profile-based model. Following published reports, the gene expression profile-based model was built with an SVM on the expression levels (log2(FPKM+1)) of tumor immunity-related genes. Unlike in the published reports, random oversampling was also used during training to address class imbalance. The resulting model performed markedly better than the published model (mean MCC of 0.463 across 100 randomization evaluations, versus 0.296 in the published report). Across the 100 randomization evaluations, when predictive performance was measured by the F1 score, the DNA methylation-based model was significantly better than the gene expression profile-based model (Fig. 3A); when measured by MCC, there was no significant difference between the two (Fig. 3B); and when measured by AUC (area under the receiver operating characteristic curve), the gene expression profile-based model was significantly better than the DNA methylation-based model (Fig. 3C). Taking the three metrics together, the two models can be considered to perform similarly in predicting the effectiveness of immune checkpoint inhibitor therapy for tumor samples. The same conclusion can be drawn from the receiver operating characteristic (ROC) curves of the two models (Fig. 3D).
b) The DNA methylation-based model and the gene expression profile-based model are complementary. This conclusion is supported by the following:
a) The genes harboring the methylation sites selected by the DNA methylation-based model differ substantially from the genes used by the gene expression profile-based model. Of the 1660 genes involved in the former, only 384 are shared with the latter (which contains 2614 genes in total). The functional terms enriched in Gene Ontology enrichment analysis also differ markedly between the two gene sets (Fig. 3E).
b) In this cohort, a model based on both DNA methylation profiles and gene expression profiles had higher predictive performance than models based on DNA methylation levels or gene expression profiles alone. To make DNA methylation rates comparable to gene expression levels, all gene expression levels (log2(FPKM+1)) of each sample were scaled to the interval [0,1] by range (min-max) normalization, and a prediction model was then built, using exactly the same strategy as before, on the feature set combining DNA methylation levels and gene expression levels. Across 100 randomization evaluations, regardless of the performance metric used, the combined-feature model predicted the effectiveness of tumor immune checkpoint inhibitor therapy better than the models considering only DNA methylation levels or only gene expression levels (Fig. 3A-C). The same conclusion can be drawn from the ROC curves of the three models (Fig. 3D).
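The per-sample range normalization used to merge the two feature types can be sketched as follows. This is a minimal illustration, assuming raw FPKM values as input, methylation rates already in [0, 1], and methylation features placed before expression features; the function names are hypothetical, and mapping a constant expression profile to zeros is a choice made here, not something the text specifies.

```python
import math


def log2_fpkm(fpkm_values):
    """Transform raw FPKM values to log2(FPKM + 1)."""
    return [math.log2(v + 1.0) for v in fpkm_values]


def range_normalize(values):
    """Min-max scale one sample's values to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant profile; mapping to 0.0 is an arbitrary choice made here
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def combined_features(beta_values, fpkm_values):
    """Concatenate methylation rates (already in [0, 1]) with normalized expression."""
    return list(beta_values) + range_normalize(log2_fpkm(fpkm_values))


# One sample: two methylation sites and three genes (FPKM 0, 3 and 15).
print(combined_features([0.1, 0.8], [0.0, 3.0, 15.0]))  # [0.1, 0.8, 0.0, 0.5, 1.0]
```

Because scaling is done within each sample (across its genes), every expression feature lands on the same [0, 1] scale as a methylation rate before the SVM sees it.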
In summary, the DNA methylation-based model for evaluating the effectiveness of tumor immune checkpoint inhibitor therapy, constructed on the TCGA pan-tumor cohort by the method of this invention, achieves prediction accuracy at the pan-tumor level similar to that of the published model built on gene expression profiles.
The model for evaluating the effectiveness of immune checkpoint inhibitor therapy, constructed on the TCGA cohort by the method of this invention, also achieves high prediction accuracy at the level of individual tumor types. This was demonstrated by examining the model's prediction accuracy on the tumor types (10 in total) in which more than 5% of TCGA samples were labeled as responding to immune checkpoint inhibitor therapy.
a) At the tumor-type level, the DNA methylation-based model correctly reflects differences in the response rate to immune checkpoint inhibitor therapy across tumor types. To show this, in each of the 100 randomization evaluations described above, the model-predicted response rate in each tumor type was computed and compared with the corresponding true response rate. Although the predicted response rates were generally higher than the true values because of a relatively high false-positive rate, the two followed highly consistent trends (mean Spearman correlation coefficient of 0.73 across the 100 randomization evaluations; Fig. 4A).
b) Compared with prediction models built on randomly selected methylation sites equal in number to the final feature set, the prediction model built on the final feature set achieves higher prediction accuracy in the tumor types that published reports have identified as closely associated with immune escape and likely to benefit from immune checkpoint inhibitor therapy. Whether prediction accuracy was measured by the F1 score or by MCC, across 100 randomization evaluations the model based on the final feature set consistently achieved significantly higher accuracy on 5 of the 10 tumor types above (differences in accuracy assessed with a paired-sample t-test at a significance threshold of 0.1). On none of the 10 tumor types was the model based on the final feature set significantly less accurate than the models based on random control feature sets (Fig. 4B).
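The tumor-type-level comparison in item a) of the preceding list (predicted versus true response rate per tumor type, summarized by a Spearman correlation) can be sketched as follows. This sketch assumes binary labels with 1 meaning "responder" and uses toy data; the function names are hypothetical. Spearman's coefficient is computed as the Pearson correlation of average ranks, so a model that inflates every response rate but preserves their ordering, as described in the text, still scores high.

```python
def response_rates(tumor_types, labels):
    """Fraction of samples labeled 1 (responder) within each tumor type."""
    totals, positives = {}, {}
    for ttype, label in zip(tumor_types, labels):
        totals[ttype] = totals.get(ttype, 0) + 1
        positives[ttype] = positives.get(ttype, 0) + label
    return {t: positives[t] / totals[t] for t in totals}


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Toy example: predictions inflate response rates but largely keep their ordering.
types = ["A", "A", "B", "B", "C", "C", "C", "C"]
true_labels = [1, 0, 0, 0, 1, 1, 0, 0]
pred_labels = [1, 1, 1, 0, 1, 1, 1, 0]
true_rates = response_rates(types, true_labels)
pred_rates = response_rates(types, pred_labels)
common = sorted(true_rates)
print(spearman([true_rates[t] for t in common], [pred_rates[t] for t in common]))  # about 0.866
```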
The DNA methylation-based model predicted treatment effectiveness in a tumor cohort that actually received immune checkpoint inhibitor therapy. For this, a published cohort of 58 patients with non-small cell lung cancer (NSCLC) treated with immune checkpoint inhibitors (the SMC cohort) was examined. The effectiveness of each patient's treatment was evaluated with the efficacy evaluation model trained on all samples of the TCGA cohort and compared with the actual outcomes. The predictions reached an F1 score of 0.42, an MCC of 0.21 and an AUC of 0.70. The model's ROC curve also indicated good prediction accuracy and sensitivity (Fig. 4C). Notably, this cohort is not part of the TCGA cohort, and NSCLC is not among the 22 tumor types used to build the model; the fact that the model retains high predictive performance on a tumor cohort entirely independent of TCGA further supports the validity of the model construction method and of the models built with it.
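The three metrics reported for the SMC cohort (F1, MCC and AUC) follow standard definitions. The stdlib-only sketch below assumes binary labels with 1 meaning "responder", hard 0/1 predictions for F1 and MCC, and a continuous model score (for example an SVM decision value, an assumption here) for AUC. The function names are hypothetical; returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
import math


def confusion(y_true, y_pred):
    """Counts of true positives, true negatives, false positives, false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN)."""
    tp, _, fp, fn = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2).

    Assumes at least one positive and one negative sample.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.1]
print(f1_score(y_true, y_pred), mcc(y_true, y_pred), auc(y_true, scores))
```

MCC uses all four confusion-matrix cells, which is why it is informative under the class imbalance described in the text, whereas F1 ignores true negatives.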
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

  1. A method for screening DNA methylation characteristic sites, comprising the following steps:
    Step 1): in a given tumor cohort containing samples of multiple tumor types, performing immune infiltration analysis on each tumor sample based on measured DNA methylation profile data, calculating the relative infiltration level of each type of immune cell in each sample, and performing cluster analysis on the immune cell infiltration levels of the samples in each tumor-type cohort;
    preferably, the number of clusters is set to 2, yielding, for each cancer type, sample cohorts with two types of immune cell infiltration pattern;
    Step 2): selecting, according to indirect indicators of the effectiveness of immune checkpoint inhibitor therapy, the tumor types significantly associated with those indicators;
    Step 3): for the tumor types selected by the above indicators, analyzing the degree of difference in methylation rate at each methylation site between the sample cohorts of the two immune cell infiltration patterns, and constructing a set of characteristic methylation sites.
  2. The method for screening DNA methylation characteristic sites according to claim 1, wherein:
    in step 2), the tumor types significantly associated with three indirect indicators of the effectiveness of immune checkpoint inhibitor therapy are selected, the three indicators being prognostic overall survival (OS), tumor mutational burden (TMB) and PD-L1 expression level;
    preferably, the significantly associated tumor types are selected as follows: for the OS indicator, a log-rank test is used to select tumor types in which OS differs significantly between the two sample groups; for the TMB indicator, a Mann-Whitney U test is used to select tumor types in which mutational burden differs significantly between the two sample groups; for the PD-L1 expression indicator, the R package DESeq2 is used to characterize expression differences between the two sample groups, and tumor types in which PD-L1 expression differs significantly are selected;
    more preferably, the p-values of all the above tests are FDR-corrected, and a difference is considered significant when the adjusted p-value (adj.p-value) is less than 0.05.
  3. The method for screening DNA methylation characteristic sites according to any one of claims 1-2, wherein:
    in step 3), the missMethyl software package is used to analyze the degree of difference in methylation rate at each methylation site between the two types of tumor samples, and sites with an FDR-corrected adj.p-value below 0.05 are defined as significantly differentially methylated sites;
    preferably, in the screening results for each indicator, methylation sites that differ significantly in every tumor type associated with that indicator, and whose direction of methylation difference is consistent in more than half of those tumor types, are retained and defined as characteristic methylation sites significantly associated with that indicator; the three resulting sets of characteristic methylation sites are merged into the final set of screened characteristic methylation sites;
    more preferably, methylation sites reported in published studies as associated with tumor immune infiltration are added to the set of characteristic methylation sites to form the final set of characteristic methylation sites.
  4. The method for screening DNA methylation characteristic sites according to claim 1, wherein the method further comprises:
    Step 4): indirectly defining the effectiveness of immune checkpoint inhibitor therapy using features that published studies have shown to be associated with immunotherapy efficacy;
    preferably, a case in the patient cohort is defined as responding to immune checkpoint inhibitor therapy when it satisfies both of the following conditions: 1) its tumor mutational burden (TMB) is above the upper quartile of all samples; and 2) its published TGF-β-related immune score (TGFB score 21050467) is below the median of all samples; by this definition, the data set is divided into groups that respond and do not respond to immune checkpoint inhibitors.
  5. A method for constructing a model for evaluating the effectiveness of tumor immune checkpoint inhibitor therapy based on DNA methylation profiles, wherein the method comprises:
    performing model training with the methylation rates of the final set of characteristic methylation sites obtained by the method of any one of claims 1-4 as the independent variables, and the effectiveness of immune checkpoint inhibitor therapy as defined in the method of any one of claims 1-4 as the dependent variable.
  6. The method for constructing a model for evaluating the effectiveness of tumor immune checkpoint inhibitor therapy based on DNA methylation profiles according to claim 5, wherein the model training specifically comprises: constructing the model for evaluating the effectiveness of immune checkpoint inhibitor therapy with a support vector machine classifier (SVM), and selecting the model's hyperparameters by cross-validation; preferably, using random oversampling during model training to address the severe class imbalance faced by the model, and measuring the model's predictive performance by the F1 score (F1) or the Matthews correlation coefficient (MCC); more preferably, after the model hyperparameters have been obtained by training, randomly dividing the original cohort into two subsets, training with the obtained hyperparameters on the first subset, and computing predictive performance on the second subset.
  7. A system for screening DNA methylation characteristic sites, or a system for constructing a model for evaluating the effectiveness of tumor immune checkpoint inhibitor therapy based on DNA methylation profiles, comprising the following modules:
    1) an immune infiltration analysis module, which, in a given tumor cohort containing samples of multiple tumor types, performs immune infiltration analysis on each tumor sample based on measured DNA methylation profile data, calculates the relative infiltration level of each type of immune cell in each sample, and performs cluster analysis on the immune cell infiltration levels of the samples in each tumor-type cohort, with the number of clusters set to 2, yielding, for each cancer type, sample cohorts with two types of immune cell infiltration pattern;
    2) a tumor type screening module, which selects the tumor types significantly associated with indirect indicators of the effectiveness of immune checkpoint inhibitor therapy;
    3) a characteristic methylation site construction module, which, for the tumor types selected by the above indicators, analyzes the degree of difference in methylation rate at each methylation site between the two types of tumor samples and constructs a set of characteristic methylation sites.
  8. An apparatus, comprising: at least one memory for storing a program; and at least one processor for loading the program to perform the method according to any one of claims 1-6.
  9. A storage medium storing processor-executable instructions, wherein the processor-executable instructions, when executed by a processor, implement the method according to any one of claims 1-6.
  10. Use of the apparatus according to claim 8 or the storage medium according to claim 9 in constructing a model for evaluating the effectiveness of tumor immune checkpoint inhibitor therapy.
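The site-selection rule of claims 2 and 3 (FDR-corrected significance in every associated tumor type, plus a consistent direction of difference in more than half of those types) can be sketched as follows. This is a minimal sketch under stated assumptions: per-site p-values and methylation differences are taken as already computed (for example by missMethyl, which is not reproduced here), Benjamini-Hochberg is assumed as the FDR procedure and is applied within each tumor type, and the input layout and function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of p-value i in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_sites(results, alpha=0.05):
    """Select characteristic sites for ONE indicator (OS, TMB or PD-L1).

    results: {tumor_type: {site: (p_value, delta)}} for the tumor types linked to
    the indicator; delta is the methylation difference between the two
    infiltration clusters, and its sign gives the direction of the difference.
    """
    adjusted = {}
    for ttype, stats in results.items():
        sites = list(stats)
        adj = benjamini_hochberg([stats[s][0] for s in sites])  # FDR within tumor type
        adjusted[ttype] = dict(zip(sites, adj))
    tumor_types = list(results)
    shared = set.intersection(*(set(results[t]) for t in tumor_types))
    selected = set()
    for site in shared:
        # Rule 1: significant after FDR correction in every associated tumor type.
        if not all(adjusted[t][site] < alpha for t in tumor_types):
            continue
        # Rule 2: the majority direction covers more than half of those tumor types.
        ups = sum(1 for t in tumor_types if results[t][site][1] > 0)
        downs = sum(1 for t in tumor_types if results[t][site][1] < 0)
        if max(ups, downs) > len(tumor_types) / 2:
            selected.add(site)
    return selected


toy = {
    "T1": {"s1": (0.001, 0.2), "s2": (0.001, -0.1), "s3": (0.9, 0.3), "s4": (0.001, 0.1)},
    "T2": {"s1": (0.002, 0.1), "s2": (0.001, 0.1), "s3": (0.001, 0.3), "s4": (0.001, -0.1)},
    "T3": {"s1": (0.001, -0.1), "s2": (0.003, 0.2), "s3": (0.001, 0.3), "s4": (0.001, 0.0)},
}
print(sorted(select_sites(toy)))  # ['s1', 's2']
```

Following claim 3, the final set would then be the union of the three per-indicator sets, e.g. `select_sites(os_results) | select_sites(tmb_results) | select_sites(pdl1_results)` with hypothetical inputs, optionally extended with literature-reported immune-infiltration sites.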
PCT/CN2021/076879 2021-01-04 2021-02-19 Construction method for tumor immune checkpoint inhibitor therapy effectiveness evaluation model based on dna methylation spectrum WO2022141775A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110005009.1A CN112735513B (en) 2021-01-04 2021-01-04 Construction method of tumor immune checkpoint inhibitor treatment effectiveness evaluation model based on DNA methylation spectrum
CN202110005009.1 2021-01-04

Publications (1)

Publication Number Publication Date
WO2022141775A1 2022-07-07

Family

ID=75590038

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/076879 WO2022141775A1 (en) 2021-01-04 2021-02-19 Construction method for tumor immune checkpoint inhibitor therapy effectiveness evaluation model based on dna methylation spectrum

Country Status (2)

Country Link
CN (1) CN112735513B (en)
WO (1) WO2022141775A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115831216A (en) * 2022-08-26 2023-03-21 中山大学孙逸仙纪念医院 Tumor immune therapy efficacy prediction model based on tumor immune microenvironment and construction method thereof
CN116153424A (en) * 2023-04-18 2023-05-23 北京概普生物科技有限公司 Monogenic pan-cancer prognosis analysis system and analysis method
CN116206682A (en) * 2023-03-08 2023-06-02 南方医科大学南方医院 Tumor typing method for remarkably changing co-expression gene module based on anti-vascular treatment
CN117198406A (en) * 2023-09-21 2023-12-08 亦康(北京)医药科技有限公司 Feature screening method, system, electronic equipment and medium

Families Citing this family (3)

Publication number Priority date Publication date Assignee Title
CN113462776B * 2021-06-25 2023-03-28 复旦大学附属肿瘤医院 Application of an m6A modification-related combined genome in predicting immunotherapy efficacy in patients with renal clear cell carcinoma
CN113436741B (en) * 2021-07-16 2023-02-28 四川大学华西医院 Lung cancer recurrence prediction method based on tissue specific enhancer region DNA methylation
CN114373550B (en) * 2022-03-21 2022-06-21 普瑞基准科技(北京)有限公司 Medicine IC50 deep learning model prediction method based on molecular structure and gene expression

Citations (6)

Publication number Priority date Publication date Assignee Title
CN108064170A (en) * 2015-05-29 2018-05-22 豪夫迈·罗氏有限公司 PD-L1 promoter methylations in cancer
CN108135985A * 2015-09-10 2018-06-08 癌症研究技术有限公司 "Immune checkpoint intervention" in cancer
WO2019012105A1 (en) * 2017-07-14 2019-01-17 Université Libre de Bruxelles Method for predicting responsiveness to immunotherapy
CN109355381A * 2018-09-14 2019-02-19 深圳市太空科技南方研究院 Biomarker and method for predicting the efficacy of PD1/L1 inhibitors
CN109715829A * 2016-05-16 2019-05-03 迪莫·迪特里希 Method for assessing prognosis and predicting the response of patients with malignant disease to immunotherapy
CN111630184A (en) * 2017-11-05 2020-09-04 迪莫·迪特里希 Method for determining the response of a malignant disease to immunotherapy

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
WO2016100975A1 * 2014-12-19 2016-06-23 Massachusetts Institute Of Technology Molecular biomarkers for cancer immunotherapy
CN107190076B (en) * 2017-06-28 2019-12-27 中国科学院苏州生物医学工程技术研究所 Human tumor-related methylation site and screening method and application thereof
WO2019089740A1 (en) * 2017-11-03 2019-05-09 Dana-Farber Cancer Institute, Inc. Biomarkers of clinical response and benefit to immune checkpoint inhibitor therapy
CN109859796B (en) * 2019-01-04 2023-04-25 浙江大学 Dimension reduction analysis method for DNA methylation spectrum of gastric cancer
CN111793134A (en) * 2019-04-08 2020-10-20 华中科技大学 Medicine, tumor vaccine and inhibitor for cancer treatment
US20220326216A1 (en) * 2019-05-02 2022-10-13 St. Jude Children's Research Hospital, Inc. T cell gene expression analysis for use in t cell therapies
CN111564177B (en) * 2020-05-22 2023-03-31 四川大学华西医院 Construction method of early non-small cell lung cancer recurrence model based on DNA methylation

Non-Patent Citations (4)

Title
ANKUR CHAKRAVARTHY, ET. AL.: "Pan-cancer deconvolution of tumour composition using DNA methylation", NATURE COMMUNICATIONS, VOL. 9, 1 December 2018 (2018-12-01), pages 1 - 13, XP055703264, Retrieved from the Internet <URL:http://www.nature.com/articles/s41467-018-05570-1.pdf> [retrieved on 20200610], DOI: 10.1038/s41467-018-05570-1 *
CRISTESCU RAZVAN, MOGG ROBIN, AYERS MARK, ALBRIGHT ANDREW, MURPHY ERIN, YEARLEY JENNIFER, SHER XINWEI, LIU XIAO QIAO, LU HONGCHAO,: "Pan-tumor genomic biomarkers for PD-1 checkpoint blockade–based immunotherapy", SCIENCE, AMERICAN ASSOCIATION FOR THE ADVANCEMENT OF SCIENCE, US, vol. 362, no. 6411, 12 October 2018 (2018-10-12), US , pages 1 - 10, XP055948101, ISSN: 0036-8075, DOI: 10.1126/science.aar3593 *
LIANG CONG, YU XIAOQING, LI BO, CHEN Y. ANN, CONEJO-GARCIA JOSE R., WANG XUEFENG: "DNA methylation-based immune cell deconvolution in solid tumors", BIORXIV, 26 April 2019 (2019-04-26), pages 1 - 21, XP055948093, Retrieved from the Internet <URL:https://www.biorxiv.org/content/10.1101/619965v1.full.pdf> [retrieved on 20220802], DOI: 10.1101/619965 *
ZHANG BAIHONG, YUE HONGYUN: "Predictors of response to tumor immunotherapy", AIZHENG JINZHAN = ONCOLOGY PROGRESS, ZHONGGUO XIEHE YIKE DAXUE CHUBANSHE, CN, vol. 17, no. 1, 1 January 2019 (2019-01-01), CN , pages 1 - 4,55, XP055948095, ISSN: 1672-1535, DOI: 10.11877/j.issn.1672-1535.2019.17.01.01 *

Cited By (7)

Publication number Priority date Publication date Assignee Title
CN115831216A (en) * 2022-08-26 2023-03-21 中山大学孙逸仙纪念医院 Tumor immune therapy efficacy prediction model based on tumor immune microenvironment and construction method thereof
CN115831216B (en) * 2022-08-26 2023-08-25 中山大学孙逸仙纪念医院 Tumor immune treatment efficacy prediction model based on tumor immune microenvironment and construction method thereof
CN116206682A (en) * 2023-03-08 2023-06-02 南方医科大学南方医院 Tumor typing method for remarkably changing co-expression gene module based on anti-vascular treatment
CN116206682B (en) * 2023-03-08 2023-10-24 南方医科大学南方医院 Tumor typing method for remarkably changing co-expression gene module based on anti-vascular treatment
CN116153424A (en) * 2023-04-18 2023-05-23 北京概普生物科技有限公司 Monogenic pan-cancer prognosis analysis system and analysis method
CN117198406A (en) * 2023-09-21 2023-12-08 亦康(北京)医药科技有限公司 Feature screening method, system, electronic equipment and medium
CN117198406B (en) * 2023-09-21 2024-06-11 亦康(北京)医药科技有限公司 Feature screening method, system, electronic equipment and medium

Also Published As

Publication number Publication date
CN112735513B (en) 2021-11-19
CN112735513A (en) 2021-04-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21912545

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21912545

Country of ref document: EP

Kind code of ref document: A1