CN112820403A - Deep learning method for predicting prognosis risk of cancer patients based on multi-omics data - Google Patents

Deep learning method for predicting prognosis risk of cancer patients based on multi-omics data

Info

Publication number
CN112820403A
Authority
CN
China
Prior art keywords
risk
data
function
network
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110210941.8A
Other languages
Chinese (zh)
Other versions
CN112820403B (en)
Inventor
杨跃东
柴华
张仲岳
周翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN202110210941.8A
Publication of CN112820403A
Application granted
Publication of CN112820403B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/80 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a deep learning method for predicting the prognosis risk of a cancer patient based on multi-omics data, comprising the following steps: S1: acquiring clinical data Y of target cancer patients and the corresponding multi-omics expression data X from an existing public data set; S2: constructing a deep neural network; S3: passing the cancer multi-omics data X_p and patient clinical information Y_p of the existing public data set through the constructed deep neural network and updating the weights θ to obtain a pre-trained network N_p based on the public data set; S4: retraining the network N_p until the number of training epochs reaches the upper limit, thereby obtaining a risk prediction network N_f; S5: using the XGBoost algorithm to select the top n gene features of the target cancer patients ranked by importance coefficient, refining the risk prediction network N_f to obtain the final risk prediction model. The invention improves the robustness of the prediction model and predicts the prognosis risk of cancer patients more accurately by using multi-omics data.

Description

Deep learning method for predicting prognosis risk of cancer patients based on multi-omics data
Technical Field
The invention relates to the technical field of survival analysis of cancer patients, and in particular to a deep learning method for predicting the prognosis risk of cancer patients based on multi-omics data.
Background
The high incidence of cancer has driven the development of medical assistance techniques in recent years. Prognosis risk analysis is a key technique among them, because it helps clinicians choose between treatment regimens according to each patient's potential prognosis risk.
Most methods for predicting cancer prognosis analyze expression data from a single omics layer, such as mRNA expression data, methylation data or miRNA data. However, a patient's prognosis is jointly regulated by molecules at several different levels, and these molecules show strong complementary effects and interactions, so an analysis of single-omics data can only provide one-sided information. In addition, fusing data from different omics and different modalities can mitigate the excessive noise sensitivity of single-omics methods through error cancellation. For these reasons, fusing multiple kinds of data for cancer analysis has become a powerful tool in recent years.
The biggest difficulty in fusing multi-omics data is how to reduce the dimensionality of high-dimensional omics data effectively when only a small sample of cancer data is available. In 2018, Li Xin et al. (Li Xin, Weigong, Lu Zhang Yan, et al. Construction of a lung adenocarcinoma prognosis-related risk prediction model based on multi-omics data [J]. Journal of Nanjing Medical University (Natural Science Edition), 2018, 38(12): 1820-1825) used a traditional L1-regularized Cox method to build a prognosis-related risk prediction model for lung adenocarcinoma by integrating clinical, genomic and transcriptomic information. That method is not robust enough, performs poorly on high-dimensional, small-sample cancer data, and has low prediction accuracy. Researchers then applied deep learning to this field, using an autoencoder to extract features from high-dimensional multi-omics data of liver cancer (including mRNA, miRNA and methylation data) and then using the compressed features to identify different clinical subtypes of patients. Building on this, researchers fused copy number variation data to distinguish two prognostic subtypes of high-risk neuroblastoma. Several variants based on other autoencoder methods have also been derived. However, the biggest problem with this framework is that it splits feature reduction and patient risk prediction into two separate models, so the method is not robust enough. In 2019, researchers combined the loss function of a proportional hazards model with a deep neural network to predict a patient's survival risk directly from multi-omics data.
The problem with that method is that the deep neural network directly optimizes only the risk prediction loss, so the features reconstructed after multi-layer compression in the network are not constrained to retain the spatial distribution of the original features, which limits the performance of the method.
Disclosure of Invention
The invention provides a deep learning method for predicting the prognosis risk of a cancer patient based on multi-omics data, aiming to overcome two defects of the prior art: low accuracy of prognosis risk prediction and small target data sets.
The primary objective of the present invention is to solve the above technical problems, and the technical solution of the present invention is as follows:
A deep learning method for predicting the prognosis risk of a cancer patient based on multi-omics data, comprising the following steps:
S1: acquiring clinical data Y of target cancer patients and the corresponding multi-omics expression data X from an existing public data set;
S2: constructing a deep neural network;
S3: passing the cancer multi-omics data X_p and patient clinical information Y_p of the existing public data set through the constructed deep neural network and updating the weights θ to obtain a pre-trained network N_p based on the public data set;
S4: retraining the network N_p on the clinical data Y of the target cancer patients and their multi-omics expression data X until the number of training epochs reaches the upper limit, thereby obtaining a risk prediction network N_f;
S5: using the XGBoost algorithm to select the top n gene features of the target cancer patients ranked by importance coefficient, refining the risk prediction network N_f to obtain the final risk prediction model.
Further, the specific process of constructing the deep neural network in step S2 is as follows:
S201: encoding the multi-omics expression data X to generate the compressed features z = E(X), decoding the compressed features to generate new features X', and calculating the data recovery loss l_r after decoding;
S202: defining a survival function representing the probability that the cancer patient survives beyond a given time t;
S203: constructing a proportional hazards function from the survival function;
S204: constructing a maximum likelihood function from the proportional hazards function, and obtaining a preliminary prognosis risk prediction loss function from the maximum likelihood function;
S205: adding the data recovery loss l_r to the preliminary prognosis risk prediction loss function to construct the final loss function.
Further, the data recovery loss function is expressed as:
Figure BDA0002952268420000021
Further, the survival function is expressed as: S(t) = Pr(T > t),
where T is the recorded survival time of the patient;
the risk (hazard) function at time t is:
Figure BDA0002952268420000031
Further, the proportional hazards function is:
λ(t|x) = λ_0(t) · exp(h(x)), where h(x_i) = β·x_i and λ_0(t) represents the baseline hazard function at time t.
Further, the maximum likelihood function may be expressed as:
Figure BDA0002952268420000032
Further, the preliminary prognosis risk prediction loss function can then be expressed as:
Figure BDA0002952268420000033
Further, the final loss function is expressed as: l_TRDN = (1 - γ)·l_r + γ·l_p, where 0 < γ < 1.
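The weighted combination of the two losses can be illustrated with a minimal NumPy sketch. The exact expressions for l_r and l_p are given only as figures in the source, so this sketch assumes a mean squared reconstruction error for l_r and the standard negative log partial likelihood of the Cox model (no tie correction) for l_p. The function names `recovery_loss`, `cox_loss` and `combined_loss` are hypothetical.

```python
import numpy as np

def recovery_loss(x, x_rec):
    """Assumed form of l_r: mean squared error between X and its reconstruction X'."""
    return float(np.mean((np.asarray(x) - np.asarray(x_rec)) ** 2))

def cox_loss(risk, time, event):
    """Assumed form of l_p: negative log partial likelihood, averaged over events.

    risk  : predicted log-risk h(x_i) per patient
    time  : survival time t per patient (assumed to contain no ties)
    event : status st per patient (1 = died, 0 = censored)
    """
    risk = np.asarray(risk, dtype=float)
    event = np.asarray(event, dtype=float)
    order = np.argsort(-np.asarray(time, dtype=float))  # longest survival first
    risk, event = risk[order], event[order]
    # Running log-sum-exp: entry i covers every patient with time >= time_i,
    # which is exactly the risk set of patient i.
    log_risk_set = np.logaddexp.accumulate(risk)
    n_events = max(event.sum(), 1.0)
    return float(-np.sum(event * (risk - log_risk_set)) / n_events)

def combined_loss(x, x_rec, risk, time, event, gamma=0.5):
    """l_TRDN = (1 - gamma) * l_r + gamma * l_p, with 0 < gamma < 1."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    return (1.0 - gamma) * recovery_loss(x, x_rec) + gamma * cox_loss(risk, time, event)
```

A larger γ puts more weight on risk prediction, while a smaller γ puts more weight on faithful reconstruction of the input features.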
Further, the final risk prediction model in step S5 is expressed as:
Figure BDA0002952268420000034
where X_m denotes the mRNA features used to construct the model, Y_m denotes the patient risk predicted by the risk prediction network N_f, and
Figure BDA0002952268420000035
Figure BDA0002952268420000036
represents the space of regression trees, q is the structure of a tree, T is the number of leaf nodes in the tree, and f_k corresponds to a regression tree with structure q and leaf weights w.
Further, in step S5, n is set to 200; that is, the top 200 gene features ranked by importance coefficient are selected.
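The top-n selection in step S5 can be sketched as follows. This is a minimal illustration that assumes the importance coefficients are already available as an array (for instance, the `feature_importances_` attribute of a fitted XGBoost regressor); the function name `top_n_features` and the gene names in the usage note are hypothetical.

```python
import numpy as np

def top_n_features(importances, feature_names, n=200):
    """Return the names of the n features with the largest importance coefficients.

    importances   : one importance coefficient per feature
    feature_names : feature (gene) names in the same order
    n             : number of features to keep (200 in the described embodiment)
    """
    importances = np.asarray(importances, dtype=float)
    if importances.size != len(feature_names):
        raise ValueError("importances and feature_names must have the same length")
    n = min(n, importances.size)
    # Stable sort on the negated values: descending order, ties keep input order.
    order = np.argsort(-importances, kind="stable")[:n]
    return [feature_names[i] for i in order]
```

For example, `top_n_features([0.1, 0.5, 0.0, 0.4], ["A", "B", "C", "D"], n=2)` keeps the features named "B" and "D".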
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
According to the invention, the deep neural network acquires more prior knowledge from the public data sets during learning, which improves the robustness of the prediction model. By introducing both a data recovery loss function and a risk prediction loss function, the invention predicts the prognosis risk of cancer patients more accurately from multi-omics data.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 shows the performance of different prognosis risk prediction methods on simulated data in an embodiment of the present invention.
FIG. 3 is a schematic diagram of the target genes and pathways affecting bladder cancer prognosis, identified from the risks predicted by the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Example 1
As shown in FIG. 1, a deep learning method for predicting the prognosis risk of a cancer patient based on multi-omics data comprises the following steps:
S1: acquiring clinical data Y of target cancer patients and the corresponding multi-omics expression data X from an existing public data set (e.g., TCGA, GEO);
In a specific example, 14 TCGA datasets (BRCA, CESC, COAD, ESCA, HNSC, KIRC, LGG, LIHC, LUAD, LUSC, MESO, PAAD, SARC, and SKCM) were used for pre-training, while bladder cancer (BLCA) served as the target cancer.
The multi-omics data comprise mRNA expression, miRNA expression, DNA methylation and copy number variation (CNV) information of the bladder cancer patients. The mRNA data are RNA sequencing data generated by UNC IlluminaHiSeq_RNASeqV2. The miRNA data are miRNA sequencing data obtained from BCGSC IlluminaHiSeq_miRNASeq. The DNA methylation data were generated by USC HumanMethylation450, and the CNV data by BROAD-MIT Genome_Wide_SNP_6. All of these are TCGA level 3 data. For each gene, we calculated the mean DNA methylation over its CpG sites as the methylation expression value. CNV features were extracted by averaging the copy number of all CNV variations on one gene.
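The gene-level averaging described above (mean methylation over a gene's CpG sites, mean copy number over a gene's CNV segments) can be sketched as below. The input layout, the function name `gene_level_mean` and the site and gene identifiers are assumptions made for illustration; they are not taken from the source.

```python
from collections import defaultdict

import numpy as np

def gene_level_mean(site_values, site_to_gene):
    """Average site-level measurements into one value per gene and sample.

    site_values  : dict mapping a site id (e.g. a CpG probe) to a 1-D array
                   with one measurement per sample
    site_to_gene : dict mapping each site id to the gene it belongs to
    Returns a dict mapping gene -> 1-D array of per-sample means.
    """
    grouped = defaultdict(list)
    for site, values in site_values.items():
        gene = site_to_gene.get(site)
        if gene is not None:  # sites without a gene annotation are skipped
            grouped[gene].append(np.asarray(values, dtype=float))
    # nanmean ignores missing measurements at individual sites
    return {gene: np.nanmean(np.vstack(rows), axis=0) for gene, rows in grouped.items()}
```

The same helper applies to CNV data if each "site" is taken to be one CNV segment overlapping the gene.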
S2: constructing a deep neural network;
The steps of constructing the deep neural network in the invention comprise:
S201: encoding the multi-omics expression data X to generate the compressed features z = E(X),
decoding the compressed features to generate new features X', and calculating the data recovery loss l_r after decoding, where the loss function is expressed as:
Figure BDA0002952268420000041
S202: defining a survival function representing the probability that the cancer patient survives beyond a given time t;
the survival function is expressed as: S(t) = Pr(T > t),
where T is the recorded survival time of the patient;
the risk (hazard) function at time t is:
Figure BDA0002952268420000051
S203: constructing a proportional hazards function from the survival function, where the proportional hazards function is:
λ(t|x) = λ_0(t) · exp(h(x)), where h(x_i) = β·x_i and λ_0(t) represents the baseline hazard function at time t;
S204: constructing a maximum likelihood function from the proportional hazards function, and obtaining a preliminary prognosis risk prediction loss function from the maximum likelihood function;
the maximum likelihood function may be expressed as:
Figure BDA0002952268420000052
the preliminary prognosis risk prediction loss function may then be expressed as:
Figure BDA0002952268420000053
S205: adding the data recovery loss l_r to the preliminary prognosis risk prediction loss function to construct the final loss function, which is expressed as: l_TRDN = (1 - γ)·l_r + γ·l_p, where 0 < γ < 1;
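For reference, steps S203 to S205 can be written out using the standard Cox partial likelihood. The likelihood and loss expressions appear only as figures in the source, so the forms below are an assumption based on the textbook formulation, not a transcription. Here st_i is the event indicator of patient i (1 if the patient has died, 0 otherwise) and R(t_i) is the risk set, both introduced for this sketch.

```latex
% Standard Cox formulation; assumed, not confirmed, to match the omitted figures.
% R(t_i) = \{ j : t_j \ge t_i \} is the set of patients still at risk at time t_i.
\begin{align}
\lambda(t \mid x_i) &= \lambda_0(t)\,\exp\bigl(h(x_i)\bigr) \\
L(\beta) &= \prod_{i:\, st_i = 1}
  \frac{\exp\bigl(h(x_i)\bigr)}{\sum_{j \in R(t_i)} \exp\bigl(h(x_j)\bigr)} \\
l_p &= -\sum_{i:\, st_i = 1}
  \Bigl[\, h(x_i) - \log \sum_{j \in R(t_i)} \exp\bigl(h(x_j)\bigr) \Bigr] \\
l_{TRDN} &= (1-\gamma)\, l_r + \gamma\, l_p, \qquad 0 < \gamma < 1
\end{align}
```

The baseline hazard λ_0(t) cancels in each ratio of the partial likelihood, which is why the loss depends only on the predicted log-risks h(x).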
In the present invention, the neural network is trained on the public TCGA cancer datasets, from which the public cancer multi-omics data X_p and patient clinical information Y_p are obtained. The patient clinical information Y_p includes the patient's survival time t and status st, where st = 1 indicates that the patient had died by that time point and st = 0 indicates that the patient had not.
The TCGA public cancer data were preprocessed before training: genes and samples with more than 20% missing values were removed, and the remaining missing values were then imputed with the median.
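The preprocessing rule above can be sketched in NumPy as follows. The source does not state whether genes or samples are filtered first, nor whether the median is taken per gene; this sketch assumes genes are filtered first and missing values are filled with the per-gene median. The function name `preprocess` is hypothetical.

```python
import numpy as np

def preprocess(matrix, max_missing=0.2):
    """Drop sparse genes and samples, then impute remaining gaps with the median.

    matrix      : samples x genes array, with NaN marking missing values
    max_missing : maximum tolerated fraction of missing values (20% here)
    Returns (cleaned matrix, boolean mask of kept samples, boolean mask of kept genes).
    """
    matrix = np.asarray(matrix, dtype=float)
    missing = np.isnan(matrix)
    # Assumption: genes are filtered first, then samples on the remaining genes.
    keep_genes = missing.mean(axis=0) <= max_missing
    keep_samples = missing[:, keep_genes].mean(axis=1) <= max_missing
    cleaned = matrix[np.ix_(keep_samples, keep_genes)].copy()
    # Assumption: each remaining gap is filled with the median of its gene.
    medians = np.nanmedian(cleaned, axis=0)
    rows, cols = np.where(np.isnan(cleaned))
    cleaned[rows, cols] = medians[cols]
    return cleaned, keep_samples, keep_genes
```

Returning the two masks makes it possible to apply the same gene selection to the clinical table and to other data sets.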
S3: passing the preprocessed cancer multi-omics data X_p and patient clinical information Y_p of the existing public data set through the constructed deep neural network and updating the weights θ to obtain a pre-trained network N_p based on the public data set.
The specific process is as follows:
The compressed features generated by the encoder in the deep neural network are z = E(X). The compressed features are decoded to generate the new features X', and the data recovery loss l_r after decoding is calculated, where the loss function is expressed as:
Figure BDA0002952268420000054
The risk prediction loss in the deep neural network is calculated as:
Figure BDA0002952268420000055
Figure BDA0002952268420000056
The final loss function is constructed as: l_TRDN = (1 - γ)·l_r + γ·l_p, where 0 < γ < 1.
The model is optimized with a stochastic gradient descent algorithm, which updates the weights θ of the deep neural network to obtain the pre-trained network N_p based on the public data set.
S4: retraining the network N_p on the clinical data Y of the target cancer patients and their multi-omics expression data X until the number of training epochs reaches the upper limit, thereby obtaining the risk prediction network N_f.
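The pre-train-then-retrain flow of steps S3 and S4 can be illustrated with a deliberately simplified model. The sketch below replaces the deep neural network with a linear risk score h(x) = x·β, uses only the Cox loss (no reconstruction term), and uses full-batch gradient descent rather than stochastic gradient descent; all of these are simplifying assumptions, and the function names are hypothetical.

```python
import numpy as np

def cox_loss_and_grad(beta, x, time, event):
    """Negative log partial likelihood (no tie correction) and its gradient."""
    order = np.argsort(-time)                   # longest survival first
    x, event = x[order], event[order]
    risk = x @ beta
    shift = risk.max()                          # stabilises the exponentials
    w = np.exp(risk - shift)
    cum_w = np.cumsum(w)                        # sum of weights over each risk set
    cum_wx = np.cumsum(w[:, None] * x, axis=0)  # weighted feature sum over each risk set
    n_events = max(event.sum(), 1.0)
    loss = -np.sum(event * (risk - shift - np.log(cum_w))) / n_events
    grad = -np.sum(event[:, None] * (x - cum_wx / cum_w[:, None]), axis=0) / n_events
    return float(loss), grad

def train(beta, x, time, event, epochs, lr=0.1):
    """Run a fixed number of gradient steps; `epochs` is the upper limit on training."""
    for _ in range(epochs):
        _, grad = cox_loss_and_grad(beta, x, time, event)
        beta = beta - lr * grad
    return beta

def pretrain_then_finetune(x_pub, t_pub, e_pub, x_tgt, t_tgt, e_tgt, epochs=50):
    """S3: fit on public data (stand-in for N_p); S4: continue on target data (stand-in for N_f)."""
    beta_p = train(np.zeros(x_pub.shape[1]), x_pub, t_pub, e_pub, epochs)
    beta_f = train(beta_p, x_tgt, t_tgt, e_tgt, epochs)
    return beta_p, beta_f
```

The key point is that the second call to `train` starts from the pre-trained weights instead of a fresh initialization, so the small target data set only has to adjust a model that already encodes information from the public data.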
As shown in FIG. 2, in the simulation experiment we tested how much the different improvement mechanisms enhance tumor prognosis performance, comparing a Cox neural network without transfer learning (Deep_surv), a deep Cox network combining the two loss functions (Deep_Cox), a transfer Cox neural network using the pre-training dataset (trans_Cox), and our proposed method, TRCN. FIG. 2 shows the C-index values obtained with different amounts of training data. Deep_surv has the lowest C-index on every dataset, while Deep_Cox, which uses the combined loss function, performs better than Deep_surv but worse than the other methods. Deep_Cox improved the C-index by an average of 3.7% over Deep_surv, a smaller gain than trans_Cox_all (13.8%) and TRCN (17.9%). Compared with trans_Cox, TRCN improved the C-index on the three types of simulated data by 3.3%, 4.2% and 2.9%, respectively. These results indicate that combining the losses is an effective way to improve predictive performance, and that pre-trained models can bring more useful information to the learning task.
Table 1. C-index values of different methods for predicting bladder cancer prognosis risk
Figure BDA0002952268420000061
In Table 1, this example compares the accuracy of different existing methods in predicting bladder cancer prognosis risk on real data, including four conventional methods and four deep learning-based methods. Among the conventional methods, the simple Cox method performs worst, with the lowest C-index (0.525), while Cox with elastic net regularization (Cox-elastic net) achieves the highest, 0.561. The C-index values obtained by the conventional methods are much lower than those obtained by the deep learning-based methods. Among the deep learning-based methods, the Cox model using features reconstructed by an autoencoder (AE-Cox) outperforms the Cox model with a deep neural network (Deep_surv). TRCN without the transfer learning mechanism achieves a higher C-index than both Deep_surv and AE-Cox, which shows that the combined loss function proposed by the invention improves accuracy. TRCN obtains the highest C-index of all the methods, which shows that transfer learning helps improve model performance.
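The C-index used throughout these comparisons measures how often a model ranks a pair of patients in the correct order. The sketch below implements Harrell's concordance index in its common textbook form; the source does not say which variant or tie handling was used, so the treatment of ties here (half credit for equal predicted risks) is an assumption, and the function name is hypothetical.

```python
import numpy as np

def concordance_index(time, event, risk):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i died (event[i] == 1) and
    patient j was observed for longer than patient i. The pair is concordant
    when the model assigns the higher risk to patient i.
    """
    time, event, risk = map(np.asarray, (time, event, risk))
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(time)):
            if time[j] > time[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # assumption: tied risks count as half
    return concordant / comparable if comparable else float("nan")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the values in Table 1 should be read.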
This example also performed an ablation study of patient risk prediction on the multi-omics data to investigate how much each omics data type contributes to prediction accuracy, as shown in Table 2.
Table 2. Contribution of different omics data to predicting bladder cancer prognosis risk
Figure BDA0002952268420000071
The results show that, when a single type of omics data is used, mRNA performs best with a C-index of 0.624 and miRNA performs worst with a C-index of 0.552. CNV and DNA methylation rank second and third, respectively. When one data type at a time was removed from the four omics data types used by TRCN, removing mRNA caused the largest drop, from a C-index of 0.643 to 0.599. The decrease in C-index was smallest, 0.09, after excluding miRNA. These results indicate that mRNA data play the most important role in the prognosis prediction of bladder cancer, while miRNA contributes the least.
S5: using the XGBoost algorithm to select the top 200 gene features of the target cancer patients ranked by importance coefficient, refining the risk prediction network N_f to obtain the final risk prediction model.
The final risk prediction model described in step S5 is expressed as:
Figure BDA0002952268420000072
where X_m denotes the mRNA features used to construct the model, Y_m denotes the patient risk predicted by the risk prediction network N_f, and
Figure BDA0002952268420000073
Figure BDA0002952268420000074
represents the space of regression trees, q is the structure of a tree, T is the number of leaf nodes in the tree, and f_k corresponds to a regression tree with structure q and leaf weights w.
The final risk prediction model established in this embodiment was verified and analyzed as follows:
In this embodiment, four bladder cancer data sets were downloaded from GEO as independent tests to verify the robustness of the model constructed with the XGBoost method. GSE13507 contains RNA-seq data and survival information for 165 patients with primary bladder cancer, collected at Chungbuk National University Hospital. GSE31684 contains data on 93 bladder cancer patients shared by the Dana-Farber Cancer Institute. GSE32894 contains information on 224 bladder cancer patients from the SCIBLU genomics center at Lund University, Sweden. GSE42876 contains information on 43 bladder cancer patients collected at the university of coenzyme.
Table 3 shows the results of the independent verification. The C-index values on all four data sets are greater than 0.6, which confirms the accuracy of the model in predicting patient risk, and the p values between risk groups are all less than 0.05, which indicates a significant difference between the risk groups. These results demonstrate that the prediction model constructed with the XGBoost algorithm works well on these four data sets.
Table 3. Independent test results of the XGBoost risk prediction model on 4 GEO data sets
Figure BDA0002952268420000081
Patients can be classified into a high-risk group and a low-risk group based on the median of the predicted patient risk. Differential expression analysis between the two risk groups was then carried out to find differentially expressed genes that influence prognosis. A total of 244 genes were identified, of which 90 were downregulated and 154 were upregulated (FIG. 3A). The 20 differential genes with the highest correlation coefficients are additionally labeled in FIG. 3A. A heat map based on the expression of these differential genes is shown in FIG. 3B. A literature review shows that 104 of these genes have been associated with bladder cancer. In addition to known cancer genes, our results also reveal 140 potential genes, not yet fully studied, that may influence the prognosis of bladder cancer.
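The median split into risk groups can be sketched in a few lines. The source does not say how a patient whose risk equals the median is assigned; this sketch assumes such patients fall into the low-risk group, and the function name `split_by_median` is hypothetical.

```python
import numpy as np

def split_by_median(risk):
    """Label each patient 'high' or 'low' relative to the median predicted risk."""
    risk = np.asarray(risk, dtype=float)
    # Assumption: a risk exactly equal to the median is labelled 'low'.
    return np.where(risk > np.median(risk), "high", "low")
```

For example, predicted risks of 0.1, 0.9, 0.4 and 0.7 have a median of 0.55, so the second and fourth patients are labelled high-risk.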
Using these 244 genes, we performed KEGG pathway analysis to find the pathways enriched in differentially expressed genes. A total of 38 KEGG pathways (2 downregulated and 36 upregulated) were found to correlate with the prognosis of bladder cancer. Because many more pathways are upregulated than downregulated, FIG. 3C shows only pathways with more than 4 genes. The metabolic pathway is one of the common pathways in cancer, and it contains the largest number of these genes (n = 12). Among these pathways, the PI3K-Akt signaling pathway has the lowest p value. The PI3K-Akt signaling pathway is an important intracellular signal transduction pathway in regulating the cell cycle, and the growth of human bladder cancer cells can be inhibited by regulating it. In addition, we found MAPK signaling pathways, Ras signaling pathways, PPAR signaling pathways, proteoglycans, and cancer pathways, among others. MAPK signaling pathways have also been shown to affect treatment in patients with bladder cancer. These results further demonstrate that the cancer outcomes predicted by TRCN are biologically meaningful.
It should be understood that the above-described embodiments of the present invention are merely examples given to illustrate the invention clearly, and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (10)

1. A deep learning method for predicting the prognosis risk of a cancer patient based on multi-omics data, comprising the following steps:
S1: acquiring clinical data Y of target cancer patients and the corresponding multi-omics expression data X from an existing public data set;
S2: constructing a deep neural network;
S3: passing the cancer multi-omics data X_p and patient clinical information Y_p of the existing public data set through the constructed deep neural network and updating the weights θ to obtain a pre-trained network N_p based on the public data set;
S4: retraining the network N_p on the clinical data Y of the target cancer patients and their multi-omics expression data X until the number of training epochs reaches the upper limit, thereby obtaining a risk prediction network N_f;
S5: using the XGBoost algorithm to select the top n gene features of the target cancer patients ranked by importance coefficient, refining the risk prediction network N_f to obtain the final risk prediction model.
2. The deep learning method for predicting the prognosis risk of a cancer patient based on multi-omics data as claimed in claim 1, wherein the deep neural network in step S2 is constructed by:
S201: encoding the multi-omics expression data X to generate the compressed features z = E(X), decoding the compressed features to generate new features X', and calculating the data recovery loss l_r after decoding;
S202: defining a survival function representing the probability that the cancer patient survives beyond a given time t;
S203: constructing a proportional hazards function from the survival function;
S204: constructing a maximum likelihood function from the proportional hazards function, and obtaining a preliminary prognosis risk prediction loss function from the maximum likelihood function;
S205: adding the data recovery loss l_r to the preliminary prognosis risk prediction loss function to construct the final loss function.
3. The deep learning method of claim 2, wherein the data recovery loss function is expressed as:
Figure FDA0002952268410000011
4. The deep learning method of claim 3, wherein the survival function is expressed as: S(t) = Pr(T > t),
where T is the recorded survival time of the patient;
the risk (hazard) function at time t is:
Figure FDA0002952268410000021
5. The deep learning method of claim 4, wherein the proportional hazards function is:
λ(t|x) = λ_0(t) · exp(h(x)), where h(x_i) = β·x_i and λ_0(t) represents the baseline hazard function at time t.
6. The deep learning method of claim 5, wherein the maximum likelihood function is expressed as:
Figure FDA0002952268410000022
7. The method of claim 6, wherein the preliminary prognosis risk prediction loss function is expressed as:
Figure FDA0002952268410000023
8. The deep learning method of claim 7, wherein the final loss function is expressed as: l_TRDN = (1 - γ)·l_r + γ·l_p, where 0 < γ < 1.
9. The deep learning method for predicting the prognosis risk of a cancer patient based on multi-omics data as claimed in claim 8, wherein the final risk prediction model of step S5 is expressed as:
Figure FDA0002952268410000024
where X_m denotes the mRNA features used to construct the model, Y_m denotes the patient risk predicted by the risk prediction network N_f, and
Figure FDA0002952268410000025
Figure FDA0002952268410000026
represents the space of regression trees, q is the structure of a tree, T is the number of leaf nodes in the tree, and f_k corresponds to a regression tree with structure q and leaf weights w.
10. The deep learning method for predicting the prognosis risk of a cancer patient based on multi-omics data as claimed in claim 1, wherein in step S5 n is set to 200, that is, the top 200 gene features ranked by importance coefficient are selected.
CN202110210941.8A 2021-02-25 2021-02-25 Deep learning method for predicting prognosis risk of cancer patients based on multi-omics data Active CN112820403B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110210941.8A CN112820403B (en) 2021-02-25 2021-02-25 Deep learning method for predicting prognosis risk of cancer patient based on multiple sets of learning data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110210941.8A CN112820403B (en) 2021-02-25 2021-02-25 Deep learning method for predicting prognosis risk of cancer patient based on multiple sets of learning data

Publications (2)

Publication Number Publication Date
CN112820403A true CN112820403A (en) 2021-05-18
CN112820403B CN112820403B (en) 2024-03-29

Family

ID=75865575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110210941.8A Active CN112820403B (en) 2021-02-25 2021-02-25 Deep learning method for predicting prognosis risk of cancer patient based on multiple sets of learning data

Country Status (1)

Country Link
CN (1) CN112820403B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108922628A (en) * 2018-04-23 2018-11-30 华北电力大学 A kind of Prognosis in Breast Cancer survival rate prediction technique based on dynamic Cox model
KR20190021471A (en) * 2017-02-02 2019-03-05 사회복지법인 삼성생명공익재단 Method, Apparatus and Program for Predicting Prognosis of Gastric Cancer Using Artificial Neural Network
CN109859801A (en) * 2019-02-14 2019-06-07 辽宁省肿瘤医院 A kind of model and method for building up containing seven genes as biomarker prediction lung squamous cancer prognosis
CN110853756A (en) * 2019-11-08 2020-02-28 郑州轻工业学院 Esophagus cancer risk prediction method based on SOM neural network and SVM
CN110942808A (en) * 2019-12-10 2020-03-31 山东大学 Prognosis prediction method and prediction system based on gene big data
CN111161882A (en) * 2019-12-04 2020-05-15 深圳先进技术研究院 Breast cancer life prediction method based on deep neural network
CN111161799A (en) * 2019-12-24 2020-05-15 大连海事大学 Method and system for acquiring multigene risk scores based on multigroup mathematical data
KR102119687B1 (en) * 2020-03-02 2020-06-05 엔에이치네트웍스 주식회사 Learning Apparatus and Method of Image
CN112037919A (en) * 2020-09-15 2020-12-04 南京鼓楼医院 Risk assessment model for papillary carcinoma of thyroid nodule patient
CN112086199A (en) * 2020-09-14 2020-12-15 中科院计算所西部高等技术研究院 Liver cancer data processing system based on multiple groups of mathematical data
CN112201346A (en) * 2020-10-12 2021-01-08 哈尔滨工业大学(深圳) Cancer survival prediction method, apparatus, computing device and computer-readable storage medium
CN112309576A (en) * 2020-09-22 2021-02-02 江南大学 Colorectal cancer survival period prediction method based on deep learning CT (computed tomography) image omics
CN112397143A (en) * 2020-10-30 2021-02-23 深圳思勤医疗科技有限公司 Method for predicting tumor risk value based on plasma multi-omic multi-dimensional features and artificial intelligence

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHI HUANG 等: "Deep learning-based cancer survival prognosis from RNA-seq data:approaches and evaluations", BMC MEDICAL GENOMICS, vol. 13, no. 5, pages 1 - 12 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113409946A (en) * 2021-07-02 2021-09-17 中山大学 System and method for predicting cancer prognosis risk under high-dimensional deletion data
CN113838570A (en) * 2021-08-31 2021-12-24 华中科技大学 Cervical cancer self-consistent typing method and system based on deep learning
CN113838570B (en) * 2021-08-31 2024-04-26 华中科技大学 Cervical cancer self-consistent typing method and system based on deep learning
CN114927162A (en) * 2022-05-19 2022-08-19 大连理工大学 Multi-set correlation phenotype prediction method based on hypergraph representation and Dirichlet distribution
CN114927162B (en) * 2022-05-19 2024-06-14 大连理工大学 Multi-mathematic association phenotype prediction method based on hypergraph characterization and dirichlet allocation
CN114783524A (en) * 2022-06-17 2022-07-22 之江实验室 Path abnormity detection system based on self-adaptive resampling depth encoder network
WO2024065987A1 (en) * 2022-09-27 2024-04-04 山东第一医科大学(山东省医学科学院) Lung cancer prognosis prediction system based on multi-omics of radiomics, pathomics and genomics
CN116417070A (en) * 2023-04-17 2023-07-11 齐鲁工业大学(山东省科学院) Method for improving prognosis prediction precision of gastric cancer typing based on gradient lifting depth feature selection algorithm
CN116862861A (en) * 2023-07-04 2023-10-10 浙江大学 Prediction model training and prediction method and system for gastric cancer treatment efficacy based on multiple groups of students
CN116580841B (en) * 2023-07-12 2023-11-10 北京大学 Disease diagnosis device, device and storage medium based on multiple groups of study data
CN116580841A (en) * 2023-07-12 2023-08-11 北京大学 Disease diagnosis device, device and storage medium based on multiple groups of study data
CN117594243A (en) * 2023-10-13 2024-02-23 太原理工大学 Ovarian cancer prognosis prediction method based on cross-modal view association discovery network
CN117594243B (en) * 2023-10-13 2024-05-14 太原理工大学 Ovarian cancer prognosis prediction method based on cross-modal view association discovery network

Also Published As

Publication number Publication date
CN112820403B (en) 2024-03-29

Similar Documents

Publication Publication Date Title
CN112820403B (en) Deep learning method for predicting prognosis risk of cancer patient based on multiple sets of learning data
CN108647489B (en) Method and system for screening disease drug target and target combination
Pang et al. Gene selection using iterative feature elimination random forests for survival outcomes
CN103649337B (en) The probabilistic Modeling assessment cell signaling pathway activity expressed using target gene
Fujiwara et al. ASCL1-coexpression profiling but not single gene expression profiling defines lung adenocarcinomas of neuroendocrine nature with poor prognosis
Zhao et al. Identification of differentially expressed genes in pituitary adenomas by integrating analysis of microarray data
JP2022524484A (en) How to predict the survival rate of cancer patients
Liu et al. MNNMDA: predicting human microbe-disease association via a method to minimize matrix nuclear norm
Chai et al. Integrating multi-omics data with deep learning for predicting cancer prognosis
CN113409946A (en) System and method for predicting cancer prognosis risk under high-dimensional deletion data
KR102386876B1 (en) Method for identifying condition-specific micro rna targets with big data
Zhao et al. SSCMDA: spy and super cluster strategy for MiRNA-disease association prediction
CN107075586B (en) Glycosyltransferase gene expression profiling for identifying multiple cancer types and subtypes
CN117038067A (en) Neuroendocrine type prostate cancer risk prediction method and application thereof
CN116486913A (en) System, apparatus and medium for de novo predictive regulatory mutations based on single cell sequencing
Gupta et al. A new deep learning technique reveals the exclusive functional contributions of individual cancer mutations
Jo et al. Interpretation of SNP combination effects on schizophrenia etiology based on stepwise deep learning with multi-precision data
Quackenbush From ‘omes to biology
Kuznetsov et al. Statistically weighted voting analysis of microarrays for molecular pattern selection and discovery cancer genotypes
VIEIRA Unveiling Novel Glioma Biomarkers through Multi-omics Integration and Classification
Joo Bayesian lasso: An extension for genome-wide association study
Zhang et al. Network propagation models for gene selection
CN116741269A (en) Method for predicting personalized cancer driving genes by fusion of gene characteristics and graph convolution
Gleason Methods for Integrative Multi-Omics Association Analysis Using Summary Statistics
CN115206440A (en) KRAS mutation colon cancer gene-based prognosis model and application thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant