CN112309576A - Colorectal cancer survival period prediction method based on deep learning CT (computed tomography) image omics - Google Patents
- Publication number
- CN112309576A CN202011005022.9A
- Authority
- CN
- China
- Prior art keywords
- data
- survival
- colorectal cancer
- model
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/20—Drawing from basic elements, e.g. lines or circles
- G06T11/203—Drawing of straight lines or curves
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10081—Computed x-ray tomography [CT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30028—Colon; Small intestine
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30096—Tumor; Lesion
Abstract
The invention discloses a colorectal cancer survival period prediction method based on deep learning CT (computed tomography) image omics, and belongs to the technical field of medical image processing. The method comprises the following specific steps: (1) acquiring data; (2) labeling the colorectal tumor region in the CT image omics data; (3) preprocessing the acquired data; (4) constructing a feature learning model based on a deep neural network; (5) reducing the dimensionality of the deep high-throughput features of the colorectal cancer CT image omics data by Lasso regression and establishing a risk scoring model of the patient; (6) grouping patients according to the risk score; (7) plotting survival curves and verifying the effectiveness of the features; (8) constructing a deep neural network multi-task logistic regression (DNN-MTLR) model to predict the survival probability. With the invention, system analysis is applied once the CT image of the patient has been obtained, and the result can serve as a reference for doctors (especially radiologists with limited experience), helping them better understand the patient's condition and make the next decision.
Description
Technical Field
The invention belongs to the technical field of medical image processing and can be used for intelligent medical disease diagnosis. It provides a colorectal cancer survival period prediction method based on deep learning CT (computed tomography) image omics, which finally yields the five-year disease-free survival (DFS) probability of colorectal cancer patients.
Background
Colorectal cancer is a common malignant tumor of the gastrointestinal tract with high morbidity and mortality. According to the 2018 global survey by the International Agency for Research on Cancer, colorectal cancer ranks third in incidence, behind only lung cancer and breast cancer, and second in mortality, behind only lung cancer. In China, incidence and mortality are also rising markedly in economically developed areas and in the southeastern coastal areas.
Accurate prediction of a patient's survival period has important clinical and social value. For physicians (especially young, inexperienced ones), it helps in understanding the patient's condition, making diagnoses, and making optimal medical decisions. For patients, it provides a scientifically grounded survival expectation and a better understanding of their own physical condition. Patients can thus be guided to follow a treatment plan, excessive medical treatment is avoided, the economic burden on families is reduced, and the doctor-patient relationship improves.
With the development of imaging and artificial intelligence technologies, imaging techniques such as computed tomography (CT), positron emission tomography (PET), and magnetic resonance (MR) play an increasingly important role in the diagnosis and prognosis of tumors. The role of medical imaging is gradually shifting from traditional uses such as disease diagnosis and screening to individualized precision diagnosis and treatment. The mainstream direction of future medical development is precision medicine, which must take individual variability into account in prevention and in the corresponding diagnostic and therapeutic strategies. Combining artificial intelligence with medicine is also a necessary path for future medicine; artificial intelligence cannot be realized without machine learning, and deep learning is one of its techniques. Deep learning can be combined with CT imaging omics features to predict the survival of colorectal cancer patients.
Disclosure of Invention
In view of the above problems, the present invention provides a novel method for predicting the survival of colorectal cancer (CRC) patients based on deep learning CT image omics, namely a colorectal cancer survival period prediction method based on deep learning CT (computed tomography) image omics.
The technical scheme of the invention is as follows: the colorectal cancer survival period prediction method based on deep learning CT imaging omics specifically comprises the following steps:
step (1.1), data acquisition: the data comprises clinical data and CT imaging omics data;
step (1.2), carrying out colorectal tumor region labeling on CT image omics data;
step (1.3), preprocessing the acquired data;
step (1.4), constructing a feature learning model based on a deep neural network to obtain deep high-throughput features of the colorectal cancer CT image omics data;
step (1.5), reducing the dimensionality of the deep high-throughput features of the colorectal cancer CT image omics data by Lasso regression, and establishing a risk score model of the patient;
step (1.6), according to the imaging omics risk score S of the patient, taking the median of the imaging omics signature scores as the cutoff value T, and dividing the patients into a high-risk survival group and a low-risk survival group;
step (1.7), evaluating and verifying the obtained deep high-throughput features by plotting Kaplan-Meier (KM) curves and using data analysis software;
step (1.8), constructing a deep neural network multi-task logistic regression model to predict the survival probability.
Further, in step (1.1), specifically:
(1.1.1) clinical data: the age, sex, and survival status of the patient (1 or 0, where 1 represents death and 0 represents survival), and the time of interest measured from when the CT image was taken;
(1.1.2) CT imaging omics data: the CT image data acquired from the patient.
Further, in step (1.2), the colorectal tumor region is labeled in the CT imaging omics data as follows: the CT imaging omics data are imported into ITK-SNAP in batches, unit by unit, and annotated manually in ITK-SNAP by selecting the region of interest where the tumor is located; the labeled CT imaging omics data are saved as .nii files.
Further, in the step (1.3), the specific operation steps of preprocessing the acquired data are as follows:
The data are screened in advance with the following exclusion criteria, after which the region-of-interest features are extracted:
(1.3.1) the clinical record is incomplete, where the reasons for incompleteness include loss to follow-up, withdrawal, and termination;
(1.3.2) the survival observation was cut off by a cause other than the death event;
(1.3.3) the region-of-interest .nii file obtained in step (1.2) is combined with the original CT imaging omics data to extract the region-of-interest features, so that each unit yields a three-dimensional feature matrix f(P, P, P) containing the region of interest, where P denotes the size of the matrix.
Further, in step (1.4), the deep neural network-based feature learning model is described as follows: the feature matrix containing the region of interest obtained for each unit serves as the input of the network, with overall size [M × P], where M is the total number of units and P is the feature matrix dimension of each unit;
the input is fed into a feature selector for feature selection; the feature selector is composed of $N_0$ convolutional layers, $N_0$ pooling layers, a fully connected layer, and a logistic regression output layer; the first convolutional layer contains $M_1$ filters and the $i$-th of the other convolutional layers contains $M_i$ filters, each filter being of size n × n × n;
each convolutional layer is followed by a max pooling operation with pool size m × m, and each convolutional layer uses a rectified linear unit (ReLU) activation; the loss function is the mean squared error (MSE):
$$\mathrm{MSE} = \frac{1}{M}\sum_{i=1}^{M}\left(y_i - \hat{y}_i\right)^2$$
where $y_i$ is the label of unit $i$ and $\hat{y}_i$ is the corresponding network output.
Further, in step (1.5), effective dimensionality reduction of the colorectal cancer CT imaging omics data is performed as follows: first, the M × K node outputs of the fully connected layer of the deep neural network feature learning model are taken as the first effective feature reduction, where M is the total number of units and K is the number of nodes; the data are then standardized;
next, the features are reduced further by least absolute shrinkage and selection operator (Lasso) regression, which yields the risk score S of each person; the Lasso regression loss function takes the standard form:
$$L(w) = \sum_{i=1}^{M}\left(y_i - w^{\top}x_i\right)^2 + \lambda\,\lVert w\rVert_1$$
where $x_i$ is the feature vector of unit $i$, $y_i$ is the time label of unit $i$, $\lambda$ is the regularization coefficient, and $w$ is the vector of weight coefficients.
Further, in step (1.7), the specific operation steps of curve evaluation and verification for the selected features are as follows:
(1.7.1) drawing a corresponding KM curve according to the cut-off value T obtained in the step (1.6), so that a result is visualized, and two survival probability curves are obtained;
(1.7.2) after different survival probability curves are obtained by using a KM method, chi-square test is carried out through data analysis software, and finally a P value is obtained;
(1.7.3) judging whether the two curves have significant difference according to the P value.
Further, in step (1.8), the deep neural network multi-task logistic regression model for survival probability prediction is constructed as follows:
(1.8.1) the final effective features obtained in step (1.5), together with the time labels and the survival status labels, are fed into the deep neural network multi-task logistic regression model;
each layer of the model uses the following activation function:
Layer #1: M1 neurons, using the activation function $h^{(1)}(x) = \mathrm{LeakyReLU}(x)$
Layer #2: M2 neurons, using the activation function $h^{(2)}(x) = \mathrm{ReLU}(x)$
Layer #3: M3 neurons, using the activation function $h^{(3)}(x) = \mathrm{ReLU}(x)$
where LeakyReLU denotes the leaky rectified linear unit and ReLU denotes the rectified linear unit;
the time axis is divided into J time intervals $a_j = [\tau_{j-1}, \tau_j)$, $j = 1, \dots, J$, with $\tau_0 = 0$ and $\tau_J = \infty$;
on each interval $a_j$ a logistic regression model is built, with its own parameters and a response variable $y_j$ that is 1 if the event occurs in interval $a_j$ and 0 otherwise;
when a unit experiences an event in interval $a_s$, $s \in [1, J]$, its status in the remaining intervals stays unchanged; the response vector is therefore:
$$Y = [\,y_1 = 0,\ \dots,\ y_{s-1} = 0,\ y_s = 1,\ \dots,\ y_J = 1\,]$$
where $a_j$ is a unit time interval of one month and $y_j$ is the response variable, with 1 indicating that the event has occurred and 0 that it has not;
probability density function:
wherein exp () represents an exponential function with a natural number e as the base;
survival function:
where $\psi: \mathbb{R}^p \to \mathbb{R}^J$ is the nonlinear transformation that takes the feature vector $x \in \mathbb{R}^p$ as its input; its output is a vector in $\mathbb{R}^J$ whose values are mapped to the J subdivisions of the time axis;
(1.8.2) the ratio of training set to test set is set to 8:2, and the result is visualized;
(1.8.3) the discriminative power of the deep neural network multi-task logistic regression model is evaluated with the concordance index (C-index), which gives an overall assessment of that power; its value ranges from 0 to 1, where 1 corresponds to the optimal prediction model, 0.5 to a random prediction model, and 0 to an inapplicable model; the C-index is calculated as follows:
where C-index denotes the concordance index and $\eta_i$ denotes the risk score of unit $i$; the indicator $1_{T_j < T_i}$ equals 1 if $T_j < T_i$ and 0 otherwise;
(1.8.4) the accuracy of the deep neural network multi-task logistic regression model is evaluated with the Integrated Brier Score (IBS), whose value ranges from 0 to 1, where 0 is the best possible value and IBS < 0.25 indicates a useful model; the IBS is calculated as follows:
where IBS denotes the Integrated Brier Score, used to assess the accuracy of the survival function predicted by the model, N is the number of data samples, and the remaining term represents the actual probability of the event at time t for sample i.
The beneficial effects of the invention are as follows: the invention uses deep learning and CT imaging omics signatures to predict the survival period of colorectal cancer patients; the technique relies on CT imaging, and CT images are easy to obtain clinically; in practice, system analysis is applied once the CT image of a patient has been obtained, and the result can serve as a reference for doctors (especially young radiologists with limited experience), helping them better understand the patient's condition and make the next decision; in addition, patients can better understand their own condition;
CT images contain abundant features, but they are large and comprise many slices, so the data volume is large and many features are redundant; the method reduces the data dimensionality through a deep learning (DL) feature selector and least absolute shrinkage and selection operator (Lasso) regression, yielding low-dimensional effective features that benefit prediction;
in addition, the present invention constructs a deep neural network multi-task logistic regression (DNN-MTLR) model that provides results similar to those of the CoxPH model without relying on the assumptions the latter requires; the DNN-MTLR model can be used to estimate the probability that the event of interest occurs within each time interval.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic illustration of the manual labeling of the present invention using ITK-SNAP;
FIG. 3 is a diagram of a model of a DL feature selector network in accordance with the present invention;
FIG. 4 is a high-low risk group-KM graph in accordance with the present invention;
FIG. 5 is a diagram of a DNN-MTLR network model in the present invention;
FIG. 6 is a graph of the results of the present invention using a DNN-MTLR model for prediction;
FIG. 7 is a prediction result diagram according to an embodiment of the present invention.
Detailed Description
In order to more clearly illustrate the technical solution of the present invention, the following detailed description is made with reference to the accompanying drawings:
As shown in FIG. 1, the colorectal cancer survival period prediction method based on deep learning CT image omics finally yields the five-year disease-free survival (DFS) probability of colorectal cancer patients, and comprises the following specific steps:
step (1.1), data acquisition: the data comprises clinical data and CT imaging omics data;
step (1.2), carrying out colorectal tumor region labeling on CT image omics data;
step (1.3), preprocessing the acquired data;
step (1.4), constructing a feature learning model based on a deep neural network to obtain deep high-throughput features of the colorectal cancer CT image omics data;
step (1.5), reducing the dimensionality of the deep high-throughput features of the colorectal cancer CT image omics data by Lasso regression, and establishing a risk score model of the patient;
step (1.6), according to the imaging omics risk score S of the patient, taking the median of the imaging omics signature scores as the cutoff value T, and dividing the patients into a high-risk survival group (S > T) and a low-risk survival group (S < T);
step (1.7), evaluating and verifying the obtained deep high-throughput features by plotting Kaplan-Meier (KM) curves and using data analysis software;
step (1.8), constructing a deep neural network multi-task logistic regression (DNN-MTLR) model to predict the survival probability.
Further, in step (1.1), specifically:
(1.1.1) clinical data: the age, sex, and survival status of the patient (1 or 0, where 1 represents death and 0 represents survival), and the time of interest measured from when the CT image was taken;
(1.1.2) CT imaging omics data: the CT image data acquired from the patient.
Further, in step (1.2), the colorectal tumor region is labeled in the CT imaging omics data as follows: the CT imaging omics data are imported into ITK-SNAP in batches, unit by unit, and annotated manually in ITK-SNAP by selecting the region of interest where the tumor is located; the labeled CT imaging omics data are saved as .nii files; the labeling results are shown in FIG. 2.
Further, in the step (1.3), the specific operation steps of preprocessing the data are as follows:
The data are screened in advance with the following exclusion criteria, after which the region-of-interest features are extracted:
(1.3.1) the clinical record is incomplete, where the reasons for incompleteness include loss to follow-up (contact with the subject was lost), withdrawal (the subject left the study for reasons unrelated to the study or the treatment), and termination (observation ended at the time specified by the design while the subject was still alive);
(1.3.2) the survival observation was cut off by a cause other than the death event;
(1.3.3) the region-of-interest .nii file obtained in step (1.2) is combined with the original CT image data to extract the region-of-interest features, so that each unit yields a three-dimensional feature matrix f(32, 32, 32) containing the region of interest.
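The text does not state how the 32 × 32 × 32 matrix is cut from the CT volume. One possible approach, sketched here purely as an assumption, centers a fixed-size cube on the bounding box of the labeled mask and clamps it to the volume boundaries. The helper names `crop_start` and `crop_cube` are hypothetical, and nested Python lists stand in for the arrays that would be read from the .nii files.

```python
def crop_start(lo, hi, size, dim):
    """Start index of a `size`-long window centered on [lo, hi], clamped to [0, dim - size]."""
    center = (lo + hi) // 2
    start = center - size // 2
    return max(0, min(start, dim - size))


def crop_cube(volume, mask, size=32):
    """Cut a size^3 cube around the nonzero region of `mask` out of `volume`.

    `volume` and `mask` are nested lists indexed [z][y][x] with equal shape,
    and every dimension must be at least `size` long.
    """
    dims = (len(volume), len(volume[0]), len(volume[0][0]))
    # Coordinates of the labeled voxels define the ROI bounding box.
    coords = [(z, y, x)
              for z in range(dims[0])
              for y in range(dims[1])
              for x in range(dims[2])
              if mask[z][y][x]]
    if not coords:
        raise ValueError("mask contains no labeled voxels")
    starts = []
    for axis in range(3):
        values = [c[axis] for c in coords]
        starts.append(crop_start(min(values), max(values), size, dims[axis]))
    z0, y0, x0 = starts
    return [[row[x0:x0 + size] for row in plane[y0:y0 + size]]
            for plane in volume[z0:z0 + size]]
```

A tumor larger than the cube would be truncated by this scheme, so a resampling step might be needed in that case.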
Further, in step (1.4), as shown in FIG. 3, the deep neural network-based feature learning model is described as follows: the feature matrix containing the region of interest obtained for each unit serves as the input of the network, with overall size [M × P], where M is the total number of units and P is the feature matrix dimension of each unit;
the input is fed into a feature selector for feature selection; the feature selector is composed of $N_0$ convolutional layers, $N_0$ pooling layers, a fully connected layer, and a logistic regression output layer; the first convolutional layer contains $M_1$ filters and the $i$-th of the other convolutional layers contains $M_i$ filters, each filter being of size n × n × n;
each convolutional layer is followed by a max pooling operation with pool size m × m, and each convolutional layer uses a rectified linear unit (ReLU) activation; the loss function is the mean squared error (MSE):
$$\mathrm{MSE} = \frac{1}{M}\sum_{i=1}^{M}\left(y_i - \hat{y}_i\right)^2$$
where $y_i$ is the label of unit $i$ and $\hat{y}_i$ is the corresponding network output.
Further, in step (1.5), effective dimensionality reduction of the colorectal cancer CT imaging omics data is performed as follows: first, the 600 × 6400 node outputs of the fully connected layer of the deep neural network feature learning model are taken as the first effective feature reduction; the data are then standardized;
next, the features are reduced further by least absolute shrinkage and selection operator (Lasso) regression, which yields the risk score S of each person; the Lasso regression loss function takes the standard form:
$$L(w) = \sum_{i=1}^{M}\left(y_i - w^{\top}x_i\right)^2 + \lambda\,\lVert w\rVert_1$$
where $x_i$ is the feature vector of unit $i$, $y_i$ is the time label of unit $i$, $\lambda$ is the regularization coefficient, and $w$ is the vector of weight coefficients.
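To illustrate how a Lasso penalty drives weak feature weights to exactly zero, the following minimal sketch minimizes a squared error plus λ times the L1 norm of the weights by coordinate descent. The solver choice, the absence of an intercept, the function names, and the linear form of the risk score S are assumptions made for illustration; the text does not specify how the regression is solved or how S is assembled.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; exactly zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(X, y, lam, n_sweeps=100):
    """Minimize sum_i (y_i - w.x_i)^2 + lam * sum_j |w_j| by coordinate descent.

    X is a list of M standardized feature rows and y a list of M targets.
    No intercept is fitted, so y is assumed to be centered.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(n_sweeps):
        for j in range(n_features):
            rho = 0.0   # correlation of feature j with the residual that ignores feature j
            norm = 0.0  # squared norm of feature j
            for row, target in zip(X, y):
                partial = sum(w[k] * row[k] for k in range(n_features) if k != j)
                rho += row[j] * (target - partial)
                norm += row[j] * row[j]
            w[j] = soft_threshold(rho, lam / 2.0) / norm if norm > 0 else 0.0
    return w


def risk_score(w, row):
    """Assumed linear risk score S for one unit: weighted sum of its features."""
    return sum(wj * xj for wj, xj in zip(w, row))
```

Features whose weight ends at zero drop out of the score, which is the dimensionality reduction effect the step relies on.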
Further, in step (1.6), patients are divided into the high-risk survival group (S > T) and the low-risk survival group (S < T) as follows: the risk score S of each patient is looked up, and the median of the imaging omics signature scores is used as the cutoff value, T = -2.227; patients with S > T form the high-risk group with short survival, and patients with S < T form the low-risk group with long survival.
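The median split can be sketched in a few lines of Python. The function name `split_by_median` and the dictionary layout (unit identifier mapped to score S) are hypothetical choices for illustration.

```python
from statistics import median


def split_by_median(scores):
    """Return (cutoff T, high-risk ids, low-risk ids) for a dict of unit id -> score S."""
    cutoff = median(scores.values())
    high = sorted(uid for uid, s in scores.items() if s > cutoff)
    low = sorted(uid for uid, s in scores.items() if s < cutoff)
    return cutoff, high, low
```

A score exactly equal to T falls in neither group here, because the text defines only S > T and S < T.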
Further, in step (1.7), the specific operation steps of curve evaluation and verification for the selected features are as follows:
(1.7.1) drawing a corresponding KM curve according to the cut-off value T obtained in the step (1.6), so that a result is visualized, and two survival probability curves are obtained;
(1.7.2) after the survival probability curves have been obtained by the KM method, direct observation alone is insufficient to determine whether the curves differ significantly, so a log-rank test is performed with IBM SPSS Statistics 26 to obtain a P value;
(1.7.3) whether the two curves differ significantly is judged from the P value; P < 0.05 is generally considered statistically significant; the result obtained was P < 0.01, indicating a statistically significant difference.
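The KM estimate and the two-group comparison in steps (1.7.1) to (1.7.3) can be sketched in plain Python. The source performs the test in IBM SPSS Statistics; this standalone version, its function names, and the erfc-based P value for a chi-square statistic with one degree of freedom are illustrative assumptions, not the procedure used there.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time; events: 1 = death, 0 = censored."""
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value), 1 degree of freedom."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(all_times, all_events) if e == 1}):
        n_a = sum(1 for u in times_a if u >= t)        # group A units at risk
        n = sum(1 for u in all_times if u >= t)        # all units at risk
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e == 1)
        d = sum(1 for u, e in zip(all_times, all_events) if u == t and e == 1)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # For one degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Calling `kaplan_meier` once per risk group gives the two curves, and `logrank_test` on the same two groups gives the P value that is compared against 0.05.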
Further, in step (1.8), the deep neural network multi-task logistic regression (DNN-MTLR) model for survival probability prediction is constructed as follows:
(1.8.1) the final effective features obtained in step (1.5), together with the time labels and the survival status labels, are fed into the deep neural network multi-task logistic regression (DNN-MTLR) model; the DNN-MTLR model is shown in FIG. 5;
each layer uses the following activation function:
Layer #1: 326 neurons, using the activation function $h^{(1)}(x) = \mathrm{LeakyReLU}(x)$
Layer #2: 652 neurons, using the activation function $h^{(2)}(x) = \mathrm{ReLU}(x)$
Layer #3: 1304 neurons, using the activation function $h^{(3)}(x) = \mathrm{ReLU}(x)$
where LeakyReLU is the leaky rectified linear unit and ReLU is the rectified linear unit;
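The difference between the two activations is small but worth making explicit. In the sketch below, the negative slope of 0.01 is an assumed default, since the text gives no value, and `LAYER_SPEC` is a hypothetical name that simply pairs the stated layer widths with their activations.

```python
def relu(x):
    """Rectified linear unit: passes positive inputs and zeroes out the rest."""
    return x if x > 0 else 0.0


def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: like ReLU, but negative inputs keep a small slope instead of zero."""
    return x if x > 0 else negative_slope * x


# Hidden layer widths from the text, each paired with its activation.
LAYER_SPEC = [(326, leaky_relu), (652, relu), (1304, relu)]
```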
the time axis is divided into J time intervals $a_j = [\tau_{j-1}, \tau_j)$, $j = 1, \dots, J$, with $\tau_0 = 0$ and $\tau_J = \infty$;
on each interval $a_j$ a logistic regression model is built, with its own parameters and a response variable $y_j$ that is 1 if the event occurs in interval $a_j$ and 0 otherwise; however, since the effects of recurrent events are not analyzed, it must be ensured that when a unit experiences an event in interval $a_s$, $s \in [1, J]$, its status in the remaining intervals stays unchanged; the response vector is therefore:
$$Y = [\,y_1 = 0,\ \dots,\ y_{s-1} = 0,\ y_s = 1,\ \dots,\ y_J = 1\,]$$
where $a_j$ is a unit time interval of one month and $y_j$ is the response variable, with 1 indicating that the event has occurred and 0 that it has not;
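For a unit whose event was observed, the encoding into a response vector over monthly intervals can be sketched as follows. The function names and the choice to fold any time beyond the grid into the last, open-ended interval are assumptions for illustration.

```python
def interval_index(time_months, n_intervals):
    """1-based index s of the monthly interval a_s = [s - 1, s) that contains the time.

    The last interval is open-ended, so any later time maps to n_intervals.
    """
    return min(int(time_months) + 1, n_intervals)


def response_vector(time_months, n_intervals):
    """Y = [0, ..., 0, 1, ..., 1]: zeros before the event interval, ones from it onward."""
    s = interval_index(time_months, n_intervals)
    return [0] * (s - 1) + [1] * (n_intervals - s + 1)
```

For example, an event at 2.5 months with five intervals falls in the third interval, so the vector switches from 0 to 1 at position three and stays at 1.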
probability density function:
wherein exp () represents an exponential function with a natural number e as the base;
survival function:
where $\psi: \mathbb{R}^p \to \mathbb{R}^J$ is the nonlinear transformation that takes the feature vector $x \in \mathbb{R}^p$ as its input; its output is a vector in $\mathbb{R}^J$ whose values are mapped to the J subdivisions of the time axis;
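As an illustration only, the sketch below shows one way a J-dimensional network output can be turned into interval probabilities in the multi-task logistic regression style. The score of "event in interval s" is taken as the sum of the outputs from position s onward, which matches a response vector that is 1 from the event interval onward, and the scores are normalized with a softmax. This parameterization and the function names are assumptions and are not claimed to reproduce the density and survival formulas of the invention.

```python
import math


def interval_probabilities(scores):
    """Softmax over suffix sums: P(event in a_s) is proportional to exp(sum(scores[s-1:]))."""
    suffix = []
    running = 0.0
    for value in reversed(scores):
        running += value
        suffix.append(running)
    suffix.reverse()
    peak = max(suffix)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in suffix]
    total = sum(weights)
    return [w / total for w in weights]


def survival_curve(scores):
    """S at the start of each interval: total probability of that interval and all later ones."""
    probs = interval_probabilities(scores)
    return [sum(probs[s:]) for s in range(len(probs))]
```

With monthly intervals, reading the survival curve at positions 12, 24, 36, 48, and 60 would give yearly survival probabilities.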
(1.8.2) the ratio of training set to test set is set to 8:2, and the result is visualized;
(1.8.3) the discriminative power of the DNN-MTLR model is evaluated with the concordance index (C-index), which gives an overall assessment of that power; the C-index obtained is 0.82, where a value of 1 corresponds to the optimal prediction model, 0.5 to a random prediction model, and 0 to an inapplicable model. The C-index is calculated as follows:
where C-index denotes the concordance index and $\eta_i$ denotes the risk score of unit $i$; the indicator $1_{T_j < T_i}$ equals 1 if $T_j < T_i$ and 0 otherwise;
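A pairwise computation consistent with the symbols defined above can be sketched as follows. It assumes the common convention that a pair (i, j) is comparable only when unit j experienced the event and $T_j < T_i$, and that the pair is concordant when $\eta_j > \eta_i$; the event indicator and the function name are assumptions beyond what the text states, and tied risk scores receive no partial credit in this simple version.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable pairs that the risk scores order correctly.

    A pair (i, j) is comparable when unit j had the event (events[j] == 1)
    and T_j < T_i; it is concordant when eta_j > eta_i.
    """
    concordant = 0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[j] == 1 and times[j] < times[i]:
                comparable += 1
                if risks[j] > risks[i]:
                    concordant += 1
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```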
(1.8.4) the accuracy of the deep neural network multi-task logistic regression (DNN-MTLR) model is evaluated with the Integrated Brier Score (IBS), which assesses the accuracy of the survival function predicted by the model; the IBS obtained is 0.06, where 0 is the best possible value and IBS < 0.25 indicates a useful model; the IBS is calculated as follows:
where IBS denotes the Integrated Brier Score, used to assess the accuracy of the survival function predicted by the model, N is the number of data samples, and the remaining term represents the actual probability of the event at time t for sample i.
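The idea behind the score can be sketched as follows: at each time t, the Brier score is the mean squared gap between each unit's observed status (still event-free or not) and its predicted survival probability, and the IBS averages that quantity over a time range. This simplified version assumes every event time is observed and therefore omits the censoring weights that a full evaluation would need; the function names, the time grid, and the trapezoidal integration are likewise assumptions.

```python
def brier_score(t, times, predicted_survival):
    """BS(t): mean over units of (1[T_i > t] - S_hat_i(t))^2, ignoring censoring."""
    total = 0.0
    for t_i, s_hat in zip(times, predicted_survival):
        alive = 1.0 if t_i > t else 0.0
        total += (alive - s_hat(t)) ** 2
    return total / len(times)


def integrated_brier_score(grid, times, predicted_survival):
    """Trapezoidal average of BS(t) over an increasing time grid."""
    scores = [brier_score(t, times, predicted_survival) for t in grid]
    area = 0.0
    for k in range(1, len(grid)):
        area += 0.5 * (scores[k] + scores[k - 1]) * (grid[k] - grid[k - 1])
    return area / (grid[-1] - grid[0])
```

A model that always predicts a survival probability of 0.5 scores exactly 0.25 under this definition, which is why values below 0.25 are read as useful.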
The specific embodiment is as follows:
(1) Acquire data: obtain the CT image omics data of patient A.
(2) Label the colorectal tumor region in the CT image omics data.
(3) Preprocess the acquired data to obtain the time and status labels.
(4) Construct a feature learning model based on a deep neural network to obtain the deep high-throughput CT image omics features of patient A.
(5) Reduce the dimensionality of the deep high-throughput features of the CT image omics data by Lasso regression, and establish the risk scoring model of patient A.
(6) Classify patient A into the high-risk group according to the imaging omics risk score of patient A.
(7) Evaluate and verify the obtained deep high-throughput features.
(8) Feed the features obtained by dimensionality reduction into the deep neural network multi-task logistic regression model for survival probability prediction, and obtain the final prediction result.
The results are shown in FIG. 7; the survival probabilities at yearly intervals are listed in the following table:
Month | 12 | 24 | 36 | 48 | 60 |
---|---|---|---|---|---|
Probability of survival | 97.649882% | 93.804818% | 85.995414% | 66.933918% | 48.437748% |
The results show that the method can be used for survival prediction in colorectal cancer patients.
Claims (8)
1. A colorectal cancer survival period prediction method based on deep learning CT (computed tomography) image omics, characterized by comprising the following steps:
step (1.1), data acquisition: the data comprise clinical data and CT image omics data;
step (1.2), labeling the colorectal tumor region in the CT image omics data;
step (1.3), preprocessing the acquired data;
step (1.4), constructing a feature learning model based on a deep neural network to obtain deep high-flux features of the colorectal cancer CT image omics data;
step (1.5), reducing the dimensionality of the deep high-flux features of the colorectal cancer CT image omics data using Lasso regression, and establishing a risk score model for the patient;
step (1.6), according to the image omics risk score S of the patient, obtaining a cutoff value T as the median of the image omics label scores, and dividing the patients into a high-risk survival group and a low-risk survival group;
step (1.7), evaluating and verifying the obtained deep high-flux features by plotting Kaplan-Meier (KM) curves with data analysis software;
step (1.8), constructing a deep neural network multi-task logistic regression model to predict the survival period probability.
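The median-based grouping of step (1.6) can be sketched as follows. This is a minimal illustration only: the function name `split_by_median` is hypothetical, and assigning scores equal to the cutoff to the low-risk group is an assumption, since the claim does not say how ties are handled.

```python
from statistics import median


def split_by_median(risk_scores):
    """Return the cutoff T (median score) and a high/low label per patient."""
    cutoff = median(risk_scores)
    # Assumption: scores strictly above the cutoff are high risk; ties go to low risk.
    groups = ["high" if score > cutoff else "low" for score in risk_scores]
    return cutoff, groups


scores = [0.2, 0.9, 0.5, 0.7]  # illustrative risk scores S, not data from the patent
cutoff, groups = split_by_median(scores)
print(cutoff, groups)
```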
2. The method for predicting survival of colorectal cancer based on deep learning CT image omics as claimed in claim 1, wherein step (1.1) specifically comprises:
(1.1.1) clinical data: including the patient's age, sex, survival status (1 or 0), and the time of interest since the CT image was taken; wherein 1 represents death and 0 represents survival;
(1.1.2) CT image omics data: i.e., the CT image data taken of the patient.
3. The method for predicting survival of colorectal cancer based on deep learning CT image omics as claimed in claim 1, wherein in step (1.2), the colorectal tumor region is labeled in the CT image omics data as follows: the CT image omics data are imported into ITK-SNAP in batches in unit order, annotated manually in ITK-SNAP by selecting the region of interest where the tumor is located, and the annotated CT image omics data are saved as .nii files.
4. The method for predicting survival of colorectal cancer based on deep learning CT image omics as claimed in claim 1, wherein in said step (1.3), the acquired data are preprocessed as follows:
the data are pre-screened for exclusion, wherein the exclusion criteria are as follows:
(1.3.1) the clinical record is incomplete, the causes of incompleteness including loss to follow-up, withdrawal, and termination;
(1.3.2) the survival observation was cut off for reasons other than a death event;
(1.3.3) the region-of-interest .nii file obtained in step (1.2) is combined with the original CT image omics data to extract the region-of-interest features, yielding for each unit a three-dimensional feature matrix f(P, P, P) containing the region of interest, wherein P represents the size of the matrix.
5. The method for predicting survival of colorectal cancer based on deep learning CT image omics as claimed in claim 1, wherein in step (1.4), the deep neural network-based feature learning model is specifically described as follows: the feature matrix containing the region of interest obtained for each unit is used as the input of the network, the size of the input being [M × P], wherein M represents the total number of units and P represents the feature matrix dimension of each unit;
the input is passed into a feature selector for feature selection; wherein the feature selector is composed of $N_0$ convolution layers, $N_0$ pooling layers, a fully connected layer, and a logistic regression output layer; the first convolution layer comprises $M_1$ filters and the other convolution layers comprise $M_i$ filters, wherein the filter size is n × n × n and n represents the filter size;
a max pooling operation with pool size m × m is carried out after each convolution layer, and each convolution layer has a rectified linear unit (ReLU) activation function; the loss function is the mean squared error, given by:
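The building blocks named in this claim can be sketched in plain Python. The loss formula is not reproduced in this text, so the code assumes the standard definition of mean squared error. The cubic m × m × m non-overlapping pooling window is also an assumption chosen to match the three-dimensional input, and the function names are hypothetical.

```python
def relu(x):
    """Rectified linear unit: max(0, x)."""
    return x if x > 0 else 0.0


def max_pool3d(volume, m):
    """Non-overlapping max pooling with an m x m x m window (assumed cubic)
    over a volume stored as nested lists indexed [z][y][x]."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    return [[[max(volume[z + dz][y + dy][x + dx]
                  for dz in range(m) for dy in range(m) for dx in range(m))
              for x in range(0, width - m + 1, m)]
             for y in range(0, height - m + 1, m)]
            for z in range(0, depth - m + 1, m)]


def mean_squared_error(y_true, y_pred):
    """Standard MSE: average of squared differences (assumed form of the loss)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


volume = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]  # toy 2 x 2 x 2 volume
print(max_pool3d(volume, 2))
print(mean_squared_error([1.0, 0.0], [0.5, 0.5]))
```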
6. The method for predicting survival of colorectal cancer based on deep learning CT image omics as claimed in claim 1, wherein in step (1.5), the dimensionality of the colorectal cancer CT image omics data is reduced as follows: first, the M × K node information of the fully connected layer of the deep neural network feature learning model is selected as the first effective feature dimension reduction, wherein M represents the total number of units and K is the number of nodes; the data are then standardized;
next, the features are further reduced in dimension using least absolute shrinkage and selection operator (Lasso) regression, and the risk coefficient score S of each person is obtained; the Lasso regression loss function is given by:
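The loss formula is not reproduced in this text. As a hedged sketch, the code below assumes the common least-squares form of the Lasso objective, $\frac{1}{2n}\sum_i (y_i - x_i \cdot \beta)^2 + \lambda \sum_j |\beta_j|$, together with column standardization and a linear risk score. The names `standardize`, `lasso_loss`, and `risk_score`, and the choice of this particular objective, are assumptions for illustration rather than the patent's exact formulation.

```python
from statistics import mean, pstdev


def standardize(column):
    """Scale one feature column to zero mean and unit variance."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(value - mu) / sigma for value in column]


def risk_score(row, beta):
    """Linear risk score S: weighted sum of the (standardized) features."""
    return sum(b * x for b, x in zip(beta, row))


def lasso_loss(X, y, beta, lam):
    """Least-squares Lasso objective with L1 penalty weight lam (assumed form)."""
    n = len(X)
    squared_error = sum((yi - risk_score(row, beta)) ** 2 for row, yi in zip(X, y))
    l1_penalty = sum(abs(b) for b in beta)
    return squared_error / (2 * n) + lam * l1_penalty


X = [[1.0, 0.0], [0.0, 1.0]]  # toy feature matrix, not data from the patent
y = [1.0, 0.0]
print(lasso_loss(X, y, beta=[1.0, 0.0], lam=0.5))
```

The L1 penalty drives some coefficients to exactly zero, which is what makes Lasso usable for feature selection: features with a zero coefficient drop out of the risk score.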
7. The method for predicting survival of colorectal cancer based on deep learning CT image omics as claimed in claim 1, wherein in step (1.7) the selected features are evaluated and verified as follows:
(1.7.1) plotting the corresponding KM curves according to the cutoff value T obtained in step (1.6), so that the result is visualized and two survival probability curves are obtained;
(1.7.2) after the survival probability curves are obtained using the KM method, carrying out a chi-square test in data analysis software to obtain a P value;
(1.7.3) judging from the P value whether the two curves differ significantly.
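The evaluation in steps (1.7.1) to (1.7.3) can be sketched without external software. The code below computes a Kaplan-Meier curve per group and compares two groups with the log-rank statistic, which is referred to a chi-square distribution with one degree of freedom. Using the log-rank test as the chi-square test is an assumption, since the claim does not name the specific test, and all function names are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time."""
    curve, survival = [], 1.0
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def logrank_chi2(times_a, events_a, times_b, events_b):
    """Log-rank statistic comparing group A with group B (chi-square, 1 dof)."""
    event_times = sorted({t for t, e in zip(times_a + times_b, events_a + events_b) if e == 1})
    observed_minus_expected, variance = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for ti in times_a if ti >= t)
        n_b = sum(1 for ti in times_b if ti >= t)
        d_a = sum(1 for ti, e in zip(times_a, events_a) if ti == t and e == 1)
        d_b = sum(1 for ti, e in zip(times_b, events_b) if ti == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; groups cannot be compared")
    return observed_minus_expected ** 2 / variance


def chi2_p_value_1dof(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


high_risk = ([1, 2, 3], [1, 1, 1])     # toy (times, events) for the high-risk group
low_risk = ([10, 11, 12], [1, 1, 1])   # toy (times, events) for the low-risk group
stat = logrank_chi2(*high_risk, *low_risk)
print(kaplan_meier(*high_risk), stat, chi2_p_value_1dof(stat))
```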
8. The method for predicting survival of colorectal cancer based on deep learning CT (computed tomography) image omics as claimed in claim 1, wherein in step (1.8), the deep neural network multi-task logistic regression model for predicting the survival period probability is constructed as follows:
(1.8.1) introducing the final effective characteristics obtained in the step (1.5), the time labels and the survival state labels into a deep neural network multitask logistic regression model;
wherein each layer of the deep neural network multi-task logistic regression model uses the following activation function:
Layer #1: $M_1$ neurons using the activation function $h^{(1)}(x) = \mathrm{LeakyReLU}(x)$
Layer #2: $M_2$ neurons using the activation function $h^{(2)}(x) = \mathrm{ReLU}(x)$
Layer #3: $M_3$ neurons using the activation function $h^{(3)}(x) = \mathrm{ReLU}(x)$
wherein LeakyReLU denotes the leaky rectified linear unit function, and ReLU denotes the rectified linear unit function;
the time axis is divided into J time intervals $a_j$, $j = 1, \dots, J$, with $\tau_0 = 0$ and $\tau_J = \infty$, as shown in the following formula:
a logistic regression model is established on each interval $a_j$, with its parameters and a response variable $y_j$ that equals 1 if the event occurs in interval $a_j$ and 0 otherwise;
when a unit experiences an event in interval $a_s$, with $s \in [1, J]$, the state of the remaining intervals stays unchanged; thus, the response vector is described by:
wherein $a_j$ represents a unit time interval of one month, and $y_j$ is the response variable: 1 indicates that the event has occurred and 0 that it has not;
probability density function:
wherein exp() denotes the exponential function with the natural constant e as its base;
survival function:
wherein, for a feature vector $\vec{x} \in \mathbb{R}^p$, the nonlinear transformation of the input outputs a vector whose values are mapped to the J subdivisions of the time axis, as described below:
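The formulas for the response vector, probability density function, and survival function are not reproduced in this text. The sketch below assumes a common multi-task logistic regression formulation: the response vector is 0 before the event interval and 1 from it onward, the score of an event in interval $a_s$ is the dot product of that response vector with the network's J outputs, the density is a softmax over those scores, and the survival function is the probability that the event falls in the current or a later interval. The output vector `psi` and all function names are hypothetical.

```python
import math


def response_vector(s, J):
    """y_j = 0 for intervals before a_s and 1 from a_s onward (s is 1-based)."""
    return [0 if j < s else 1 for j in range(1, J + 1)]


def interval_density(psi):
    """Probability that the event falls in each interval a_1..a_J.

    psi is the assumed length-J output of the network for one patient.
    """
    J = len(psi)
    scores = [sum(y * p for y, p in zip(response_vector(s, J), psi))
              for s in range(1, J + 1)]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(score - top) for score in scores]
    total = sum(weights)
    return [w / total for w in weights]


def survival_function(density):
    """Probability that the event has not occurred before each interval starts."""
    return [sum(density[s:]) for s in range(len(density))]


psi = [0.0, 0.0, 0.0, 0.0]  # toy network output for J = 4 intervals
density = interval_density(psi)
print(response_vector(3, 5), density, survival_function(density))
```

With all outputs equal the density is uniform over the intervals, so the survival curve falls in equal steps. A trained network would shift probability mass toward earlier or later intervals depending on the patient's features.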
(1.8.2) the ratio of the training set to the test set is set to 8:2, and the result is visualized;
(1.8.3) evaluating the discriminative power of the deep neural network multi-task logistic regression model using the concordance index (C-index): the C-index gives an overall evaluation of the discriminative power of the model and ranges from 0 to 1, where 1 corresponds to the optimal prediction model, 0.5 to a random prediction model, and 0 to an inapplicable model; the C-index is calculated as follows:
wherein C-index denotes the concordance index and $\eta_i$ denotes the risk score of unit i; the indicator $1_{T_j < T_i}$ equals 1 if $T_j < T_i$ and 0 otherwise;
(1.8.4) evaluating the accuracy of the deep neural network multi-task logistic regression model using the integrated Brier score (IBS): the IBS ranges from 0 to 1, where 0 is the best possible value and IBS < 0.25 indicates a useful model; the IBS is calculated as follows:
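The IBS formula is not reproduced in this text. As a simplified sketch, the code below assumes that every event time is observed, so the censoring weights used in practice are omitted. It computes the Brier score at each time point as the mean squared difference between the actual survival status and the predicted survival probability, then averages it over a time grid with the trapezoidal rule. The function names and the input layout are assumptions for illustration.

```python
def brier_score(t, times, predicted_survival_at_t):
    """Mean squared difference between actual status at time t and predicted S(t)."""
    n = len(times)
    return sum(((1.0 if T > t else 0.0) - s) ** 2
               for T, s in zip(times, predicted_survival_at_t)) / n


def integrated_brier_score(grid, times, survival_curves):
    """Time-averaged Brier score over the grid (trapezoidal rule).

    survival_curves[i][k] is the predicted survival probability of sample i
    at grid[k]; censoring is ignored in this simplified sketch.
    """
    scores = [brier_score(t, times, [curve[k] for curve in survival_curves])
              for k, t in enumerate(grid)]
    area = sum((grid[k + 1] - grid[k]) * (scores[k] + scores[k + 1]) / 2
               for k in range(len(grid) - 1))
    return area / (grid[-1] - grid[0])


grid = [0, 24, 48]                         # months, illustrative only
times = [12, 36]                           # toy observed event times
uninformative = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
print(integrated_brier_score(grid, times, uninformative))
```

A model that always predicts a survival probability of 0.5 scores exactly 0.25 at every time point, which is why values below 0.25 are read as indicating a useful model.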
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011005022.9A CN112309576A (en) | 2020-09-22 | 2020-09-22 | Colorectal cancer survival period prediction method based on deep learning CT (computed tomography) image omics |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112309576A (en) | 2021-02-02 |
Family
ID=74488430
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011005022.9A Pending CN112309576A (en) | 2020-09-22 | 2020-09-22 | Colorectal cancer survival period prediction method based on deep learning CT (computed tomography) image omics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112309576A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110895817A (en) * | 2019-11-01 | 2020-03-20 | 复旦大学 | MRI image hepatic fibrosis automatic grading method based on image omics analysis |
CN111210441A (en) * | 2020-01-02 | 2020-05-29 | 苏州瑞派宁科技有限公司 | Tumor prediction method and device, cloud platform and computer-readable storage medium |
CN111353998A (en) * | 2020-05-13 | 2020-06-30 | 温州医科大学附属第一医院 | Tumor diagnosis and treatment prediction model and device based on artificial intelligence |
AU2020101581A4 (en) * | 2020-07-31 | 2020-09-17 | Ampavathi, Anusha MS | Lymph node metastases detection from ct images using deep learning |
Non-Patent Citations (1)
Title |
---|
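Liang Meng; Ma Xiaohong; Zhao Xinming: "Research progress of radiomics in the diagnosis and treatment of colorectal cancer liver metastases", Chinese Journal of Medical Imaging, no. 05 * |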
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112820403B (en) * | 2021-02-25 | 2024-03-29 | 中山大学 | Deep learning method for predicting prognosis risk of cancer patient based on multiple sets of learning data |
CN112820403A (en) * | 2021-02-25 | 2021-05-18 | 中山大学 | Deep learning method for predicting prognosis risk of cancer patient based on multiple groups of mathematical data |
CN112927799B (en) * | 2021-04-13 | 2023-06-27 | 中国科学院自动化研究所 | Life analysis system integrating multi-example learning and multi-task depth image histology |
CN112927799A (en) * | 2021-04-13 | 2021-06-08 | 中国科学院自动化研究所 | Life cycle analysis system fusing multi-example learning and multi-task depth imaging group |
CN113313680A (en) * | 2021-05-24 | 2021-08-27 | 华南理工大学 | Colorectal cancer pathological image prognosis auxiliary prediction method and system |
CN113345576A (en) * | 2021-06-04 | 2021-09-03 | 江南大学 | Rectal cancer lymph node metastasis diagnosis method based on deep learning multi-modal CT |
CN113257413A (en) * | 2021-06-22 | 2021-08-13 | 安翰科技(武汉)股份有限公司 | Cancer prognosis survival prediction method and device based on deep learning and storage medium |
CN113689382A (en) * | 2021-07-26 | 2021-11-23 | 北京知见生命科技有限公司 | Tumor postoperative life prediction method and system based on medical images and pathological images |
CN113689382B (en) * | 2021-07-26 | 2023-12-01 | 北京知见生命科技有限公司 | Tumor postoperative survival prediction method and system based on medical images and pathological images |
CN113724876A (en) * | 2021-09-10 | 2021-11-30 | 南昌大学第二附属医院 | Intra-stroke hospital complication prediction model based on multi-mode fusion and DFS-LLE algorithm |
CN114188021A (en) * | 2021-12-13 | 2022-03-15 | 浙江大学 | Intelligent analysis system for children intussusception diagnosis based on multi-mode fusion |
CN114188021B (en) * | 2021-12-13 | 2022-06-10 | 浙江大学 | Intelligent analysis system for children intussusception diagnosis based on multi-mode fusion |
CN114511564B (en) * | 2022-04-19 | 2023-01-24 | 天津市肿瘤医院(天津医科大学肿瘤医院) | Image analysis method for breast cancer residual tumor load based on DCE-MRI |
CN114511564A (en) * | 2022-04-19 | 2022-05-17 | 天津市肿瘤医院(天津医科大学肿瘤医院) | Image analysis method for breast cancer residual tumor load based on DCE-MRI |
CN115762764A (en) * | 2022-11-25 | 2023-03-07 | 中山大学附属第三医院 | HIV negative cryptococcus meningitis treatment outcome prediction model and construction method thereof |
CN117474865A (en) * | 2023-11-01 | 2024-01-30 | 东南大学 | CT image prediction and identification method and system |
CN117334347A (en) * | 2023-12-01 | 2024-01-02 | 北京大学 | Method, device, equipment and storage medium for evaluating treatment effect |
CN117334347B (en) * | 2023-12-01 | 2024-03-22 | 北京大学 | Method, device, equipment and storage medium for evaluating treatment effect |
CN117524486A (en) * | 2024-01-04 | 2024-02-06 | 北京市肿瘤防治研究所 | TTE model establishment method for predicting non-progressive survival probability of postoperative patient |
CN117524486B (en) * | 2024-01-04 | 2024-04-05 | 北京市肿瘤防治研究所 | TTE model establishment method for predicting non-progressive survival probability of postoperative patient |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||