CN112309576A - Colorectal cancer survival period prediction method based on deep learning CT (computed tomography) image omics - Google Patents

Info

Publication number
CN112309576A
Authority
CN
China
Prior art keywords
data
survival
colorectal cancer
model
neural network
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011005022.9A
Other languages
Chinese (zh)
Inventor
潘祥
王孝磊
胡曙东
张衡
吕天旭
谢振平
刘渊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Affiliated Hospital of Jiangsu University
Affiliated Hospital of Jiangnan University
Original Assignee
Jiangnan University
Affiliated Hospital of Jiangnan University

Classifications

    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06T11/203: Drawing of straight lines or curves
    • G06T7/0012: Biomedical image inspection
    • G06T7/11: Region-based segmentation
    • G16H50/30: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G16H50/50: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • G06T2207/10081: Computed x-ray tomography [CT]
    • G06T2207/30028: Colon; Small intestine
    • G06T2207/30096: Tumor; Lesion

Abstract

The invention discloses a method for predicting the survival time of colorectal cancer patients based on deep learning and CT (computed tomography) radiomics (image omics), and belongs to the technical field of medical image processing. The method comprises the following steps: (1) acquiring data; (2) labeling the colorectal tumor region in the CT radiomics data; (3) preprocessing the acquired data; (4) constructing a feature learning model based on a deep neural network; (5) reducing the dimensionality of the deep high-throughput CT radiomics features of colorectal cancer by Lasso regression and establishing a patient risk scoring model; (6) grouping patients according to the risk score; (7) evaluating and verifying the features by means of Kaplan-Meier (KM) curves; (8) constructing a deep neural network multi-task logistic regression (DNN-MTLR) model to predict the survival probability. After a patient's CT image is obtained, it is fed into the system for analysis, and the result can serve as a reference for doctors (especially less experienced radiologists), helping them better understand the patient's condition and decide on the next step.

Description

Colorectal cancer survival period prediction method based on deep learning CT (computed tomography) image omics
Technical Field
The invention belongs to the technical field of medical image processing and can be used for intelligent medical disease diagnosis. It provides a colorectal cancer survival prediction method based on deep learning and CT (computed tomography) radiomics, which ultimately yields the five-year disease-free survival (DFS) probability of colorectal cancer patients.
Background
Colorectal cancer is a common malignant tumor of the gastrointestinal tract with high morbidity and mortality. According to the 2018 global survey of the International Agency for Research on Cancer, colorectal cancer ranks third in incidence, behind only lung cancer and breast cancer, and second in mortality, behind only lung cancer. In China, incidence and mortality are also rising markedly in economically developed regions and along the southeast coast.
Accurate prediction of a patient's survival time has important clinical and social value. For physicians (especially young, less experienced ones), it helps them better understand the patient's condition, make a diagnosis and reach the best medical decision. For patients, it provides a scientifically grounded survival expectation and a better understanding of their own physical condition. Patients can thus be guided to follow a treatment plan, overtreatment is avoided, the financial burden on the family is reduced, and the doctor-patient relationship is improved.
With the development of imaging and artificial intelligence technologies, imaging modalities such as computed tomography (CT), positron emission tomography (PET) and magnetic resonance (MR) play an increasingly important role in the diagnosis and prognosis of tumors. The role of medical imaging is gradually shifting from traditional uses such as disease diagnosis and screening to individualized precision diagnosis and treatment. The mainstream direction of future medicine is precision medicine, which must take individual variability into account in prevention and in the corresponding diagnostic and therapeutic strategies. Combining artificial intelligence with medicine is likewise a necessary path for the future development of medicine; artificial intelligence cannot be realized without machine learning, and deep learning is one of its techniques. Deep learning can be combined with CT radiomics features to predict the survival time of colorectal cancer patients.
Disclosure of Invention
In view of the above problems, the present invention provides a novel method for predicting the survival of colorectal cancer (CRC) patients based on deep learning and CT radiomics, namely a colorectal cancer survival period prediction method based on deep learning CT (computed tomography) radiomics.
The technical scheme of the invention is as follows: the colorectal cancer survival period prediction method based on deep learning CT radiomics comprises the following steps:
step (1.1), data acquisition: the data comprise clinical data and CT radiomics data;
step (1.2), labeling the colorectal tumor region in the CT radiomics data;
step (1.3), preprocessing the acquired data;
step (1.4), constructing a feature learning model based on a deep neural network to obtain deep high-throughput features of the colorectal cancer CT radiomics data;
step (1.5), reducing the dimensionality of the deep high-throughput features of the colorectal cancer CT radiomics data by Lasso regression, and establishing a risk score model of the patient;
step (1.6), according to the radiomics risk score S of each patient, taking the median of the radiomics label scores as the cutoff value T, and dividing the patients into a high-risk survival group and a low-risk survival group;
step (1.7), evaluating and verifying the obtained deep high-throughput features by drawing KM curves and using data analysis software;
step (1.8), constructing a deep neural network multi-task logistic regression model to predict the survival probability.
Further, in step (1.1), specifically:
(1.1.1), clinical data: including the patient's age, sex, survival status (1 or 0, where 1 represents death and 0 represents survival) and the time of interest measured from the date the CT image was taken;
(1.1.2), CT radiomics data: i.e. the CT image data of the patient.
Further, in step (1.2), the colorectal tumor region in the CT radiomics data is labeled as follows: the CT radiomics data are imported into ITK-SNAP in batches, unit by unit (one unit being one patient), and labeled manually in ITK-SNAP by selecting the region of interest where the tumor is located; the labeled data are saved as .nii files.
Further, in the step (1.3), the specific operation steps of preprocessing the acquired data are as follows:
The data are screened, and records are excluded according to the following criteria:
(1.3.1), the clinical record is incomplete; reasons for incompleteness include loss to follow-up, withdrawal and termination of observation;
(1.3.2), observation of survival was cut off by a cause other than a death event;
(1.3.3), the region-of-interest .nii file obtained in step (1.2) is then combined with the original CT radiomics data to extract the region-of-interest features, giving for each unit a three-dimensional feature matrix f(P, P, P) containing the region of interest, where P denotes the size of the matrix along each dimension.
Further, in step (1.4), the deep neural network-based feature learning model is described as follows: the feature matrix containing the region of interest obtained for each unit is used as the network input, of size [M × P × P × P], where M is the total number of units and P is the feature matrix dimension of each unit;
the input is fed into a feature selector for feature selection; the feature selector consists of $N_0$ convolutional layers, $N_0$ pooling layers, a fully connected layer and a logistic regression output layer; the first convolutional layer contains $M_1$ filters and the other convolutional layers contain $M_i$ filters, each filter being of size n × n × n, where n is the filter size;
each convolutional layer is followed by a max pooling operation with a pool size of m × m × m, and each convolutional layer uses a rectified linear unit (ReLU) activation; the loss function is the mean square error:
$$L_{\mathrm{MSE}} = \frac{1}{M}\sum_{m=1}^{M}\left(y_m - \hat{y}_m\right)^2$$
where $y_m$ is the actual value and $\hat{y}_m$ is the predicted value.
Further, in step (1.5), the dimensionality of the colorectal cancer CT radiomics data is reduced as follows: first, the M × K node outputs of the fully connected layer of the deep neural network feature learning model are taken as the first-stage reduced features, where M is the total number of units and K is the number of nodes; the data are then standardized;
next, the features are further reduced by least absolute shrinkage and selection operator (Lasso) regression, which yields the risk coefficient score S of each patient; the Lasso regression loss function is:
$$\min_{w}\ \sum_{i=1}^{M}\left(y_i - w^{\top}x_i\right)^2 + \lambda\lVert w\rVert_1$$
where $x_i$ is the feature vector of unit i, $y_i$ is the time label of unit i, λ is the regularization coefficient and $w$ is the vector of weight coefficients.
Further, in step (1.7), the specific operation steps of curve evaluation and verification for the selected features are as follows:
(1.7.1) KM curves are drawn for the two groups defined by the cutoff value T obtained in step (1.6), which visualizes the result as two survival probability curves;
(1.7.2) after the survival probability curves are obtained by the KM method, a chi-square test is carried out in data analysis software to obtain a P value;
(1.7.3) the P value is used to judge whether the two curves differ significantly.
Further, in step (1.8), the deep neural network multi-task logistic regression model for predicting the survival probability is constructed as follows:
(1.8.1) the final effective features obtained in step (1.5), together with the time labels and survival status labels, are fed into the deep neural network multi-task logistic regression model;
each layer of the model uses the following activation function:
Layer #1: M1 neurons, using the activation function h(1)(x) = LeakyReLU(x)
Layer #2: M2 neurons, using the activation function h(2)(x) = ReLU(x)
Layer #3: M3 neurons, using the activation function h(3)(x) = ReLU(x)
where LeakyReLU denotes the leaky rectified linear unit and ReLU denotes the rectified linear unit;
the time axis is divided into J time intervals $a_j = [\tau_{j-1}, \tau_j)$, with $\tau_0 = 0$ and $\tau_J = \infty$, as shown in the following formula:
Figure BDA0002693556580000042
a logistic regression model is established on each interval $a_j$, with parameters
Figure BDA0002693556580000043
and response variable $y_j$, which is 1 if the event occurs in interval $a_j$ and 0 otherwise;
when a unit experiences an event in interval $a_s$, $s \in [1, J]$, its state in the remaining intervals stays unchanged; the response vector is therefore:
$$Y = (y_1 = 0,\ \ldots,\ y_{s-1} = 0,\ y_s = 1,\ \ldots,\ y_J = 1)$$
where $a_j$ is a unit time interval of one month and $y_j$ is the response variable: 1 means the event has occurred and 0 means it has not;
probability density function:
Figure BDA0002693556580000046
where exp(·) denotes the exponential function with base e;
survival function:
Figure BDA0002693556580000047
where the transformation defined in the following formula image is the nonlinear transformation that takes the feature vector $x \in \mathbb{R}^p$ as its input:
Figure BDA0002693556580000048
its output is a vector of length J whose values are mapped to the J subdivisions of the time axis, as described below:
Figure BDA0002693556580000052
(1.8.2) the ratio of training set to test set is set to 8:2, and the result is visualized;
(1.8.3) the discriminative power of the deep neural network multi-task logistic regression model is evaluated with the concordance index (C-index): the C-index gives an overall assessment of the model's discriminative power and ranges from 0 to 1, where 1 corresponds to the optimal prediction model, 0.5 to a random prediction model and 0 to an inapplicable model; it is calculated as follows:
Figure BDA0002693556580000053
where C-index denotes the concordance index, $\eta_i$ is the risk score of unit i, and $1_{T_j < T_i}$ equals 1 if $T_j < T_i$ and 0 otherwise;
(1.8.4) the accuracy of the deep neural network multi-task logistic regression model is evaluated with the integrated Brier score (IBS): it ranges from 0 to 1, where 0 is the best possible value, and IBS < 0.25 indicates a useful model; the IBS is calculated as follows:
Figure BDA0002693556580000054
Figure BDA0002693556580000055
where IBS denotes the integrated Brier score, used to assess the accuracy of the survival function predicted by the model, N is the number of data samples, and
Figure BDA0002693556580000056
represents the actual probability of the event occurring at time t for sample i.
The invention has the following beneficial effects: it uses deep learning and CT radiomics labels to predict the survival time of colorectal cancer patients; the technique relies on CT imaging, and CT images are easy to obtain clinically; in practice, once a patient's CT image is obtained it is fed into the system for analysis, and the result can serve as a reference for doctors (especially young radiologists with limited experience) so that they better understand the patient's condition and decide on the next step; patients can also better understand their own condition;
CT images contain abundant features, but they are large and comprise many slices, so the data volume is large and many features are redundant; the method reduces the data dimensionality with a DL feature selector followed by least absolute shrinkage and selection operator (Lasso) regression, yielding low-dimensional features that are useful for prediction;
in addition, the invention constructs a deep neural network multi-task logistic regression (DNN-MTLR) model, which provides results similar to those of the CoxPH model without relying on the assumptions the latter requires, and which can be used to estimate the probability of the event of interest occurring within each time interval.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic illustration of the manual labeling of the present invention using ITK-SNAP;
FIG. 3 is a diagram of a model of a DL feature selector network in accordance with the present invention;
FIG. 4 shows the KM curves of the high-risk and low-risk groups in the present invention;
FIG. 5 is a diagram of a DNN-MTLR network model in the present invention;
FIG. 6 is a graph of the results of the present invention using a DNN-MTLR model for prediction;
FIG. 7 is a prediction diagram according to an embodiment of the present invention.
Detailed Description
In order to illustrate the technical solution of the present invention more clearly, a detailed description is given below with reference to the accompanying drawings:
as shown in FIG. 1, the colorectal cancer survival time prediction method based on deep learning CT radiomics ultimately yields the five-year disease-free survival (DFS) probability of colorectal cancer patients, and comprises the following steps:
step (1.1), data acquisition: the data comprise clinical data and CT radiomics data;
step (1.2), labeling the colorectal tumor region in the CT radiomics data;
step (1.3), preprocessing the acquired data;
step (1.4), constructing a feature learning model based on a deep neural network to obtain deep high-throughput features of the colorectal cancer CT radiomics data;
step (1.5), reducing the dimensionality of the deep high-throughput features of the colorectal cancer CT radiomics data by Lasso regression, and establishing a risk score model of the patient;
step (1.6), according to the radiomics risk score S of each patient, taking the median of the radiomics label scores as the cutoff value T, and dividing the patients into a high-risk survival group (S > T) and a low-risk survival group (S < T);
step (1.7), evaluating and verifying the obtained deep high-throughput features by drawing KM curves and using data analysis software;
step (1.8), constructing a deep neural network multi-task logistic regression (DNN-MTLR) model to predict the survival probability.
Further, in step (1.1), specifically:
(1.1.1), clinical data: including the patient's age, sex, survival status (1 or 0, where 1 represents death and 0 represents survival) and the time of interest measured from the date the CT image was taken;
(1.1.2), CT radiomics data: i.e. the CT image data of the patient.
Further, in step (1.2), the colorectal tumor region in the CT radiomics data is labeled as follows: the CT radiomics data are imported into ITK-SNAP in batches, unit by unit (one unit being one patient), and labeled manually in ITK-SNAP by selecting the region of interest where the tumor is located; the labeled data are saved as .nii files; the labeling results are shown in FIG. 2.
Further, in the step (1.3), the specific operation steps of preprocessing the data are as follows:
The data are screened, and records are excluded according to the following criteria:
(1.3.1), the clinical record is incomplete; reasons for incompleteness include loss to follow-up (contact with the patient is lost), withdrawal (the patient leaves the study for reasons unrelated to the study or the treatment) and termination (observation ends once the time specified by the study design has been reached while the subject is still alive);
(1.3.2), observation of survival was cut off by a cause other than a death event;
(1.3.3), the region-of-interest .nii file obtained in step (1.2) is then combined with the original CT image data to extract the region-of-interest features, giving for each unit a three-dimensional feature matrix f(32, 32, 32) containing the region of interest.
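Step (1.3.3) can be illustrated with a minimal sketch. The patent does not state how the labeled region is brought to a fixed size, so the choices here are assumptions: the cube is centered on the bounding box of the labeled voxels, voxels outside the label or outside the volume are set to 0, and the volumes are plain nested lists already loaded from the .nii files. The function name `crop_roi_cube` is hypothetical.

```python
def crop_roi_cube(image, mask, size):
    """Crop a size x size x size cube centred on the labelled region.

    image and mask are nested lists indexed [z][y][x]; mask > 0 marks tumour.
    Assumption: voxels outside the label or outside the volume become 0.
    """
    # Coordinates of every labelled voxel.
    coords = [(z, y, x)
              for z, plane in enumerate(mask)
              for y, row in enumerate(plane)
              for x, value in enumerate(row) if value > 0]
    if not coords:
        raise ValueError("mask contains no labelled voxels")

    # Centre of the bounding box along each axis, then the cube's start corner.
    centre = [(min(c[a] for c in coords) + max(c[a] for c in coords)) // 2
              for a in range(3)]
    start = [c - size // 2 for c in centre]
    dims = (len(image), len(image[0]), len(image[0][0]))

    def voxel(z, y, x):
        inside = 0 <= z < dims[0] and 0 <= y < dims[1] and 0 <= x < dims[2]
        return image[z][y][x] if inside and mask[z][y][x] > 0 else 0

    return [[[voxel(start[0] + i, start[1] + j, start[2] + k)
              for k in range(size)]
             for j in range(size)]
            for i in range(size)]
```

With `size=32` this would produce the f(32, 32, 32) matrix described above; a real pipeline would more likely use an array library and a NIfTI reader, which are deliberately left out of this sketch.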
Further, in step (1.4), as shown in FIG. 3, the deep neural network-based feature learning model is described as follows: the feature matrix containing the region of interest obtained for each unit is used as the network input, of size [M × P × P × P], where M is the total number of units and P is the feature matrix dimension of each unit;
the input is fed into a feature selector for feature selection; the feature selector consists of $N_0$ convolutional layers, $N_0$ pooling layers, a fully connected layer and a logistic regression output layer; the first convolutional layer contains $M_1$ filters and the other convolutional layers contain $M_i$ filters, each filter being of size n × n × n, where n is the filter size;
each convolutional layer is followed by a max pooling operation with a pool size of m × m × m, and each convolutional layer uses a rectified linear unit (ReLU) activation; the loss function is the mean square error (MSE):
$$L_{\mathrm{MSE}} = \frac{1}{M}\sum_{m=1}^{M}\left(y_m - \hat{y}_m\right)^2$$
where $y_m$ is the true value and $\hat{y}_m$ is the predicted value.
Further, in step (1.5), the dimensionality of the colorectal cancer CT radiomics data is reduced as follows: first, the 600 × 6400 node outputs of the fully connected layer of the deep neural network feature learning model are taken as the first-stage reduced features; the data are then standardized;
next, the features are further reduced by least absolute shrinkage and selection operator (Lasso) regression, which yields the risk coefficient score S of each patient; the Lasso regression loss function is:
$$\min_{w}\ \sum_{i=1}^{M}\left(y_i - w^{\top}x_i\right)^2 + \lambda\lVert w\rVert_1$$
where $x_i$ is the feature vector of unit i, $y_i$ is the time label of unit i, λ is the regularization coefficient and $w$ is the vector of weight coefficients.
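As an illustration of the Lasso step, the following sketch fits the weights by cyclic coordinate descent. It assumes the objective is the sum of squared errors plus λ times the L1 norm of the weights, and that the risk score S of a patient is the weighted sum of that patient's standardized features; the patent does not spell out either detail, and the function names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise sum_i (y_i - w.x_i)^2 + lam * sum_k |w_k| by coordinate descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for k in range(n_features):
            column = [row[k] for row in X]
            z = sum(c * c for c in column)
            if z == 0.0:
                continue  # constant-zero feature carries no information
            # Correlation of feature k with the residual that ignores feature k.
            rho = sum(c * (r + c * w[k]) for c, r in zip(column, residual))
            new_wk = soft_threshold(rho, lam / 2.0) / z
            delta = new_wk - w[k]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(column, residual)]
                w[k] = new_wk
    return w


def risk_score(w, x):
    """Assumed form of the risk score S: linear combination of the features."""
    return sum(wk * xk for wk, xk in zip(w, x))
```

Features whose weight is driven exactly to zero drop out of the score, which is how the Lasso acts as a second dimensionality-reduction stage after the fully connected layer.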
Further, in step (1.6), patients are divided into the high-risk survival group (S > T) and the low-risk survival group (S < T) as follows: the risk coefficient scores S of the patients are collected, and the median of the radiomics label scores is used as the cutoff value T, here T = -2.227; patients with S > T form the high-risk group with short survival, and patients with S < T form the low-risk group with long survival.
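The grouping rule can be sketched in a few lines. The text defines only S > T and S < T; assigning a score exactly equal to T to the low-risk group is an assumption made here, and the patient identifiers are hypothetical.

```python
from statistics import median


def split_by_median(scores):
    """Split patients into high- and low-risk groups at the median risk score.

    scores maps a patient identifier to its risk score S.
    Assumption: a score equal to the cutoff is placed in the low-risk group.
    """
    cutoff = median(scores.values())
    high = sorted(pid for pid, s in scores.items() if s > cutoff)
    low = sorted(pid for pid, s in scores.items() if s <= cutoff)
    return cutoff, high, low
```

For four patients with scores -3.0, -2.5, -2.0 and -1.0, the cutoff is -2.25 and the two patients with the larger scores fall into the high-risk group.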
Further, in step (1.7), the specific operation steps of curve evaluation and verification for the selected features are as follows:
(1.7.1) drawing a corresponding KM curve according to the cut-off value T obtained in the step (1.6), so that a result is visualized, and two survival probability curves are obtained;
(1.7.2) after the survival probability curves are obtained by the KM method, direct observation alone is not sufficient to decide whether the curves differ significantly, so a log-rank test is carried out in IBM SPSS Statistics 26 to obtain a P value;
(1.7.3) the P value is used to judge whether the two curves differ significantly; P < 0.05 is generally considered statistically significant; the result obtained was P < 0.01, i.e. the difference is statistically significant.
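The patent performs this step in SPSS; purely as an illustration of what the software computes, the sketch below implements the Kaplan-Meier product-limit estimator and a two-group log-rank test. Event indicators follow the document's coding (1 = death, 0 = censored). The function names are hypothetical, and the P value uses the chi-square distribution with one degree of freedom.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator)."""
    curve, survival = [], 1.0
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value)."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        n_a = sum(1 for u in times_a if u >= t)  # at risk in group A
        n = sum(1 for u in times if u >= t)      # at risk overall
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1.0 - n_a / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Calling `kaplan_meier` once per risk group gives the two curves of step (1.7.1), and `logrank_test` on the same two groups gives the P value used in step (1.7.3).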
Further, in step (1.8), the deep neural network multi-task logistic regression (DNN-MTLR) model for predicting the survival probability is constructed as follows:
(1.8.1) the final effective features obtained in step (1.5), together with the time labels and survival status labels, are fed into the deep neural network multi-task logistic regression (DNN-MTLR) model; the DNN-MTLR model is shown in FIG. 5;
each layer uses the following activation function:
Layer #1: 326 neurons, using the activation function h(1)(x) = LeakyReLU(x)
Layer #2: 652 neurons, using the activation function h(2)(x) = ReLU(x)
Layer #3: 1304 neurons, using the activation function h(3)(x) = ReLU(x)
where LeakyReLU is the leaky rectified linear unit and ReLU is the rectified linear unit;
the time axis is divided into J time intervals $a_j = [\tau_{j-1}, \tau_j)$, with $\tau_0 = 0$ and $\tau_J = \infty$, as shown in the following formula:
Figure BDA0002693556580000091
a logistic regression model is established on each interval $a_j$, with parameters
Figure BDA0002693556580000092
and response variable $y_j$, which is 1 if the event occurs in interval $a_j$ and 0 otherwise; however, since the effects of repeated events are not analyzed, it must be ensured that when a unit experiences an event in interval $a_s$, $s \in [1, J]$, its state in the remaining intervals stays unchanged; the response vector is therefore:
$$Y = (y_1 = 0,\ \ldots,\ y_{s-1} = 0,\ y_s = 1,\ \ldots,\ y_J = 1)$$
where $a_j$ is a unit time interval of one month and $y_j$ is the response variable: 1 means the event has occurred and 0 means it has not;
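The response vector described above can be built as in the following sketch, which covers only units with an observed event (censored units need separate handling that the text does not describe). The list `edges` holds the finite interval boundaries τ_0 … τ_{J-1}, the last interval being open-ended; treating each interval as closed on the left is an assumption, and the function name is hypothetical.

```python
from bisect import bisect_right


def encode_response(event_time, edges):
    """Encode an event time as Y = [0, ..., 0, 1, ..., 1].

    edges = [tau_0, ..., tau_{J-1}]; interval j is [tau_{j-1}, tau_j) and the
    last interval extends to infinity. The first 1 sits at the interval that
    contains event_time, and every later entry stays 1.
    """
    s = bisect_right(edges, event_time)  # 1-based index of the interval
    if s == 0:
        raise ValueError("event_time lies before tau_0")
    n_intervals = len(edges)
    return [0] * (s - 1) + [1] * (n_intervals - s + 1)
```

With monthly boundaries `[0, 1, 2, 3]`, an event at 1.5 months is encoded as `[0, 1, 1, 1]`.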
probability density function:
Figure BDA0002693556580000095
where exp(·) denotes the exponential function with base e;
survival function:
Figure BDA0002693556580000096
where the transformation defined in the following formula image is the nonlinear transformation that takes the feature vector $x \in \mathbb{R}^p$ as its input:
Figure BDA00026935565800000911
its output is a vector of length J whose values are mapped to the J subdivisions of the time axis, as described below:
Figure BDA00026935565800000910
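Because the formula images for the density and survival functions are not reproduced in this text, the following sketch is based on the commonly published multi-task logistic regression formulation rather than on the patent's exact equations. It assumes that the network outputs one score per interval (called `psi` here, a hypothetical name), that the unnormalized score of "event in interval s" is the sum of the outputs for intervals s to J, which matches the response vector with ones from position s onward, and that the survival probability at the start of interval s is the total density of intervals s to J.

```python
import math


def mtlr_density_and_survival(psi):
    """Turn J interval scores into a density and a survival curve.

    Assumed formulation: score_s = sum_{j >= s} psi_j, density = softmax(score),
    survival[s] = P[T >= start of interval s] = sum_{k >= s} density[k].
    """
    n_intervals = len(psi)
    # Suffix sums of the scores, one per possible event interval.
    suffix, running = [0.0] * n_intervals, 0.0
    for j in range(n_intervals - 1, -1, -1):
        running += psi[j]
        suffix[j] = running
    # Softmax with the maximum subtracted for numerical stability.
    shift = max(suffix)
    weights = [math.exp(v - shift) for v in suffix]
    normaliser = sum(weights)
    density = [wt / normaliser for wt in weights]
    # Tail sums of the density give the survival function.
    survival, tail = [0.0] * n_intervals, 0.0
    for k in range(n_intervals - 1, -1, -1):
        tail += density[k]
        survival[k] = tail
    return density, survival
```

The density sums to 1 over the J intervals and the survival curve starts at 1 and never increases, which are the two properties any reconstruction of the missing formulas would need to satisfy.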
(1.8.2) the ratio of the training set to the test set is set to 8:2, and the result is visualized;
(1.8.3) evaluating the discriminative power of the DNN-MTLR model using the concordance index (C-index): the C-index represents the overall evaluation of the discriminative power of the deep neural network multi-task logistic regression model, and the C-index obtained is 0.82; a value of 1 corresponds to the optimal prediction model, a value of 0.5 to a random prediction model, and a value of 0 to an inapplicable model. The calculation formula of the C-index is as follows:
Figure BDA0002693556580000101
where C-index denotes the concordance index and η_i denotes the risk score of unit i; the indicator 1_{T_j < T_i} equals 1 if T_j < T_i and 0 otherwise;
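The C-index can be illustrated with a short Python sketch. Since the formula is only available as a figure, this follows the usual pairwise definition built from the indicator described above: a pair (i, j) is comparable when unit j experienced the event and T_j < T_i, and it is concordant when the earlier failure has the higher risk score. Counting tied risk scores as 0.5 is an assumption of this example.

```python
def concordance_index(times, events, risks):
    """Pairwise concordance between observed times and risk scores.

    times[i]  : observed time of unit i
    events[i] : 1 if the event was observed, 0 if the observation was cut off
    risks[i]  : risk score of unit i (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Pair is comparable only if j failed, and failed before time T_i.
            if events[j] == 1 and times[j] < times[i]:
                comparable += 1
                if risks[j] > risks[i]:
                    concordant += 1.0
                elif risks[j] == risks[i]:
                    concordant += 0.5  # assumed handling of tied scores
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A perfectly ordered set of risk scores yields 1.0 and a perfectly reversed one yields 0.0, which matches the interpretation of the value range given in step (1.8.3).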
(1.8.4) evaluating the accuracy of the deep neural network multi-task logistic regression (DNN-MTLR) model using the Integrated Brier Score (IBS): the IBS assesses the accuracy of the survival function predicted by the model, and the IBS obtained is 0.06, where 0 is the best possible value and IBS < 0.25 indicates a useful model; the IBS calculation formula is as follows:
Figure BDA0002693556580000102
Figure BDA0002693556580000103
where IBS denotes the integrated Brier score, used to assess the accuracy of the survival function predicted by the model; N is the number of data samples, and
Figure BDA0002693556580000104
represents the actual probability of the event occurring at time t for sample i.
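The following Python sketch shows one way the Brier score and its time integral could be computed. It is a simplification under stated assumptions: every observed time is treated as an event time (no weighting for censored observations), the integral is approximated with the trapezoidal rule on a user-supplied time grid, and the function names are hypothetical.

```python
def brier_score(times, predicted_survival, t):
    """Mean squared gap between the observed status at time t and the prediction.

    predicted_survival[i] is the predicted probability that unit i survives past t.
    """
    total = 0.0
    for time_i, prediction in zip(times, predicted_survival):
        observed = 1.0 if time_i > t else 0.0  # 1 while the unit is still alive
        total += (observed - prediction) ** 2
    return total / len(times)


def integrated_brier_score(times, predictions, grid):
    """Average the Brier score over the time grid (trapezoidal rule).

    predictions[i][k] is the predicted survival of unit i at time grid[k].
    """
    scores = [
        brier_score(times, [row[k] for row in predictions], t)
        for k, t in enumerate(grid)
    ]
    area = 0.0
    for k in range(1, len(grid)):
        area += 0.5 * (scores[k] + scores[k - 1]) * (grid[k] - grid[k - 1])
    return area / (grid[-1] - grid[0])
```

A model that always predicts a survival probability of 0.5 scores exactly 0.25 under this definition, which is consistent with the threshold IBS < 0.25 quoted for a useful model.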
The specific embodiment is as follows:
(1) Data acquisition: the CT image omics data of patient A are obtained.
(2) The colorectal tumor region is labeled in the CT image omics data.
(3) The acquired data are preprocessed to obtain the time and state labels.
(4) A feature learning model based on a deep neural network is constructed to obtain the deep high-flux CT image omics features of patient A.
(5) The dimensionality of the deep high-flux features of the CT image omics data is reduced using Lasso regression, and a risk scoring model for patient A is established.
(6) Patient A is classified into the high-risk group according to patient A's image omics risk score.
(7) The obtained deep high-flux features are evaluated and verified.
(8) The features obtained by dimensionality reduction are fed into the deep neural network multi-task logistic regression model to predict the survival period probability, and the final prediction result is obtained.
The results are shown in FIG. 7; the survival probabilities at yearly intervals are listed in the following table:
Month                  12          24          36          48          60
Survival probability   97.649882%  93.804818%  85.995414%  66.933918%  48.437748%
The results show that such methods can be used for survival prediction in colorectal cancer patients.

Claims (8)

1. The colorectal cancer survival period prediction method based on deep learning CT (computed tomography) image omics is characterized by comprising the following specific steps of:
step (1.1), data acquisition: the data comprises clinical data and CT imaging omics data;
step (1.2), carrying out colorectal tumor region labeling on CT image omics data;
step (1.3), preprocessing the acquired data;
step (1.4), constructing a feature learning model based on a deep neural network to obtain deep high-flux features of the colorectal cancer CT image omics data;
step (1.5), reducing dimensions by utilizing the Lasso regression to the deep high-flux characteristics of the colorectal cancer CT image omics data, and establishing a risk score model of a patient;
step (1.6), according to the image omics risk score S of the patient, obtaining a cutoff value T as the median of the image omics label score values, and dividing the patients into a high-risk survival group and a low-risk survival group;
step (1.7), evaluating and verifying the obtained deep high-flux features by drawing Kaplan-Meier (KM) curves using data analysis software;
step (1.8), constructing a deep neural network multi-task logistic regression model to predict the survival period probability.
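The median-based grouping of step (1.6) can be sketched in Python as follows. Assigning scores strictly above the cutoff to the high-risk group is an assumption of this example, since the claim does not say how a score equal to the cutoff is handled; the function name is hypothetical.

```python
from statistics import median


def split_by_median(risk_scores):
    """Return the cutoff T (median score) and a risk-group label per patient."""
    cutoff = median(risk_scores)
    # Assumed convention: strictly above the cutoff means high risk.
    groups = ["high" if score > cutoff else "low" for score in risk_scores]
    return cutoff, groups
```

For the scores 0.2, 0.9, 0.5 and 0.7, the cutoff is the median of the four values and the groups are low, high, low, high.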
2. The method for predicting survival of colorectal cancer based on deep learning CT imaging omics as claimed in claim 1, wherein in step (1.1), specifically:
(1.1.1), clinical data: including the age and sex of the patient, the survival status (1 or 0, where 1 represents death and 0 represents survival), and the time of interest elapsed since the CT image was taken;
(1.1.2), CT imaging omics data: i.e. the CT image data taken of the patient.
3. The method for predicting survival of colorectal cancer based on deep learning CT imaging omics as claimed in claim 1, wherein in step (1.2), the specific operation manner of labeling the colorectal tumor region for the CT imaging omics data is as follows: the CT image omics data are imported into ITK-SNAP in batches, in unit order, and labeled manually in ITK-SNAP; the region of interest where the tumor is located is selected, and the labeled CT image omics data are saved as an .nii file.
4. The method for predicting survival of colorectal cancer based on deep learning CT imaging omics as claimed in claim 1, wherein in said step (1.3), the specific operation steps of preprocessing the obtained data are as follows:
pre-screening the data and deleting records, wherein the exclusion criteria are as follows:
(1.3.1) the clinical information record is incomplete, where the reasons for incompleteness include loss to follow-up, withdrawal, and termination;
(1.3.2) the survival observation process was cut off due to causes other than a death event;
(1.3.3) obtaining the region-of-interest .nii file according to step (1.2), and extracting the region-of-interest features in combination with the original CT image omics data, so that each unit yields a three-dimensional feature matrix f(P, P, P) containing the region of interest, where P represents the size of the matrix.
5. The method for predicting survival of colorectal cancer based on deep learning CT imaging omics as claimed in claim 1, wherein in step (1.4), the deep neural network-based feature learning model is specifically as follows: the feature matrix containing the region of interest obtained for each unit is used as the input of the network, the size of the feature matrix being [M × P], where M represents the total number of units and P represents the feature matrix dimension of each unit;
the input is fed into a feature selector for feature selection; the feature selector is composed of N_0 convolution layers, N_0 pooling layers, a fully connected layer and a logistic regression output layer; the first convolution layer comprises M_1 filters and the other convolution layers comprise M_i filters, where the filter size is n × n × n and n represents the filter size;
each convolution layer is followed by a max pooling operation with a pool size of m × m, and each convolution layer uses a rectified linear unit (ReLU) activation function; the loss function is the mean square error, given by:
Figure FDA0002693556570000021
where y_m represents the actual value and
Figure FDA0002693556570000022
represents the predicted value.
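For reference, a mean square error of the kind named in claim 5 can be computed as in the Python sketch below. The averaging over the number of samples is the usual convention and is assumed here, because the formula itself is only available as a figure.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return sum(squared) / len(squared)
```

For actual values 1, 0, 1 and predictions 0.5, 0.5, 1.0 the squared errors are 0.25, 0.25 and 0, giving a mean of about 0.167.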
6. The method for predicting survival of colorectal cancer based on deep learning CT imaging omics as claimed in claim 1, wherein in step (1.5), the specific operation method for effective dimension reduction of the colorectal cancer CT imaging omics data is as follows: first, the M × K node information of the fully connected layer of the deep neural network feature learning model is selected as the first effective feature dimension reduction, where M represents the total number of units and K is the number of nodes; the data are then standardized;
next, further effective dimension reduction of the features is performed using least absolute shrinkage and selection operator (Lasso) regression, and the risk coefficient score S of each person is obtained; the Lasso regression loss function is given by:
Figure FDA0002693556570000023
where x_i represents the feature label of each unit, y_i represents the time label of each unit, λ represents the regularization coefficient, and
Figure FDA0002693556570000024
represents the weight coefficients.
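The Lasso objective and the resulting risk score can be sketched in Python as below. The loss formula is only available as a figure, so the 1/(2M) scaling of the squared error term is an assumption borrowed from a common convention; computing the risk score S as the weighted sum of a unit's features, and all function names, are also assumptions of this example.

```python
def lasso_loss(features, targets, weights, lam):
    """Squared error term plus an L1 penalty on the weight coefficients."""
    n = len(targets)
    squared_error = 0.0
    for x_i, y_i in zip(features, targets):
        prediction = sum(w * x for w, x in zip(weights, x_i))
        squared_error += (y_i - prediction) ** 2
    l1_penalty = sum(abs(w) for w in weights)
    # Assumed scaling: 1 / (2 * n) on the squared error term.
    return squared_error / (2.0 * n) + lam * l1_penalty


def risk_score(feature_row, weights):
    """Assumed risk score S: weighted sum of one unit's features."""
    return sum(w * x for w, x in zip(weights, feature_row))


def selected_features(weights, tolerance=1e-12):
    """Indices of features whose coefficient was not shrunk to zero."""
    return [k for k, w in enumerate(weights) if abs(w) > tolerance]
```

The L1 penalty drives some coefficients to exactly zero, so `selected_features` returns the indices that survive the dimension reduction.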
7. The method for predicting survival of colorectal cancer based on deep learning CT imaging omics as claimed in claim 1, wherein the specific operation steps of curve evaluation and verification of the selected features in step (1.7) are as follows:
(1.7.1) drawing a corresponding KM curve according to the cut-off value T obtained in the step (1.6), so that a result is visualized, and two survival probability curves are obtained;
(1.7.2) after different survival probability curves are obtained by using a KM method, chi-square test is carried out through data analysis software, and finally a P value is obtained;
(1.7.3) judging whether the two curves have significant difference according to the P value.
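The evaluation in steps (1.7.1) to (1.7.3) can be illustrated with a minimal pure-Python sketch of the Kaplan-Meier estimator and a two-group log-rank test, whose statistic follows a chi-square distribution with one degree of freedom. The claim only refers to a chi-square test carried out in data analysis software, so the choice of the log-rank test and the function names here are assumptions.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time."""
    curve = []
    survival = 1.0
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) for 1 degree of freedom."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({u for u, e in zip(all_times, all_events) if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u in times_a if u >= t)   # at risk in group A
        n = sum(1 for u in all_times if u >= t)   # at risk overall
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e == 1)
        d = sum(1 for u, e in zip(all_times, all_events) if u == t and e == 1)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value
```

A small P value (for example below 0.05, a conventional threshold not stated in the claim) would suggest that the high-risk and low-risk curves differ significantly.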
8. The method for predicting survival of colorectal cancer based on deep learning CT imaging omics as claimed in claim 1, wherein in step (1.8), the specific operation steps of constructing a deep neural network multi-task logistic regression model for predicting the survival period probability are as follows:
(1.8.1) feeding the final effective features obtained in step (1.5), together with the time labels and the survival state labels, into the deep neural network multi-task logistic regression model;
wherein each layer of the deep neural network multi-task logistic regression model uses the following activation function:
Layer #1: M1 neurons, using the activation function h(1)(x)=LeakyReLU(x)
Layer #2: M2 neurons, using the activation function h(2)(x)=ReLU(x)
Layer #3: M3 neurons, using the activation function h(3)(x)=ReLU(x)
where LeakyReLU represents the leaky rectified linear unit function and ReLU represents the rectified linear unit function (without leakage);
the time axis is divided into J time intervals such that
Figure FDA0002693556570000031
with τ_0 = 0 and τ_J = ∞, as shown in the following formula:
Figure FDA0002693556570000032
on each interval a_j a logistic regression model is established, with parameters
Figure FDA0002693556570000033
and response variable
Figure FDA0002693556570000034
that is, the response is 1 if the event occurs in interval a_j and 0 otherwise;
once a unit experiences an event in interval a_s, s ∈ [1, J], its state in the remaining intervals remains unchanged; thus, the response vector is described by:
Figure FDA0002693556570000041
where a_j represents a unit time interval of one month, and y_j is the response variable: 1 indicates that the event has occurred and 0 indicates that it has not;
probability density function:
Figure FDA0002693556570000042
where exp() denotes the exponential function with base e, the natural constant;
survival function:
Figure FDA0002693556570000043
where
Figure FDA0002693556570000049
Figure FDA0002693556570000044
takes x ∈ R^p
Figure FDA0002693556570000045
as the input feature vector and is the nonlinear transformation of that input; its output is a
Figure FDA0002693556570000046
vector whose values are mapped to the J subdivisions of the time axis, as described below:
Figure FDA0002693556570000047
(1.8.2) the ratio of the training set to the test set is set to 8:2, and the result is visualized;
(1.8.3) evaluating the discriminative power of the deep neural network multi-task logistic regression model using the concordance index (C-index): the concordance index represents the overall evaluation of the discriminative power of the deep neural network multi-task logistic regression model and ranges from 0 to 1, where a value of 1 corresponds to the optimal prediction model, a value of 0.5 to a random prediction model, and a value of 0 to an inapplicable model; the concordance index is calculated as follows:
Figure FDA0002693556570000048
where C-index denotes the concordance index and η_i denotes the risk score of unit i; the indicator 1_{T_j < T_i} equals 1 if T_j < T_i and 0 otherwise;
(1.8.4) evaluating the accuracy of the deep neural network multi-task logistic regression model using the integrated Brier score (IBS): its value ranges between 0 and 1, where 0 is the best possible value and IBS < 0.25 indicates a useful model; the IBS calculation formula is as follows:
Figure FDA0002693556570000051
Figure FDA0002693556570000052
where IBS denotes the integrated Brier score, used to assess the accuracy of the survival function predicted by the model; N is the number of data samples, and
Figure FDA0002693556570000053
represents the actual probability of the event occurring at time t for sample i.
CN202011005022.9A 2020-09-22 2020-09-22 Colorectal cancer survival period prediction method based on deep learning CT (computed tomography) image omics Pending CN112309576A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011005022.9A CN112309576A (en) 2020-09-22 2020-09-22 Colorectal cancer survival period prediction method based on deep learning CT (computed tomography) image omics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011005022.9A CN112309576A (en) 2020-09-22 2020-09-22 Colorectal cancer survival period prediction method based on deep learning CT (computed tomography) image omics

Publications (1)

Publication Number Publication Date
CN112309576A true CN112309576A (en) 2021-02-02

Family

ID=74488430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011005022.9A Pending CN112309576A (en) 2020-09-22 2020-09-22 Colorectal cancer survival period prediction method based on deep learning CT (computed tomography) image omics

Country Status (1)

Country Link
CN (1) CN112309576A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination