WO2022110278A1 - A *** for risk assessment of pulmonary nodules - Google Patents

Publication number
WO2022110278A1
Authority
WO
WIPO (PCT)
Prior art keywords
patient
pulmonary
data
results
model
Prior art date
Application number
PCT/CN2020/133952
Other languages
English (en)
French (fr)
Inventor
叶莘
范献军
周燕玲
陈燕慈
黄萌
张俊成
石剑峰
Original Assignee
珠海圣美生物诊断技术有限公司
珠海横琴圣澳云智科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 珠海圣美生物诊断技术有限公司, 珠海横琴圣澳云智科技有限公司
Publication of WO2022110278A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10081 Computed x-ray tomography [CT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30061 Lung
    • G06T2207/30064 Lung nodule
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30096 Tumor; Lesion

Definitions

  • the present disclosure relates to the technical field of medical data processing, and in particular, to a system for risk assessment of pulmonary nodules.
  • Sarcoidosis is a multi-system and multi-organ granulomatous disease of unknown etiology, which often invades the lungs, bilateral hilar lymph nodes, eyes or skin and other organs. It is distributed around the world, with a higher incidence in European and American countries, and is rare in eastern ethnic groups.
  • Sarcoidosis is the result of the interplay between unknown antigens and the body's cellular and humoral immune functions. Because of individual differences (age, gender, race, genetic factors, hormones or HLA) and the regulation of the immune response, the development and regression of granulomas are determined by the imbalance between the promoting and antagonistic factors involved, which gives rise to the different pathological states of sarcoidosis and its tendency to spontaneous remission.
  • Pulmonary sarcoid granulomas are seen on histological sections as aggregations of epithelioid cells with multinucleated macrophages, surrounded by lymphocytes and without caseous lesions.
  • Inclusion bodies, such as oval Schaumann bodies, birefringent crystals and asteroid bodies, can be seen in the vesicles of macrophages.
  • the primary lesions of pulmonary sarcoidosis are extensive alveolitis infiltrated by monocytes, macrophages, or lymphocytes, involving the alveolar walls and interstitium. Both alveolitis and granulomas may resolve on their own.
  • The fibroblasts surrounding the granuloma undergo collagenization and hyalinization, resulting in nonspecific fibrosis.
  • The histomorphological manifestations of granulomas are not characteristic: they can also be seen in mycobacterial and fungal infections, in tissue reactions to foreign bodies or trauma, and in beryllium disease, tertiary syphilis, lymphoma, exogenous allergic alveolitis and other conditions, which should be differentiated.
  • the disease can be diagnosed.
  • Common pulmonary nodules are mostly benign, and malignant pulmonary nodules can be diagnosed as lung cancer. Therefore, pulmonary nodules are usually the main early manifestations of lung cancer. Accurate detection of pulmonary nodules is of great significance for the early diagnosis and treatment of lung cancer.
  • Pulmonary nodules span a wide size range, from large nodules over 5 cm to sub-centimeter nodules. Large pulmonary nodules can be detected with traditional medical imaging, whereas small pulmonary nodules (3-10 mm), when assessed only from traditional medical images, are judged differently depending on physician experience, hospital conditions or data silos, which leads to a large number of clinical misdiagnoses.
  • CT: computed tomography
  • The sample size used in existing CT imaging AI technology is relatively small and the threshold is usually set low, so the reported detection accuracy and sensitivity are higher than the actual values, and false positives are common in the results. Expert review is required before the final result can be determined. In practice this does not improve detection efficiency and instead increases the detection cost.
  • CAC: circulating abnormal cells
  • CAC detection has a relatively consistent detection rate across different types and stages of lung cancer, and its identification accuracy for pulmonary nodules smaller than 5-10 mm, which are small and difficult to analyze and identify from images, is also above 70%. CAC detection results can therefore serve as an effective supplement to CT image AI analysis, checking its results and filling in the gaps. For example, Dr. Xu Tao of the Affiliated Hospital of Qingdao University used the CAC test results provided by our company to correct CT-AI analysis results: among 11 samples that CT-AI judged to be low-risk but that clinicians tended to treat aggressively, eight high-risk samples were found.
  • the present disclosure provides a system for risk assessment of pulmonary nodules.
  • Through machine self-learning, the system not only combines the patient's lesion image results with the patient's CAC detection data for pulmonary nodule risk assessment, but also integrates the patient's risk factors, which significantly improves the accuracy of the risk assessment of the patient's pulmonary nodules.
  • the present disclosure provides a system for risk assessment of pulmonary nodules, comprising:
  • a data acquisition module configured to acquire the patient's lesion imaging results, the patient's CAC detection data and the patient's risk factors
  • a data processing module configured to preprocess the data acquired by the data acquisition module, and the output result of the preprocessing matches the pulmonary nodule risk assessment module;
  • the pulmonary nodule risk assessment module is configured to use the pulmonary nodule risk assessment model constructed by applying machine learning to calculate the preprocessing output result of the data processing module to obtain the pulmonary nodule risk result.
  • the risk factors of the patient include one or a combination of two or more of the patient's gender, age, family history of tumor or smoking history.
  • the data processing module is configured to: convert the lesion image results of the patient into lesion image analysis data through artificial intelligence calculation, and output the malignant probability of the patient's pulmonary nodules; convert the patient's gender into corresponding gender identifiers; converting the family tumor history into a corresponding family tumor history identifier; converting the smoking history into a corresponding smoking history identifier.
  • the gender identifier of male patients is 1, and the gender identifier of female patients is 0;
  • the family tumor history identifier of patients with a family tumor history is 1, and the family tumor history identifier of patients with no family tumor history is 0;
  • the smoking history identifier of patients with a smoking history is 1, and the smoking history identifier of patients without a smoking history is 0.
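  • The identifier conversion described above can be sketched in a few lines. This is only an illustration: the `Patient` record, its field names and the output keys are hypothetical and do not come from the disclosure; only the 0/1 coding follows the text.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record layout; field names are assumptions for illustration.
    sex: str                    # "male" or "female"
    age: int                    # age in years
    family_tumor_history: bool
    smoking_history: bool


def encode_risk_factors(p: Patient) -> dict:
    """Convert risk factors into the 0/1 identifiers described in the text."""
    if p.sex not in ("male", "female"):
        raise ValueError(f"unsupported sex value: {p.sex!r}")
    return {
        "gender_id": 1 if p.sex == "male" else 0,
        "age": p.age,  # age is passed through unchanged
        "family_history_id": 1 if p.family_tumor_history else 0,
        "smoking_id": 1 if p.smoking_history else 0,
    }
```

  • For example, `encode_risk_factors(Patient("male", 63, False, True))` yields a gender identifier of 1, a family history identifier of 0 and a smoking identifier of 1.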
  • the image results of the patient's lesions include one or a combination of two or more of low-dose helical CT scans, thin-slice helical CT scans, X-ray chest radiographs or positron emission computed tomography.
  • the lesion image analysis method includes: first performing 3D topology reconstruction on the obtained image, then performing 3D segmentation of the nodule boundary, then extracting the features of the nodule image and performing composition, correlation and clustering analysis on the extracted features, and finally calculating the malignant probability by artificial intelligence according to existing nodule image assessment guidelines and annotated image data.
  • the patient's CAC detection data includes the number of circulating abnormal cells per 10,000 mononuclear cells obtained by the patient's CAC detection.
  • the sample used in the CAC detection of the patient includes one or a combination of two or more of the patient's blood, pleural and ascites fluid, bronchoalveolar lavage fluid, urine, saliva or cerebrospinal fluid.
  • the pulmonary nodule risk assessment module adopts a logistic regression model, and calculates the patient's pulmonary nodule risk according to the preprocessing result;
  • the preprocessing result includes the patient's lesion image analysis data, the patient's CAC detection data, the age of the patient and the gender identifier of the patient;
  • the logistic regression model calculation formula is: $P = \dfrac{1}{1 + e^{-(\beta_0 + \beta^{T} X)}}$
  • where $X^{T} = (x_1, x_2, x_3, x_4)$ is the independent variable matrix, comprising the patient's lesion image analysis data $x_1$, the patient's CAC detection data $x_2$, the patient's age $x_3$ and the patient's gender identifier $x_4$; $\beta^{T} = (\beta_1, \beta_2, \beta_3, \beta_4)$ is the coefficient matrix corresponding to the independent variable matrix $X$; $\beta_0$ is a constant coefficient; and $P$ is the malignant probability of the patient's pulmonary nodule. The calculated $P$ is compared with the preset classification threshold to obtain a comparison result, and the benign or malignant label of the patient's pulmonary nodule is output based on that result; the benign label is 0 and the malignant label is 1.
  • the classification threshold is 0.5-0.8.
  • the classification threshold is 0.6.
  • $\beta_1$ is any value from 3.08 to 15.05, preferably 7.92;
  • $\beta_2$ is any value from -0.12 to 0.40, preferably 0.10;
  • $\beta_3$ is any value from -0.03 to 0.16, preferably 0.06;
  • $\beta_4$ is any value from -7.72 to -1.43, preferably -3.9;
  • $\beta_0$ is any value from -12.60 to 1.18, preferably -4.94.
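  • As a worked illustration of the logistic regression scoring described above, the sketch below plugs the stated preferred coefficients and the 0.6 threshold into the standard logistic function. Several points are assumptions rather than statements of the disclosure: the coefficients are named beta here, the image analysis value $x_1$ is taken as a fraction between 0 and 1, age is in years, and a probability exactly equal to the threshold is labeled malignant.

```python
import math

# Preferred values quoted in the text; the names BETA_0 / BETA are assumed.
BETA_0 = -4.94
BETA = (7.92, 0.10, 0.06, -3.9)  # x1 image data, x2 CAC data, x3 age, x4 gender id
THRESHOLD = 0.6                  # preferred classification threshold


def malignant_probability(x1: float, x2: float, x3: float, x4: int) -> float:
    """Standard logistic function applied to beta_0 + beta^T x."""
    z = BETA_0 + sum(b * x for b, x in zip(BETA, (x1, x2, x3, x4)))
    return 1.0 / (1.0 + math.exp(-z))


def classify(x1: float, x2: float, x3: float, x4: int,
             threshold: float = THRESHOLD) -> int:
    """Return 1 (malignant) or 0 (benign); tie handling (>=) is an assumption."""
    return 1 if malignant_probability(x1, x2, x3, x4) >= threshold else 0
```

  • With hypothetical inputs (image probability 0.9, CAC count 5, age 60, male), the linear term is -4.94 + 7.128 + 0.5 + 3.6 - 3.9 = 2.388, which gives a probability of about 0.916 and therefore the malignant label 1.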
  • the pulmonary nodule risk assessment module adopts a decision tree model, which uses the patient's lesion image analysis data, the patient's CAC detection data, the patient's age and the patient's gender identifier as classification features and classifies the patient's risk of pulmonary nodules according to the preprocessing results.
  • the decision depth of the decision tree is 2-7.
  • the decision depth of the decision tree is 4.
  • the decision depth of the decision tree is 7.
  • the pulmonary nodule risk assessment module adopts a random forest model, which builds 100-1000 decision trees to classify the patient's risk of pulmonary nodules and calculates the malignant probability of the patient's pulmonary nodules from the classification results of those 100-1000 decision trees; each decision tree randomly selects 2-4 preprocessing results from the patient's lesion image analysis data, the patient's CAC detection data, the patient's age and the patient's gender identifier as partition features.
  • the number of decision trees is 300.
  • the present disclosure provides a training method for the pulmonary nodule risk assessment model adopted by the pulmonary nodule risk assessment module in the system described in the foregoing embodiments. The training method comprises: inputting the acquired lesion image analysis data of patients with known pathological results, the patients' CAC detection data, the identifiers obtained by converting the patients' risk factors, and the patients' pathological detection data into a preset model as self-learning samples, and obtaining characteristic parameters through self-learning to determine the pulmonary nodule risk assessment model.
  • when the pulmonary nodule risk assessment model is a logistic regression model, the characteristic parameters obtained by self-learning include a coefficient matrix, a constant coefficient and a classification threshold.
  • when the pulmonary nodule risk assessment model is a decision tree model, the characteristic parameters obtained by self-learning include the division characteristic value of the root node and the division characteristic values of the parent nodes at all levels.
  • when the pulmonary nodule risk assessment model is a random forest model, the characteristic parameters obtained by self-learning include the number of decision trees, the division characteristic value of the root node of each decision tree, and the division characteristic values of the parent nodes at all levels.
  • the above-mentioned preset models include logistic regression models, which are classical models used in statistical modeling to build models for binary variables. It is based on the assumption that the dependent variable obeys the Bernoulli distribution, and has many similarities with the linear regression assuming that the dependent variable obeys the Gaussian distribution.
  • the above-mentioned lesion image analysis data includes the malignant probability of the patient's pulmonary nodules given by artificial intelligence calculation, where the artificial intelligence calculation uses a convolutional neural network as a model to digitally process and analyze medical image information and obtain a probability value.
  • after clinically collecting patients' lesion image analysis data and pathological analysis results and performing statistical verification, it was confirmed that there is a significant relationship between the lesion image analysis data selected in this disclosure and the deterioration of pulmonary nodules.
  • the above CAC detection data includes the detection data obtained by detecting the blood cells of the patient according to the CAC detection principle, including the corresponding detection data obtained by performing the detection operation using the CAC kit or the CAC detection equipment.
  • Mathematical statistical results confirm that there is also a link between the CAC detection data selected in the present disclosure and the probability of pulmonary nodule deterioration. Therefore, the present disclosure constructs a model to realize the joint processing and analysis of the patient's lesion image analysis data and the patient's blood CAC detection data.
  • the construction method of the logistic regression analysis includes: taking the acquired risk factors of the patient, the image analysis data of the patient's lesions and the CAC detection data of the patient's liquid sample as independent variables, and the pathological detection data of the patient as the dependent variable, to construct a logistic regression equation; then, with the goal of minimizing the cost function, obtaining the optimized coefficient matrix and constant coefficient of the independent variables through the gradient descent algorithm or the iteratively reweighted least squares method, which gives the logistic regression equation; and determining the classification threshold, which lies in the range 0.5-0.8 and is searched in steps of 0.05.
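  • The fitting step named above (minimizing a cost function by gradient descent) can be sketched as follows. This is a minimal illustration, not the procedure used in the disclosure: the cross-entropy cost, the learning rate, the epoch count and the one-feature toy data are all assumptions.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean cross-entropy cost.

    X is a list of feature rows, y a list of 0/1 labels.
    Returns (constant coefficient, list of coefficients).
    """
    n, k = len(X), len(X[0])
    beta0, beta = 0.0, [0.0] * k
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            err = sigmoid(beta0 + sum(b * v for b, v in zip(beta, xi))) - yi
            g0 += err
            for j in range(k):
                g[j] += err * xi[j]
        beta0 -= lr * g0 / n
        beta = [b - lr * gj / n for b, gj in zip(beta, g)]
    return beta0, beta


# Hypothetical toy data: one feature, higher values labeled malignant (1).
X_toy = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y_toy = [0, 0, 0, 1, 1, 1]
b0, b = fit_logistic(X_toy, y_toy, lr=0.5, epochs=5000)
```

  • On this toy data the fitted coefficient is positive and the fitted curve separates the two groups around 0.5; real data with four independent variables would be passed in the same row format.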
  • compared with the accuracy and sensitivity obtained in the prior art by using CT image AI analysis data or CAC detection data alone to evaluate the malignancy of pulmonary nodules, the accuracy obtained by evaluating with the logistic regression model provided by the present disclosure is significantly improved.
  • the ratio of benign and malignant nodules obtained by the logistic regression model was compared with the pathological results, and the accuracy rate was close to 90%.
  • the above-mentioned preset model includes a decision tree model.
  • A decision tree is a decision analysis method that, given the known probabilities of various situations, builds a tree to obtain the probability that the expected net present value is greater than or equal to zero, evaluates the project risk and judges its feasibility. It is a graphical method that applies probability analysis intuitively; because the decision branches are drawn like the branches of a tree, it is called a decision tree.
  • a decision tree is a predictive model that represents a mapping relationship between object attributes and object values.
  • the present disclosure takes the patient's lesion image analysis data, the patient's CAC detection data and the patient's risk factors as features, and uses the Gini coefficient to evaluate the data after division by a value of each feature; the feature with the smallest Gini coefficient is used as the dividing node, and so on until all the data are divided into leaf nodes or the maximum depth is reached, finally yielding the decision tree model and its decision depth. Here D is an internal node in the decision tree training process, A is a splitting method available during training, and p is the probability of a given category within the internal node. The decision depth of the decision tree is obtained by cross-validation.
  • the maximum depth of the decision tree provided by this disclosure is 7.
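  • The Gini-based choice of dividing node can be illustrated with the sketch below. The formulas used are the standard CART definitions, $\mathrm{Gini}(D) = 1 - \sum_k p_k^2$ and a size-weighted sum over the two child nodes; the disclosure's own formula is not reproduced in this text, so treating it as the CART form is an assumption, as are the function names and the "value <= threshold goes left" convention.

```python
from collections import Counter


def gini(labels) -> float:
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_after_split(values, labels, threshold) -> float:
    """Size-weighted Gini impurity after splitting on value <= threshold."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(values, labels):
    """Threshold with the smallest post-split Gini, or None if no split exists."""
    # The largest value is skipped because it would leave the right side empty.
    candidates = sorted(set(values))[:-1]
    return min(candidates,
               key=lambda t: gini_after_split(values, labels, t),
               default=None)
```

  • For instance, with feature values `[1, 2, 3, 4]` and labels `[0, 0, 1, 1]`, the node impurity is 0.5 and splitting at 2 reduces it to 0, so 2 is chosen as the dividing value.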
  • compared with the accuracy and sensitivity obtained in the prior art by using CT image AI analysis data or CAC detection data alone to evaluate the malignancy of pulmonary nodules, the accuracy obtained by evaluating with the decision tree model provided by the present disclosure is significantly improved.
  • the above-mentioned preset models also include random forest models, which can explain the effects of several independent variables (X 1 , X 2 , . . . and X k ) on the dependent variable Y. Suppose the dependent variable Y has n observations and k independent variables associated with it. When building each classification tree, the random forest randomly reselects n observations from the original data following the bootstrap sampling method, in which some observations are selected multiple times and some are not selected at all. At the same time, the random forest randomly selects some of the k independent variables to determine the nodes of the classification tree. In this way, the classification tree constructed each time may be different.
  • the acquired patient lesion image analysis data, the patient's CAC detection data and the patient's risk factors are used as independent variables, and the patient's pathological detection data as the dependent variable. Subsets equal in number to the decision trees are randomly drawn from all samples as learning samples, the set number of decision trees is established, and finally the tuning parameters are screened by cross-validation to obtain the random forest model.
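  • The two sources of randomness described above (bootstrap resampling of observations and random selection of features per tree) can be sketched as follows. This only plans the samples and features for each tree and aggregates votes; it does not train trees. The feature names, the default of 300 trees with 2-4 features, and the vote-fraction probability are illustrative assumptions drawn loosely from the ranges quoted in the text.

```python
import random

# Hypothetical feature names for the four preprocessing results.
FEATURES = ["image_prob", "cac", "age", "gender_id"]


def bootstrap_indices(n: int, rng: random.Random) -> list:
    """Draw n row indices with replacement (some repeat, some never appear)."""
    return [rng.randrange(n) for _ in range(n)]


def plan_forest(n_samples: int, n_trees: int = 300,
                min_feats: int = 2, max_feats: int = 4, seed: int = 0) -> list:
    """For each tree, pick a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    plan = []
    for _ in range(n_trees):
        rows = bootstrap_indices(n_samples, rng)
        k = rng.randint(min_feats, max_feats)  # inclusive on both ends
        feats = rng.sample(FEATURES, k)
        plan.append((rows, feats))
    return plan


def forest_probability(votes) -> float:
    """Fraction of trees voting malignant (1); one simple way to aggregate."""
    return sum(votes) / len(votes)
```

  • Calling `plan_forest(64)` produces 300 (rows, features) pairs for a 64-patient data set; each pair would then be handed to a tree-building routine.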
  • compared with the accuracy and sensitivity obtained in the prior art by using CT image AI analysis data or CAC detection data alone to evaluate the malignancy of pulmonary nodules, the accuracy obtained by evaluating with the random forest model provided by the present disclosure is significantly improved.
  • the above-mentioned cross-validation method includes randomly dividing the training data or self-learning samples into k parts, using 1 part as the test set in turn, and using the other k-1 parts as the training set, where k is preferably 10.
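  • The k-fold procedure described above can be sketched with the standard library alone. The shuffling seed and the strided assignment of samples to folds are implementation choices made for this illustration, not details from the disclosure.

```python
import random


def k_fold_splits(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)        # random division of the samples
    folds = [idx[i::k] for i in range(k)]   # k roughly equal parts
    for i in range(k):
        test = folds[i]                     # 1 part serves as the test set
        train = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        yield train, test
```

  • With 64 samples and k = 10, each fold holds 6 or 7 test samples, and every sample is used as test data exactly once across the ten rounds.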
  • the present disclosure provides a system for pulmonary nodule risk assessment.
  • the pulmonary nodule risk assessment module of the system performs a combined assessment and analysis of the collected image analysis data of the patient's lesions, the patient's CAC detection data and the patient's risk factors, and gives a risk prediction along with the malignant probability of the patient's pulmonary nodules. Compared with using CT image AI analysis data or CAC detection data alone, it predicts the risk of pulmonary nodules with higher accuracy.
  • Figure 1 is a flow chart of the AI processing of the patient's lesion image results;
  • Figure 2 is a schematic diagram of AI nodule feature processing of the patient's lesion image results;
  • Figure 3 is a schematic diagram of a CT-AI processing result display interface;
  • Figure 4 is a schematic diagram of a blood sample fluorescence in situ hybridization processing flow;
  • Figure 5 is a schematic diagram of the results of fluorescence in situ hybridization of CAC and normal cells;
  • Figure 6 is the decision tree obtained in Embodiment 2;
  • Figure 7 is the decision result of Embodiment 9 applying the decision tree of Embodiment 2;
  • Figure 8 shows the cross-validation accuracy results obtained in Embodiments 10 to 14 with different numbers of decision trees.
  • the patient lesion image analysis data used in the following embodiments of the present disclosure are obtained by processing the patient's lesion image results with TargetCall™ software (Zhuhai Shengmei Bio-Diagnostic Technology Co., Ltd.). The analysis process sequentially includes image acquisition (S1), image segmentation (S2), feature extraction (S3), model construction (S4) and classification prediction (S5), as shown in Figure 1. The specific steps are as follows:
  • Nodule detection: after a nodule is detected, the image is segmented and the features of the nodule are extracted; a model is then constructed from the feature information of the nodule and three-dimensional analysis is performed, so as to distinguish benign from malignant nodules.
  • the feature extraction includes: acquisition of characteristic parameters such as the percentage of solid components of the nodule, the percentage of calcified components, the volume of the nodule, the diameter of the nodule, and the density of the nodule.
  • the three-dimensional analysis includes: nodule classification and nodule segmentation, and estimation of nodule malignancy probability and malignancy grade according to pulmonary nodule guidelines and pulmonary nodule radiomics.
  • The classification of nodules includes solid nodules, mixed nodules, ground-glass nodules and calcified nodules, of which the first three are the types that doctors pay more attention to.
  • In nodule classification there is also considerable variability among physicians, especially for mixed nodules, where agreement is approximately 65%.
  • Image segmentation is as follows: in the TargetCall™ software (Zhuhai Shengmei Bio-Diagnostic Technology Co., Ltd.), the system can perform three-dimensional segmentation of irregular solid nodules and ground-glass nodules. For mixed nodules, the system performs three-dimensional segmentation of the solid components and calculates the percentage of solid components; it also segments calcified components according to their shape, where the shapes of calcified components include diffuse, central, stratified and popcorn-shaped. Nodules with a calcified component greater than 80% are classified as calcified nodules.
  • the nodule volume calculation method in the feature extraction is as follows: the segmented nodule image is composed of multiple pixel points, and the volume of the nodule is obtained by multiplying the number of pixel points by the volume of each pixel point.
  • Nodule diameter measurement methods include three-dimensional diameter measurement and two-dimensional diameter measurement.
  • the three-dimensional diameter includes the axial diameter and the standard diameter, and the two-dimensional diameter measures the long and short diameters in the cross-section of the nodule.
  • the measurement method of nodule density is: each pixel obtained after segmentation corresponds to a density value, and the system calculates the average density of all pixels accordingly; it then sorts all pixels by density, taking the 95th percentile density as the maximum density and the 5th percentile density as the minimum density.
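  • The volume and density measurements described above reduce to simple arithmetic, sketched below. The voxel spacing tuple, the millimeter units and the nearest-rank percentile rule are assumptions made for this illustration; the disclosure does not state which percentile convention the software uses.

```python
def nodule_volume_mm3(n_voxels: int, spacing_mm) -> float:
    """Volume = number of segmented pixel points x volume of one pixel point."""
    sx, sy, sz = spacing_mm
    return n_voxels * sx * sy * sz


def percentile(sorted_vals, q: int):
    """Nearest-rank percentile for integer q in 1..100 (assumed convention)."""
    rank = max(1, (q * len(sorted_vals) + 99) // 100)  # integer ceil(q*n/100)
    return sorted_vals[rank - 1]


def density_summary(densities) -> dict:
    """Mean density, with 95th/5th percentiles standing in for max/min."""
    vals = sorted(densities)
    return {
        "mean": sum(vals) / len(vals),
        "max": percentile(vals, 95),
        "min": percentile(vals, 5),
    }
```

  • Using percentiles rather than the raw extremes makes the reported maximum and minimum less sensitive to a few outlying pixels at the segmentation boundary.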
  • Intensity, shape and texture features are extracted from the original image (A) of the lung nodule.
  • After equalizing the feature parameters (B), a correlation analysis (C) is performed on the extracted features to examine the dependencies among them; on this basis the model is constructed, and the malignant probability result of the classification prediction is then given.
  • the probability of malignancy is the probability that the nodule is malignant, that is, the confidence score for the distinction between benign and malignant. The range is 1% to 100%. If the probability of malignancy is >50%, the nodule is likely to be malignant; the higher the probability, the higher the confidence that the nodule is malignant. Conversely, if the probability of malignancy is ≤50%, the nodule is likely to be benign; the lower the probability, the higher the confidence that the nodule is benign.
  • the malignant probability of nodules can be divided into four grades: very low, malignant probability <5%; low, malignant probability 5%-40%; medium, malignant probability 40%-65%; high, malignant probability >65%.
  • the CT-AI algorithm can calculate the malignancy grades: very low, low, medium and high.
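  • The mapping from malignant probability to the four grades can be written as a small lookup. The text gives overlapping endpoints (40% and 65% each appear in two ranges), so assigning exactly 40% to "low" and exactly 65% to "medium" is an assumption of this sketch, as is accepting inputs from 0 to 100.

```python
def malignancy_grade(prob_percent: float) -> str:
    """Map a malignant probability in percent to one of the four grades."""
    if not 0 <= prob_percent <= 100:
        raise ValueError("probability must be given in percent (0-100)")
    if prob_percent < 5:
        return "very low"
    if prob_percent <= 40:   # boundary handling at 40% is assumed
        return "low"
    if prob_percent <= 65:   # 'high' is stated as strictly above 65%
        return "medium"
    return "high"
```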
  • nodule management recommendations can be automatically given, and the final CT-AI processing results are shown in Figure 3.
  • the CAC detection method adopted in the following embodiments of the present disclosure includes, according to the CAC detection principle, using a CAC kit or a CAC detection device to perform a detection operation, the detection process is shown in Figure 4, and the specific steps are as follows:
  • Sample collection (S1): for example, collect a patient blood sample, fix it with a cell preservation solution (Zhuhai Shengmei Bio-Diagnostic Technology Co., Ltd.), and then separate cells from the blood sample by density gradient centrifugation.
  • Peripheral blood mononuclear cells are enriched and purified (S2) and fixed on glass slides (S3). After pre-hybridization treatment by enzymatic digestion and ethanol gradient dehydration, the DNA in the peripheral blood mononuclear cells is denatured, and fluorescent probes are then added to bind to that DNA, forming a "chromosome-specific sequence probe" complex; finally, the nuclei are stained with a nucleic acid dye to indicate the complete karyotype.
  • the processed samples are scanned by a fluorescence microscope (S4) to realize the identification and detection of CACs with abnormal number of specific chromosomes.
  • the number of staining points on the four channels of each cell is counted to determine whether the cell is a CAC. Since a CAC is defined as having gains in two or more staining channels, each cell can be classified based on the counts of fluorescence signals in the four probe images (S5). As shown in Figure 5, the green and red marks indicate 2 loci on one chromosome, and the blue and yellow marks indicate 2 loci on the other chromosome. According to the fluorescence detection results, the cells are classified by the rules in Table 1. In Figure 5, two staining signals (green and red) in the detection results of cell A each have a count of 3, so according to Table 1 cell A is a CAC; in the detection results of cell B all four staining signals appear in pairs, so according to Table 1 cell B is a normal cell.
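  • The counting rule described above can be sketched as follows. Table 1 is not reproduced in this text, so the rule coded here is inferred from the prose only: a channel shows a gain when its signal count exceeds the normal pair of 2, and a cell is a CAC when at least two channels show a gain. The channel names and dictionary layout are hypothetical.

```python
NORMAL_COPIES = 2  # signals appear in pairs in a normal cell


def is_cac(signal_counts: dict, min_gain_channels: int = 2) -> bool:
    """Classify one cell from its per-channel fluorescence signal counts."""
    if len(signal_counts) != 4:
        raise ValueError("expected counts for exactly four probe channels")
    gains = sum(1 for c in signal_counts.values() if c > NORMAL_COPIES)
    return gains >= min_gain_channels


def cac_count(cells) -> int:
    """Number of cells classified as CAC in a list of per-cell counts."""
    return sum(1 for c in cells if is_cac(c))


# Counts matching the description of cell A and cell B in Figure 5.
cell_a = {"green": 3, "red": 3, "blue": 2, "yellow": 2}
cell_b = {"green": 2, "red": 2, "blue": 2, "yellow": 2}
```

  • Here `is_cac(cell_a)` is true because the green and red channels both show 3 signals, while `is_cac(cell_b)` is false because every channel shows a normal pair.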
  • the risk factors of patients described in the present disclosure include information such as patient gender, age, smoking history, and family genetic history, which are used to assist stratified analysis.
  • CT image results and blood samples of 64 patients with pulmonary nodules were collected and used for CT image AI analysis and CAC detection respectively, and a comprehensive pathological analysis was performed on the 64 patients.
  • the results are shown in the following table.
  • the present embodiment successfully constructs a logistic regression model by performing logistic regression analysis on the age, gender, CT image AI analysis data, CAC detection data and pathological analysis results of the 64 patients collected in Table 2, which specifically includes the following steps:
  • $\beta_0$ is any value selected from -12.60 to 1.18, preferably -4.94; $\beta_1$ is any value selected from 3.08 to 15.05, preferably 7.92; $\beta_2$ is any value selected from -0.12 to 0.40, preferably 0.10; $\beta_3$ is any value selected from -0.03 to 0.16, preferably 0.06; $\beta_4$ is any value selected from -7.72 to -1.43, preferably -3.9.
  • For threshold selection, this embodiment sets up candidate thresholds from 0.5 to 0.8 in steps of 0.05.
  • For each candidate threshold, the accuracy of the predictions is calculated with the pathological results as the reference standard; the results are shown in Table 3.
  • This embodiment therefore selects 0.6, the classification threshold with the highest classification accuracy.
  • This embodiment uses the CT-image AI analysis data, CAC detection data, and pathological analysis results of the 64 patients in Table 2 to construct a decision tree model, through the following steps:
  • (3) During decision tree generation, each node is evaluated before it is split: if splitting the current node improves the generalization performance of the decision tree, the node is split; otherwise it is not.
  • (4) Steps (2) and (3) are repeated until all data have been assigned to leaf nodes or the maximum depth is reached; the result is the decision tree. The maximum depth can be set to any value from 2 to 7.
  • (5) The decision tree classifier generated above is used to classify the one fold of test-set data randomly held out in step (1); if the test result meets the accuracy requirement, proceed to the next step, otherwise repeat the classification.
  • The two first-level parent nodes A1 and A2 use sex=M and cac<4, respectively, as split criteria, yielding three second-level parent nodes B1, B2, and B3 and one malignant leaf node, arranged from left to right.
  • The fifth-level parent node E1 uses cac≥1 as the split criterion, yielding, from left to right, the sixth-level parent node F1 and a malignant leaf node.
  • The sixth-level parent node F1 uses cac<2 as the split criterion, yielding a benign leaf node and a malignant leaf node, arranged from left to right.
  • This embodiment uses the CT-image AI analysis data, CAC detection data, and pathological analysis results of the 64 patients in Table 2 to construct a random forest model, through the following steps:
  • (3) For each data set from step (2), the four features age, sex, "CT-image AI malignancy probability", and "CAC detection data" are considered and a tuning parameter mtry is set. Whenever a node needs to be split, a subset of mtry features is first drawn at random from these four features, and the feature in that subset that minimizes Gini(D, A) is used to split the node.
  • (4) Following step (3), 100 decision trees are built to form a random forest. Each tree then classifies the test set, the classification result is decided by the majority vote of the trees, and the test-set error rate is calculated.
  • (5) The tuning parameter mtry is varied between 2 and 4, steps (3) and (4) are repeated, and the 10-fold cross-validation accuracy is calculated for each value.
  • (6) According to the 10-fold cross-validation results from step (5), when mtry is 3 the cross-validation accuracy reaches 84.2%, which meets the practical assessment requirement. The tuning parameter mtry of the random forest model is therefore set to 3.
  • In Examples 4 to 7, four random forest models were constructed from the CT-image AI analysis data, CAC detection data, and pathological analysis results of the 64 patients in Table 2. The only difference from Example 3 is the number of decision trees used: 300, 500, 700, and 1000, respectively.
  • The logistic regression model of Example 1 is used to assess pulmonary nodule risk in 5 patients, whose CT-image analysis data and CAC detection data are shown in the following table.
  • The results are shown in Table 11.
  • This embodiment uses the decision tree model of Example 2 to assess pulmonary nodule risk for the five patients in Table 4. The data in Table 4 are substituted into the decision tree model obtained in Example 2, as shown in Figure 7, and the resulting assessments are shown in Table 11.
  • The random forest models of Examples 3 to 7 are used to assess pulmonary nodule risk for the 5 patients in Table 4; the data in Table 4 are substituted into the random forest models obtained in Examples 3 to 7.
  • After pathological testing of the five patients, the pathology results are compared with the probabilities given by this group of examples. The results are shown in Tables 5 to 9.
  • Cross-validation accuracy by number of decision trees:
      Example       Number of decision trees    Cross-validation accuracy
      Example 10    100                         0.8095238095238095
      Example 11    300                         0.8261904761904763
      Example 12    500                         0.8261904761904763
      Example 13    700                         0.8261904761904763
      Example 14    1000                        0.8261904761904763
  • This comparative example uses the patient data in Table 2 and the same construction method as Example 1 to build a logistic regression model.
  • The CAC detection data ($x_1$) are entered as the independent variable and the pathological result as the dependent variable to construct the regression equation $\operatorname{logit}(\pi) = \theta_0 + \theta_1 x_1$. The coefficients $\theta_0$ and $\theta_1$ are 0.98 and 0.14, respectively, which gives the constructed logistic regression model.
  • The resulting logistic regression model is then used to assess pulmonary nodule risk for the 5 patients in Table 4, using the same threshold as in Example 3.
  • This comparative example uses the patient data in Table 2 and the same construction method as Example 1 to build a logistic regression model.
  • The CT-image AI analysis data ($x_1$) are entered as the independent variable and the pathological result as the dependent variable to construct the regression equation $\operatorname{logit}(\pi) = \theta_0 + \theta_1 x_1$.
  • The coefficients $\theta_0$ and $\theta_1$ are -0.95 and 4.11, respectively, which gives the constructed logistic regression model.
  • The resulting logistic regression model is then used to assess pulmonary nodule risk for the 5 patients in Table 4, using the same threshold as in Example 3.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Public Health (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Primary Health Care (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Geometry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

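A system for pulmonary nodule risk assessment. Using a logistic regression model, a decision tree model, or a random forest model, the system jointly evaluates a patient's image analysis data, CAC detection data, and risk factors to assess the patient's pulmonary nodule risk. It can not only assess whether a pulmonary nodule is present but also predict the risk posed by an existing nodule, with relatively high accuracy.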

Description

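A system for pulmonary nodule risk assessment
Cross-reference to related applications
The present disclosure claims priority to Chinese patent application No. CN202011341094.0, titled "A system for pulmonary nodule risk assessment", filed with the China Patent Office on November 25, 2020, the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to the technical field of medical data processing and, in particular, to a system for pulmonary nodule risk assessment.
Background
Sarcoidosis is a multi-system, multi-organ granulomatous disease of unknown etiology that commonly involves the lungs, bilateral hilar ***, eyes, skin, and other organs; the chest is involved in 80% to 90% of cases. It occurs worldwide, with a higher incidence in Europe and America and a low incidence among East Asian populations. It is most common between 20 and 40 years of age and is slightly more frequent in women than in men.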
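The cause and pathogenesis of pulmonary nodules are still under study; it has only been established that sarcoidosis results from the interplay between an unknown antigen and the body's cellular and humoral immune functions. Depending on individual differences (age, sex, race, genetic factors, hormones, or HLA) and the regulation of the antibody immune response, the imbalance between the promoting and antagonizing factors produced determines whether granulomas develop or regress, which accounts for the different pathological states of sarcoidosis and its tendency toward spontaneous remission. In tissue sections, pulmonary sarcoid granulomas appear as aggregates of epithelioid cells, including multinucleated macrophages, surrounded by lymphocytes and without caseous lesions. Inclusion bodies may be seen in the cytoplasm of the macrophages, such as oval Schaumann bodies, birefringent crystals, and asteroid bodies. The initial lesion of pulmonary sarcoidosis is an alveolitis with extensive infiltration of monocytes, macrophages, or lymphocytes, involving the alveolar walls and interstitium. Both the alveolitis and the granulomas may resolve spontaneously. In the chronic stage, however, the fibroblasts around the granuloma undergo collagenization and hyalinization, resulting in nonspecific fibrosis. The histomorphology of the granuloma is not specific: it may be seen in mycobacterial and fungal infections, as a tissue reaction to foreign bodies or trauma, and in berylliosis, tertiary syphilis, lymphoma, and extrinsic *** reactive alveolitis, all of which must be differentiated. When the same tissue lesions are seen in multiple organs, the disease can be diagnosed in combination with the clinical data. Common pulmonary nodules are mostly benign, whereas malignant pulmonary nodules can be diagnosed as lung cancer. Pulmonary nodules are therefore often the main early manifestation of lung cancer, and accurate detection of pulmonary nodules is of great significance for the early diagnosis and treatment of lung cancer.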
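Pulmonary nodules span a wide range of sizes, from large nodules over 5 cm to sub-centimeter nodules. Large nodules can be found by conventional medical imaging. For small nodules (3 to 10 mm), however, reliance on conventional image reading alone leads to inconsistent judgment criteria because of differences in physician skill, hospital conditions, or data silos, resulting in a large number of misdiagnoses based on clinical experience.
With the further development of imaging technology, its processing capability has improved markedly. However, because lung computed tomography (CT) images have complex backgrounds and a large detection range, and pulmonary nodules vary in size and/or shape, detecting them quickly and accurately remains highly challenging.
In recent years AI technology has advanced considerably, and many studies have applied it to medical image analysis, but the existing reports still have obvious shortcomings.
First, existing CT-image AI techniques use relatively small sample sizes and usually set low thresholds, so the reported accuracy and sensitivity are higher than the actual values and false positives are common in the results. Experts must review the results before they can be finalized, which in practice does not improve detection efficiency and instead increases detection cost.
Reports of AI-based CT image processing with detection accuracy above 90% generally share the drawbacks of small sample sizes and low thresholds. When the sample size is enlarged, both accuracy and sensitivity drop markedly. For example, Tao Xu et al. reported that with 534 enrolled patients the AI analysis accuracy was only 70% (Tao Xu, Chuoji Huang, Yaoqi Liu, Jing Gao, Huan Chang, Ronghua Yang, Tianjiao Jiang, Zhaozhong Cheng, Wencheng Yu, Juncheng Zhang, Chunxue Bai, Artificial intelligence based on deep learning for differential diagnosis between benign and malignant pulmonary nodules: A real-world, multicenter, diagnostic study. Journal of Clinical Oncology.), which cannot meet actual clinical needs.
In addition, small cell lung cancer and squamous cell carcinoma are difficult even for radiologists to identify from images. Because their imaging features are not distinctive, the detection accuracy of existing AI algorithms for these types of lung cancer is below 10%.
Besides digital processing of medical images, there are many other means of detecting pulmonary nodules and assessing tumor risk, including those based on immune response. Among them, circulating abnormal cell (CAC) detection has attracted much attention because it is simple to perform, can be done at any time, and offers high sensitivity, good specificity, strong stability, and a short turnaround time.
Multiple studies have shown that the early stage of tumorigenesis is closely related to changes (amplification, deletion, fusion, etc.) in specific chromosomal regions. Isolating, enriching, and detecting cells in the blood that carry chromosomal abnormalities highly associated with a specific cancer type can reflect early cancer development more comprehensively and thereby provide information for cancer diagnosis. Such cells, which are present in peripheral blood or other body fluids and carry chromosomal abnormalities associated with carcinogenesis, are called circulating abnormal cells (CACs). Preliminary studies have shown that CACs have clear advantages in early tumor detection and good diagnostic performance.
Clinical results show that CAC detection has a fairly consistent detection rate across different types and stages of lung cancer, and its discrimination accuracy for pulmonary nodules smaller than 5-10 mm also exceeds 70%; such nodules are hard to analyze and discriminate on images because of their small size. CAC detection results can therefore serve as an effective complement to CT-image AI analysis, catching what it misses. For example, Dr. Tao Xu (徐涛) of the Affiliated Hospital of Qingdao University used CAC detection results provided by our company to correct CT-AI analysis results. Among 11 samples that CT-AI judged low-risk but for which clinicians favored active treatment, 8 high-risk samples were identified, and the final pathology results for these 8 samples were fully consistent with the CAC assessment. However, this merely supplements the CT-AI results with CAC results; it does not examine in depth how the two can be combined organically to further improve the accuracy of detecting early lung cancer in pulmonary nodules.
In view of this, the present disclosure is proposed.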
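Summary
To address the above technical problems, the present disclosure provides a system for pulmonary nodule risk assessment. Through machine self-learning, the system not only combines a patient's lesion imaging results with the patient's CAC detection data for pulmonary nodule risk assessment but also integrates the patient's risk factors, significantly improving the accuracy of the assessment.
In a first aspect, the present disclosure provides a system for pulmonary nodule risk assessment, comprising:
a data acquisition module, configured to obtain the patient's lesion imaging results, the patient's CAC detection data, and the patient's risk factors;
a data processing module, configured to preprocess the data obtained by the data acquisition module, the output of the preprocessing being matched to the pulmonary nodule risk assessment module;
a pulmonary nodule risk assessment module, configured to apply a pulmonary nodule risk assessment model built by machine learning to the preprocessed output of the data processing module and compute a pulmonary nodule risk result.
In an optional embodiment, the patient's risk factors include one, or a combination of two or more, of the patient's sex, age, family history of tumors, and smoking history.
In an optional embodiment, the data processing module is configured to: convert the patient's lesion imaging results, by artificial intelligence computation, into lesion image analysis data and output the patient's pulmonary nodule malignancy probability; convert the patient's sex into a corresponding sex identifier; convert the family history of tumors into a corresponding family tumor history identifier; and convert the smoking history into a corresponding smoking history identifier.
The sex identifier is 1 for male patients and 0 for female patients. The family tumor history identifier is 1 for patients with a family history of tumors and 0 for patients without one. The smoking history identifier is 1 for patients with a smoking history and 0 for patients without one.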
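The identifier conversion described above can be sketched in a few lines. This is only an illustration: the function name `encode_risk_factors`, the `"M"`/`"F"` input codes, and the dictionary keys are hypothetical and do not come from the disclosure.

```python
def encode_risk_factors(sex, family_history, smoker):
    """Map categorical risk factors to the 0/1 identifiers described in the text.

    sex: "M" or "F" (hypothetical input coding)
    family_history, smoker: booleans
    """
    if sex not in ("M", "F"):
        raise ValueError("sex must be 'M' or 'F'")
    return {
        "sex_id": 1 if sex == "M" else 0,                  # male -> 1, female -> 0
        "family_history_id": 1 if family_history else 0,  # has family tumor history -> 1
        "smoking_id": 1 if smoker else 0,                  # has smoking history -> 1
    }


print(encode_risk_factors("F", family_history=True, smoker=False))
```

Age is left as a plain number here, since the text only describes identifiers for sex, family tumor history, and smoking history.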
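In an optional embodiment, the patient's lesion imaging results include one, or a combination of two or more, of low-dose spiral CT scans, thin-slice spiral CT scans, chest X-ray radiographs, and positron emission computed tomography.
The lesion image analysis method comprises: first performing 3D topological reconstruction of the acquired images; then segmenting the nodule boundary in three dimensions; then extracting features of the nodule image and performing component, correlation, and cluster analysis on the extracted features; and finally computing the AI malignancy probability according to existing guidelines for judging nodule images and to labeled image data.
In an optional embodiment, the patient's CAC detection data include the number of circulating abnormal cells per 10,000 mononuclear cells obtained from the patient's CAC test.
The sample used for the patient's CAC test includes one, or a combination of two or more, of the patient's blood, pleural or peritoneal effusion, bronchoalveolar lavage fluid, urine, saliva, and cerebrospinal fluid.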
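In an optional embodiment, the pulmonary nodule risk assessment module uses a logistic regression model to compute the patient's pulmonary nodule risk from the preprocessing results. The preprocessing results include the patient's lesion image analysis data, the patient's CAC detection data, the patient's age, and the patient's sex identifier. The logistic regression model is computed as:
$\operatorname{logit}(\pi) = \theta^{T}X + \theta_0$, where $X$ is the matrix of independent variables, comprising the patient's lesion image analysis data $x_1$, the patient's CAC detection data $x_2$, the patient's age identifier $x_3$, and the patient's sex identifier $x_4$, and $\theta^{T}$ is the coefficient matrix corresponding to $X$,
Figure PCTCN2020133952-appb-000001
$\theta_0$ is the constant coefficient, and $\pi$ is the patient's pulmonary nodule malignancy probability. The computed $\pi$ is compared with a preset classification threshold, and a benign or malignant identifier for the patient's pulmonary nodule is output based on the comparison; the benign identifier is 0 and the malignant identifier is 1.
Preferably, the classification threshold is 0.5 to 0.8.
Preferably, the classification threshold is 0.6.
In an optional embodiment, $\theta_1$ is any value from 3.08 to 15.05, preferably 7.92;
$\theta_2$ is any value from -0.12 to 0.40, preferably 0.10;
$\theta_3$ is any value from -0.03 to 0.16, preferably 0.06;
$\theta_4$ is any value from -7.72 to -1.43, preferably -3.9;
$\theta_0$ is any value from -12.60 to 1.18, preferably -4.94.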
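As a worked illustration of the formula and the preferred coefficients, the sketch below evaluates the logistic model and applies the 0.6 threshold. It assumes the CT-AI malignancy probability is entered as a fraction between 0 and 1 and that age is entered in years; the function names and the example patient are hypothetical.

```python
import math

# Preferred coefficient values quoted in the text.
THETA_0 = -4.94
THETA = (7.92, 0.10, 0.06, -3.9)  # x1: CT-AI probability, x2: CAC count, x3: age, x4: sex identifier


def malignancy_probability(ct_ai, cac, age, sex_id):
    """Return pi, obtained by inverting logit(pi) = theta^T X + theta_0."""
    z = THETA_0 + sum(t * x for t, x in zip(THETA, (ct_ai, cac, age, sex_id)))
    return 1.0 / (1.0 + math.exp(-z))


def classify(pi, threshold=0.6):
    """Return 1 (malignant identifier) if pi exceeds the threshold, else 0 (benign identifier)."""
    return 1 if pi > threshold else 0


# Hypothetical patient: CT-AI probability 0.49, 11 CACs, age 71.5, female (sex identifier 0).
pi = malignancy_probability(0.49, 11, 71.5, 0)
print(round(pi, 3), classify(pi))
```

If the CT-AI value were entered as a percentage instead of a fraction, the coefficient 7.92 would dominate the sum, so the input scale is worth confirming against the training data before reusing these numbers.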
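In an optional embodiment, the pulmonary nodule risk assessment module uses a decision tree model that takes the patient's lesion image analysis data, CAC detection data, age, and sex identifier as split features and classifies the patient's pulmonary nodule risk according to the preprocessing results.
Preferably, the decision depth of the decision tree is 2 to 7.
Preferably, the decision depth of the decision tree is 4.
Preferably, the decision depth of the decision tree is 7.
In an optional embodiment, the pulmonary nodule risk assessment module uses a random forest model that builds 100 to 1000 decision trees to classify the patient's pulmonary nodule risk and computes the probability that the patient's pulmonary nodule is malignant from the classification results of those trees. The 100 to 1000 decision trees randomly select 2 to 4 kinds of preprocessing results as split features from among the patient's lesion image analysis data, CAC detection data, age, and sex identifier.
Preferably, 3 kinds of preprocessing results are selected as split features.
Preferably, the number of decision trees is 300.
In a second aspect, the present disclosure provides a method for training the pulmonary nodule risk assessment model used by the pulmonary nodule risk assessment module of the system described in the foregoing embodiments. The training method comprises taking the lesion image analysis data, the CAC detection data, the identifiers converted from the risk factors, and the pathological test data of patients with known pathology results as self-learning samples, feeding them into a preset model, obtaining feature parameters through self-learning, and thereby determining the pulmonary nodule risk assessment model.
Preferably, the pulmonary nodule risk assessment model is a logistic regression model, and the feature parameters obtained by self-learning include the coefficient matrix, the constant coefficient, and the classification threshold.
Preferably, the pulmonary nodule risk assessment model is a decision tree model, and the feature parameters obtained by self-learning include the split feature value of the root node and the split feature values of the parent nodes at each level.
Preferably, the pulmonary nodule risk assessment model is a random forest model, and the feature parameters obtained by self-learning include the number of decision trees and, for each tree, the split feature value of the root node and the split feature values of the parent nodes at each level.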
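The preset model includes a logistic regression model, a classical statistical model for binary outcome variables. It assumes that the dependent variable follows a Bernoulli distribution and otherwise has much in common with linear regression, which assumes a Gaussian-distributed dependent variable. The lesion image analysis data include the malignancy probability of the patient's pulmonary nodule given by artificial intelligence computation; the computation method includes using a convolutional neural network as the model to digitally process and analyze the medical image information and obtain the probability that the pulmonary nodule is malignant. Statistical verification on clinically collected lesion image analysis data and pathological analysis results confirmed a significant association between the lesion image analysis data used in the present disclosure and pulmonary nodule malignancy. The CAC detection data include data obtained by testing the patient's blood cells according to the CAC detection principle, including data obtained by performing the test with a CAC kit or CAC detection equipment. Statistical analysis of clinically collected CAC detection data and pathological analysis results confirmed that the CAC detection data used in the present disclosure are also associated with the probability of pulmonary nodule malignancy. The present disclosure therefore builds models that jointly process and analyze the patient's lesion image analysis data and blood CAC detection data.
The logistic regression model is constructed as follows. The patient's risk factors, lesion image analysis data, and liquid-sample CAC detection data are taken as independent variables and the patient's pathological test data as the dependent variable to form the logistic regression equation. The coefficient matrix and the constant coefficient are then optimized by gradient descent or iteratively reweighted least squares so as to minimize the cost function, which yields the logistic regression equation, and the classification threshold is determined. The classification threshold is 0.5 to 0.8, and candidate thresholds are incremented in steps of 0.05 during threshold determination.
Verification on a validation set shows that the accuracy obtained with the logistic regression model of the present disclosure is significantly higher than the accuracy and sensitivity obtained in the prior art when CT-image AI analysis data or CAC detection data are used alone to assess pulmonary nodule malignancy, and that the benign/malignant risk probabilities given by the logistic regression model agree with the pathology results with an accuracy close to 90%.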
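The preset model also includes a decision tree model. A decision tree is a decision-analysis method that, given the known probabilities of various outcomes, builds a tree to obtain the probability that the expected net present value is greater than or equal to zero, in order to evaluate project risk and judge feasibility; it is a graphical method that applies probability analysis intuitively. It is called a decision tree because the decision branches, when drawn, resemble the branches of a tree. In machine learning, a decision tree is a predictive model that represents a mapping between object attributes and object values. The present disclosure takes the patient's lesion image analysis data, CAC detection data, and risk factors as features, and uses the Gini index
Figure PCTCN2020133952-appb-000002
Figure PCTCN2020133952-appb-000003
to evaluate the result of splitting on the value of one of the features, where
Figure PCTCN2020133952-appb-000004
Figure PCTCN2020133952-appb-000005
the feature with the smallest value is taken as the split node, and so on until all data have been assigned to leaf nodes or the maximum depth is reached, which finally gives the decision tree model and its decision depth. Here D is an internal node during decision tree training, A is a candidate split available during training, and p is the probability of belonging to a given class within that internal node. The decision depth of the tree is obtained by cross-validation. The maximum depth of the decision tree provided by the present disclosure is 7.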
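The formula images referenced above are not reproduced in this text. Assuming they show the standard CART definitions, which fit the symbols D, A, and p as defined here, the Gini index of a node and of a candidate split would read as follows. The class count $K$, the branch count $V$, and the branch subsets $D^{v}$ are notation introduced only for this sketch.

```latex
\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} p_k^{2},
\qquad
\mathrm{Gini}(D, A) = \sum_{v=1}^{V} \frac{|D^{v}|}{|D|}\,\mathrm{Gini}(D^{v})
```

Under this reading, a node containing only one class has $\mathrm{Gini}(D) = 0$, and the split with the smallest $\mathrm{Gini}(D, A)$ is the one that leaves the purest child nodes.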
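Verification on a validation set shows that the accuracy obtained with the decision tree model of the present disclosure is significantly higher than the accuracy and sensitivity obtained in the prior art when CT-image AI analysis data or CAC detection data are used alone to assess pulmonary nodule malignancy.
The preset model further includes a random forest model, which can explain the effect of several independent variables ($X_1$, $X_2$, ..., $X_k$) on a dependent variable Y. Suppose Y has n observations and k independent variables are related to it. When building each individual classification tree, the random forest re-selects n observations at random from the original data by bootstrap resampling, so some observations are selected several times and others not at all. At the same time, the random forest randomly selects a subset of the k independent variables to determine the nodes of the classification tree, so the tree built each time may differ. The present disclosure takes the patient's lesion image analysis data, CAC detection data, and risk factors as independent variables and the patient's pathological test data as the dependent variable, randomly draws from the full sample as many subsets as the set number of decision trees to serve as learning samples, builds the set number of decision trees, and finally selects the tuning parameter by cross-validation to obtain the random forest model.
Verification on a validation set shows that the accuracy obtained with the random forest model of the present disclosure is significantly higher than the accuracy and sensitivity obtained in the prior art when CT-image AI analysis data or CAC detection data are used alone to assess pulmonary nodule malignancy.
The cross-validation method comprises randomly dividing the training data or self-learning samples into k parts and using each part in turn as the test set while the other k-1 parts serve as the training set; k is preferably 10.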
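The k-fold procedure just described can be sketched as follows. The helper name `k_fold_splits`, the fixed seed, and the way indices are dealt into folds are illustrative choices, not details from the disclosure.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)           # random division of the samples
    folds = [indices[i::k] for i in range(k)]      # k parts of nearly equal size
    for i in range(k):
        test = folds[i]                            # one part is the test set
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test                          # the other k-1 parts are the training set


# With 64 samples and k = 10, each test fold holds 6 or 7 samples.
for train, test in k_fold_splits(64, k=10):
    print(len(train), len(test))
```

Every sample appears in exactly one test fold, so averaging the per-fold accuracies uses each patient once for evaluation.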
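The present disclosure has the following beneficial effects:
The present disclosure provides a system for pulmonary nodule risk assessment whose risk assessment module jointly evaluates the collected lesion image analysis data, CAC detection data, and risk factors of a patient, giving a risk prediction together with the malignancy probability of the patient's pulmonary nodule. It is more accurate than pulmonary nodule risk prediction based on CT-image AI analysis data or CAC detection data alone.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present disclosure and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can obtain other related drawings from them without inventive effort.
Figure 1 is a schematic flowchart of the AI processing of a patient's lesion imaging results;
Figure 2 is a schematic diagram of the AI processing of nodule features in a patient's lesion imaging results;
Figure 3 is a schematic diagram of the display interface for CT-AI processing results;
Figure 4 is a schematic flowchart of fluorescence in situ hybridization processing of a blood sample;
Figure 5 is a schematic diagram of fluorescence in situ hybridization results for a CAC and a normal cell;
Figure 6 is the decision tree obtained in Example 2;
Figure 7 is the decision result of Example 9, which applies the decision tree of Example 2;
Figure 8 shows the cross-validation accuracy obtained in Examples 10 to 14 with different numbers of decision trees.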
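Detailed description
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below. Where specific conditions are not stated in an example, conventional conditions or the conditions recommended by the manufacturer are used. Reagents and instruments whose manufacturer is not stated are conventional, commercially available products.
The patient lesion image analysis data used in the following examples were obtained by processing the patient's lesion imaging results with TargetCall™ software (珠海圣美生物诊断技术有限公司). The analysis workflow consists, in order, of image acquisition (S1), image segmentation (S2), feature extraction (S3), model construction (S4), and classification prediction (S5), as shown in Figure 1. The specific steps are as follows:
First, the patient's lesion images are obtained, for example thin-slice CT (slice thickness under 2 mm) or DICOM images, and imported into the AI analysis system. The AI analysis system reads the lesion image data, builds three-dimensional information from it, and performs nodule detection. Once a nodule is detected, the image is segmented and features are extracted from the nodule; a model is then built from the nodule's feature information and a three-dimensional analysis is performed to distinguish benign from malignant nodules.
The feature extraction includes obtaining feature parameters such as the percentage of the solid component of the nodule, the percentage of the calcified component, nodule volume, nodule diameter, and nodule density.
The three-dimensional analysis includes nodule classification and nodule segmentation, and estimation of the nodule's malignancy probability and malignancy grade according to pulmonary nodule guidelines and pulmonary nodule radiomics.
Nodules are classified as solid nodules, mixed nodules, ground-glass nodules, and calcified nodules; the first three are the types of greatest concern to physicians. Classification also varies considerably between physicians, especially for mixed nodules, where agreement is about 65%.
Image segmentation is implemented as follows. In the TargetCall™ software (珠海圣美生物诊断技术有限公司), the system can segment irregular solid nodules and ground-glass nodules in three dimensions. For mixed nodules, the system segments the solid component in three dimensions and calculates its percentage. It also segments the calcified component according to its shape, which may be diffuse, central, laminated, or popcorn-like, and classifies nodules whose calcified component exceeds 80% as calcified nodules.
Nodule volume is calculated during feature extraction as follows: the segmented nodule image consists of a number of pixels, and the nodule volume is the number of pixels multiplied by the volume of each pixel.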
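The volume rule above amounts to counting the segmented voxels and multiplying by the volume of one voxel. A minimal sketch, assuming the segmentation is available as a nested 0/1 mask and that the voxel spacing is known in millimetres (both assumptions, as the text does not describe the data layout):

```python
def nodule_volume_mm3(mask, spacing_mm):
    """Volume of a segmented nodule.

    mask: nested lists indexed [slice][row][col], 1 for nodule voxels and 0 elsewhere
    spacing_mm: (dz, dy, dx) voxel size in millimetres
    """
    voxel_count = sum(v for plane in mask for row in plane for v in row)
    dz, dy, dx = spacing_mm
    return voxel_count * dz * dy * dx


# Toy mask with 4 nodule voxels, each 1.0 x 0.5 x 0.5 mm (0.25 mm^3).
mask = [[[0, 1], [1, 1]], [[0, 0], [1, 0]]]
print(nodule_volume_mm3(mask, (1.0, 0.5, 0.5)))  # 4 * 0.25 = 1.0
```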
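Nodule diameter is measured in three dimensions and in two dimensions. The three-dimensional diameters are the axial diameter and the standard diameter; the two-dimensional diameters are the long and short diameters measured on the nodule's cross-section.
Nodule density is measured as follows: each pixel obtained after segmentation has a density value, from which the system calculates the mean density of all pixels. The pixels are then sorted by density, and the 95th-percentile density is taken as the maximum density and the 5th-percentile density as the minimum density.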
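The density summary can be sketched as below. The text does not say which percentile definition the software uses, so the nearest-rank method here is an assumption, as is the function name.

```python
def density_summary(values):
    """Return (mean, minimum, maximum) density of the segmented pixels.

    The minimum is the 5th percentile and the maximum the 95th percentile,
    computed with the nearest-rank method (an assumed definition).
    """
    ordered = sorted(values)
    n = len(ordered)

    def nearest_rank(p):
        rank = max(1, -(-p * n // 100))  # ceil(p * n / 100) using integer arithmetic
        return ordered[rank - 1]

    return sum(ordered) / n, nearest_rank(5), nearest_rank(95)


print(density_summary(list(range(1, 101))))  # (50.5, 5, 95)
```

Using percentiles instead of the raw extremes keeps a few outlier pixels, for example from partial-volume effects at the nodule edge, from dominating the reported range.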
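Features such as nodule shape, texture, intensity, density, and calcification ratio are extracted as described above. In the example shown in Figure 2, feature parameters such as intensity, shape, and texture (B) are extracted from the original pulmonary nodule image (A); correlation analysis (C) is then performed on the extracted features to examine the dependence between them. On this basis the model is built, and the malignancy probability is output as the classification prediction.
The malignancy probability is the likelihood that the nodule is malignant, that is, a confidence score for the benign/malignant discrimination. It ranges from 1% to 100%. If the malignancy probability is above 50%, the nodule may be malignant, and the higher the probability, the greater the confidence that it is malignant. Conversely, if the malignancy probability is below 50%, the nodule is probably benign, and the lower the probability, the greater the confidence that it is benign.
According to the criteria of the American ACCP guideline, the malignancy probability of a nodule falls into four grades: very low, probability below 5%; low, probability 5%-40%; intermediate, probability 40%-65%; and high, probability above 65%. Based on the malignancy probability, the CT-AI algorithm computes the malignancy grade according to the ACCP guideline: very low, low, intermediate, or high. It can also give nodule management recommendations automatically according to different domestic and international guidelines (Lung-RADS guideline, Fleischner guideline). The final CT-AI output is shown in Figure 3.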
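The four-grade mapping can be written as a small lookup. The quoted ranges share their endpoints (40% appears in both "low" and "intermediate"), so the boundary handling below, which places 40% and 65% in "intermediate", is an assumption rather than something the text specifies.

```python
def accp_grade(probability):
    """Map a malignancy probability (fraction in [0, 1]) to the grade names used in the text."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < 0.05:
        return "very low"
    if probability < 0.40:
        return "low"
    if probability <= 0.65:   # assumed: both 40% and 65% count as intermediate
        return "intermediate"
    return "high"


for p in (0.03, 0.31, 0.49, 0.90):
    print(p, accp_grade(p))
```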
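The CAC detection method used in the following examples performs the test with a CAC kit or CAC detection equipment according to the CAC detection principle. The workflow is shown in Figure 4 and comprises the following steps:
First, a sample is collected from the patient (S1); for example, a blood sample is drawn and fixed with cell preservation solution (珠海圣美生物诊断技术有限公司). Cells are then separated from the blood sample by density-gradient centrifugation. The peripheral blood mononuclear cells obtained are enriched and purified (S2) and fixed on a slide (S3). After pre-hybridization treatment by enzymatic digestion and graded ethanol dehydration, the DNA in the peripheral blood mononuclear cells is denatured, and fluorescent probes are added to bind the DNA, forming "chromosome-specific sequence probe" complexes. Finally, the nuclei are stained with a nucleic acid dye to indicate intact nuclear morphology. The processed sample is scanned with a fluorescence microscope (S4) to identify and detect CACs with abnormal numbers of specific chromosomes; a fluorescent-marker detection algorithm counts the staining points in the four channels of each cell to determine whether the cell is a CAC. Because a CAC is defined as having a gain in two or more staining channels, each cell can be classified from the fluorescence-signal counts in the four probe images (S5), as shown in Figure 5. The green and red marks in the figure indicate two loci on one chromosome, and the blue and yellow marks indicate two loci on another chromosome. Based on the fluorescence detection results, cells are classified according to the rules in Table 1. In Figure 5, two staining signals of cell A (green and red) each have a count of 3, so by Table 1 cell A is a CAC; in cell B all four staining signals appear in pairs, so by Table 1 cell B is a normal cell.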
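Table 1 Rules for determining cell type
Cell type            Description
Normal cell          Staining signals in the nucleus appear in pairs
Loss cell            One kind of staining signal in the nucleus has a count below 2
Gain cell            Only 1 kind of staining signal in the nucleus has a count above 2
CAC                  2 or more kinds of staining signals in the nucleus have counts above 2
Undetermined cell    None of the above four categories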
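The rules in Table 1 can be expressed as a short classifier over the four per-channel signal counts. The table does not say how to treat a cell that shows both gains and losses, so the precedence used below (two or more gains always mean CAC; any other mixed pattern is undetermined) is an assumption, as are the function name and the channel order.

```python
def classify_cell(counts):
    """Classify one cell from its signal counts in the four probe channels.

    counts: four integers, e.g. (green, red, blue, yellow)
    """
    gains = sum(1 for c in counts if c > 2)
    losses = sum(1 for c in counts if c < 2)
    if gains == 0 and losses == 0:
        return "normal"        # every signal appears as a pair
    if gains >= 2:
        return "CAC"           # gain in two or more channels
    if gains == 1 and losses == 0:
        return "gain"          # exactly one channel above 2
    if losses == 1 and gains == 0:
        return "loss"          # exactly one channel below 2
    return "undetermined"


print(classify_cell((3, 3, 2, 2)))  # cell A in Figure 5 -> CAC
print(classify_cell((2, 2, 2, 2)))  # cell B in Figure 5 -> normal
```

The CAC detection data used by the models would then be the count of cells classified as CAC, normalized per 10,000 mononuclear cells as stated earlier in the description.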
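The patient risk factors described in the present disclosure include information such as the patient's sex, age, smoking history, and family genetic history, which are used to assist stratified analysis.
The features and performance of the present disclosure are described in further detail below with reference to examples.
Lesion CT imaging results and blood samples were collected from 64 patients with pulmonary nodules and used for CT-image AI analysis and CAC detection, respectively, and a comprehensive pathological analysis was performed on the 64 patients. The results are shown in the following table.
Table 2 CT-image AI analysis results, CAC detection results, and pathological analysis results of the 64 patients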
Figure PCTCN2020133952-appb-000006
Figure PCTCN2020133952-appb-000007
Figure PCTCN2020133952-appb-000008
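Example 1
This example constructs a logistic regression model by performing logistic regression analysis on the age, sex, CT-image AI analysis data, CAC detection data, and pathological analysis results of the 64 patients in Table 2, through the following steps:
(1) Constructing the logistic regression model
In the R 3.6.0 statistical software, the patient's CT-image AI analysis result ($x_1$), CAC detection data ($x_2$), age ($x_3$), and sex identifier ($x_4$: 1 for male, 0 for female) are entered as independent variables and the pathology result as the dependent variable ($\pi$) to construct the regression equation $\operatorname{logit}(\pi) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4$. The corresponding data of the 64 patients in Table 2 are substituted in, and R 3.6.0 computes the coefficients $\theta_0$, $\theta_1$, $\theta_2$, $\theta_3$, and $\theta_4$. The results are: $\theta_0$ is any value from -12.60 to 1.18, preferably -4.94; $\theta_1$ is any value from 3.08 to 15.05, preferably 7.92; $\theta_2$ is any value from -0.12 to 0.40, preferably 0.10; $\theta_3$ is any value from -0.03 to 0.16, preferably 0.06; $\theta_4$ is any value from -7.72 to -1.43, preferably -3.9.
For threshold selection, this example sets up candidate thresholds from 0.5 to 0.8 in steps of 0.05. For each candidate threshold, the accuracy of the corresponding predictions is calculated with the pathological results as the reference standard; the results are shown in Table 3.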
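Table 3 Accuracy for each candidate threshold in Example 1
Candidate threshold    Classification accuracy
0.5                    0.8413
0.55                   0.8413
0.6                    0.8571
0.65                   0.8413
0.7                    0.8254
This example therefore selects 0.6, the classification threshold with the highest classification accuracy.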
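The threshold search described in Example 1 can be sketched as a simple sweep. The probabilities and labels below are made up for illustration and are not the data of Table 2; the function names are likewise hypothetical.

```python
def accuracy(probs, labels, threshold):
    """Fraction of cases where (prob > threshold) matches the pathology label."""
    preds = [1 if p > threshold else 0 for p in probs]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


def best_threshold(probs, labels, start=0.5, stop=0.8, step=0.05):
    """Try candidate thresholds from start to stop and return the most accurate one."""
    n_steps = round((stop - start) / step)
    candidates = [round(start + i * step, 2) for i in range(n_steps + 1)]
    # max() keeps the first (lowest) candidate when several accuracies tie.
    return max(candidates, key=lambda t: accuracy(probs, labels, t))


# Made-up model outputs and pathology labels (1 = malignant, 0 = benign).
probs = [0.20, 0.55, 0.58, 0.62, 0.90, 0.45]
labels = [0, 0, 0, 1, 1, 0]
print(best_threshold(probs, labels))  # 0.6 for this toy data
```

Selecting the threshold on the same patients used to fit the coefficients can overstate accuracy, so in practice the sweep is usually run on held-out folds.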
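Example 2
This example uses the CT-image AI analysis data, CAC detection data, and pathological analysis results of the 64 patients in Table 2 to construct a decision tree model, through the following steps:
(1) The data in Table 2 are randomly divided into 10 parts; each part is used in turn as the test set and the other 9 parts as the training set;
(2) Considering the four features age, sex, "CT-image AI malignancy probability", and "CAC detection data", and using the training set, the Gini index
Figure PCTCN2020133952-appb-000009
is used to evaluate the result of splitting on the value of one of the features, where
Figure PCTCN2020133952-appb-000010
and the feature that minimizes Gini(D,A) is chosen as the split node;
(3) During decision tree generation, each node is evaluated before it is split: if splitting the current node improves the generalization performance of the decision tree, the node is split; otherwise it is not;
(4) Steps (2) and (3) are repeated until all data have been assigned to leaf nodes or the maximum depth is reached; the result is the decision tree. The maximum depth can be set to any value from 2 to 7;
(5) The decision tree classifier generated above is used to classify the one fold of test-set data randomly held out in step (1); if the test result meets the accuracy requirement, proceed to the next step, otherwise repeat the classification;
(6) For the 10 pairs of training and test sets produced in step (1), steps (2) to (5) are repeated and the 10-fold cross-validation accuracy is calculated;
(7) According to the 10-fold cross-validation results from step (6), at a depth of 7 the cross-validated accuracy reaches 82.9%, which meets the practical assessment requirement, so the maximum depth of the decision tree is set to 7. The resulting tree is shown in Figure 6, where "ct<0.58" means the node's split criterion is a CT-image AI malignancy probability below 0.58, "sex=M" means the patient is male, "cac<7" means the CAC detection data are below 7, and "age<56" means the patient is under 56 years old; the split criteria of the other nodes are written in the same way. At every level, samples that satisfy the split criterion go to the left child node and samples that do not go to the right child node.
The root node of the decision tree in Figure 6 uses ct<0.58 as its split criterion, yielding two first-level parent nodes, A1 (left) and A2 (right). A1 and A2 use sex=M and cac<4, respectively, yielding, from left to right, three second-level parent nodes B1, B2, and B3 and one malignant leaf node. B1, B2, and B3 use cac<7, age<56, and sex=M, respectively, yielding, from left to right, a benign leaf node, third-level parent node C1, third-level parent node C2, a malignant leaf node, third-level parent node C3, and a malignant leaf node. C1, C2, and C3 use cac≥10, ct<0.43, and age<57, respectively, yielding, from left to right, a benign leaf node, a malignant leaf node, a benign leaf node, a malignant leaf node, fourth-level parent node D1, and a malignant leaf node. D1 uses age≥55, yielding, from left to right, a benign leaf node and fifth-level parent node E1. E1 uses cac≥1, yielding, from left to right, sixth-level parent node F1 and a malignant leaf node. F1 uses cac<2, yielding, from left to right, a benign leaf node and a malignant leaf node.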
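The tree described in prose above can be transcribed directly into nested conditions. The sketch follows the stated convention that a satisfied criterion leads to the left child; the function name, the `"M"`/`"F"` coding, and the use of a fraction for the CT-AI probability are assumptions, and the transcription reflects the textual description only, since Figure 6 itself is not available here.

```python
def classify_nodule(ct, cac, age, sex):
    """Walk the Figure 6 decision tree as described in the text.

    ct: CT-image AI malignancy probability as a fraction; cac: CAC count;
    age: years; sex: "M" or "F". Returns "benign" or "malignant".
    """
    if ct < 0.58:                                    # root
        if sex == "M":                               # A1
            if cac < 7:                              # B1
                return "benign"
            return "benign" if cac >= 10 else "malignant"      # C1
        if age < 56:                                 # B2
            return "benign" if ct < 0.43 else "malignant"      # C2
        return "malignant"
    if cac < 4:                                      # A2
        if sex == "M":                               # B3
            if age < 57:                             # C3
                if age >= 55:                        # D1
                    return "benign"
                if cac >= 1:                         # E1
                    return "benign" if cac < 2 else "malignant"  # F1
                return "malignant"
            return "malignant"
        return "malignant"
    return "malignant"


# Patient A of Table 4 (CT-AI 5%, CAC 0, age 44.8) is benign under either sex value.
print(classify_nodule(0.05, 0, 44.8, "M"), classify_nodule(0.05, 0, 44.8, "F"))
```

The longest path (root, A2, B3, C3, D1, E1, F1, leaf) has seven splits, which matches the maximum depth of 7 chosen in step (7).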
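Example 3
This example uses the CT-image AI analysis data, CAC detection data, and pathological analysis results of the 64 patients in Table 2 to construct a random forest model, through the following steps:
(1) The data in Table 2 are randomly divided into 10 parts; each part is used in turn as the test set and the other 9 parts as the training set;
(2) The number of decision trees in the random forest is set to 100, and the training data are re-divided by bootstrap resampling into 100 different data sets, in which some observations are selected several times and others not at all;
(3) For each data set from step (2), the four features age, sex, "CT-image AI malignancy probability", and "CAC detection data" are considered and a tuning parameter mtry is set. Whenever a node needs to be split, a subset of mtry features is first drawn at random from these four features, and the feature in that subset that minimizes Gini(D,A) is used to split the node;
(4) Following step (3), 100 decision trees are built to form a random forest. Each tree then classifies the test set, the classification result is decided by the majority vote of the trees, and the test-set error rate is calculated;
(5) The tuning parameter mtry is varied between 2 and 4, steps (3) and (4) are repeated, and the 10-fold cross-validation accuracy is calculated for each value;
(6) According to the 10-fold cross-validation results from step (5), when mtry is 3 the cross-validation accuracy reaches 84.2%, which meets the practical assessment requirement. The tuning parameter mtry of the random forest model is therefore set to 3.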
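Three ingredients of the steps above (bootstrap resampling, the random mtry-feature subset at each split, and majority voting) can be sketched independently of any tree-growing code. The function names, feature labels, and training-set size below are hypothetical, and the sketch does not grow trees itself.

```python
import random
from collections import Counter

FEATURES = ["age", "sex", "ct_ai_probability", "cac_count"]  # the four features of step (3)


def bootstrap_indices(n, rng):
    """Step (2): draw n indices with replacement; some repeat, some are left out."""
    return [rng.randrange(n) for _ in range(n)]


def candidate_features(mtry, rng):
    """Step (3): random subset of mtry features considered at one node split."""
    return rng.sample(FEATURES, mtry)


def majority_vote(tree_predictions):
    """Step (4): the forest's class is the one predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)
sample = bootstrap_indices(57, rng)       # 57 is an illustrative training-fold size
print(len(sample), len(set(sample)))      # same size; typically fewer distinct indices
print(candidate_features(3, rng))         # mtry = 3, the value chosen in step (6)
print(majority_vote([1, 0, 1, 1, 0]))     # 1
```

With only four features, mtry = 4 would make every tree consider all features at every split, so the randomness would come from bootstrap resampling alone.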
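Examples 4 to 7
Examples 4 to 7 each use the CT-image AI analysis data, CAC detection data, and pathological analysis results of the 64 patients in Table 2 to construct a random forest model, giving four models in total. The only difference from Example 3 is the number of decision trees used: 300, 500, 700, and 1000, respectively.
Example 8
This example applies the logistic regression model of Example 1 to assess pulmonary nodule risk in 5 patients, whose CT-image analysis data and CAC detection data are shown in the following table.
Table 4 CT-image AI analysis data and CAC detection data of the 5 patients to be assessed in Example 8
Patient ID   CT-image AI analysis data   CAC detection data (count)   Age   Sex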
A 5% 0 44.8
B 8% 1 45.7
C 14% 1 47.1
D 49% 11 71.5
E 31% 1 76.7
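The data in Table 4 are substituted into the logistic regression model obtained in Example 1 to compute the pulmonary nodule risk assessment results for the five patients in Table 4. With the threshold of 0.6 determined in Example 1, this example classifies patients whose result exceeds the threshold as malignant and outputs the malignant identifier 1, and classifies those below the threshold as benign and outputs the benign identifier 0. The results are shown in Table 11.
Example 9
This example uses the decision tree model of Example 2 to assess pulmonary nodule risk for the five patients in Table 4. The data in Table 4 are substituted into the decision tree model obtained in Example 2, as shown in Figure 7, and the resulting assessments are shown in Table 11.
Examples 10 to 14
This group of examples uses the random forest models of Examples 3 to 7, respectively, to assess pulmonary nodule risk for the 5 patients in Table 4. The data in Table 4 are substituted into the random forest models obtained in Examples 3 to 7 to obtain the pulmonary nodule risk of the 5 patients. After pathological testing of these 5 patients, the pathology results are compared with the probabilities given by this group of examples; the results are shown in Tables 5 to 9.
Table 5 Random forest decision results for the patients in Example 10
Figure PCTCN2020133952-appb-000011
Table 6 Random forest decision results for the patients in Example 11
Figure PCTCN2020133952-appb-000012
Table 7 Random forest decision results for the patients in Example 12
Figure PCTCN2020133952-appb-000013
Table 8 Random forest decision results for the patients in Example 13
Figure PCTCN2020133952-appb-000014
Figure PCTCN2020133952-appb-000015
Table 9 Random forest decision results for the patients in Example 14
Figure PCTCN2020133952-appb-000016
As can be seen, all 5 random forest models provided by the present disclosure reach 100% accuracy when used to assess pulmonary nodule malignancy in the above patients.
Further, the assessment results of Examples 10 to 14, in which 5 random forest models with different numbers of decision trees were used to assess the patients, were cross-validated; the resulting accuracies are shown in Table 10.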
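Table 10 Cross-validation accuracy of the assessment results of Examples 10 to 14
Example       Number of decision trees    Cross-validation accuracy
Example 10    100                         0.8095238095238095
Example 11    300                         0.8261904761904763
Example 12    500                         0.8261904761904763
Example 13    700                         0.8261904761904763
Example 14    1000                        0.8261904761904763
Table 10 shows that, in terms of the cross-validation accuracy of the random forest assessment for the 5 patients, the accuracy already meets the requirement once the number of decision trees reaches 300, and adding more trees does not improve it further, as shown in Figure 8.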
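Comparative Example 1
This comparative example uses the patient data in Table 2 and the same construction method as Example 1 to build a logistic regression model. The CAC detection data ($x_1$) are entered as the independent variable and the pathological result as the dependent variable to construct the regression equation $\operatorname{logit}(\pi) = \theta_0 + \theta_1 x_1$, in which the coefficients $\theta_0$ and $\theta_1$ are 0.98 and 0.14, respectively; this is the constructed logistic regression model. The resulting model is then used to assess pulmonary nodule risk for the 5 patients in Table 4, using the same threshold as in Example 3.
Comparative Example 2
This comparative example uses the patient data in Table 2 and the same construction method as Example 1 to build a logistic regression model. The CT-image AI analysis data ($x_1$) are entered as the independent variable and the pathological result as the dependent variable to construct the regression equation $\operatorname{logit}(\pi) = \theta_0 + \theta_1 x_1$, in which the coefficients $\theta_0$ and $\theta_1$ are -0.95 and 4.11, respectively; this is the constructed logistic regression model. The resulting model is then used to assess pulmonary nodule risk for the 5 patients in Table 4, using the same threshold as in Example 3.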
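Because the two comparative models each use a single input that Table 4 does list, they can be evaluated directly. The sketch below assumes the CT-AI value is entered as a fraction and that the threshold is 0.6, the value selected in Example 1 (the text refers to the threshold of Example 3); both are assumptions, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Table 4 inputs: CT-image AI analysis data (as fractions) and CAC counts.
ct_ai = {"A": 0.05, "B": 0.08, "C": 0.14, "D": 0.49, "E": 0.31}
cac = {"A": 0, "B": 1, "C": 1, "D": 11, "E": 1}

THRESHOLD = 0.6  # assumption: the threshold selected in Example 1


def cac_only(x):
    """Comparative Example 1: theta_0 = 0.98, theta_1 = 0.14."""
    return sigmoid(0.98 + 0.14 * x)


def ct_only(x):
    """Comparative Example 2: theta_0 = -0.95, theta_1 = 4.11."""
    return sigmoid(-0.95 + 4.11 * x)


for pid in "ABCDE":
    p1, p2 = cac_only(cac[pid]), ct_only(ct_ai[pid])
    print(pid, round(p1, 3), int(p1 > THRESHOLD), round(p2, 3), int(p2 > THRESHOLD))
```

Under these assumptions the CAC-only model exceeds the threshold for all five patients, because its intercept alone already corresponds to a probability above 0.7, while the CT-only model exceeds it only for patient D.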
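Finally, pathological analysis was performed on the five patients, and the results were compared with the assessments obtained in Example 8, Example 9, and Example 10 and in Comparative Examples 1 and 2. The comparison is as follows.
Table 11 Assessment results of the different models for the five patients
Figure PCTCN2020133952-appb-000017
As shown above, the predictions of Examples 8, 9, and 10 agree completely with the pathological analysis. By contrast, the predictions of Comparative Examples 1 and 2 deviate from the pathological analysis to varying degrees. The model of Comparative Example 1, built from CAC detection results alone, tends to give conservative predictions, that is, a high probability of malignancy for every patient. The model of Comparative Example 2, built from CT results alone, tends to give more liberal predictions; with the same threshold, however, it predicts two of the five cases incorrectly. In medical practice, wrong predictions can have different but serious consequences. The models of Examples 8, 9, and 10 of the present disclosure are therefore more advanced and give reasonable diagnostic recommendations.
The above are merely preferred embodiments of the present disclosure and are not intended to limit it; those skilled in the art may make various modifications and changes to the present disclosure. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within its scope of protection.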

Claims (10)

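  1. A system for pulmonary nodule risk assessment, characterized by comprising:
    a data acquisition module, configured to obtain the patient's lesion imaging results, the patient's CAC detection data, and the patient's risk factors;
    a data processing module, configured to preprocess the data obtained by the data acquisition module, the output of the preprocessing being matched to the pulmonary nodule risk assessment module;
    a pulmonary nodule risk assessment module, configured to apply a pulmonary nodule risk assessment model built by machine learning to the preprocessed output of the data processing module and compute a pulmonary nodule risk result.
  2. The system according to claim 1, characterized in that the patient's risk factors include one, or a combination of two or more, of the patient's sex, age, family history of tumors, and smoking history.
  3. The system according to claim 2, characterized in that the data processing module is configured to: convert the patient's lesion imaging results, by artificial intelligence computation, into lesion image analysis data and output the patient's pulmonary nodule malignancy probability; convert the patient's sex into a corresponding sex identifier; convert the family history of tumors into a corresponding family tumor history identifier; and convert the smoking history into a corresponding smoking history identifier.
  4. The system according to claim 3, characterized in that the patient's lesion imaging results include one, or a combination of two or more, of low-dose spiral CT scans, thin-slice spiral CT scans, chest X-ray radiographs, and positron emission computed tomography.
  5. The system according to claim 1, characterized in that the patient's CAC detection data include the number of circulating abnormal cells per 10,000 mononuclear cells obtained from the patient's CAC test.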
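  6. The system according to any one of claims 1 to 5, characterized in that the pulmonary nodule risk assessment module uses a logistic regression model to compute the patient's pulmonary nodule risk from the preprocessing results;
    the preprocessing results include the patient's lesion image analysis data, the patient's CAC detection data, the patient's age, and the patient's sex identifier;
    the logistic regression model is computed as:
    $\operatorname{logit}(\pi) = \theta^{T}X + \theta_0$, where $X$ is the matrix of independent variables, comprising the patient's lesion image analysis data $x_1$, the patient's CAC detection data $x_2$, the patient's age identifier $x_3$, and the patient's sex identifier $x_4$, and $\theta^{T}$ is the coefficient matrix corresponding to $X$,
    Figure PCTCN2020133952-appb-100001
    $\theta_0$ is the constant coefficient, and $\pi$ is the patient's pulmonary nodule malignancy probability;
    the computed $\pi$ is compared with a preset classification threshold, and a benign or malignant identifier for the patient's pulmonary nodule is output based on the comparison;
    preferably, the classification threshold is 0.5 to 0.8;
    preferably, the classification threshold is 0.6.
  7. The system according to claim 6, characterized in that $\theta_1$ is any value from 3.08 to 15.05, preferably 7.92;
    $\theta_2$ is any value from -0.12 to 0.40, preferably 0.10;
    $\theta_3$ is any value from -0.03 to 0.16, preferably 0.06;
    $\theta_4$ is any value from -7.72 to -1.43, preferably -3.9;
    $\theta_0$ is any value from -12.60 to 1.18, preferably -4.94.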
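  8. The system according to any one of claims 1 to 5, characterized in that the pulmonary nodule risk assessment module uses a decision tree model that takes the patient's lesion image analysis data, CAC detection data, age, and sex identifier as split features and classifies the patient's pulmonary nodule risk according to the preprocessing results;
    preferably, the decision depth of the decision tree is 2 to 7;
    preferably, the decision depth of the decision tree is 4;
    preferably, the decision depth of the decision tree is 7.
  9. The system according to any one of claims 1 to 5, characterized in that the pulmonary nodule risk assessment module uses a random forest model that builds 100 to 1000 decision trees to classify the patient's pulmonary nodule risk and computes the probability that the patient's pulmonary nodule is malignant from the classification results of those trees; the 100 to 1000 decision trees randomly select 2 to 4 kinds of preprocessing results as split features from among the patient's lesion image analysis data, CAC detection data, age, and sex identifier;
    preferably, 3 kinds of preprocessing results are selected as split features;
    preferably, the number of decision trees is 300.
  10. A method for training the pulmonary nodule risk assessment model used by the pulmonary nodule risk assessment module of the system according to any one of claims 1 to 9, characterized in that the training method comprises taking the lesion image analysis data, the CAC detection data, the identifiers converted from the risk factors, and the pathological test data of patients with known pathology results as self-learning samples, feeding them into a preset model, obtaining feature parameters through self-learning, and thereby determining the pulmonary nodule risk assessment model;
    preferably, the pulmonary nodule risk assessment model is a logistic regression model, and the feature parameters obtained by self-learning include the coefficient matrix, the constant coefficient, and the classification threshold;
    preferably, the pulmonary nodule risk assessment model is a decision tree model, and the feature parameters obtained by self-learning include the split feature value of the root node and the split feature values of the parent nodes at each level;
    preferably, the pulmonary nodule risk assessment model is a random forest model, and the feature parameters obtained by self-learning include the number of decision trees and, for each tree, the split feature value of the root node and the split feature values of the parent nodes at each level.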
PCT/CN2020/133952 2020-11-25 2020-12-04 一种用于肺结节风险性评估的*** WO2022110278A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011341094.0A CN112382392A (zh) 2020-11-25 2020-11-25 一种用于肺结节风险性评估的***
CN202011341094.0 2020-11-25

Publications (1)

Publication Number Publication Date
WO2022110278A1 true WO2022110278A1 (zh) 2022-06-02

Family

ID=74587813

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/133952 WO2022110278A1 (zh) 2020-11-25 2020-12-04 一种用于肺结节风险性评估的***

Country Status (2)

Country Link
CN (1) CN112382392A (zh)
WO (1) WO2022110278A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115036024A (zh) * 2022-06-27 2022-09-09 中国医学科学院基础医学研究所 肺结节动态风险评估模型的构建方法、应用方法、***、存储介质和电子设备
CN115714016A (zh) * 2022-11-16 2023-02-24 内蒙古卫数数据科技有限公司 一种基于机器学习的布鲁氏菌病筛查率提升方法
CN115881304A (zh) * 2023-03-01 2023-03-31 深圳市森盈智能科技有限公司 基于智能检测的风险评估方法、装置、设备及介质

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112562867A (zh) * 2021-02-22 2021-03-26 天津迈德新医药科技有限公司 一种预测极早期hiv感染风险的装置、存储介质和电子装置
CN112951412A (zh) * 2021-03-11 2021-06-11 深圳大学 一种辅助诊断方法及其应用
CN113299391B (zh) * 2021-05-25 2023-11-03 李玉宏 远程甲状腺结节超声影像的风险评估方法
CN113299379A (zh) * 2021-06-17 2021-08-24 南通市第一人民医院 一种基于血液净化中心安全风险的智能管理方法及***
CN113539493A (zh) * 2021-06-23 2021-10-22 吾征智能技术(北京)有限公司 一种利用多模态风险因素推断癌症风险概率的***
CN113889270A (zh) * 2021-08-23 2022-01-04 浙江一山智慧医疗研究有限公司 胃癌筛查***、方法、装置、电子装置和存储介质
CN113707298A (zh) * 2021-08-25 2021-11-26 景元明 一种基于医疗大数据肿瘤诊断的预测方法
CN114550926A (zh) * 2022-01-19 2022-05-27 四川大学华西医院 一种孤立肺结节恶性风险预测***
CN115578307B (zh) * 2022-05-25 2023-09-15 广州市基准医疗有限责任公司 一种肺结节良恶性分类方法及相关产品
CN114739970B (zh) * 2022-06-09 2022-09-16 珠海横琴圣澳云智科技有限公司 荧光信号点断裂判定方法和装置
CN114783007B (zh) * 2022-06-22 2022-09-27 成都新希望金融信息有限公司 设备指纹识别方法、装置和电子设备
CN116628601B (zh) * 2023-07-25 2023-11-10 中山大学中山眼科中心 一种采用多模态信息对非人灵长类神经元分类的分析方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292114A (zh) * 2017-06-28 2017-10-24 中日友好医院 一种孤立性肺结节恶性概率预测模型的建立方法
US20180068083A1 (en) * 2014-12-08 2018-03-08 20/20 Gene Systems, Inc. Methods and machine learning systems for predicting the likelihood or risk of having cancer
CN109817336A (zh) * 2019-01-21 2019-05-28 杭州英库医疗科技有限公司 一种结合影像学分析结果的早期肺癌风险评估模型建立方法
CN111175267A (zh) * 2020-01-18 2020-05-19 珠海圣美生物诊断技术有限公司 基于fish技术的细胞判读方法和***
CN111915596A (zh) * 2020-08-07 2020-11-10 杭州深睿博联科技有限公司 一种肺结节良恶性预测方法及装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIANG ZHONGMIN, LIN DIANJIE, YE XIN, FAN XIANJUN, ZHANG JUNCHENG, DONG QIAN, BAI CHUN XUE: "Circulating tumor cells, circulating chromosomal abnormal cells and early diagnosis of lung cancer", JOURNAL OF PRECISION MEDICINE, vol. 35, no. 2, 1 April 2020 (2020-04-01), pages 95 - 99, XP055886004, ISSN: 2096-529X, DOI: 10.13362/j.jpmed.202002001 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115036024A (zh) * 2022-06-27 2022-09-09 中国医学科学院基础医学研究所 肺结节动态风险评估模型的构建方法、应用方法、***、存储介质和电子设备
CN115714016A (zh) * 2022-11-16 2023-02-24 内蒙古卫数数据科技有限公司 一种基于机器学习的布鲁氏菌病筛查率提升方法
CN115714016B (zh) * 2022-11-16 2024-01-19 内蒙古卫数数据科技有限公司 一种基于机器学习的布鲁氏菌病筛查率提升方法
CN115881304A (zh) * 2023-03-01 2023-03-31 深圳市森盈智能科技有限公司 基于智能检测的风险评估方法、装置、设备及介质

Also Published As

Publication number Publication date
CN112382392A (zh) 2021-02-19

Similar Documents

Publication Publication Date Title
WO2022110278A1 (zh) 一种用于肺结节风险性评估的***
Bouchareb et al. Artificial intelligence-driven assessment of radiological images for COVID-19
CN112101451B (zh) 一种基于生成对抗网络筛选图像块的乳腺癌组织病理类型分类方法
Liang et al. Accurate diagnosis of pulmonary nodules using a noninvasive DNA methylation test
US10327637B2 (en) Systems, methods, and computer-readable media for patient image analysis to identify new diseases
Phaphuangwittayakul et al. An optimal deep learning framework for multi-type hemorrhagic lesions detection and quantification in head CT images for traumatic brain injury
CN112768072A (zh) 基于影像组学定性算法构建癌症临床指标评估***
CN116188423B (zh) 基于病理切片高光谱图像的超像素稀疏解混检测方法
Xu et al. Using transfer learning on whole slide images to predict tumor mutational burden in bladder cancer patients
CN113539498A (zh) 一种基于决策树模型的孤立肺结节恶性风险预测***
US20200279649A1 (en) Method and apparatus for deriving a set of training data
CN115440383B (zh) 用于预测晚期癌症患者pd-1/pd-l1单抗治疗疗效的***
CN113345576A (zh) 一种基于深度学习多模态ct的直肠癌***转移诊断方法
Wen et al. Deep learning in digital pathology for personalized treatment plans of cancer patients
Hashimoto et al. Case-based similar image retrieval for weakly annotated large histopathological images of malignant lymphoma using deep metric learning
Qi et al. One-step algorithm for fast-track localization and multi-category classification of histological subtypes in lung cancer
Liu et al. Pathological prognosis classification of patients with neuroblastoma using computational pathology analysis
Jing et al. A comprehensive survey of intestine histopathological image analysis using machine vision approaches
CN117711615A (zh) 基于影像组学的***转移状态分类预测方法及设备
CN115274119B (zh) 一种融合多影像组学特征的免疫治疗预测模型的构建方法
Fan et al. MEAI: an artificial intelligence platform for predicting distant and lymph node metastases directly from primary breast cancer
CN115439491A (zh) Mri图像肠损伤区域的分割方法、装置及等级评估***
CN115457069A (zh) 一种基于图像的特征提取和预后模型建立方法及装置
Pradhan An early diagnosis of lung nodule using CT images based on hybrid machine learning techniques
Liu et al. Research in the application of artificial intelligence to lung cancer diagnosis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20963166

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20963166

Country of ref document: EP

Kind code of ref document: A1