CN115171905A - Tumor patient similarity calculation method based on one-hot coding unsupervised clustering - Google Patents


Info

Publication number
CN115171905A
Authority
CN
China
Prior art keywords
patients
clinical
hot
unsupervised clustering
similarity calculation
Prior art date
2022-06-20
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210695043.0A
Other languages
Chinese (zh)
Other versions
CN115171905B (en)
Inventor
张如奎
刘雷
朱超宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-06-20
Filing date
2022-06-20
Publication date
2022-10-11
Application filed by Fudan University
Priority to CN202210695043.0A
Publication of CN115171905A
Application granted
Publication of CN115171905B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention discloses a tumor patient similarity calculation method based on one-hot coding unsupervised clustering. First, one-hot encoding is applied uniformly to every observation index of the clinical data to obtain a feature embedding matrix. KMeans unsupervised clustering is then performed on the feature embedding matrix to generate a Patient Similarity Network (PSN). Next, clinical outcome correlation analysis is carried out based on the overall survival (OS) of the tumor patients: the statistical differences between the survival curves of the patient groups produced by clustering are tested and evaluated, yielding a cPSN highly correlated with clinical outcome. Finally, for a target tumor patient to be evaluated, a KNN algorithm retrieves the group of patients in the cPSN most similar to the target patient, and the range and granularity of this group are selected by adjusting the K value. The method overcomes the difficulty of encoding and integrating multi-modal medical data and the dependence of algorithms on physician labeling, and the constructed cPSN effectively captures the similarity between patients.

Description

Tumor patient similarity calculation method based on one-hot coding unsupervised clustering
Technical Field
The invention belongs to the technical field of intelligent medicine, applies to smart healthcare and precision medicine, and relates to a tumor patient similarity calculation method based on one-hot coding unsupervised clustering.
Background
Tumor clinical data differ markedly from the clinical data of other diseases. The tumor data with the highest information density and clinical value are histopathology data and molecular genetics data, which are largely irrelevant to other diseases; consequently, models and algorithms developed for other common diseases perform poorly and lack granularity in oncology, and are unlikely to deliver clinical benefit. Pathological examination is mainly used to distinguish benign from malignant lesions and to determine tumor stage and pathological type, whereas molecular testing can identify the cause of tumor development, support molecular classification of the tumor, and determine marker expression. Together, pathological examination and molecular testing describe the tumor and its microenvironment comprehensively, enabling a scientific diagnosis and providing an important basis for physicians to select treatment methods, formulate rational treatment plans, evaluate treatment effects and judge prognosis.
Although tumor patients show great heterogeneity, some patients are always similar to one another; yet in both clinical practice and medical research, how to define and evaluate patient similarity remains an open problem. Patient similarity calculation, which evaluates the similarity between patients by mathematical computation over their multi-modal heterogeneous data, offers a possible solution. Generally, the first step is to determine a strategy for integrating multi-modal data; the second is to define a patient similarity metric that computes distances or similarity scores between patients in a systematic and consistent way; the third is to build a Patient Similarity Network (PSN) and perform cluster analysis, feature analysis and so on within it; finally, for a new patient to be evaluated, the group of patients most similar to the target patient is located or defined in the PSN based on the similarity scores.
Current patient similarity calculations apply almost exclusively to non-neoplastic diseases and typically draw hospitalization, diagnosis, treatment, prescription, laboratory test and physiological monitoring data from Electronic Medical Records (EMRs). Existing approaches either compute Euclidean distances over continuous numerical variables only; encode disease features hierarchically with ICD (International Classification of Diseases) codes and assess similarity from the distance between a parent node and each child node; or convert medical records into a medical knowledge graph, vectorize its entity nodes and compute the path similarity between nodes. Such code-conversion methods have obvious drawbacks: they require mapping into another system such as ICD codes or a knowledge graph, so the calculation is indirect, subject to many influencing factors and operationally complex, which compromises the accuracy of the results.
With the development of deep learning, diseases have been represented as vectors or matrices and patient similarity has been learned and predicted with Convolutional Neural Networks (CNNs). Models obtained this way are highly personalized: although they perform well on experimental datasets, their generalization ability is weak. Moreover, CNNs are supervised; in fact, current similarity networks all require supervised or semi-supervised learning, in which the similarity of some patients must be labeled in advance and parameter weights and similarity thresholds must be trained. In practice this is affected by physicians' subjective judgment and by the quality of the neural network algorithm, so the results are insufficiently reliable. These two characteristics greatly reduce the clinical value of neural-network-based deep learning for assessing tumor patient similarity.
Disclosure of Invention
To overcome the difficulty of encoding and integrating multi-modal medical data and the dependence of algorithms on physician labeling, the invention applies one-hot encoding uniformly to all clinical data, performs patient similarity calculation with an unsupervised method, and finally constructs a patient similarity network highly correlated with clinical outcome, the cPSN (clinical PSN), which effectively captures patient similarity. The cPSN can be used to locate tumor patients accurately, to formulate effective treatment interventions quickly by reference to past cases, to predict clinical outcomes accurately, and so on.
The technical scheme of the invention is specifically described as follows.
The invention provides a tumor patient similarity calculation method based on one-hot coding unsupervised clustering, comprising the following steps:
(1) Uniformly encoding each observation index of the clinical data with one-hot encoding to obtain a feature embedding matrix;
(2) Performing KMeans clustering on the feature embedding matrix to generate a patient similarity network PSN;
(3) Carrying out clinical outcome correlation analysis on the clustered patient groups based on the overall survival (OS) of the tumor patients, and evaluating the statistical differences between the survival curves of the different groups, to obtain a patient similarity network cPSN highly correlated with clinical outcome;
(4) For a target tumor patient to be evaluated, obtaining the group of patients most similar to the target patient in the cPSN with a distance-based K-nearest neighbor algorithm, the range and granularity of this group being selected by adjusting the K value.
In the invention, in step (1), the observation indexes of the clinical data comprise numerical variables, categorical variables and clinical qualitative descriptions. Histopathological data, such as superior mesenteric vein/portal vein involvement or a qualitative description of surgical margin status, can each form an independent variable; for molecular genetics data, such as gene mutation data from clinical genetic testing and immunohistochemistry data, each gene mutation and the expression level of each gene can be used as an independent variable.
In the invention, in step (1), when one-hot encoding is applied, categorical variables are encoded directly, while numerical variables and clinical qualitative descriptions are first converted into categorical variables; each classification state of each variable is recorded as one one-hot feature. Suppose a set of samples has M observation indexes, denoted $X = \{x_1, x_2, \ldots, x_M\}$. Each observation index $x_i$ has $m_i$ classification states, denoted $\{x_{i1}, x_{i2}, \ldots, x_{im_i}\}$. In total there are $M' = \sum_{i=1}^{M} m_i$ one-hot features. Preferably, for numerical variables, the values in a set of samples are divided into 4 parts by quartiles, forming 4 classification states; for a clinical qualitative description with N states, N classification states are formed.
In the invention, when a missing value appears in an observation index of the clinical data, it is treated as an independent one-hot category, and the null value does not need to be imputed.
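The encoding rules above (quartile bins for numerical variables, one feature per state for categorical and qualitative variables, missing values as their own state) can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the `#UNK` label for missing values is borrowed from Example 1, placing a value equal to a cut point in the lower bin is an assumption, and the helper names `quartile_bin` and `one_hot_encode` are hypothetical.

```python
import statistics

MISSING = "#UNK"  # missing-value state, named after the "#UNK" suffix in Example 1

def quartile_bin(values):
    """Map numeric values to quartile states Q1..Q4; None becomes the MISSING state."""
    present = [v for v in values if v is not None]
    # method="inclusive" puts the i-th cut point at position i/4*(n-1)+1 of the sorted data
    q1, q2, q3 = statistics.quantiles(present, n=4, method="inclusive")
    states = []
    for v in values:
        if v is None:
            states.append(MISSING)
        elif v <= q1:  # assumption: a value equal to a cut point goes to the lower bin
            states.append("Q1")
        elif v <= q2:
            states.append("Q2")
        elif v <= q3:
            states.append("Q3")
        else:
            states.append("Q4")
    return states

def one_hot_encode(table, numeric_columns):
    """table: column name -> list of raw values (None = missing).
    Returns (feature_names, rows), one 0/1 row per patient."""
    states = {}
    for col, values in table.items():
        if col in numeric_columns:
            states[col] = quartile_bin(values)
        else:
            states[col] = [MISSING if v is None else str(v) for v in values]
    # one feature per observed state of each observation index, M' = sum of m_i in total
    feature_names = [
        f"{col}_{s}" for col, col_states in states.items() for s in sorted(set(col_states))
    ]
    n_patients = len(next(iter(table.values())))
    rows = []
    for r in range(n_patients):
        active = {f"{col}_{col_states[r]}" for col, col_states in states.items()}
        rows.append([1 if name in active else 0 for name in feature_names])
    return feature_names, rows

# Toy data for illustration only (not from the patent's cohort)
table = {
    "Age": [45, 60, None, 72, 38],
    "Lauren_classification": ["Intestinal", "Diffuse", None, "Intestinal", "Mixed"],
}
names, rows = one_hot_encode(table, numeric_columns={"Age"})
```

Because every observation index contributes exactly one active state (missing included), each row sums to the number of observation indexes, here 2.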
In the invention, in the step (2), the KMeans clustering algorithm is Lloyd-Forgy.
In the invention, in step (2), during KMeans clustering, the clustering quality for each candidate number of clusters K is evaluated with the silhouette score or the gap statistic.
In the invention, the patient similarity network PSN is the set of encoded clusters of all patients; it is an M'-dimensional network, where M' is the total number of classification states, and it reflects the similarity distances between patients. Further preferably, the high-dimensional network can be visualized after dimensionality reduction with t-SNE, in two or three dimensions.
In the invention, in step (3), the Kaplan-Meier method is used for clinical outcome correlation analysis; the statistical differences between the survival curves of the different patient groups after clustering are evaluated with the log-rank test, and a cPSN highly correlated with clinical outcome is obtained according to the significance of the p-value.
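For readers unfamiliar with the Kaplan-Meier method named in step (3), a minimal product-limit estimator is sketched below. It assumes OS is given as follow-up times with an event flag (1 = death observed, 0 = censored); the function name and input format are illustrative assumptions, and Example 1 itself relies on R packages rather than code like this.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t).
    times: follow-up times (e.g. OS in months); events: 1 = death observed, 0 = censored.
    Returns [(t, S(t))] at each distinct time with at least one death."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # group all patients whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk  # product-limit step
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored cases both leave the risk set
    return curve

# Toy follow-up data (hypothetical)
curve = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
```

Censored patients never lower the curve; they only shrink the risk set used at later death times.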
Compared with the prior art, the invention has the following beneficial effects: a group of patients can be embedded and represented in a high-dimensional space according to their clinical characteristics, and similarity calculation can be performed on tumor patients; the method can efficiently encode any clinical data, with strong data processing capability and good robustness.
The method overcomes the difficulty of encoding and integrating multi-modal medical data and the dependence of algorithms on physician labeling, and constructs a cPSN that faithfully captures patient similarity.
Missing values in the different observation indexes of the clinical data are treated as an independent one-hot category rather than imputed, which reduces the classification errors that imputation introduces in other encoding schemes.
The method performs KMeans unsupervised clustering and selects the optimal K with a statistical criterion, so the whole process is unsupervised and requires no human intervention. The constructed patient network model is evaluated for clinical relevance by survival analysis using OS, the "gold standard" for evaluating clinical prognosis in oncology, which ensures the clinical significance and practical value of the published network.
The invention applies one-hot encoding uniformly to multi-modal, highly heterogeneous clinical data and flexibly accommodates changes in observation indexes and observation states across medical institutions, physicians and stages of medical development. The data processing is simple and effective, widely applicable and highly extensible, and enables high-precision similarity grouping of heterogeneous tumor patients.
Drawings
Fig. 1 is a two-dimensional visualization of the cPSN with 9 clusters for 114 gastric cancer patients.
Fig. 2 shows the survival curves of the 9 clusters of the 114 gastric cancer patients.
Detailed Description
The technical scheme of the invention is explained in detail below with reference to the drawings and the embodiment.
The invention develops a tumor patient similarity calculation method and system based on one-hot coding unsupervised clustering that is compatible with histopathology data, molecular genetics data and any other data considered clinically relevant. Patient similarity is calculated with an unsupervised method, and a cPSN highly correlated with clinical outcome is finally constructed, effectively capturing patient similarity.
Primary tumor clinical data are multi-modal and highly heterogeneous: some are numerical variables (e.g., age, TMB value), some are categorical variables (e.g., clinical stage, pathological type), and some are qualitative descriptions of clinical findings (e.g., tumor resectability, driver mutations, surgical procedure). We applied one-hot encoding uniformly to all clinical data types. For numerical variables, the values in a set of samples are divided into 4 parts by quartiles, forming 4 classification states. For a clinical qualitative description with N states, N classification states are formed. Suppose a set of samples has M observation indexes, denoted $X = \{x_1, x_2, \ldots, x_M\}$. Each observation index $x_i$ has $m_i$ states, denoted $\{x_{i1}, x_{i2}, \ldots, x_{im_i}\}$. In total there are $M' = \sum_{i=1}^{M} m_i$ one-hot features.
One-hot encoding works for all data types and is simple and efficient. Although it ignores gradations between the states of a categorical variable, this does not affect patient similarity calculation on large samples. Numerical variables are regarded as essentially similar to categorical variables (a numerical variable differs only in that a measurement method exists to give it a value), so converting numerical variables into categorical variables by quartiles is scientifically sound and fits clinical practice. One-hot encoding treats a missing value as an independent classification state, so null values need not be imputed. One-hot encoding is also highly extensible: it can efficiently fuse numerical, image, text and genetic testing data, and it flexibly accommodates changes in observation indexes and observation states across medical institutions, physicians and stages of medical development.
After the raw data have been processed, they form a feature embedding matrix, on which KMeans clustering is performed. The clustering algorithm is Lloyd-Forgy and comprises the following steps:
1) Set a fixed random seed for initialization so that the clustering result is reproducible.
2) Assign the Z patient points to K classes using indicator coefficients $r_{nk}$: $r_{nk} = 1$ if point $x_n$ belongs to the k-th class, otherwise $r_{nk} = 0$.
3) The iteration aims to minimize the loss function $J = \sum_{n=1}^{Z} \sum_{k=1}^{K} r_{nk} \lVert x_n - \mu_k \rVert^2$, where $\mu_k$ is the center point of the k-th class.
4) Compute the distance from each point to each center point, $\lVert x_n - \mu_k \rVert^2$, and set $r_{nk} = 1$ if $k = \arg\min_j \lVert x_n - \mu_j \rVert^2$, otherwise $r_{nk} = 0$.
5) Recompute the center point of each class: $\mu_k = \sum_{n=1}^{Z} r_{nk} x_n \big/ \sum_{n=1}^{Z} r_{nk}$.
6) Repeat steps 4) and 5) until convergence.
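The six steps above correspond to the standard Lloyd iteration with Forgy initialization (K input points drawn at random as the initial centers). A minimal pure-Python sketch follows; it is illustrative rather than the authors' code (Example 1 uses sklearn), and the `max_iter` cap and the choice to keep an empty cluster's previous center are assumptions not stated in the text.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd_kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm with Forgy initialisation. Returns (labels, centres, loss J)."""
    rng = random.Random(seed)  # step 1: fixed seed -> reproducible clustering
    centres = [list(p) for p in rng.sample(points, k)]  # Forgy: k input points as centres
    labels = None
    for _ in range(max_iter):
        # step 4: r_nk = 1 for the nearest centre (ties go to the lowest index)
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # step 6: assignments unchanged -> converged
        labels = new_labels
        # step 5: recompute each centre as the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its previous centre
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    # step 3: loss J = sum of squared distances to the assigned centres
    loss = sum(squared_distance(p, centres[lab]) for p, lab in zip(points, labels))
    return labels, centres, loss

# Toy one-hot rows forming two obvious groups (hypothetical)
points = [
    [1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 1, 1],
]
labels, centres, loss = lloyd_kmeans(points, k=2, seed=0)
```

Cluster numbering depends on the seed, so only the partition (not the label values) should be compared across runs.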
Clustering results are generated for K = 2 to 10 (10 may be replaced by a larger integer), and for each candidate number of clusters K the clustering quality is evaluated with the silhouette score or the gap statistic. For the silhouette score, the K with the maximum score is optimal; for the gap statistic, the optimal K is the smallest k satisfying $\mathrm{Gap}(k) \ge \mathrm{Gap}(k+1) - s_{k+1}$, where $s_{k+1}$ denotes the standard deviation term. The clustering result is visualized after dimensionality reduction with t-SNE, in two or three dimensions.
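Both selection criteria can be sketched in a few lines. The silhouette function below computes the mean silhouette coefficient with Euclidean distance, scoring singleton clusters as 0 (a common convention). The gap function only applies the selection rule above to precomputed Gap(k) and $s_k$ values, since computing them requires simulated reference datasets, which are out of scope here. Function names and toy inputs are hypothetical.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b); needs at least 2 clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:  # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)

def choose_k_by_gap(gap, s):
    """gap, s: dicts keyed by consecutive k. Smallest k with Gap(k) >= Gap(k+1) - s(k+1)."""
    ks = sorted(gap)
    for k in ks[:-1]:
        if gap[k] >= gap[k + 1] - s[k + 1]:
            return k
    return ks[-1]  # no k satisfied the rule within the searched range

points = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
best_k = choose_k_by_gap({2: 0.30, 3: 0.45, 4: 0.50, 5: 0.52}, {2: 0.02, 3: 0.03, 4: 0.04, 5: 0.05})
```

A well-separated labelling scores close to 1, while a labelling that mixes the two groups scores below 0.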
Clustering with KMeans yields a patient similarity network PSN. To examine the actual clinical significance of this PSN, we performed clinical outcome correlation analysis with the Kaplan-Meier method, using the overall survival (OS) of tumor patients, the "gold standard" for assessing clinical prognosis and clinical benefit in oncology. The statistical differences between the survival curves of the different patient groups were evaluated with the log-rank test, and, according to the significance of the p-value, a cPSN highly correlated with clinical outcome was published.
For a target tumor patient to be evaluated, the K-Nearest Neighbor (KNN) algorithm is used to retrieve the group of patients in the cPSN most similar to the target patient. Adjusting the K value selects the range and granularity of this group (in clinical practice, K can be adjusted manually when locating the target patient). Based on the clinical characteristics of the matched patients, a treatment plan can be formulated quickly, the clinical outcome predicted, and so on, by reference to past cases.
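The log-rank comparison can be illustrated for the simplest case of two groups. This sketch, with a hypothetical function name and toy inputs, computes the standard two-group statistic (observed minus expected deaths in group 1, squared and divided by its hypergeometric variance), which is chi-square distributed with 1 degree of freedom under the null hypothesis. Comparing the 9 clusters of Example 1 requires the K-group generalization with K - 1 degrees of freedom, as computed by `survdiff` in the R survival package; that case is not shown here.

```python
import math

def logrank_two_groups(times, events, groups):
    """Two-group log-rank test. events: 1 = death, 0 = censored; groups: 0/1 labels.
    Returns (chi2, p) with 1 degree of freedom."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [g for tt, g in zip(times, groups) if tt >= t]
        n = len(at_risk)
        n1 = sum(1 for g in at_risk if g == 1)
        d = sum(1 for tt, e in zip(times, events) if tt == t and e)
        d1 = sum(1 for tt, e, g in zip(times, events, groups) if tt == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = o_minus_e ** 2 / variance
    # upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Toy OS data: group 0 dies early, group 1 dies late (hypothetical)
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1] * 10
groups = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
chi2, p = logrank_two_groups(times, events, groups)
```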
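The retrieval step can be sketched as a brute-force K-nearest-neighbor search over the one-hot rows, using the Euclidean distance named in claim 1. The function name and the dictionary input format are assumptions; the sketch simply searches whatever cohort of patients it is given.

```python
import heapq
import math

def k_most_similar(target, cohort, k):
    """Return the k (patient_id, distance) pairs nearest to target by Euclidean distance.
    cohort: patient_id -> one-hot vector with the same feature order as target."""
    distances = ((pid, math.dist(target, vec)) for pid, vec in cohort.items())
    return heapq.nsmallest(k, distances, key=lambda item: item[1])

# Hypothetical example with 3 observation indexes of 2 states each (6 one-hot features)
cohort = {
    "P1": [1, 0, 1, 0, 0, 1],
    "P2": [1, 0, 0, 1, 0, 1],
    "P3": [0, 1, 0, 1, 1, 0],
}
matches = k_most_similar([1, 0, 1, 0, 0, 1], cohort, k=2)
```

When every observation index has exactly one active state (which holds because missing values are their own state), the squared Euclidean distance between two patients equals twice the number of observation indexes on which they differ. Ranking by Euclidean distance is therefore equivalent to ranking by the number of mismatched indexes, and increasing K widens the matched group in a predictable way.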
Example 1
We collected clinical data on 114 gastric cancer patients, comprising 15 features.
1. Quantitative features
1) There are 4 quantitative features: ['HRD_sum', 'Ploid', 'Age', 'TMB'].
2) The values of each quantitative feature are sorted and divided into four equal parts. The position of the quartile $Q_i$ in the sorted data is $\frac{i}{4}(n-1)+1$, where i (i = 1, 2, 3) indexes the quartile point and n is the number of data points.
3) The values of each quantitative feature are thereby converted into qualitative (categorical) features.
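The position formula in step 2) gives a 1-based rank in the sorted data; when it is not an integer, an interpolation rule is needed. The sketch below assumes linear interpolation between neighboring values (the text does not say which rule is used); under that assumption the result coincides with Python's `statistics.quantiles(..., method="inclusive")`. The function name `quartile` is hypothetical.

```python
import math
import statistics

def quartile(sorted_values, i):
    """i-th quartile (i = 1, 2, 3) at 1-based position i/4 * (n - 1) + 1, linearly interpolated."""
    n = len(sorted_values)
    pos = i / 4 * (n - 1) + 1
    lower = math.floor(pos)  # 1-based index of the value at or just below pos
    frac = pos - lower
    if lower >= n:  # only possible when n == 1
        return sorted_values[-1]
    return sorted_values[lower - 1] + frac * (sorted_values[lower] - sorted_values[lower - 1])

ages = sorted([38, 45, 60, 72])  # toy values, not from Example 1
mine = [quartile(ages, i) for i in (1, 2, 3)]
reference = statistics.quantiles(ages, n=4, method="inclusive")
```

For n = 9 sorted values the positions are exactly 3, 5 and 7, so no interpolation is needed; interpolation only matters when (n - 1) is not divisible by 4.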
2. Qualitative features
1) There are 11 qualitative features: ['Diploid_sequenza', 'Differentiation', 'Lauren_classification', 'Prognosis_stage', 'Lymph_node_status', 'Metastatis', 'Recurrence', 'ERBB2_amp_IHC', 'CDH1_mut', 'TP53_mut', 'Histologic_diagnosis'].
2) Null values are treated as a distinct feature value.
3) One-hot encoding is performed for each feature; for example, Diploid_sequenza is decomposed into 4 features: ['Diploid_sequenza_CIN', 'Diploid_sequenza_CS', 'Diploid_sequenza_CS/CIN', 'Diploid_sequenza_#UNK'].
4) In total, the 15 original clinical features were decomposed into 65 one-hot features.
3. KMeans clustering
1) After raw data processing, a 114 × 65 feature matrix is obtained in which every value is 0 or 1; KMeans clustering is performed using sklearn.
2) Clustering results are generated for K = 2 to 10, and dimensionality-reduction visualization is performed using sklearn. Evaluation identified K = 9 as the relatively best clustering, so a patient similarity network PSN with 9 clusters was obtained, as shown in Fig. 1.
4. Correlation analysis of clinical outcome
OS survival analysis was performed on the 9 patient clusters above using the R packages survival and survminer; the log-rank test showed significant statistical differences between the survival curves of the different patient groups (p = 2e-08). This is a cPSN highly correlated with clinical outcome. As shown in Fig. 2, the survival curves of the different groups are widely separated, indicating significant differences in survival status between groups, and these differences arise from the clustering performed after similarity calculation.

Claims (9)

1. A tumor patient similarity calculation method based on one-hot coding unsupervised clustering is characterized by comprising the following steps:
(1) Uniformly applying one-hot encoding to each observation index of clinical data to obtain a feature embedding matrix;
(2) Performing KMeans unsupervised clustering on the feature embedding matrix to generate a patient similarity network PSN;
(3) Carrying out clinical outcome correlation analysis on the clustered patients based on the overall survival (OS) of the tumor patients, and evaluating the statistical differences between the survival curves of patients in different groups, to obtain a patient similarity network cPSN highly correlated with clinical outcome;
(4) For a target tumor patient to be evaluated, obtaining the group of patients most similar to the target patient in the cPSN using a K-nearest neighbor algorithm based on Euclidean distance, the range and granularity of this group being selected by adjusting the K value.
2. The tumor patient similarity calculation method based on one-hot coded unsupervised clustering according to claim 1, wherein in step (1), the observed indicators of the clinical data comprise numerical variables, categorical variables and clinical qualitative descriptions.
3. The tumor patient similarity calculation method based on one-hot coding unsupervised clustering according to claim 2, wherein in step (1), when one-hot encoding is adopted, categorical variables are encoded directly, numerical variables and clinical qualitative descriptions are each converted into categorical variables, and each classification state of each variable is recorded as one one-hot feature; assuming a set of samples has M observation indexes, denoted $X = \{x_1, x_2, \ldots, x_M\}$, and each observation index $x_i$ has $m_i$ classification states, denoted $\{x_{i1}, x_{i2}, \ldots, x_{im_i}\}$, there are $M' = \sum_{i=1}^{M} m_i$ one-hot features in total.
4. The tumor patient similarity calculation method based on one-hot coded unsupervised clustering according to claim 3, wherein for numerical variables, the values in a set of samples are divided into 4 parts by quartiles, forming 4 classification states; for a clinical qualitative description with N states, N classification states are formed.
5. The tumor patient similarity calculation method based on one-hot coding unsupervised clustering according to claim 1, wherein in step (1), when a missing value appears in an observation index of the clinical data, the missing value is treated as an independent one-hot category without imputing the null value.
6. The tumor patient similarity calculation method based on one-hot coded unsupervised clustering according to claim 1, wherein in step (2), during KMeans unsupervised clustering, the clustering quality for each selected number of clusters K is evaluated with the silhouette score or the gap statistic to determine the optimal K value.
7. The tumor patient similarity calculation method based on one-hot coded unsupervised clustering according to claim 1, wherein, in step (2), the generated patient similarity network PSN represents the similarity distance between patients, which is a set of encoded clusters of all patients, and is an M 'dimensional high-dimensional network, and M' is the sum of classification states.
8. The tumor patient similarity calculation method based on one-hot coded unsupervised clustering according to claim 7, wherein in the step (2), the generated high-dimensional network PSN is visualized in a dimension reduction manner by using a t-SNE method, and two-dimensional or three-dimensional display is adopted.
9. The tumor patient similarity calculation method based on one-hot coded unsupervised clustering according to claim 1, wherein in step (3), the Kaplan-Meier method is used for clinical outcome correlation analysis; the statistical differences in survival curves of these different groups of patients were assessed using the log-rank test, and cPSN, which is highly relevant for clinical outcome, was obtained based on the significance of the p-value.
CN202210695043.0A 2022-06-20 2022-06-20 Tumor patient similarity calculation method based on one-hot coding unsupervised clustering Active CN115171905B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210695043.0A CN115171905B (en) 2022-06-20 2022-06-20 Tumor patient similarity calculation method based on one-hot coding unsupervised clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210695043.0A CN115171905B (en) 2022-06-20 2022-06-20 Tumor patient similarity calculation method based on one-hot coding unsupervised clustering

Publications (2)

Publication Number Publication Date
CN115171905A true CN115171905A (en) 2022-10-11
CN115171905B CN115171905B (en) 2023-04-07

Family

ID=83484526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210695043.0A Active CN115171905B (en) 2022-06-20 2022-06-20 Tumor patient similarity calculation method based on one-hot coding unsupervised clustering

Country Status (1)

Country Link
CN (1) CN115171905B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529165A (en) * 2016-10-28 2017-03-22 合肥工业大学 Method for identifying cancer molecular subtype based on spectral clustering algorithm of sparse similar matrix
CN110084800A (en) * 2019-04-28 2019-08-02 上海海事大学 A kind of Lung metastases prediction technique for four limbs soft tissue sarcoma patient
US20200065616A1 (en) * 2017-10-30 2020-02-27 Tsinghua University Unsupervised exception access detection method and apparatus based on one-hot encoding mechanism
CN111081377A (en) * 2020-01-16 2020-04-28 四川大学 Necrotic acute pancreatitis patient operation time prediction model
US20200320413A1 (en) * 2019-04-08 2020-10-08 Google Llc Creating a machine learning model with k-means clustering
CN114418008A (en) * 2022-01-21 2022-04-29 平安国际智慧城市科技股份有限公司 Medical treatment behavior identification method and device, terminal equipment and storage medium

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN116364299A (en) * 2023-03-30 2023-06-30 之江实验室 Disease diagnosis and treatment path clustering method and system based on heterogeneous information network
CN116364299B (en) * 2023-03-30 2024-02-13 之江实验室 Disease diagnosis and treatment path clustering method and system based on heterogeneous information network
CN116884626A (en) * 2023-05-16 2023-10-13 浙江大学 System for identifying schizophrenia subgroup based on visual perception paradigm and unsupervised clustering algorithm
CN117009839A (en) * 2023-09-28 2023-11-07 之江实验室 Patient clustering method and device based on heterogeneous hypergraph neural network
CN117009839B (en) * 2023-09-28 2024-01-09 之江实验室 Patient clustering method and device based on heterogeneous hypergraph neural network
CN117992802A (en) * 2024-04-03 2024-05-07 天津医科大学总医院 Radiotherapy similarity planning method, system and storage medium based on radiotherapy database

Also Published As

Publication number Publication date
CN115171905B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN115171905B (en) Tumor patient similarity calculation method based on one-hot coding unsupervised clustering
CN109086805B (en) Clustering method based on deep neural network and pairwise constraints
Ortiz et al. Segmentation of brain MRI using SOM‐FCM‐based method and 3D statistical descriptors
CN108763590B (en) Data clustering method based on double-variant weighted kernel FCM algorithm
Wang et al. A cancer survival prediction method based on graph convolutional network
CN110097921B (en) Visualized quantitative method and system for glioma internal gene heterogeneity based on image omics
Hu et al. Classifying the multi-omics data of gastric cancer using a deep feature selection method
CN113889192B (en) Single-cell RNA-seq data clustering method based on deep noise reduction self-encoder
CN115699204A (en) Clinical predictor based on multiple machine learning models
CN115985503B (en) Cancer prediction system based on ensemble learning
An et al. Medical Image Segmentation Algorithm Based on Optimized Convolutional Neural Network‐Adaptive Dropout Depth Calculation
CN117253550A (en) Spatial transcriptome data clustering method
CN117422704A (en) Cancer prediction method, system and equipment based on multi-mode data
Zeng et al. Fuzzy entropy clustering by searching local border points for the analysis of gene expression data
Gull et al. A deep learning approach for multi‐stage classification of brain tumor through magnetic resonance images
CN108710690A (en) Medical image search method based on geometric verification
Sarica et al. Conversion from mild cognitive impairment to Alzheimer’s disease: a comparison of tree-based machine learning algorithms for survival analysis
Sachnev et al. Multi-class BCGA-ELM based classifier that identifies biomarkers associated with hallmarks of cancer
Shahweli et al. In Silico Molecular Classification of Breast and Prostate Cancers using Back Propagation Neural Network
Guo et al. Integrated learning: screening optimal biomarkers for identifying preeclampsia in placental mRNA samples
Saha et al. Simultaneous clustering and feature weighting using multiobjective optimization for identifying functionally similar mirnas
Mythili et al. CTCHABC-hybrid online sequential fuzzy Extreme Kernel learning method for detection of Breast Cancer with hierarchical Artificial Bee
CN114141306A (en) Distant metastasis identification method based on gene interaction mode optimization graph representation
Bentkowska et al. Interval modelling in optimization of k‐NN classifiers for large number of attributes in data sets on an example of DNA microarrays
Jayashanka et al. Machine learning approach to predict the survival time of childhood acute lymphoblastic leukemia patients

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant