CN113488166A - Diabetes data analysis model training and data management method, device and equipment - Google Patents
- Publication number
- CN113488166A (CN202110854902.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- diabetes
- clustering
- analysis
- clinical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/231—Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/20—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for electronic clinical trials or questionnaires
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
Abstract
The embodiment of the invention discloses a method, a device and equipment for training a diabetes data analysis model and managing data. The method comprises the following steps: acquiring clinical data related to diabetes, and preprocessing the clinical data; performing hierarchical clustering analysis on the preprocessed clinical data, and performing fuzzy clustering analysis based on the hierarchical clustering analysis result to determine the optimal classification quantity of the clinical data; and inputting each piece of clinical data classified according to the optimal classification quantity, together with its corresponding classification label, as a model training sample into an initial diabetes data analysis model for model training to obtain a target diabetes data analysis model. The embodiment of the invention addresses the lack of uniform analysis standards for diabetes data, enables the analysis of large amounts of diabetes data, establishes a diabetes risk level classification system, and allows clinical diabetes data to be managed under uniform analysis and management standards so as to improve the quality of life of people with diabetes.
Description
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a method, a device and equipment for training a diabetes data analysis model and managing data.
Background
Diabetes mellitus is a metabolic disease characterized by hyperglycemia caused by defective insulin secretion or impaired insulin action. It affects patients' quality of life and can even lead to death. Diabetes can be managed with medication combined with a healthy lifestyle and diet, so early diagnosis and health management are of great importance. At present, however, diabetes risk assessment and the corresponding health management advice are mainly given by doctors on the basis of users' physical examination reports. They depend heavily on the individual doctor's experience, and different doctors may reach different diagnosis and treatment conclusions. There is therefore a need to establish a unified, standardized assessment standard based on diabetes-related clinical data to facilitate the analysis and management of clinical data.
Disclosure of Invention
The embodiment of the invention provides a method, a device and equipment for training a diabetes data analysis model and managing data, so as to analyze a large amount of diabetes data, establish a diabetes risk classification system, manage clinical diabetes data under a unified analysis and management standard and improve the life quality of diabetics.
In a first aspect, an embodiment of the present invention provides a method for training a diabetes data analysis model, where the method includes:
acquiring clinical data related to diabetes, and preprocessing the clinical data, wherein the clinical data comprises target object body basic information, clinical biochemical test result data and preset health questionnaire answering data;
performing hierarchical clustering analysis on the preprocessed clinical data, and performing fuzzy clustering analysis based on the hierarchical clustering analysis result to determine the optimal classification quantity of the clinical data;
and inputting each clinical data classified according to the optimal classification quantity and the corresponding classification label as a model training sample into the initial diabetes data analysis model for model training to obtain the target diabetes data analysis model.
Optionally, performing hierarchical clustering analysis on the preprocessed clinical data, including:
taking each piece of preprocessed clinical data as a class, calculating the similarity between every two classes, and aggregating the two classes of which the similarity meets a preset condition into one class until a preset iterative clustering stop condition is met;
and taking the number of clusters at the point where hierarchical clustering stops as the first cluster number.
Optionally, performing fuzzy clustering analysis based on the result of hierarchical clustering analysis to determine an optimal classification number of the clinical data, including:
performing fuzzy clustering analysis on the preprocessed clinical data, taking the first cluster number and the corresponding cluster centers as the initial cluster number and the initial cluster centers of the first fuzzy clustering, respectively, and calculating the value of a cluster validity function;
successively decreasing the first cluster number and reducing the corresponding cluster centers in a preset manner to obtain the initial cluster number and the initial cluster centers of the next fuzzy clustering analysis, and repeating the process of performing fuzzy clustering analysis on the preprocessed clinical data and calculating the cluster validity function until the initial cluster number of the fuzzy clustering analysis is less than or equal to two;
and comparing the values of the cluster validity function across all rounds of fuzzy clustering analysis, and taking the cluster number corresponding to the minimum value as the optimal classification quantity of the clinical data.
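The selection loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the validity index used here is a Xie-Beni-style ratio chosen as a stand-in for the cohesion/separation function, the "preset manner" of reducing centers is assumed to be merging the two closest centers into their midpoint, and the loop is assumed to evaluate cluster numbers down to and including two. All function names are hypothetical.

```python
import math

def fcm(samples, centers, m=2.0, iters=50):
    """Compact fuzzy c-means (fixed iteration count) started from given centers."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        u = []
        for x in samples:
            w = [max(math.dist(x, c), 1e-12) ** (-2.0 / (m - 1.0)) for c in centers]
            u.append([wi / sum(w) for wi in w])
        for j in range(len(centers)):
            wj = [row[j] ** m for row in u]
            centers[j] = [sum(a * x[k] for a, x in zip(wj, samples)) / sum(wj)
                          for k in range(len(samples[0]))]
    return u, centers

def validity(samples, u, centers, m=2.0):
    """Assumed stand-in index: cohesion / separation (smaller is better)."""
    cohesion = sum(u[i][j] ** m * math.dist(x, c) ** 2
                   for i, x in enumerate(samples) for j, c in enumerate(centers))
    separation = min(math.dist(a, b) ** 2
                     for p, a in enumerate(centers) for b in centers[p + 1:])
    return cohesion / (len(samples) * max(separation, 1e-12))

def merge_closest(centers):
    """Assumed 'preset manner': replace the two closest centers by their midpoint."""
    pairs = [(math.dist(a, b), p, q) for p, a in enumerate(centers)
             for q, b in enumerate(centers) if p < q]
    _, p, q = min(pairs)
    merged = [(a + b) / 2 for a, b in zip(centers[p], centers[q])]
    return [c for k, c in enumerate(centers) if k not in (p, q)] + [merged]

def best_cluster_count(samples, first_centers):
    """Run fuzzy clustering for decreasing cluster numbers; keep the minimum index."""
    scores, centers = {}, first_centers
    while True:
        u, fitted = fcm(samples, centers)
        scores[len(centers)] = validity(samples, u, fitted)
        if len(centers) <= 2:
            break
        centers = merge_closest(fitted)
    return min(scores, key=scores.get), scores

# Two tight groups, deliberately started with three centers.
samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
           [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]
best, scores = best_cluster_count(samples, [[0.0, 0.5], [1.0, 0.5], [10.5, 10.5]])
```

In this toy run the three-center solution splits one tight group, which shrinks the separation term and raises the index, so the two-center solution is selected.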
Optionally, calculating a value of the cluster validity function for each cluster analysis result includes:
calculating the cohesion and the separation of the clustering result, wherein the cohesion is expressed as [formula not reproduced] and the separation is expressed as [formula not reproduced], where $i, k = 1, 2, \dots, c$, $c$ denotes the number of clusters, $n(i)$ denotes the number of samples in the $i$-th class, $d(x, y)$ denotes the Euclidean distance between sample $x$ and sample $y$, and $u_{ij}$ denotes the membership degree of sample $i$ in class $j$;
and taking the ratio of the cohesion to the separation as the value of the cluster validity function.
Optionally, the performing fuzzy clustering analysis on the preprocessed clinical data includes:
initializing parameters of fuzzy clustering analysis, wherein the parameters comprise an iteration stop threshold, a maximum iteration number and a membership matrix;
performing data clustering according to the corresponding initial clustering quantity and the initial clustering center, and updating the membership matrix and the classification center according to the clustering result;
when the difference between the updated membership matrix and the initialized membership matrix is smaller than the iteration stop threshold, or the number of iterations exceeds the maximum number of iterations, completing one round of fuzzy clustering analysis;
otherwise, clustering the preprocessed clinical data again.
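One round of fuzzy clustering with these parameters might look like the sketch below. It is the standard fuzzy c-means update, offered as an assumption about what the steps amount to. Two deviations are worth flagging: the membership matrix is derived from the initial centers rather than initialized separately, and the stop test compares each membership matrix with the one from the previous iteration. The parameter names `eps` and `max_iter` are hypothetical.

```python
import math

def fuzzy_c_means(samples, centers, m=2.0, eps=1e-5, max_iter=100):
    """Standard fuzzy c-means started from given centers.

    Returns (membership matrix u[i][j], final centers).
    """
    centers = [list(c) for c in centers]
    u_prev = None
    for _ in range(max_iter):
        # Membership update: u_ij is proportional to d_ij^(-2/(m-1)); rows sum to 1.
        u = []
        for x in samples:
            dists = [max(math.dist(x, c), 1e-12) for c in centers]
            weights = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(weights)
            u.append([w / total for w in weights])
        # Center update: mean of the samples weighted by u_ij^m.
        for j in range(len(centers)):
            w = [row[j] ** m for row in u]
            centers[j] = [sum(wi * x[k] for wi, x in zip(w, samples)) / sum(w)
                          for k in range(len(samples[0]))]
        # Stop when no membership value changed by eps or more (assumed criterion).
        if u_prev is not None:
            change = max(abs(a - b) for ra, rb in zip(u, u_prev) for a, b in zip(ra, rb))
            if change < eps:
                break
        u_prev = u
    return u, centers

samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [9.0, 9.0], [9.0, 10.0], [10.0, 9.0]]
u, centers = fuzzy_c_means(samples, centers=[[0.0, 0.0], [10.0, 10.0]])
```

Each row of `u` holds one sample's membership degrees across the clusters, and every row sums to 1.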
Optionally, before performing hierarchical clustering analysis on the preprocessed clinical data, the method further includes:
mapping the preprocessed clinical data to a pre-defined high-dimensional data space.
Optionally, each classification of the clinical data corresponds to a preset diabetes risk level, and each preset diabetes risk level is provided with a corresponding diabetes health management recommendation.
In a second aspect, an embodiment of the present invention provides a diabetes data management method, including:
acquiring diabetes clinical data and preprocessing the diabetes data, wherein the clinical data comprise basic body information of a target object, clinical biochemical test result data and preset health questionnaire response data;
inputting the preprocessed diabetes clinical data into a target diabetes data analysis model obtained by training of the diabetes data analysis model training method of any embodiment to obtain a diabetes clinical data classification result;
and feeding back a preset diabetes risk grade corresponding to the diabetes clinical data classification result and a diabetes health management suggestion corresponding to the preset diabetes risk grade to the target object.
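The management flow (classify, look up the risk level, return the matching advice) can be illustrated with a small sketch. Everything concrete here is hypothetical: the nearest-center rule merely stands in for the trained target model, and the class centers, risk level names and advice placeholders are invented for illustration.

```python
import math

# Hypothetical class centers standing in for the trained target model.
CENTERS = {0: [0.1, 0.1], 1: [0.5, 0.5], 2: [0.9, 0.9]}
# Hypothetical mapping from class label to preset risk level.
RISK_LEVEL = {0: "low", 1: "medium", 2: "high"}
# Placeholder advice texts; real recommendations would be preset per risk level.
ADVICE = {
    "low": "<health management advice for low risk>",
    "medium": "<health management advice for medium risk>",
    "high": "<health management advice for high risk>",
}

def classify(features):
    """Stand-in classifier: label of the nearest class center."""
    return min(CENTERS, key=lambda label: math.dist(features, CENTERS[label]))

def feedback(features):
    """Return the class, its preset risk level and the matching advice."""
    label = classify(features)
    level = RISK_LEVEL[label]
    return {"class": label, "risk_level": level, "advice": ADVICE[level]}

result = feedback([0.8, 0.95])  # preprocessed, normalized features
```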
In a third aspect, an embodiment of the present invention further provides a diabetes data analysis model training device, where the device includes:
the data acquisition and preprocessing module is used for acquiring clinical data related to diabetes and preprocessing the clinical data, wherein the clinical data comprises target object body basic information, clinical biochemical test result data and preset health questionnaire answering data;
the data analysis module is used for carrying out hierarchical clustering analysis on the preprocessed clinical data and carrying out fuzzy clustering analysis on the basis of the result of the hierarchical clustering analysis so as to determine the optimal classification quantity of the clinical data;
and the model training module is used for inputting each clinical data classified according to the optimal classification quantity and the corresponding classification label as a model training sample into the initial diabetes data analysis model for model training to obtain the target diabetes data analysis model.
Optionally, the data analysis module is specifically configured to:
taking each piece of preprocessed clinical data as a class, calculating the similarity between every two classes, and aggregating the two classes of which the similarity meets a preset condition into one class until a preset iterative clustering stop condition is met;
and taking the number of clusters at the point where hierarchical clustering stops as the first cluster number.
Optionally, the data analysis module is specifically configured to:
performing fuzzy clustering analysis on the preprocessed clinical data, taking the first cluster number and the corresponding cluster centers as the initial cluster number and the initial cluster centers of the first fuzzy clustering, respectively, and calculating the value of a cluster validity function;
successively decreasing the first cluster number and reducing the corresponding cluster centers in a preset manner to obtain the initial cluster number and the initial cluster centers of the next fuzzy clustering analysis, and repeating the process of performing fuzzy clustering analysis on the preprocessed clinical data and calculating the cluster validity function until the initial cluster number of the fuzzy clustering analysis is less than or equal to two;
and comparing the values of the cluster validity function across all rounds of fuzzy clustering analysis, and taking the cluster number corresponding to the minimum value as the optimal classification quantity of the clinical data.
Optionally, the data analysis module is further specifically configured to:
calculating the value of a cluster validity function for each clustering analysis result, including:
calculating the cohesion and the separation of the clustering result, wherein the cohesion is expressed as [formula not reproduced] and the separation is expressed as [formula not reproduced], where $i, k = 1, 2, \dots, c$, $c$ denotes the number of clusters, $n(i)$ denotes the number of samples in the $i$-th class, $d(x, y)$ denotes the Euclidean distance between sample $x$ and sample $y$, and $u_{ij}$ denotes the membership degree of sample $i$ in class $j$;
and taking the ratio of the cohesion to the separation as the value of the cluster validity function.
Optionally, the data analysis module is further configured to:
initializing parameters of fuzzy clustering analysis, wherein the parameters comprise an iteration stop threshold, a maximum iteration number and a membership matrix;
performing data clustering according to the corresponding initial clustering quantity and the initial clustering center, and updating the membership matrix and the classification center according to the clustering result;
when the difference between the updated membership matrix and the initialized membership matrix is smaller than the iteration stop threshold, or the number of iterations exceeds the maximum number of iterations, completing one round of fuzzy clustering analysis;
otherwise, clustering the preprocessed clinical data again.
Optionally, the diabetes data analysis model training device further includes a data space mapping module, configured to map the preprocessed clinical data to a preset high-dimensional data space before performing hierarchical clustering analysis on the preprocessed clinical data.
Optionally, each classification of the clinical data corresponds to a preset diabetes risk level, and each preset diabetes risk level is provided with a corresponding diabetes health management recommendation.
In a fourth aspect, an embodiment of the present invention further provides a diabetes data management apparatus, including:
the data acquisition module is used for acquiring clinical diabetes data and preprocessing the diabetes data, wherein the clinical data comprises basic body information of a target object, clinical biochemical test result data and preset health questionnaire answering data;
the data analysis module is used for inputting the preprocessed diabetes clinical data into a target diabetes data analysis model obtained by training by the method of any embodiment so as to obtain a diabetes clinical data classification result;
and the data feedback module is used for feeding back a preset diabetes risk grade corresponding to the diabetes clinical data classification result and a diabetes health management suggestion corresponding to the preset diabetes risk grade to the target object.
In a fifth aspect, an embodiment of the present invention further provides a computer device, where the computer device includes:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the diabetes data analysis model training method or the diabetes data management method provided by any embodiment of the invention.
In a sixth aspect, embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a method for training a diabetes data analysis model or managing diabetes data, according to any of the embodiments of the present invention.
The embodiment of the invention has the following advantages or beneficial effects:
In the embodiment of the invention, diabetes-related clinical data such as basic body information of the target object, clinical biochemical test result data and preset health questionnaire response data are acquired and preprocessed into a form that a computer can recognize and operate on. Hierarchical clustering analysis is then performed on the preprocessed clinical data, and fuzzy clustering analysis is performed based on the hierarchical clustering analysis result, so that the optimal classification quantity of the clinical data is determined more quickly. Each piece of clinical data classified according to the optimal classification quantity, together with its corresponding classification label, is input as a model training sample into the initial diabetes data analysis model for model training to obtain the target diabetes data analysis model. The embodiment of the invention addresses the lack of uniform analysis standards for diabetes data, enables the analysis of large amounts of diabetes data, establishes a diabetes risk level classification system, and allows clinical diabetes data to be managed under uniform analysis and management standards so as to improve the quality of life of people with diabetes or to provide disease prevention guidance.
Drawings
FIG. 1 is a flowchart of a method for training a diabetes data analysis model according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for diabetes data management according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a diabetes data analysis model training apparatus according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a training apparatus for a diabetes data analysis model according to a fourth embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a computer device according to a fifth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a method for training a diabetes data analysis model according to an embodiment of the present invention, which is applicable to a case where a diabetes data analysis model is constructed and trained based on a large amount of diabetes-related clinical data. The method can be executed by a diabetes data analysis model training device, which can be realized by software and/or hardware and is integrated in a computer device with an application development function.
As shown in fig. 1, the diabetes data analysis model training method includes the following steps:
S110, acquiring clinical data related to diabetes, and preprocessing the clinical data.
The clinical data related to diabetes may be any data that needs to be considered when clinically evaluating diabetes, such as basic body information of the target object, clinical biochemical test result data and preset health questionnaire response data. The target object may be a subject with or without diabetes; analyzing data from target objects at different disease stages and states supports the subsequent data analysis.
Specifically, the basic body information of the target object includes age, height, weight, pulse, heart rate, diastolic blood pressure, systolic blood pressure and the like. The clinical biochemical test result data includes three liver function items, three kidney function items, routine blood tests, routine urine tests, pre-meal and postprandial blood glucose and the like. The preset health questionnaire response data may include health history (whether the subject has had a given disease), physical symptoms (a systematic inquiry into the symptoms and signs of people at risk of major chronic diseases), lifestyle and environment (lifestyle and environmental risk factors for major chronic diseases, including diet, smoking, drinking, exercise and environmental health risks), mental health and stress (mood, stress, anxiety, depression and the like), sleep, and health knowledge.
Further, the acquired clinical data is preprocessed into a form that a computer device can recognize and operate on. The preprocessing first converts textual descriptions into numbers: for example, in the health history, "has had pneumonia" may be represented by "1" and "has not had pneumonia" by "0"; likewise, smoking may be represented by "1" and not smoking by "0". Next, records with missing or obviously erroneous data are deleted. Finally, the numerical data is discretized and normalized.
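The preprocessing steps can be sketched as below. This is a minimal illustration under assumptions: the field names, the yes/no encoding table and the min-max scaling are hypothetical choices (the source only gives the "1"/"0" convention and names normalization generically), and the discretization step is omitted for brevity.

```python
from typing import Optional

# Hypothetical yes/no encoding following the "1"/"0" convention in the text.
YES_NO = {"yes": 1.0, "no": 0.0}
# Hypothetical field names chosen for illustration.
FIELDS = ("age", "weight_kg", "had_pneumonia", "smokes")

def encode(record: dict) -> Optional[list]:
    """Turn one raw record into numbers, or None if a value is missing or invalid."""
    row = []
    for key in FIELDS:
        value = record.get(key)
        if isinstance(value, str):
            value = YES_NO.get(value.strip().lower())
        if value is None or value < 0:
            return None  # drop records with missing or clearly wrong data
        row.append(float(value))
    return row

def min_max_normalize(rows: list) -> list:
    """Scale every column to [0, 1]; constant columns become 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [[(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
            for row in rows]

raw = [
    {"age": 40, "weight_kg": 60, "had_pneumonia": "no", "smokes": "yes"},
    {"age": 60, "weight_kg": 80, "had_pneumonia": "yes", "smokes": "no"},
    {"age": 50, "weight_kg": None, "had_pneumonia": "no", "smokes": "no"},
]
encoded = [row for row in (encode(r) for r in raw) if row is not None]
normalized = min_max_normalize(encoded)
```

The third record is dropped because its weight is missing, and the remaining two are scaled column by column.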
In a preferred embodiment, the preprocessed clinical data can be mapped to a preset high-dimensional data space before the next operation step is started, so as to accelerate the convergence of the algorithm. Clinical data contains many test-data patterns that are linearly inseparable in the low-dimensional space; these can be made separable by nonlinearly mapping the data to a high-dimensional feature space. Kernel functions commonly used for the low-to-high-dimensional mapping include the linear kernel, the polynomial kernel and the Gaussian kernel (radial basis function kernel), of which the Gaussian kernel is the most widely used. Expressed mathematically, the sample data $X = \{x_1, x_2, \dots, x_n\}$ is mapped by a mapping $\Phi$ to a kernel space $F$, and the mapped sample space is $F = \{\Phi(x_1), \Phi(x_2), \dots, \Phi(x_n)\}$.
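As a small illustration of how a Gaussian kernel is used in practice, the sketch below evaluates the kernel and the squared distance between two mapped samples without ever constructing $\Phi$ explicitly (the kernel trick). The width parameter `gamma` and the function names are assumptions for the example.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def feature_space_sq_distance(x, y, gamma=0.5):
    """||Phi(x) - Phi(y)||^2 = K(x, x) - 2 K(x, y) + K(y, y).

    For the RBF kernel K(x, x) = 1, so this reduces to 2 - 2 K(x, y).
    """
    return rbf_kernel(x, x, gamma) - 2 * rbf_kernel(x, y, gamma) + rbf_kernel(y, y, gamma)

a, b = [0.0, 0.0], [1.0, 1.0]
k_ab = rbf_kernel(a, b)                 # exp(-0.5 * 2) = exp(-1)
d_ab = feature_space_sq_distance(a, b)  # 2 - 2 * exp(-1)
```

Any clustering step that only needs distances between mapped samples can therefore work from kernel values alone.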
S120, performing hierarchical clustering analysis on the preprocessed clinical data, and performing fuzzy clustering analysis based on the hierarchical clustering analysis result to determine the optimal classification quantity of the clinical data.
In the step, a hierarchical clustering mode is adopted firstly, clustering effects of different clustering types are analyzed and determined, and the optimal scheme is selected to serve as an initial clustering center and an initial clustering number of fuzzy clustering analysis, so that the fuzzy clustering analysis can find a proper clustering center more quickly, and a classification result can be more accurate.
First, in hierarchical clustering, a hierarchical nested clustering tree is created by calculating the similarity between data points of different classes. The original data points form the bottom layer of the tree, and the tree is built by bottom-up merging. At the bottom layer, each piece of preprocessed clinical data is treated as its own class; the similarity between each class and every other class is calculated, and classes are merged according to the similarity values to obtain a new layer. The similarity between the classes in the new layer is then calculated to merge the next layer, and so on until a preset iterative clustering stop condition is met. The stop condition may be a preset threshold: if the distances between all pairs of clusters are greater than the threshold, the clustering process stops. Hierarchical clustering may use the Euclidean distance as the distance (similarity) measure between data points of different classes; the smaller the distance, the higher the similarity. The number of clusters when hierarchical clustering stops is the first clustering number.
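The bottom-up merging with a distance threshold can be sketched as below. The text does not say how the distance between two multi-point clusters is measured, so the distance between cluster centroids is assumed here; the function names are likewise illustrative.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    return [sum(col) / len(points) for col in zip(*points)]


def agglomerative(points, threshold):
    """Merge the two closest clusters until every pair is farther apart than threshold."""
    clusters = [[p] for p in points]  # bottom layer: each sample is its own class
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(centroid(clusters[a]), centroid(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break  # all inter-cluster distances exceed the threshold: stop
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


example_points = [[0, 0], [0, 1], [10, 10], [10, 11], [50, 50]]
example_clusters = agglomerative(example_points, threshold=5.0)
first_cluster_number = len(example_clusters)
initial_centers = [centroid(c) for c in example_clusters]
```

The number of clusters remaining at the stop condition plays the role of the first clustering number, and the centroids of those clusters can serve as initial centers for the subsequent fuzzy clustering.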
Further, fuzzy clustering analysis is performed based on the result of the hierarchical clustering analysis to determine the optimal classification quantity of the clinical data. Specifically, the first clustering number is used as the initial clustering number of the first fuzzy clustering, and the clustering centers of the clustering result obtained when hierarchical clustering stops are used as the initial clustering centers of the first fuzzy clustering.
Fuzzy clustering is a process of iteratively calculating the membership degrees of the data and the cluster centers until an optimum is reached; for a single sample $x_i$, the sum of its membership degrees over all clusters is 1. The data mapped to the high-dimensional data space is taken as an example. In the fuzzy clustering process, the objective function and its constraint after the high-dimensional mapping are expressed in terms of the following quantities: $N$ is the number of sample data, $C$ is the number of clusters, and $i$, $j$ are indices; $u_{ij}$ denotes the degree of membership of sample $i$ in class $j$, $c_j$ denotes the cluster center of class $j$, and $m$ is a weighting parameter, typically taken to be 2.
Further, the Lagrange multiplier method is introduced to construct a new objective function. Taking the partial derivatives with respect to the three variables $\lambda$, $u_{ij}$ and $c_j$ and setting them equal to 0 yields the expressions for the membership degrees and the cluster centers. Furthermore, when the high-dimensional data space mapping uses the Gaussian kernel function, the corresponding expressions are obtained by substituting the Gaussian kernel function into these results.
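For orientation, the standard kernel fuzzy c-means formulation that matches the symbols defined above ($N$, $C$, $u_{ij}$, $c_j$, $m$, $\Phi$) can be written as follows. This is a sketch under the assumption that the method follows the usual formulation with a Gaussian kernel of bandwidth $\sigma$; the bandwidth symbol and the per-sample multipliers $\lambda_i$ are notation introduced here, not taken from the text.

```latex
% Objective and constraint in the kernel space
J = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert \Phi(x_i) - \Phi(c_j) \rVert^{2},
\qquad \text{s.t. } \sum_{j=1}^{C} u_{ij} = 1, \quad i = 1, \ldots, N

% Lagrangian with one multiplier per sample
L = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert \Phi(x_i) - \Phi(c_j) \rVert^{2}
    + \sum_{i=1}^{N} \lambda_i \Bigl( 1 - \sum_{j=1}^{C} u_{ij} \Bigr)

% Gaussian kernel; since K(x, x) = 1 the feature-space distance simplifies
K(x, y) = \exp\!\Bigl( -\frac{\lVert x - y \rVert^{2}}{2\sigma^{2}} \Bigr),
\qquad
\lVert \Phi(x_i) - \Phi(c_j) \rVert^{2} = 2 \bigl( 1 - K(x_i, c_j) \bigr)

% Setting the partial derivatives of L to zero gives the update rules
u_{ij} = \frac{\bigl( 1 - K(x_i, c_j) \bigr)^{-1/(m-1)}}
              {\sum_{k=1}^{C} \bigl( 1 - K(x_i, c_k) \bigr)^{-1/(m-1)}},
\qquad
c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, K(x_i, c_j) \, x_i}
           {\sum_{i=1}^{N} u_{ij}^{m} \, K(x_i, c_j)}
```

With $m = 2$ the exponent $-1/(m-1)$ equals $-1$, so each membership degree is simply the normalized reciprocal of $1 - K(x_i, c_j)$.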
Further, in the fuzzy clustering analysis process, the membership degree matrix and the clustering centers are calculated and updated according to the above formulas. Before each fuzzy clustering analysis, an iteration stop threshold $\varepsilon$ and a maximum number of iterations $l_{max}$ are initialized. The iteration of one fuzzy clustering analysis terminates when the change in the membership degree matrix between iteration steps $k$ and $k+1$ is smaller than $\varepsilon$, where $k$ is the iteration step number and $\varepsilon$ is the iteration stop threshold. When the termination condition is satisfied, continuing to iterate would not change the membership degrees significantly, so the membership degrees are considered unchanged and a relatively optimal (locally or globally optimal) state has been reached. The process converges to a local minimum or saddle point of the objective function. After each fuzzy clustering is completed, a clustering validity function can be calculated to evaluate the validity of the classification result. Then, starting from the first clustering number, the initial clustering number of the fuzzy clustering is decreased by 1 each time, and the fuzzy clustering analysis is repeated on the preprocessed clinical data.
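One round of the iteration described above can be sketched as follows. The sketch assumes the standard kernel fuzzy c-means update rules with a Gaussian kernel, and it measures the change in the membership matrix as the largest absolute entry-wise difference; both are assumptions, as are the parameter names `sigma`, `eps` and `max_iter`.

```python
import math


def gaussian_kernel(x, y, sigma):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


def kernel_fcm(data, centers, m=2.0, sigma=1.0, eps=1e-5, max_iter=100):
    """Iterate membership and center updates until the membership matrix stabilizes."""
    centers = [list(c) for c in centers]
    n, c, dim = len(data), len(centers), len(data[0])
    u = [[1.0 / c] * c for _ in range(n)]  # initial membership matrix (uniform)
    for _ in range(max_iter):
        k = [[gaussian_kernel(x, cj, sigma) for cj in centers] for x in data]
        new_u = []
        for i in range(n):
            # Small floor avoids division by zero when a sample sits on a center.
            w = [max(1.0 - k[i][j], 1e-12) ** (-1.0 / (m - 1.0)) for j in range(c)]
            total = sum(w)
            new_u.append([wj / total for wj in w])  # each row sums to 1
        for j in range(c):
            weights = [new_u[i][j] ** m * k[i][j] for i in range(n)]
            denom = sum(weights)
            if denom > 0.0:
                centers[j] = [
                    sum(weights[i] * data[i][d] for i in range(n)) / denom
                    for d in range(dim)
                ]
        change = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if change < eps:
            break  # membership degrees are effectively unchanged
    return u, centers


example_data = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.2, 4.9], [4.9, 5.1]]
example_u, example_centers = kernel_fcm(example_data, [[0.5, 0.5], [4.5, 4.5]], sigma=2.0)
```

The loop ends either when the membership matrix stops changing by more than `eps` or when `max_iter` is reached, mirroring the two stop conditions named in the text.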
Specifically, after the first fuzzy clustering analysis, the first clustering number is decreased step by step: each time, the number of clustering centers determined by the previous clustering analysis is reduced by one in a preset manner to give the initial clustering number and the initial clustering centers of the next fuzzy clustering analysis, and the process of performing fuzzy clustering analysis on the preprocessed clinical data and calculating the clustering validity function is repeated until the initial clustering number of the fuzzy clustering analysis is less than or equal to two. For example, if the first clustering number determined by hierarchical clustering is 12, fuzzy clustering analysis is first performed with an initial clustering number of 12, and then with an initial clustering number of 11. When the initial clustering number is 11, the clustering centers may be obtained by removing one center from the 12 clustering centers of the fuzzy clustering analysis whose initial clustering number was 12, for example by merging the two centers with the highest similarity among the 12 clustering centers. The above process is repeated until the initial clustering number of the fuzzy clustering analysis is reduced to 2.
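The center-reduction step can be illustrated with the sketch below. It assumes that "merging" two centers means replacing them with their midpoint, and, for brevity, it reduces the centers directly rather than re-running the fuzzy clustering between reductions as the full procedure would; the function names are illustrative.

```python
import math


def merge_closest_centers(centers):
    """Replace the two most similar (closest) centers with their midpoint."""
    best = None
    for a in range(len(centers)):
        for b in range(a + 1, len(centers)):
            d = math.dist(centers[a], centers[b])
            if best is None or d < best[0]:
                best = (d, a, b)
    _, a, b = best
    merged = [(p + q) / 2.0 for p, q in zip(centers[a], centers[b])]
    kept = [c for idx, c in enumerate(centers) if idx not in (a, b)]
    return kept + [merged]


def candidate_center_sets(initial_centers):
    """Yield center sets for cluster counts from the first clustering number down to 2."""
    centers = [list(c) for c in initial_centers]
    while True:
        yield centers
        if len(centers) <= 2:
            return
        centers = merge_closest_centers(centers)


example_sets = list(candidate_center_sets([[0, 0], [0, 1], [10, 10], [20, 20]]))
```

Each yielded set would be passed as the initial centers of one fuzzy clustering run, whose validity value is then recorded for the final comparison.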
Then, the values of the clustering validity function from all rounds of fuzzy clustering analysis are compared, and the clustering number (namely, the number of classification categories) corresponding to the minimum value is taken as the optimal classification quantity of the clinical data. In this embodiment, the clustering validity function is determined by the cohesion and the separation of the clustering results for different clustering numbers. "Cohesion" means the degree of similarity of samples within a class, and "separation" means the degree of independence of samples between classes. The cohesion is denoted $CD(c)$ and the separation $SD(c)$, with $i, k = 1, 2, \ldots, c$, where $c$ represents the number of clusters, $n(i)$ represents the number of data in class $i$, $d(x_j, y_i)$ represents the Euclidean distance between sample $x_j$ and sample $y_i$, and $u_{ij}$ represents the degree of membership of sample $i$ in class $j$. The ratio of the cohesion to the separation is then used as the value of the clustering validity function. That is, the clustering validity function reflects the degree of separation between information granules and is defined as $GD(c) = CD(c)/SD(c)$. The smaller the value of $GD(c)$, the better the clustering result.
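The selection by minimum $GD(c)$ can be sketched as below. The exact cohesion and separation expressions are not available in this text, so the definitions used here are stand-ins chosen only to illustrate the ratio: cohesion is assumed to be the membership-weighted mean distance of samples to cluster centers, and separation the smallest distance between any two centers.

```python
import math


def cohesion(data, centers, u, m=2.0):
    """Assumed CD(c): membership-weighted mean sample-to-center distance."""
    total = sum(
        u[i][j] ** m * math.dist(x, cj)
        for i, x in enumerate(data)
        for j, cj in enumerate(centers)
    )
    return total / len(data)


def separation(centers):
    """Assumed SD(c): smallest Euclidean distance between two cluster centers."""
    return min(
        math.dist(a, b)
        for idx, a in enumerate(centers)
        for b in centers[idx + 1:]
    )


def validity(data, centers, u):
    """GD(c) = CD(c) / SD(c); smaller values indicate a better clustering."""
    return cohesion(data, centers, u) / separation(centers)


def best_cluster_count(gd_by_count):
    """Return the cluster count whose GD value is smallest."""
    return min(gd_by_count, key=gd_by_count.get)


example_gd = validity([[1.0, 0.0], [9.0, 0.0]], [[0.0, 0.0], [10.0, 0.0]], [[1, 0], [0, 1]])
example_best = best_cluster_count({4: 0.8, 3: 0.2, 2: 0.5})
```

Compact clusters lower the numerator and well-separated centers raise the denominator, so both effects push $GD(c)$ down, which is why the minimum is taken as the optimum.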
S130, inputting each piece of clinical data classified according to the optimal classification quantity, together with its corresponding classification label, into an initial diabetes data analysis model as a model training sample, and performing model training to obtain the target diabetes data analysis model.
According to the results of the multiple clustering analyses, the classification corresponding to the finally determined optimal classification quantity is used as the final classification result of the clinical data. By summarizing the characteristics of the different classes of clinical data, each class can be assigned a preset diabetes risk level, and each preset diabetes risk level is provided with a corresponding diabetes health management suggestion. Then, each piece of clinical data and its corresponding classification label are used as a machine learning sample and input into a preset initial diabetes data analysis model for model training. The initial diabetes data analysis model may be a neural network, a support vector machine (SVM), a random forest, or another structure. During training, the model continuously learns the characteristics of the sample data and adjusts its parameters until it can classify the sample data with a certain accuracy, yielding a target diabetes data analysis model for analyzing and managing clinical diabetes data. Doctors and patients can thus learn the diabetes risk level and the corresponding lifestyle and diet management approach under a unified scientific standard.
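The idea of training a classifier on cluster-labelled samples can be illustrated with a deliberately simple stand-in. The sketch below uses a nearest-centroid classifier, which is not one of the model types named above (neural network, SVM, random forest); it is chosen only because it fits in a few lines of standard-library Python, and the label names are hypothetical.

```python
import math
from collections import defaultdict


def train_nearest_centroid(samples, labels):
    """Learn one centroid per classification label from the labelled samples."""
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()}


def predict(model, x):
    """Assign x to the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda y: math.dist(model[y], x))


# Hypothetical training samples with labels taken from the clustering step.
example_model = train_nearest_centroid(
    [[0, 0], [0, 2], [10, 10]], ["class_a", "class_a", "class_b"]
)
```

In a real system the same `(sample, label)` pairs would be fed to whichever model family is selected, and the trained model would replace `predict` here.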
According to the technical scheme of this embodiment, clinical data related to diabetes, such as basic body information of a target object, clinical biochemical test result data and preset health questionnaire response data, is acquired and preprocessed into a form that can be recognized and processed by a computer; hierarchical clustering analysis is then performed on the preprocessed clinical data, and fuzzy clustering analysis is performed based on the result of the hierarchical clustering analysis, so that the optimal classification quantity of the clinical data is determined more quickly; each piece of clinical data classified according to the optimal classification quantity, together with its classification label, is input into the initial diabetes data analysis model as a model training sample, and model training is performed to obtain the target diabetes data analysis model. The embodiment of the invention solves the problem of non-uniform analysis standards for diabetes data, enables analysis of a large amount of diabetes data, establishes a diabetes risk level classification system, and allows clinical diabetes data to be managed under unified analysis and management standards, so as to improve the quality of life of diabetic patients or to provide guidance on disease prevention.
Example two
Fig. 2 is a flowchart of a diabetes data management method according to a second embodiment of the present invention. This embodiment belongs to the same inventive concept as the diabetes data analysis model training method in the foregoing embodiment, and further describes the process of analyzing and managing diabetes data with a trained diabetes data analysis model. The method may be performed by a diabetes data management apparatus, which may be implemented in software and/or hardware and integrated in a computer device having application development functionality.
As shown in fig. 2, the diabetes data management method includes the steps of:
S210, acquiring diabetes clinical data and preprocessing the diabetes clinical data.
The diabetes clinical data is data that needs to be considered when clinically evaluating diabetes, such as basic body information of a target object, clinical biochemical test result data and preset health questionnaire response data. The target object may be a subject with or without diabetes; by analyzing the relevant data of target objects at different stages and states of diabetes, further data analysis can be performed. When a subject wants an assessment of his or her own diabetes condition, the relevant clinical data can be collected at a hospital. The collected data is input into a diabetes data management system, which preprocesses and analyzes it automatically. The preprocessing depends on the input format required by the data analysis model; for the specific preprocessing, refer to the data preprocessing process in the first embodiment.
S220, inputting the preprocessed diabetes clinical data into a target diabetes data analysis model trained by the diabetes data analysis model training method of any embodiment, to obtain a diabetes clinical data classification result.
The target diabetes data analysis model trained by the diabetes data analysis model training method of any embodiment can analyze and classify clinical diabetes data under a unified analysis and management standard. Therefore, after the preprocessed diabetes clinical data is input into the target diabetes data analysis model, the classification result of the diabetes clinical data and the health management suggestion corresponding to that result can be determined; the suggestion covers diet, exercise, a body-indicator monitoring scheme, education, psychology, and the like.
S230, feeding back a preset diabetes risk level corresponding to the diabetes clinical data classification result, and a diabetes health management suggestion corresponding to the preset diabetes risk level, to the target object.
The content output by the model is fed back to a doctor or a target user such as a patient, so that the doctor can be assisted in clinical judgment, or the diabetic patient can be effectively guided in life, the quality of life is improved, and the development of the course of diabetes is controlled.
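Steps S210 to S230 can be sketched as a small pipeline. Everything concrete in this sketch is hypothetical: the function name `manage_record`, the risk-level names, the placeholder suggestion strings, and the toy preprocessing and classification callables stand in for the real components.

```python
def manage_record(record, preprocess, classify, risk_table):
    """Preprocess one record, classify it, and look up the feedback to return."""
    features = preprocess(record)           # S210: preprocessing
    label = classify(features)              # S220: model classification
    level, suggestion = risk_table[label]   # S230: preset level and suggestion
    return {"class": label, "risk_level": level, "suggestion": suggestion}


# Hypothetical mapping from classification label to (risk level, suggestion).
example_risk_table = {
    0: ("level A", "placeholder suggestion for level A"),
    1: ("level B", "placeholder suggestion for level B"),
}

example_feedback = manage_record(
    {"glucose": 7.0},
    preprocess=lambda rec: [rec["glucose"]],           # toy preprocessing
    classify=lambda feats: 1 if feats[0] > 6.0 else 0, # toy stand-in for the model
    risk_table=example_risk_table,
)
```

Keeping the risk table separate from the model means the health management suggestions can be revised without retraining the classifier.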
According to the technical scheme of the embodiment, the obtained diabetes clinical data are input into the target diabetes data analysis model obtained through training by the method of the embodiment, the output result of the model is obtained, and the result is fed back to the user, so that a doctor can be assisted in clinical judgment, or the diabetes patient can be effectively given life guidance, the life quality is improved, and the development of the course of diabetes is controlled. The embodiment of the invention solves the problem of non-uniform analysis standards of the diabetes data, realizes the analysis of a large amount of diabetes data, establishes a diabetes risk grade classification system, and can manage clinical diabetes data under the uniform analysis and management standards so as to improve the life quality of diabetics or perform prevention guidance of diseases.
Example three
Fig. 3 is a schematic structural diagram of a diabetes data analysis model training device according to a third embodiment of the present invention, which is applicable to a case where a diabetes data analysis model is constructed and trained based on a large amount of diabetes-related clinical data.
As shown in fig. 3, the diabetes data analysis model training apparatus includes: a data acquisition and preprocessing module 310, a data analysis module 320, and a model training module 330.
The data acquisition and preprocessing module 310 is configured to acquire clinical data related to diabetes and preprocess the clinical data, where the clinical data includes basic body information of a target subject, clinical biochemical test result data, and preset health questionnaire response data; the data analysis module 320 is used for performing hierarchical clustering analysis on the preprocessed clinical data and performing fuzzy clustering analysis based on the result of the hierarchical clustering analysis to determine the optimal classification quantity of the clinical data; and the model training module 330 is configured to input each clinical data classified according to the optimal classification number and the corresponding classification label as a model training sample into the initial diabetes data analysis model, and perform model training to obtain a target diabetes data analysis model.
According to the technical scheme of this embodiment, clinical data related to diabetes, such as basic body information of a target object, clinical biochemical test result data and preset health questionnaire response data, is acquired and preprocessed into a form that can be recognized and processed by a computer; hierarchical clustering analysis is then performed on the preprocessed clinical data, and fuzzy clustering analysis is performed based on the result of the hierarchical clustering analysis, so that the optimal classification quantity of the clinical data is determined more quickly; each piece of clinical data classified according to the optimal classification quantity, together with its classification label, is input into the initial diabetes data analysis model as a model training sample, and model training is performed to obtain the target diabetes data analysis model. The embodiment of the invention solves the problem of non-uniform analysis standards for diabetes data, enables analysis of a large amount of diabetes data, establishes a diabetes risk level classification system, and allows clinical diabetes data to be managed under unified analysis and management standards, so as to improve the quality of life of diabetic patients or to provide guidance on disease prevention.
Optionally, the data analysis module 320 is specifically configured to:
taking each piece of preprocessed clinical data as a class, calculating the similarity between every two classes, and aggregating the two classes of which the similarity meets a preset condition into one class until a preset iterative clustering stop condition is met;
the number of clusters when hierarchical clustering was stopped was taken as the first cluster number.
Optionally, the data analysis module is specifically configured to:
performing fuzzy clustering analysis on the preprocessed clinical data and calculating a clustering effectiveness function by taking the first clustering quantity and the corresponding clustering centers as the initial clustering quantity and the initial clustering centers of the first fuzzy clustering respectively;
sequentially decreasing the first cluster quantity, reducing a corresponding cluster center in a preset mode to serve as the initial cluster quantity and the initial cluster center of the next fuzzy cluster analysis, and repeating the process of carrying out fuzzy cluster analysis on the preprocessed clinical data and carrying out cluster validity function calculation until the initial cluster quantity of the fuzzy cluster analysis is less than or equal to two;
and comparing the values of the clustering validity function from all rounds of fuzzy clustering analysis, and taking the clustering number corresponding to the minimum value of the clustering validity function as the optimal classification quantity of the clinical data.
Optionally, the data analysis module is further specifically configured to:
calculating the value of the clustering validity function for each clustering analysis result, including:
calculating the cohesion and the separation of the clustering results, wherein the cohesion is denoted $CD(c)$ and the separation is denoted $SD(c)$, $i, k = 1, 2, \ldots, c$, $c$ denotes the number of clusters, $n(i)$ denotes the number of data in class $i$, $d(x, y)$ denotes the Euclidean distance between sample $x$ and sample $y$, and $u_{ij}$ denotes the membership degree of sample $i$ in class $j$;
and taking the ratio of the cohesion degree and the separation degree as a numerical value of the clustering effectiveness function.
Optionally, the data analysis module is further configured to:
initializing parameters of fuzzy clustering analysis, wherein the parameters comprise an iteration stop threshold, a maximum iteration number and a membership matrix;
performing data clustering according to the corresponding initial clustering quantity and the initial clustering center, and updating the membership matrix and the classification center according to the clustering result;
when the difference between the updated membership matrix and the initialized membership matrix is smaller than the iteration stop threshold, or the number of iterations exceeds the maximum number of iterations, completing one fuzzy clustering analysis;
otherwise, clustering the preprocessed clinical data again.
Optionally, the diabetes data analysis model training device further includes a data space mapping module, configured to map the preprocessed clinical data to a preset high-dimensional data space before performing hierarchical clustering analysis on the preprocessed clinical data.
Optionally, each classification of the clinical data corresponds to a preset diabetes risk level, and each preset diabetes risk level is provided with a corresponding diabetes health management recommendation.
The diabetes data analysis model training device provided by the embodiment of the invention can execute the diabetes data analysis model training method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example four
Fig. 4 is a schematic structural diagram of a diabetes data management apparatus according to a fourth embodiment of the present invention, which is applicable to a case of performing diabetes data analysis management through a trained diabetes data analysis model, and the apparatus may be implemented in a software and/or hardware manner and integrated into a computer device having an application development function.
As shown in fig. 4, the diabetes data management apparatus includes: a data acquisition module 410, a data analysis module 420, and a data feedback module 430.
The data acquisition module 410 is configured to acquire clinical diabetes data and preprocess the clinical diabetes data, where the clinical diabetes data includes basic body information of a target subject, clinical biochemical test result data, and preset health questionnaire response data; a data analysis module 420, configured to input the preprocessed diabetes clinical data into a target diabetes data analysis model obtained by training in a diabetes data analysis model training method according to any embodiment, so as to obtain a classification result of the diabetes clinical data; the data feedback module 430 is configured to feed back a preset diabetes risk level corresponding to the classification result of the clinical diabetes data and a diabetes health management recommendation corresponding to the preset diabetes risk level to the target object.
According to the technical scheme of the embodiment, the obtained diabetes clinical data are input into the target diabetes data analysis model obtained through training by the method of the embodiment, the output result of the model is obtained, and the result is fed back to the user, so that a doctor can be assisted in clinical judgment, or the diabetes patient can be effectively given life guidance, the life quality is improved, and the development of the course of diabetes is controlled. The embodiment of the invention solves the problem of non-uniform analysis standards of the diabetes data, realizes the analysis of a large amount of diabetes data, establishes a diabetes risk grade classification system, and can manage clinical diabetes data under the uniform analysis and management standards so as to improve the life quality of diabetics or perform prevention guidance of diseases.
The diabetes data management device provided by the embodiment of the invention can execute the diabetes data management method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example five
Fig. 5 is a schematic structural diagram of a computer device according to a fifth embodiment of the present invention. FIG. 5 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in FIG. 5 is only an example and should not bring any limitations to the functionality or scope of use of embodiments of the present invention. The computer device 12 may be any terminal device with computing capability, such as a terminal device of an intelligent controller, a server, a mobile phone, and the like.
As shown in FIG. 5, computer device 12 is in the form of a general purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, and commonly referred to as a "hard drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
The processing unit 16 executes programs stored in the system memory 28 to execute various functional applications and data processing, such as implementing a diabetes data analysis model training or diabetes data management method provided by the present embodiment.
The training method of the diabetes data analysis model comprises the following steps:
acquiring clinical data related to diabetes, and preprocessing the clinical data, wherein the clinical data comprises basic body information of a target object, clinical biochemical test result data and preset health questionnaire response data;
performing hierarchical clustering analysis on the preprocessed clinical data, and performing fuzzy clustering analysis based on the hierarchical clustering analysis result to determine the optimal classification quantity of the clinical data;
and inputting each clinical data classified according to the optimal classification quantity and the corresponding classification label as a model training sample into the initial diabetes data analysis model for model training to obtain the target diabetes data analysis model.
A diabetes data management method, comprising:
acquiring diabetes clinical data and preprocessing the diabetes data, wherein the clinical data comprise basic body information of a target object, clinical biochemical test result data and preset health questionnaire response data;
inputting the preprocessed diabetes clinical data into a target diabetes data analysis model obtained by training of the diabetes data analysis model training method of any embodiment to obtain a diabetes clinical data classification result;
and feeding back a preset diabetes risk grade corresponding to the diabetes clinical data classification result and a diabetes health management suggestion corresponding to the preset diabetes risk grade to the target object.
Example six
The sixth embodiment provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a method for training a diabetes data analysis model or managing diabetes data, as provided by any of the embodiments of the present invention.
The training method of the diabetes data analysis model comprises the following steps:
acquiring clinical data related to diabetes, and preprocessing the clinical data, wherein the clinical data comprises basic body information of a target object, clinical biochemical test result data and preset health questionnaire response data;
performing hierarchical clustering analysis on the preprocessed clinical data, and performing fuzzy clustering analysis based on the hierarchical clustering analysis result to determine the optimal classification quantity of the clinical data;
and inputting each clinical data classified according to the optimal classification quantity and the corresponding classification label as a model training sample into the initial diabetes data analysis model for model training to obtain the target diabetes data analysis model.
A diabetes data management method, comprising:
acquiring diabetes clinical data and preprocessing the diabetes data, wherein the clinical data comprise basic body information of a target object, clinical biochemical test result data and preset health questionnaire response data;
inputting the preprocessed diabetes clinical data into a target diabetes data analysis model obtained by training of the diabetes data analysis model training method of any embodiment to obtain a diabetes clinical data classification result;
and feeding back a preset diabetes risk grade corresponding to the diabetes clinical data classification result and a diabetes health management suggestion corresponding to the preset diabetes risk grade to the target object.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer-readable storage medium may be, for example but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It will be understood by those skilled in the art that the modules or steps of the invention described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a memory device and executed by the computing device; alternatively, they may each be fabricated as separate integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.
Claims (12)
1. A method for training a diabetes data analysis model, the method comprising:
acquiring clinical data related to diabetes and preprocessing the clinical data, wherein the clinical data comprises basic physical information of a target object, clinical biochemical test result data, and response data for a preset health questionnaire;
performing hierarchical clustering analysis on the preprocessed clinical data, and performing fuzzy clustering analysis based on the result of the hierarchical clustering analysis to determine the optimal number of classifications of the clinical data;
and inputting each piece of clinical data, classified according to the optimal number of classifications, together with its corresponding classification label, as a model training sample into an initial diabetes data analysis model for model training, to obtain a target diabetes data analysis model.
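As an illustration of the last step of claim 1, the following Python sketch shows one way labelled training samples might be assembled from a fuzzy clustering result. It assumes, which the claim does not state, that each record receives the hard label of the class in which its membership degree is highest; the function name `build_training_samples` is hypothetical.

```python
def build_training_samples(records, membership):
    """Pair each record with a hard class label.

    membership[i][j] is the degree to which record i belongs to class j.
    ASSUMPTION: the label is the class with the highest membership degree.
    """
    samples = []
    for record, row in zip(records, membership):
        # Index of the largest membership value in this record's row.
        label = max(range(len(row)), key=row.__getitem__)
        samples.append((record, label))
    return samples


# Two toy records with memberships over two classes.
samples = build_training_samples(
    [[1.0], [2.0]],
    [[0.8, 0.2], [0.3, 0.7]],
)
```

The resulting (features, label) pairs could then be passed to whatever supervised model plays the role of the initial diabetes data analysis model.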
2. The method of claim 1, wherein performing hierarchical clustering analysis on the preprocessed clinical data comprises:
taking each piece of preprocessed clinical data as a class, calculating the similarity between every two classes, and merging the two classes whose similarity meets a preset condition into one class, until a preset iterative clustering stop condition is met;
and taking the number of clusters at the time the hierarchical clustering stops as a first cluster number.
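The merging loop of claim 2 might be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: similarity is taken to be the Euclidean distance between class centroids, and the stop condition is taken to be a distance threshold (`max_merge_distance`, a hypothetical parameter). The claim leaves both choices open.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length numeric vectors."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def agglomerative(records, max_merge_distance):
    """Bottom-up clustering: one class per record, then repeated merging.

    ASSUMPTIONS: similarity = centroid distance; stop when the closest
    pair of classes is farther apart than max_merge_distance.
    """
    clusters = [[r] for r in records]  # each record starts as its own class
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > max_merge_distance:  # preset stop condition met
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge class j into class i
        del clusters[j]
    centers = [centroid(c) for c in clusters]
    return clusters, centers


records = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
clusters, centers = agglomerative(records, max_merge_distance=1.0)
first_cluster_number = len(clusters)
```

The returned centers are the natural candidates for the initial cluster centers of the subsequent fuzzy clustering analysis.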
3. The method of claim 2, wherein performing fuzzy clustering analysis based on results of hierarchical clustering analysis to determine an optimal number of classifications for the clinical data comprises:
performing fuzzy clustering analysis on the preprocessed clinical data and calculating a cluster validity function, taking the first cluster number and the corresponding cluster centers as the initial cluster number and the initial cluster centers, respectively, of a first fuzzy clustering analysis;
successively decrementing the first cluster number and reducing the corresponding cluster centers in a preset manner to serve as the initial cluster number and the initial cluster centers of the next fuzzy clustering analysis, and repeating the process of performing fuzzy clustering analysis on the preprocessed clinical data and calculating the cluster validity function until the initial cluster number of the fuzzy clustering analysis is less than or equal to two;
and comparing the values of the cluster validity function across all of the fuzzy clustering analyses, and taking the cluster number corresponding to the minimum value of the cluster validity function as the optimal number of classifications of the clinical data.
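The search over cluster numbers in claim 3 can be sketched as a loop that takes the fuzzy clustering routine and the validity function as caller-supplied hooks. Several details here are assumptions: the "preset manner" of reducing the centers is taken to be replacing the two closest centers with their midpoint, the reduction is applied to the fitted centers of the previous run, and the loop is taken to include a run with exactly two clusters. All function names are hypothetical.

```python
import math


def merge_closest_centers(centers):
    """Reduce the center list by one (ASSUMED rule: merge the closest pair)."""
    best = None
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            d = math.dist(centers[i], centers[j])
            if best is None or d < best[0]:
                best = (d, i, j)
    _, i, j = best
    midpoint = [(a + b) / 2 for a, b in zip(centers[i], centers[j])]
    kept = [c for k, c in enumerate(centers) if k not in (i, j)]
    return kept + [midpoint]


def select_cluster_count(data, first_centers, run_fuzzy_clustering, validity):
    """Return the cluster number with the smallest validity value.

    run_fuzzy_clustering(data, centers) -> (membership, fitted_centers)
    validity(data, membership, centers) -> float
    Both hooks are supplied by the caller.
    """
    if len(first_centers) < 2:
        raise ValueError("need at least two initial cluster centers")
    centers = [list(c) for c in first_centers]
    scores = {}
    while True:
        membership, fitted = run_fuzzy_clustering(data, centers)
        scores[len(fitted)] = validity(data, membership, fitted)
        if len(fitted) <= 2:  # stop once the cluster number reaches two
            break
        centers = merge_closest_centers(fitted)
    best_count = min(scores, key=scores.get)
    return best_count, scores


# Stub hooks for demonstration only: the "clustering" returns the centers
# unchanged, and the validity score is smallest at three clusters.
def stub_cluster(data, centers):
    return None, centers


def stub_validity(data, membership, centers):
    return abs(len(centers) - 3) + 0.5


start = [[0.0], [1.0], [5.0], [6.0], [20.0]]
best_count, scores = select_cluster_count([], start, stub_cluster, stub_validity)
```

Passing the two routines as parameters keeps the search loop independent of any particular fuzzy clustering algorithm or validity index.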
4. The method of claim 3, wherein calculating a value of a cluster validity function for each cluster analysis result comprises:
calculating the cohesion and the separation of the clustering result, wherein the cohesion is expressed as [formula not available in source text] and the separation is expressed as [formula not available in source text], where c represents the number of clusters, n(i) represents the number of data items in the i-th class, d(x, y) represents the Euclidean distance between a sample x and a sample y, and u_ij represents the degree of membership of sample i in class j;
and taking the ratio of the cohesion to the separation as the value of the cluster validity function.
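Because the cohesion and separation formulas are not legible in this text, the following Python sketch substitutes a well-known index of the same shape, the Xie-Beni index, purely for illustration. It is an assumption, not the formula of claim 4: cohesion is taken as the membership-weighted mean squared distance of samples to cluster centers, and separation as the smallest squared distance between any two centers. The fuzzifier `m` is likewise assumed.

```python
import math


def validity(data, membership, centers, m=2.0):
    """Ratio of cohesion to separation; smaller indicates a better partition.

    membership[i][j] is the degree to which sample i belongs to class j.
    ASSUMED definitions (Xie-Beni style), not taken from the patent:
      cohesion   = (1/n) * sum_i sum_j u_ij**m * d(x_i, v_j)**2
      separation = min over center pairs of d(v_j, v_k)**2
    """
    n = len(data)
    c = len(centers)
    cohesion = sum(
        (membership[i][j] ** m) * math.dist(data[i], centers[j]) ** 2
        for i in range(n)
        for j in range(c)
    ) / n
    separation = min(
        math.dist(centers[j], centers[k]) ** 2
        for j in range(c)
        for k in range(j + 1, c)
    )
    return cohesion / separation


data = [[0.0], [1.0], [9.0], [10.0]]
crisp = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
good = validity(data, crisp, [[0.5], [9.5]])   # centers sit inside the groups
bad = validity(data, crisp, [[4.0], [6.0]])    # centers far from the groups
```

Under these definitions a lower value means tighter, better-separated clusters, which is consistent with claim 3 selecting the minimum. The function requires at least two distinct centers; coincident centers would make the separation zero.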
5. The method of claim 3, wherein performing fuzzy clustering analysis on the preprocessed clinical data comprises:
initializing parameters of the fuzzy clustering analysis, wherein the parameters comprise an iteration stop threshold, a maximum number of iterations, and a membership matrix;
clustering the data according to the corresponding initial cluster number and initial cluster centers, and updating the membership matrix and the cluster centers according to the clustering result;
completing one fuzzy clustering analysis when the difference between the updated membership matrix and the initialized membership matrix is smaller than the iteration stop threshold, or when the number of iterations exceeds the maximum number of iterations;
and otherwise, clustering the preprocessed clinical data again.
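The iteration described in claim 5 resembles the standard fuzzy c-means (FCM) algorithm, and a minimal Python sketch under that assumption is given below. The fuzzifier `m`, the uniform initial membership matrix, and the use of the largest element-wise change between successive membership matrices as the "difference" are all assumptions; the claim compares against the initialized matrix, which coincides with this sketch only on the first pass.

```python
import math


def fuzzy_c_means(data, init_centers, m=2.0, eps=1e-5, max_iter=100):
    """Standard FCM updates starting from the given cluster centers.

    Returns (membership, centers, iterations_used).
    """
    centers = [list(v) for v in init_centers]
    c, n, dim = len(centers), len(data), len(data[0])
    u = [[1.0 / c] * c for _ in range(n)]  # initialized membership matrix
    for iteration in range(1, max_iter + 1):
        # Membership update: u_ij proportional to d(x_i, v_j)^(-2/(m-1)).
        new_u = []
        for x in data:
            dists = [math.dist(x, v) for v in centers]
            zeros = [d == 0.0 for d in dists]
            if any(zeros):
                # Sample coincides with a center: share full membership there.
                row = [1.0 / sum(zeros) if z else 0.0 for z in zeros]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                row = [value / total for value in inv]
            new_u.append(row)
        # Center update: mean of the samples weighted by u_ij^m.
        for j in range(c):
            weights = [new_u[i][j] ** m for i in range(n)]
            total = sum(weights)
            if total > 0.0:
                centers[j] = [
                    sum(w * data[i][k] for i, w in enumerate(weights)) / total
                    for k in range(dim)
                ]
        change = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if change < eps:  # iteration stop threshold reached
            break
    return u, centers, iteration


data = [[0.0], [0.2], [9.8], [10.0]]
membership, centers, iterations = fuzzy_c_means(data, [[1.0], [9.0]])
```

On this toy data the two centers settle near 0.1 and 9.9, and each sample's membership is concentrated in the nearer cluster.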
6. The method of claim 1, wherein prior to performing hierarchical cluster analysis on the preprocessed clinical data, the method further comprises:
mapping the preprocessed clinical data to a pre-defined high-dimensional data space.
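Claim 6 does not say which mapping is used. As one hypothetical example of lifting records into a higher-dimensional space, the Python sketch below appends all degree-2 products of the input features (an explicit polynomial feature map). The choice of map and the name `polynomial_map` are assumptions made only for illustration.

```python
from itertools import combinations_with_replacement


def polynomial_map(record):
    """Map (x1, ..., xd) to the original features plus every product
    x_a * x_b with a <= b, giving d + d*(d+1)/2 dimensions.

    ASSUMPTION: a degree-2 polynomial map; the claim names no specific map.
    """
    pairs = combinations_with_replacement(range(len(record)), 2)
    return list(record) + [record[a] * record[b] for a, b in pairs]


mapped = polynomial_map([2.0, 3.0])  # [2.0, 3.0, 4.0, 6.0, 9.0]
```

Clusters that are not linearly separable in the original feature space can sometimes be separated after such a mapping, which is the usual motivation for this kind of step.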
7. The method of any one of claims 1-6, wherein each classification of the clinical data corresponds to a preset diabetes risk level, and each preset diabetes risk level is provided with a corresponding diabetes health management recommendation.
8. A method of diabetes data management, the method comprising:
acquiring diabetes clinical data and preprocessing the diabetes clinical data, wherein the clinical data comprises basic physical information of a target object, clinical biochemical test result data, and response data for a preset health questionnaire;
inputting the preprocessed diabetes clinical data into a target diabetes data analysis model trained by the method of any one of claims 1-7 to obtain a classification result for the diabetes clinical data;
and feeding back to the target object a preset diabetes risk level corresponding to the classification result, together with a diabetes health management recommendation corresponding to the preset diabetes risk level.
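The feedback step of claim 8 amounts to two lookups: classification result to risk level, and risk level to recommendation. The Python sketch below illustrates this with placeholder values. The number of classes, the level names, and the recommendation texts are all hypothetical and do not come from the patent.

```python
# Hypothetical lookup tables; every key and value here is a placeholder.
RISK_LEVEL_BY_CLASS = {0: "low", 1: "medium", 2: "high"}
RECOMMENDATION_BY_LEVEL = {
    "low": "<health management recommendation for low risk>",
    "medium": "<health management recommendation for medium risk>",
    "high": "<health management recommendation for high risk>",
}


def feedback_for(class_index):
    """Translate a model classification result into the fed-back message."""
    level = RISK_LEVEL_BY_CLASS[class_index]
    return {
        "risk_level": level,
        "recommendation": RECOMMENDATION_BY_LEVEL[level],
    }


result = feedback_for(2)
```

Keeping the two tables separate means recommendations can be revised per risk level without retraining or touching the class-to-level assignment.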
9. A diabetes data analysis model training apparatus, characterized in that the apparatus comprises:
a data acquisition and preprocessing module, configured to acquire clinical data related to diabetes and to preprocess the clinical data, wherein the clinical data comprises basic physical information of a target object, clinical biochemical test result data, and response data for a preset health questionnaire;
a data analysis module, configured to perform hierarchical clustering analysis on the preprocessed clinical data and to perform fuzzy clustering analysis based on the result of the hierarchical clustering analysis, so as to determine the optimal number of classifications of the clinical data;
and a model training module, configured to input each piece of clinical data, classified according to the optimal number of classifications, together with its corresponding classification label, as a model training sample into an initial diabetes data analysis model for model training, to obtain a target diabetes data analysis model.
10. A diabetes data management apparatus, characterized in that the apparatus comprises:
a data acquisition module, configured to acquire diabetes clinical data and to preprocess the diabetes clinical data, wherein the clinical data comprises basic physical information of a target object, clinical biochemical test result data, and response data for a preset health questionnaire;
a data analysis module, configured to input the preprocessed diabetes clinical data into a target diabetes data analysis model trained by the method according to any one of claims 1 to 7, so as to obtain a classification result for the diabetes clinical data;
and a data feedback module, configured to feed back to the target object a preset diabetes risk level corresponding to the classification result, together with a diabetes health management recommendation corresponding to the preset diabetes risk level.
11. A computer device, characterized in that the computer device comprises:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the diabetes data analysis model training method or the diabetes data management method of any one of claims 1-8.
12. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the diabetes data analysis model training method or the diabetes data management method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110854902.1A CN113488166A (en) | 2021-07-28 | 2021-07-28 | Diabetes data analysis model training and data management method, device and equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110854902.1A CN113488166A (en) | 2021-07-28 | 2021-07-28 | Diabetes data analysis model training and data management method, device and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113488166A (en) | 2021-10-08 |
Family
ID=77943201
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110854902.1A Pending CN113488166A (en) | 2021-07-28 | 2021-07-28 | Diabetes data analysis model training and data management method, device and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113488166A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115954107A (en) * | 2022-12-20 | 2023-04-11 | 首都医科大学附属北京佑安医院 | Method and device for analyzing clinical examination data of primary biliary cholangitis |
CN117542460A (en) * | 2024-01-09 | 2024-02-09 | 江苏尤里卡生物科技有限公司 | Adaptive parameter optimization method and system for urokinase separation |
CN117672445A (en) * | 2023-12-18 | 2024-03-08 | 郑州大学 | Diabetes mellitus debilitation current situation analysis method and system based on big data |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104915560A (en) * | 2015-06-11 | 2015-09-16 | 万达信息股份有限公司 | Method for disease diagnosis and treatment scheme based on generalized neural network clustering |
CN107403072A (en) * | 2017-08-07 | 2017-11-28 | 北京工业大学 | A kind of diabetes B prediction and warning method based on machine learning |
CN107545133A (en) * | 2017-07-20 | 2018-01-05 | 陆维嘉 | A kind of Gaussian Blur cluster calculation method for antidiastole chronic bronchitis |
CN110197724A (en) * | 2019-03-12 | 2019-09-03 | 平安科技(深圳)有限公司 | Predict the method, apparatus and computer equipment in diabetes illness stage |
WO2020181805A1 (en) * | 2019-03-12 | 2020-09-17 | 平安科技(深圳)有限公司 | Diabetes prediction method and apparatus, storage medium, and computer device |
CN112802568A (en) * | 2021-02-03 | 2021-05-14 | 紫东信息科技(苏州)有限公司 | Multi-label stomach disease classification method and device based on medical history text |
CN112802606A (en) * | 2021-01-28 | 2021-05-14 | 联仁健康医疗大数据科技股份有限公司 | Data screening model establishing method, data screening device, data screening equipment and data screening medium |
CN113012806A (en) * | 2021-02-20 | 2021-06-22 | 西安交通大学医学院第二附属医院 | Early prediction method for gestational diabetes |
CN113128536A (en) * | 2019-12-31 | 2021-07-16 | 奇安信科技集团股份有限公司 | Unsupervised learning method, system, computer device and readable storage medium |
Non-Patent Citations (1)
Title |
---|
Zhang Penglin et al. (eds.): "Principles and Technology of Spatio-Temporal Databases" (《时空数据库原理与技术》), vol. 2019, Wuhan: Wuhan University Press, pages: 297 - 298 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115954107A (en) * | 2022-12-20 | 2023-04-11 | 首都医科大学附属北京佑安医院 | Method and device for analyzing clinical examination data of primary biliary cholangitis |
CN115954107B (en) * | 2022-12-20 | 2024-01-26 | 首都医科大学附属北京佑安医院 | Method and device for analyzing clinical test data of primary cholangitis |
CN117672445A (en) * | 2023-12-18 | 2024-03-08 | 郑州大学 | Diabetes mellitus debilitation current situation analysis method and system based on big data |
CN117542460A (en) * | 2024-01-09 | 2024-02-09 | 江苏尤里卡生物科技有限公司 | Adaptive parameter optimization method and system for urokinase separation |
CN117542460B (en) * | 2024-01-09 | 2024-03-22 | 江苏尤里卡生物科技有限公司 | Adaptive parameter optimization method and system for urokinase separation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Islam et al. | Likelihood prediction of diabetes at early stage using data mining techniques | |
Dorado-Díaz et al. | Applications of artificial intelligence in cardiology. The future is already here | |
US20210342212A1 (en) | Method and system for identifying root causes | |
US10553319B1 (en) | Artificial intelligence systems and methods for vibrant constitutional guidance | |
CN113488166A (en) | Diabetes data analysis model training and data management method, device and equipment | |
US20150324527A1 (en) | Learning health systems and methods | |
US11468363B2 (en) | Methods and systems for classification to prognostic labels using expert inputs | |
CN112541056B (en) | Medical term standardization method, device, electronic equipment and storage medium | |
CN114026651A (en) | Automatic generation of structured patient data records | |
US20220084633A1 (en) | Systems and methods for automatically identifying a candidate patient for enrollment in a clinical trial | |
US11157822B2 (en) | Methods and systems for classification using expert data | |
WO2021114635A1 (en) | Patient grouping model constructing method, patient grouping method, and related device | |
Singh et al. | Data mining classifier for predicting diabetics | |
US11694814B1 (en) | Determining patient condition from unstructured text data | |
CN111553478A (en) | Community old people cardiovascular disease prediction system and method based on big data | |
Davazdahemami et al. | A deep learning approach for predicting early bounce-backs to the emergency departments | |
CN116864139A (en) | Disease risk assessment method, device, computer equipment and readable storage medium | |
Bayramli et al. | Predictive structured–unstructured interactions in EHR models: A case study of suicide prediction | |
JP2021536636A (en) | How to classify medical records | |
D’Amario et al. | GENERATOR HEART FAILURE DataMart: An integrated framework for heart failure research | |
Teng et al. | Few-shot ICD coding with knowledge transfer and evidence representation | |
US20200321112A1 (en) | Systems and methods for generating alimentary instruction sets based on vibrant constitutional guidance | |
US11749397B2 (en) | Inferring semantic data organization from machine-learned relationships | |
CN111863283A (en) | Medical health big data platform | |
Hassan et al. | Efficient prediction of coronary artery disease using machine learning algorithms with feature selection techniques |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||