CN113488166A - Diabetes data analysis model training and data management method, device and equipment - Google Patents

Info

Publication number
CN113488166A
Authority
CN
China
Prior art keywords
data
diabetes
clustering
analysis
clinical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110854902.1A
Other languages
Chinese (zh)
Inventor
刘伟业
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lianren Healthcare Big Data Technology Co Ltd
Original Assignee
Lianren Healthcare Big Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lianren Healthcare Big Data Technology Co Ltd
Priority to CN202110854902.1A
Publication of CN113488166A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/231 Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/20 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for electronic clinical trials or questionnaires
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Primary Health Care (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The embodiment of the invention discloses a method, a device and equipment for training a diabetes data analysis model and managing data, wherein the method comprises the following steps: acquiring clinical data related to diabetes, and preprocessing the clinical data; performing hierarchical clustering analysis on the preprocessed clinical data, and performing fuzzy clustering analysis based on the hierarchical clustering analysis result to determine the optimal classification quantity of the clinical data; and inputting each clinical data classified according to the optimal classification quantity and the corresponding classification label as a model training sample into the initial diabetes data analysis model for model training to obtain the target diabetes data analysis model. The embodiment of the invention solves the problem of non-uniform analysis standards of the diabetes data, realizes the analysis of a large amount of diabetes data, establishes a diabetes risk grade classification system, and can manage clinical diabetes data under the uniform analysis and management standards so as to improve the life quality of diabetics.

Description

Diabetes data analysis model training and data management method, device and equipment
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a method, a device and equipment for training a diabetes data analysis model and managing data.
Background
Diabetes mellitus is a metabolic disease characterized by hyperglycemia caused by defective insulin secretion or impaired insulin action. It affects patients' quality of life and can even lead to death. Diabetes can be treated with drugs and kept under control through a healthy lifestyle and diet, so early diagnosis and health management are of great importance. However, diabetes risk assessment and the corresponding health management advice are mainly given by doctors on the basis of users' physical examination reports. They are therefore strongly influenced by each doctor's experience, and different doctors may reach different diagnosis and treatment results. There is therefore a need to establish a unified, standardized assessment standard based on diabetes-related clinical data to facilitate the analysis and management of that data.
Disclosure of Invention
The embodiment of the invention provides a method, a device and equipment for training a diabetes data analysis model and managing data, so as to analyze a large amount of diabetes data, establish a diabetes risk classification system, manage clinical diabetes data under a unified analysis and management standard and improve the life quality of diabetics.
In a first aspect, an embodiment of the present invention provides a method for training a diabetes data analysis model, where the method includes:
acquiring clinical data related to diabetes, and preprocessing the clinical data, wherein the clinical data comprises target object body basic information, clinical biochemical test result data and preset health questionnaire answering data;
performing hierarchical clustering analysis on the preprocessed clinical data, and performing fuzzy clustering analysis based on the hierarchical clustering analysis result to determine the optimal classification quantity of the clinical data;
and inputting each clinical data classified according to the optimal classification quantity and the corresponding classification label as a model training sample into the initial diabetes data analysis model for model training to obtain the target diabetes data analysis model.
Optionally, performing hierarchical clustering analysis on the preprocessed clinical data, including:
taking each piece of preprocessed clinical data as a class, calculating the similarity between every two classes, and aggregating the two classes of which the similarity meets a preset condition into one class until a preset iterative clustering stop condition is met;
and taking the number of clusters at the time hierarchical clustering stops as the first cluster number.
Optionally, performing fuzzy clustering analysis based on the result of hierarchical clustering analysis to determine an optimal classification number of the clinical data, including:
performing fuzzy clustering analysis on the preprocessed clinical data and calculating a clustering effectiveness function by taking the first clustering quantity and the corresponding clustering centers as the initial clustering quantity and the initial clustering centers of the first fuzzy clustering respectively;
sequentially decreasing the first cluster quantity, reducing a corresponding cluster center in a preset mode to serve as the initial cluster quantity and the initial cluster center of the next fuzzy cluster analysis, and repeating the process of carrying out fuzzy cluster analysis on the preprocessed clinical data and carrying out cluster validity function calculation until the initial cluster quantity of the fuzzy cluster analysis is less than or equal to two;
and comparing the values of the cluster validity function obtained from all rounds of fuzzy clustering analysis, and taking the cluster number corresponding to the minimum value of the cluster validity function as the optimal classification number of the clinical data.
Optionally, calculating a value of the cluster validity function for each cluster analysis result includes:
calculating the cohesion and the separation of the clustering results, wherein the cohesion is expressed as
Figure BDA0003183786470000031
Figure BDA0003183786470000032
The degree of separation is expressed as
Figure BDA0003183786470000033
where $i, k = 1, 2, \ldots, c$; $c$ represents the number of clusters; $n(i)$ represents the number of data items in the i-th class; $d(x, y)$ represents the Euclidean distance between sample x and sample y; and $u_{ij}$ represents the degree of membership of sample i in class j;
and taking the ratio of the cohesion degree and the separation degree as a numerical value of the clustering effectiveness function.
Optionally, the performing fuzzy clustering analysis on the preprocessed clinical data includes:
initializing parameters of fuzzy clustering analysis, wherein the parameters comprise an iteration stop threshold, a maximum iteration number and a membership matrix;
performing data clustering according to the corresponding initial clustering quantity and the initial clustering center, and updating the membership matrix and the classification center according to the clustering result;
when the difference value between the updated membership matrix and the initialized membership matrix is smaller than the iteration stop threshold or the iteration times exceeds the maximum iteration times, completing one-time fuzzy clustering analysis;
otherwise, clustering the preprocessed clinical data again.
Optionally, before performing hierarchical clustering analysis on the preprocessed clinical data, the method further includes:
mapping the preprocessed clinical data to a pre-defined high-dimensional data space.
Optionally, each classification of the clinical data corresponds to a preset diabetes risk level, and each preset diabetes risk level is provided with a corresponding diabetes health management recommendation.
In a second aspect, an embodiment of the present invention provides a diabetes data management method, including:
acquiring diabetes clinical data and preprocessing the diabetes data, wherein the clinical data comprise basic body information of a target object, clinical biochemical test result data and preset health questionnaire response data;
inputting the preprocessed diabetes clinical data into a target diabetes data analysis model obtained by training of the diabetes data analysis model training method of any embodiment to obtain a diabetes clinical data classification result;
and feeding back a preset diabetes risk grade corresponding to the diabetes clinical data classification result and a diabetes health management suggestion corresponding to the preset diabetes risk grade to the target object.
In a third aspect, an embodiment of the present invention further provides a diabetes data analysis model training device, where the device includes:
the data acquisition and preprocessing module is used for acquiring clinical data related to diabetes and preprocessing the clinical data, wherein the clinical data comprises target object body basic information, clinical biochemical test result data and preset health questionnaire answering data;
the data analysis module is used for carrying out hierarchical clustering analysis on the preprocessed clinical data and carrying out fuzzy clustering analysis on the basis of the result of the hierarchical clustering analysis so as to determine the optimal classification quantity of the clinical data;
and the model training module is used for inputting each clinical data classified according to the optimal classification quantity and the corresponding classification label as a model training sample into the initial diabetes data analysis model for model training to obtain the target diabetes data analysis model.
Optionally, the data analysis module is specifically configured to:
taking each piece of preprocessed clinical data as a class, calculating the similarity between every two classes, and aggregating the two classes of which the similarity meets a preset condition into one class until a preset iterative clustering stop condition is met;
and taking the number of clusters at the time hierarchical clustering stops as the first cluster number.
Optionally, the data analysis module is specifically configured to:
performing fuzzy clustering analysis on the preprocessed clinical data and calculating a clustering effectiveness function by taking the first clustering quantity and the corresponding clustering centers as the initial clustering quantity and the initial clustering centers of the first fuzzy clustering respectively;
sequentially decreasing the first cluster quantity, reducing a corresponding cluster center in a preset mode to serve as the initial cluster quantity and the initial cluster center of the next fuzzy cluster analysis, and repeating the process of carrying out fuzzy cluster analysis on the preprocessed clinical data and carrying out cluster validity function calculation until the initial cluster quantity of the fuzzy cluster analysis is less than or equal to two;
and comparing the values of the cluster validity function obtained from all rounds of fuzzy clustering analysis, and taking the cluster number corresponding to the minimum value of the cluster validity function as the optimal classification number of the clinical data.
Optionally, the data analysis module is further specifically configured to:
calculating the value of a clustering validity function aiming at each clustering analysis result, wherein the value comprises the following steps:
calculating the cohesion and the separation of the clustering results, wherein the cohesion is expressed as
Figure BDA0003183786470000051
Figure BDA0003183786470000052
The degree of separation is expressed as
Figure BDA0003183786470000053
where $i, k = 1, 2, \ldots, c$; $c$ denotes the number of clusters; $n(i)$ denotes the number of data items in the i-th class; $d(x, y)$ denotes the Euclidean distance between sample x and sample y; and $u_{ij}$ denotes the degree of membership of sample i in class j;
and taking the ratio of the cohesion degree and the separation degree as a numerical value of the clustering effectiveness function.
Optionally, the data analysis module is further configured to:
initializing parameters of fuzzy clustering analysis, wherein the parameters comprise an iteration stop threshold, a maximum iteration number and a membership matrix;
performing data clustering according to the corresponding initial clustering quantity and the initial clustering center, and updating the membership matrix and the classification center according to the clustering result;
when the difference value between the updated membership matrix and the initialized membership matrix is smaller than the iteration stop threshold or the iteration times exceeds the maximum iteration times, completing one-time fuzzy clustering analysis;
otherwise, clustering the preprocessed clinical data again.
Optionally, the diabetes data analysis model training device further includes a data space mapping module, configured to map the preprocessed clinical data to a preset high-dimensional data space before performing hierarchical clustering analysis on the preprocessed clinical data.
Optionally, each classification of the clinical data corresponds to a preset diabetes risk level, and each preset diabetes risk level is provided with a corresponding diabetes health management recommendation.
In a fourth aspect, an embodiment of the present invention further provides a diabetes data management apparatus, including:
the data acquisition module is used for acquiring clinical diabetes data and preprocessing the diabetes data, wherein the clinical data comprises basic body information of a target object, clinical biochemical test result data and preset health questionnaire answering data;
the data analysis module is used for inputting the preprocessed diabetes clinical data into a target diabetes data analysis model obtained by training by the method of any embodiment so as to obtain a diabetes clinical data classification result;
and the data feedback module is used for feeding back a preset diabetes risk grade corresponding to the diabetes clinical data classification result and a diabetes health management suggestion corresponding to the preset diabetes risk grade to the target object.
In a fifth aspect, an embodiment of the present invention further provides a computer device, where the computer device includes:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the diabetes data analysis model training method or the diabetes data management method provided by any of the embodiments of the invention.
In a sixth aspect, embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a method for training a diabetes data analysis model or managing diabetes data, according to any of the embodiments of the present invention.
The embodiment of the invention has the following advantages or beneficial effects:
According to the embodiment of the invention, diabetes-related clinical data such as the basic body information of the target object, clinical biochemical test result data and preset health questionnaire response data are acquired and preprocessed into a form that a computer can recognize and operate on. Hierarchical clustering analysis is then performed on the preprocessed clinical data, and fuzzy clustering analysis is performed based on the hierarchical clustering result, so that the optimal classification quantity of the clinical data is determined more quickly. Each piece of clinical data classified according to the optimal classification quantity, together with its classification label, is input as a model training sample into the initial diabetes data analysis model for training, yielding the target diabetes data analysis model. The embodiment of the invention solves the problem of non-uniform analysis standards for diabetes data, enables the analysis of a large amount of diabetes data, establishes a diabetes risk level classification system, and can manage clinical diabetes data under unified analysis and management standards so as to improve the quality of life of diabetics or to provide guidance on disease prevention.
Drawings
FIG. 1 is a flowchart of a method for training a diabetes data analysis model according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for diabetes data management according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a diabetes data analysis model training apparatus according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a training apparatus for a diabetes data analysis model according to a fourth embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a computer device according to a fifth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a method for training a diabetes data analysis model according to an embodiment of the present invention, which is applicable to a case where a diabetes data analysis model is constructed and trained based on a large amount of diabetes-related clinical data. The method can be executed by a diabetes data analysis model training device, which can be realized by software and/or hardware and is integrated in a computer device with an application development function.
As shown in fig. 1, the diabetes data analysis model training method includes the following steps:
S110: Acquire clinical data related to diabetes, and preprocess the clinical data.
The clinical data related to diabetes may be data that needs to be considered when clinically evaluating diabetes, such as basic body information of the target subject, clinical biochemical test result data and preset health questionnaire answering data. The target object may be a subject with diabetes or a subject without diabetes, and further data analysis may be performed by analyzing the relevant data of the target object for different diabetes courses and states.
Specifically, the basic body information of the target object includes age, height, weight, pulse, heart rate, diastolic pressure, systolic pressure and the like; the clinical biochemical test result data includes three liver function tests, three kidney function tests, routine blood tests, routine urine tests, pre-meal and postprandial blood glucose and the like; the preset health questionnaire response data may include health history (whether the subject has had a given disease), physical symptoms (a systematic inquiry into the symptoms and signs of people at risk for major chronic diseases), lifestyle and environment (lifestyle and environmental risk factors that cause major chronic diseases, including diet, smoking, drinking, exercise, environmental health risks, etc.), mental health and stress (mood, stress, anxiety, depression, etc.), sleep, health knowledge, and so on.
Further, the acquired clinical data is preprocessed into a form that a computer device can recognize and operate on. Preprocessing first converts textual descriptions into numbers. For example, in the health history, "has had pneumonia" may be represented by "1" and "has not had pneumonia" by "0"; likewise, smoking may be represented by "1" and not smoking by "0". Next, records with missing values or obvious errors are deleted. Finally, the numerical data is discretized and normalized.
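The preprocessing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the field names (`had_pneumonia`, `smokes`, `age`, `systolic`), the plausible-value ranges in `RANGES`, and the choice of min-max normalization are all hypothetical, and discretization is omitted for brevity.

```python
from typing import Optional

# Hypothetical field names and value ranges, chosen only for illustration.
YES_NO = {"yes": 1, "no": 0}
RANGES = {"age": (0.0, 120.0), "systolic": (50.0, 250.0)}


def encode_record(raw: dict) -> Optional[dict]:
    """Turn one raw record into numbers, or return None if it must be dropped."""
    out = {}
    for field in ("had_pneumonia", "smokes"):
        answer = raw.get(field)
        if answer not in YES_NO:            # missing or unrecognised answer
            return None
        out[field] = YES_NO[answer]         # text description -> 1 / 0
    for field, (low, high) in RANGES.items():
        value = raw.get(field)
        if value is None or not (low <= value <= high):   # missing or implausible
            return None
        out[field] = (value - low) / (high - low)         # min-max normalisation
    return out


def preprocess(records: list) -> list:
    """Encode every record and keep only the ones that pass the checks."""
    encoded = (encode_record(r) for r in records)
    return [r for r in encoded if r is not None]
```

A record with an out-of-range age or an unanswered question is dropped entirely here; imputing missing values would be an alternative design that the text does not describe.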
In a preferred embodiment, the preprocessed clinical data can be mapped to a preset high-dimensional data space before the next step begins, so as to speed up convergence of the algorithm. Clinical test data contains many patterns that are not linearly separable in the low-dimensional space; they can be made separable by nonlinearly mapping the data to a high-dimensional feature space. Kernel functions commonly used for the low-dimensional to high-dimensional mapping include the linear kernel, the polynomial kernel, and the Gaussian kernel (radial basis function kernel). Among these, the Gaussian kernel is the most widely used. Expressed mathematically, a mapping $\Phi$ maps the sample data $X = \{x_1, x_2, \ldots, x_n\}$ to a kernel space F, and the mapped sample set is $F = \{\Phi(x_1), \Phi(x_2), \ldots, \Phi(x_n)\}$.
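A short sketch of how a Gaussian kernel stands in for the explicit mapping Φ: distances in the feature space can be computed from kernel values alone, without ever constructing Φ(x). The kernel form exp(-||x - y||² / (2σ²)) and the bandwidth parameter `sigma` are a common convention assumed here; the exact form used in the application is given only as a formula image.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 * sigma^2)); sigma is an assumed bandwidth."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def feature_space_sq_distance(x, y, sigma=1.0):
    """||Phi(x) - Phi(y)||^2 = K(x,x) - 2*K(x,y) + K(y,y) = 2 * (1 - K(x, y)).

    The simplification holds because K(x, x) = 1 for the Gaussian kernel.
    """
    return 2.0 * (1.0 - gaussian_kernel(x, y, sigma))
```

Because K(x, y) lies in (0, 1], the feature-space squared distance is bounded between 0 and 2, which limits the influence of outlying samples.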
S120: Perform hierarchical clustering analysis on the preprocessed clinical data, and perform fuzzy clustering analysis based on the hierarchical clustering analysis result to determine the optimal classification quantity of the clinical data.
In the step, a hierarchical clustering mode is adopted firstly, clustering effects of different clustering types are analyzed and determined, and the optimal scheme is selected to serve as an initial clustering center and an initial clustering number of fuzzy clustering analysis, so that the fuzzy clustering analysis can find a proper clustering center more quickly, and a classification result can be more accurate.
First, hierarchical clustering creates a hierarchical nested cluster tree by calculating the similarity between data points of different classes. The original data points form the bottom layer of the tree, and the cluster tree is built by bottom-up merging. At the bottom layer, each piece of preprocessed clinical data is first treated as its own class; the similarity between each class and every other class is calculated, and classes are merged according to the similarity values to obtain a new layer. The similarity between the classes in the new layer is then calculated to merge the next layer, and so on until a preset iterative clustering stop condition is met. The stop condition may be a preset threshold: if the distances between all clusters are greater than the threshold, the clustering process stops. Hierarchical clustering may use the Euclidean distance to measure the distance (similarity) between data points of different classes; the smaller the distance, the higher the similarity. The number of clusters when hierarchical clustering stops is the first cluster number.
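The bottom-up merging with a distance threshold can be sketched as below. The text does not say how the distance between two multi-sample classes is measured, so this example assumes centroid linkage (distance between class means); `hierarchical_clusters` and `centroid` are hypothetical helper names.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def hierarchical_clusters(points, threshold):
    """Merge bottom-up until every pair of clusters is farther apart than threshold."""
    clusters = [[list(p)] for p in points]        # each sample starts as its own class
    while len(clusters) > 1:
        centers = [centroid(c) for c in clusters]
        best = None                               # (distance, i, j) of the closest pair
        for i in range(len(centers)):
            for j in range(i + 1, len(centers)):
                d = math.dist(centers[i], centers[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:                   # preset stop condition reached
            break
        _, i, j = best
        clusters[i].extend(clusters.pop(j))       # merge the two most similar classes
    return clusters, [centroid(c) for c in clusters]
```

The length of the returned cluster list plays the role of the first cluster number, and the returned centroids can seed the first round of fuzzy clustering.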
Further, fuzzy clustering analysis is performed based on the results of hierarchical clustering analysis to determine an optimal number of classifications for the clinical data. Specifically, the first clustering number is used as the initial clustering number of the first fuzzy clustering, and when hierarchical clustering is stopped, the clustering center of the clustering result of the first clustering number is used as the initial clustering center of the first fuzzy clustering.
Fuzzy clustering is a process of iteratively calculating the data membership degrees and the cluster centers until an optimum is reached; for a single sample $x_i$, the sum of its membership degrees over all clusters is 1. The following description takes data mapped into the high-dimensional data space as an example. In the fuzzy clustering process, the objective function and constraint after high-dimensional mapping can be expressed as the following formula:
Figure BDA0003183786470000101
where N is the number of samples, C is the number of clusters, and i and j are indices; $u_{ij}$ denotes the degree of membership of sample i in class j, $c_j$ denotes the cluster center of class j, and m is a weighting parameter, typically taken to be 2.
Further, introducing a Lagrange multiplier method to construct a new objective function as follows:
Figure BDA0003183786470000111
Taking the partial derivatives with respect to the three variables $\lambda$, $u_{ij}$ and $c_j$ and setting them equal to 0 yields:
Figure BDA0003183786470000112
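For orientation, the standard kernel fuzzy c-means formulation from the literature is written out below in the symbols defined above (N, C, $u_{ij}$, $c_j$, m, Φ). It is offered as a reference form under the assumption that the application follows the usual derivation; the formula images themselves are not reproduced in this text and may differ in detail. The last line follows from setting the derivative of L with respect to $u_{ij}$ to zero and eliminating $\lambda_i$ through the constraint.

```latex
\begin{aligned}
\min_{U,\,c}\; J &= \sum_{i=1}^{N}\sum_{j=1}^{C} u_{ij}^{m}\,
      \lVert \Phi(x_i)-\Phi(c_j)\rVert^{2},
      \qquad \text{s.t. } \sum_{j=1}^{C} u_{ij}=1,\quad i=1,\dots,N \\[4pt]
L &= \sum_{i=1}^{N}\sum_{j=1}^{C} u_{ij}^{m}\,
      \lVert \Phi(x_i)-\Phi(c_j)\rVert^{2}
      + \sum_{i=1}^{N}\lambda_i\Bigl(\sum_{j=1}^{C}u_{ij}-1\Bigr) \\[4pt]
u_{ij} &= \frac{\lVert \Phi(x_i)-\Phi(c_j)\rVert^{-2/(m-1)}}
               {\sum_{k=1}^{C}\lVert \Phi(x_i)-\Phi(c_k)\rVert^{-2/(m-1)}}
\end{aligned}
```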
Furthermore, when the high-dimensional data space mapping uses a Gaussian kernel function, substituting the Gaussian kernel function into the above result yields
Figure BDA0003183786470000113
Wherein, the Gaussian kernel function can be expressed as
Figure BDA0003183786470000114
Further, during the fuzzy clustering analysis, the membership matrix and the cluster centers can be calculated and updated according to the formula
Figure BDA0003183786470000115
In each fuzzy clustering analysis, an iteration stop threshold ε and a maximum number of iterations lmax are initialized in advance. The termination condition for the iteration within one fuzzy clustering analysis is
Figure BDA0003183786470000116
where k is the iteration step and ε is the iteration stop threshold. When the termination condition is satisfied, further iteration would no longer change the membership degrees significantly; that is, the membership degrees are considered unchanged and a relatively optimal (locally or globally optimal) state has been reached. The process converges to a local minimum or a saddle point of the objective function. After one fuzzy clustering analysis is completed, the cluster validity function can be calculated to evaluate the validity of the classification result. Then, starting from the first cluster number and decreasing by 1 each time, the initial cluster number of the fuzzy clustering is reduced and the fuzzy clustering analysis of the preprocessed clinical data is repeated.
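One round of fuzzy clustering with the stop rule described above might look like the following sketch. It assumes the widely used Gaussian-kernel fuzzy c-means update rules (stated in the docstrings) and takes the largest absolute change in any membership degree as the "difference" compared against ε; the application's own formulas are given only as images, so both choices are assumptions, as are the function names and default parameter values.

```python
import math


def kernel(x, y, sigma):
    """Gaussian kernel with assumed bandwidth sigma."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


def update_memberships(data, centers, m, sigma):
    """u_ij proportional to (1 - K(x_i, c_j)) ** (-1 / (m - 1)); each row sums to 1."""
    u = []
    for x in data:
        # A small floor avoids division by zero when a sample sits on a center.
        w = [max(1.0 - kernel(x, c, sigma), 1e-12) ** (-1.0 / (m - 1.0)) for c in centers]
        total = sum(w)
        u.append([v / total for v in w])
    return u


def update_centers(data, centers, u, m, sigma):
    """c_j = sum_i u_ij^m K(x_i, c_j) x_i / sum_i u_ij^m K(x_i, c_j)."""
    new_centers = []
    for j, c in enumerate(centers):
        weights = [(u[i][j] ** m) * kernel(x, c, sigma) for i, x in enumerate(data)]
        total = sum(weights)
        if total == 0.0:                      # no sample supports this center; keep it
            new_centers.append(list(c))
            continue
        new_centers.append([sum(w * x[d] for w, x in zip(weights, data)) / total
                            for d in range(len(c))])
    return new_centers


def fuzzy_cluster(data, init_centers, m=2.0, sigma=1.0, eps=1e-5, max_iter=100):
    """Iterate until memberships change by less than eps or max_iter is reached."""
    centers = [list(c) for c in init_centers]
    u = update_memberships(data, centers, m, sigma)
    for _ in range(max_iter):
        centers = update_centers(data, centers, u, m, sigma)
        new_u = update_memberships(data, centers, m, sigma)
        change = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if change < eps:                      # membership effectively unchanged
            break
    return u, centers
```

Passing the centers produced by hierarchical clustering as `init_centers` matches the seeding strategy in the text; the result is still only a local optimum, so different seeds can give different partitions.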
Specifically, after the first fuzzy clustering analysis, the first clustering number is sequentially decreased, the clustering centers determined by the first clustering analysis are decreased by one in a preset mode to serve as the initial clustering number and the initial clustering centers of the next fuzzy clustering analysis, and the process of carrying out fuzzy clustering analysis on the preprocessed clinical data and carrying out clustering validity function calculation is repeated until the initial clustering number of the fuzzy clustering analysis is less than or equal to two. For example, if the first clustering number is determined to be 12 through hierarchical clustering, then fuzzy clustering analysis can be performed with the initial clustering category number of 12; then, fuzzy clustering analysis is carried out with the initial clustering number being 11. When the initial cluster number is 11, the cluster centers of each class may be determined by subtracting one cluster center from 12 cluster centers for which the initial cluster number is 12 for the fuzzy cluster analysis. The determination process may be to merge the centers with the highest similarity among the 12 cluster centers. The above process is repeated until the number of initial clusters of the fuzzy cluster analysis is reduced to 2.
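The step from c centers to c - 1 centers can be sketched as follows. The text only says that the most similar centers may be merged; replacing the closest pair by its midpoint is an assumption made for this example, and `merge_closest_centers` and `center_schedule` are hypothetical names.

```python
import math


def merge_closest_centers(centers):
    """Return a new list in which the two closest centers are replaced by their midpoint."""
    if len(centers) < 2:
        raise ValueError("need at least two centers to merge")
    pairs = [(math.dist(centers[i], centers[j]), i, j)
             for i in range(len(centers)) for j in range(i + 1, len(centers))]
    _, i, j = min(pairs)                                   # most similar pair
    merged = [(a + b) / 2.0 for a, b in zip(centers[i], centers[j])]
    rest = [list(c) for k, c in enumerate(centers) if k not in (i, j)]
    return rest + [merged]


def center_schedule(first_centers):
    """Yield the initial center sets for c = len(first_centers), ..., 2."""
    centers = [list(c) for c in first_centers]
    while len(centers) >= 2:
        yield centers
        centers = merge_closest_centers(centers)
```

Each yielded center set would seed one round of fuzzy clustering, so a first cluster number of 12 produces 11 rounds (c = 12 down to c = 2).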
Then, the values of the clustering validity function obtained in the successive fuzzy clustering analyses are compared, and the cluster number (that is, the number of classification categories) corresponding to the minimum value is taken as the optimal classification number of the clinical data. In this embodiment, the clustering validity function is determined by the degree of cohesion and the degree of separation of the clustering results for the different cluster numbers. Specifically, the cohesion and the separation are calculated for the clustering result of each cluster number. "Cohesion" is the degree of similarity of samples within a class, and "separation" is the degree of independence of samples between classes. The degree of cohesion is expressed as
Figure BDA0003183786470000121
The degree of separation is expressed as
Figure BDA0003183786470000122
where i, k = 1, 2, ..., c; c denotes the number of clusters; n(i) denotes the number of data items in the i-th class; d(x_j, y_i) denotes the Euclidean distance between sample x_j and sample y_i; and u_ij denotes the degree of membership of sample i in class j. The ratio of the degree of cohesion to the degree of separation is then used as the value of the clustering validity function. That is, the clustering validity function reflects the degree of separation between information and particles, and is defined as GD(c) = CD(c) / SD(c). The smaller the value of GD(c), the better the clustering result.
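The selection of the optimal classification number by the ratio GD(c) = CD(c) / SD(c) can be sketched as follows. The exact expressions for cohesion and separation are given only in the figures, which are not reproduced here, so the two helper functions use assumed stand-in forms: cohesion as the membership-weighted mean squared distance of samples to the cluster centers, and separation as the smallest squared distance between two cluster centers. All function names are hypothetical, and memberships[i][j] is taken to be the membership of sample i in class j.

```python
import math

def cohesion(samples, centers, memberships):
    # Assumed stand-in form: membership-weighted mean squared distance
    # of every sample to every cluster center.
    total = 0.0
    for x, u_row in zip(samples, memberships):
        for v, u in zip(centers, u_row):
            total += u * math.dist(x, v) ** 2
    return total / len(samples)

def separation(centers):
    # Assumed stand-in form: smallest squared distance between two centers.
    return min(
        math.dist(a, b) ** 2
        for idx, a in enumerate(centers)
        for b in centers[idx + 1:]
    )

def validity(samples, centers, memberships):
    # GD(c) = CD(c) / SD(c); smaller values indicate a better clustering.
    return cohesion(samples, centers, memberships) / separation(centers)

def best_cluster_count(gd_by_c):
    # Pick the cluster number whose validity value is smallest.
    return min(gd_by_c, key=gd_by_c.get)

samples = [(0.0, 0.0), (1.0, 0.0)]
centers = [(0.0, 0.0), (1.0, 0.0)]
crisp = validity(samples, centers, [[1.0, 0.0], [0.0, 1.0]])
fuzzy = validity(samples, centers, [[0.5, 0.5], [0.5, 0.5]])
# Illustrative validity values for three runs, not measured data.
chosen = best_cluster_count({2: 0.8, 3: 0.3, 4: 0.5})
```

Whatever the exact cohesion and separation formulas are, the selection step is the same: one validity value is stored per fuzzy clustering run, and the cluster number with the minimum value is kept.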
S130, inputting each piece of clinical data classified according to the optimal classification number, together with its classification label, into the initial diabetes data analysis model as a model training sample, and performing model training to obtain the target diabetes data analysis model.
According to the results of the multiple clustering analyses, the classification result at the finally determined optimal classification number can be used as the final classification result of the clinical data. By summarizing the characteristics of the different classes of clinical data, each class can be assigned a preset diabetes risk level, and each preset diabetes risk level is provided with a corresponding diabetes health management suggestion. Then, each piece of clinical data and its classification label can be used as a machine learning sample and input into a preset initial diabetes data analysis model for training. The preset initial diabetes data analysis model may be a neural network, a support vector machine (SVM), a random forest, or another structure. During training, the initial model learns the characteristics of the sample data and adjusts its parameters until it can classify the sample data with a certain accuracy, which yields the target diabetes data analysis model for analyzing and managing clinical diabetes data. Doctors and patients can then learn the diabetes risk level and the corresponding lifestyle and diet management approach under a unified scientific standard.
According to the technical scheme of this embodiment, clinical data related to diabetes, such as basic body information of a target object, clinical biochemical test result data and preset health questionnaire response data, are acquired and preprocessed into a form that a computer can recognize and operate on. Hierarchical clustering analysis is then performed on the preprocessed clinical data, and fuzzy clustering analysis is performed based on the result of the hierarchical clustering analysis, so that the optimal classification number of the clinical data is determined more quickly. Each piece of clinical data classified according to the optimal classification number, together with its classification label, is input into the initial diabetes data analysis model as a model training sample, and model training yields the target diabetes data analysis model. The embodiment of the invention solves the problem of non-uniform analysis standards for diabetes data, enables the analysis of a large amount of diabetes data, establishes a diabetes risk level classification system, and allows clinical diabetes data to be managed under unified analysis and management standards, so as to improve the quality of life of diabetic patients or to provide guidance on disease prevention.
Example two
Fig. 2 is a flowchart of a diabetes data management method according to a second embodiment of the present invention. This embodiment belongs to the same inventive concept as the diabetes data analysis model training method in the foregoing embodiment, and further describes the process of performing diabetes data analysis and management with a trained diabetes data analysis model. The method may be performed by a diabetes data management apparatus, which may be implemented in software and/or hardware and integrated in a computer device having application development functionality.
As shown in fig. 2, the diabetes data management method includes the steps of:
S210, acquiring diabetes clinical data, and preprocessing the diabetes clinical data.
Diabetes clinical data are the data that need to be considered when diabetes is evaluated clinically, such as basic body information of a target object, clinical biochemical test result data and preset health questionnaire response data. The target object may be a person with or without diabetes, and analyzing the data of target objects at different stages and states of diabetes allows further data analysis. When a person wants an assessment of his or her own diabetes condition, the relevant clinical data can be collected at a hospital. The collected data are input into a diabetes data management system, which preprocesses and analyzes them automatically. The preprocessing depends on the input format required by the data analysis model; for the specific preprocessing process, refer to the data preprocessing process in the first embodiment.
S220, inputting the preprocessed diabetes clinical data into a target diabetes data analysis model obtained by training with the diabetes data analysis model training method of any embodiment, to obtain a diabetes clinical data classification result.
The target diabetes data analysis model obtained by training with the diabetes data analysis model training method of any embodiment can analyze and classify clinical diabetes data under a unified analysis and management standard. Therefore, after the preprocessed diabetes clinical data are input into the target diabetes data analysis model, the classification result of the diabetes clinical data and the health management suggestion corresponding to that result can be determined. The suggestion covers diet, exercise, a body index monitoring scheme, education, psychology and the like.
S230, feeding back a preset diabetes risk level corresponding to the diabetes clinical data classification result, and a diabetes health management suggestion corresponding to the preset diabetes risk level, to the target object.
The content output by the model is fed back to a doctor or a target user such as a patient, so that the doctor can be assisted in clinical judgment, or the diabetic patient can be effectively guided in life, the quality of life is improved, and the development of the course of diabetes is controlled.
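As an illustration of step S230, the sketch below maps a classification label output by the model to a preset risk level and its management suggestion. The table contents, the three-level scale, and the names Feedback, RISK_TABLE and feedback_for are hypothetical; the text specifies only that each class corresponds to a preset risk level with an associated suggestion.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Feedback:
    risk_level: str
    advice: str

# Hypothetical table: class label -> preset risk level and management suggestion.
# The advice strings are placeholders, not medical content from the source.
RISK_TABLE = {
    0: Feedback("low", "placeholder suggestion for the low-risk class"),
    1: Feedback("medium", "placeholder suggestion for the medium-risk class"),
    2: Feedback("high", "placeholder suggestion for the high-risk class"),
}

def feedback_for(label: int) -> Feedback:
    """Look up the preset risk level and suggestion for a model output label."""
    if label not in RISK_TABLE:
        raise ValueError(f"no preset risk level for class label {label!r}")
    return RISK_TABLE[label]

result = feedback_for(2)
print(result.risk_level, "-", result.advice)
```

Keeping the table separate from the model means the suggestions can be revised by clinicians without retraining the classifier.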
According to the technical scheme of the embodiment, the obtained diabetes clinical data are input into the target diabetes data analysis model obtained through training by the method of the embodiment, the output result of the model is obtained, and the result is fed back to the user, so that a doctor can be assisted in clinical judgment, or the diabetes patient can be effectively given life guidance, the life quality is improved, and the development of the course of diabetes is controlled. The embodiment of the invention solves the problem of non-uniform analysis standards of the diabetes data, realizes the analysis of a large amount of diabetes data, establishes a diabetes risk grade classification system, and can manage clinical diabetes data under the uniform analysis and management standards so as to improve the life quality of diabetics or perform prevention guidance of diseases.
EXAMPLE III
Fig. 3 is a schematic structural diagram of a diabetes data analysis model training device according to a third embodiment of the present invention, which is applicable to a case where a diabetes data analysis model is constructed and trained based on a large amount of diabetes-related clinical data.
As shown in fig. 3, the diabetes data analysis model training apparatus includes: a data acquisition and preprocessing module 310, a data analysis module 320, and a model training module 330.
The data acquisition and preprocessing module 310 is configured to acquire clinical data related to diabetes and preprocess the clinical data, where the clinical data includes basic body information of a target subject, clinical biochemical test result data, and preset health questionnaire response data; the data analysis module 320 is used for performing hierarchical clustering analysis on the preprocessed clinical data and performing fuzzy clustering analysis based on the result of the hierarchical clustering analysis to determine the optimal classification quantity of the clinical data; and the model training module 330 is configured to input each clinical data classified according to the optimal classification number and the corresponding classification label as a model training sample into the initial diabetes data analysis model, and perform model training to obtain a target diabetes data analysis model.
According to the technical scheme of this embodiment, clinical data related to diabetes, such as basic body information of a target object, clinical biochemical test result data and preset health questionnaire response data, are acquired and preprocessed into a form that a computer can recognize and operate on. Hierarchical clustering analysis is then performed on the preprocessed clinical data, and fuzzy clustering analysis is performed based on the result of the hierarchical clustering analysis, so that the optimal classification number of the clinical data is determined more quickly. Each piece of clinical data classified according to the optimal classification number, together with its classification label, is input into the initial diabetes data analysis model as a model training sample, and model training yields the target diabetes data analysis model. The embodiment of the invention solves the problem of non-uniform analysis standards for diabetes data, enables the analysis of a large amount of diabetes data, establishes a diabetes risk level classification system, and allows clinical diabetes data to be managed under unified analysis and management standards, so as to improve the quality of life of diabetic patients or to provide guidance on disease prevention.
Optionally, the data analysis module 320 is specifically configured to:
taking each piece of preprocessed clinical data as a class, calculating the similarity between every two classes, and aggregating the two classes of which the similarity meets a preset condition into one class until a preset iterative clustering stop condition is met;
taking the number of clusters at the time hierarchical clustering stops as the first cluster number.
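The two hierarchical clustering steps above can be sketched as follows. This is a minimal illustration under stated assumptions: similarity between two classes is taken as the Euclidean distance between their centroids, the "preset condition" is taken to be "the closest pair", and the "preset iterative clustering stop condition" is taken to be a distance threshold stop_distance. None of these choices is fixed by the text, and the function names are hypothetical.

```python
import math
from itertools import combinations

def centroid(points):
    # Component-wise mean of a non-empty list of points.
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def hierarchical_first_count(samples, stop_distance):
    """Agglomerate samples and return (first cluster number, cluster centers)."""
    # Start with every sample as its own class.
    clusters = [[s] for s in samples]
    while len(clusters) > 1:
        cents = [centroid(c) for c in clusters]
        # Closest pair of classes, measured between centroids (assumption).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: math.dist(cents[p[0]], cents[p[1]]),
        )
        # Assumed stop condition: the closest pair is farther apart than the threshold.
        if math.dist(cents[i], cents[j]) > stop_distance:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # i < j, so index i is unaffected
    return len(clusters), [centroid(c) for c in clusters]

samples = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0)]
first_count, first_centers = hierarchical_first_count(samples, stop_distance=3.0)
```

The returned count and centers are what the subsequent fuzzy clustering would take as its initial cluster number and initial cluster centers.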
Optionally, the data analysis module is specifically configured to:
performing fuzzy clustering analysis on the preprocessed clinical data and calculating a clustering effectiveness function by taking the first clustering quantity and the corresponding clustering centers as the initial clustering quantity and the initial clustering centers of the first fuzzy clustering respectively;
sequentially decreasing the first cluster quantity, reducing a corresponding cluster center in a preset mode to serve as the initial cluster quantity and the initial cluster center of the next fuzzy cluster analysis, and repeating the process of carrying out fuzzy cluster analysis on the preprocessed clinical data and carrying out cluster validity function calculation until the initial cluster quantity of the fuzzy cluster analysis is less than or equal to two;
and comparing the values of the clustering validity function obtained in the successive fuzzy clustering analyses, and taking the cluster number corresponding to the minimum value of the clustering validity function as the optimal classification number of the clinical data.
Optionally, the data analysis module is further specifically configured to:
calculating the value of a clustering validity function aiming at each clustering analysis result, wherein the value comprises the following steps:
calculating the cohesion and the separation of the clustering results, wherein the cohesion is expressed as
Figure BDA0003183786470000171
Figure BDA0003183786470000172
The degree of separation is expressed as
Figure BDA0003183786470000173
where i, k = 1, 2, ..., c; c denotes the number of clusters; n(i) denotes the number of data items in the i-th class; d(x, y) denotes the Euclidean distance between sample x and sample y; and u_ij denotes the membership degree of sample i belonging to class j;
and taking the ratio of the cohesion degree and the separation degree as a numerical value of the clustering effectiveness function.
Optionally, the data analysis module is further configured to:
initializing parameters of fuzzy clustering analysis, wherein the parameters comprise an iteration stop threshold, a maximum iteration number and a membership matrix;
performing data clustering according to the corresponding initial clustering quantity and the initial clustering center, and updating the membership matrix and the classification center according to the clustering result;
when the difference value between the updated membership matrix and the initialized membership matrix is smaller than the iteration stop threshold or the iteration times exceeds the maximum iteration times, completing one-time fuzzy clustering analysis;
otherwise, clustering the preprocessed clinical data again.
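The fuzzy clustering loop described in the steps above can be sketched as follows. The update formulas of this embodiment are given in figures not reproduced here, so the sketch assumes the standard fuzzy c-means updates with a fuzzifier m = 2, starts from the given initial centers, and uses an all-zero matrix as the initial membership matrix. The function name fuzzy_c_means and its parameters are hypothetical.

```python
import math

def fuzzy_c_means(samples, centers, m=2.0, eps=1e-5, max_iter=100):
    """Run standard fuzzy c-means (assumed update rules) from given initial centers.

    Returns (membership matrix, final centers, number of iterations used).
    """
    centers = [tuple(c) for c in centers]
    n, c, dims = len(samples), len(centers), len(samples[0])
    prev = [[0.0] * c for _ in range(n)]  # initial membership matrix (assumption)
    for step in range(1, max_iter + 1):
        # Update memberships: u_ik is proportional to d_ik^(-2/(m-1)).
        u = []
        for x in samples:
            d = [math.dist(x, v) for v in centers]
            zeros = [dk == 0.0 for dk in d]
            if any(zeros):
                # Sample sits exactly on a center: give it full membership there.
                row = [1.0 / sum(zeros) if z else 0.0 for z in zeros]
            else:
                inv = [dk ** (-2.0 / (m - 1.0)) for dk in d]
                total = sum(inv)
                row = [w / total for w in inv]
            u.append(row)
        # Update centers as the u^m-weighted mean of the samples.
        new_centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            if total == 0.0:
                new_centers.append(centers[k])  # keep an empty cluster's center
                continue
            new_centers.append(tuple(
                sum(w[i] * samples[i][dim] for i in range(n)) / total
                for dim in range(dims)
            ))
        centers = new_centers
        # Stop when the largest membership change falls below the threshold.
        change = max(abs(u[i][k] - prev[i][k]) for i in range(n) for k in range(c))
        prev = u
        if change < eps:
            break
    return prev, centers, step

samples = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
u, final_centers, steps = fuzzy_c_means(samples, [(0.0, 0.0), (10.0, 10.0)])
```

The loop ends either when the membership change drops below eps or when max_iter is reached, which corresponds to the two stop conditions listed above.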
Optionally, the diabetes data analysis model training device further includes a data space mapping module, configured to map the preprocessed clinical data to a preset high-dimensional data space before performing hierarchical clustering analysis on the preprocessed clinical data.
Optionally, each classification of the clinical data corresponds to a preset diabetes risk level, and each preset diabetes risk level is provided with a corresponding diabetes health management recommendation.
The diabetes data analysis model training device provided by the embodiment of the invention can execute the diabetes data analysis model training method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example four
Fig. 4 is a schematic structural diagram of a diabetes data management apparatus according to a fourth embodiment of the present invention, which is applicable to a case of performing diabetes data analysis management through a trained diabetes data analysis model, and the apparatus may be implemented in a software and/or hardware manner and integrated into a computer device having an application development function.
As shown in fig. 4, the diabetes data management apparatus includes: a data acquisition module 410, a data analysis module 420, and a data feedback module 430.
The data acquisition module 410 is configured to acquire clinical diabetes data and preprocess the clinical diabetes data, where the clinical diabetes data includes basic body information of a target subject, clinical biochemical test result data, and preset health questionnaire response data; a data analysis module 420, configured to input the preprocessed diabetes clinical data into a target diabetes data analysis model obtained by training in a diabetes data analysis model training method according to any embodiment, so as to obtain a classification result of the diabetes clinical data; the data feedback module 430 is configured to feed back a preset diabetes risk level corresponding to the classification result of the clinical diabetes data and a diabetes health management recommendation corresponding to the preset diabetes risk level to the target object.
According to the technical scheme of the embodiment, the obtained diabetes clinical data are input into the target diabetes data analysis model obtained through training by the method of the embodiment, the output result of the model is obtained, and the result is fed back to the user, so that a doctor can be assisted in clinical judgment, or the diabetes patient can be effectively given life guidance, the life quality is improved, and the development of the course of diabetes is controlled. The embodiment of the invention solves the problem of non-uniform analysis standards of the diabetes data, realizes the analysis of a large amount of diabetes data, establishes a diabetes risk grade classification system, and can manage clinical diabetes data under the uniform analysis and management standards so as to improve the life quality of diabetics or perform prevention guidance of diseases.
The diabetes data management device provided by the embodiment of the invention can execute the diabetes data management method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
EXAMPLE five
Fig. 5 is a schematic structural diagram of a computer device according to a fifth embodiment of the present invention. FIG. 5 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in FIG. 5 is only an example and should not bring any limitations to the functionality or scope of use of embodiments of the present invention. The computer device 12 may be any terminal device with computing capability, such as a terminal device of an intelligent controller, a server, a mobile phone, and the like.
As shown in FIG. 5, computer device 12 is in the form of a general purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, and commonly referred to as a "hard drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, computer device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown, network adapter 20 communicates with the other modules of computer device 12 via bus 18. It should be appreciated that although not shown in FIG. 5, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes programs stored in the system memory 28 to execute various functional applications and data processing, such as implementing a diabetes data analysis model training or diabetes data management method provided by the present embodiment.
The training method of the diabetes data analysis model comprises the following steps:
acquiring clinical data related to diabetes, and preprocessing the clinical data, wherein the clinical data comprises target object body basic information, clinical biochemical test result data and preset health questionnaire answering data;
performing hierarchical clustering analysis on the preprocessed clinical data, and performing fuzzy clustering analysis based on the hierarchical clustering analysis result to determine the optimal classification quantity of the clinical data;
and inputting each clinical data classified according to the optimal classification quantity and the corresponding classification label as a model training sample into the initial diabetes data analysis model for model training to obtain the target diabetes data analysis model.
A diabetes data management method, comprising:
acquiring diabetes clinical data and preprocessing the diabetes data, wherein the clinical data comprise basic body information of a target object, clinical biochemical test result data and preset health questionnaire response data;
inputting the preprocessed diabetes clinical data into a target diabetes data analysis model obtained by training of the diabetes data analysis model training method of any embodiment to obtain a diabetes clinical data classification result;
and feeding back a preset diabetes risk grade corresponding to the diabetes clinical data classification result and a diabetes health management suggestion corresponding to the preset diabetes risk grade to the target object.
EXAMPLE six
The sixth embodiment provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a method for training a diabetes data analysis model or managing diabetes data, as provided by any of the embodiments of the present invention.
The training method of the diabetes data analysis model comprises the following steps:
acquiring clinical data related to diabetes, and preprocessing the clinical data, wherein the clinical data comprises target object body basic information, clinical biochemical test result data and preset health questionnaire answering data;
performing hierarchical clustering analysis on the preprocessed clinical data, and performing fuzzy clustering analysis based on the hierarchical clustering analysis result to determine the optimal classification quantity of the clinical data;
and inputting each clinical data classified according to the optimal classification quantity and the corresponding classification label as a model training sample into the initial diabetes data analysis model for model training to obtain the target diabetes data analysis model.
A diabetes data management method, comprising:
acquiring diabetes clinical data and preprocessing the diabetes data, wherein the clinical data comprise basic body information of a target object, clinical biochemical test result data and preset health questionnaire response data;
inputting the preprocessed diabetes clinical data into a target diabetes data analysis model obtained by training of the diabetes data analysis model training method of any embodiment to obtain a diabetes clinical data classification result;
and feeding back a preset diabetes risk grade corresponding to the diabetes clinical data classification result and a diabetes health management suggestion corresponding to the preset diabetes risk grade to the target object.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer-readable storage medium may be, for example but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It will be understood by those skilled in the art that the modules or steps of the invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of computing devices, and optionally they may be implemented by program code executable by a computing device, such that it may be stored in a memory device and executed by a computing device, or it may be separately fabricated into various integrated circuit modules, or it may be fabricated by fabricating a plurality of modules or steps thereof into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (12)

1. A method for training a diabetes data analysis model, the method comprising:
acquiring clinical data related to diabetes, and preprocessing the clinical data, wherein the clinical data comprises target object body basic information, clinical biochemical test result data and preset health questionnaire answering data;
performing hierarchical clustering analysis on the preprocessed clinical data, and performing fuzzy clustering analysis based on the hierarchical clustering analysis result to determine the optimal classification quantity of the clinical data;
and inputting each clinical data classified according to the optimal classification quantity and the corresponding classification label as a model training sample into the initial diabetes data analysis model for model training to obtain the target diabetes data analysis model.
2. The method of claim 1, wherein performing hierarchical clustering analysis on the preprocessed clinical data comprises:
taking each piece of preprocessed clinical data as a class, calculating the similarity between every two classes, and aggregating the two classes of which the similarity meets a preset condition into one class until a preset iterative clustering stop condition is met;
taking the number of clusters at the time hierarchical clustering stops as the first cluster number.
3. The method of claim 2, wherein performing fuzzy clustering analysis based on results of hierarchical clustering analysis to determine an optimal number of classifications for the clinical data comprises:
performing fuzzy clustering analysis on the preprocessed clinical data and calculating a cluster validity function, with the first cluster number and the corresponding cluster centers taken as the initial cluster number and the initial cluster centers, respectively, of the first fuzzy clustering analysis;
successively decrementing the first cluster number and reducing the cluster centers correspondingly in a preset manner, to serve as the initial cluster number and the initial cluster centers of the next fuzzy clustering analysis, and repeating the fuzzy clustering analysis of the preprocessed clinical data and the calculation of the cluster validity function until the initial cluster number of the fuzzy clustering analysis is less than or equal to two;
and comparing the values of the cluster validity function across all rounds of fuzzy clustering analysis, and taking the cluster number corresponding to the minimum value as the optimal number of classifications of the clinical data.
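By way of non-limiting illustration only, the descending search of claim 3 might be sketched as follows. The fuzzy clustering routine and the validity function are passed in as callables (`fuzzy_cluster`, `validity`), both hypothetical. The "preset manner" of reducing the centers is assumed here to be merging the two closest centers into their midpoint, the fitted centers of one round are assumed to seed the next, and the last round is assumed to run with two clusters; the claim fixes none of these details.

```python
import math

def merge_closest_centers(centers):
    """Assumed reduction rule: replace the two closest centers by their midpoint."""
    pairs = [
        (math.dist(centers[i], centers[j]), i, j)
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    ]
    _, i, j = min(pairs)
    midpoint = tuple((a + b) / 2 for a, b in zip(centers[i], centers[j]))
    return [c for k, c in enumerate(centers) if k not in (i, j)] + [midpoint]

def search_optimal_cluster_count(data, first_centers, fuzzy_cluster, validity):
    """Run fuzzy clustering for decreasing cluster counts; keep the lowest score."""
    centers = list(first_centers)
    scores = {}
    while True:
        memberships, fitted = fuzzy_cluster(data, centers)
        scores[len(centers)] = validity(data, fitted, memberships)
        if len(centers) <= 2:
            break
        centers = merge_closest_centers(fitted)
    best = min(scores, key=scores.get)
    return best, scores

# Stub callables, used only to exercise the control flow.
stub_cluster = lambda data, centers: (None, centers)
stub_validity = lambda data, centers, memberships: abs(len(centers) - 3)
start = [(0, 0), (1, 0), (5, 5), (9, 9), (20, 20)]
best, scores = search_optimal_cluster_count([], start, stub_cluster, stub_validity)
```

With the stubs above, the search evaluates five, four, three and two clusters and selects the count with the smallest validity value.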
4. The method of claim 3, wherein calculating a value of a cluster validity function for each cluster analysis result comprises:
calculating the cohesion and the separation of the clustering results, wherein the cohesion is expressed as
Figure FDA0003183786460000021
Figure FDA0003183786460000022
and the separation is expressed as
Figure FDA0003183786460000023
where c represents the number of clusters, n(i) represents the number of data items in the i-th class, d(x, y) represents the Euclidean distance between sample x and sample y, and u_ij represents the degree of membership of sample i in class j;
and taking the ratio of the cohesion to the separation as the value of the cluster validity function.
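By way of non-limiting illustration only, a validity function of the "cohesion divided by separation" form can be sketched as follows. The formulas of claim 4 survive only as image references in this text, so the sketch does not reproduce them; it substitutes the well-known Xie-Beni index, which has the same ratio structure and for which smaller values likewise indicate better clustering. The fuzzifier `m` and the function name are assumptions.

```python
import math

def xie_beni(data, centers, memberships, m=2.0):
    """Xie-Beni index (stand-in, not the claimed formula).

    memberships[i][j] is the degree to which sample i belongs to cluster j.
    """
    n = len(data)
    # Cohesion: membership-weighted mean squared distance to the centers.
    cohesion = sum(
        (memberships[i][j] ** m) * math.dist(data[i], centers[j]) ** 2
        for i in range(n)
        for j in range(len(centers))
    ) / n
    # Separation: smallest squared distance between any two centers.
    separation = min(
        math.dist(centers[a], centers[b]) ** 2
        for a in range(len(centers))
        for b in range(a + 1, len(centers))
    )
    return cohesion / separation

data = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = xie_beni(data, [(0, 1), (10, 1)], [[1, 0], [1, 0], [0, 1], [0, 1]])
bad = xie_beni(data, [(5, 0), (5, 2)], [[1, 0], [0, 1], [1, 0], [0, 1]])
```

In the toy data, the partition that groups nearby samples scores lower than the partition that splits them, which is the behaviour the selection of the minimum value relies on.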
5. The method of claim 3, wherein performing fuzzy clustering analysis on the preprocessed clinical data comprises:
initializing parameters of the fuzzy clustering analysis, wherein the parameters comprise an iteration stop threshold, a maximum number of iterations and a membership matrix;
performing data clustering according to the corresponding initial cluster number and initial cluster centers, and updating the membership matrix and the cluster centers according to the clustering result;
completing one round of fuzzy clustering analysis when the difference between the updated membership matrix and the initialized membership matrix is smaller than the iteration stop threshold, or when the number of iterations exceeds the maximum number of iterations;
and otherwise clustering the preprocessed clinical data again.
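By way of non-limiting illustration only, the iteration of claim 5 might be sketched with the standard fuzzy c-means update rules. The fuzzifier `m`, the uniform initial membership matrix, and the reading of the stop test as a comparison against the membership matrix of the previous iteration are assumptions; the claim names only the threshold, the iteration limit and the membership matrix.

```python
import math

def fuzzy_c_means(data, init_centers, m=2.0, stop_threshold=1e-5, max_iter=100):
    """Standard fuzzy c-means; returns (membership matrix, cluster centers)."""
    centers = [tuple(v) for v in init_centers]
    c, n, dim = len(centers), len(data), len(data[0])
    prev = [[1.0 / c] * c for _ in range(n)]  # assumed initial memberships
    u = prev
    for _ in range(max_iter):
        # Update memberships from the current centers.
        u = []
        for x in data:
            dists = [math.dist(x, v) for v in centers]
            if 0.0 in dists:  # sample coincides with at least one center
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
            else:
                row = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(row)
            u.append([w / total for w in row])
        # Update centers as membership-weighted means.
        new_centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            if total > 0:
                new_centers.append(tuple(
                    sum(w * data[i][k] for i, w in enumerate(weights)) / total
                    for k in range(dim)
                ))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        centers = new_centers
        # Stop when the largest membership change falls below the threshold.
        change = max(abs(u[i][j] - prev[i][j]) for i in range(n) for j in range(c))
        prev = u
        if change < stop_threshold:
            break
    return u, centers

data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
u, centers = fuzzy_c_means(data, init_centers=[(0, 0), (10, 10)])
```

Each row of the returned matrix sums to one, so a sample near one group still carries a small membership in the other; a hard label can be taken as the column with the largest membership.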
6. The method of claim 1, wherein prior to performing hierarchical cluster analysis on the preprocessed clinical data, the method further comprises:
mapping the preprocessed clinical data to a pre-defined high-dimensional data space.
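By way of non-limiting illustration only, one possible mapping into a higher-dimensional space is an explicit polynomial feature expansion, sketched below. Claim 6 does not say which space or mapping is predefined; the degree-2 polynomial map and the name `polynomial_map` are assumptions, and a kernel-based mapping would be an equally plausible reading.

```python
from itertools import combinations_with_replacement

def polynomial_map(x, degree=2):
    """Append all monomials of order 2..degree to the original features."""
    features = [float(v) for v in x]
    for order in range(2, degree + 1):
        for idx in combinations_with_replacement(range(len(x)), order):
            product = 1.0
            for i in idx:
                product *= x[i]
            features.append(product)
    return tuple(features)

mapped = polynomial_map((2.0, 3.0))
```

For a two-feature sample the degree-2 map yields five features: the two originals, their squares and their cross term.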
7. The method of any one of claims 1-6, wherein each classification of the clinical data corresponds to a preset diabetes risk level, and each preset diabetes risk level is provided with a corresponding diabetes health management recommendation.
8. A method of diabetes data management, the method comprising:
acquiring diabetes clinical data and preprocessing the diabetes clinical data, wherein the clinical data comprises basic physical information of a target object, clinical biochemical test result data, and response data for a preset health questionnaire;
inputting the preprocessed diabetes clinical data into a target diabetes data analysis model trained by the method of any one of claims 1-7, to obtain a classification result for the diabetes clinical data;
and feeding back to the target object a preset diabetes risk level corresponding to the classification result, together with a diabetes health management recommendation corresponding to the preset diabetes risk level.
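By way of non-limiting illustration only, the feedback step of claim 8 might be sketched as a lookup from classification result to risk level and recommendation. The three class labels, the level names and the recommendation strings in `RISK_TABLE` are placeholders invented for the example; the claims state only that such a correspondence is preset.

```python
# Hypothetical table: classification label -> (risk level, recommendation).
RISK_TABLE = {
    0: ("low", "placeholder recommendation for the low risk level"),
    1: ("medium", "placeholder recommendation for the medium risk level"),
    2: ("high", "placeholder recommendation for the high risk level"),
}

def build_feedback(class_label, risk_table=RISK_TABLE):
    """Translate a model classification result into the fed-back message."""
    if class_label not in risk_table:
        raise KeyError(f"no preset risk level for class {class_label!r}")
    level, recommendation = risk_table[class_label]
    return {"risk_level": level, "recommendation": recommendation}

feedback = build_feedback(2)
```

Keeping the table separate from the model allows the risk levels and recommendations to be revised without retraining.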
9. A diabetes data analysis model training apparatus, characterized in that the apparatus comprises:
a data acquisition and preprocessing module, configured to acquire clinical data related to diabetes and preprocess the clinical data, wherein the clinical data comprises basic physical information of a target object, clinical biochemical test result data, and response data for a preset health questionnaire;
a data analysis module, configured to perform hierarchical clustering analysis on the preprocessed clinical data, and to perform fuzzy clustering analysis based on the result of the hierarchical clustering analysis so as to determine the optimal number of classifications of the clinical data;
and a model training module, configured to input each item of clinical data classified according to the optimal number of classifications, together with its corresponding classification label, as a model training sample into an initial diabetes data analysis model for model training, to obtain a target diabetes data analysis model.
10. A diabetes data management apparatus, characterized in that the apparatus comprises:
a data acquisition module, configured to acquire diabetes clinical data and preprocess the diabetes clinical data, wherein the clinical data comprises basic physical information of a target object, clinical biochemical test result data, and response data for a preset health questionnaire;
a data analysis module, configured to input the preprocessed diabetes clinical data into a target diabetes data analysis model trained by the method according to any one of claims 1 to 7, so as to obtain a classification result for the diabetes clinical data;
and a data feedback module, configured to feed back to the target object a preset diabetes risk level corresponding to the classification result, together with a diabetes health management recommendation corresponding to the preset diabetes risk level.
11. A computer device, characterized in that the computer device comprises:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the diabetes data analysis model training method or the diabetes data management method of any one of claims 1-8.
12. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the diabetes data analysis model training method or the diabetes data management method according to any one of claims 1 to 8.
CN202110854902.1A 2021-07-28 2021-07-28 Diabetes data analysis model training and data management method, device and equipment Pending CN113488166A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110854902.1A CN113488166A (en) 2021-07-28 2021-07-28 Diabetes data analysis model training and data management method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110854902.1A CN113488166A (en) 2021-07-28 2021-07-28 Diabetes data analysis model training and data management method, device and equipment

Publications (1)

Publication Number Publication Date
CN113488166A true CN113488166A (en) 2021-10-08

Family

ID=77943201

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110854902.1A Pending CN113488166A (en) 2021-07-28 2021-07-28 Diabetes data analysis model training and data management method, device and equipment

Country Status (1)

Country Link
CN (1) CN113488166A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115954107A (en) * 2022-12-20 2023-04-11 首都医科大学附属北京佑安医院 Method and device for analyzing clinical examination data of primary biliary cholangitis
CN117542460A (en) * 2024-01-09 2024-02-09 江苏尤里卡生物科技有限公司 Adaptive parameter optimization method and system for urokinase separation
CN117672445A (en) * 2023-12-18 2024-03-08 郑州大学 Diabetes mellitus debilitation current situation analysis method and system based on big data

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915560A (en) * 2015-06-11 2015-09-16 万达信息股份有限公司 Method for disease diagnosis and treatment scheme based on generalized neural network clustering
CN107403072A (en) * 2017-08-07 2017-11-28 北京工业大学 A kind of diabetes B prediction and warning method based on machine learning
CN107545133A (en) * 2017-07-20 2018-01-05 陆维嘉 A kind of Gaussian Blur cluster calculation method for antidiastole chronic bronchitis
CN110197724A (en) * 2019-03-12 2019-09-03 平安科技(深圳)有限公司 Predict the method, apparatus and computer equipment in diabetes illness stage
WO2020181805A1 (en) * 2019-03-12 2020-09-17 平安科技(深圳)有限公司 Diabetes prediction method and apparatus, storage medium, and computer device
CN112802568A (en) * 2021-02-03 2021-05-14 紫东信息科技(苏州)有限公司 Multi-label stomach disease classification method and device based on medical history text
CN112802606A (en) * 2021-01-28 2021-05-14 联仁健康医疗大数据科技股份有限公司 Data screening model establishing method, data screening device, data screening equipment and data screening medium
CN113012806A (en) * 2021-02-20 2021-06-22 西安交通大学医学院第二附属医院 Early prediction method for gestational diabetes
CN113128536A (en) * 2019-12-31 2021-07-16 奇安信科技集团股份有限公司 Unsupervised learning method, system, computer device and readable storage medium

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915560A (en) * 2015-06-11 2015-09-16 万达信息股份有限公司 Method for disease diagnosis and treatment scheme based on generalized neural network clustering
CN107545133A (en) * 2017-07-20 2018-01-05 陆维嘉 A kind of Gaussian Blur cluster calculation method for antidiastole chronic bronchitis
CN107403072A (en) * 2017-08-07 2017-11-28 北京工业大学 A kind of diabetes B prediction and warning method based on machine learning
CN110197724A (en) * 2019-03-12 2019-09-03 平安科技(深圳)有限公司 Predict the method, apparatus and computer equipment in diabetes illness stage
WO2020181805A1 (en) * 2019-03-12 2020-09-17 平安科技(深圳)有限公司 Diabetes prediction method and apparatus, storage medium, and computer device
CN113128536A (en) * 2019-12-31 2021-07-16 奇安信科技集团股份有限公司 Unsupervised learning method, system, computer device and readable storage medium
CN112802606A (en) * 2021-01-28 2021-05-14 联仁健康医疗大数据科技股份有限公司 Data screening model establishing method, data screening device, data screening equipment and data screening medium
CN112802568A (en) * 2021-02-03 2021-05-14 紫东信息科技(苏州)有限公司 Multi-label stomach disease classification method and device based on medical history text
CN113012806A (en) * 2021-02-20 2021-06-22 西安交通大学医学院第二附属医院 Early prediction method for gestational diabetes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Penglin et al. (eds.): "Principles and Technology of Spatio-Temporal Databases" (《时空数据库原理与技术》), vol. 2019, Wuhan: Wuhan University Press, pages: 297 - 298 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115954107A (en) * 2022-12-20 2023-04-11 首都医科大学附属北京佑安医院 Method and device for analyzing clinical examination data of primary biliary cholangitis
CN115954107B (en) * 2022-12-20 2024-01-26 首都医科大学附属北京佑安医院 Method and device for analyzing clinical test data of primary cholangitis
CN117672445A (en) * 2023-12-18 2024-03-08 郑州大学 Diabetes mellitus debilitation current situation analysis method and system based on big data
CN117542460A (en) * 2024-01-09 2024-02-09 江苏尤里卡生物科技有限公司 Adaptive parameter optimization method and system for urokinase separation
CN117542460B (en) * 2024-01-09 2024-03-22 江苏尤里卡生物科技有限公司 Adaptive parameter optimization method and system for urokinase separation

Similar Documents

Publication Publication Date Title
Islam et al. Likelihood prediction of diabetes at early stage using data mining techniques
Dorado-Díaz et al. Applications of artificial intelligence in cardiology. The future is already here
US20210342212A1 (en) Method and system for identifying root causes
US10553319B1 (en) Artificial intelligence systems and methods for vibrant constitutional guidance
CN113488166A (en) Diabetes data analysis model training and data management method, device and equipment
US20150324527A1 (en) Learning health systems and methods
US11468363B2 (en) Methods and systems for classification to prognostic labels using expert inputs
CN112541056B (en) Medical term standardization method, device, electronic equipment and storage medium
CN114026651A (en) Automatic generation of structured patient data records
US20220084633A1 (en) Systems and methods for automatically identifying a candidate patient for enrollment in a clinical trial
US11157822B2 (en) Methods and systems for classification using expert data
WO2021114635A1 (en) Patient grouping model constructing method, patient grouping method, and related device
Singh et al. Data mining classifier for predicting diabetics
US11694814B1 (en) Determining patient condition from unstructured text data
CN111553478A (en) Community old people cardiovascular disease prediction system and method based on big data
Davazdahemami et al. A deep learning approach for predicting early bounce-backs to the emergency departments
CN116864139A (en) Disease risk assessment method, device, computer equipment and readable storage medium
Bayramli et al. Predictive structured–unstructured interactions in EHR models: A case study of suicide prediction
JP2021536636A (en) How to classify medical records
D’Amario et al. GENERATOR HEART FAILURE DataMart: An integrated framework for heart failure research
Teng et al. Few-shot ICD coding with knowledge transfer and evidence representation
US20200321112A1 (en) Systems and methods for generating alimentary instruction sets based on vibrant constitutional guidance
US11749397B2 (en) Inferring semantic data organization from machine-learned relationships
CN111863283A (en) Medical health big data platform
Hassan et al. Efficient prediction of coronary artery disease using machine learning algorithms with feature selection techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination