CN114974585A - Construction method of early risk prediction and evaluation model of metabolic syndrome in gestational period - Google Patents


Info

Publication number
CN114974585A
Authority
CN
China
Prior art keywords
model
risk
prediction
gms
early
Prior art date
Legal status
Pending
Application number
CN202210593499.6A
Other languages
Chinese (zh)
Inventor
胡文胜
卢莎
江泓
马聿嘉
Current Assignee
Hangzhou Obstetrics & Gynecology Hospital
Original Assignee
Hangzhou Obstetrics & Gynecology Hospital
Priority date
Filing date
Publication date
Application filed by Hangzhou Obstetrics & Gynecology Hospital
Priority to CN202210593499.6A
Publication of CN114974585A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Public Health (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a method for constructing an early risk prediction and evaluation model for gestational metabolic syndrome (GMS). The method comprises the following steps: (1) obtaining multi-source heterogeneous data and preprocessing it to obtain metabolism-related data; (2) screening for adverse pregnancy outcomes highly associated with GMS; (3) building a prediction model that combines extreme gradient boosting (XGBoost) with a Stacking framework, using the adverse pregnancy outcomes determined in step (2) as prediction labels; (4) calculating the feature importance of each modeling factor in the prediction model based on Shapley values; and (5) building a risk stratification model with a clustering algorithm, based on the feature importance of the modeling factors from step (4), to obtain GMS risk levels. The method enables early clinical prediction of GMS and early discovery of GMS-related predictors. This helps keep the risk from extending into the neonatal period and thereby reduces the offspring's risk of long-term metabolic disease, which is of considerable significance for preventing and reducing GMS.

Description

Construction method of early risk prediction and evaluation model of metabolic syndrome in gestational period
Technical Field
The invention belongs to the technical field of disease risk assessment and relates to a method for constructing an early risk prediction and evaluation model for gestational metabolic syndrome.
Background
Metabolic syndrome (MS) is a pathological state of disordered metabolism of proteins, fats, carbohydrates and other substances in the body. It comprises a cluster of co-occurring metabolic abnormalities, such as insulin resistance, obesity, elevated blood pressure, abnormal glucose metabolism and abnormal lipid metabolism, which are risk factors for cardiovascular and cerebrovascular disease and for metabolic diseases caused by diabetes. Gestational metabolic syndrome (GMS) is a clustering of multiple metabolic abnormalities specific to pregnancy. With economic development and rising living standards, the incidence of metabolic diseases among pregnant and postpartum women worldwide has reached 5-10%. In 2020 China had about 334 million women of childbearing age, of whom an estimated 75.15 million had MS; meanwhile, about 16 million women of childbearing age become pregnant in China every year, giving an estimated 3.6 million cases of MS during pregnancy annually. GMS therefore imposes a heavy disease and economic burden and has become a global public health problem that cannot be ignored.
Gestational metabolic syndrome not only directly affects the outcome of the current pregnancy but may also affect the long-term health of mothers and their offspring, whose probability of developing metabolic syndrome and cardiovascular disease in adulthood is markedly increased. According to the DOHaD (Developmental Origins of Health and Disease) theory, whether GMS is addressed early directly affects the mother's pregnancy outcome and the long-term health of the offspring. Early GMS risk assessment and scientific health management can break the vicious cycle of metabolic abnormality between mother and child, keep the risk from extending into the neonatal period and further reduce the offspring's risk of long-term metabolic disease. They are therefore important for improving population health.
Clinical research on GMS falls broadly into two categories. The first is research on GMS risk factors. Niu et al. showed that the more metabolic risk factors cluster together (overweight/obesity, gestational hypertriglyceridemia, low high-density lipoprotein cholesterol, hyperglycemia and hypertension), the higher the risk of adverse pregnancy outcomes such as preterm birth, small/large for gestational age infants, eclampsia, gestational diabetes, neonatal asphyxia and fetal death. The existing GMS diagnostic criteria are based on the metabolic syndrome criteria issued in 2004 by the Chinese Diabetes Society of the Chinese Medical Association and on the 2009 findings of Wiznitzer et al. in the United States. They comprise: ① pre-pregnancy BMI ≥ 25 kg/m²; ② blood pressure ≥ 140/90 mmHg; ③ elevated blood glucose, diagnosed as gestational diabetes; and ④ triglycerides (TG) ≥ 3.23 mmol/L. The second category is research on GMS prediction models. For example, Nitzan et al. built a gestational diabetes prediction model from an Israeli national database and found that pre-pregnancy blood glucose measurement can effectively screen high-risk groups; Jong et al. built an early prediction model for late-onset preeclampsia (PE) from mid-pregnancy laboratory tests and found early blood pressure, creatinine level and similar variables to be important predictors; and Tao et al. used pregnant women's continuous weight change during pregnancy to predict fetal weight more accurately and to screen for large/small for gestational age infants.
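The four criteria above can be read as a count of how many GMS indicators a woman meets. This is also the 0 to 4 risk count that later motivates five clusters. The following Python sketch is illustrative only. The record type and field names are hypothetical, and treating "≥ 140/90 mmHg" as systolic ≥ 140 or diastolic ≥ 90 is an assumption that the text does not state.

```python
from dataclasses import dataclass

# Thresholds taken from the four GMS criteria listed in the text.
BMI_CUTOFF = 25.0    # pre-pregnancy BMI, kg/m^2
SBP_CUTOFF = 140.0   # systolic blood pressure, mmHg
DBP_CUTOFF = 90.0    # diastolic blood pressure, mmHg
TG_CUTOFF = 3.23     # triglycerides, mmol/L


@dataclass
class GmsRecord:
    """Hypothetical per-woman record holding the inputs of the GMS criteria."""
    pre_pregnancy_bmi: float
    systolic_bp: float
    diastolic_bp: float
    has_gdm: bool
    triglycerides: float


def count_gms_indicators(r: GmsRecord) -> int:
    """Return how many of the four GMS criteria are met (0 to 4)."""
    met = [
        r.pre_pregnancy_bmi >= BMI_CUTOFF,
        # Assumption: ">= 140/90 mmHg" means systolic >= 140 OR diastolic >= 90.
        r.systolic_bp >= SBP_CUTOFF or r.diastolic_bp >= DBP_CUTOFF,
        r.has_gdm,
        r.triglycerides >= TG_CUTOFF,
    ]
    return sum(met)  # True counts as 1, False as 0


print(count_gms_indicators(GmsRecord(26.1, 132.0, 92.0, False, 3.5)))  # prints 3
```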
Previous studies have focused on the association between multiple metabolic risk factors and GMS, or on prediction models for metabolic risk factors and a single metabolic disorder of pregnancy. Moreover, most related studies follow predefined protocols, selected populations and presumed risk factors. As a result, they cannot capture causal factors comprehensively, completely and effectively, which limits their usefulness. The GMS diagnostic criteria also lack a severity grading, and there is no effective model for early risk prediction and assessment. GMS is therefore often not diagnosed before delivery, clinical attention and intervention are insufficient, and adverse pregnancy outcomes follow.
An effective risk prediction and evaluation model is therefore needed, with a suitable machine learning model selected and improved for real clinical scenarios. Machine learning algorithms built on big data often lack interpretability, so a key problem in this field is developing a coherent, meaningful interpretable framework that combines the learning model with medical interpretability. High-relevance risk factors should then be identified from the resulting risk assessment model, and a suitable high-risk grading standard should be explored. Such a model helps doctors assess the degree of risk from clustered metabolic abnormalities quickly and comprehensively. It also provides accurate clinical decision support for early warning, early diagnosis, early intervention and early prevention of GMS.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a method for constructing a model that predicts and evaluates the early risk of metabolic syndrome during pregnancy.
The technical scheme adopted by the invention is as follows:
A method for constructing a model for predicting and evaluating the early risk of metabolic syndrome during pregnancy comprises the following steps:
(1) obtaining multi-source heterogeneous data and preprocessing the data to obtain metabolism-related data;
(2) screening for adverse pregnancy outcomes highly associated with GMS;
(3) building a prediction model by combining extreme gradient boosting (XGBoost) with a Stacking framework, using the adverse pregnancy outcomes determined in step (2) as prediction labels;
(4) calculating the feature importance of each modeling factor in the prediction model based on Shapley values;
(5) building a risk stratification model with a clustering algorithm, based on the feature importance of the modeling factors from step (4), to obtain GMS risk grades.
In the above technical solution, the multi-source heterogeneous data further include outpatient medical records, laboratory tests, ultrasound imaging examinations and inpatient progress notes, and the preprocessing includes outlier rejection, missing-value imputation and normalization.
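The text names the preprocessing steps but not the methods used for them. As a minimal sketch, this example assumes an interquartile-range rule for outliers, column-median imputation and z-score normalization. All three are illustrative choices, and the function name is hypothetical.

```python
import numpy as np


def preprocess(X: np.ndarray, iqr_factor: float = 1.5) -> np.ndarray:
    """Hypothetical pipeline: IQR outlier rejection, median imputation, z-scores.

    X is an (n_samples, n_features) array; NaN marks a missing value.
    """
    X = np.asarray(X, dtype=float).copy()
    # 1) Outlier rejection: values outside [Q1 - k*IQR, Q3 + k*IQR] become missing.
    q1 = np.nanpercentile(X, 25, axis=0)
    q3 = np.nanpercentile(X, 75, axis=0)
    iqr = q3 - q1
    low, high = q1 - iqr_factor * iqr, q3 + iqr_factor * iqr
    X[(X < low) | (X > high)] = np.nan
    # 2) Missing-value filling: replace each NaN with its column median.
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    # 3) Normalization: z-score each column, guarding against zero variance.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return (X - X.mean(axis=0)) / std
```

In a real model the quartiles, medians and scaling statistics would be estimated on the training split only and then reused on the test split, to avoid information leakage.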
Further, the metabolism-related data include: hemoglobin, hematocrit, platelets, neutrophils, lymphocytes, eosinophils, ferritin, partial thromboplastin time, prothrombin time, fibrinogen, D-dimer, glucose, triglycerides, total cholesterol, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, APOA1, APOB, homocysteine, uric acid, alanine aminotransferase, aspartate aminotransferase, total protein, albumin, total bilirubin, direct bilirubin, creatinine, lactate dehydrogenase, blood amylase, total bile acids, glycocholic acid, free triiodothyronine, free thyroxine, thyroid-stimulating hormone, total triiodothyronine, total thyroxine, thyroglobulin antibodies and anti-thyroid peroxidase antibodies.
Further, the adverse pregnancy outcomes screened in step (2) are:
gestational hypertension (HDP): new-onset hypertension after 20 weeks of gestation, with BP ≥ 140/90 mmHg;
gestational diabetes mellitus (GDM): diagnosed according to the IADPSG criteria in women undergoing an OGTT at 24 to 28 weeks of gestation;
preterm birth (PB): delivery before 37 weeks of gestation;
small for gestational age infant (SGA): birth weight below the estimated 10th percentile for infant sex and gestational age;
large for gestational age infant (LGA): birth weight above the estimated 90th percentile for infant sex and gestational age.
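The outcome definitions above translate directly into labelling rules. The following Python sketch is hedged: the function names are hypothetical, and the GDM thresholds are the IADPSG values for the 75 g OGTT (fasting 5.1, 1 h 10.0, 2 h 8.5 mmol/L). The SGA/LGA percentile cut-offs must be supplied from a sex- and gestational-age-specific reference chart, which is not included here.

```python
# IADPSG 75 g OGTT thresholds in mmol/L: fasting, 1 hour, 2 hours.
IADPSG_THRESHOLDS = (5.1, 10.0, 8.5)


def is_gdm(fasting: float, one_hour: float, two_hour: float) -> bool:
    """GDM if any OGTT glucose value reaches its IADPSG threshold."""
    values = (fasting, one_hour, two_hour)
    return any(v >= t for v, t in zip(values, IADPSG_THRESHOLDS))


def is_hdp(gest_week: float, sbp: float, dbp: float, chronic_htn: bool = False) -> bool:
    """New-onset hypertension from 20 weeks on.

    Assumptions: "after 20 weeks" is read as >= 20 weeks, ">= 140/90 mmHg" as
    SBP >= 140 or DBP >= 90, and chronic_htn (hypothetical flag) excludes
    hypertension that predates pregnancy.
    """
    return not chronic_htn and gest_week >= 20 and (sbp >= 140 or dbp >= 90)


def is_preterm(delivery_week: float) -> bool:
    """Preterm birth: delivery before 37 completed weeks."""
    return delivery_week < 37


def size_for_gestational_age(weight_g: float, p10_g: float, p90_g: float) -> str:
    """Return "SGA", "AGA" or "LGA".

    p10_g and p90_g are the 10th and 90th percentile birth weights for the
    infant's sex and gestational age, looked up from a reference chart.
    """
    if weight_g < p10_g:
        return "SGA"
    if weight_g > p90_g:
        return "LGA"
    return "AGA"
```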
Further, the prediction model uses a Stacking framework to integrate three different extreme gradient boosting meta-models. The first layer consists of the three meta-models XGB1, XGB2 and XGB3, and the second layer is a logistic regression (LR) model. Each XGB meta-model generates predicted values for the training set and the test set, so the three meta-models yield three groups of such predictions. Each meta-model splits the training set into 5 parts. It then trains on 4 parts and predicts the remaining part in turn (a 4:1 split) to obtain the training-set predictions, and it also predicts the test set. The same procedure is repeated for all 3 meta-models. Finally, the newly constructed training and test sets are input to the LR model, which classifies and outputs the GMS risk.
Further, in step (5), features are ranked by importance based on Shapley values, 10% of the largest modeling-factor contribution value is taken as the threshold, and features above the threshold are selected as input variables for clustering.
The invention has the beneficial effects that:
The method enables early clinical prediction of GMS and early discovery of GMS-related predictors. This helps identify high-risk groups so that scientific health interventions can be applied. Such interventions can break the vicious cycle of metabolic abnormality between mother and child, keep the risk from extending into the neonatal period and further reduce the offspring's risk of long-term metabolic disease, which is of considerable significance for preventing and reducing GMS. The invention can serve as an auxiliary diagnosis system for obstetric outpatient clinics and fills a current gap in early GMS prevention and treatment in China. It targets early detection, early intervention and early treatment of GMS to reduce its incidence and adverse consequences. It also guides graded diagnosis and stratified management based on follow-up personalized population intervention plans, and it has important scientific and social value for improving population health in China.
Drawings
FIG. 1 is a schematic structural diagram of the GMS prediction model constructed according to the present invention.
Detailed Description
The method identifies high-relevance risk factors from the resulting risk assessment model, selects and improves a machine learning model suited to real clinical scenarios, and explores a suitable high-risk grading standard, thereby combining the learning model with medical interpretability. The invention is further illustrated below with reference to specific examples.
A method for constructing a model for predicting and evaluating the early risk of metabolic syndrome during pregnancy comprises the following steps:
(1) Data acquisition: multi-source heterogeneous data, such as outpatient medical records, laboratory tests, ultrasound imaging examinations and inpatient progress notes, are obtained and preprocessed by outlier rejection, missing-value imputation, normalization and similar steps.
(2) Risk definition: adverse pregnancy outcomes highly correlated with GMS are screened from the data obtained in (1), and a regression model is built to compute the coefficient of determination (R²), which quantifies how strongly each outcome correlates with GMS.
(3) Risk analysis: the most relevant adverse pregnancy outcomes from (2) are used as prediction labels, and the GMS prediction model is built by combining extreme gradient boosted trees with a Stacking framework.
(4) Risk characterization: the feature importance of each modeling factor in (3) is calculated based on Shapley values.
(5) Risk stratification: a risk stratification model is built with a clustering algorithm using the modeling factors with the greatest feature importance from (4).
Using the risk assessment method of this embodiment, GMS is assessed in early pregnancy and early health intervention is carried out. Specifically, the method comprises the following steps:
(1) the multimodal data collected in step (1) mainly comprise demographic data, prenatal examination data, obstetric outpatient data, ultrasound imaging data and laboratory test data;
demographic and prenatal examination data include age, date of birth, gravidity, parity, height, pre-pregnancy weight, pre-pregnancy systolic and diastolic blood pressure, last menstrual period, age at menarche, menstrual cycle, menstrual volume, dysmenorrhea, natural conception, blood type, education level, community, etc.;
obstetric outpatient examination data are the pregnant woman's data at roughly 20 time points across the gestational weeks. The examination items mainly include weight, fundal height, abdominal circumference, blood pressure, fetal position and fetal heart rate during pregnancy, and outpatient physicians record high-risk factors such as hypertension and abnormal fetal position.
The metabolism-related laboratory test data mainly include: hemoglobin, hematocrit, platelets, neutrophils, lymphocytes, eosinophils, ferritin, partial thromboplastin time, prothrombin time, fibrinogen, D-dimer, glucose, triglycerides, total cholesterol, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, APOA1, APOB, homocysteine, uric acid, alanine aminotransferase, aspartate aminotransferase, total protein, albumin, total bilirubin, direct bilirubin, creatinine, lactate dehydrogenase, blood amylase, total bile acids, glycocholic acid, free triiodothyronine, free thyroxine, thyroid-stimulating hormone, total triiodothyronine, total thyroxine, thyroglobulin antibodies and anti-thyroid peroxidase antibodies.
(2) Pregnancy risk definition method: adverse pregnancy outcomes highly correlated with GMS are defined as labels
Adverse pregnancy outcomes highly correlated with GMS are selected as prediction labels. For practical utility, only adverse pregnancy outcomes with an incidence above 1% are provisionally included in the candidate set. Their correlation is calculated by taking metabolic factors such as BMI, blood glucose, blood pressure and triglycerides as independent variables and each pregnancy outcome as the dependent variable. Combined with the existing literature and expert consultation, the adverse pregnancy outcomes determined in this way are:
1) Gestational hypertension (HDP): new-onset hypertension after 20 weeks of gestation, with BP ≥ 140/90 mmHg.
2) Gestational diabetes mellitus (GDM): women undergoing an OGTT at 24 to 28 weeks of gestation are diagnosed with gestational diabetes using the IADPSG criteria (one or more of the fasting, 1-hour or 2-hour blood glucose concentrations at or above the respective thresholds of 5.1, 10.0 or 8.5 mmol/L).
3) Preterm birth (PB): delivery before 37 weeks of gestation.
4) Small for gestational age infant (SGA): birth weight below the estimated 10th percentile for infant sex and gestational age, according to previously published Chinese data.
5) Large for gestational age infant (LGA): birth weight above the estimated 90th percentile for infant sex and gestational age, according to previously published Chinese data.
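The screening described above can be sketched as follows: keep outcomes with an incidence above 1%, then measure how well the metabolic factors explain each outcome using a regression model and its coefficient of determination. This sketch assumes an ordinary least-squares linear probability model for R². The text does not say which regression model was used, so a logistic model with a pseudo-R² would be an equally plausible reading. The function names are hypothetical.

```python
import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """Coefficient of determination of an OLS fit of y on X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0


def screen_outcomes(X, outcomes, min_incidence=0.01):
    """Rank candidate outcomes by R^2 after dropping rare ones.

    X: (n, p) array of metabolic factors (e.g. BMI, glucose, BP, triglycerides).
    outcomes: dict mapping outcome name -> (n,) array of 0/1 labels.
    Returns [(name, r2), ...] sorted from most to least explained.
    """
    scores = {}
    for name, y in outcomes.items():
        y = np.asarray(y, dtype=float)
        if y.mean() > min_incidence:               # incidence above 1%
            scores[name] = r_squared(X, y)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```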
(3) Risk analysis method: a GMS prediction model is constructed by combining extreme gradient boosting (XGBoost) with a Stacking framework, as follows:
Step 1: XGBoost prediction model construction
XGBoost is a distributed gradient boosting algorithm that has been widely applied and studied in computer-aided diagnosis of obstetric diseases. It adds a regularization term to the learning objective and seeks the optimal solution of the resulting objective function, which helps prevent the model from overfitting. The objective function is:
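$$\mathrm{Obj}^{(t)}=\sum_{i=1}^{n} l\left(y_i,\hat{y}_i^{(t)}\right)+\sum_{k=1}^{t}\Omega\left(f_k\right),\qquad \hat{y}_i^{(t)}=\sum_{k=1}^{t} f_k\left(x_i\right) \tag{1}$$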
where i denotes the i-th sample, k the k-th tree f_k, t the boosting iteration, Ω the regularization term penalizing tree complexity, and l the loss function. The multi:softmax loss is selected when predicting a multi-class problem, and binary logistic regression is selected as the loss function when predicting a binary problem (whether the adverse pregnancy outcome occurs). ŷ_i is the model's output label and y_i is the true label.
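In the XGBoost library, the loss choice described above is made through the objective parameter. This minimal sketch assumes the native xgboost.train interface. The helper name and the learning-rate, depth and regularization values are illustrative and are not taken from the text.

```python
def xgb_objective_params(n_classes: int) -> dict:
    """Build an XGBoost parameter dict with the loss described in the text.

    Binary task (does the adverse outcome occur?) -> logistic loss.
    Multi-class task -> softmax loss, which also requires num_class.
    The eta/max_depth/lambda/gamma values are illustrative placeholders.
    """
    if n_classes < 2:
        raise ValueError("need at least two classes")
    params = {"eta": 0.1, "max_depth": 4, "lambda": 1.0, "gamma": 0.0}
    if n_classes == 2:
        params.update(objective="binary:logistic", eval_metric="logloss")
    else:
        params.update(objective="multi:softmax", num_class=n_classes,
                      eval_metric="mlogloss")
    return params
```

A binary model would then be trained with xgboost.train(xgb_objective_params(2), xgboost.DMatrix(X, label=y), num_boost_round=200). The lambda parameter (L2 penalty on leaf weights) and the gamma parameter (minimum loss reduction required to split) are where the regularization term of the objective enters.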
Step 2: stacking framework construction
The invention uses the Stacking framework to fuse multiple different XGB models into an ensemble, so as to improve prediction accuracy. Assume that the input is V_i, that the three first-layer models are XGB1, XGB2 and XGB3, and that the second-layer prediction model is LR (logistic regression). The outputs of the three first-layer meta-models are then XGB1(V_i), XGB2(V_i) and XGB3(V_i).
The different XGB models are used to generate predicted values for the training and test sets. Taking 3 XGB models as an example, the 3 XGB meta-models generate 3 groups of predicted values for the training and test sets. Each XGB model divides the training set into 5 parts, S1 to S5, and trains on them in turn at a 4:1 ratio. This gives out-of-fold predicted values P1 to P5 for the training set and T1 to T5 for the test set. The operation is repeated for all 3 XGB meta-models; the structure is shown in FIG. 1.
A new training set is then constructed from the training-set predictions, and a new test set from the test-set predictions, as follows:
[Equation (2): construction of the new training set (image not reproduced)]
[Equation (3): construction of the new test set (image not reproduced)]
Step 3: risk value calculation
Finally, the new training and test sets are input to the LR model, which classifies and outputs the GMS risk. Its output y_i is the final prediction:
$$y_i=\mathrm{LR}\left(\mathrm{XGB}_1(V_i),\ \mathrm{XGB}_2(V_i),\ \mathrm{XGB}_3(V_i)\right) \tag{4}$$
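The two-layer scheme in Steps 2 and 3 can be sketched with scikit-learn. This is a hedged illustration, not the patented implementation. GradientBoostingClassifier stands in for the three XGB meta-models; xgboost.XGBClassifier follows the same fit/predict_proba interface and could be dropped in. The three hyperparameter settings are invented, and the task is assumed to be binary. Averaging the five fold models' test predictions T1 to T5 into one column is a common convention that is assumed here, since the text does not state how T1 to T5 are combined.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold


def make_meta_models():
    """Three differently configured boosted-tree models (stand-ins for XGB1-XGB3)."""
    return [
        GradientBoostingClassifier(n_estimators=100, max_depth=2, random_state=1),
        GradientBoostingClassifier(n_estimators=150, max_depth=3,
                                   learning_rate=0.05, random_state=2),
        GradientBoostingClassifier(n_estimators=80, max_depth=4,
                                   subsample=0.8, random_state=3),
    ]


def stack_features(models, X_train, y_train, X_test, n_splits=5, seed=0):
    """Out-of-fold training predictions (P1-P5) and fold-averaged test
    predictions (mean of T1-T5), one column per meta-model."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    new_train = np.zeros((len(X_train), len(models)))
    new_test = np.zeros((len(X_test), len(models)))
    for j, model in enumerate(models):
        for fit_idx, hold_idx in skf.split(X_train, y_train):   # 4:1 splits
            m = clone(model).fit(X_train[fit_idx], y_train[fit_idx])
            new_train[hold_idx, j] = m.predict_proba(X_train[hold_idx])[:, 1]
            new_test[:, j] += m.predict_proba(X_test)[:, 1] / n_splits
    return new_train, new_test


def fit_predict_stack(X_train, y_train, X_test):
    """First layer: boosted trees; second layer: logistic regression, Eq. (4)."""
    Z_train, Z_test = stack_features(make_meta_models(), X_train, y_train, X_test)
    lr = LogisticRegression().fit(Z_train, y_train)
    return lr.predict_proba(Z_test)[:, 1]        # GMS risk score per sample
```

Generating the first-layer training features out of fold matters. If each model predicted the same rows it was trained on, the LR layer would learn from overconfident scores and overfit.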
Step 4: model evaluation
Common evaluation metrics for machine learning methods are selected: accuracy, sensitivity, specificity and the ROC (receiver operating characteristic) curve. These are used to evaluate each model's performance by measuring the difference between the model's predictions and the true results, which assesses model quality and provides a basis for model selection. The metrics are calculated as follows:
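$$\mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN} \tag{5}$$
$$\mathrm{Sensitivity}=\frac{TP}{TP+FN} \tag{6}$$
$$\mathrm{Specificity}=\frac{TN}{TN+FP} \tag{7}$$
where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives.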
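The accuracy, sensitivity and specificity named above can be computed from confusion-matrix counts in a few lines of NumPy. This sketch also computes the area under the ROC curve using the Mann-Whitney formulation: the probability that a random positive case is scored above a random negative one, with ties counted as one half. Summarizing the ROC curve by its AUC is an assumption here, since the text only names the curve. The function names are hypothetical; sklearn.metrics.roc_auc_score gives the same AUC.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels."""
    t = np.asarray(y_true).astype(bool)
    p = np.asarray(y_pred).astype(bool)
    return (int(np.sum(t & p)), int(np.sum(~t & ~p)),
            int(np.sum(~t & p)), int(np.sum(t & ~p)))


def accuracy_sensitivity_specificity(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity


def roc_auc(y_true, scores):
    """AUC via the Mann-Whitney U statistic (ties count one half)."""
    t = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[t], s[~t]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```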
(4) Risk characterization method: feature importance distribution calculated based on Shapley values
The invention uses the Shapley value method to evaluate how much each modeling factor contributes to the model's predictive ability. The Shapley value comes from cooperative game theory, where it resolves the conflicts that arise over how to distribute benefits among multiple cooperating parties. It takes each participant's contribution into account so as to distribute the cooperative gain fairly.
Assume that all pregnancy features x from (1) form the set N and that v is the contribution function for modeling performance. The Shapley value of the game (N, v) distributes the model's total contribution v(N) according to the following formulas:
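$$\phi_i(v)=\sum_{S\subseteq N\setminus\{i\}} w(|S|)\left[v\left(S\cup\{i\}\right)-v(S)\right] \tag{8}$$
$$w(|S|)=\frac{|S|!\,\left(|N|-|S|-1\right)!}{|N|!} \tag{9}$$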
where S is any coalition (subset) of the features x, i denotes the feature being evaluated, φ_i(v) is the contribution value of feature i in the model, and w(|S|) is the weight of coalition S (for a given i the weights sum to 1). v(S) is the predictive performance contributed by the features in S.
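When the number of features is small, the Shapley value defined above can be evaluated exactly by enumerating every coalition. The sketch below uses a toy value function that is purely hypothetical. In the text, v measures predictive performance, so each coalition would require retraining or re-scoring the model.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley values by enumerating all coalitions of the other players.

    players: list of feature names (the set N).
    v: function mapping a frozenset of features to the value it produces.
    """
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Weight of any coalition of this size, Eq. (9).
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            for members in combinations(others, size):
                S = frozenset(members)
                total += w * (v(S | {i}) - v(S))   # marginal contribution, Eq. (8)
        phi[i] = total
    return phi


# Hypothetical value function: additive effects plus a BMI x glucose interaction.
def toy_v(S):
    return 0.3 * ("bmi" in S) + 0.2 * ("tg" in S) + 0.1 * ("bmi" in S and "glucose" in S)


print(shapley_values(["bmi", "tg", "glucose"], toy_v))
```

The interaction term is split equally between BMI and glucose, and the values sum to v(N) - v(∅). Because enumeration costs O(2^n) evaluations of v, tree-model attributions are usually computed with the shap library's TreeExplainer in practice. Note that TreeExplainer uses the model output, rather than a performance metric, as its value function.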
The main advantage of the Shapley value is that both its principle and its results are readily accepted by all participants as fair. It is widely used as an index for fairly and quantitatively evaluating each participant's marginal contribution.
(5) Risk stratification method: risk stratification by clustering, based on the feature importance distribution from step (4)
Let V_shapley denote the Shapley values of all features x. The values are sorted, and 10% of the largest modeling-factor contribution value is taken as the threshold. Features with values above the threshold are selected as input variables for clustering. The selected variables V_select are given by:
$$V_{\mathrm{select}}=\left\{V \mid V_{\mathrm{shapley}}>0.1\times\max\left(V_{\mathrm{shapley}}\right)\right\} \tag{10}$$
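The 10% threshold rule above can be written in a few lines. This sketch assumes that each feature's importance is its mean absolute SHAP value across samples, a common global summary that the text does not specify. The function names and example values are hypothetical.

```python
import numpy as np


def global_importance(shap_matrix):
    """Mean absolute SHAP value per feature (assumed global summary)."""
    return np.abs(np.asarray(shap_matrix, dtype=float)).mean(axis=0)


def select_features(names, importance, ratio=0.1):
    """Keep features whose importance exceeds ratio * max importance, Eq. (10),
    returned in descending order of importance."""
    imp = np.asarray(importance, dtype=float)
    threshold = ratio * imp.max()
    order = np.argsort(imp)[::-1]                  # rank features, largest first
    return [names[k] for k in order if imp[k] > threshold]


print(select_features(["bmi", "glucose", "tg", "ferritin"], [0.5, 0.2, 0.06, 0.04]))
# prints ['bmi', 'glucose', 'tg']
```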
The existing GMS diagnostic criteria contain four risk indicators, so each sample falls into exactly one of five conditions (0, 1, 2, 3 or 4 risk indicators present). The clustering algorithm is therefore set to produce five target clusters.
The output of step (3) is converted into a quantitative risk score (y_score) within the scikit-learn framework, and the scores are sorted. The samples at the 0th, 25th, 50th, 75th and 100th percentiles are taken as the 5 initial centroid vectors, and K-means clustering is performed. The inputs are the number of clusters (5) and V_select for all pregnant women, and the output is five target clusters satisfying the minimum-variance criterion. The risk stratification is evaluated by comparing the incidence of each adverse pregnancy outcome within each cluster against the original diagnostic criteria.
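The clustering step can be sketched in NumPy as Lloyd's K-means with the percentile-based initialization described above. The sketch makes three assumptions: X holds each woman's V_select features, risk holds the stacked model's risk scores for the same women, and "the sample at the p-th percentile" is the sample at sorted position round(p × (n - 1)). The function names are hypothetical.

```python
import numpy as np


def percentile_init(X, risk, quantiles=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Use the samples whose risk scores sit at the given percentiles as centroids."""
    order = np.argsort(risk)
    n = len(risk)
    idx = [order[int(round(q * (n - 1)))] for q in quantiles]
    return np.asarray(X, dtype=float)[idx]


def kmeans(X, init, n_iter=100):
    """Plain Lloyd's algorithm, which minimizes the within-cluster sum of squares."""
    X = np.asarray(X, dtype=float)
    centroids = init.copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centroid.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to its cluster mean; keep it if the cluster is empty.
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centroids[k] for k in range(len(centroids))])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

scikit-learn's KMeans accepts such an array directly as well, via KMeans(n_clusters=5, init=percentile_init(X, risk), n_init=1).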
A risk assessment system built on this embodiment can be embedded in the electronic medical record system to provide intelligent early warning of GMS.

Claims (6)

1. A method for constructing a model for predicting and evaluating the early risk of metabolic syndrome during pregnancy, characterized by comprising the following steps:
(1) obtaining multi-source heterogeneous data and preprocessing the data to obtain metabolism-related data;
(2) screening for adverse pregnancy outcomes highly associated with GMS;
(3) building a prediction model by combining extreme gradient boosting (XGBoost) with a Stacking framework, using the adverse pregnancy outcomes determined in step (2) as prediction labels;
(4) calculating the feature importance of each modeling factor in the prediction model based on Shapley values;
(5) building a risk stratification model with a clustering algorithm, based on the feature importance of the modeling factors from step (4), to obtain GMS risk grades.
2. The method for constructing the model for predicting and evaluating the early risk of metabolic syndrome during pregnancy according to claim 1, wherein the multi-source heterogeneous data comprise outpatient medical records, laboratory tests, ultrasound imaging examinations and inpatient medical records, and the preprocessing comprises outlier elimination, missing-value imputation and normalization.
3. The method for constructing the model for predicting and evaluating the early risk of metabolic syndrome during pregnancy according to claim 1, wherein the metabolism-related data comprise: hemoglobin, hematocrit, platelets, neutrophils, lymphocytes, eosinophils, ferritin, partial thromboplastin time, prothrombin time, fibrinogen, D-dimer, glucose, triglycerides, total cholesterol, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, APOA1, APOB, homocysteine, uric acid, alanine aminotransferase, aspartate aminotransferase, total protein, albumin, total bilirubin, direct bilirubin, creatinine, lactate dehydrogenase, blood amylase, total bile acids, glycocholic acid, free triiodothyronine, free thyroxine, thyroid-stimulating hormone, total triiodothyronine, total thyroxine, thyroglobulin antibodies and anti-thyroid peroxidase antibodies.
4. The method for constructing the model for early risk prediction and evaluation of metabolic syndrome during pregnancy according to claim 1, wherein the adverse pregnancy outcomes screened in step (2) are:
Gestational hypertension (HDP): new-onset hypertension after 20 weeks of gestation, with BP ≥ 140/90 mmHg;
Gestational diabetes mellitus (GDM): diagnosed according to the IADPSG criteria in women undergoing an oral glucose tolerance test (OGTT) at 24 to 28 weeks of gestation;
Preterm birth (PB): delivery before 37 weeks of gestation;
Small for gestational age (SGA): birth weight below the estimated 10th percentile for infant sex and gestational age;
Large for gestational age (LGA): birth weight above the estimated 90th percentile for infant sex and gestational age.
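The outcome definitions in claim 4 can be turned into binary prediction labels. A sketch, assuming a hypothetical `PregnancyRecord` whose field names are illustrative only, that the birth-weight percentile has already been looked up in a sex- and gestational-age-specific reference chart, and that "BP ≥ 140/90 mmHg" means systolic ≥ 140 or diastolic ≥ 90 mmHg. The GDM cut-offs are the IADPSG 75 g OGTT thresholds (fasting 5.1, 1 h 10.0, 2 h 8.5 mmol/L; any one value is sufficient):

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PregnancyRecord:
    # Hypothetical field names; units are mmHg, mmol/L and gestational weeks.
    chronic_hypertension: bool      # hypertension already present before 20 weeks
    bp_week: float                  # gestational week of the blood-pressure reading
    sbp: float
    dbp: float
    ogtt_week: float                # gestational week of the 75 g OGTT
    ogtt_fasting: float
    ogtt_1h: float
    ogtt_2h: float
    delivery_week: float
    birth_weight_percentile: float  # assumed precomputed from a reference chart


def outcome_labels(r: PregnancyRecord) -> Dict[str, int]:
    """Binary adverse-outcome labels following the definitions in claim 4."""
    # HDP: new-onset hypertension after 20 weeks (SBP >= 140 or DBP >= 90 assumed).
    hdp = (not r.chronic_hypertension and r.bp_week > 20
           and (r.sbp >= 140 or r.dbp >= 90))
    # GDM: IADPSG thresholds on an OGTT done at 24+0 to 28+6 weeks (window assumed).
    gdm = (24 <= r.ogtt_week < 29
           and (r.ogtt_fasting >= 5.1 or r.ogtt_1h >= 10.0 or r.ogtt_2h >= 8.5))
    pb = r.delivery_week < 37                 # preterm birth: before 37 weeks
    sga = r.birth_weight_percentile < 10      # small for gestational age
    lga = r.birth_weight_percentile > 90      # large for gestational age
    return {"HDP": int(hdp), "GDM": int(gdm), "PB": int(pb),
            "SGA": int(sga), "LGA": int(lga)}
```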
5. The method for constructing the model for early risk prediction and evaluation of metabolic syndrome during pregnancy according to claim 1, wherein the prediction model fuses three different extreme gradient boosting (XGBoost) models within a Stacking framework. The first layer comprises three base models, XGB1, XGB2 and XGB3, and the second layer comprises a logistic regression (LR) model. Each XGB model generates predicted values for the training set and the test set, so that the three XGB models yield three groups of training-set and test-set predictions. For each XGB model, the training set is divided into 5 folds; the model is trained in turn on 4 folds and used to predict the remaining fold (a 4:1 split) to obtain the predicted values, and the same operation is repeated for the test set. The three groups of XGB predictions are input into the LR model, which classifies and outputs the GMS risk.
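The two-layer scheme in claim 5 can be sketched in pure Python. To keep the example self-contained, each XGBoost base model is replaced by a hypothetical one-split decision stump (`stump_learner`); in practice XGB1 to XGB3 would be XGBoost classifiers with different settings. The first layer follows the 5-fold, 4:1 out-of-fold procedure; averaging the five fold models' predictions on the test set is an assumed reading of how the test set is handled. The second layer is a plain logistic regression fitted by gradient descent:

```python
import math
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Model = Callable[[Row], float]                              # returns P(outcome = 1)
Learner = Callable[[Sequence[Row], Sequence[int]], Model]   # fits on (X, y)


def stump_learner(feature: int) -> Learner:
    """Hypothetical stand-in for one XGBoost base model: a single-split decision stump."""
    def fit(X: Sequence[Row], y: Sequence[int]) -> Model:
        best = None
        for t in sorted({row[feature] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[feature] <= t]
            right = [yi for row, yi in zip(X, y) if row[feature] > t]
            p_left = sum(left) / len(left)
            p_right = sum(right) / len(right) if right else p_left
            err = (sum((yi - p_left) ** 2 for yi in left)
                   + sum((yi - p_right) ** 2 for yi in right))
            if best is None or err < best[0]:
                best = (err, t, p_left, p_right)
        _, t, p_left, p_right = best
        return lambda row: p_left if row[feature] <= t else p_right
    return fit


def stack_features(learners: Sequence[Learner], X_train, y_train, X_test,
                   k: int = 5) -> Tuple[List[List[float]], List[List[float]]]:
    """First layer: out-of-fold predictions of each base learner ((k-1):1 split)."""
    n = len(X_train)
    folds = [list(range(i, n, k)) for i in range(k)]        # interleaved folds
    train_meta = [[0.0] * len(learners) for _ in range(n)]
    test_meta = [[0.0] * len(learners) for _ in X_test]
    for j, learner in enumerate(learners):
        for fold in folds:
            held_out = set(fold)
            idx = [i for i in range(n) if i not in held_out]
            model = learner([X_train[i] for i in idx], [y_train[i] for i in idx])
            for i in fold:                    # each row is predicted by a model that never saw it
                train_meta[i][j] = model(X_train[i])
            for r, row in enumerate(X_test):  # assumed: average the k fold models on the test set
                test_meta[r][j] += model(row) / k
    return train_meta, test_meta


def fit_logistic(X: Sequence[Row], y: Sequence[int], lr: float = 0.5, epochs: int = 1000) -> Model:
    """Second layer: logistic regression trained by batch gradient descent on log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)

    def prob(row: Row) -> float:
        return 1.0 / (1.0 + math.exp(-(b + sum(wj * xj for wj, xj in zip(w, row)))))

    for _ in range(epochs):
        errs = [prob(row) - yi for row, yi in zip(X, y)]    # d(log-loss)/d(logit)
        w = [wj - lr * sum(e * row[j] for e, row in zip(errs, X)) / n for j, wj in enumerate(w)]
        b -= lr * sum(errs) / n
    return prob
```

Usage: build `train_meta, test_meta = stack_features([stump_learner(0), stump_learner(1), stump_learner(2)], X, y, X_test)`, fit `meta = fit_logistic(train_meta, y)`, and score each test row with `meta(test_meta[r])`. Out-of-fold predictions matter because a meta-learner trained on in-sample base predictions would overweight overfitted base models.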
6. The method for constructing the model for early risk prediction and evaluation of metabolic syndrome during pregnancy according to claim 1, wherein, in step (5), the features are ranked by importance based on Shapley values, 10% of the contribution value of the most important modeling factor is taken as the threshold, and the features whose contribution exceeds the threshold are selected as the input variables for clustering.
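The selection rule in claim 6 is simple arithmetic once per-feature Shapley contributions are available: if the top factor contributes 0.6, the threshold is 0.06. A sketch, assuming the SHAP values have already been computed (for example with a SHAP library) as a samples × features matrix, and assuming a feature's contribution is its mean absolute SHAP value; `select_by_shap` is a hypothetical name:

```python
from typing import List, Sequence, Tuple


def select_by_shap(shap_values: Sequence[Sequence[float]], names: Sequence[str],
                   ratio: float = 0.10) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Rank features by mean |SHAP| and keep those above ratio x the top contribution."""
    n = len(shap_values)
    importance = [sum(abs(row[j]) for row in shap_values) / n for j in range(len(names))]
    ranked = sorted(zip(names, importance), key=lambda item: item[1], reverse=True)
    threshold = ratio * ranked[0][1]        # 10% of the largest contribution by default
    selected = [name for name, imp in ranked if imp > threshold]
    return ranked, selected
```

The selected names are the columns that would then be passed to the clustering step.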
CN202210593499.6A 2022-05-27 2022-05-27 Construction method of early risk prediction and evaluation model of metabolic syndrome in gestational period Pending CN114974585A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210593499.6A CN114974585A (en) 2022-05-27 2022-05-27 Construction method of early risk prediction and evaluation model of metabolic syndrome in gestational period

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210593499.6A CN114974585A (en) 2022-05-27 2022-05-27 Construction method of early risk prediction and evaluation model of metabolic syndrome in gestational period

Publications (1)

Publication Number Publication Date
CN114974585A true CN114974585A (en) 2022-08-30

Family

ID=82957436

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210593499.6A Pending CN114974585A (en) 2022-05-27 2022-05-27 Construction method of early risk prediction and evaluation model of metabolic syndrome in gestational period

Country Status (1)

Country Link
CN (1) CN114974585A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116013520A (en) * 2022-12-27 2023-04-25 上海市第一妇婴保健院 Method and device for predicting developmental coordination disorder and electronic equipment
CN116013520B (en) * 2022-12-27 2023-11-17 上海市第一妇婴保健院 Method and device for predicting developmental coordination disorder and electronic equipment
CN116307742A (en) * 2023-05-19 2023-06-23 平安科技(深圳)有限公司 Risk identification method, device and equipment for subdivision guest group and storage medium
CN116307742B (en) * 2023-05-19 2023-08-22 平安科技(深圳)有限公司 Risk identification method, device and equipment for subdivision guest group and storage medium
CN117219261A (en) * 2023-09-08 2023-12-12 广州中医药大学第一附属医院 Ectopic pregnancy probability prediction device based on vaginal flora and electronic equipment

Similar Documents

Publication Publication Date Title
LaFreniere et al. Using machine learning to predict hypertension from a clinical dataset
CN114974585A (en) Construction method of early risk prediction and evaluation model of metabolic syndrome in gestational period
Davidson et al. Towards deep phenotyping pregnancy: a systematic review on artificial intelligence and machine learning methods to improve pregnancy outcomes
US6556977B1 (en) Methods for selecting, developing and improving diagnostic tests for pregnancy-related conditions
CN110827993A (en) Early death risk assessment model establishing method and device based on ensemble learning
Chakradar et al. A non-invasive approach to identify insulin resistance with triglycerides and HDL-c ratio using machine learning
CN107153774A (en) The disease forecasting system of the structure and application of chronic disease risk assessment the hyperbolic model model
CN111261282A (en) Sepsis early prediction method based on machine learning
CN110808097A (en) Gestational diabetes prediction system and method
CN114023441A (en) Severe AKI early risk assessment model and device based on interpretable machine learning model and development method thereof
CN113838577B (en) Convenient layered old people MODS early death risk assessment model, device and establishment method
CN110739076A (en) medical artificial intelligence public training platform
CN112786203A (en) Machine learning diabetic retinopathy morbidity risk prediction method and application
CN113128654B (en) Improved random forest model for coronary heart disease pre-diagnosis and pre-diagnosis system thereof
CN115099331A (en) Auxiliary diagnosis system for malignant pleural effusion based on interpretable machine learning algorithm
CN113593708A (en) Sepsis prognosis prediction method based on integrated learning algorithm
CN114943629A (en) Health management and health care service system and health management method thereof
CN114023440A (en) Model and device capable of explaining layered old people MODS early death risk assessment and establishing method thereof
Bhat et al. Analysis of diabetes mellitus using machine learning techniques
Singh et al. Detection of Cardio Vascular abnormalities using gradient descent optimization and CNN
Murthy et al. Comparative Analysis on Diabetes Dataset Using Machine Learning Algorithms
Liu et al. Visit-to-visit blood pressure variability and risk of adverse birth outcomes in pregnancies in East China
CN116564521A (en) Chronic disease risk assessment model establishment method, medium and system
CN115691788A (en) Dual attention coupling network diabetes classification system based on heterogeneous data
Du et al. Prediction of pregnancy diabetes based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination