WO2022198794A1 - Method, apparatus, device and storage medium for predicting the typing probability of diabetes - Google Patents

Method, apparatus, device and storage medium for predicting the typing probability of diabetes

Info

Publication number
WO2022198794A1
WO2022198794A1 PCT/CN2021/097552 CN2021097552W WO2022198794A1 WO 2022198794 A1 WO2022198794 A1 WO 2022198794A1 CN 2021097552 W CN2021097552 W CN 2021097552W WO 2022198794 A1 WO2022198794 A1 WO 2022198794A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
type
trained
training
probability
Prior art date
Application number
PCT/CN2021/097552
Inventor
唐蕊
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022198794A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • The present application relates to the field of digital medical technology, and in particular to a method, apparatus, device and storage medium for predicting the typing probability of diabetes.
  • The types of diabetes include type 1 and type 2. Although both types present clinically as hyperglycemia, they require different treatments. Type 1 diabetes is an absolute insulin deficiency, so it always requires insulin treatment, whereas type 2 diabetes involves insulin resistance and can be treated with a variety of drugs. It is therefore necessary and important to classify patients diagnosed with diabetes as type 1 or type 2. The inventor realized that the prior art relies on doctors applying clinical guidelines and personal practical experience to classify diabetes, which is not conducive to accurate diabetes classification.
  • The main purpose of the present application is to provide a method, apparatus, device and storage medium for predicting the typing probability of diabetes, aiming to solve the technical problem that the prior art relies on doctors applying clinical guidelines and personal practical experience to classify diabetes, which is not conducive to accurate diabetes classification.
  • The present application proposes an apparatus for predicting the typing probability of diabetes, the apparatus comprising:
  • a data acquisition module, configured to acquire body state feature data to be analyzed, basic information feature data to be analyzed, and test and examination feature data to be analyzed of a target patient;
  • a first prediction module, configured to input the body state feature data to be analyzed into a first typing diagnosis model to predict the probability of type 1 diabetes and obtain a first type 1 probability prediction result;
  • a second prediction module, configured to input the basic information feature data to be analyzed into a second typing diagnosis model to predict the probability of type 1 diabetes and obtain a second type 1 probability prediction result;
  • a third prediction module, configured to input the test and examination feature data to be analyzed into a third typing diagnosis model to predict the probability of type 1 diabetes and obtain a third type 1 probability prediction result;
  • a type 1 probability determination module, configured to determine the type 1 probability of diabetes according to the first type 1 probability prediction result, the second type 1 probability prediction result and the third type 1 probability prediction result, to obtain a target type 1 probability prediction result corresponding to the target patient.
  • the present application also proposes a method for predicting the typing probability of diabetes, the method comprising:
  • the present application also proposes a computer device, including a memory and a processor, the memory stores a computer program, and the processor implements the following method steps when executing the computer program:
  • the present application also proposes a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the following method steps are implemented:
  • In the method, apparatus, device and storage medium of the present application, the body state feature data to be analyzed is input into the first typing diagnosis model to obtain the first type 1 probability prediction result, the basic information feature data to be analyzed is input into the second typing diagnosis model to obtain the second type 1 probability prediction result, and the test and examination feature data to be analyzed is input into the third typing diagnosis model to obtain the third type 1 probability prediction result. The type 1 probability of diabetes is then determined according to the first, second and third type 1 probability prediction results.
  • FIG. 1 is a schematic flowchart of a method for predicting the probability of typing of diabetes according to an embodiment of the application
  • FIG. 2 is a schematic block diagram of the structure of an apparatus for predicting the probability of diabetes classification according to an embodiment of the present application
  • FIG. 3 is a schematic structural block diagram of a computer device according to an embodiment of the present application.
  • The present application proposes a method for predicting the typing probability of diabetes.
  • The method is applied in the field of digital medical technology and can also be applied to probabilistic reasoning in artificial intelligence.
  • The method first predicts the probability of type 1 diabetes from three different kinds of diabetes-related features and then determines the type 1 probability from the three prediction results. Fusing the prediction results of three different models improves prediction accuracy and helps doctors classify diabetes accurately.
  • an embodiment of the present application provides a method for predicting the probability of typing of diabetes, the method comprising:
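  • S1 Acquire the body state feature data to be analyzed, the basic information feature data to be analyzed, and the test and examination feature data to be analyzed of the target patient;
  • S2 Input the body state feature data to be analyzed into a first typing diagnosis model to predict the probability of type 1 diabetes, and obtain a first type 1 probability prediction result;
  • S3 Input the basic information feature data to be analyzed into a second typing diagnosis model to predict the probability of type 1 diabetes, and obtain a second type 1 probability prediction result;
  • S4 Input the test and examination feature data to be analyzed into a third typing diagnosis model to predict the probability of type 1 diabetes, and obtain a third type 1 probability prediction result;
  • S5 Determine the type 1 probability of diabetes according to the first type 1 probability prediction result, the second type 1 probability prediction result and the third type 1 probability prediction result, and obtain the target type 1 probability prediction result corresponding to the target patient.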
  • In this embodiment, the body state feature data to be analyzed is input into the first typing diagnosis model to predict the probability of type 1 diabetes and obtain the first type 1 probability prediction result; the basic information feature data to be analyzed is input into the second typing diagnosis model to obtain the second type 1 probability prediction result; and the test and examination feature data to be analyzed is input into the third typing diagnosis model to obtain the third type 1 probability prediction result. The type 1 probability of diabetes is then determined according to the three prediction results to obtain the target type 1 probability prediction result corresponding to the target patient. Predicting the type 1 probability from three different kinds of diabetes-related features and fusing the predictions of three different models improves prediction accuracy and helps doctors classify diabetes accurately.
  • For S1, the body state feature data to be analyzed of the target patient can be obtained from a database, entered by a user, or sent by a third-party application system.
  • The basic information feature data to be analyzed of the target patient can likewise be obtained from a database, entered by a user, or sent by a third-party application system.
  • The test and examination feature data to be analyzed of the target patient can likewise be obtained from a database, entered by a user, or sent by a third-party application system.
  • The body state feature data to be analyzed, the basic information feature data to be analyzed, and the test and examination feature data to be analyzed are feature data acquired from the target patient at the same time.
  • The target patient is a patient who has diabetes.
  • The body state feature data to be analyzed includes, but is not limited to, age of onset data and body mass index data.
  • The age of onset data is the age at which the target patient's diabetes began to develop.
  • The body mass index data is the body mass index of the target patient.
  • The basic information feature data to be analyzed includes, but is not limited to, current age data, gender data, place of origin data and occupational data.
  • The current age data is the age of the target patient at the time the typing probability of diabetes is predicted.
  • The gender data is the gender of the target patient.
  • The place of origin data is the place of origin of the target patient.
  • The occupational data is the occupation of the target patient.
  • The test and examination feature data to be analyzed includes, but is not limited to, blood test feature data, blood pressure measurement feature data and waist circumference feature data.
  • The blood test feature data is the feature data obtained from a blood test of the target patient.
  • The blood test feature data includes, but is not limited to, fasting blood glucose data, two-hour postprandial blood glucose data, glycosylated hemoglobin data, triglyceride data and cholesterol data.
  • The blood pressure measurement feature data is the data obtained by measuring the blood pressure of the target patient, and includes, but is not limited to, diastolic blood pressure data and systolic blood pressure data.
  • The waist circumference feature data is the waist circumference of the target patient at the time the typing probability of diabetes is predicted.
  • The test and examination feature data to be analyzed uses only basic, common and easily obtained test and examination feature data.
  • In contrast to basic, common and easily obtained test and examination feature data, there is complex, uncommon and hard-to-obtain test and examination feature data, such as C-peptide (connecting peptide) and genetic testing. Such data is not input into the third typing diagnosis model for the probability prediction of type 1 diabetes.
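The three feature groups described above can be pictured as three small records per patient. The following is only a sketch of one possible layout: the class and field names are hypothetical, the values are made up, and using `None` for an unavailable measurement is an assumption rather than something the application specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BodyStateFeatures:
    onset_age: float        # age at which the patient's diabetes began
    body_mass_index: float


@dataclass
class BasicInfoFeatures:
    current_age: float
    gender: str
    place_of_origin: str
    occupation: str


@dataclass
class ExamFeatures:
    # None marks a measurement that is not available for this patient.
    fasting_glucose: Optional[float] = None
    postprandial_glucose_2h: Optional[float] = None
    glycated_hemoglobin: Optional[float] = None
    triglycerides: Optional[float] = None
    cholesterol: Optional[float] = None
    diastolic_pressure: Optional[float] = None
    systolic_pressure: Optional[float] = None
    waist_circumference: Optional[float] = None


@dataclass
class PatientRecord:
    body_state: BodyStateFeatures
    basic_info: BasicInfoFeatures
    exam: ExamFeatures


# Made-up values, for illustration only.
record = PatientRecord(
    body_state=BodyStateFeatures(onset_age=14.0, body_mass_index=19.5),
    basic_info=BasicInfoFeatures(32.0, "female", "Shenzhen", "teacher"),
    exam=ExamFeatures(fasting_glucose=9.1, systolic_pressure=118.0),
)
```

Keeping the groups separate mirrors the method, where each group feeds its own typing diagnosis model.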
  • For S2, the body state feature data to be analyzed is input into the first typing diagnosis model to predict the probability of type 1 diabetes, and the predicted probability is used as the first type 1 probability prediction result, wherein the first typing diagnosis model is a model trained based on a logistic regression model.
  • For S3, the basic information feature data to be analyzed is input into the second typing diagnosis model to predict the probability of type 1 diabetes, and the predicted probability is used as the second type 1 probability prediction result, wherein the second typing diagnosis model is a model obtained by training a multilayer perceptron network.
  • For S4, the test and examination feature data to be analyzed is input into the third typing diagnosis model to predict the probability of type 1 diabetes, and the predicted probability is used as the third type 1 probability prediction result, wherein the third typing diagnosis model is a model trained based on the XGBoost (eXtreme Gradient Boosting) method.
  • XGBoost is an optimized distributed gradient boosting library that implements machine learning algorithms under the Gradient Boosting framework.
  • Steps S2 to S4 may all be performed at the same time, may be partly performed at the same time, or may be performed at different times, which is not limited herein.
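Since steps S2 to S4 do not depend on one another, one way to run them at the same time is a small thread pool. This is a minimal sketch: the three `predict_*` functions are hypothetical placeholders that return fixed numbers, standing in for the trained typing diagnosis models.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical placeholders for the three trained models.
def predict_from_body_state(features):
    return 0.30


def predict_from_basic_info(features):
    return 0.20


def predict_from_exam(features):
    return 0.40


def run_steps_s2_to_s4(body_state, basic_info, exam):
    # Submit the three independent predictions and wait for all of them.
    with ThreadPoolExecutor(max_workers=3) as pool:
        first = pool.submit(predict_from_body_state, body_state)
        second = pool.submit(predict_from_basic_info, basic_info)
        third = pool.submit(predict_from_exam, exam)
        return first.result(), second.result(), third.result()


results = run_steps_s2_to_s4({}, {}, {})
```

Calling the three functions one after another gives the same results; the pool only changes when the work happens.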
  • For S5, the first type 1 probability prediction result, the second type 1 probability prediction result and the third type 1 probability prediction result are fused, and the probability obtained by fusion is used as the target type 1 probability prediction result corresponding to the target patient.
  • The target type 1 probability prediction result is the probability that the target patient has type 1 diabetes.
  • Target type 1 probabilistic prediction results are used to assist physicians in accurate diabetes typing.
  • steps S2 to S5 can also be used to predict the probability of the target patient suffering from type 2 diabetes, which is not limited herein.
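The passage above says the three prediction results are fused but does not give the fusion rule here. As an illustration only, the sketch below assumes a weighted average; the function name and the default equal weights are assumptions, not the method defined by the application.

```python
def fuse_type1_probabilities(p1, p2, p3, weights=(1.0, 1.0, 1.0)):
    """Weighted average of three type 1 probabilities (assumed fusion rule)."""
    probabilities = (p1, p2, p3)
    if any(not 0.0 <= p <= 1.0 for p in probabilities):
        raise ValueError("each probability must lie in [0, 1]")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probabilities)) / total


target_type1_probability = fuse_type1_probabilities(0.3, 0.6, 0.9)
```

With non-negative weights the fused value always stays between the smallest and largest of the three inputs, so it remains a valid probability.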
  • Before the above step of inputting the body state feature data to be analyzed into the first typing diagnosis model to predict the probability of type 1 diabetes and obtain the first type 1 probability prediction result, the method further includes:
  • S21 Acquire a plurality of first training samples, where each first training sample of the plurality of first training samples includes body state feature sample data and first type 1 calibration data, and the body state feature sample data includes age of onset sample data and body mass index sample data;
  • S22 Obtain one first training sample from the plurality of first training samples as a target first training sample;
  • S23 Input the body state feature sample data of the target first training sample into a first model to be trained to predict the probability of type 1 diabetes, and obtain a first type 1 probability training prediction value, wherein the first model to be trained is a model obtained based on a logistic regression model;
  • S24 Input the first type 1 probability training prediction value and the first type 1 calibration data of the target first training sample into a first loss function to calculate a first loss value of the first model to be trained, and update the parameters of the first model to be trained according to the first loss value, the updated first model to be trained being used for the next calculation of the first type 1 probability training prediction value;
  • S25 Repeat the step of obtaining one first training sample from the plurality of first training samples as the target first training sample until the first loss value reaches a first convergence condition or the number of iterations of the first model to be trained reaches a second convergence condition, and determine the first model to be trained at that point as the first typing diagnosis model.
  • In this embodiment, a plurality of first training samples are used to train the first model to be trained, which is obtained based on the logistic regression model, and the first model to be trained after training is used as the first typing diagnosis model.
  • The body state feature sample data includes age of onset sample data and body mass index sample data, so that the model learns the relationship between age of onset, body mass index and type 1 diabetes.
  • For S21, the plurality of first training samples may be obtained from a database, entered by a user, or sent by a third-party application system.
  • The body mass index sample data is the body mass index at the time the body state feature sample data of the training sample was determined.
  • Each first training sample includes one item of body state feature sample data and one item of first type 1 calibration data.
  • The first type 1 calibration data is a calibration value for type 1 diabetes assigned by an expert according to the body state feature sample data.
  • When the first type 1 calibration data is 1, the diabetic patient corresponding to the training sample has type 1 diabetes; when the first type 1 calibration data is 0, the diabetic patient corresponding to the training sample has type 2 diabetes. This example is not specifically limiting.
  • one of the first training samples is sequentially obtained from the plurality of first training samples, and the obtained first training sample is used as the target first training sample.
  • the physical state characteristic sample data of the target first training sample is input into the first model to be trained to predict the probability of diabetes type 1, and the predicted probability is used as the first type 1 probability training prediction value.
  • For S24, the method of inputting the first type 1 probability training prediction value and the first type 1 calibration data of the target first training sample into the first loss function for calculation can be taken from the prior art and is not described in detail here.
  • The first loss function is a negative log-likelihood function.
  • For S25, steps S22 to S25 are repeated until the first loss value reaches the first convergence condition or the number of iterations of the first model to be trained reaches the second convergence condition.
  • The first convergence condition means that the first loss values from two adjacent calculations satisfy the Lipschitz condition (Lipschitz continuity condition).
  • The number of iterations of the first model to be trained is the number of times the first model to be trained has been used to calculate the first type 1 probability training prediction value; each calculation increases the number of iterations by 1.
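The training loop of steps S22 to S25 can be sketched in plain Python as per-sample gradient descent on a logistic regression model with a negative log-likelihood loss. Several things here are assumptions: the learning rate, the toy standardized features, and above all the stopping test, which uses a small difference between two adjacent loss values as a stand-in for the Lipschitz-based first convergence condition.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def negative_log_likelihood(p, y, eps=1e-12):
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def predict_type1_probability(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_first_model(samples, lr=0.1, tol=1e-6, max_iters=10000):
    """samples: list of ((onset_age, bmi), label), label 1 = type 1, 0 = type 2."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    prev_loss, iterations = None, 0
    while iterations < max_iters:                     # second convergence condition
        x, y = samples[iterations % len(samples)]     # S22: next sample in order
        p = predict_type1_probability(weights, bias, x)   # S23
        loss = negative_log_likelihood(p, y)          # S24: first loss value
        grad = p - y                                  # gradient of the loss w.r.t. the logit
        weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
        bias -= lr * grad
        iterations += 1                               # one calculation, one iteration
        # Assumed stand-in for the first convergence condition.
        if prev_loss is not None and abs(loss - prev_loss) < tol:
            break
        prev_loss = loss
    return weights, bias, iterations


# Toy, already standardized (onset age, BMI) pairs; not clinical data.
toy_samples = [((-1.0, -1.0), 1), ((1.0, 1.0), 0),
               ((-0.8, -1.2), 1), ((1.2, 0.7), 0)]
weights, bias, iterations = train_first_model(toy_samples)
```

In practice the features would need scaling before training, since raw ages and body mass indices differ in magnitude.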
  • Before the above step of inputting the basic information feature data to be analyzed into the second typing diagnosis model to predict the probability of type 1 diabetes and obtain the second type 1 probability prediction result, the method further includes:
  • S31 Acquire a plurality of second training samples, where each second training sample of the plurality of second training samples includes basic information feature sample data and second type 1 calibration data, and the basic information feature sample data includes current age sample data, gender sample data, place of origin sample data and occupational sample data;
  • S32 Obtain one second training sample from the plurality of second training samples as a target second training sample;
  • S33 Input the basic information feature sample data of the target second training sample into a second model to be trained to predict the probability of type 1 diabetes, and obtain a second type 1 probability training prediction value, wherein the second model to be trained is a model obtained based on a multilayer perceptron network;
  • S34 Input the second type 1 probability training prediction value and the second type 1 calibration data of the target second training sample into a second loss function to calculate a second loss value of the second model to be trained, and update the parameters of the second model to be trained according to the second loss value, the updated second model to be trained being used for the next calculation of the second type 1 probability training prediction value;
  • S35 Repeat the step of obtaining one second training sample from the plurality of second training samples as the target second training sample until the second loss value reaches a third convergence condition or the number of iterations of the second model to be trained reaches a fourth convergence condition, and determine the second model to be trained at that point as the second typing diagnosis model.
  • In this embodiment, a plurality of second training samples are used to train the second model to be trained, which is obtained based on the multilayer perceptron network, and the second model to be trained after training is used as the second typing diagnosis model.
  • The basic information feature sample data of the second training sample includes current age sample data, gender sample data, place of origin sample data and occupational sample data, so that the model learns the relationship between current age, gender, place of origin, occupation and type 1 diabetes.
  • For S31, the plurality of second training samples may be obtained from a database, entered by a user, or sent by a third-party application system.
  • The current age sample data is the age of the patient at the time the current age sample data was determined.
  • The gender sample data is the gender of the patient.
  • The place of origin sample data is the patient's place of origin, and the occupational sample data is the patient's occupation.
  • Each second training sample includes one item of basic information feature sample data and one item of second type 1 calibration data.
  • The second type 1 calibration data is a calibration value for type 1 diabetes assigned by an expert according to the basic information feature sample data.
  • When the second type 1 calibration data is 1, the diabetic patient corresponding to the training sample has type 1 diabetes; when the second type 1 calibration data is 0, the diabetic patient corresponding to the training sample has type 2 diabetes. This example is not specifically limiting.
  • one second training sample is sequentially obtained from the plurality of second training samples, and the obtained second training sample is used as a target second training sample.
  • For S34, the method of inputting the second type 1 probability training prediction value and the second type 1 calibration data of the target second training sample into the second loss function for calculation can be taken from the prior art and is not described in detail here.
  • The second loss function can be any loss function suitable for training a multilayer perceptron network.
  • For S35, steps S32 to S35 are repeated until the second loss value reaches the third convergence condition or the number of iterations of the second model to be trained reaches the fourth convergence condition.
  • The third convergence condition means that the second loss values from two adjacent calculations satisfy the Lipschitz condition (Lipschitz continuity condition).
  • The number of iterations of the second model to be trained is the number of times the second model to be trained has been used to calculate the second type 1 probability training prediction value; each calculation increases the number of iterations by 1.
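Steps S32 to S35 can be sketched with a very small multilayer perceptron written in plain Python. Everything specific below is an assumption made for illustration: the one-hot encoding and its category lists, the single hidden layer of four tanh units, the binary cross-entropy loss, the learning rate, and the adjacent-loss-difference test standing in for the third convergence condition.

```python
import math
import random

# Hypothetical category lists used only to one-hot encode the toy data.
GENDERS = ["female", "male"]
ORIGINS = ["north", "south"]
OCCUPATIONS = ["office", "manual"]


def one_hot(value, categories):
    return [1.0 if value == c else 0.0 for c in categories]


def encode(current_age, gender, origin, occupation):
    return ([current_age / 100.0] + one_hot(gender, GENDERS)
            + one_hot(origin, ORIGINS) + one_hot(occupation, OCCUPATIONS))


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyMLP:
    def __init__(self, n_in, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        prob = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, prob

    def train_step(self, x, y, lr):
        hidden, prob = self.forward(x)
        clipped = min(max(prob, 1e-12), 1.0 - 1e-12)
        loss = -(y * math.log(clipped) + (1 - y) * math.log(1.0 - clipped))
        d_out = prob - y                      # gradient w.r.t. the output logit
        for j, h in enumerate(hidden):
            # Back-propagate through tanh before the output weight changes.
            d_hidden = d_out * self.w2[j] * (1.0 - h * h)
            self.w2[j] -= lr * d_out * h
            self.b1[j] -= lr * d_hidden
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hidden * xi
        self.b2 -= lr * d_out
        return loss


def train_second_model(samples, lr=0.5, tol=1e-6, max_iters=3000):
    model = TinyMLP(n_in=len(samples[0][0]))
    prev_loss = None
    for iteration in range(1, max_iters + 1):            # fourth convergence condition
        x, y = samples[(iteration - 1) % len(samples)]   # S32
        loss = model.train_step(x, y, lr)                # S33 and S34
        if prev_loss is not None and abs(loss - prev_loss) < tol:
            break                                        # assumed third convergence condition
        prev_loss = loss
    return model, iteration


# Toy samples, not clinical data: the label depends on age only.
toy_samples = [(encode(12, "female", "north", "office"), 1),
               (encode(60, "female", "north", "office"), 0),
               (encode(15, "male", "south", "manual"), 1),
               (encode(65, "male", "south", "manual"), 0)]
model, iterations = train_second_model(toy_samples)
```

A real implementation would normally use an established neural network library; the hand-written version only makes the per-sample update of S34 visible.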
  • Before the above step of inputting the test and examination feature data to be analyzed into the third typing diagnosis model to predict the probability of type 1 diabetes and obtain the third type 1 probability prediction result, the method further includes:
  • S41 Acquire a plurality of third training samples, where each third training sample includes test and examination feature sample data and third type 1 calibration data, and the test and examination feature sample data includes blood test feature sample data, blood pressure measurement feature sample data and waist circumference feature sample data;
  • S42 Using a preset division rule, divide the plurality of third training samples to obtain a training set and a test set;
  • S43 Using the XGBoost method, establish an ensemble learning classifier according to the test and examination feature sample data, and use the ensemble learning classifier as a third model to be trained, wherein the third model to be trained includes a plurality of classification and regression trees to be trained and the relationship between the classification and regression trees to be trained;
  • S44 Use the training set to train the third model to be trained, adjusting the test and examination features and target thresholds in the nodes of the plurality of classification and regression trees to be trained in the third model to be trained until a training end condition is reached, and use the third model to be trained at the end of training as a third model to be tested;
  • In this embodiment, the test and examination feature sample data includes blood test feature sample data, blood pressure measurement feature sample data and waist circumference feature sample data, so that the model learns the relationship between blood test features, blood pressure measurement features, waist circumference features and type 1 diabetes.
  • multiple third training samples may be obtained from the database, multiple third training samples input by the user may also be obtained, and multiple third training samples sent by a third-party application system may also be obtained.
  • the blood test characteristic sample data includes but is not limited to: fasting blood glucose sample data, two-hour postprandial blood glucose sample data, glycosylated hemoglobin sample data, triglyceride sample data, and cholesterol sample data.
  • the blood pressure measurement characteristic sample data includes but is not limited to: diastolic blood pressure sample data and systolic blood pressure sample data.
  • The waist circumference feature sample data is the waist circumference of the patient at the time the test and examination feature sample data was determined.
  • Each third training sample includes one item of test and examination feature sample data and one item of third type 1 calibration data.
  • The third type 1 calibration data is a calibration value for type 1 diabetes assigned by an expert according to the test and examination feature sample data.
  • When the third type 1 calibration data is 1, the diabetic patient corresponding to the training sample has type 1 diabetes; when the third type 1 calibration data is 0, the diabetic patient corresponding to the training sample has type 2 diabetes. This example is not specifically limiting.
  • For S42, the third training samples in the plurality of third training samples are divided into two sets, namely the training set and the test set.
  • The preset division rule may be a preset division ratio. For example, with a training-to-test ratio of 7:3, 70% of the third training samples are assigned to the training set and the remaining 30% are assigned to the test set. This example is not specifically limiting.
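A 7:3 division of this kind can be sketched as follows. Shuffling before the cut and the fixed seed are assumptions added for reproducibility; the text only specifies the ratio.

```python
import random


def split_samples(samples, train_ratio=0.7, seed=42):
    """Divide samples into a training set and a test set by a preset ratio."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    shuffled = list(samples)                 # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)    # assumed: shuffle before cutting
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


training_set, test_set = split_samples(range(10))
```

Every sample lands in exactly one of the two sets, so the test set stays unseen during training.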
  • For S44, a third training sample is extracted from the training set as the target third training sample; the target third training sample is input into the third model to be trained to predict the probability of type 1 diabetes, and the predicted probability is used as the third type 1 probability training prediction value corresponding to the target third training sample; the third type 1 probability training prediction value and the third type 1 calibration data of the target third training sample are then used to train the third model to be trained. When the training end condition has not been reached, the test and examination features and target thresholds in the nodes of the plurality of classification and regression trees to be trained are adjusted, and the relationship between the classification and regression trees to be trained is adjusted; when the training end condition is reached, the third model to be trained at the end of training is used as the third model to be tested.
  • The training end condition means that the third type 1 probability training prediction value corresponding to the target third training sample is consistent with the third type 1 calibration data.
  • A preset test standard is obtained, and the third model to be tested is tested using the test set and the preset test standard. If the preset test standard is met, the model test result is determined to be a success; if the preset test standard is not met, the model test result is determined to be a failure.
  • For step S46, a model test result of failure means that the third model to be tested does not meet the preset requirements. In that case, step S44 is repeated to retrain and retest until the model test result is a success.
  • the third model to be tested whose model test result is successful is used as the third typing diagnosis model.
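The test-and-retrain cycle described above can be sketched as a loop around a training function. The preset test standard is not detailed here, so the sketch assumes a minimum accuracy at a 0.5 probability cutoff; the cap on retraining rounds and the toy training function are also assumptions.

```python
def accuracy(predict, test_set, cutoff=0.5):
    # Fraction of test samples whose thresholded prediction matches the label.
    hits = sum(1 for features, label in test_set
               if (predict(features) >= cutoff) == (label == 1))
    return hits / len(test_set)


def train_until_test_passes(train_fn, train_set, test_set,
                            min_accuracy=0.8, max_rounds=5):
    """Retrain until the assumed test standard (minimum accuracy) is met."""
    for round_index in range(1, max_rounds + 1):
        predict = train_fn(train_set, round_index)          # training, as in S44
        if accuracy(predict, test_set) >= min_accuracy:     # model test result: success
            return predict, round_index
    raise RuntimeError("model never met the preset test standard")


def toy_train_fn(train_set, round_index):
    # Hypothetical trainer: the first round yields a useless model,
    # later rounds yield a simple threshold rule.
    if round_index == 1:
        return lambda features: 0.0
    return lambda features: 1.0 if features[0] > 7.0 else 0.0


# Toy test set of (features, label) pairs; not clinical data.
toy_test_set = [([9.0], 1), ([8.2], 1), ([5.5], 0), ([6.0], 0)]
model, rounds = train_until_test_passes(toy_train_fn, [], toy_test_set)
```

The round cap is a practical guard so the loop cannot run forever if the standard is never met.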
  • The above step of using the XGBoost method to establish an ensemble learning classifier according to the test and examination feature sample data, and using the ensemble learning classifier as the third model to be trained, includes:
  • S431 Obtain feature names from the test and examination feature sample data to obtain the test and examination features corresponding to the test and examination feature sample data;
  • S432 Using the XGBoost method, perform feature splitting according to the test and examination features corresponding to the test and examination feature sample data to generate classification and regression trees and obtain the ensemble learning classifier, wherein the ensemble learning classifier includes the plurality of classification and regression trees to be trained and the relationship between the classification and regression trees to be trained;
  • Each classification and regression tree to be trained includes a plurality of nodes, each node holds one of the test and examination features corresponding to the test and examination feature sample data together with the target threshold corresponding to that feature, and the relationship between the classification and regression trees to be trained is that each later tree fits the prediction residual of the previous tree.
  • This embodiment uses the XGBoost method to establish an ensemble learning classifier according to the test and examination feature sample data, so that the third model to be trained can automatically handle missing test and examination values and has the advantages of being fast, efficient and fault-tolerant.
  • a feature name is obtained from the inspection and inspection feature sample data, and the extracted feature name is used as the inspection and inspection feature corresponding to each of the inspection and inspection feature sample data.
  • The test and examination feature sample data includes: blood test feature sample data, blood pressure measurement feature sample data, and waist circumference feature sample data.
  • The feature names obtained from the test and examination feature sample data include: blood test features, blood pressure measurement features, and waist circumference features.
  • Blood test features include: fasting blood glucose, two-hour postprandial blood glucose, glycated hemoglobin, triglycerides, and cholesterol.
  • Blood pressure measurement features include: diastolic blood pressure and systolic blood pressure.
  • The test and examination features corresponding to the test and examination feature sample data include: fasting blood glucose, two-hour postprandial blood glucose, glycated hemoglobin, triglycerides, diastolic blood pressure, systolic blood pressure, and waist circumference, which are not specifically limited here.
  • the method of performing feature splitting to generate a classification and regression tree according to the inspection and inspection features corresponding to the inspection and inspection feature sample data can be determined from the prior art, which will not be repeated here.
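The relationship in which each later tree fits the prediction residual of the previous one can be illustrated with a deliberately simplified sketch. It is not XGBoost itself: it uses one-node trees (a feature index plus a target threshold) and plain squared-error residuals, whereas XGBoost uses gradient statistics and regularization. All names (`Stump`, `fit_stump`, `BoostedStumps`, `fit_boosted`) and the toy values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stump:
    """A one-node tree: one feature index plus one target threshold."""
    feature: int
    threshold: float
    left: float   # output when x[feature] <= threshold
    right: float  # output when x[feature] > threshold

    def predict(self, x: Sequence[float]) -> float:
        return self.left if x[self.feature] <= self.threshold else self.right


def fit_stump(X: List[Sequence[float]], residuals: List[float]) -> Stump:
    """Pick the (feature, threshold) split with the lowest squared error."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            lo = [r for row, r in zip(X, residuals) if row[f] <= t]
            hi = [r for row, r in zip(X, residuals) if row[f] > t]
            left = sum(lo) / len(lo)  # lo is never empty: t is an observed value
            right = sum(hi) / len(hi) if hi else 0.0
            err = sum((r - (left if row[f] <= t else right)) ** 2
                      for row, r in zip(X, residuals))
            if err < best_err:
                best, best_err = Stump(f, t, left, right), err
    return best


@dataclass
class BoostedStumps:
    base: float
    learning_rate: float
    trees: List[Stump]

    def predict(self, x: Sequence[float]) -> float:
        return self.base + self.learning_rate * sum(t.predict(x) for t in self.trees)


def fit_boosted(X, y, n_trees: int = 20, learning_rate: float = 0.5) -> BoostedStumps:
    """Each new stump is fitted to the residual left by the stumps before it."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    trees: List[Stump] = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        tree = fit_stump(X, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree.predict(row) for p, row in zip(preds, X)]
    return BoostedStumps(base, learning_rate, trees)
```

With toy rows such as `[[5.0, 80.0], [6.0, 85.0], [11.0, 70.0], [12.0, 72.0]]` and labels `[0, 0, 1, 1]` (made-up numbers, not clinical data), the residual shrinks with every added stump, which is the behavior the paragraph above describes.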
  • the above-mentioned steps of using the test set to test the third model to be tested to obtain a model test result include:
  • S451: Input the test and examination feature sample data corresponding to each third training sample in the test set into the third model to be tested for calculation, to obtain the type 1 probability test prediction value corresponding to each third training sample in the test set;
  • S452: Perform an accuracy calculation according to the type 1 probability test prediction value corresponding to each third training sample in the test set and the third type 1 calibration data, to obtain a target accuracy rate;
  • S454: Determine whether the target accuracy rate is greater than the accuracy threshold, to obtain an accuracy judgment result;
  • The test set is used to test the third model to be tested, so that only a qualified third model to be tested is used as the third typing diagnosis model, which improves the accuracy of the type 1 probability prediction for diabetes.
  • The third model to be tested performs the calculation to obtain the type 1 probability test prediction value corresponding to the third training sample to be predicted; the step of obtaining one third training sample from the test set as the third training sample to be predicted is repeated until the type 1 probability test prediction value corresponding to each third training sample in the test set has been determined.
  • The number of accurate predictions corresponding to the third model to be tested is obtained; the third training samples in the test set are counted to obtain the total number of samples corresponding to the test set; the number of accurate predictions is divided by the total number of samples corresponding to the test set to obtain the target accuracy rate corresponding to the third model to be tested.
  • The accuracy threshold may be obtained from a database, input by a user, sent by a third-party application system, or written into the program file that implements this application.
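The accuracy test above reduces to a ratio and a comparison. The sketch below assumes that a prediction counts as accurate when the predicted probability, cut at a hypothetical decision threshold of 0.5, matches the 0/1 calibration value; the source excerpt does not spell out that rule, and the function names are illustrative.

```python
from typing import Sequence


def target_accuracy(
    probabilities: Sequence[float],
    labels: Sequence[int],
    decision_threshold: float = 0.5,  # assumed cut-off, not given in the source
) -> float:
    """Number of accurate predictions divided by the total number of test samples."""
    if not probabilities or len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must be non-empty and equal in length")
    correct = sum(
        1
        for p, y in zip(probabilities, labels)
        if (1 if p >= decision_threshold else 0) == y
    )
    return correct / len(labels)


def model_test_succeeds(
    probabilities: Sequence[float],
    labels: Sequence[int],
    accuracy_threshold: float,
) -> bool:
    """The test succeeds only when accuracy is strictly greater than the threshold."""
    return target_accuracy(probabilities, labels) > accuracy_threshold
```

The comparison is strict, matching the wording "greater than the accuracy threshold": a model whose accuracy equals the threshold fails the test.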
  • The above step of determining the type 1 probability of diabetes according to the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result, to obtain the target type 1 probability prediction result corresponding to the target patient, includes:
  • S52: Input the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result into the preset weighted sum model for weighted summation, to obtain the target type 1 probability prediction result corresponding to the target patient.
  • The first, second, and third type 1 probability prediction results are fused by weighted summation, which improves prediction accuracy and helps doctors type diabetes accurately.
  • The preset weighted sum model may be obtained from a database, input by a user, or sent by a third-party application system.
  • The first, second, and third type 1 probability prediction results are input into the preset weighted sum model for weighted summation, and the probability output by the preset weighted sum model is used as the target type 1 probability prediction result corresponding to the target patient.
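A weighted sum of the three model outputs can be sketched as follows. The default weights are placeholders, since the source does not give the values used by the preset weighted sum model, and the requirement that the weights be non-negative and sum to 1 is an assumption made here so that the fused value stays a probability.

```python
import math
from typing import Sequence


def fuse_type1_probabilities(
    probabilities: Sequence[float],
    weights: Sequence[float] = (0.3, 0.3, 0.4),  # hypothetical weights
) -> float:
    """Weighted sum of the first, second, and third type 1 probability predictions."""
    if len(probabilities) != len(weights):
        raise ValueError("one weight is required per probability")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * p for w, p in zip(weights, probabilities))
```

For instance, `fuse_type1_probabilities((0.8, 0.6, 0.9), (0.2, 0.3, 0.5))` gives 0.16 + 0.18 + 0.45, that is, approximately 0.79.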
  • The present application also proposes an apparatus for predicting the typing probability of diabetes, which includes:
  • the data acquisition module 100 is used for acquiring the characteristic data of the physical state to be analyzed, the characteristic data of the basic information to be analyzed, and the characteristic data of inspection and examination to be analyzed;
  • the first prediction module 200 is configured to input the to-be-analyzed body state characteristic data into a first typing diagnosis model to perform probability prediction of type 1 diabetes, and obtain a first type 1 probability prediction result;
  • the second prediction module 300 is configured to input the basic information characteristic data to be analyzed into a second typing diagnosis model to perform probability prediction of type 1 diabetes, and obtain a second type 1 probability prediction result;
  • the third prediction module 400 is configured to input the to-be-analyzed test inspection feature data into a third typing diagnosis model to perform probability prediction of type 1 diabetes, and obtain a third type 1 probability prediction result;
  • the type 1 probability determination module 500 is configured to determine the type 1 probability of diabetes according to the first type 1 probability prediction result, the second type 1 probability prediction result and the third type 1 probability prediction result, to obtain the The target type 1 probability prediction result corresponding to the target patient.
  • The first type 1 probability prediction result is obtained; the basic information feature data to be analyzed is input into the second typing diagnosis model to predict the probability of type 1 diabetes, obtaining the second type 1 probability prediction result; the test and examination feature data to be analyzed is input into the third typing diagnosis model to predict the probability of type 1 diabetes, obtaining the third type 1 probability prediction result.
  • The type 1 probability of diabetes is then determined according to the first, second, and third type 1 probability prediction results, obtaining the target type 1 probability prediction result corresponding to the target patient. The type 1 probability is thus first predicted separately from three different types of diabetes-related features and then determined from the three prediction results; fusing the prediction results of three different models improves prediction accuracy and helps doctors type diabetes accurately.
  • the above-mentioned apparatus further includes: a first classification diagnosis model training module, the first classification diagnosis model training module includes: a first sample sub-acquisition module and a first training sub-module;
  • the first sample sub-acquisition module is used to obtain a plurality of first training samples, and the first training samples of the plurality of first training samples include: physical state feature sample data and first type 1 calibration data, wherein , the physical state characteristic sample data includes: age of onset sample data and body mass index sample data;
  • the first training sub-module is configured to obtain one of the first training samples from the plurality of first training samples as a target first training sample, and use the body state feature sample data of the target first training sample Input the first model to be trained to perform probability prediction of type 1 diabetes, and obtain the first type 1 probability training prediction value, wherein the first model to be trained is a model obtained based on a logistic regression model, and the first The predicted value of type 1 probability training and the first type 1 calibration data of the target first training sample are input into the first loss function for calculation, and the first loss value of the first model to be trained is obtained.
  • The parameters of the first model to be trained are updated according to the first loss value, and the updated first model to be trained is used for the next calculation of the first type 1 probability training prediction value; the step of obtaining one first training sample from the plurality of first training samples as the target first training sample is repeated until the first loss value reaches the first convergence condition or the number of iterations of the first model to be trained reaches the second convergence condition; the first model to be trained whose first loss value reaches the first convergence condition or whose number of iterations reaches the second convergence condition is determined as the first typing diagnosis model.
  • the above device further includes: a second typing diagnosis model training module, the second typing diagnosis model training module includes: a second sample sub-acquisition module and a second training sub-module;
  • the second sample sub-acquisition module is used to obtain a plurality of second training samples, and the second training samples of the plurality of second training samples include: basic information feature sample data and second type 1 calibration data, wherein , the basic information characteristic sample data includes: age of onset sample data, gender sample data, place of origin sample data and occupational sample data;
  • the second training sub-module is configured to obtain one second training sample from the plurality of second training samples as a target second training sample, and use the basic information feature sample data of the target second training sample Input the second model to be trained to perform probability prediction of diabetes type 1, and obtain the second type 1 probability training prediction value, wherein the second model to be trained is a model obtained based on a multilayer perceptron network, and the The second type 1 probability training prediction value and the second type 1 calibration data of the target second training sample are input into the second loss function for calculation, and the second loss value of the second model to be trained is obtained.
  • The parameters of the second model to be trained are updated according to the second loss value, and the updated second model to be trained is used for the next calculation of the second type 1 probability training prediction value; the step of obtaining one second training sample from the plurality of second training samples as the target second training sample is repeated until the second loss value reaches the third convergence condition or the number of iterations of the second model to be trained reaches the fourth convergence condition.
  • The second model to be trained whose second loss value reaches the third convergence condition or whose number of iterations reaches the fourth convergence condition is determined as the second typing diagnosis model.
  • the above-mentioned apparatus further includes: a third classification diagnosis model training module, the third classification diagnosis model training module includes: a third sample sub-acquisition module, and a third training sub-module;
  • the third sample sub-acquisition module is used to obtain a plurality of third training samples, the third training samples include: inspection and inspection feature sample data and third type 1 calibration data, wherein the inspection and inspection feature sample data includes : blood test feature sample data, blood pressure measurement feature sample data, waist circumference feature sample data;
  • the third training sub-module is used to divide the plurality of third training samples by using a preset division rule to obtain a training set and a test set, and use the XGBoost method to establish an integration according to the test and check feature sample data.
  • the model test result is a failure
  • the test check feature and target threshold in the node of until the training end condition is reached, the step of using the third model to be trained at the end of the training as the third model to be tested, until the model test result is successful, the The third model to be tested is used as the third typing diagnostic model.
  • the above-mentioned third training submodule includes: a third model determination unit to be trained;
  • The third model determination unit to be trained is configured to obtain feature names from the test and examination feature sample data to obtain the test and examination features corresponding to each item of test and examination feature sample data, and, using the XGBoost method, to perform feature splitting according to those features to generate classification and regression trees and obtain the ensemble learning classifier.
  • The ensemble learning classifier includes the multiple classification and regression trees to be trained and the relationship between the classification and regression trees to be trained; the ensemble learning classifier is used as the third model to be trained;
  • Each classification and regression tree to be trained includes a plurality of nodes, and each node holds a test and examination feature corresponding to the test and examination feature sample data together with the target threshold corresponding to that feature. The relationship between the classification and regression trees to be trained is that each later tree fits the prediction residual of the previous tree.
  • the above-mentioned third training submodule includes: a model training unit and a model verification unit;
  • The model training unit is configured to input the test and examination feature sample data corresponding to each third training sample in the test set into the third model to be tested for calculation, to obtain the type 1 probability test prediction value corresponding to each third training sample in the test set;
  • The model verification unit is configured to perform an accuracy calculation according to the type 1 probability test prediction value corresponding to each third training sample in the test set and the third type 1 calibration data to obtain a target accuracy rate, obtain the accuracy threshold, and determine whether the target accuracy rate is greater than the accuracy threshold to obtain an accuracy judgment result; when the accuracy judgment result is "greater than", the model test result is determined to be successful, otherwise the model test result is determined to be a failure.
  • the above-mentioned Type 1 probability determination module includes: a weighted summation submodule;
  • The weighted summation sub-module is configured to obtain a preset weighted sum model, and to input the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result into the preset weighted sum model for weighted summation, to obtain the target type 1 probability prediction result corresponding to the target patient.
  • an embodiment of the present application further provides a computer device.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 3 .
  • The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium, an internal memory.
  • the nonvolatile storage medium stores an operating system, a computer program, and a database.
  • the memory provides an environment for the execution of the operating system and computer programs in the non-volatile storage medium.
  • The database of the computer device is used to store data used by the method for predicting the typing probability of diabetes.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program when executed by the processor, implements a method for predicting the probability of typing of diabetes.
  • The method for predicting the typing probability of diabetes includes: acquiring the physical state feature data to be analyzed, the basic information feature data to be analyzed, and the test and examination feature data to be analyzed; inputting the physical state feature data to be analyzed into the first typing diagnosis model to predict the probability of type 1 diabetes, obtaining the first type 1 probability prediction result; inputting the basic information feature data to be analyzed into the second typing diagnosis model to predict the probability of type 1 diabetes, obtaining the second type 1 probability prediction result; inputting the test and examination feature data to be analyzed into the third typing diagnosis model to predict the probability of type 1 diabetes, obtaining the third type 1 probability prediction result; and determining the type 1 probability of diabetes according to the first, second, and third type 1 probability prediction results, obtaining the target type 1 probability prediction result corresponding to the target patient.
  • The first type 1 probability prediction result is obtained; the basic information feature data to be analyzed is input into the second typing diagnosis model to predict the probability of type 1 diabetes, obtaining the second type 1 probability prediction result; the test and examination feature data to be analyzed is input into the third typing diagnosis model to predict the probability of type 1 diabetes, obtaining the third type 1 probability prediction result.
  • The type 1 probability of diabetes is then determined according to the first, second, and third type 1 probability prediction results, obtaining the target type 1 probability prediction result corresponding to the target patient. The type 1 probability is thus first predicted separately from three different types of diabetes-related features and then determined from the three prediction results; fusing the prediction results of three different models improves prediction accuracy and helps doctors type diabetes accurately.
  • An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements a method for predicting the typing probability of diabetes, including the steps of: acquiring the target patient's physical state feature data to be analyzed, basic information feature data to be analyzed, and test and examination feature data to be analyzed; inputting the physical state feature data to be analyzed into the first typing diagnosis model to predict the probability of type 1 diabetes, obtaining the first type 1 probability prediction result; inputting the basic information feature data to be analyzed into the second typing diagnosis model to predict the probability of type 1 diabetes, obtaining the second type 1 probability prediction result; inputting the test and examination feature data to be analyzed into the third typing diagnosis model to predict the probability of type 1 diabetes, obtaining the third type 1 probability prediction result; and determining the type 1 probability of diabetes according to the first, second, and third type 1 probability prediction results, obtaining the target type 1 probability prediction result corresponding to the target patient.
  • In the above method for predicting the typing probability of diabetes, the physical state feature data to be analyzed is first input into the first typing diagnosis model to predict the probability of type 1 diabetes, obtaining the first type 1 probability prediction result.
  • The basic information feature data to be analyzed is input into the second typing diagnosis model to predict the probability of type 1 diabetes, obtaining the second type 1 probability prediction result.
  • The test and examination feature data to be analyzed is input into the third typing diagnosis model to predict the probability of type 1 diabetes, obtaining the third type 1 probability prediction result; the type 1 probability of diabetes is then determined according to the first, second, and third type 1 probability prediction results, obtaining the target type 1 probability prediction result corresponding to the target patient.
  • The type 1 probability is thus first predicted separately from three different types of diabetes-related features and then determined from the three prediction results.
  • Fusing the prediction results of three different models improves prediction accuracy and helps doctors type diabetes accurately.
  • the computer-readable storage medium may be non-volatile or volatile.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).


Abstract

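This application relates to the field of digital healthcare technology and discloses a method, apparatus, device, and storage medium for predicting the typing probability of diabetes. The apparatus inputs the physical state feature data to be analyzed into a first typing diagnosis model to predict the probability of type 1, obtaining a first type 1 probability prediction result; inputs the basic information feature data to be analyzed into a second typing diagnosis model to predict the probability of type 1, obtaining a second type 1 probability prediction result; inputs the test and examination feature data to be analyzed into a third typing diagnosis model to predict the probability of type 1, obtaining a third type 1 probability prediction result; and determines the type 1 probability from the first, second, and third type 1 probability prediction results, obtaining the target type 1 probability prediction result corresponding to the target patient. Fusing the prediction results of three different models improves prediction accuracy and helps doctors type diabetes accurately.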

Description

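Method, apparatus, device, and storage medium for predicting the typing probability of diabetes
This application claims priority to the Chinese patent application filed with the China Patent Office on March 26, 2021, with application number 202110328104.5 and titled "Method, apparatus, device, and storage medium for predicting the typing probability of diabetes", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of digital healthcare technology, and in particular to a method, apparatus, device, and storage medium for predicting the typing probability of diabetes.
Background
Diabetes is classified into type 1 and type 2. Although both types present clinically as hyperglycemia, they must be treated differently: type 1 diabetes is an absolute insulin deficiency and therefore always requires insulin therapy, whereas type 2 diabetes involves insulin resistance and can be treated with a variety of drugs. Typing patients diagnosed with diabetes as type 1 or type 2 is therefore necessary and important. The inventor recognized that the prior art relies on doctors combining clinical guidelines with personal practical experience to type diabetes, which hinders accurate diabetes typing.
Technical Problem
This application aims to solve the technical problem that the prior art relies on doctors combining clinical guidelines with personal practical experience to type diabetes, which hinders accurate diabetes typing.
Technical Solution
The main purpose of this application is to provide a method, apparatus, device, and storage medium for predicting the typing probability of diabetes, aiming to solve the technical problem that the prior art relies on doctors combining clinical guidelines with personal practical experience to type diabetes, which hinders accurate diabetes typing.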
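To achieve the above object, this application proposes an apparatus for predicting the typing probability of diabetes, the apparatus comprising:
a data acquisition module, configured to acquire the target patient's physical state feature data to be analyzed, basic information feature data to be analyzed, and test and examination feature data to be analyzed;
a first prediction module, configured to input the physical state feature data to be analyzed into a first typing diagnosis model to predict the probability of type 1 diabetes, obtaining a first type 1 probability prediction result;
a second prediction module, configured to input the basic information feature data to be analyzed into a second typing diagnosis model to predict the probability of type 1 diabetes, obtaining a second type 1 probability prediction result;
a third prediction module, configured to input the test and examination feature data to be analyzed into a third typing diagnosis model to predict the probability of type 1 diabetes, obtaining a third type 1 probability prediction result;
a type 1 probability determination module, configured to determine the type 1 probability of diabetes according to the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result, obtaining the target type 1 probability prediction result corresponding to the target patient.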
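This application also proposes a method for predicting the typing probability of diabetes, the method comprising:
acquiring the target patient's physical state feature data to be analyzed, basic information feature data to be analyzed, and test and examination feature data to be analyzed;
inputting the physical state feature data to be analyzed into a first typing diagnosis model to predict the probability of type 1 diabetes, obtaining a first type 1 probability prediction result;
inputting the basic information feature data to be analyzed into a second typing diagnosis model to predict the probability of type 1 diabetes, obtaining a second type 1 probability prediction result;
inputting the test and examination feature data to be analyzed into a third typing diagnosis model to predict the probability of type 1 diabetes, obtaining a third type 1 probability prediction result;
determining the type 1 probability of diabetes according to the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result, obtaining the target type 1 probability prediction result corresponding to the target patient.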
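This application also proposes a computer device, comprising a memory and a processor, the memory storing a computer program, and the processor implementing the following method steps when executing the computer program:
acquiring the target patient's physical state feature data to be analyzed, basic information feature data to be analyzed, and test and examination feature data to be analyzed;
inputting the physical state feature data to be analyzed into a first typing diagnosis model to predict the probability of type 1 diabetes, obtaining a first type 1 probability prediction result;
inputting the basic information feature data to be analyzed into a second typing diagnosis model to predict the probability of type 1 diabetes, obtaining a second type 1 probability prediction result;
inputting the test and examination feature data to be analyzed into a third typing diagnosis model to predict the probability of type 1 diabetes, obtaining a third type 1 probability prediction result;
determining the type 1 probability of diabetes according to the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result, obtaining the target type 1 probability prediction result corresponding to the target patient.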
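This application also proposes a computer-readable storage medium on which a computer program is stored, the computer program implementing the following method steps when executed by a processor:
acquiring the target patient's physical state feature data to be analyzed, basic information feature data to be analyzed, and test and examination feature data to be analyzed;
inputting the physical state feature data to be analyzed into a first typing diagnosis model to predict the probability of type 1 diabetes, obtaining a first type 1 probability prediction result;
inputting the basic information feature data to be analyzed into a second typing diagnosis model to predict the probability of type 1 diabetes, obtaining a second type 1 probability prediction result;
inputting the test and examination feature data to be analyzed into a third typing diagnosis model to predict the probability of type 1 diabetes, obtaining a third type 1 probability prediction result;
determining the type 1 probability of diabetes according to the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result, obtaining the target type 1 probability prediction result corresponding to the target patient.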
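Beneficial Effects
In the method, apparatus, device, and storage medium of this application for predicting the typing probability of diabetes, the physical state feature data to be analyzed is first input into the first typing diagnosis model to predict the probability of type 1 diabetes, obtaining the first type 1 probability prediction result; the basic information feature data to be analyzed is input into the second typing diagnosis model to predict the probability of type 1 diabetes, obtaining the second type 1 probability prediction result; the test and examination feature data to be analyzed is input into the third typing diagnosis model to predict the probability of type 1 diabetes, obtaining the third type 1 probability prediction result; the type 1 probability of diabetes is then determined according to the first, second, and third type 1 probability prediction results, obtaining the target type 1 probability prediction result corresponding to the target patient. The type 1 probability is thus first predicted separately from three different types of diabetes-related features and then determined from the three prediction results; fusing the prediction results of three different models improves prediction accuracy and helps doctors type diabetes accurately.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a method for predicting the typing probability of diabetes according to an embodiment of this application;
FIG. 2 is a schematic structural block diagram of an apparatus for predicting the typing probability of diabetes according to an embodiment of this application;
FIG. 3 is a schematic structural block diagram of a computer device according to an embodiment of this application.
The realization of the purpose, the functional features, and the advantages of this application are further described with reference to the embodiments and the accompanying drawings.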
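Embodiments of the Invention
To make the purpose, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
To solve the technical problem that the prior art relies on doctors combining clinical guidelines with personal practical experience to type diabetes, which hinders accurate diabetes typing, this application proposes a method for predicting the typing probability of diabetes. The method is applied in the field of digital healthcare technology and can also be applied in the field of probabilistic reasoning in artificial intelligence. The method first predicts the type 1 probability of diabetes separately from three different types of diabetes-related features and then determines the type 1 probability from the three prediction results; fusing the prediction results of three different models improves prediction accuracy and helps doctors type diabetes accurately.
Referring to FIG. 1, an embodiment of this application provides a method for predicting the typing probability of diabetes, the method comprising: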
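S1: Acquire the target patient's physical state feature data to be analyzed, basic information feature data to be analyzed, and test and examination feature data to be analyzed;
S2: Input the physical state feature data to be analyzed into a first typing diagnosis model to predict the probability of type 1 diabetes, obtaining a first type 1 probability prediction result;
S3: Input the basic information feature data to be analyzed into a second typing diagnosis model to predict the probability of type 1 diabetes, obtaining a second type 1 probability prediction result;
S4: Input the test and examination feature data to be analyzed into a third typing diagnosis model to predict the probability of type 1 diabetes, obtaining a third type 1 probability prediction result;
S5: Determine the type 1 probability of diabetes according to the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result, obtaining the target type 1 probability prediction result corresponding to the target patient.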
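In this embodiment, the physical state feature data to be analyzed is first input into the first typing diagnosis model to predict the probability of type 1 diabetes, obtaining the first type 1 probability prediction result; the basic information feature data to be analyzed is input into the second typing diagnosis model to predict the probability of type 1 diabetes, obtaining the second type 1 probability prediction result; the test and examination feature data to be analyzed is input into the third typing diagnosis model to predict the probability of type 1 diabetes, obtaining the third type 1 probability prediction result; the type 1 probability of diabetes is then determined according to the first, second, and third type 1 probability prediction results, obtaining the target type 1 probability prediction result corresponding to the target patient. The type 1 probability is thus first predicted separately from three different types of diabetes-related features and then determined from the three prediction results; fusing the prediction results of three different models improves prediction accuracy and helps doctors type diabetes accurately.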
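For S1, the target patient's physical state feature data to be analyzed may be obtained from a database, input by a user, or sent by a third-party application system.
The target patient's basic information feature data to be analyzed may be obtained from a database, input by a user, or sent by a third-party application system.
The target patient's test and examination feature data to be analyzed may be obtained from a database, input by a user, or sent by a third-party application system.
It can be understood that the physical state feature data to be analyzed, the basic information feature data to be analyzed, and the test and examination feature data to be analyzed are feature data of the target patient acquired at the same time.
The target patient is a patient who has diabetes.
The physical state feature data to be analyzed includes, but is not limited to: age-of-onset data and body mass index data. Age-of-onset data is the age at which the target patient's diabetes began. Body mass index data is the target patient's body mass index.
The basic information feature data to be analyzed includes, but is not limited to: current age data, gender data, native place data, and occupation data. Current age data is the target patient's age when the typing probability prediction is performed. Gender data is the target patient's gender. Native place data is the target patient's native place. Occupation data is the target patient's occupation.
The test and examination feature data to be analyzed includes, but is not limited to: blood test feature data, blood pressure measurement feature data, and waist circumference feature data. Blood test feature data is feature data obtained from a blood test of the target patient and includes, but is not limited to: fasting blood glucose data, two-hour postprandial blood glucose data, glycated hemoglobin data, triglyceride data, and cholesterol data. Blood pressure measurement feature data is data obtained by measuring the target patient's blood pressure and includes, but is not limited to: diastolic blood pressure data and systolic blood pressure data. Waist circumference feature data is the target patient's waist circumference when the typing probability prediction is performed. It can be understood that the test and examination feature data to be analyzed uses basic, common, and easily obtained test and examination feature data. In contrast to "basic, common, and easily obtained test and examination feature data" there is "complex, uncommon, and hard-to-obtain test and examination feature data", such as C-peptide (connecting peptide) and genetic testing; complex, uncommon, and hard-to-obtain test and examination feature data is not input into the third typing diagnosis model to predict the probability of type 1 diabetes.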
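For S2, the physical state feature data to be analyzed is input into the first typing diagnosis model to predict the probability of type 1 diabetes, and the predicted probability is used as the first type 1 probability prediction result, where the first typing diagnosis model is a model trained on the basis of a logistic regression model.
For S3, the basic information feature data to be analyzed is input into the second typing diagnosis model to predict the probability of type 1 diabetes, and the predicted probability is used as the second type 1 probability prediction result, where the second typing diagnosis model is a model trained on the basis of a multilayer perceptron network.
For S4, the test and examination feature data to be analyzed is input into the third typing diagnosis model to predict the probability of type 1 diabetes, and the predicted probability is used as the third type 1 probability prediction result, where the third typing diagnosis model is a model trained using the XGBoost (eXtreme Gradient Boosting) method. XGBoost is an optimized distributed gradient boosting library that implements machine learning algorithms under the Gradient Boosting framework.
It can be understood that steps S2 to S4 may all be executed synchronously, may be partly executed synchronously, or may be executed asynchronously, which is not limited here.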
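Because S2 to S4 do not depend on one another, they can run concurrently. The following is a minimal sketch using the Python standard library; the three models are represented by arbitrary callables, and `predict_all` is a hypothetical helper rather than anything named in the source.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence


def predict_all(
    models: Sequence[Callable[[Any], float]],
    inputs: Sequence[Any],
) -> List[float]:
    """Run each model on its own input in parallel and keep the original order."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = [pool.submit(model, x) for model, x in zip(models, inputs)]
        return [f.result() for f in futures]  # first, second, third prediction results
```

Collecting the results in submission order keeps the first, second, and third predictions aligned with their models, whichever finishes first.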
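For S5, the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result are fused, and the fused probability is used as the target type 1 probability prediction result corresponding to the target patient.
The target type 1 probability prediction result is the probability that the target patient has type 1 diabetes. It is used to help doctors type diabetes accurately.
It can be understood that steps S2 to S5 may also be used to predict the probability that the target patient has type 2 diabetes, which is not limited here.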
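In one embodiment, before the above step of inputting the physical state feature data to be analyzed into the first typing diagnosis model to predict the probability of type 1 diabetes and obtaining the first type 1 probability prediction result, the method further includes:
S21: Obtain a plurality of first training samples, each first training sample including: physical state feature sample data and first type 1 calibration data, where the physical state feature sample data includes: age-of-onset sample data and body mass index sample data;
S22: Obtain one first training sample from the plurality of first training samples as the target first training sample;
S23: Input the physical state feature sample data of the target first training sample into the first model to be trained to predict the probability of type 1 diabetes, obtaining a first type 1 probability training prediction value, where the first model to be trained is a model obtained on the basis of a logistic regression model;
S24: Input the first type 1 probability training prediction value and the first type 1 calibration data of the target first training sample into a first loss function for calculation, obtaining a first loss value of the first model to be trained; update the parameters of the first model to be trained according to the first loss value, the updated first model to be trained being used for the next calculation of the first type 1 probability training prediction value;
S25: Repeat the step of obtaining one first training sample from the plurality of first training samples as the target first training sample until the first loss value reaches a first convergence condition or the number of iterations of the first model to be trained reaches a second convergence condition, and determine the first model to be trained whose first loss value reaches the first convergence condition or whose number of iterations reaches the second convergence condition as the first typing diagnosis model.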
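This embodiment uses a plurality of first training samples to train the first model to be trained, which is obtained on the basis of a logistic regression model, and uses the first model to be trained at the end of training as the model obtained from the logistic regression model. The physical state feature sample data of the first training samples includes age-of-onset sample data and body mass index sample data, so the model learns the relationship between age of onset, body mass index, and type 1 diabetes.
For S21, the plurality of first training samples may be obtained from a database, input by a user, or sent by a third-party application system.
Age-of-onset sample data is the age at which diabetes began.
Body mass index sample data is the body mass index at the time the physical state feature sample data of the training sample was determined.
Each first training sample includes one item of physical state feature sample data and one item of first type 1 calibration data.
Within the same first training sample, the first type 1 calibration data is the type 1 diabetes label assigned by expert labeling based on the physical state feature sample data. For example, a first type 1 calibration value of 1 means that the diabetic patient corresponding to the training sample has type 1 diabetes, and a value of 0 means that the patient has type 2 diabetes; this example is not specifically limiting.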
对于S22,依次从所述多个第一训练样本获取一个所述第一训练样本,将获取的所述第一训练样本作为目标第一训练样本。
对于S23,将所述目标第一训练样本的所述身体状态特征样本数据输入待训练的第一模型进行糖尿病的1型的概率预测,将预测得到的概率作为第一1型概率训练预测值。
对于S24,将所述第一1型概率训练预测值、所述目标第一训练样本的所述第一1型标定数据输入第一损失函数进行计算的方法可以从现有技术中确定,在此不做赘述。
可选的,所述第一损失函数采用负对数似然函数。
对于S25,重复执行步骤S22至步骤S25,直至所述第一损失值达到第一收 敛条件或所述待训练的第一模型的迭代次数达到第二收敛条件。
所述第一收敛条件是指相邻两次计算的所述第一损失值的大小满足lipschitz条件(利普希茨连续条件)。
所述待训练的第一模型的迭代次数是指所述待训练的第一模型被用于计算所述第一1型概率训练预测值的次数,也就是说,计算一次,迭代次数增加1。
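Steps S22 to S25 can be sketched as per-sample gradient descent on a logistic regression model with a negative log-likelihood loss. This is an illustrative sketch under stated assumptions, not the implementation of this application: the features are assumed to be pre-scaled to roughly [0, 1], the first convergence condition is approximated as "the absolute change between two adjacent loss values is below `tol`", the second as a cap `max_iters`, and all names and default values are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    """Probability of type 1 diabetes for x = (onset_age, bmi), both pre-scaled."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

def train_logistic(samples, lr=0.1, tol=1e-6, max_iters=10000):
    """samples: list of ((onset_age, bmi), label); label 1 = type 1, 0 = type 2."""
    w, b = [0.0, 0.0], 0.0
    prev_loss, iters, eps = None, 0, 1e-12
    while iters < max_iters:                       # second convergence condition
        x, y = samples[iters % len(samples)]       # S22: take samples in turn
        p = predict_proba(w, b, x)                 # S23: training prediction value
        loss = -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))  # S24
        grad = p - y                               # derivative of the loss w.r.t. the logit
        w[0] -= lr * grad * x[0]
        w[1] -= lr * grad * x[1]
        b -= lr * grad
        iters += 1                                 # one calculation = one iteration
        if prev_loss is not None and abs(loss - prev_loss) < tol:
            break                                  # assumed first convergence condition
        prev_loss = loss
    return w, b, iters
```

Comparing the losses of two different samples is a crude stopping test; averaging the loss over a pass through the data before comparing would be a more stable choice.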
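In one embodiment, before the above step of inputting the basic information feature data to be analyzed into the second typing diagnosis model to predict the probability of type 1 diabetes and obtaining the second type 1 probability prediction result, the method further includes:
S31: acquiring a plurality of second training samples, each second training sample of the plurality of second training samples including basic information feature sample data and second type 1 label data, where the basic information feature sample data includes current age sample data, sex sample data, native place sample data, and occupation sample data;
S32: acquiring one second training sample from the plurality of second training samples as a target second training sample;
S33: inputting the basic information feature sample data of the target second training sample into a second model to be trained to predict the probability of type 1 diabetes, obtaining a second type 1 probability training prediction value, where the second model to be trained is a model obtained based on a multilayer perceptron network;
S34: inputting the second type 1 probability training prediction value and the second type 1 label data of the target second training sample into a second loss function for calculation, obtaining a second loss value of the second model to be trained, and updating the parameters of the second model to be trained according to the second loss value, the updated second model to be trained being used for the next calculation of the second type 1 probability training prediction value;
S35: repeating the step of acquiring one second training sample from the plurality of second training samples as a target second training sample until the second loss value reaches a third convergence condition or the number of iterations of the second model to be trained reaches a fourth convergence condition, and determining the second model to be trained whose second loss value reaches the third convergence condition or whose number of iterations reaches the fourth convergence condition as the second typing diagnosis model.
This embodiment trains the second model to be trained, which is obtained based on a multilayer perceptron network, with a plurality of second training samples, and takes the second model to be trained for which training has ended as the second typing diagnosis model. Because the basic information feature sample data of the second training samples includes current age sample data, sex sample data, native place sample data, and occupation sample data, the model learns the relationship between current age, sex, native place, occupation, and type 1 diabetes.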
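For S31, the plurality of second training samples may be acquired from a database, may be input by a user, or may be sent by a third-party application system.
Current age sample data is the patient's age at the time the current age sample data was determined.
Sex sample data is the patient's sex.
Native place sample data and occupation sample data are the patient's native place and occupation, respectively.
Each second training sample includes one piece of basic information feature sample data and one piece of second type 1 label data.
Within the same second training sample, the second type 1 label data is the type 1 diabetes label value assigned by expert labeling according to the basic information feature sample data. For example, when the second type 1 label data is 1, the diabetic patient corresponding to the training sample has type 1 diabetes; when the second type 1 label data is 0, the diabetic patient corresponding to the training sample has type 2 diabetes. This example is not a specific limitation.
For S32, one second training sample at a time is acquired in turn from the plurality of second training samples, and the acquired second training sample is taken as the target second training sample.
For S33, the basic information feature sample data of the target second training sample is input into the second model to be trained to predict the probability of type 1 diabetes, and the predicted probability is taken as the second type 1 probability training prediction value.
For S34, the method of inputting the second type 1 probability training prediction value and the second type 1 label data of the target second training sample into the second loss function for calculation can be selected from the prior art and is not described further here.
It can be understood that the second loss function may be any loss function used for training multilayer perceptron networks.
For S35, steps S32 to S35 are repeated until the second loss value reaches the third convergence condition or the number of iterations of the second model to be trained reaches the fourth convergence condition.
The third convergence condition means that the second loss values from two adjacent calculations satisfy the Lipschitz condition (Lipschitz continuity condition).
The number of iterations of the second model to be trained is the number of times the second model to be trained has been used to calculate the second type 1 probability training prediction value; that is, each calculation increases the number of iterations by 1.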
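The per-sample update in S33 and S34 can be illustrated with a very small multilayer perceptron. The sketch below rests on assumptions that this application does not specify: one tanh hidden layer, a sigmoid output, a binary cross-entropy loss, and a numeric input vector in which age has been scaled and sex, native place, and occupation have already been encoded as numbers. The class name `TinyMLP` and its methods are hypothetical.

```python
import math
import random

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class TinyMLP:
    """One hidden layer (tanh) followed by a sigmoid output unit."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return (hidden activations, predicted type 1 probability)."""
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, p

    def train_step(self, x, y, lr=0.1):
        """S33 + S34: predict, compute the loss, update parameters; return the loss."""
        h, p = self.forward(x)
        eps = 1e-12
        loss = -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
        d_out = p - y                                   # gradient at the output logit
        for j, hj in enumerate(h):
            d_hid = d_out * self.w2[j] * (1.0 - hj * hj)  # backprop through tanh
            self.w2[j] -= lr * d_out * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hid * xi
            self.b1[j] -= lr * d_hid
        self.b2 -= lr * d_out
        return loss
```

Calling `train_step` once per target second training sample, and stopping on the third or fourth convergence condition, gives the loop described in S35.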
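In one embodiment, before the above step of inputting the laboratory examination feature data to be analyzed into the third typing diagnosis model to predict the probability of type 1 diabetes and obtaining the third type 1 probability prediction result, the method further includes:
S41: acquiring a plurality of third training samples, each third training sample including laboratory examination feature sample data and third type 1 label data, where the laboratory examination feature sample data includes blood test feature sample data, blood pressure measurement feature sample data, and waist circumference feature sample data;
S42: dividing the plurality of third training samples according to a preset division rule, obtaining a training set and a test set;
S43: building an ensemble learning classifier from the laboratory examination feature sample data using the XGBoost method, and taking the ensemble learning classifier as a third model to be trained, where the third model to be trained includes a plurality of classification and regression trees to be trained and the relationship between the classification and regression trees to be trained;
S44: training the third model to be trained with the training set, adjusting the laboratory examination features and target thresholds in the nodes of the plurality of classification and regression trees to be trained in the third model to be trained until a training end condition is reached, and taking the third model to be trained for which training has ended as a third model to be tested;
S45: testing the third model to be tested with the test set, obtaining a model test result;
S46: when the model test result is failure, repeating the step of training the third model to be trained with the training set, adjusting the laboratory examination features and target thresholds in the nodes of the plurality of classification and regression trees to be trained in the third model to be trained until the training end condition is reached, and taking the third model to be trained for which training has ended as the third model to be tested, until the model test result is success;
S47: taking the third model to be tested as the third typing diagnosis model.
This embodiment builds an ensemble learning classifier with the XGBoost method to obtain the third model to be trained, trains and tests it, and takes the model that passes the test as the third typing diagnosis model. Because the laboratory examination feature sample data of the third training samples includes blood test feature sample data, blood pressure measurement feature sample data, and waist circumference feature sample data, the model learns the relationship between blood test features, blood pressure measurement features, waist circumference, and type 1 diabetes.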
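For S41, the plurality of third training samples may be acquired from a database, may be input by a user, or may be sent by a third-party application system.
Blood test feature sample data includes, but is not limited to, fasting blood glucose sample data, two-hour postprandial blood glucose sample data, glycated hemoglobin sample data, triglyceride sample data, and cholesterol sample data.
Blood pressure measurement feature sample data includes, but is not limited to, diastolic pressure sample data and systolic pressure sample data.
Waist circumference feature sample data refers to the patient's waist circumference at the time the laboratory examination feature sample data was determined.
Each third training sample includes one piece of laboratory examination feature sample data and one piece of third type 1 label data.
Within the same third training sample, the third type 1 label data is the type 1 diabetes label value assigned by expert labeling according to the laboratory examination feature sample data. For example, when the third type 1 label data is 1, the diabetic patient corresponding to the training sample has type 1 diabetes; when the third type 1 label data is 0, the diabetic patient corresponding to the training sample has type 2 diabetes. This example is not a specific limitation.
For S42, the third training samples in the plurality of third training samples are divided into two sets according to the preset division rule; the two sets are the training set and the test set.
The preset division rule may be a preset division ratio. For example, with a 7:3 split into training set and test set, 70% of the plurality of third training samples are assigned to the training set and 30% to the test set. This example is not a specific limitation.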
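The 7:3 division rule in S42 can be sketched as follows. The shuffle before the cut and the fixed seed are assumptions added for the example; the text above only specifies the ratio. The function name is hypothetical.

```python
import random

def split_samples(samples, train_ratio=0.7, seed=42):
    """Divide samples into (training set, test set) by a preset ratio."""
    shuffled = list(samples)                 # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)    # assumed: randomize order before cutting
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

With ten samples this yields seven training samples and three test samples, and every sample lands in exactly one of the two sets.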
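For S43, the specific laboratory examination features are first determined from the laboratory examination feature sample data; then, based on all of the determined features, feature splitting is performed with the XGBoost method to generate a plurality of classification and regression trees (CART regression trees), and each generated tree is taken as a classification and regression tree to be trained. The relationship between the classification and regression trees to be trained is determined during the feature splitting that generates the trees.
For S44, one third training sample is extracted from the training set as a target third training sample; the target third training sample is input into the third model to be trained to predict the probability of type 1 diabetes, and the predicted probability is taken as the third type 1 probability training prediction value corresponding to the target third training sample; the third model to be trained is trained with the third type 1 probability training prediction value and the third type 1 label data corresponding to the target third training sample; when the training end condition has not been reached, the laboratory examination features and target thresholds in the nodes of the plurality of classification and regression trees to be trained in the third model to be trained are adjusted, as is the relationship between the classification and regression trees to be trained; when the training end condition is reached, the third model to be trained for which training has ended is taken as the third model to be tested.
Optionally, the training end condition is that the third type 1 probability training prediction value corresponding to the target third training sample agrees with the third type 1 label data.
For S45, a preset test standard is acquired, and the third model to be tested is tested with the test set and the preset test standard; if the preset test standard is met, the model test result is determined to be success, and otherwise the model test result is determined to be failure.
For S46, when the model test result is failure, the third model to be tested does not yet meet the preset requirements; step S44 is then repeated to retrain and retest until the model test result is success.
For S47, the third model to be tested whose model test result is success is taken as the third typing diagnosis model.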
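In one embodiment, the above step of building an ensemble learning classifier from the laboratory examination feature sample data using the XGBoost method and taking the ensemble learning classifier as the third model to be trained includes:
S431: acquiring feature names from the laboratory examination feature sample data, obtaining the laboratory examination features corresponding to the laboratory examination feature sample data;
S432: performing feature splitting with the XGBoost method according to the laboratory examination features corresponding to the laboratory examination feature sample data to generate classification and regression trees, obtaining the ensemble learning classifier, where the ensemble learning classifier includes the plurality of classification and regression trees to be trained and the relationship between the classification and regression trees to be trained;
S433: taking the ensemble learning classifier as the third model to be trained;
where each classification and regression tree to be trained among the plurality of classification and regression trees to be trained includes a plurality of nodes, each node is one of the laboratory examination features corresponding to the laboratory examination feature sample data together with the target threshold corresponding to that feature, and the relationship between the classification and regression trees to be trained is that each later tree fits the prediction residual of the preceding tree.
This embodiment builds the ensemble learning classifier from the laboratory examination feature sample data using the XGBoost method, so that the third model to be trained can handle missing laboratory examination values automatically and has the advantages of being fast, efficient, and fault tolerant.
For S431, feature names are acquired from the laboratory examination feature sample data, and the extracted feature names are taken as the laboratory examination features corresponding to the laboratory examination feature sample data.
For example, suppose the laboratory examination feature sample data includes blood test feature sample data, blood pressure measurement feature sample data, and waist circumference feature sample data. The feature names acquired from it are blood test features, blood pressure measurement features, and the waist circumference feature; the blood test features include fasting blood glucose, two-hour postprandial blood glucose, glycated hemoglobin, triglycerides, and cholesterol, and the blood pressure measurement features include diastolic pressure and systolic pressure. The laboratory examination features corresponding to the laboratory examination feature sample data are then fasting blood glucose, two-hour postprandial blood glucose, glycated hemoglobin, triglycerides, cholesterol, diastolic pressure, systolic pressure, and waist circumference. This example is not a specific limitation.
For S432, the method of performing feature splitting with the XGBoost method according to the laboratory examination features to generate classification and regression trees can be determined from the prior art and is not described further here.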
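The relationship "each later tree fits the prediction residual of the preceding tree", with nodes made of a feature and a threshold, can be illustrated with a deliberately simplified boosting loop. The sketch below is not XGBoost: it uses one-split regression stumps and a plain squared-error residual, whereas XGBoost uses a regularized second-order objective and deeper trees. All function names and the shrinkage value are hypothetical, and the sketch assumes at least one feature takes two or more distinct values.

```python
def fit_stump(xs, residuals):
    """Pick the (feature, threshold) split that minimizes squared error."""
    best = None
    for f in range(len(xs[0])):
        for t in sorted({x[f] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[f] < t]
            right = [r for x, r in zip(xs, residuals) if x[f] >= t]
            if not left or not right:
                continue                      # a split must separate the samples
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, f, t, lm, rm)
    _, f, t, lm, rm = best
    return f, t, lm, rm                       # node: feature index + target threshold

def stump_predict(stump, x):
    f, t, lm, rm = stump
    return lm if x[f] < t else rm

def boost(xs, ys, n_trees=5, lr=0.5):
    """Fit n_trees stumps; each one fits the residual left by the previous ones."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return base, stumps, lr

def predict(model, x):
    base, stumps, lr = model
    return base + lr * sum(stump_predict(s, x) for s in stumps)
```

In practice one would call the XGBoost library rather than hand-roll the trees; the sketch only shows why the trees are ordered and what a node stores.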
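In one embodiment, the above step of testing the third model to be tested with the test set and obtaining a model test result includes:
S451: inputting the laboratory examination feature sample data corresponding to each third training sample in the test set into the third model to be tested for calculation, obtaining the type 1 probability test prediction value corresponding to each third training sample in the test set;
S452: calculating an accuracy according to the type 1 probability test prediction value and the third type 1 label data corresponding to each third training sample in the test set, obtaining a target accuracy;
S453: acquiring an accuracy threshold;
S454: judging whether the target accuracy is greater than the accuracy threshold, obtaining an accuracy judgment result;
S455: when the accuracy judgment result is "greater than", determining that the model test result is success, and otherwise determining that the model test result is failure.
This embodiment tests the third model to be tested with the test set, so that only a qualified third model to be tested is taken as the third typing diagnosis model, which improves the accuracy of the type 1 diabetes probability prediction.
For S451, one third training sample is acquired from the test set as a third training sample to be predicted; the laboratory examination feature sample data corresponding to the third training sample to be predicted is input into the third model to be tested for calculation, obtaining the type 1 probability test prediction value corresponding to the third training sample to be predicted; and the step of acquiring one third training sample from the test set as a third training sample to be predicted is repeated until the type 1 probability test prediction value corresponding to each third training sample in the test set has been determined.
For S452, the number of correct predictions of the third model to be tested is calculated according to the type 1 probability test prediction value and the third type 1 label data corresponding to each third training sample in the test set; the number of third training samples in the test set is counted to obtain the total number of samples in the test set; and the number of correct predictions is divided by the total number of samples in the test set to obtain the target accuracy of the third model to be tested.
For S453, the accuracy threshold may be acquired from a database, may be input by a user, may be sent by a third-party application system, or may be written into the program file that implements this application.
For S455, when the accuracy judgment result is "greater than", the third model to be tested meets the requirements, and the model test result is determined to be success; when the accuracy judgment result is "less than or equal to", the third model to be tested does not meet the requirements, and the model test result is determined to be failure.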
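Steps S452 to S455 reduce to a count, a division, and a strict comparison. In the sketch below, the 0.5 cut-off used to turn a predicted probability into a type 1 or type 2 decision is an assumption, since the text does not say how a probability is judged correct; the function names are hypothetical.

```python
def accuracy(probs, labels, decision_threshold=0.5):
    """S452: correct predictions divided by the total number of test samples."""
    correct = sum(
        1 for p, y in zip(probs, labels)
        if (1 if p >= decision_threshold else 0) == y   # assumed decision rule
    )
    return correct / len(labels)

def model_test_result(probs, labels, accuracy_threshold):
    """S454 + S455: success only if accuracy is strictly greater than the threshold."""
    if accuracy(probs, labels) > accuracy_threshold:
        return "success"
    return "failure"
```

Note the strict inequality: a model whose accuracy exactly equals the threshold is reported as a failure, matching S455.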
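In one embodiment, the above step of determining the type 1 diabetes probability according to the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result and obtaining the target type 1 probability prediction result corresponding to the target patient includes:
S51: acquiring a preset weighted sum model;
S52: inputting the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result into the preset weighted sum model for weighted summation, obtaining the target type 1 probability prediction result corresponding to the target patient.
This embodiment fuses the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result by weighted summation, which improves prediction accuracy and helps doctors type diabetes accurately.
For S51, the preset weighted sum model may be acquired from a database, may be input by a user, or may be sent by a third-party application system.
The preset weighted sum model is expressed as $y = ax_1 + bx_2 + cx_3 + d$, where $x_1$ is the first type 1 probability prediction result, $x_2$ is the second type 1 probability prediction result, $x_3$ is the third type 1 probability prediction result, $d$ is a bias constant, and $a$, $b$, $c$, and $d$ are parameters obtained through model training.
For S52, the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result are input into the preset weighted sum model for weighted summation, and the probability output by the preset weighted sum model is taken as the target type 1 probability prediction result corresponding to the target patient.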
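The weighted sum in S52 is a single linear expression. The sketch below uses illustrative weights that are not taken from this application; in the text above, a, b, c, and d come from model training.

```python
def fuse(x1, x2, x3, a, b, c, d):
    """Weighted sum y = a*x1 + b*x2 + c*x3 + d of the three type 1 probabilities."""
    return a * x1 + b * x2 + c * x3 + d

# Illustrative (hypothetical) weights and inputs.
target = fuse(0.7, 0.5, 0.9, a=0.2, b=0.3, c=0.5, d=0.0)
```

A linear combination is only guaranteed to stay within [0, 1] when the weights are non-negative, sum to 1, and d is 0; with freely trained parameters the output may need to be checked against that range.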
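Referring to FIG. 2, this application further proposes a diabetes typing probability prediction apparatus, the apparatus including:
a data acquisition module 100, configured to acquire body state feature data to be analyzed, basic information feature data to be analyzed, and laboratory examination feature data to be analyzed of a target patient;
a first prediction module 200, configured to input the body state feature data to be analyzed into a first typing diagnosis model to predict the probability of type 1 diabetes, obtaining a first type 1 probability prediction result;
a second prediction module 300, configured to input the basic information feature data to be analyzed into a second typing diagnosis model to predict the probability of type 1 diabetes, obtaining a second type 1 probability prediction result;
a third prediction module 400, configured to input the laboratory examination feature data to be analyzed into a third typing diagnosis model to predict the probability of type 1 diabetes, obtaining a third type 1 probability prediction result;
a type 1 probability determination module 500, configured to determine the type 1 diabetes probability according to the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result, obtaining a target type 1 probability prediction result corresponding to the target patient.
In this embodiment, the body state feature data to be analyzed is first input into the first typing diagnosis model to predict the probability of type 1 diabetes, obtaining the first type 1 probability prediction result; the basic information feature data to be analyzed is input into the second typing diagnosis model to predict the probability of type 1 diabetes, obtaining the second type 1 probability prediction result; and the laboratory examination feature data to be analyzed is input into the third typing diagnosis model to predict the probability of type 1 diabetes, obtaining the third type 1 probability prediction result. The type 1 diabetes probability is then determined according to the three prediction results, obtaining the target type 1 probability prediction result corresponding to the target patient. The probability of type 1 diabetes is thus first predicted separately from three different kinds of diabetes-related features and then determined from the three predictions; fusing the predictions of three different models improves prediction accuracy and helps doctors type diabetes accurately.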
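In one embodiment, the above apparatus further includes a first typing diagnosis model training module, which includes a first sample acquisition submodule and a first training submodule;
the first sample acquisition submodule is configured to acquire a plurality of first training samples, each first training sample of the plurality of first training samples including body state feature sample data and first type 1 label data, where the body state feature sample data includes onset age sample data and body mass index sample data;
the first training submodule is configured to: acquire one first training sample from the plurality of first training samples as a target first training sample; input the body state feature sample data of the target first training sample into a first model to be trained to predict the probability of type 1 diabetes, obtaining a first type 1 probability training prediction value, where the first model to be trained is a model obtained based on a logistic regression model; input the first type 1 probability training prediction value and the first type 1 label data of the target first training sample into a first loss function for calculation, obtaining a first loss value of the first model to be trained; update the parameters of the first model to be trained according to the first loss value, the updated first model to be trained being used for the next calculation of the first type 1 probability training prediction value; repeat the step of acquiring one first training sample from the plurality of first training samples as a target first training sample until the first loss value reaches a first convergence condition or the number of iterations of the first model to be trained reaches a second convergence condition; and determine the first model to be trained whose first loss value reaches the first convergence condition or whose number of iterations reaches the second convergence condition as the first typing diagnosis model.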
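In one embodiment, the above apparatus further includes a second typing diagnosis model training module, which includes a second sample acquisition submodule and a second training submodule;
the second sample acquisition submodule is configured to acquire a plurality of second training samples, each second training sample of the plurality of second training samples including basic information feature sample data and second type 1 label data, where the basic information feature sample data includes onset age sample data, sex sample data, native place sample data, and occupation sample data;
the second training submodule is configured to: acquire one second training sample from the plurality of second training samples as a target second training sample; input the basic information feature sample data of the target second training sample into a second model to be trained to predict the probability of type 1 diabetes, obtaining a second type 1 probability training prediction value, where the second model to be trained is a model obtained based on a multilayer perceptron network; input the second type 1 probability training prediction value and the second type 1 label data of the target second training sample into a second loss function for calculation, obtaining a second loss value of the second model to be trained; update the parameters of the second model to be trained according to the second loss value, the updated second model to be trained being used for the next calculation of the second type 1 probability training prediction value; repeat the step of acquiring one second training sample from the plurality of second training samples as a target second training sample until the second loss value reaches a third convergence condition or the number of iterations of the second model to be trained reaches a fourth convergence condition; and determine the second model to be trained whose second loss value reaches the third convergence condition or whose number of iterations reaches the fourth convergence condition as the second typing diagnosis model.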
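In one embodiment, the above apparatus further includes a third typing diagnosis model training module, which includes a third sample acquisition submodule and a third training submodule;
the third sample acquisition submodule is configured to acquire a plurality of third training samples, each third training sample including laboratory examination feature sample data and third type 1 label data, where the laboratory examination feature sample data includes blood test feature sample data, blood pressure measurement feature sample data, and waist circumference feature sample data;
the third training submodule is configured to: divide the plurality of third training samples according to a preset division rule, obtaining a training set and a test set; build an ensemble learning classifier from the laboratory examination feature sample data using the XGBoost method and take the ensemble learning classifier as a third model to be trained, where the third model to be trained includes a plurality of classification and regression trees to be trained and the relationship between the classification and regression trees to be trained; train the third model to be trained with the training set, adjusting the laboratory examination features and target thresholds in the nodes of the plurality of classification and regression trees to be trained in the third model to be trained until a training end condition is reached, and take the third model to be trained for which training has ended as a third model to be tested; test the third model to be tested with the test set, obtaining a model test result; when the model test result is failure, repeat the step of training the third model to be trained with the training set, adjusting the laboratory examination features and target thresholds in the nodes of the plurality of classification and regression trees to be trained until the training end condition is reached, and taking the third model to be trained for which training has ended as the third model to be tested, until the model test result is success; and take the third model to be tested as the third typing diagnosis model.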
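In one embodiment, the above third training submodule includes a unit for determining the third model to be trained;
the unit for determining the third model to be trained is configured to: acquire feature names from the laboratory examination feature sample data, obtaining the laboratory examination features corresponding to the laboratory examination feature sample data; perform feature splitting with the XGBoost method according to the laboratory examination features corresponding to the laboratory examination feature sample data to generate classification and regression trees, obtaining the ensemble learning classifier, where the ensemble learning classifier includes the plurality of classification and regression trees to be trained and the relationship between the classification and regression trees to be trained; and take the ensemble learning classifier as the third model to be trained;
where each classification and regression tree to be trained among the plurality of classification and regression trees to be trained includes a plurality of nodes, each node is one of the laboratory examination features corresponding to the laboratory examination feature sample data together with the target threshold corresponding to that feature, and the relationship between the classification and regression trees to be trained is that each later tree fits the prediction residual of the preceding tree.
In one embodiment, the above third training submodule includes a model training unit and a model verification unit;
the model training unit is configured to input the laboratory examination feature sample data corresponding to each third training sample in the test set into the third model to be tested for calculation, obtaining the type 1 probability test prediction value corresponding to each third training sample in the test set;
the model verification unit is configured to: calculate an accuracy according to the type 1 probability test prediction value and the third type 1 label data corresponding to each third training sample in the test set, obtaining a target accuracy; acquire an accuracy threshold; judge whether the target accuracy is greater than the accuracy threshold, obtaining an accuracy judgment result; and, when the accuracy judgment result is "greater than", determine that the model test result is success, and otherwise determine that the model test result is failure.
In one embodiment, the above type 1 probability determination module includes a weighted sum submodule;
the weighted sum submodule is configured to acquire a preset weighted sum model and input the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result into the preset weighted sum model for weighted summation, obtaining the target type 1 probability prediction result corresponding to the target patient.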
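Referring to FIG. 3, an embodiment of this application further provides a computer device, which may be a server and whose internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data used by the diabetes typing probability prediction method. The network interface of the computer device communicates with external terminals through a network connection. When executed by the processor, the computer program implements a diabetes typing probability prediction method. The diabetes typing probability prediction method includes: acquiring body state feature data to be analyzed, basic information feature data to be analyzed, and laboratory examination feature data to be analyzed of a target patient; inputting the body state feature data to be analyzed into a first typing diagnosis model to predict the probability of type 1 diabetes, obtaining a first type 1 probability prediction result; inputting the basic information feature data to be analyzed into a second typing diagnosis model to predict the probability of type 1 diabetes, obtaining a second type 1 probability prediction result; inputting the laboratory examination feature data to be analyzed into a third typing diagnosis model to predict the probability of type 1 diabetes, obtaining a third type 1 probability prediction result; and determining the type 1 diabetes probability according to the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result, obtaining a target type 1 probability prediction result corresponding to the target patient.
In this embodiment, the body state feature data to be analyzed is first input into the first typing diagnosis model to predict the probability of type 1 diabetes, obtaining the first type 1 probability prediction result; the basic information feature data to be analyzed is input into the second typing diagnosis model to predict the probability of type 1 diabetes, obtaining the second type 1 probability prediction result; and the laboratory examination feature data to be analyzed is input into the third typing diagnosis model to predict the probability of type 1 diabetes, obtaining the third type 1 probability prediction result. The type 1 diabetes probability is then determined according to the three prediction results, obtaining the target type 1 probability prediction result corresponding to the target patient. The probability of type 1 diabetes is thus first predicted separately from three different kinds of diabetes-related features and then determined from the three predictions; fusing the predictions of three different models improves prediction accuracy and helps doctors type diabetes accurately.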
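An embodiment of this application further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements a diabetes typing probability prediction method including the steps of: acquiring body state feature data to be analyzed, basic information feature data to be analyzed, and laboratory examination feature data to be analyzed of a target patient; inputting the body state feature data to be analyzed into a first typing diagnosis model to predict the probability of type 1 diabetes, obtaining a first type 1 probability prediction result; inputting the basic information feature data to be analyzed into a second typing diagnosis model to predict the probability of type 1 diabetes, obtaining a second type 1 probability prediction result; inputting the laboratory examination feature data to be analyzed into a third typing diagnosis model to predict the probability of type 1 diabetes, obtaining a third type 1 probability prediction result; and determining the type 1 diabetes probability according to the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result, obtaining a target type 1 probability prediction result corresponding to the target patient.
In the diabetes typing probability prediction method executed above, the body state feature data to be analyzed is first input into the first typing diagnosis model to predict the probability of type 1 diabetes, obtaining the first type 1 probability prediction result; the basic information feature data to be analyzed is input into the second typing diagnosis model to predict the probability of type 1 diabetes, obtaining the second type 1 probability prediction result; and the laboratory examination feature data to be analyzed is input into the third typing diagnosis model to predict the probability of type 1 diabetes, obtaining the third type 1 probability prediction result. The type 1 diabetes probability is then determined according to the three prediction results, obtaining the target type 1 probability prediction result corresponding to the target patient. The probability of type 1 diabetes is thus first predicted separately from three different kinds of diabetes-related features and then determined from the three predictions; fusing the predictions of three different models improves prediction accuracy and helps doctors type diabetes accurately.
The computer-readable storage medium may be non-volatile or volatile.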
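A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or another medium provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "include", "comprise", and any other variant are intended to cover a non-exclusive inclusion, so that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, apparatus, article, or method. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, apparatus, article, or method that includes the element.
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of this application.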

Claims (20)

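  1. A diabetes typing probability prediction apparatus, wherein the apparatus comprises:
    a data acquisition module, configured to acquire body state feature data to be analyzed, basic information feature data to be analyzed, and laboratory examination feature data to be analyzed of a target patient;
    a first prediction module, configured to input the body state feature data to be analyzed into a first typing diagnosis model to predict the probability of type 1 diabetes, obtaining a first type 1 probability prediction result;
    a second prediction module, configured to input the basic information feature data to be analyzed into a second typing diagnosis model to predict the probability of type 1 diabetes, obtaining a second type 1 probability prediction result;
    a third prediction module, configured to input the laboratory examination feature data to be analyzed into a third typing diagnosis model to predict the probability of type 1 diabetes, obtaining a third type 1 probability prediction result;
    a type 1 probability determination module, configured to determine the type 1 diabetes probability according to the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result, obtaining a target type 1 probability prediction result corresponding to the target patient.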
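  2. The diabetes typing probability prediction apparatus according to claim 1, wherein the apparatus further comprises a first typing diagnosis model training module, the first typing diagnosis model training module comprising a first sample acquisition submodule and a first training submodule;
    the first sample acquisition submodule is configured to acquire a plurality of first training samples, each first training sample of the plurality of first training samples comprising body state feature sample data and first type 1 label data, wherein the body state feature sample data comprises onset age sample data and body mass index sample data;
    the first training submodule is configured to: acquire one first training sample from the plurality of first training samples as a target first training sample; input the body state feature sample data of the target first training sample into a first model to be trained to predict the probability of type 1 diabetes, obtaining a first type 1 probability training prediction value, wherein the first model to be trained is a model obtained based on a logistic regression model; input the first type 1 probability training prediction value and the first type 1 label data of the target first training sample into a first loss function for calculation, obtaining a first loss value of the first model to be trained; update parameters of the first model to be trained according to the first loss value, the updated first model to be trained being used for the next calculation of the first type 1 probability training prediction value; repeat the step of acquiring one first training sample from the plurality of first training samples as a target first training sample until the first loss value reaches a first convergence condition or the number of iterations of the first model to be trained reaches a second convergence condition; and determine the first model to be trained whose first loss value reaches the first convergence condition or whose number of iterations reaches the second convergence condition as the first typing diagnosis model.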
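  3. The diabetes typing probability prediction apparatus according to claim 1, wherein the apparatus further comprises a second typing diagnosis model training module, the second typing diagnosis model training module comprising a second sample acquisition submodule and a second training submodule;
    the second sample acquisition submodule is configured to acquire a plurality of second training samples, each second training sample of the plurality of second training samples comprising basic information feature sample data and second type 1 label data, wherein the basic information feature sample data comprises onset age sample data, sex sample data, native place sample data, and occupation sample data;
    the second training submodule is configured to: acquire one second training sample from the plurality of second training samples as a target second training sample; input the basic information feature sample data of the target second training sample into a second model to be trained to predict the probability of type 1 diabetes, obtaining a second type 1 probability training prediction value, wherein the second model to be trained is a model obtained based on a multilayer perceptron network; input the second type 1 probability training prediction value and the second type 1 label data of the target second training sample into a second loss function for calculation, obtaining a second loss value of the second model to be trained; update parameters of the second model to be trained according to the second loss value, the updated second model to be trained being used for the next calculation of the second type 1 probability training prediction value; repeat the step of acquiring one second training sample from the plurality of second training samples as a target second training sample until the second loss value reaches a third convergence condition or the number of iterations of the second model to be trained reaches a fourth convergence condition; and determine the second model to be trained whose second loss value reaches the third convergence condition or whose number of iterations reaches the fourth convergence condition as the second typing diagnosis model.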
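  4. The diabetes typing probability prediction apparatus according to claim 1, wherein the apparatus further comprises a third typing diagnosis model training module, the third typing diagnosis model training module comprising a third sample acquisition submodule and a third training submodule;
    the third sample acquisition submodule is configured to acquire a plurality of third training samples, each third training sample comprising laboratory examination feature sample data and third type 1 label data, wherein the laboratory examination feature sample data comprises blood test feature sample data, blood pressure measurement feature sample data, and waist circumference feature sample data;
    the third training submodule is configured to: divide the plurality of third training samples according to a preset division rule, obtaining a training set and a test set; build an ensemble learning classifier from the laboratory examination feature sample data using the XGBoost method and take the ensemble learning classifier as a third model to be trained, wherein the third model to be trained comprises a plurality of classification and regression trees to be trained and the relationship between the classification and regression trees to be trained; train the third model to be trained with the training set, adjusting the laboratory examination features and target thresholds in the nodes of the plurality of classification and regression trees to be trained in the third model to be trained until a training end condition is reached, and take the third model to be trained for which training has ended as a third model to be tested; test the third model to be tested with the test set, obtaining a model test result; when the model test result is failure, repeat the step of training the third model to be trained with the training set, adjusting the laboratory examination features and target thresholds in the nodes of the plurality of classification and regression trees to be trained until the training end condition is reached, and taking the third model to be trained for which training has ended as the third model to be tested, until the model test result is success; and take the third model to be tested as the third typing diagnosis model.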
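  5. The diabetes typing probability prediction apparatus according to claim 4, wherein the third training submodule comprises a unit for determining the third model to be trained;
    the unit for determining the third model to be trained is configured to: acquire feature names from the laboratory examination feature sample data, obtaining the laboratory examination features corresponding to the laboratory examination feature sample data; perform feature splitting with the XGBoost method according to the laboratory examination features corresponding to the laboratory examination feature sample data to generate classification and regression trees, obtaining the ensemble learning classifier, the ensemble learning classifier comprising the plurality of classification and regression trees to be trained and the relationship between the classification and regression trees to be trained; and take the ensemble learning classifier as the third model to be trained;
    wherein each classification and regression tree to be trained among the plurality of classification and regression trees to be trained comprises a plurality of nodes, each node is one of the laboratory examination features corresponding to the laboratory examination feature sample data together with the target threshold corresponding to that feature, and the relationship between the classification and regression trees to be trained is that each later tree fits the prediction residual of the preceding tree.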
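  6. The diabetes typing probability prediction apparatus according to claim 4, wherein the third training submodule comprises a model training unit and a model verification unit;
    the model training unit is configured to input the laboratory examination feature sample data corresponding to each third training sample in the test set into the third model to be tested for calculation, obtaining the type 1 probability test prediction value corresponding to each third training sample in the test set;
    the model verification unit is configured to: calculate an accuracy according to the type 1 probability test prediction value and the third type 1 label data corresponding to each third training sample in the test set, obtaining a target accuracy; acquire an accuracy threshold; judge whether the target accuracy is greater than the accuracy threshold, obtaining an accuracy judgment result; and, when the accuracy judgment result is "greater than", determine that the model test result is success, and otherwise determine that the model test result is failure.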
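  7. The diabetes typing probability prediction apparatus according to claim 1, wherein the type 1 probability determination module comprises a weighted sum submodule;
    the weighted sum submodule is configured to acquire a preset weighted sum model and input the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result into the preset weighted sum model for weighted summation, obtaining the target type 1 probability prediction result corresponding to the target patient.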
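  8. A diabetes typing probability prediction method, wherein the method comprises:
    acquiring body state feature data to be analyzed, basic information feature data to be analyzed, and laboratory examination feature data to be analyzed of a target patient;
    inputting the body state feature data to be analyzed into a first typing diagnosis model to predict the probability of type 1 diabetes, obtaining a first type 1 probability prediction result;
    inputting the basic information feature data to be analyzed into a second typing diagnosis model to predict the probability of type 1 diabetes, obtaining a second type 1 probability prediction result;
    inputting the laboratory examination feature data to be analyzed into a third typing diagnosis model to predict the probability of type 1 diabetes, obtaining a third type 1 probability prediction result;
    determining the type 1 diabetes probability according to the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result, obtaining a target type 1 probability prediction result corresponding to the target patient.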
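  9. The diabetes typing probability prediction method according to claim 8, wherein, before the inputting of the body state feature data to be analyzed into the first typing diagnosis model to predict the probability of type 1 diabetes and obtaining the first type 1 probability prediction result, the method further comprises:
    acquiring a plurality of first training samples, each first training sample of the plurality of first training samples comprising body state feature sample data and first type 1 label data, wherein the body state feature sample data comprises onset age sample data and body mass index sample data;
    acquiring one first training sample from the plurality of first training samples as a target first training sample;
    inputting the body state feature sample data of the target first training sample into a first model to be trained to predict the probability of type 1 diabetes, obtaining a first type 1 probability training prediction value, wherein the first model to be trained is a model obtained based on a logistic regression model;
    inputting the first type 1 probability training prediction value and the first type 1 label data of the target first training sample into a first loss function for calculation, obtaining a first loss value of the first model to be trained, and updating parameters of the first model to be trained according to the first loss value, the updated first model to be trained being used for the next calculation of the first type 1 probability training prediction value;
    repeating the step of acquiring one first training sample from the plurality of first training samples as a target first training sample until the first loss value reaches a first convergence condition or the number of iterations of the first model to be trained reaches a second convergence condition, and determining the first model to be trained whose first loss value reaches the first convergence condition or whose number of iterations reaches the second convergence condition as the first typing diagnosis model.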
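  10. The diabetes typing probability prediction method according to claim 8, wherein, before the step of inputting the basic information feature data to be analyzed into the second typing diagnosis model to predict the probability of type 1 diabetes and obtaining the second type 1 probability prediction result, the method further comprises:
    acquiring a plurality of second training samples, each second training sample of the plurality of second training samples comprising basic information feature sample data and second type 1 label data, wherein the basic information feature sample data comprises current age sample data, sex sample data, native place sample data, and occupation sample data;
    acquiring one second training sample from the plurality of second training samples as a target second training sample;
    inputting the basic information feature sample data of the target second training sample into a second model to be trained to predict the probability of type 1 diabetes, obtaining a second type 1 probability training prediction value, wherein the second model to be trained is a model obtained based on a multilayer perceptron network;
    inputting the second type 1 probability training prediction value and the second type 1 label data of the target second training sample into a second loss function for calculation, obtaining a second loss value of the second model to be trained, and updating parameters of the second model to be trained according to the second loss value, the updated second model to be trained being used for the next calculation of the second type 1 probability training prediction value;
    repeating the step of acquiring one second training sample from the plurality of second training samples as a target second training sample until the second loss value reaches a third convergence condition or the number of iterations of the second model to be trained reaches a fourth convergence condition, and determining the second model to be trained whose second loss value reaches the third convergence condition or whose number of iterations reaches the fourth convergence condition as the second typing diagnosis model.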
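  11. The diabetes typing probability prediction method according to claim 8, wherein, before the step of inputting the laboratory examination feature data to be analyzed into the third typing diagnosis model to predict the probability of type 1 diabetes and obtaining the third type 1 probability prediction result, the method further comprises:
    acquiring a plurality of third training samples, each third training sample comprising laboratory examination feature sample data and third type 1 label data, wherein the laboratory examination feature sample data comprises blood test feature sample data, blood pressure measurement feature sample data, and waist circumference feature sample data;
    dividing the plurality of third training samples according to a preset division rule, obtaining a training set and a test set;
    building an ensemble learning classifier from the laboratory examination feature sample data using the XGBoost method, and taking the ensemble learning classifier as a third model to be trained, wherein the third model to be trained comprises a plurality of classification and regression trees to be trained and the relationship between the classification and regression trees to be trained;
    training the third model to be trained with the training set, adjusting the laboratory examination features and target thresholds in the nodes of the plurality of classification and regression trees to be trained in the third model to be trained until a training end condition is reached, and taking the third model to be trained for which training has ended as a third model to be tested;
    testing the third model to be tested with the test set, obtaining a model test result;
    when the model test result is failure, repeating the step of training the third model to be trained with the training set, adjusting the laboratory examination features and target thresholds in the nodes of the plurality of classification and regression trees to be trained in the third model to be trained until the training end condition is reached, and taking the third model to be trained for which training has ended as the third model to be tested, until the model test result is success;
    taking the third model to be tested as the third typing diagnosis model.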
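  12. The diabetes typing probability prediction method according to claim 11, wherein the step of building an ensemble learning classifier from the laboratory examination feature sample data using the XGBoost method and taking the ensemble learning classifier as the third model to be trained comprises:
    acquiring feature names from the laboratory examination feature sample data, obtaining the laboratory examination features corresponding to the laboratory examination feature sample data;
    performing feature splitting with the XGBoost method according to the laboratory examination features corresponding to the laboratory examination feature sample data to generate classification and regression trees, obtaining the ensemble learning classifier, the ensemble learning classifier comprising the plurality of classification and regression trees to be trained and the relationship between the classification and regression trees to be trained;
    taking the ensemble learning classifier as the third model to be trained;
    wherein each classification and regression tree to be trained among the plurality of classification and regression trees to be trained comprises a plurality of nodes, each node is one of the laboratory examination features corresponding to the laboratory examination feature sample data together with the target threshold corresponding to that feature, and the relationship between the classification and regression trees to be trained is that each later tree fits the prediction residual of the preceding tree.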
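  13. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the following method steps:
    acquiring body state feature data to be analyzed, basic information feature data to be analyzed, and laboratory examination feature data to be analyzed of a target patient;
    inputting the body state feature data to be analyzed into a first typing diagnosis model to predict the probability of type 1 diabetes, obtaining a first type 1 probability prediction result;
    inputting the basic information feature data to be analyzed into a second typing diagnosis model to predict the probability of type 1 diabetes, obtaining a second type 1 probability prediction result;
    inputting the laboratory examination feature data to be analyzed into a third typing diagnosis model to predict the probability of type 1 diabetes, obtaining a third type 1 probability prediction result;
    determining the type 1 diabetes probability according to the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result, obtaining a target type 1 probability prediction result corresponding to the target patient.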
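  14. The computer device according to claim 13, wherein, before the inputting of the body state feature data to be analyzed into the first typing diagnosis model to predict the probability of type 1 diabetes and obtaining the first type 1 probability prediction result, the method steps further comprise:
    acquiring a plurality of first training samples, each first training sample of the plurality of first training samples comprising body state feature sample data and first type 1 label data, wherein the body state feature sample data comprises onset age sample data and body mass index sample data;
    acquiring one first training sample from the plurality of first training samples as a target first training sample;
    inputting the body state feature sample data of the target first training sample into a first model to be trained to predict the probability of type 1 diabetes, obtaining a first type 1 probability training prediction value, wherein the first model to be trained is a model obtained based on a logistic regression model;
    inputting the first type 1 probability training prediction value and the first type 1 label data of the target first training sample into a first loss function for calculation, obtaining a first loss value of the first model to be trained, and updating parameters of the first model to be trained according to the first loss value, the updated first model to be trained being used for the next calculation of the first type 1 probability training prediction value;
    repeating the step of acquiring one first training sample from the plurality of first training samples as a target first training sample until the first loss value reaches a first convergence condition or the number of iterations of the first model to be trained reaches a second convergence condition, and determining the first model to be trained whose first loss value reaches the first convergence condition or whose number of iterations reaches the second convergence condition as the first typing diagnosis model.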
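  15. The computer device according to claim 13, wherein, before the step of inputting the basic information feature data to be analyzed into the second typing diagnosis model to predict the probability of type 1 diabetes and obtaining the second type 1 probability prediction result, the method steps further comprise:
    acquiring a plurality of second training samples, each second training sample of the plurality of second training samples comprising basic information feature sample data and second type 1 label data, wherein the basic information feature sample data comprises current age sample data, sex sample data, native place sample data, and occupation sample data;
    acquiring one second training sample from the plurality of second training samples as a target second training sample;
    inputting the basic information feature sample data of the target second training sample into a second model to be trained to predict the probability of type 1 diabetes, obtaining a second type 1 probability training prediction value, wherein the second model to be trained is a model obtained based on a multilayer perceptron network;
    inputting the second type 1 probability training prediction value and the second type 1 label data of the target second training sample into a second loss function for calculation, obtaining a second loss value of the second model to be trained, and updating parameters of the second model to be trained according to the second loss value, the updated second model to be trained being used for the next calculation of the second type 1 probability training prediction value;
    repeating the step of acquiring one second training sample from the plurality of second training samples as a target second training sample until the second loss value reaches a third convergence condition or the number of iterations of the second model to be trained reaches a fourth convergence condition, and determining the second model to be trained whose second loss value reaches the third convergence condition or whose number of iterations reaches the fourth convergence condition as the second typing diagnosis model.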
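  16. The computer device according to claim 13, wherein, before the step of inputting the laboratory examination feature data to be analyzed into the third typing diagnosis model to predict the probability of type 1 diabetes and obtaining the third type 1 probability prediction result, the method steps further comprise:
    acquiring a plurality of third training samples, each third training sample comprising laboratory examination feature sample data and third type 1 label data, wherein the laboratory examination feature sample data comprises blood test feature sample data, blood pressure measurement feature sample data, and waist circumference feature sample data;
    dividing the plurality of third training samples according to a preset division rule, obtaining a training set and a test set;
    building an ensemble learning classifier from the laboratory examination feature sample data using the XGBoost method, and taking the ensemble learning classifier as a third model to be trained, wherein the third model to be trained comprises a plurality of classification and regression trees to be trained and the relationship between the classification and regression trees to be trained;
    training the third model to be trained with the training set, adjusting the laboratory examination features and target thresholds in the nodes of the plurality of classification and regression trees to be trained in the third model to be trained until a training end condition is reached, and taking the third model to be trained for which training has ended as a third model to be tested;
    testing the third model to be tested with the test set, obtaining a model test result;
    when the model test result is failure, repeating the step of training the third model to be trained with the training set, adjusting the laboratory examination features and target thresholds in the nodes of the plurality of classification and regression trees to be trained in the third model to be trained until the training end condition is reached, and taking the third model to be trained for which training has ended as the third model to be tested, until the model test result is success;
    taking the third model to be tested as the third typing diagnosis model.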
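  17. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the following method steps:
    acquiring body state feature data to be analyzed, basic information feature data to be analyzed, and laboratory examination feature data to be analyzed of a target patient;
    inputting the body state feature data to be analyzed into a first typing diagnosis model to predict the probability of type 1 diabetes, obtaining a first type 1 probability prediction result;
    inputting the basic information feature data to be analyzed into a second typing diagnosis model to predict the probability of type 1 diabetes, obtaining a second type 1 probability prediction result;
    inputting the laboratory examination feature data to be analyzed into a third typing diagnosis model to predict the probability of type 1 diabetes, obtaining a third type 1 probability prediction result;
    determining the type 1 diabetes probability according to the first type 1 probability prediction result, the second type 1 probability prediction result, and the third type 1 probability prediction result, obtaining a target type 1 probability prediction result corresponding to the target patient.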
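  18. The computer-readable storage medium according to claim 17, wherein, before the inputting of the body state feature data to be analyzed into the first typing diagnosis model to predict the probability of type 1 diabetes and obtaining the first type 1 probability prediction result, the method steps further comprise:
    acquiring a plurality of first training samples, each first training sample of the plurality of first training samples comprising body state feature sample data and first type 1 label data, wherein the body state feature sample data comprises onset age sample data and body mass index sample data;
    acquiring one first training sample from the plurality of first training samples as a target first training sample;
    inputting the body state feature sample data of the target first training sample into a first model to be trained to predict the probability of type 1 diabetes, obtaining a first type 1 probability training prediction value, wherein the first model to be trained is a model obtained based on a logistic regression model;
    inputting the first type 1 probability training prediction value and the first type 1 label data of the target first training sample into a first loss function for calculation, obtaining a first loss value of the first model to be trained, and updating parameters of the first model to be trained according to the first loss value, the updated first model to be trained being used for the next calculation of the first type 1 probability training prediction value;
    repeating the step of acquiring one first training sample from the plurality of first training samples as a target first training sample until the first loss value reaches a first convergence condition or the number of iterations of the first model to be trained reaches a second convergence condition, and determining the first model to be trained whose first loss value reaches the first convergence condition or whose number of iterations reaches the second convergence condition as the first typing diagnosis model.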
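  19. The computer-readable storage medium according to claim 17, wherein, before the step of inputting the basic information feature data to be analyzed into the second typing diagnosis model to predict the probability of type 1 diabetes and obtaining the second type 1 probability prediction result, the method steps further comprise:
    acquiring a plurality of second training samples, each second training sample of the plurality of second training samples comprising basic information feature sample data and second type 1 label data, wherein the basic information feature sample data comprises current age sample data, sex sample data, native place sample data, and occupation sample data;
    acquiring one second training sample from the plurality of second training samples as a target second training sample;
    inputting the basic information feature sample data of the target second training sample into a second model to be trained to predict the probability of type 1 diabetes, obtaining a second type 1 probability training prediction value, wherein the second model to be trained is a model obtained based on a multilayer perceptron network;
    inputting the second type 1 probability training prediction value and the second type 1 label data of the target second training sample into a second loss function for calculation, obtaining a second loss value of the second model to be trained, and updating parameters of the second model to be trained according to the second loss value, the updated second model to be trained being used for the next calculation of the second type 1 probability training prediction value;
    repeating the step of acquiring one second training sample from the plurality of second training samples as a target second training sample until the second loss value reaches a third convergence condition or the number of iterations of the second model to be trained reaches a fourth convergence condition, and determining the second model to be trained whose second loss value reaches the third convergence condition or whose number of iterations reaches the fourth convergence condition as the second typing diagnosis model.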
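  20. The computer-readable storage medium according to claim 17, wherein, before the step of inputting the laboratory examination feature data to be analyzed into the third typing diagnosis model to predict the probability of type 1 diabetes and obtaining the third type 1 probability prediction result, the method steps further comprise:
    acquiring a plurality of third training samples, each third training sample comprising laboratory examination feature sample data and third type 1 label data, wherein the laboratory examination feature sample data comprises blood test feature sample data, blood pressure measurement feature sample data, and waist circumference feature sample data;
    dividing the plurality of third training samples according to a preset division rule, obtaining a training set and a test set;
    building an ensemble learning classifier from the laboratory examination feature sample data using the XGBoost method, and taking the ensemble learning classifier as a third model to be trained, wherein the third model to be trained comprises a plurality of classification and regression trees to be trained and the relationship between the classification and regression trees to be trained;
    training the third model to be trained with the training set, adjusting the laboratory examination features and target thresholds in the nodes of the plurality of classification and regression trees to be trained in the third model to be trained until a training end condition is reached, and taking the third model to be trained for which training has ended as a third model to be tested;
    testing the third model to be tested with the test set, obtaining a model test result;
    when the model test result is failure, repeating the step of training the third model to be trained with the training set, adjusting the laboratory examination features and target thresholds in the nodes of the plurality of classification and regression trees to be trained in the third model to be trained until the training end condition is reached, and taking the third model to be trained for which training has ended as the third model to be tested, until the model test result is success;
    taking the third model to be tested as the third typing diagnosis model.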
PCT/CN2021/097552 2021-03-26 2021-05-31 Diabetes typing probability prediction method, apparatus, device, and storage medium WO2022198794A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110328104.5A CN112801224B (zh) 2021-03-26 2021-03-26 Diabetes typing probability prediction method, apparatus, device, and storage medium
CN202110328104.5 2021-03-26

Publications (1)

Publication Number Publication Date
WO2022198794A1 true WO2022198794A1 (zh) 2022-09-29

Family

ID=75817251

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/097552 WO2022198794A1 (zh) 2021-03-26 2021-05-31 Diabetes typing probability prediction method, apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN112801224B (zh)
WO (1) WO2022198794A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
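CN112801224B (zh) * 2021-03-26 2024-03-05 平安科技(深圳)有限公司 Diabetes typing probability prediction method, apparatus, device, and storage medium
CN113673760A (zh) * 2021-08-19 2021-11-19 上海上实龙创智能科技股份有限公司 Energy consumption prediction method and apparatus, computer device, and storage medium
CN113689928B (zh) * 2021-08-24 2023-06-20 深圳平安智慧医健科技有限公司 Method, apparatus, device, and storage medium for recommending health maintenance and disease prevention plans
CN116434897A (zh) * 2021-12-31 2023-07-14 深圳云天励飞技术股份有限公司 Disease condition detection model training and detection method, apparatus, and electronic device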

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
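CN110222723A (zh) * 2019-05-14 2019-09-10 华南理工大学 Mixed-model-based method for predicting starting lineups in football matches
CN111340244A (zh) * 2020-05-15 2020-06-26 支付宝(杭州)信息技术有限公司 Prediction method, training method, apparatus, server, and medium
CN112381115A (zh) * 2020-10-21 2021-02-19 西安工程大学 Bagging landslide forecasting method
CN112801224A (zh) * 2021-03-26 2021-05-14 平安科技(深圳)有限公司 Diabetes typing probability prediction method, apparatus, device, and storage medium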

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9619753B2 (en) * 2014-12-30 2017-04-11 Winbond Electronics Corp. Data analysis system and method
CN110808097A (zh) * 2019-10-30 2020-02-18 中国福利会国际和平妇幼保健院 Gestational diabetes prediction system and method

Also Published As

Publication number Publication date
CN112801224B (zh) 2024-03-05
CN112801224A (zh) 2021-05-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21932410

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21932410

Country of ref document: EP

Kind code of ref document: A1