CN102930163A - Method for judging 2 type diabetes mellitus risk state - Google Patents

Publication number
CN102930163A
Authority
CN
China
Prior art keywords
sample
classification
value
sample object
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201210431592
Other languages
Chinese (zh)
Inventor
罗森林
张铁梅
陈�峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN 201210431592
Publication of CN102930163A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for determining type 2 diabetes mellitus risk status and belongs to the technical field of biomedicine. The method uses EM clustering and C4.5 classification to extract the determinant attributes that affect the onset of type 2 diabetes mellitus and divides the population into different clusters according to its own characteristics. Based on the extracted determinant attributes and the Logistic regression method, the risk status of an individual or a population is then determined in combination with metabolic syndrome and high-risk-group criteria. Besides obtaining the determinant attributes that affect type 2 diabetes mellitus, the method builds different risk status determination models for the different characteristics of the population, which improves the universality and practicality of the models and meets the requirements of real-time processing and mobile computing. The effect degree of each determinant attribute is evaluated by a quantitative analysis method; the complexity is low, the accuracy is high, and the risk status is divided in detail.

Description

A method for determining type 2 diabetes risk status
Technical field
The present invention relates to a method for determining type 2 diabetes risk status and belongs to the field of biomedical technology.
Background art
With socio-economic development, risk factors such as lifestyle change and population aging have increased sharply, and the prevalence of type 2 diabetes and its complications has risen rapidly. According to a 2008 survey, among adults aged 20 and over the age-standardized prevalence of diabetes was 9.7%, and the proportion with prediabetes was as high as 15.5%. Taking preventive measures early for people at risk of diabetes is therefore an effective way to control diabetes prevalence. Diabetes risk status determination tools, as important tools for mass screening, are receiving more and more attention from medical staff. Diabetes risk status determination is mainly concerned with assessing the risk of becoming ill: it is the analysis and inference of the risk status of an individual and is the basis of risk decision-making. An effective determination model can accurately predict an individual's disease risk status, reduce the workload of medical staff, and allow targeted preventive measures to be taken for high-risk groups.
Determining type 2 diabetes risk status requires solving two basic problems: (1) how to extract the determinant attributes closely related to the onset of type 2 diabetes, so as to strengthen the universality of the risk determination; (2) how to reasonably determine an individual's onset risk status from the determinant attributes. The methods commonly used in existing type 2 diabetes risk status determination are as follows:
1. Determinant attribute extraction:
According to their algorithmic principle, these methods are generally divided into filter methods and wrapper methods. Filter methods generally include the following:
(1) Relief method: this method extracts determinant attributes according to a statistical correlation criterion and evaluates the relevance of a feature by how well its values separate classes, i.e. a determinant attribute should bring samples of the same class close together and keep samples of different classes apart. The basic idea is to sample the data set, compute the relevance from the differences between each drawn sample and its two nearest samples of the same class and of a different class, and thereby determine a weight for each attribute. The determinant attributes selected by Relief are strongly relevant, and the method can handle both discrete and continuous attributes, but it cannot eliminate redundant attributes, and computing distances between samples incurs a large time overhead, so it cannot meet the time performance requirement of determinant attribute extraction on high-dimensional data.
(2) Principal component analysis: this method studies the correlations among attributes and transforms an original group of correlated attributes into a new attribute set that serves as the determinant attributes. The transformation replaces the many original attributes with fewer new ones while making the new attributes retain as much as possible of the information carried by the original ones. However, principal component analysis involves matrix operations such as solving the characteristic equation, and its time consumption cannot meet the requirement of determinant attribute extraction on high-dimensional data.
(3) Rough set method: this method keeps the classification ability of the attributes while repeatedly screening out redundant attributes to obtain the determinant attribute set. Rough set methods are generally based on the discernibility matrix, attribute significance or the Johnson reduction method, and use differences in feature dependency to eliminate attributes that have little influence on the classification result, thereby extracting the determinant attributes. Although the method can effectively delete irrelevant attributes, it does not consider the influence of noisy data, and its computational efficiency is low.
(4) Information entropy method: in information theory this method is mainly used to analyze the uncertainty of information, and it can also be used to evaluate the usefulness of attributes, i.e. to extract determinant attributes. Its basic idea is to partition the data by computing measures such as information gain and then to recompute the gain on the partitioned data. Typical methods are ID3 and C4.5, but the time complexity is high.
(5) Genetic algorithm: this method encodes a solution as a "chromosome" represented by a binary string. Before the algorithm runs, "chromosomes" for a set of hypothetical solutions are given; these hypothetical solutions are then placed in the "environment" of the concrete problem, the coded strings best adapted to the environment are selected according to some principle, and the biological processes of replication, crossover and mutation are simulated to produce a new, better adapted generation. Evolving in this way, the population gradually converges to the best adapted coded string, which is the optimal solution. Determinant attributes can be extracted through this process, but the method requires continual iterative computation and has high time complexity, so it is seldom used for determinant attribute extraction.
Wrapper method: this method treats the learning algorithm as a black box for testing and uses it to evaluate attribute subsets. The main idea is to train a classification model with the training data and the learning algorithm, assess the classification accuracy of the classifier with test data, and extract the determinant attributes iteratively; at the same time a more suitable learning algorithm and its parameter settings can be found. The advantage of the wrapper method is that it fits the learning algorithm well; the disadvantage is that the model needs a large amount of time for learning and training, so the time complexity is high and the efficiency low, and it is unsuitable when the learning algorithm changes frequently.
2. Risk status determination methods:
(1) Finnish Diabetes Risk Score: the Finnish risk score questionnaire (FINDRISC) was proposed by Lindstrom in 2003. It is the first diabetes risk assessment model obtained from a cohort study, is considered the best-known type 2 diabetes risk assessment tool, is widely used around the world, and allows self-scored prediction without the help of medical staff. The Finnish risk score model uses follow-up data from two different population cohorts as the data source for its risk assessment study. A randomly sampled population was followed for 10 years, and questions on age, body mass index (BMI), waist circumference, waist-to-hip ratio, blood pressure, family history, diet and exercise habits were scored. Each question has fixed scoring criteria; following the scoring rules, an individual computes the risk coefficient score of each risk factor and adds the items to obtain the individual risk score. The higher the score, the greater the risk of developing diabetes. In this model the individual risk score ranges from 0 to 20, and according to clinical cohort analysis, individuals scoring 9 or more need further diagnosis and examination. In the 1987 and 1992 cohort studies, validation of the model gave sensitivities of 78% and 81%, specificities of 77% and 76%, positive predictive values of 0.13 and 0.05, and AUC values of 0.85 and 0.87, showing that the risk score model has a good predictive effect.
(2) Multi-factor model method: this method takes whether the individual has diabetes as the dependent variable and the determinant attributes as independent variables, performs multiple regression (Logistic regression or Cox regression) to obtain the regression coefficients, converts the regression coefficients into corresponding risk scores, builds the model, computes the individual's total risk score, and obtains the cut-off point of the risk score from the ROC curve. The method judges whether the individual is in the diabetic stage or the prediabetic stage by comparing the individual's total risk score with the cut-off point. The multi-factor model method has been validated and has a certain accuracy, but it can only judge whether a person is in the diabetic or prediabetic stage and cannot determine other risk statuses in different populations.
(3) Single-factor weighted score method: this method uses Logistic regression for modeling, converts the OR value of each determinant attribute into a corresponding risk score, builds the model and computes the total risk score, and divides the diabetes risk status into 5 states according to the total risk score. However, the method does not consider population characteristics, so the resulting model lacks universality, and the models for different countries and regions differ widely.
(4) HCI diabetes risk assessment method: proposed by Wu Haiyun, Pan Ping et al. in 2007, this is one of the models for assessing diabetes risk in Chinese adults. Based on the major risk factors for diabetes onset in Chinese adults and their relative risks, as proposed by a multidisciplinary expert group, the method establishes a calculation that assesses an individual's diabetes onset risk from medical history and lifestyle questionnaire data. Using the Harvard Cancer Risk Index formula, it computes the individual's relative risk of illness from the relative risks of the individual risk factors and the relative risk of the same sex and age group. The model can be used to assess the diabetes onset risk of individual Chinese adults and to point out the influence of the individual's different risk factors on that risk. The assessment model can be built from large-scale cohort studies, with the predictor variables of a given disease derived by multinomial logistic regression. It divides the degree of disease risk into 5 grades, has a certain reference value, is easy to implement, and can be used by online health management systems and community prevention and health care institutions. However, no validation of its accuracy has been given, so its value in practical application remains open to discussion.
In summary, for the problem of diabetes risk status determination, the determinant attributes chosen by the existing methods all differ and are not representative. At the same time, the existing risk status determination methods do not consider the inherent characteristics of the population, so the resulting models lack universality and perform unsatisfactorily on different populations.
Summary of the invention
The object of the present invention is to solve the problem of type 2 diabetes risk status determination by proposing a type 2 diabetes risk status determination method based on determinant attribute extraction and Logistic regression, in which EM clustering and C4.5 classification are used to extract the determinant attributes that affect the onset of type 2 diabetes.
The design principle of the present invention is to use EM clustering and C4.5 classification to extract the determinant attributes that affect the onset of type 2 diabetes and to divide the population into different clusters according to its own characteristics; then, based on the extracted determinant attributes and the Logistic regression method, to determine the risk status of an individual or a population in combination with metabolic syndrome and high-risk-group criteria. While obtaining the determinant attributes that affect type 2 diabetes, the method takes the different characteristics of the population into account and builds different risk status determination models, improving the universality and practicality of the models and meeting the requirements of real-time processing and mobile computing.
The technical scheme of the present invention is implemented as follows:
Step 1, take N evaluation objects as the sample set S, where each object has M determinant attributes that affect the onset of type 2 diabetes. With the determinant attributes as columns and the attribute values of the different samples as rows, build the matrix representation $[s_{(a+c)b}]$ of the sample set S, and cluster S with the EM clustering method to obtain k clusters. The concrete implementation is as follows:
Step 1.1, first divide the N objects into two sample sets according to whether they contain missing data: the complete data set X and the missing data set Y.
The complete data set X is the set of objects whose data for all M determinant attributes are present. The determinant attribute data of all objects in the set form the matrix $[x_{ab}]$, where the row index a denotes a complete-data sample object and the column index b denotes the data of each attribute of that sample, b = 1, 2, ..., M.
The missing data set Y is the set of objects with one or more missing values among the data for the M determinant attributes. The determinant attribute data of all objects in the set form the matrix $[y_{cb}]$, where the row index c denotes a missing-data sample object and the column index b denotes the data of each attribute of that sample.
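The split in step 1.1 can be sketched as follows. This is a minimal illustration, assuming each sample is a list of M attribute values and that a missing value is encoded as `None`; the function name `split_by_missing` and that encoding are hypothetical and do not come from the patent.

```python
def split_by_missing(samples):
    """Split samples into the complete set X and the missing-data set Y.

    Assumption: a missing attribute value is represented by None.
    """
    complete = [row for row in samples if all(v is not None for v in row)]
    missing = [row for row in samples if any(v is None for v in row)]
    return complete, missing


# Three hypothetical samples with M = 3 attributes each.
S = [[5.1, 24.0, 80.0], [6.3, None, 95.0], [4.9, 22.5, 78.0]]
X, Y = split_by_missing(S)
```

Every sample lands in exactly one of the two sets, so the rows of X and Y together make up the rows of the matrix of S.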
Step 1.2, set the number of clusters k (k ≤ N). Any i-th sample $n_i$ (1 ≤ i ≤ N) in the sample set S follows the mixture probability distribution of the k clusters:
$$n_i \sim p(n_i \mid \Theta) = \sum_{j=1}^{k} \pi_j \, p(n_i \mid \theta_j)$$
where $\Theta = (\pi_1, \pi_2, \ldots, \pi_k, \theta_1, \theta_2, \ldots, \theta_k)$ denotes the mixture probability distribution parameters of the k clusters, $\theta_j$ denotes the probability distribution parameters of the j-th cluster, $\pi_j$ denotes the probability that $n_i$ comes from the j-th cluster, j = 1, 2, ..., k, and $\pi_1 + \pi_2 + \cdots + \pi_k = 1$.
Therefore, a set of parameter values $\Theta^0 = (\pi_1^0, \pi_2^0, \ldots, \pi_k^0, \theta_1^0, \theta_2^0, \ldots, \theta_k^0)$ is chosen as the initial estimate of the mixture probability distribution parameters of the sample set S.
Step 1.3, substitute the initial mixture probability distribution parameter estimate $\Theta^0$ given in step 1.2 into the missing data set Y to obtain the posterior distribution probability of $y_{cb}$:
$$p(y_{cb} \mid x_{ab}, \Theta^0) = \frac{\pi_k^0 \, p_{y_{cb}}(x_{ab} \mid \theta_k^0)}{\sum_{k=1}^{M} \pi_k^0 \, p_k(x_{ab} \mid \theta_k^0)}$$
where $x_{ab} \in X$ and $y_{cb} \in Y$.
Because the samples in the sample set are independent of one another, the posterior distribution function of the data set Y is:
$$p(Y \mid X, \Theta^0) = \prod_{b=1}^{M} p(y_{cb} \mid x_{ab}, \Theta^0)$$
Step 1.4, using the posterior distribution function obtained in step 1.3, take the expectation, with respect to the missing data, of the complete-data log-likelihood function $\ln L(\Theta \mid X, Y)$. The resulting M expected values, one for each column, replace the missing data in the corresponding columns of the missing data set Y, giving the new sample set Y'.
Here $\ln L(\Theta \mid X, Y)$ is the log-likelihood function of the complete data with respect to the missing data, where
$$\ln L(\Theta \mid X, Y) = \ln p(x, y \mid \Theta) = \sum_{b=1}^{M} \ln p(x_{ab} \mid y_{cb}) \, p(y_{cb})$$
Step 1.5, from the sample set Y' obtained in step 1.4 and the complete data set X, recompute the maximum likelihood parameter $Q(\Theta, \Theta^0)$ of the sample set S:
$$Q(\Theta, \Theta^0) = \sum \ln\big(L(\Theta \mid X, Y) \, p(Y \mid X, \Theta^0)\big)$$
Step 1.6, maximize $Q(\Theta, \Theta^0)$ to obtain $\Theta^1$ satisfying $Q(\Theta^1, \Theta^0) = \max Q(\Theta, \Theta^0)$, replace $\Theta^0$ with $\Theta^1$, and substitute it into step 1.3.
Step 1.7, repeat the iteration of steps 1.3 to 1.6 α times and stop when $\lVert Q(\Theta^{\alpha+1}, \Theta^{\alpha}) - Q(\Theta^{\alpha}, \Theta^{\alpha-1}) \rVert < \varepsilon$. The $\Theta^{\alpha}$ finally obtained is the mixture probability distribution parameter estimate Θ of the k clusters, where ε is a precision value set according to the required clustering accuracy.
Step 1.8, using the mixture probability distribution parameter estimate Θ, compute for each sample object the posterior conditional probability density of belonging to cluster j, i.e. the membership probability of each object in each cluster. Each sample in the sample set S is then assigned to one of the k clusters according to the principle of maximum membership probability.
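The EM loop of steps 1.2 to 1.8 can be illustrated with a small sketch. Several simplifications are assumed here and are not taken from the patent: the data are one-dimensional and complete (the missing-data imputation of step 1.4 is left out), each cluster is modeled as a Gaussian, and the stopping rule compares successive log-likelihood values rather than successive Q values. The names `em_cluster` and `gaussian_pdf` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_cluster(data, means, variances, weights, eps=1e-8, max_iter=200):
    """Fit a 1-D Gaussian mixture by EM, then hard-assign every sample.

    means, variances and weights play the role of the initial estimate Theta^0.
    Returns (means, variances, weights, labels).
    """
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    prev_ll = None
    resp = []
    for _ in range(max_iter):
        # E-step: membership probability of every sample in every cluster.
        resp = []
        ll = 0.0
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([p / total for p in joint])
        # Stop once the log-likelihood changes by less than eps (assumed rule).
        if prev_ll is not None and abs(ll - prev_ll) < eps:
            break
        prev_ll = ll
        # M-step: re-estimate pi_j and theta_j from the memberships.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor guards against a collapsed cluster
    # Step 1.8: assign each sample to the cluster with maximum membership.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, variances, weights, labels


# Hypothetical fasting glucose values forming two well-separated groups.
glucose = [4.8, 5.0, 5.2, 7.9, 8.1, 8.3]
means, variances, weights, labels = em_cluster(glucose, [5.0, 8.0], [1.0, 1.0], [0.5, 0.5])
```

The hard assignment at the end corresponds to the maximum membership probability rule; the soft memberships in `resp` are what the E-step of the text calls the posterior probabilities.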
Step 2, for each cluster obtained in step 1, train a decision tree with the C4.5 classification method, obtaining k decision trees.
The decision tree of the j-th cluster is built as follows:
Step 2.1, according to the type 2 diabetes diagnostic criteria proposed by the China Diabetes Association, divide the objects in cluster j into two classes, diseased (P) and not diseased (Q), where cluster j contains g sample objects belonging to class P and h sample objects belonging to class Q.
Step 2.2, compute the information content of all objects with respect to classes P and Q:
$$\mathrm{Info}(j) = \mathrm{Info}(P, Q) = -\left(\frac{g}{g+h}\log\frac{g}{g+h} + \frac{h}{g+h}\log\frac{h}{g+h}\right)$$
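As a worked illustration of step 2.2, the information content can be computed directly from the two class counts. The sketch assumes base-2 logarithms, which the text does not specify, and the function name `class_info` is hypothetical.

```python
import math


def class_info(g, h):
    """Info(P, Q) for g samples of class P and h samples of class Q, in bits."""
    total = g + h
    info = 0.0
    for count in (g, h):
        if count > 0:  # the term 0 * log(0) is taken to be 0
            p = count / total
            info -= p * math.log2(p)
    return info


balanced = class_info(5, 5)   # evenly mixed cluster
pure = class_info(10, 0)      # cluster containing only class P
```

A cluster with equal numbers of diseased and non-diseased objects carries the maximum information content, while a cluster containing a single class carries none.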
Step 2.3, choose a data value $A_\beta$ in the attribute data of column b. Sample objects whose value in this column is greater than or equal to $A_\beta$ are assigned to class $S_{11}$, and those whose value is less than $A_\beta$ to class $S_{12}$, forming two subsets. $S_{11}$ contains $e_1$ sample objects belonging to class P and $f_1$ belonging to class Q; $S_{12}$ contains $e_2$ sample objects belonging to class P and $f_2$ belonging to class Q. The information content of the two subsets is then:
$$\mathrm{Info}(A_\beta, j) = \sum_{o=1}^{2} \frac{e_o + f_o}{g + h}\,\mathrm{Info}(S_{1o})$$
where $\mathrm{Info}(S_{1o})$ is $\mathrm{Info}(P, Q)$ evaluated on the $e_o$ class-P and $f_o$ class-Q objects of subset $S_{1o}$.
Step 2.4, take each attribute value in column b in turn as $A_\beta$, obtain the information content for each attribute value by the method of step 2.3, and then obtain the information gain of each attribute value:
$$\mathrm{Gain}(A_\beta, j) = \mathrm{Info}(j) - \mathrm{Info}(A_\beta, j)$$
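Steps 2.3 and 2.4 can be sketched for a single attribute column as follows. The sketch assumes base-2 logarithms and Boolean class labels (True for class P, False for class Q); the names `entropy`, `split_gain` and `best_threshold` are hypothetical. It follows the plain information gain described in the text rather than the gain ratio used by some C4.5 implementations.

```python
import math


def entropy(labels):
    """Info(P, Q) of a list of Boolean labels (True = class P), in bits."""
    info = 0.0
    for count in (sum(labels), len(labels) - sum(labels)):
        if count > 0:
            p = count / len(labels)
            info -= p * math.log2(p)
    return info


def split_gain(values, labels, threshold):
    """Gain of splitting one attribute column at `threshold` (>= goes to S11)."""
    s11 = [lab for v, lab in zip(values, labels) if v >= threshold]
    s12 = [lab for v, lab in zip(values, labels) if v < threshold]
    n = len(labels)
    # Weighted information content of the two subsets.
    info_split = sum(len(s) / n * entropy(s) for s in (s11, s12) if s)
    return entropy(labels) - info_split


def best_threshold(values, labels):
    """Try every observed value as the split point and keep the best gain."""
    return max(sorted(set(values)), key=lambda t: split_gain(values, labels, t))


# Hypothetical glucose column with disease labels.
column = [5.0, 5.5, 6.5, 7.0]
diseased = [False, False, True, True]
threshold = best_threshold(column, diseased)
gain = split_gain(column, diseased, threshold)
```

Repeating `best_threshold` over every column and keeping the attribute with the largest gain gives the root-node choice described in the following step.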
Step 2.5, following the method of steps 2.3 and 2.4, compute the information gain of each attribute value for every column, and take the attribute corresponding to the maximum of all the information gain values as the root node of the decision tree. The subsets $S_{11}$ and $S_{12}$ obtained by splitting at the $A_\beta$ corresponding to the maximum information gain serve as the data sets for splitting the next layer of nodes.
Step 2.6, from the number of sample objects in each of the data sets $S_{11}$ and $S_{12}$ that belong to class P or class Q, compute the information content $\mathrm{Info}(S_{11})$ and $\mathrm{Info}(S_{12})$ of each data set by the method of step 2.2.
Step 2.7, repeat the splitting λ times by the method of steps 2.3 to 2.6, until all sample objects of every split node belong to the same class or all attributes have been used for splitting; then stop splitting nodes, obtaining a decision tree with (λ+1) layers.
Step 3, for each attribute, compute its layer coefficient from the layers in which it appears in the k decision trees obtained in step 2 and the number of times it appears in each of those layers.
For the sample set S, the layer coefficient $L_b$ of the attribute in column b is:
$$L_b = \frac{\sum_{j=1}^{k}\sum_{w=1}^{\lambda+1}\left(\frac{1}{2^{w}}\,t_w\right)}{\sum_{w=1}^{\lambda+1} t_w}$$
where $t_w$ denotes the number of times the attribute appears in layer w.
The layer coefficient is used as the measure of the effect degree of each attribute on the onset of type 2 diabetes, and the B attributes with $L_b \ge \delta$ are chosen as the main attribute set $U(U_1, U_2, \ldots, U_B)$ affecting onset, where 1 < B < M and δ is the threshold on the effect degree of a main attribute.
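The layer coefficient of step 3 can be sketched as below, assuming the per-layer weight is 1/2^w with the root counted as layer 1, and assuming the denominator counts the attribute's occurrences over all k trees. Both readings, the dictionary layout and the names `layer_coefficient` and `select_main_attributes` are assumptions made for illustration.

```python
def layer_coefficient(layer_counts):
    """Layer coefficient of one attribute.

    layer_counts: one dict per decision tree, mapping layer w (root = 1)
    to t_w, the number of times the attribute appears in that layer.
    """
    weighted = 0.0
    total = 0
    for counts in layer_counts:
        for w, t in counts.items():
            weighted += t / 2 ** w  # shallower layers contribute more
            total += t
    return weighted / total if total else 0.0


def select_main_attributes(coefficients, delta):
    """Keep the attributes whose layer coefficient reaches the threshold delta."""
    return [name for name, value in coefficients.items() if value >= delta]


# Hypothetical occurrence counts in k = 2 trees.
coefficients = {
    "glucose": layer_coefficient([{1: 1}, {1: 1}]),   # root of both trees
    "bmi": layer_coefficient([{2: 1}, {3: 2}]),       # deeper layers only
    "height": layer_coefficient([{}, {}]),            # never used
}
main_attributes = select_main_attributes(coefficients, delta=0.2)
```

Under these assumptions an attribute that always appears at the root scores 0.5, and the score falls toward 0 as its occurrences move to deeper layers.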
Step 4, record a sample object belonging to class P as $P_d = 1$ and one belonging to class Q as $P_d = 0$. With $P_d$ as the dependent variable and all data $U_{vz}$ of the main attribute set U obtained in step 3 as independent variables (z = 1, 2, ..., B; v = 1, 2, ..., (g+h)), perform Logistic regression modeling separately on the sample objects of each cluster obtained in step 1 to obtain the regression coefficient $\beta_z$ of each main attribute, building k Logistic regression models.
The Logistic regression model has the form:
$$\ln\left(\frac{P_v}{1 - P_v}\right) = \beta_1 U_{v1} + \beta_2 U_{v2} + \cdots + \beta_B U_{vB} \quad (v = 1, \ldots, g+h)$$
where $P_v$ is the probability that the v-th sample object in the j-th cluster belongs to class P.
The Logistic regression model of the j-th cluster is built as follows:
Step 4.1, the conditional probability function of whether the v-th sample belongs to class P is:
$$P_v(P_d \mid U_{v1}, U_{v2}, \ldots, U_{vB}) = \frac{\exp\left(\sum_{z=1}^{B} \beta_z U_{vz}\right)}{1 + \exp\left(\sum_{z=1}^{B} \beta_z U_{vz}\right)}$$
Step 4.2, from the conditional probability function of each sample object obtained in step 4.1, compute the likelihood function of cluster j and take its logarithm to obtain the log-likelihood function, in which $P_d$ is the class indicator of the v-th sample object:
$$l(P, U; \beta_1, \ldots, \beta_B) = \sum_{v=1}^{g+h}\left(P_d \sum_{z=1}^{B} \beta_z U_{vz}\right) - \sum_{v=1}^{g+h} \log\left[1 + \exp\left(\sum_{z=1}^{B} \beta_z U_{vz}\right)\right]$$
Step 4.3, take the partial derivative of the log-likelihood function obtained in step 4.2 with respect to each $\beta_z$ and set it equal to 0, giving B log-likelihood equations that form a system of equations. Solving this system gives the regression coefficient estimate $\beta_z$ of each independent variable $U_{vz}$, which establishes the Logistic regression model.
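The likelihood equations of step 4.3 have no closed-form solution, so they are solved numerically in practice. The sketch below uses plain gradient ascent on the log-likelihood, which is a stand-in chosen for brevity and not necessarily the solver used by the inventors; the learning rate, iteration count and the names `fit_logistic` and `predict` are hypothetical. The constant first column in the example acts as an intercept, which is also an assumption, since the model form in the text has no intercept term.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def predict(beta, row):
    """P_v: probability that a sample with attribute row belongs to class P."""
    return sigmoid(sum(b * u for b, u in zip(beta, row)))


def fit_logistic(U, y, lr=0.1, n_iter=5000):
    """Estimate beta by gradient ascent on the log-likelihood.

    U: list of attribute rows (U_v1 ... U_vB); y: list of P_d values (0 or 1).
    """
    beta = [0.0] * len(U[0])
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for row, label in zip(U, y):
            p = predict(beta, row)
            for z, u in enumerate(row):
                # d l / d beta_z = sum_v (P_d - P_v) * U_vz
                grad[z] += (label - p) * u
        beta = [b + lr * g / len(U) for b, g in zip(beta, grad)]
    return beta


# Hypothetical cluster: constant column plus one standardized attribute.
U = [[1.0, -2.0], [1.0, -1.0], [1.0, -0.5], [1.0, 0.5], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 0, 1, 1]
beta = fit_logistic(U, y)
p_high = predict(beta, [1.0, 2.0])
p_low = predict(beta, [1.0, -2.0])
```

At convergence the gradient is zero, which is exactly the condition expressed by the B log-likelihood equations.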
Step 5, using the Logistic regression models obtained in step 4, compute for each cluster the probability that each sample object belongs to class P, i.e. the probability that the sample object is diseased. Divide the sample objects of each cluster into R groups by age bracket and sex, and compute the relative incidence probability RR of the sample objects in each group of each cluster. Obtain the population's relative incidence probability threshold from the ROC curve, then compare the individual's RR value with the threshold and, in combination with the metabolic syndrome determination method, determine the risk status of each non-diseased sample object. The risk status of a non-diseased sample is divided into four grades: no risk, low risk, medium risk and high risk.
The relative incidence probability of the v-th sample object is RR = P of sample v / baseline incidence rate, where P is the probability that sample v belongs to class P, computed with the Logistic regression model corresponding to that sample object. The baseline incidence rate of group r (r = 1, ..., R) is the probability obtained by substituting the mean values of the determinant attribute set U over all sample objects in group r into the corresponding Logistic regression model. Each cluster thus yields R baseline incidence rate values.
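The relative incidence probability can be sketched as follows, assuming the regression coefficients of the cluster are already known and that the group is given as a list of attribute rows. The names `incidence_probability`, `baseline_incidence` and `relative_risk`, and the numbers in the example, are hypothetical.

```python
import math


def incidence_probability(beta, row):
    """Logistic model output: probability of belonging to class P."""
    t = sum(b * u for b, u in zip(beta, row))
    return 1.0 / (1.0 + math.exp(-t))


def baseline_incidence(beta, group_rows):
    """Model output at the column-wise mean of the group's attribute rows."""
    n = len(group_rows)
    mean_row = [sum(row[z] for row in group_rows) / n for z in range(len(beta))]
    return incidence_probability(beta, mean_row)


def relative_risk(beta, row, group_rows):
    """RR = individual incidence probability / baseline incidence of the group."""
    return incidence_probability(beta, row) / baseline_incidence(beta, group_rows)


# Hypothetical one-attribute model and age/sex group.
beta = [1.0]
group = [[-1.0], [1.0]]            # group mean is 0, so the baseline is 0.5
rr_average = relative_risk(beta, [0.0], group)
rr_elevated = relative_risk(beta, [2.0], group)
```

An individual whose attributes equal the group mean has RR equal to 1, and values above 1 indicate a higher modeled incidence probability than the group baseline.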
The ROC curve is obtained by taking each distinct value of the sample objects' relative incidence probability RR in turn as the threshold and dividing the sample objects into two classes, diseased P' and not diseased Q'. The result is compared with the determination made for each sample object in step 2.1 according to the China Diabetes Association criteria: a sample object belonging to both P and P' is a confirmed diseased object, and a sample object belonging to both Q and Q' is an excluded object. The sensitivity and specificity over all sample objects are computed, and the curve is drawn with sensitivity as the ordinate and (1 - specificity) as the abscissa. The RR value at which [sensitivity + (1 - specificity)] is largest is chosen as the threshold of the relative incidence probability.
Here, sensitivity = number of confirmed diseased objects / number of objects belonging to class P, and specificity = number of excluded objects / number of objects belonging to class Q.
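A threshold search over the ROC curve can be sketched as below. The sensitivity and specificity follow the definitions in the text. The selection criterion, however, is a substitution: the sketch maximizes sensitivity + specificity - 1 (Youden's index), the conventional ROC criterion, because sensitivity + (1 - specificity) as literally written always reaches its maximum at the lowest threshold, where every object is classified as diseased. Treating RR values at or above the threshold as diseased is also an assumption, and the name `roc_threshold` is hypothetical.

```python
def roc_threshold(rr_values, diseased):
    """Pick the RR threshold with the largest Youden index.

    rr_values: relative incidence probability of each sample object.
    diseased: True if the object belongs to class P, False for class Q.
    Returns (threshold, sensitivity, specificity).
    """
    n_pos = sum(1 for d in diseased if d)
    n_neg = len(diseased) - n_pos
    best = None
    for thr in sorted(set(rr_values)):
        # Confirmed diseased: in class P and classified as diseased (RR >= thr).
        confirmed = sum(1 for rr, d in zip(rr_values, diseased) if d and rr >= thr)
        # Excluded: in class Q and classified as not diseased (RR < thr).
        excluded = sum(1 for rr, d in zip(rr_values, diseased) if not d and rr < thr)
        sensitivity = confirmed / n_pos
        specificity = excluded / n_neg
        score = sensitivity + specificity - 1.0
        if best is None or score > best[0]:
            best = (score, thr, sensitivity, specificity)
    return best[1], best[2], best[3]


# Hypothetical RR values with their class labels.
rr = [0.5, 0.8, 1.2, 1.5, 2.0, 0.9]
labels = [False, False, True, True, True, False]
threshold, sensitivity, specificity = roc_threshold(rr, labels)
```

Plotting sensitivity against (1 - specificity) for every candidate threshold in the loop reproduces the ROC curve described above.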
Beneficial effects
Compared with the rough set method and the genetic algorithm, the determinant attribute extraction technique of the present invention, based on EM clustering and C4.5 classification, has low time complexity.
Compared with the Relief method and principal component analysis, the determinant attribute extraction technique of the present invention maintains high accuracy while evaluating the effect degree of each determinant attribute by a quantitative analysis method.
Compared with the Finnish Diabetes Risk Score and the multi-factor model method, the risk status determination method of the present invention, based on determinant attribute extraction and Logistic regression, mostly uses attributes that are open to intervention, is highly accurate, divides the risk status in detail, and makes its determination in combination with prediabetic conditions such as metabolic syndrome, so that the determined degree of risk is more accurate.
Compared with the single-factor weighted score method and the HCI diabetes risk assessment method, the present invention uses EM clustering to take into account the different characteristics of different populations and extracts different determinant attributes for different population characteristics; use by medical staff has shown it to be universal and practical.
Description of drawings
Fig. 1 is the risk status determination principle diagram of the present invention;
Fig. 2 is the flow chart of determinant attribute extraction for the data source of 17,946 cross-sectional records in the embodiment;
Fig. 3 shows the decision tree built for the first cluster when EM clustering is applied to the 17,946 cross-sectional records in the embodiment with the number of clusters set to 3;
Fig. 4 is the ROC curve of the relative incidence probability in the embodiment.
Embodiment
To better illustrate the objects and advantages of the present invention, the embodiment of the method is described in further detail below with reference to the drawings and examples.
Three tests were designed and run, using as input 17,946 cross-sectional records obtained by cluster sampling at scientific research institutions in Xicheng District and Haidian District of Beijing from February to September 2001, 59,839 cross-sectional records, and 2,288 follow-up and return-visit records from the Chinese Academy of Sciences for 2001 to 2007: (1) a test of extracting the main attributes affecting the onset of type 2 diabetes from the 17,946 cross-sectional records; (2) a feasibility test of risk status determination on the 59,839 cross-sectional records; (3) a validity test of risk status determination on the 2,288 follow-up records.
The three tests are described one by one below. All tests were run on the same computer, configured as follows: Intel dual-core CPU (1.8 GHz clock), 1 GB memory, Windows XP SP3 operating system.
Test 1 uses the determinant attribute extraction method based on EM clustering and C4.5 classification throughout. The flow of the determinant attribute extraction method is shown in Fig. 1.
Tests 2 and 3 use the same EM clustering mixture probability distribution and the same Logistic regression models. The flow of the risk status determination method is shown in Fig. 2.
1. Test of the determinant attribute extraction method on 17,946 cross-sectional records
The data come from a cluster-sampled cross-sectional health survey of 17,946 people from scientific research institutions in the Xicheng and Haidian districts of Beijing, carried out from February to September 2001, and comprise 101 attributes. After preprocessing, 13,781 records with 67 attributes were selected as the test data source. The data source is organized into 5 parts: the full data set, the female data set, the male data set, the data set with a family history and the data set without a family history. The aim of the EM clustering test is to set the number of clusters so that the clustering effect is best and the log-likelihood ratio is smallest. In this test the number of EM clusters is set to 3 and to 4. The EM clustering algorithm is then applied to the 5 data sources, each of which is clustered into 3 and into 4 groups, giving 10 clustering tests and 5*3+5*4=35 groups in total; the division of the data source is shown in Figure 3. Next, the 35 groups obtained by clustering are labeled using blood glucose thresholds (6.1 mmol·L⁻¹, 5.85 mmol·L⁻¹, 5.6 mmol·L⁻¹, 5.26 mmol·L⁻¹): records above the threshold are labeled 'Y' and all others 'N'. The C4.5 algorithm is trained on the labeled data, which yields 35*4=140 classification decision trees. Finally, statistics over the decision trees give the attribute set most closely related to type 2 diabetes. The determinant attributes are extracted from each data source as follows:
Step 1: set the number of clusters to 3 or 4.
Step 2: carry out EM clustering with the chosen number of clusters.
Step 3: label the data using a blood glucose threshold (for example 6.1 mmol·L⁻¹) and use the label as the class variable for C4.5. Apply C4.5 classification to each of the 3 or 4 data sources obtained by clustering to obtain the decision tree of the corresponding data source; the form of the resulting decision tree is shown in Figure 3.
Step 4: count the occurrences of each attribute at each layer of the decision trees and compute the layer coefficient of each attribute, which gives the effect degree of the attribute. Rank the attributes by effect degree and extract the top 9 as the determinant attributes.
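The counting and ranking in Step 4 can be sketched as follows. This is only an illustration under stated assumptions: each decision tree is assumed to be already summarized as a mapping from attribute name to per-layer occurrence counts, the weight of layer w is assumed to be 1/2^w, and the per-tree contributions are summed. The tree summaries, attribute names and function names are hypothetical.

```python
def layer_coefficient(trees, attribute):
    """Sum over trees of the depth-weighted share of an attribute's occurrences.

    trees: list of dicts {attribute: {layer w (1-based): count t_w}}.
    Assumption: layer w is weighted by 1 / 2**w, so shallow layers count more.
    """
    total = 0.0
    for tree in trees:
        counts = tree.get(attribute, {})
        occurrences = sum(counts.values())
        if occurrences == 0:
            continue  # an attribute absent from this tree contributes nothing
        weighted = sum(t / 2 ** w for w, t in counts.items())
        total += weighted / occurrences
    return total

def top_attributes(trees, n):
    """Rank all attributes by layer coefficient and keep the first n."""
    names = {name for tree in trees for name in tree}
    ranked = sorted(names, key=lambda a: layer_coefficient(trees, a), reverse=True)
    return ranked[:n]

# Hypothetical summaries of two small trees.
trees = [
    {"glucose": {1: 1}, "age": {2: 1, 3: 1}},
    {"glucose": {1: 1, 2: 1}, "hdl": {2: 1}},
]
print(top_attributes(trees, 2))
```

An attribute that appears near the root of many trees receives a larger coefficient than one that appears only in deep layers, which matches the ranking idea described in Step 4.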
2. Feasibility test of the risk status judgment on 59,839 cross-sectional records
These data were accumulated from cities across the country. After preprocessing, 59,839 records with 14 attributes were selected as the test data source. The 9 extracted main attributes are used as input attributes, and EM clustering divides the 59,839 records into 3 clusters. For the Logistic regression built for each cluster, the stepwise entry probability is set to 0.05, the removal probability to 0.1 and the maximum number of iterations to 50. The dependent variable of the model is whether the individual is ill, the independent variables are the 9 determinant attributes, and the quantities to be computed are the parameter estimate, standard error and Wald test value of each attribute. Test 3 uses the same parameter settings, which are not repeated below.
This test uses the Logistic regression models of the different clusters to compute each individual's relative incidence probability and, in combination with the syndrome criteria, judges the individual's risk status. By judging this data source and computing the share of each risk status, the test shows whether the shares are reasonable and whether the judgment method is feasible. The risk status judgment proceeds as follows:
Step 1: using the 9 determinant attributes, set the initial number of EM clusters to 3 and carry out EM clustering. The data source is divided into 3 clusters.
Step 2: train a logistic regression on the data in each of the 3 clusters, with disease status (ill or not ill) as the dependent variable and the data of the 9 determinant attributes as the independent variables, giving 3 logistic regression models.
Step 3: using the 3 logistic regression models, compute the relative incidence probability of each individual in the corresponding data source, and choose a suitable cutoff from the ROC curve, here 2.2; the ROC curve is shown in Figure 4.
Step 4: compare the individual's relative incidence probability with the cutoff and, in combination with the metabolic syndrome and high-risk group criteria, judge the individual's onset risk status.
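Step 1 of the flow above relies on EM clustering. The following minimal sketch shows the alternation of expectation and maximization steps on a single attribute. It is a simplification, not the procedure of the invention: it assumes Gaussian components, one dimension and no missing values, and the sample values, starting parameters and function names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a one-dimensional normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_cluster(data, means, variances, weights, eps=1e-8, max_iter=200):
    """EM for a 1-D Gaussian mixture; returns (weights, means, variances, labels)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k, n = len(means), len(data)
    previous = None
    for _ in range(max_iter):
        # E-step: posterior probability of each cluster for each sample.
        resp, log_likelihood = [], 0.0
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            log_likelihood += math.log(total)
            resp.append([p / total for p in joint])
        # Stop once the log-likelihood changes by less than eps.
        if previous is not None and abs(log_likelihood - previous) < eps:
            break
        previous = log_likelihood
        # M-step: re-estimate mixing weights, means and variances.
        for j in range(k):
            mass = sum(r[j] for r in resp)
            weights[j] = mass / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / mass
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / mass
            variances[j] = max(spread, 1e-6)  # floor keeps the density finite
    # Assign each sample to the cluster with the largest posterior probability.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels

# Hypothetical fasting glucose values (mmol/L) forming two groups.
glucose = [4.8, 5.0, 5.2, 5.1, 7.9, 8.1, 8.3, 8.0]
weights, means, variances, labels = em_cluster(glucose, [5.0, 8.0], [1.0, 1.0], [0.5, 0.5])
print(labels, means)
```

The final assignment uses the maximum posterior probability, which corresponds to dividing the data source into clusters after the parameters have converged.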
3. Validity test of the risk status judgment on 2,288 follow-up records
This data source is 7 years of follow-up data (2001 to 2007) from the Chinese Academy of Sciences. After preprocessing, 14 determinant attributes are kept, for 2,288 records in total. The individual onset risk status is judged with the same method as in Test 2. The EM-clustering mixture probability distribution and the Logistic regression model are both identical to those of Test 2.
This test applies the risk status judgment to the 2001 data and the 2007 data separately, and counts the number of people in each risk status as well as, within each risk status, the number who were not ill in 2001 but were ill in 2007. Because a population with a higher onset risk status should have a higher probability of onset, the test uses the ratio of these two counts to verify that the onset results are consistent with the assigned risk status.
Test results
For Test 1, Table 1 lists the number of times each attribute occurs in the decision trees under the determinant attribute extraction method proposed by the invention.
Table 1. Occurrence counts of each attribute in the decision trees
Figure BDA00002343787000121
From these results, the nine important risk factors with the greatest influence on the onset of type 2 diabetes are: blood glucose, age, high-density lipoprotein (HDL), systolic blood pressure, diastolic blood pressure, cholesterol, body mass index, abdominal circumference and triglycerides. In the tests on the full data set and on the data sets split by sex, these factors clearly play a more important role than the others. This result agrees with clinical experience, confirming that the 9 proposed attributes are important determinant attributes affecting the onset of type 2 diabetes.
For Test 2, Table 2 lists the reasonableness test of the risk status judgment method proposed by the invention. The table shows the distribution of the population across the different risk statuses. Among people who are not ill, the higher the onset risk status, the smaller the share of people, and the share of each onset risk status is broadly in line with population characteristics. The share of the whole population assigned to the high-risk status is higher than general medical knowledge would suggest. This is mainly due to how the cross-sectional data were collected: part of the data source consists of people with metabolic syndrome, which raises the share of records in this risk status. Overall, the results demonstrate the reasonableness and feasibility of the invention.
Table 2. Population distribution across risk statuses
Figure BDA00002343787000122
For Test 3, Table 3 lists the validity test of the risk status judgment method proposed by the invention. The table gives the number of people in each risk status, together with the number and share of people in each risk status who were not ill in 2001 but were ill in 2007.
Table 3. Number of people who developed the disease within 7 years, by risk status
Figure BDA00002343787000123
The population with a higher onset risk status also has a higher incidence 7 years later: the lower a group's risk status in 2001, the smaller the share that had developed the disease by 2007, and vice versa. In each population the distribution across risk statuses decreases progressively, and the share of people in each risk status is consistent with the actual population distribution, which shows that the risk status judgment algorithm is effective.
In the determinant attribute extraction method, the invention establishes the determinant attributes affecting the onset of type 2 diabetes through the 9 extracted attributes. Verification of the population distribution across risk statuses and of the N-year onset distribution across risk statuses demonstrates the reasonableness and validity of the risk status judgment method.

Claims (5)

1. A type 2 diabetes risk status judgment method, characterized in that it comprises the following steps:
Step 1: take N evaluation objects as the sample set S, where each object comprises M determinant attributes affecting the onset of type 2 diabetes; with the determinant attributes as columns and the attribute values of the different samples as rows, set up the matrix representation of the sample set S; cluster the sample set S using the EM clustering method to obtain k clusters. The specific implementation is:
Step 1.1: divide the N objects into the complete data set X and the missing data set Y according to whether they contain missing data;
Step 1.2: set the number of clusters k and the initial mixture probability distribution parameter estimate $\Theta^{0}$ of the sample set S:
Figure FDA00002343786900011
where $\theta_j$ is the probability distribution parameter of the j-th cluster and $\pi_j$ is the probability that the i-th sample $n_i$ comes from the j-th cluster, $j=1,2,\ldots,k$, $\pi_1+\pi_2+\cdots+\pi_k=1$; $k\le N$, $1\le i\le N$;
Step 1.3: substitute the initial mixture probability distribution parameter estimate $\Theta^{0}$ given in step 1.2 into the missing data set Y; the posterior distribution probability of $y_{cb}$ is:
$$p(y_{cb}\mid x_{ab},\Theta^{0})=\frac{\pi_k^{0}\,p_{y_{cb}}(x_{ab}\mid\theta_k^{0})}{\sum_{k=1}^{M}\pi_k^{0}\,p_k(x_{ab}\mid\theta_k^{0})}$$
where $x_{ab}\in X$ and $y_{cb}\in Y$; a denotes a complete-data sample object, b denotes the data corresponding to each attribute of the sample, c denotes a missing-data sample object, and $b=1,2,\ldots,M$;
The posterior distribution function of the data set Y is:
$$p(Y\mid X,\Theta^{0})=\prod_{b=1}^{M}p(y_{cb}\mid x_{ab},\Theta^{0});$$
Step 1.4: using the posterior distribution function obtained in step 1.3, take the expectation of the complete-data log-likelihood function $\ln L(\Theta\mid X,Y)$ to obtain M expected values, and use them to replace the missing data in the corresponding columns of the missing data set Y, obtaining the new sample set Y′;
where $$\ln L(\Theta\mid X,Y)=\ln p(x,y\mid\Theta)=\sum_{b=1}^{M}\ln\bigl[p(x_{ab}\mid y_{cb})\,p(y_{cb})\bigr];$$
Step 1.5: from the sample set Y′ obtained in step 1.4 and the complete data set X, compute the maximum likelihood parameter $Q(\Theta,\Theta^{0})$ of the sample set S:
$$Q(\Theta,\Theta^{0})=\sum\ln\bigl(L(\Theta\mid X,Y)\,p(Y\mid X,\Theta^{0})\bigr)$$
Step 1.6: maximize $Q(\Theta,\Theta^{0})$ to obtain $\Theta^{1}$ satisfying $Q(\Theta^{1},\Theta^{0})=\max Q(\Theta,\Theta^{0})$; replace $\Theta^{0}$ with $\Theta^{1}$ and substitute it into step 1.3;
Step 1.7: repeat the iteration from step 1.3 to step 1.6 α times, stopping when $\lVert Q(\Theta^{\alpha+1},\Theta^{\alpha})-Q(\Theta^{\alpha},\Theta^{\alpha-1})\rVert<\varepsilon$; the resulting $\Theta^{\alpha}$ is taken as the mixture probability distribution parameter estimate Θ of the k clusters;
Step 1.8: using the mixture probability distribution parameter estimate Θ, compute for each sample object the posterior conditional probability density of belonging to cluster j; following the principle of maximum membership probability between sample and cluster, divide each sample of the sample set S into the k clusters;
Step 2: for each cluster obtained in step 1, train a decision tree with the C4.5 classification method, obtaining k decision trees;
Step 3: for each attribute, compute its layer coefficient from the layers at which it occurs in the k decision trees obtained in step 2 and the number of times it occurs at each layer;
For the sample set S, the layer coefficient $L_b$ of the attribute in column b is:
$$L_b=\sum_{j=1}^{k}\frac{\sum_{w=1}^{\lambda+1}\left(\frac{1}{2^{w}}\,t_w\right)}{\sum_{w=1}^{\lambda+1}t_w}$$
where $t_w$ is the number of times the attribute occurs at layer w;
Use the layer coefficient as the measure of the effect degree of each attribute on the onset of type 2 diabetes, and select the B attributes with $L_b>\delta$ as the main attribute set $U(U_1,U_2,\ldots,U_B)$ affecting onset, where $1<B<M$;
Step 4: record a sample object belonging to class P as $P_d=1$ and one belonging to class Q as $P_d=0$; with $P_d$ as the dependent variable and all the data $U_{vz}$ corresponding to the main attribute set U obtained in step 3 as the independent variables, $v=1,2,\ldots,(g+h)$, $z=1,2,\ldots,B$, carry out Logistic regression modeling on the sample objects of each cluster obtained in step 1, obtaining the regression coefficient $\beta_z$ of each main attribute and building k Logistic regression models;
Step 5: using the Logistic regression models obtained in step 4, compute the probability that each sample object in each cluster belongs to class P; divide the sample objects in each cluster into R groups by age bracket and sex, and compute the relative incidence probability RR of the sample objects of each group in each cluster; obtain the population's relative incidence probability threshold from the ROC curve, then compare each individual's RR value with the threshold and, in combination with the metabolic syndrome judgment method, judge the risk status of each sample object that is not ill. The risk status of a sample that is not ill is divided into four grades: no risk, low risk, medium risk and high risk;
The relative incidence probability threshold is the RR value at which [sensitivity + (1 - specificity)] reaches its maximum;
The relative incidence probability of the v-th sample object is RR = (P of sample v) / (baseline incidence rate), where P is the probability that sample v belongs to class P, computed with the Logistic regression model corresponding to this sample object; the baseline incidence rate is the probability obtained by substituting the mean values of the data of the determinant attribute set U over all sample objects in group r into the corresponding Logistic regression model, where $r=1,\ldots,R$; each cluster yields R baseline incidence rate values.
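The relative incidence probability defined at the end of claim 1 can be illustrated with a short sketch. It assumes a Logistic model without an intercept term, already fitted coefficients and a single age-and-sex group; the coefficient values, attribute rows and function names are hypothetical.

```python
import math

def logistic_probability(coefficients, attributes):
    """P = exp(s) / (1 + exp(s)) with s = sum of beta_z * U_z (no intercept assumed)."""
    s = sum(b * u for b, u in zip(coefficients, attributes))
    return 1.0 / (1.0 + math.exp(-s))

def baseline_incidence(coefficients, group_rows):
    """Model probability evaluated at the column-wise mean of the group's attributes."""
    n = len(group_rows)
    means = [sum(column) / n for column in zip(*group_rows)]
    return logistic_probability(coefficients, means)

def relative_incidence(coefficients, row, group_rows):
    """RR = individual probability divided by the baseline incidence of the group."""
    return logistic_probability(coefficients, row) / baseline_incidence(coefficients, group_rows)

# Hypothetical coefficients and one group of two individuals with two attributes.
beta = [0.5, -1.0]
group = [[1.0, 0.0], [3.0, 2.0]]
rr_values = [relative_incidence(beta, row, group) for row in group]
print(rr_values)
```

An RR above 1 means the individual's modeled probability is higher than that of the average member of the same group; the judgment then compares RR with the threshold read from the ROC curve.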
2. The type 2 diabetes risk status judgment method according to claim 1, characterized in that: the complete data set X is the set of objects for which the data of all M determinant attributes are present, and its matrix representation is $[x_{ab}]$; the missing data set Y is the set of objects for which one or more of the data of the M determinant attributes are missing, and its matrix representation is $[y_{cb}]$.
3. The type 2 diabetes risk status judgment method according to claim 1, characterized in that: in step 2, the decision tree of the j-th cluster is built as follows:
Step 2.1: according to the type 2 diabetes diagnostic criteria proposed by the Chinese diabetes association, divide the objects in cluster j into the ill class P and the not-ill class Q, where P contains g sample objects and Q contains h sample objects;
Step 2.2: compute the information of all objects with respect to classes P and Q:
$$\mathrm{Info}(j)=\mathrm{Info}(P,Q)=-\left(\frac{g}{g+h}\log\frac{g}{g+h}+\frac{h}{g+h}\log\frac{h}{g+h}\right);$$
Step 2.3: choose a data value $A_\beta$ from the data of the attribute in column b; assign the sample objects whose value in this column is greater than or equal to $A_\beta$ to subset $S_{11}$ and those whose value is less than $A_\beta$ to subset $S_{12}$, where $S_{11}$ contains $e_1$ sample objects of class P and $f_1$ sample objects of class Q, and $S_{12}$ contains $e_2$ sample objects of class P and $f_2$ sample objects of class Q; the information of the two subsets is:
$$\mathrm{Info}(A_\beta,j)=\sum_{o=1}^{2}\frac{e_o+f_o}{g+h}\cdot\mathrm{Info}(P,Q);$$
Step 2.4: take each attribute value in column b in turn as $A_\beta$, obtain the information corresponding to each attribute value by the method of step 2.3, and compute the information gain of each attribute value:
$$\mathrm{Gain}(A_\beta,j)=\mathrm{Info}(j)-\mathrm{Info}(A_\beta,j)$$
Step 2.5: following steps 2.3 to 2.4, compute the information gain of each attribute value in every column, and take the attribute corresponding to the largest information gain as the root node of the decision tree; use the subsets $S_{11}$ and $S_{12}$ divided by the $A_\beta$ corresponding to the largest information gain as the data sets for dividing the nodes of the next layer;
Step 2.6: from the number of sample objects in the divided data sets $S_{11}$ and $S_{12}$ that belong to class P or class Q, compute the information $\mathrm{Info}(S_{11})$ and $\mathrm{Info}(S_{12})$ of the data sets $S_{11}$ and $S_{12}$ respectively;
Step 2.7: carry out λ divisions following steps 2.3 to 2.6, until all sample objects of the divided node belong to the same class or all attributes have been used for division; then stop dividing nodes, obtaining a decision tree with (λ+1) layers.
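The threshold search of steps 2.2 to 2.4 can be sketched for a single attribute column as follows. The sketch makes two assumptions that the text does not fix: logarithms are taken to base 2, and the information of each subset is computed from that subset's own class counts. The attribute values, labels and function names are hypothetical.

```python
import math

def info(pos, neg):
    """Entropy of a node holding pos ill objects and neg not-ill objects (base 2)."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * math.log2(p)
    return result

def best_split(values, labels):
    """Return (threshold, gain) with the largest information gain for one column.

    labels: 1 for class P (ill), 0 for class Q (not ill).
    Objects with value >= threshold form one subset, the rest form the other.
    """
    g = sum(labels)
    h = len(labels) - g
    parent = info(g, h)
    best = (None, -1.0)
    for threshold in sorted(set(values)):
        upper = [y for v, y in zip(values, labels) if v >= threshold]
        lower = [y for v, y in zip(values, labels) if v < threshold]
        split_info = 0.0
        for subset in (upper, lower):
            if subset:
                e = sum(subset)
                f = len(subset) - e
                split_info += len(subset) / len(labels) * info(e, f)
        gain = parent - split_info
        if gain > best[1]:
            best = (threshold, gain)
    return best

# Hypothetical glucose values and disease labels.
values = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5]
labels = [0, 0, 0, 1, 1, 1]
print(best_split(values, labels))
```

Repeating this search over every column and recursing on the two subsets gives the layer-by-layer construction of step 2.7. C4.5 implementations commonly normalize the gain into a gain ratio; that refinement is not shown here.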
4. The type 2 diabetes risk status judgment method according to claim 1, characterized in that: the Logistic regression model has the form:
$$\ln\left(\frac{P_v}{1-P_v}\right)=\beta_1U_{v1}+\beta_2U_{v2}+\cdots+\beta_BU_{vB}\quad(v=1,\ldots,g+h),$$ where $P_v$ is the probability that the v-th sample object in the j-th cluster belongs to class P;
The Logistic regression model of the j-th cluster is built as follows:
Step 4.1: the conditional probability function that the v-th sample belongs to class P is:
$$P_v(P_d\mid U_{v1},U_{v2},\ldots,U_{vB})=\frac{\exp\left(\sum_{z=1}^{B}\beta_zU_{vz}\right)}{1+\exp\left(\sum_{z=1}^{B}\beta_zU_{vz}\right)}$$
Step 4.2: from the conditional probability function of each sample object obtained in step 4.1, compute the likelihood function of cluster j and convert it to the log-likelihood function:
$$l(P,U;\beta_1,\ldots,\beta_B)=\sum_{v=1}^{g+h}\left(P_d\sum_{z=1}^{B}\beta_zU_{vz}\right)-\sum_{v=1}^{g+h}\log\left[1+\exp\left(\sum_{z=1}^{B}\beta_zU_{vz}\right)\right]$$
Step 4.3: take the partial derivative of the log-likelihood function obtained in step 4.2 with respect to each $\beta_z$; setting the partial derivatives to 0 gives B log-likelihood equations, which form the log-likelihood equation system; solving the system gives the regression coefficient estimate $\beta_z$ of each independent variable $U_{vz}$, which establishes the Logistic regression model.
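Steps 4.1 to 4.3 amount to maximizing the log-likelihood over the coefficients. The sketch below does this numerically by gradient ascent rather than by solving the likelihood equations in closed form, which is a substitution made here for brevity. The data, learning rate and step count are hypothetical, and the constant first column is added only so that the toy model has a baseline term.

```python
import math

def sigmoid(s):
    """Logistic function 1 / (1 + exp(-s))."""
    return 1.0 / (1.0 + math.exp(-s))

def log_likelihood(beta, rows, labels):
    """Sum over samples of y * s - log(1 + exp(s)), with s = sum of beta_z * U_z."""
    total = 0.0
    for row, y in zip(rows, labels):
        s = sum(b * u for b, u in zip(beta, row))
        total += y * s - math.log(1.0 + math.exp(s))
    return total

def fit_logistic(rows, labels, rate=0.1, steps=2000):
    """Gradient ascent on the log-likelihood; the gradient is sum of (y - p) * U_z."""
    beta = [0.0] * len(rows[0])
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for row, y in zip(rows, labels):
            p = sigmoid(sum(b * u for b, u in zip(beta, row)))
            for z, u in enumerate(row):
                grad[z] += (y - p) * u
        beta = [b + rate * g / len(rows) for b, g in zip(beta, grad)]
    return beta

# Hypothetical data: a constant column plus one attribute; labels are not separable.
rows = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
labels = [0, 0, 1, 0, 1, 1]
beta = fit_logistic(rows, labels)
print(beta, log_likelihood(beta, rows, labels))
```

At the maximum the gradient terms vanish, which is the numerical counterpart of setting each partial derivative to 0 in step 4.3.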
5. The type 2 diabetes risk status judgment method according to claim 1, characterized in that: the ROC curve is obtained by taking the different relative incidence probabilities of the sample objects in turn as the threshold; each sample object is assigned to the ill class P′ or the not-ill class Q′ and compared with the ill class P and not-ill class Q obtained from the type 2 diabetes diagnostic criteria of the Chinese diabetes association; a sample object belonging to both P and P′ is a confirmed ill object, and a sample object belonging to both Q and Q′ is an excluded object; the sensitivity and specificity over all sample objects are computed, and the curve is drawn with sensitivity as the ordinate and (1 - specificity) as the abscissa; where sensitivity = number of confirmed ill objects / number of objects in class P, and specificity = number of excluded objects / number of objects in class Q.
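The sensitivity and specificity defined in claim 5 can be computed as in the following sketch. It assumes that an object is assigned to the ill class P′ when its RR is greater than or equal to the threshold; the claim does not state whether the comparison is strict. The RR values, class labels and function names are hypothetical.

```python
def sensitivity_specificity(rr_values, ill, cutoff):
    """Return (sensitivity, specificity) for one cutoff.

    ill[i] is True when object i is in class P by the diagnostic criteria.
    Assumption: RR >= cutoff means predicted ill (class P').
    """
    confirmed = sum(1 for rr, y in zip(rr_values, ill) if y and rr >= cutoff)
    excluded = sum(1 for rr, y in zip(rr_values, ill) if not y and rr < cutoff)
    positives = sum(1 for y in ill if y)
    negatives = len(ill) - positives
    return confirmed / positives, excluded / negatives

def roc_points(rr_values, ill):
    """(1 - specificity, sensitivity) for every distinct RR value used as cutoff."""
    points = []
    for cutoff in sorted(set(rr_values)):
        sens, spec = sensitivity_specificity(rr_values, ill, cutoff)
        points.append((1.0 - spec, sens))
    return points

# Hypothetical RR values and reference classes for four objects.
rr = [0.5, 1.0, 2.5, 3.0]
ill = [False, False, True, True]
print(sensitivity_specificity(rr, ill, 2.2))
print(roc_points(rr, ill))
```

Each point returned by `roc_points` is one position on the curve, with (1 - specificity) on the horizontal axis and sensitivity on the vertical axis.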
CN 201210431592 2012-11-01 2012-11-01 Method for judging 2 type diabetes mellitus risk state Pending CN102930163A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201210431592 CN102930163A (en) 2012-11-01 2012-11-01 Method for judging 2 type diabetes mellitus risk state

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201210431592 CN102930163A (en) 2012-11-01 2012-11-01 Method for judging 2 type diabetes mellitus risk state

Publications (1)

Publication Number Publication Date
CN102930163A true CN102930163A (en) 2013-02-13

Family

ID=47644960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201210431592 Pending CN102930163A (en) 2012-11-01 2012-11-01 Method for judging 2 type diabetes mellitus risk state

Country Status (1)

Country Link
CN (1) CN102930163A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130213