CN107944479B - Disease prediction model establishing method and device based on semi-supervised learning - Google Patents

Info

Publication number
CN107944479B
CN107944479B CN201711135644.1A CN201711135644A CN107944479B CN 107944479 B CN107944479 B CN 107944479B CN 201711135644 A CN201711135644 A CN 201711135644A CN 107944479 B CN107944479 B CN 107944479B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711135644.1A
Other languages
Chinese (zh)
Other versions
CN107944479A (en
Inventor
王宏志
宋扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN201711135644.1A priority Critical patent/CN107944479B/en
Publication of CN107944479A publication Critical patent/CN107944479A/en
Application granted granted Critical
Publication of CN107944479B publication Critical patent/CN107944479B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention relates to a disease prediction model building method and device based on semi-supervised learning, which comprises the following steps: classifying the labeled data to obtain a basic classification model of the labeled data; selecting part of non-label data; classifying the selected part of non-label data by a clustering method, marking the selected part of non-label data by using the basic classification model, obtaining a marking result of the non-label data according to a clustering result and a prediction result of the non-label data, combining the marking result with the labeled data for classification to obtain an updated basic classification model, continuously selecting part of non-label data from the rest non-label data for modeling again, and iterating until all the non-label data are processed to obtain a final classification model. The invention models the label-free data, specifically combines a labeled classification method and a label-free clustering method, and improves the prediction precision in an iteration mode, thereby better improving the model prediction precision.

Description

Disease prediction model establishing method and device based on semi-supervised learning
Technical Field
The invention relates to the field of data processing, in particular to a disease prediction model building method and device based on semi-supervised learning, and a disease prediction method and device based on semi-supervised learning.
Background
Disease prediction is an important research topic. Analyzing medical data to obtain a prediction model makes better use of disease data and helps doctors and individuals assess diseases. The data modeling methods in current use are mainly supervised learning methods: a model is built from known cases and is then used to label unlabeled data. However, supervised learning models only the labeled data, whose volume is very limited, while the volume of unlabeled data is huge; as a result, many data models under-fit or even over-fit the data.
Disclosure of Invention
The technical problem to be solved by the present invention, in view of the above defects in the prior art, is to provide a disease prediction model establishing method and device based on semi-supervised learning that models unlabeled data with a semi-supervised learning method, combines a classification method for labeled data with a clustering method for unlabeled data, adjusts according to the data classification results, and improves prediction accuracy iteratively.
In order to solve the above technical problem, a first aspect of the present invention provides a disease prediction model building method based on semi-supervised learning, including the following steps:
S1, classifying the labeled data to obtain a basic classification model of the labeled data;
S2, selecting part of the unlabeled data from the unlabeled data;
S3, classifying the part of the unlabeled data selected in step S2 by a clustering method to obtain a clustering result $M_1$ of the unlabeled data, and labeling the part of the unlabeled data selected in step S2 with the basic classification model to obtain a prediction result $T_1$; obtaining a labeling result $C$ of the unlabeled data from the clustering result $M_1$ and the prediction result $T_1$;
S4, combining the labeling result $C$ of the unlabeled data with the labeled data for classification to obtain an updated basic classification model, returning to step S2 to select another part of the unlabeled data from the remaining unlabeled data for steps S3 and S4, and iterating in this way until all the unlabeled data has been processed, to obtain the final classification model.
Preferably, in step S2, if $q_2$ is far greater than $q_1$, where $q_1$ is the total amount of labeled data and $q_2$ is the total amount of unlabeled data, the amount of unlabeled data selected is $a \times q_2$ with $15\% \le a \le 25\%$; otherwise, the amount of unlabeled data selected is $b \times q_1$ with $45\% \le b \le 55\%$.
Preferably, in step S2, if $q_2 > 10q_1$, the amount of unlabeled data selected is $a \times q_2$, where $a = 20\%$; if $q_1 \le q_2 \le 10q_1$, the amount of unlabeled data selected is $b \times q_1$, where $b = 50\%$.
Preferably, in step S3, the labeling result $C$ of the unlabeled data is calculated with the following linear formula:
$C = \alpha T_1 + \beta M_1$
where $\alpha$ and $\beta$ are classification coefficients; $\alpha = 50\%\, q_1/(q_1+q_2)$, $\beta = q_1/(q_1+q_2)$.
Preferably, step S3 further includes: if $C > 1.5\,q_1/(q_1+q_2)$, the labeling result $C$ takes the value 1, indicating true; if $C \le 1.5\,q_1/(q_1+q_2)$, the labeling result $C$ takes the value 0, indicating false.
Preferably, in step S1, the labeled data is classified by any one of the following classification methods: neural networks, naive Bayes, or multivariate linear regression analysis.
Preferably, the clustering method used in the step S3 is a K-means or hierarchical clustering method.
In a second aspect of the present invention, a disease prediction method based on semi-supervised learning is provided, wherein a final classification model established by the disease prediction model establishing method based on semi-supervised learning is adopted to process disease data to obtain a disease prediction result.
In a third aspect of the present invention, a disease prediction model building apparatus based on semi-supervised learning is provided, including:
the first processing unit is used for classifying the labeled data to obtain a basic classification model of the labeled data;
the second processing unit is used for selecting part of non-label data from the non-label data;
a third processing unit for classifying the part of the unlabeled data selected by the second processing unit by a clustering method to obtain a clustering result $M_1$ of the unlabeled data, and labeling the part of the unlabeled data selected by the second processing unit with the basic classification model to obtain a prediction result $T_1$; and obtaining a labeling result $C$ of the unlabeled data from the clustering result $M_1$ and the prediction result $T_1$;
and the fourth processing unit is used for combining the labeling result C of the non-label data and the labeled data for classification to obtain an updated basic classification model, then starting the second processing unit to continuously select part of the non-label data from the rest non-label data for modeling, and iterating until all the non-label data are processed to obtain a final classification model.
In a fourth aspect of the present invention, a disease prediction apparatus based on semi-supervised learning is provided, including: the disease prediction model building device based on semi-supervised learning described above, which is used for obtaining a final classification model; and a disease prediction unit connected to it, which is used for processing disease data with the final classification model to obtain a disease prediction result.
The implementation of the invention has the following beneficial effects: the invention utilizes a semi-supervised learning method to model unlabelled data, specifically combines a labeled classification method and an unlabelled clustering method, adjusts according to a data classification result, and improves the prediction precision in an iteration mode, thereby avoiding the situations of over-fitting or incomplete fitting caused by too little labeled data, and further better improving the model prediction precision.
Drawings
FIG. 1 is a flow chart of a method for building a semi-supervised learning based disease prediction model according to a preferred embodiment of the present invention;
FIG. 2 is a diagram illustrating a process of building a semi-supervised learning based disease prediction model according to a preferred embodiment of the present invention;
FIG. 3 is a block diagram of a semi-supervised learning based disease prediction model building apparatus according to the present invention;
FIG. 4 is a graph comparing the disease prediction effects of the conventional method and the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by a person skilled in the art without making any inventive step are within the scope of protection of the present invention.
The invention provides a disease prediction model establishing method and a disease prediction method based on semi-supervised learning, which can utilize unlabeled data and improve the accuracy of a prediction model in an iterative mode. Fig. 1 is a flowchart illustrating a method for building a semi-supervised learning based disease prediction model according to a preferred embodiment of the present invention. Please refer to fig. 2, which is a schematic diagram of a process for establishing a semi-supervised learning based disease prediction model according to a preferred embodiment of the present invention. As shown in fig. 1 and 2, the method for building a disease prediction model based on semi-supervised learning according to the embodiment includes the following steps:
First, in step S101, the labeled data is classified to obtain a data classification model of the labeled data, which serves as the basic classification model. Let the total amount of labeled data be $q_1$ and the total amount of unlabeled data be $q_2$, so that the total amount of labeled and unlabeled data is $q = q_1 + q_2$. The data classification model is preferably a basic data classification model for disease prediction. Preferably, the labeled data and the unlabeled data are both medical data for a particular disease, including but not limited to chronic diseases such as heart disease, hypertension, cancer, and cardiovascular and cerebrovascular diseases. In this step, the labeled data is classified by any one of the following classification methods: neural networks, naive Bayes, or multivariate linear regression analysis.
Subsequently, in step S102, part of the unlabeled data is selected from the whole unlabeled data for the subsequent modeling processing, that is, a certain amount of unlabeled data is selected each time for the subsequent modeling processing.
In a preferred embodiment of the invention, if $q_2$ is far greater than $q_1$, i.e. $q_2 \gg q_1$, the amount of unlabeled data selected each time is $a \times q_2$ with $15\% \le a \le 25\%$; otherwise, the amount of unlabeled data selected each time is $b \times q_1$ with $45\% \le b \le 55\%$.
In general, the total amount of unlabeled data is equal to or greater than the total amount of labeled data, i.e. $q_2 \ge q_1$. Thus, in a more preferred embodiment of the invention, if $q_2 > 10q_1$, the amount of unlabeled data selected each time is $a \times q_2$, where $a = 20\%$; if $q_1 \le q_2 \le 10q_1$, the amount selected each time is $b \times q_1$, where $b = 50\%$. That is, $10q_1$ serves as the criterion for "far greater": when the total amount of unlabeled data is far greater than that of the labeled data, $20\%\,q_2$ unlabeled records are selected each time for the subsequent modeling processing; when $q_2 \ge q_1$ and $q_2$ is within 10 times $q_1$, $50\%\,q_1$ unlabeled records are selected each time for the subsequent modeling processing. These per-iteration selection proportions are the optimal proportions obtained from a large number of experiments and accumulated experience, and yield a better data modeling effect.
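The selection rule above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation described by the invention: the function name `batch_size`, the integer-percent parameters `a_pct` and `b_pct`, rounding up to a whole record, capping at $q_2$, and falling back to $b \times q_1$ when $q_2 < q_1$ are all choices the text does not specify.

```python
def batch_size(q1: int, q2: int, a_pct: int = 20, b_pct: int = 50) -> int:
    """Amount of unlabeled data to select per iteration (hypothetical helper).

    q1: total amount of labeled data; q2: total amount of unlabeled data.
    a_pct, b_pct: the proportions a and b from the text, given as whole percents.
    """
    if q1 <= 0 or q2 <= 0:
        raise ValueError("q1 and q2 must be positive")
    if q2 > 10 * q1:
        # "far greater" case: take a x q2
        pct, base = a_pct, q2
    else:
        # q1 <= q2 <= 10*q1: take b x q1.
        # Assumption: the same branch is reused when q2 < q1, which the text does not cover.
        pct, base = b_pct, q1
    size = -(-pct * base // 100)      # integer ceiling of pct% of base (assumed rounding)
    return max(1, min(size, q2))      # assumption: never select more than exists


print(batch_size(100, 2000))  # q2 > 10*q1, so 20% of q2
print(batch_size(100, 500))   # q1 <= q2 <= 10*q1, so 50% of q1
```

Integer arithmetic is used here so that the percentage computation does not depend on floating-point rounding.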
Subsequently, in step S103, the part of the unlabeled data selected in step S102 is classified by a clustering method to obtain a clustering result $M_1$ of the unlabeled data. Preferably, the clustering method used in step S103 is K-means or hierarchical clustering. At the same time, the basic classification model is used to label the part of the unlabeled data selected in step S102 to obtain a prediction result $T_1$. The labeling result $C$ of the unlabeled data is then obtained from the clustering result $M_1$ and the prediction result $T_1$.
Preferably, the labeling result $C$ of the unlabeled data is calculated in step S103 with the following linear formula (1):
$C = \alpha T_1 + \beta M_1$ (1)
where $\alpha$ and $\beta$ are classification coefficients; preferably, $\alpha = 50\%\, q_1/(q_1+q_2)$ and $\beta = q_1/(q_1+q_2)$.
The invention combines the classification method for labeled data with the clustering method for unlabeled data, fine-tunes according to the data classification result, and determines the final classification result in a certain proportion, thereby obtaining the labeling result $C$.
Step S103 further includes: if $C > 1.5\,q_1/(q_1+q_2)$, the labeling result $C$ takes the value 1, indicating true; if $C \le 1.5\,q_1/(q_1+q_2)$, the labeling result $C$ takes the value 0, indicating false. The clustering result $M_1$, the prediction result $T_1$ and the labeling result $C$ are all 0/1 values.
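A minimal Python sketch of formula (1) and the thresholding step is given below. It rests on one loudly stated assumption: "50% q1/(q1+q2)" is read as 0.5 × q1/(q1+q2). The names `coefficients` and `fuse_labels` are hypothetical. Under this literal reading and with binary $T_1$ and $M_1$, $C$ reaches at most the threshold value (when $T_1 = M_1 = 1$), so the strict comparison never yields 1; since the text does not resolve this, the coefficients and the threshold are passed in as parameters rather than hard-coded.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple


def coefficients(q1: int, q2: int) -> Tuple[Fraction, Fraction, Fraction]:
    """Return (alpha, beta, threshold) under the assumed reading of the text."""
    ratio = Fraction(q1, q1 + q2)
    alpha = ratio / 2          # assumed reading of "50% q1/(q1+q2)"
    beta = ratio               # q1/(q1+q2)
    threshold = 3 * ratio / 2  # 1.5 q1/(q1+q2)
    return alpha, beta, threshold


def fuse_labels(t1: Sequence[int], m1: Sequence[int],
                alpha: Fraction, beta: Fraction, threshold: Fraction) -> List[int]:
    """Compute C = alpha*T1 + beta*M1 per record, then binarise with a strict '>'."""
    if len(t1) != len(m1):
        raise ValueError("T1 and M1 must have the same length")
    return [1 if alpha * t + beta * m > threshold else 0 for t, m in zip(t1, m1)]


alpha, beta, threshold = coefficients(100, 200)
t1 = [1, 0, 1, 0]   # hypothetical classifier predictions
m1 = [1, 1, 0, 0]   # hypothetical cluster assignments, already mapped to 0/1
literal = fuse_labels(t1, m1, alpha, beta, threshold)
# Illustration with a different, purely hypothetical threshold:
relaxed = fuse_labels(t1, m1, alpha, beta, Fraction(1, 4))
print(literal, relaxed)
```

`Fraction` is used so that the comparison at the boundary $C = 1.5\,q_1/(q_1+q_2)$ is exact rather than subject to floating-point rounding.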
Subsequently, in step S104, the labeling result $C$ of the unlabeled data is merged into the previous training data set together with the labeled data, and the model is trained again on the merged set to obtain an updated basic classification model.
Subsequently, in step S105, it is determined whether all the unlabeled data has been processed. If so, the method proceeds to step S106; otherwise it returns to step S102, and another part of the unlabeled data is selected from the remaining unlabeled data for steps S103 and S104. That is, the newly selected unlabeled data is classified by the clustering method to obtain a new clustering result $M_1$; at the same time, the basic classification model updated in step S104 is used to label the newly selected unlabeled data to obtain a new prediction result $T_1$; and the new labeling result $C$ of the unlabeled data is calculated again with linear formula (1). The new labeling result $C$ is then merged with the labeled data (which now includes both the original $q_1$ labeled records from step S101 and the unlabeled data that was labeled in the previous iteration) for classification, giving an updated basic classification model. These steps are repeated until all the unlabeled data has been processed, so that all of it is labeled, and the final classification model is obtained. Preferably, the same amount of unlabeled data is selected each time in step S102; once the remaining unlabeled data is less than the amount to be selected each time, all of the remaining unlabeled data is selected for the subsequent modeling processing.
Subsequently, in step S106, after the above iteration, all the unlabeled data are all labeled, resulting in a final classification model.
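The control flow of steps S101 to S106 can be sketched as below. This is an illustrative skeleton, not the implementation of the invention: `build_model` and its callable parameters are hypothetical names, the toy `fit_threshold`, `cluster_by_mean` and `fuse_and` functions are simple stand-ins for the neural network, K-means and formula (1), and the sketch assumes the clustering step already returns 0/1 values aligned with the class labels, which the text does not detail.

```python
from typing import Callable, List, Sequence

Record = Sequence[float]
Model = Callable[[Record], int]  # maps one record to 0 or 1


def build_model(labeled_x: List[Record], labeled_y: List[int],
                unlabeled_x: List[Record], batch: int,
                fit: Callable[[List[Record], List[int]], Model],
                cluster: Callable[[List[Record]], List[int]],
                fuse: Callable[[List[int], List[int]], List[int]]) -> Model:
    train_x, train_y = list(labeled_x), list(labeled_y)
    model = fit(train_x, train_y)                        # S101: basic classification model
    for start in range(0, len(unlabeled_x), batch):      # S102/S105: next part until none left
        chunk = list(unlabeled_x[start:start + batch])   # last chunk takes all that remains
        t1 = [model(x) for x in chunk]                   # S103: prediction result T1
        m1 = cluster(chunk)                              # S103: clustering result M1
        c = fuse(t1, m1)                                 # S103: labeling result C
        train_x += chunk                                 # S104: merge with labeled data
        train_y += c
        model = fit(train_x, train_y)                    # S104: updated model
    return model                                         # S106: final classification model


def fit_threshold(xs: List[Record], ys: List[int]) -> Model:
    """Toy classifier: cut at the midpoint between the two class means (1-D data)."""
    pos = [x[0] for x, y in zip(xs, ys) if y == 1]
    neg = [x[0] for x, y in zip(xs, ys) if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x[0] > cut else 0


def cluster_by_mean(chunk: List[Record]) -> List[int]:
    """Toy two-way clustering: split the chunk at its mean value."""
    mean = sum(x[0] for x in chunk) / len(chunk)
    return [1 if x[0] > mean else 0 for x in chunk]


def fuse_and(t1: List[int], m1: List[int]) -> List[int]:
    """Placeholder fusion (logical AND); formula (1) could be plugged in instead."""
    return [a & b for a, b in zip(t1, m1)]


final_model = build_model(
    labeled_x=[[0.0], [1.0], [9.0], [10.0]], labeled_y=[0, 0, 1, 1],
    unlabeled_x=[[0.5], [9.5], [1.5], [8.5], [2.0]], batch=2,
    fit=fit_threshold, cluster=cluster_by_mean, fuse=fuse_and)
print(final_model([3.0]), final_model([7.0]))
```

Passing the classifier, clustering and fusion steps as callables keeps the loop independent of any particular library, so a real classifier and clustering routine can be substituted without changing the iteration logic.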
The invention also correspondingly provides a disease prediction method based on semi-supervised learning, which comprises the steps in the disease prediction model building method based on semi-supervised learning and the subsequent disease prediction step. In the disease prediction step, the disease data is processed by using the final classification model established by the disease prediction model establishing method based on semi-supervised learning to obtain a disease prediction result.
Please refer to fig. 3, which is a block diagram of a semi-supervised learning based disease prediction model building apparatus according to the present invention. As shown in fig. 3, the semi-supervised learning based disease prediction model creation apparatus 300 includes:
A first processing unit 301, configured to classify the labeled data to obtain a basic classification model of the labeled data. Let the total amount of labeled data be $q_1$ and the total amount of unlabeled data be $q_2$, so that the total amount of labeled and unlabeled data is $q = q_1 + q_2$. Preferably, the labeled data and the unlabeled data are both medical data for a particular disease, including but not limited to heart disease, cancer, cerebrovascular disease, etc. The first processing unit classifies the labeled data by any one of the following classification methods: neural networks, naive Bayes, or multivariate linear regression analysis.
The second processing unit 302 is configured to select a part of the non-tag data from all the non-tag data for subsequent modeling processing, that is, select a certain amount of non-tag data each time for subsequent modeling processing.
In a preferred embodiment of the invention, if $q_2$ is far greater than $q_1$, i.e. $q_2 \gg q_1$, the amount of unlabeled data selected each time is $a \times q_2$ with $15\% \le a \le 25\%$; otherwise, the amount of unlabeled data selected each time is $b \times q_1$ with $45\% \le b \le 55\%$.
In general, the total amount of unlabeled data is equal to or greater than the total amount of labeled data, i.e. $q_2 \ge q_1$. Thus, in a more preferred embodiment of the invention, if $q_2 > 10q_1$, the amount of unlabeled data selected each time is $a \times q_2$, where $a = 20\%$; if $q_1 \le q_2 \le 10q_1$, the amount selected each time is $b \times q_1$, where $b = 50\%$. That is, $10q_1$ serves as the criterion for "far greater": when the total amount of unlabeled data is far greater than that of the labeled data, $20\%\,q_2$ unlabeled records are selected each time for the subsequent modeling processing; when $q_2 \ge q_1$ and $q_2$ is within 10 times $q_1$, $50\%\,q_1$ unlabeled records are selected each time for the subsequent modeling processing. These per-iteration selection proportions are the optimal proportions obtained from a large number of experiments and accumulated experience, and yield a better data modeling effect.
A third processing unit 303, configured to classify the part of the unlabeled data selected by the second processing unit 302 by a clustering method to obtain a clustering result $M_1$ of the unlabeled data. Preferably, the clustering method used by the third processing unit 303 is K-means or hierarchical clustering. At the same time, the basic classification model is used to label the part of the unlabeled data selected by the second processing unit 302 to obtain a prediction result $T_1$. The labeling result $C$ of the unlabeled data is then obtained from the clustering result $M_1$ and the prediction result $T_1$.
Preferably, the third processing unit 303 calculates the labeling result $C$ of the unlabeled data with the following linear formula (1):
$C = \alpha T_1 + \beta M_1$ (1)
where $\alpha$ and $\beta$ are classification coefficients; preferably, $\alpha = 50\%\, q_1/(q_1+q_2)$ and $\beta = q_1/(q_1+q_2)$.
The third processing unit 303 further performs the following operation: if $C > 1.5\,q_1/(q_1+q_2)$, the labeling result $C$ takes the value 1, indicating true; if $C \le 1.5\,q_1/(q_1+q_2)$, the labeling result $C$ takes the value 0, indicating false. The clustering result $M_1$, the prediction result $T_1$ and the labeling result $C$ are all 0/1 values.
The fourth processing unit 304 is configured to combine the labeling result C of the non-label data with the labeled data for classification to obtain an updated basic classification model, and restart the second processing unit 302 to continue to select a part of non-label data from the remaining non-label data for modeling, so as to iterate until all the non-label data are processed, and obtain a final classification model.
The invention also correspondingly provides a disease prediction device based on semi-supervised learning, which comprises the semi-supervised learning based disease prediction model building apparatus 300 described above and a disease prediction unit connected to it. The disease prediction model building apparatus 300 is used for obtaining a final classification model, and the disease prediction unit is used for processing disease data with the final classification model to obtain a disease prediction result.
The disease prediction performance of the conventional method and of the method of the invention was compared experimentally. A neural network was used as the basic data model for classifying the labeled data and k-means as the clustering algorithm, and the data model was obtained after 2 iterations. The experimental data is heart disease data. The total sample size used in the experiment is 689: the test set comprises 300 records (100 labeled and 200 unlabeled), and the verification set comprises 389 records. The procedure is as follows:
1. modeling the 100 labeled records with a neural network to form a classification model;
2. classifying 100 (50%) of the 200 unlabeled records with the classification model;
3. clustering the same 100 unlabeled records with k-means;
4. computing the labeling result C from the classification and clustering results according to formula (1);
5. adding the 100 records labeled with C to the training set and continuing training to form a new classification model;
6. repeating from step 2 for the other 100 unlabeled records to obtain the final model.
Please refer to fig. 4, which is a graph comparing the disease prediction effect of the conventional method and the method of the present invention. The results including accuracy, error rate, precision, recall and correlation are compared, and the numerical results are shown in table 1.
Table 1
Method                Accuracy      Error rate    Precision     Recall        Correlation
Conventional method   0.945026178   0.054973822   0.846846847   0.959183673   0.82071
The invention         0.971204188   0.028795812   0.930693069   0.959183673   0.915200021
Therefore, compared with the conventional method, the method of the invention has higher accuracy and a lower error rate, improving accuracy by about 3% (from 0.945 to 0.971).
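For reference, the first four measures in the comparison are conventionally computed from the counts of a binary confusion matrix, as in the sketch below. The function name `binary_metrics` and the counts in the usage line are hypothetical and are not the experiment's data; the "correlation" measure is omitted because the text does not define how it is calculated.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, error rate, precision and recall from confusion-matrix counts.

    tp/fp: true/false positives; fn/tn: false/true negatives.
    """
    total = tp + fp + fn + tn
    if total == 0 or tp + fp == 0 or tp + fn == 0:
        raise ValueError("counts do not allow all four measures to be computed")
    return {
        "accuracy": (tp + tn) / total,    # share of records classified correctly
        "error_rate": (fp + fn) / total,  # share classified incorrectly (1 - accuracy)
        "precision": tp / (tp + fp),      # share of predicted positives that are correct
        "recall": tp / (tp + fn),         # share of actual positives that are found
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print(metrics)
```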
In conclusion, the invention provides an improved disease prediction model that uses a semi-supervised learning method to model unlabeled data. It makes effective use of the unlabeled data to further optimize the prediction model and improve its prediction accuracy, and is therefore better suited to current application scenarios involving massive amounts of unlabeled data; according to the experimental results, accuracy can be improved by about 3%. The experimental results also indicate that the method can be applied effectively to the field of disease prediction and, with fine-tuned parameters, to other data models.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (5)

1. A disease prediction model building method based on semi-supervised learning is characterized by comprising the following steps:
S1, classifying the labeled data to obtain a basic classification model of the labeled data;
S2, selecting part of the unlabeled data from the unlabeled data;
S3, classifying the part of the unlabeled data selected in step S2 by a clustering method to obtain a clustering result $M_1$ of the unlabeled data, and labeling the part of the unlabeled data selected in step S2 with the basic classification model to obtain a prediction result $T_1$; obtaining a labeling result $C$ of the unlabeled data from the clustering result $M_1$ and the prediction result $T_1$;
S4, combining the labeling result $C$ of the unlabeled data with the labeled data for classification to obtain an updated basic classification model, returning to step S2 to select another part of the unlabeled data from the remaining unlabeled data for steps S3 and S4, and iterating in this way until all the unlabeled data has been processed, to obtain a final classification model;
in step S2, if $q_2 > 10q_1$, where $q_1$ is the total amount of labeled data and $q_2$ is the total amount of unlabeled data, the amount of unlabeled data selected is $a \times q_2$, where $a = 20\%$; if $q_1 \le q_2 \le 10q_1$, the amount of unlabeled data selected is $b \times q_1$, where $b = 50\%$;
in step S3, the labeling result $C$ of the unlabeled data is calculated with the following linear formula:
$C = \alpha T_1 + \beta M_1$
where $\alpha$ and $\beta$ are classification coefficients; $\alpha = 50\%\, q_1/(q_1+q_2)$, $\beta = q_1/(q_1+q_2)$;
step S3 further includes:
if $C > 1.5\,q_1/(q_1+q_2)$, the labeling result $C$ takes the value 1, indicating true; if $C \le 1.5\,q_1/(q_1+q_2)$, the labeling result $C$ takes the value 0, indicating false; where $q_1$ is the total amount of labeled data and $q_2$ is the total amount of unlabeled data.
2. The method for building a disease prediction model based on semi-supervised learning according to claim 1, wherein the labeled data is classified in step S1 by any one of the following classification methods: neural networks, naive Bayes, or multivariate linear regression analysis.
3. The method for building a disease prediction model based on semi-supervised learning according to claim 1, wherein the clustering method used in step S3 is K-means or hierarchical clustering method.
4. A disease prediction model building device based on semi-supervised learning is characterized by comprising:
the first processing unit is used for classifying the labeled data to obtain a basic classification model of the labeled data;
the second processing unit is used for selecting part of non-label data from the non-label data;
a third processing unit for classifying the part of the unlabeled data selected by the second processing unit by a clustering method to obtain a clustering result $M_1$ of the unlabeled data, and labeling the part of the unlabeled data selected by the second processing unit with the basic classification model to obtain a prediction result $T_1$; and obtaining a labeling result $C$ of the unlabeled data from the clustering result $M_1$ and the prediction result $T_1$;
the fourth processing unit is used for combining the labeling result C of the non-label data and the labeled data for classification to obtain an updated basic classification model, then the second processing unit is started to continuously select part of non-label data from the rest non-label data for modeling, and the iteration is carried out until all the non-label data are processed to obtain a final classification model;
in the second processing unit, if q2>10q1Wherein q is1Total amount of data for tagged data, q2The quantity of the selected part of the non-label data is a multiplied by q2Wherein a is 20%; if q is1≤q2≤10q1Then the selected number of the part of the non-label data is b × q1And b is 50%;
the third processing unit calculates the labeling result C of the unlabeled data using the following linear formula:
C = αT1 + βM1
where α and β are classification coefficients, α = 50% × q1/(q1+q2) and β = q1/(q1+q2);
if C > 1.5q1/(q1+q2), the labeling result C is set to 1, indicating true; if C ≤ 1.5q1/(q1+q2), the labeling result C is set to 0, indicating false; where q1 is the total amount of labeled data and q2 is the total amount of unlabeled data.
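The two numeric rules in claim 4 (how much unlabeled data the second processing unit selects, and how the third processing unit thresholds C) can be illustrated with a minimal Python sketch. The function names `batch_size` and `labeling_result` are hypothetical, rounding down to a whole number of samples is an assumption, and α is read as 50% × q1/(q1+q2). The claim does not say what happens when q2 < q1, nor how T1 and M1 are encoded, so the sketch raises an error in the first case and accepts any numeric values in the second.

```python
def batch_size(q1: int, q2: int) -> int:
    """Amount of unlabeled data to select, given q1 labeled and q2 unlabeled samples."""
    if q2 > 10 * q1:
        return q2 * 20 // 100  # a = 20% of q2 (rounding down is an assumption)
    if q1 <= q2 <= 10 * q1:
        return q1 * 50 // 100  # b = 50% of q1 (rounding down is an assumption)
    # The claim does not cover q2 < q1; failing loudly is this sketch's choice.
    raise ValueError("selection rule is not specified for q2 < q1")


def labeling_result(t1: float, m1: float, q1: int, q2: int) -> int:
    """Combine prediction T1 and clustering result M1 into a 0/1 label C."""
    ratio = q1 / (q1 + q2)
    alpha = 0.5 * ratio        # assumed reading: alpha = 50% * q1/(q1+q2)
    beta = ratio               # beta = q1/(q1+q2)
    c = alpha * t1 + beta * m1
    return 1 if c > 1.5 * ratio else 0  # strict inequality, as in the claim


print(batch_size(100, 2000))   # q2 > 10*q1, so 20% of q2
print(batch_size(100, 500))    # q1 <= q2 <= 10*q1, so 50% of q1
```

Because the threshold 1.5q1/(q1+q2) equals α + β, the outcome for a given sample depends on how T1 and M1 are scaled, which is why the sketch leaves their encoding to the caller.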
5. A disease prediction apparatus based on semi-supervised learning, comprising:
the semi-supervised learning based disease prediction model building device of claim 4, for obtaining a final classification model; and
a disease prediction unit, connected to the model building device, for processing disease data with the final classification model to obtain a disease prediction result.
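The interplay of the units in claims 4 and 5 (train, select a batch, pseudo-label it, retrain, then predict) can be sketched as the loop below. Everything concrete here is a stand-in chosen for illustration, not the patented implementation: `train` is a toy one-dimensional threshold classifier, `cluster` splits a batch around its mean instead of running K-means, `combine` keeps a positive label only when both sources agree rather than applying the linear formula for C, and the batch size is fixed instead of being derived from q1 and q2.

```python
from statistics import mean
from typing import Callable, List, Tuple

Labeled = Tuple[float, int]
Model = Callable[[float], int]


def train(data: List[Labeled]) -> Model:
    """Toy classifier (assumption): threshold midway between the class means."""
    neg = mean(x for x, y in data if y == 0)
    pos = mean(x for x, y in data if y == 1)
    threshold = (neg + pos) / 2
    return lambda x: 1 if x > threshold else 0


def cluster(batch: List[float]) -> List[int]:
    """Toy two-way grouping (assumption): above or below the batch mean."""
    centre = mean(batch)
    return [1 if x > centre else 0 for x in batch]


def combine(t1: int, m1: int) -> int:
    """Hypothetical stand-in for the labeling result C: both must agree on 1."""
    return 1 if t1 == 1 and m1 == 1 else 0


def build_final_model(labeled: List[Labeled], unlabeled: List[float],
                      batch_size: int) -> Model:
    data = list(labeled)
    remaining = list(unlabeled)
    model = train(data)                          # first unit: basic model
    while remaining:                             # iterate until all data is used
        batch = remaining[:batch_size]           # second unit: select a part
        remaining = remaining[batch_size:]
        m1 = cluster(batch)                      # third unit: clustering result M1
        t1 = [model(x) for x in batch]           # third unit: prediction T1
        c = [combine(t, m) for t, m in zip(t1, m1)]
        data.extend(zip(batch, c))               # fourth unit: merge and retrain
        model = train(data)
    return model


labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
unlabeled = [0.5, 9.5, 1.5, 8.5]
final_model = build_final_model(labeled, unlabeled, batch_size=2)

# Disease prediction unit of claim 5: apply the final model to new data.
print(final_model(2.0), final_model(7.0))  # prints: 0 1
```

Passing the classifier, clustering step and combination rule as separate functions mirrors the separation into processing units, so any of the stand-ins can be swapped for the methods named in claims 2 and 3.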
CN201711135644.1A 2017-11-16 2017-11-16 Disease prediction model establishing method and device based on semi-supervised learning Active CN107944479B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711135644.1A CN107944479B (en) 2017-11-16 2017-11-16 Disease prediction model establishing method and device based on semi-supervised learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711135644.1A CN107944479B (en) 2017-11-16 2017-11-16 Disease prediction model establishing method and device based on semi-supervised learning

Publications (2)

Publication Number Publication Date
CN107944479A CN107944479A (en) 2018-04-20
CN107944479B true CN107944479B (en) 2020-10-30

Family

ID=61931431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711135644.1A Active CN107944479B (en) 2017-11-16 2017-11-16 Disease prediction model establishing method and device based on semi-supervised learning

Country Status (1)

Country Link
CN (1) CN107944479B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109903053B (en) * 2019-03-01 2020-01-07 成都新希望金融信息有限公司 Anti-fraud method for behavior recognition based on sensor data
CN109948704A (en) * 2019-03-20 2019-06-28 ***股份有限公司 A kind of transaction detection method and apparatus
CN110009015A (en) * 2019-03-25 2019-07-12 西北工业大学 EO-1 hyperion small sample classification method based on lightweight network and semi-supervised clustering
CN111640510A (en) * 2020-04-09 2020-09-08 之江实验室 Disease prognosis prediction system based on deep semi-supervised multitask learning survival analysis
CN112786200A (en) * 2021-01-18 2021-05-11 吾征智能技术(北京)有限公司 Intelligent diet evaluation system based on meal data
CN115249543B (en) * 2022-08-01 2023-06-23 中日友好医院(中日友好临床医学研究所) Method for establishing artificial intelligence model for predicting ARDS patient prognosis

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751666A (en) * 2009-10-16 2010-06-23 西安电子科技大学 Semi-supervised multi-spectral remote sensing image segmentation method based on spectral clustering
CN103020122B (en) * 2012-11-16 2015-09-30 哈尔滨工程大学 A kind of transfer learning method based on semi-supervised clustering
CN103150580B (en) * 2013-03-18 2016-03-30 武汉大学 A kind of high spectrum image semisupervised classification method and device
CN103234767B (en) * 2013-04-21 2016-01-06 苏州科技学院 Based on the nonlinear fault detection method of semi-supervised manifold learning
CN104408466B (en) * 2014-11-17 2017-10-27 中国地质大学(武汉) Learn the high-spectrum remote sensing semisupervised classification method of composition based on local manifolds
CN104598813B (en) * 2014-12-09 2017-05-17 西安电子科技大学 Computer intrusion detection method based on integrated study and semi-supervised SVM
CN105303198B (en) * 2015-11-17 2018-08-17 福州大学 A kind of remote sensing image semisupervised classification method learnt from fixed step size
CN107194336B (en) * 2017-05-11 2019-12-24 西安电子科技大学 Polarized SAR image classification method based on semi-supervised depth distance measurement network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
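"An Improved Progressive Transductive Support Vector Machine Classification Learn***" et al.; Signal Processing (《信号处理》); 2008-04-30; Vol. 24, No. 2; pp. 213-218 *
"Semi-Supervised Learning Algorithm for Least Squares Support Vector Machines"; Zhang Jianpei (张健沛) et al.; Journal of Harbin Engineering University (《哈尔滨工程大学学报》); 2008-10-31; Vol. 29, No. 10; pp. 1088-1092 *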

Also Published As

Publication number Publication date
CN107944479A (en) 2018-04-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant