CN106777891B - Data feature selection and prediction method and device - Google Patents

Data feature selection and prediction method and device

Info

Publication number
CN106777891B
Authority
CN
China
Prior art keywords
blood pressure
user
data
model
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611043691.9A
Other languages
Chinese (zh)
Other versions
CN106777891A (en)
Inventor
吴书
王亮
谭铁牛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201611043691.9A
Publication of CN106777891A
Application granted
Publication of CN106777891B

Classifications

    • G06F19/32

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Measuring Pulse, Heart Rate, Blood Pressure Or Blood Flow (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention discloses a data feature selection and prediction method and device. The method comprises: step S1, collecting user information and the corresponding blood pressure observation data to form a data set, and removing outliers from the data set; step S2, extracting user features from the user information in the data set; step S3, extracting blood pressure features from the blood pressure observation data in the data set; step S4, normalizing the extracted user features and blood pressure features, forming a training set with the normalized results as training samples, and inputting the training samples into a support vector machine model and/or a gradient boosting decision tree model to train a prediction model. The invention uses medical knowledge to guide data cleaning and feature engineering, which effectively improves the accuracy of the model.

Description

Data feature selection and prediction method and device
Technical field
The present invention relates to the fields of machine learning and pattern recognition, chiefly to feature selection methods in machine learning, and, in combination with gradient boosting decision tree and support vector machine models, to a method and device for data feature selection and prediction.
Background art
With the development of computer technology, computers can now process many different kinds of data and help people complete tasks more efficiently. In artificial intelligence in particular, machine learning, as a core technology, has been widely applied to many specific problems. The support vector machine (SVM) is one of the classic machine learning models; it is efficient and also achieves good prediction results. The gradient boosting decision tree (GBDT) is a machine learning method that has become very popular in industry in recent years; it derives from the classic decision tree model.
Mobile healthcare has been a global market focus in recent years. Cross-domain integration is its essential characteristic, and big-data prediction and its applications have especially bright prospects.
Summary of the invention
In view of the above, the present invention develops and establishes a screening model for users' blood pressure data sequences, aiming to provide individual users with optimization strategies and intuitive, quantified guidance, to help achieve the most effective intervention measures, and to provide users with a personalized feature selection service.
According to one aspect of the present invention, a data feature selection and prediction method is provided, the method comprising the steps of:
Step S1: collecting user information and the corresponding blood pressure observation data to form a data set, and removing outliers from the data set;
Step S2: extracting user features from the user information in the data set;
Step S3: extracting blood pressure features from the blood pressure observation data in the data set;
Step S4: normalizing the extracted user features and blood pressure features, forming a training set with the normalized results as training samples, and inputting the training samples in the training set into a support vector machine model and/or a gradient boosting decision tree model to train a prediction model.
The user features include the user's age, gender, and body mass index; the blood pressure features include high pressure (systolic), low pressure (diastolic), heart rate, and medication status.
The extraction of blood pressure features in step S3 includes extracting the blood pressure features under different prediction tasks; the different prediction tasks include long-period, short-period, coarse-grained, and fine-grained prediction tasks.
In step S4, inputting the training samples in the training set into the support vector machine model and/or the gradient boosting decision tree model and training to obtain the prediction model comprises:
Extracting from the training set the user features of a given user together with the single-month averages, the half-month averages, and the overall average of that user's blood pressure features within a first predetermined acquisition period, and inputting them into the support vector machine model, the support vector machine model being a regression model whose kernel function is a linear kernel;
Comparing the output of the support vector machine model with the blood pressure features of the same user in the training set within a second predetermined acquisition period, and updating the parameters of the support vector machine model accordingly, the second predetermined acquisition period being later than the first predetermined acquisition period;
Iterating the above steps until the parameters of the support vector machine model converge, thereby obtaining a first prediction model.
Alternatively, in step S4, inputting the training samples in the training set into the support vector machine model and/or the gradient boosting decision tree model and training to obtain the prediction model comprises:
Extracting from the training set the user features of a given user together with the single-month averages, the half-month averages, and the overall average of that user's blood pressure features within a third predetermined acquisition period, and inputting them into the gradient boosting decision tree model, the loss function of the gradient boosting decision tree model being the least-squares error function;
Comparing the output of the gradient boosting decision tree model with the blood pressure features of the same user in the training set within a fourth predetermined acquisition period, and updating the parameters of the gradient boosting decision tree model accordingly, the fourth predetermined acquisition period being later than the third predetermined acquisition period;
Iterating the above steps until the parameters of the gradient boosting decision tree model converge, thereby obtaining a second prediction model.
Alternatively, in step S4, inputting the training samples in the training set into the support vector machine model and/or the gradient boosting decision tree model and training to obtain the prediction model comprises:
Extracting from the training set the user features of a given user together with the single-month averages, the half-month averages, and the overall average of that user's blood pressure features within a first predetermined acquisition period, and inputting them into both the support vector machine model and the gradient boosting decision tree model, the support vector machine model being a regression model whose kernel function is a linear kernel, and the loss function of the gradient boosting decision tree model being the least-squares error function;
Comparing the outputs of the support vector machine model and of the gradient boosting decision tree model, each in turn, with the blood pressure features of the same user in the training set within a second predetermined acquisition period, and updating the parameters of the two models accordingly, the second predetermined acquisition period being later than the first predetermined acquisition period;
Iterating the above steps until the parameters of the support vector machine model and of the gradient boosting decision tree model converge, thereby obtaining the first prediction model.
Removing outliers from the data set in step S1 comprises:
Removing the user information, and the corresponding blood pressure data, of users whose age is not within a predetermined age range;
Removing the user information, and the corresponding blood pressure data, of users whose height is not within a predetermined height range;
Removing the user information, and the corresponding blood pressure data, of users whose weight is not within a predetermined weight range;
Removing the user information, and the corresponding blood pressure data, of users whose blood pressure values are not within a predetermined blood pressure range;
Removing the user information, and the corresponding blood pressure data, of users whose heart rate is not within a predetermined heart rate range.
According to a second aspect of the present invention, a data feature selection and prediction device is provided, comprising:
An acquisition module, for collecting user information and the corresponding blood pressure observation data to form a data set, and for removing outliers from the data set;
A user feature extraction module, for extracting user features from the user information in the data set;
A blood pressure feature extraction module, for extracting blood pressure features from the blood pressure observation data in the data set;
A training module, for normalizing the extracted user features and blood pressure features, forming a training set with the normalized results as training samples, and inputting the training samples in the training set into a support vector machine model and/or a gradient boosting decision tree model to train a prediction model.
The blood pressure feature extraction module includes:
A blood pressure feature extraction submodule, for extracting the blood pressure features under different prediction tasks; the different prediction tasks include long-period, short-period, coarse-grained, and fine-grained prediction tasks.
The invention uses medical knowledge to guide data cleaning and feature engineering, which effectively improves the accuracy of the model.
Brief description of the drawings
Fig. 1 is a flow chart of the data feature selection and prediction method proposed by the present invention.
Detailed description of embodiments
To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in more detail below with reference to specific embodiments and the accompanying drawing.
As shown in Fig. 1, the invention proposes a data feature selection and prediction method comprising the steps of:
Step S1: collecting user information and the corresponding blood pressure observation data to form a data set, and removing outliers from the data set;
Step S2: extracting user features from the user information in the data set;
Step S3: extracting blood pressure features from the blood pressure observation data in the data set;
Step S4: normalizing the extracted user features and blood pressure features, forming a training set with the normalized results as training samples, and inputting the training samples in the training set into a support vector machine model and/or a gradient boosting decision tree model to train a prediction model.
In one embodiment, the user features include the user's age, gender, and body mass index; the blood pressure features include high pressure, low pressure, and heart rate.
The extraction of blood pressure features in step S3 includes extracting the blood pressure features under different prediction tasks; the different prediction tasks include long-period, short-period, coarse-grained, and fine-grained prediction tasks.
In one embodiment, the invention may train the SVM model and the GBDT model together and use both models to predict the user's blood pressure; in another embodiment, the SVM model or the GBDT model may be trained on its own, and the trained SVM model or GBDT model used for prediction.
In one embodiment, inputting the training samples in the training set into the support vector machine model and/or the gradient boosting decision tree model in step S4 and training to obtain the prediction model comprises:
Extracting from the training set the user features of a given user together with the single-month averages, the half-month averages, and the overall average of that user's blood pressure features within a first predetermined acquisition period, and inputting them into the support vector machine model, the support vector machine model being a regression model whose kernel function is a linear kernel;
Comparing the output of the support vector machine model with the blood pressure features of the same user in the training set within a second predetermined acquisition period, and updating the parameters of the support vector machine model accordingly, the second predetermined acquisition period being later than the first predetermined acquisition period;
Iterating the above steps until the parameters of the support vector machine model converge, thereby obtaining a first prediction model.
In another embodiment, inputting the training samples in the training set into the support vector machine model and/or the gradient boosting decision tree model in step S4 and training to obtain the prediction model comprises:
Extracting from the training set the user features of a given user together with the single-month averages, the half-month averages, and the overall average of that user's blood pressure features within a third predetermined acquisition period, and inputting them into the gradient boosting decision tree model, the loss function of the gradient boosting decision tree model being the least-squares error function;
Comparing the output of the gradient boosting decision tree model with the blood pressure features of the same user in the training set within a fourth predetermined acquisition period, and updating the parameters of the gradient boosting decision tree model accordingly, the fourth predetermined acquisition period being later than the third predetermined acquisition period;
Iterating the above steps until the parameters of the gradient boosting decision tree model converge, thereby obtaining a second prediction model.
In other embodiments, inputting the training samples in the training set into the support vector machine model and/or the gradient boosting decision tree model in step S4 and training to obtain the prediction model comprises:
Extracting from the training set the user features of a given user together with the single-month averages, the half-month averages, and the overall average of that user's blood pressure features within a first predetermined acquisition period, and inputting them into both the support vector machine model and the gradient boosting decision tree model, the support vector machine model being a regression model whose kernel function is a linear kernel, and the loss function of the gradient boosting decision tree model being the least-squares error function;
Comparing the outputs of the support vector machine model and of the gradient boosting decision tree model, each in turn, with the blood pressure features of the same user in the training set within a second predetermined acquisition period, and updating the parameters of the two models accordingly, the second predetermined acquisition period being later than the first predetermined acquisition period;
Iterating the above steps until the parameters of the support vector machine model and of the gradient boosting decision tree model converge, thereby obtaining the first prediction model.
In one embodiment, removing outliers from the data set in step S1 comprises:
Removing the user information, and the corresponding blood pressure data, of users whose age is not within a predetermined age range;
Removing the user information, and the corresponding blood pressure data, of users whose height is not within a predetermined height range;
Removing the user information, and the corresponding blood pressure data, of users whose weight is not within a predetermined weight range;
Removing the user information, and the corresponding blood pressure data, of users whose blood pressure values are not within a predetermined blood pressure range;
Removing the user information, and the corresponding blood pressure data, of users whose heart rate is not within a predetermined heart rate range.
The technical solution of the present invention is described in detail below through specific embodiments.
In one embodiment, the invention proposes a data feature selection and prediction method comprising:
Step 101: collect users' personal information data and blood pressure observation data and import them into a database. The personal information data include the user's age, gender, height, weight, body mass index (BMI), measurement time, and so on; the blood pressure observation data include high pressure (systolic), low pressure (diastolic), heart rate, medication status, measurement month, and so on. The data are then cleaned: guided by relevant medical knowledge, outliers (that is, abnormal personal information data and blood pressure observation data) are deleted, so that the data set becomes target data usable for training machine learning models.
The specific screening rules for outliers are as follows. Remove personal information data whose age is not within the predetermined age range, for example users older than 110 or younger than 10. Remove data whose height is not within the predetermined height range, for example heights below 120 cm or above 200 cm. Remove data whose weight is not within the predetermined weight range, for example weights below 20 kg or above 130 kg. Remove data whose blood pressure is not within the predetermined blood pressure range: for example, remove observations whose low pressure is more than 40 below or above the user's historical average measurement, and likewise observations whose high pressure is more than 40 below or above the user's historical average measurement. Remove observations whose heart rate is 0.
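The screening rules above can be sketched as a small filter. This is a minimal illustration, not the patent's implementation: the record layout and field names (`age`, `height_cm`, `weight_kg`, `high`, `low`, `heart_rate`) are hypothetical, the thresholds are the example values quoted in the text, and the per-user historical average is assumed to be the plain mean of all of that user's readings.

```python
from statistics import mean

# Example thresholds quoted in the text; field names are hypothetical.
AGE_RANGE = (10, 110)        # years
HEIGHT_RANGE = (120, 200)    # centimetres
WEIGHT_RANGE = (20, 130)     # kilograms
MAX_BP_DEVIATION = 40        # allowed distance from the user's historical mean


def in_range(value, bounds):
    low, high = bounds
    return low <= value <= high


def clean(users, readings):
    """Return (kept_users, kept_readings) after applying the screening rules.

    users:    dict user_id -> {"age", "height_cm", "weight_kg"}
    readings: list of {"user_id", "high", "low", "heart_rate"}
    """
    kept_users = {
        uid: u for uid, u in users.items()
        if in_range(u["age"], AGE_RANGE)
        and in_range(u["height_cm"], HEIGHT_RANGE)
        and in_range(u["weight_kg"], WEIGHT_RANGE)
    }

    # Group the readings of the remaining users so that each user's
    # historical mean can be computed.
    by_user = {}
    for r in readings:
        if r["user_id"] in kept_users:
            by_user.setdefault(r["user_id"], []).append(r)

    kept_readings = []
    for rows in by_user.values():
        mean_high = mean(r["high"] for r in rows)
        mean_low = mean(r["low"] for r in rows)
        for r in rows:
            if r["heart_rate"] == 0:
                continue  # a heart rate of 0 is treated as a failed measurement
            if abs(r["high"] - mean_high) > MAX_BP_DEVIATION:
                continue
            if abs(r["low"] - mean_low) > MAX_BP_DEVIATION:
                continue
            kept_readings.append(r)
    return kept_users, kept_readings
```

Removing a user also drops all of that user's readings, which matches the rule that the corresponding blood pressure data are removed together with the user information.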
Step 102: select the user's features from the database, including age, gender, and body mass index. According to authoritative medical information, the older the user, the higher the blood pressure; men's blood pressure is generally slightly higher than women's; and the higher the body mass index (BMI, a rough proxy for body fatness), the higher the blood pressure. The extracted features are the age and gender in the personal information data (0 denotes female, 1 denotes male) and the BMI computed from height and weight (weight divided by the square of height).
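The user feature extraction can be sketched as follows. The field names are hypothetical, and the conversion of height from centimetres to metres is an assumption based on the conventional BMI definition (kilograms per square metre); the text itself only states weight divided by the square of height.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight divided by height squared (height in metres)."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def user_features(user):
    """Map a user record to [age, gender, BMI]; gender is 0 for female, 1 for male."""
    gender = 1 if user["gender"] == "male" else 0
    return [user["age"], gender, bmi(user["weight_kg"], user["height_cm"])]
```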
Step S3: select blood pressure features from the database, including the blood pressure features under different prediction tasks. The different prediction tasks include prediction tasks of differing precision, such as long-period, short-period, coarse-grained, and fine-grained tasks; the blood pressure features selected under each task include high pressure, low pressure, heart rate, and medication status. The blood pressure observation data comprise the user's high pressure, low pressure, heart rate, medication status, and measurement month. This step also introduces the different prediction tasks. For example, long-period and short-period prediction take the user's blood pressure data for 6 or 3 consecutive months, respectively, as feature input; if a month has no measurement, a missing value is used in its place. For coarse-grained prediction, the user's average blood pressure measurements over 2 or 3 months are the feature input; for fine-grained prediction, the averages over one month or half a month are the feature input.
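A minimal sketch of how the long-period and short-period inputs might be assembled, assuming months are numbered with consecutive integers and that a month without measurements is represented by `None`. The function names and the choice of `None` as the missing value are illustrative assumptions, not taken from the source.

```python
def period_features(monthly_avg, last_month, n_months, missing=None):
    """Per-month averages for n_months consecutive months ending at last_month.

    monthly_avg: dict month_index -> average value; months with no
    measurement are simply absent and are filled with `missing`.
    """
    months = range(last_month - n_months + 1, last_month + 1)
    return [monthly_avg.get(m, missing) for m in months]


def coarse_average(values):
    """Average over the months that have data; None when no month has data."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None
```

With `n_months=6` the first function yields the long-period input and with `n_months=3` the short-period input; `coarse_average` collapses a 2-month or 3-month window into a single coarse-grained value.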
Step 103: normalize the feature data (the measured high pressure, low pressure, and heart rate, and the user's BMI, age, gender, and so on; that is, the feature data within a predetermined period taken from the training data) and the target data (the blood pressure values for a period later than that predetermined period, also taken from the training data), so that every value lies between 0 and 1. The normalization formula is:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the original feature value, $x'$ is the normalized value, $x_{\min}$ is the smallest value of that feature present in the database, and $x_{\max}$ is the largest. Month information is one-hot encoded: the integer month is expanded into a code of 0s and 1s in which the position of the single 1 identifies the month, so that all 12 months are treated on an equal footing.
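The two encodings in this step can be sketched as below. This is a minimal illustration: the function names are hypothetical, and mapping a constant feature (minimum equal to maximum) to 0 is an assumption the text does not address.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] using the minimum and maximum present in the data."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature carries no information and maps to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_month(month):
    """Encode a month number 1..12 as a 12-element vector with a single 1."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return [1 if i == month else 0 for i in range(1, 13)]
```

One-hot encoding avoids implying that December (12) is "larger" than January (1), which a raw integer month would suggest to a regression model.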
Step 104: use the support vector machine (SVM) and the gradient boosting decision tree (GBDT) to perform regression learning on the processed feature data (user features and blood pressure measurement features) and target data, building a model that predicts the user's future blood pressure. The user features, the blood pressure measurement features, and the month information corresponding to each blood pressure measurement feature are normalized as training data and fed into the SVM and GBDT models until the model parameters converge; the parameters obtained at that point make the model optimal with respect to the training data. Experiments showed that the SVM works best when a regression model with a linear kernel is chosen. In the GBDT model, the loss function is the least-squares error, and the predicted labels are output with the predict function.
To verify the effect of the invention, it is further illustrated below with experimental results on real data. The specific steps are as follows:
Step 201: because a single blood pressure measurement cannot accurately describe a user's blood pressure condition, each user's average blood pressure over one month is collected and arranged into the data set.
Step 202: first convert the raw data in the data set into features suitable for training the model, then select the users who have observation data for six consecutive months; this ensures the continuity of each user's measurements and improves prediction accuracy. For example, the data of users with observation records in seven consecutive months (month N-5 to month N+1) are selected for training (for example, users who appear in both August and September), with the last month, N+1, as the training target; users with observation records in seven consecutive months (month N-4 to month N+2) are used for testing (for example, users who appear in both September and October), with the last month, N+2, as the test target.
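The user selection in step 202 can be sketched as follows, assuming each user's observed months are available as a set of consecutive integer month indices. The function names and the data layout are hypothetical.

```python
def users_with_consecutive_months(observed, first_month, last_month):
    """Users who have at least one observation in every month of the window.

    observed: dict user_id -> set of month indices with measurements.
    """
    needed = set(range(first_month, last_month + 1))
    return sorted(uid for uid, months in observed.items() if needed <= months)


def split_windows(observed, n):
    """Training users cover months N-5..N+1; test users cover months N-4..N+2."""
    train_users = users_with_consecutive_months(observed, n - 5, n + 1)
    test_users = users_with_consecutive_months(observed, n - 4, n + 2)
    return train_users, test_users
```

Shifting the test window one month later than the training window means the test target month is never seen as a training target.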
Step S3: the training-set target of the SVM experiment is the average low pressure of month N+1; the prediction output by the model is compared with the month N+1 data to update the model parameters. Two strategies, 1) and 2), are extracted next as typical examples of the short period and the long period. The specific rules for extracting training-set features are as follows:
1) Months N-2 to N: the BMI computed from the user's height and weight (weight divided by the square of height), gender, and age; the single-month averages of high pressure, low pressure, heart rate, and medication status for months N-2, N-1, and N; the two-week averages of high pressure, low pressure, heart rate, and medication status within months N-2, N-1, and N; and the three-month averages of high pressure, low pressure, heart rate, and medication status over months N-2, N-1, and N.
2) Months N-5 to N: the BMI computed from the user's height and weight (weight divided by the square of height), gender, and age; the user's single-month averages of high pressure, low pressure, heart rate, and medication status for months N-5 to N; the two-week averages of high pressure, low pressure, heart rate, and medication status within months N-5 to N; and the quarterly averages of high pressure, low pressure, heart rate, and medication status.
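The blood pressure part of rule 1) can be sketched as below. This is an illustrative assumption-laden version: each measurement is a hypothetical record with `month`, `day`, and the four measured fields; medication status is assumed to be coded 0/1 so that it can be averaged; and a "half month" is assumed to split at day 15. For simplicity the final group is one average over the whole window, which matches the three-month average of rule 1); rule 2) would need per-quarter grouping instead.

```python
FIELDS = ("high", "low", "heart_rate", "medication")


def averages(rows):
    """Mean of each field over rows; None per field when there are no rows."""
    if not rows:
        return [None] * len(FIELDS)
    return [sum(r[f] for r in rows) / len(rows) for f in FIELDS]


def window_features(rows, last_month, n_months=3):
    """Monthly, half-monthly, and whole-window averages for one user."""
    months = range(last_month - n_months + 1, last_month + 1)
    in_window = [r for r in rows if r["month"] in months]
    features = []
    for m in months:  # single-month averages
        features += averages([r for r in in_window if r["month"] == m])
    for m in months:  # half-month averages (split at day 15, an assumption)
        month_rows = [r for r in in_window if r["month"] == m]
        features += averages([r for r in month_rows if r["day"] <= 15])
        features += averages([r for r in month_rows if r["day"] > 15])
    features += averages(in_window)  # average over the whole window
    return features
```

For a three-month window this gives 3 × 4 monthly values, 3 × 2 × 4 half-month values, and 4 whole-window values, 40 in total; the user features (BMI, gender, age) would be prepended separately.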
Step S4: the rules for extracting the SVM test set are as follows:
1) Months N-1 to N+1 (corresponding to training-set months N-2 to N): the BMI computed from the user's height and weight (weight divided by the square of height), gender, and age; the single-month averages of high pressure, low pressure, and heart rate for months N-1, N, and N+1; the two-week averages of high pressure, low pressure, and heart rate within months N-1, N, and N+1; and the three-month averages of high pressure, low pressure, and heart rate over months N-1, N, and N+1.
2) Months N-4 to N+1 (corresponding to training-set months N-5 to N): the BMI computed from the user's height and weight (weight divided by the square of height), gender, and age; the user's single-month averages of high pressure, low pressure, heart rate, and medication status for months N-4 to N+1; the two-week averages of high pressure, low pressure, heart rate, and medication status within months N-4 to N+1; and the quarterly averages of high pressure, low pressure, heart rate, and medication status.
Step S5: the training set is input into the LIBSVM model, which is trained until it converges, optimizing the model parameters. Feeding features into the trained model then yields the prediction, which is compared with the test-set target to obtain the mean error of the low-pressure regression.
The SVM model is constructed as follows:
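First, the functional margin $\hat{\gamma}$ of the hyperplane $(w, b)$ with respect to the training data set is defined as:
$$\hat{\gamma} = y\,(w^{T}x + b)$$
where $x$ is the feature data and $y$ is the target data.
The objective function of the maximum-margin classifier can therefore be defined as:
$$\max_{w,b}\ \tilde{\gamma}, \qquad \tilde{\gamma} = \frac{\hat{\gamma}}{\lVert w \rVert}$$
which is further rewritten as:
$$\max_{w,b}\ \frac{1}{\lVert w \rVert} \quad \text{s.t.} \quad y_i\,(w^{T}x_i + b) \ge 1,\quad i = 1, \dots, n$$
where $n$ is the number of samples, $y_i$ is the target data of the $i$-th sample, and $x_i$ is the feature data of the $i$-th sample.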
The objective function and the constraints can then be merged by the Lagrangian method and rewritten as a general convex optimization problem that is convenient to solve. This objective function yields the optimal regression hyperplane, which is then used for prediction.
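As a sketch of the Lagrangian step, assuming the standard hard-margin formulation in which maximizing $1/\lVert w \rVert$ is replaced by the equivalent minimization of $\tfrac{1}{2}\lVert w \rVert^{2}$ (the multipliers $\alpha_i$ are notation introduced here for illustration and do not come from the source):

```latex
\begin{aligned}
&\min_{w,b}\ \frac{1}{2}\lVert w \rVert^{2}
  \quad \text{s.t.} \quad y_i\,(w^{T}x_i + b) \ge 1,\quad i = 1,\dots,n \\[4pt]
&\mathcal{L}(w, b, \alpha)
  = \frac{1}{2}\lVert w \rVert^{2}
  - \sum_{i=1}^{n} \alpha_i \bigl( y_i\,(w^{T}x_i + b) - 1 \bigr),
  \qquad \alpha_i \ge 0
\end{aligned}
```

Each constraint is folded into the objective with its own non-negative multiplier, which is what turns the constrained problem into a single convex optimization problem.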
LIBSVM must be configured accordingly: a suitable kernel function and training settings are selected through command-line options. The -s option sets the SVM type, and 4 (nu-SVR, regression) selects the regression model; the -t option sets the kernel function, and 0 (linear kernel) selects the linear kernel. Experiments showed that this setting works best.
LIBSVM stores the model parameters obtained by training, and the svm_predict function can then be used to predict on the test set and evaluate the model's performance.
Step S6: the GBDT experiment uses the same feature extraction rules as the SVM experiment and repeats steps S3, S4, and S5. The training-set features and targets are input into the GBDT model.
GBDT regression is implemented with the GBDT toolkit packaged in the open-source machine learning library scikit-learn. The data only need to be read from file with Python and stored as lists: the data and the labels each form one list, with entries at the same position corresponding to each other.
The GBDT model is constructed as follows:
The core of GBDT is the decision tree. The overall procedure of a decision tree is as follows: every node of the tree obtains a predicted value, which equals the average of the target values of all samples belonging to that node. The criterion for the best split is minimizing the mean squared error; the most reliable branching rule is found by minimizing the mean squared error.
The core idea of gradient boosting is to reach a decision jointly by iterating over many trees. This gives the training method of GBDT: each tree learns the residual left by the sum of the conclusions of all previous trees, where the residual is the amount that, when added to the current predicted value, yields the true value. In this way GBDT combines the predictions of multiple decision trees and obtains a more accurate result.
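The residual-fitting idea can be sketched in plain Python. This is a deliberate simplification for illustration, not the scikit-learn implementation: the weak learner is a one-feature, single-split regression stump rather than a full decision tree, the input must contain at least two distinct feature values, and the names `fit_stump` and `fit_boosted` are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump on one feature, by minimum squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_trees=100, learning_rate=0.5):
    """Each new stump is fitted to the residual left by all previous stumps."""
    base = sum(ys) / len(ys)          # initial prediction: the mean target
    trees = []
    predictions = [base] * len(ys)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(x)
                       for p, x in zip(predictions, xs)]
    return lambda x: base + learning_rate * sum(t(x) for t in trees)
```

Each leaf value is the mean of the residuals that fall into it, and the split is chosen by minimum squared error, which mirrors the node prediction and split criterion described for the decision tree.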
The GradientBoostingRegressor class in scikit-learn is called to train the model, with a decision tree depth of 3 and a learning rate of 0.005; experiments showed that this setting works best. After training, the model parameters are stored, and calling the predict function applies the learned model to the test set so that model performance can be evaluated.
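A minimal sketch of the two regressors with the settings named in the text. Several things here are assumptions: the data are synthetic stand-ins rather than patent data, scikit-learn's `NuSVR` is used in place of the LIBSVM command-line tool (it wraps the same nu-SVR formulation), and every hyperparameter not mentioned in the text is left at the library default.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.svm import NuSVR

rng = np.random.default_rng(0)

# Synthetic stand-in for normalized features and targets (not patent data).
X = rng.random((200, 5))
y = X @ np.array([0.4, 0.1, 0.2, 0.2, 0.1]) + 0.01 * rng.standard_normal(200)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# nu-SVR with a linear kernel, the counterpart of LIBSVM's "-s 4 -t 0".
svr = NuSVR(kernel="linear")
# GBDT regressor with depth 3 and learning rate 0.005; the default loss
# is the squared error, matching the least-squares loss in the text.
gbdt = GradientBoostingRegressor(max_depth=3, learning_rate=0.005)

svr.fit(X_train, y_train)
gbdt.fit(X_train, y_train)

# Mean absolute difference between predictions and held-out targets.
svr_mae = float(np.mean(np.abs(svr.predict(X_test) - y_test)))
gbdt_mae = float(np.mean(np.abs(gbdt.predict(X_test) - y_test)))
```

With a learning rate as small as 0.005, the number of boosting rounds matters a great deal; the source does not state the number of trees used, so the default is kept here.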
Step S7: to obtain the graded error, blood pressure is divided into levels at intervals of 10; the specific grading scheme is shown in Table 1. The experimental results for SVM and GBDT are shown in Table 2 and Table 3 respectively; the target month of the experiment is October.
Explanation of the evaluation metrics:
Mean error: the average of the differences between the predicted values and the true values over all data.
Graded error: the average of the differences between the predicted category levels and the true category levels over all data.
Relative accuracy: mean predicted value / mean true value.
Table 1: Category levels for low-pressure values
Low-pressure value    Category level
< 80                  1
80-90                 2
90-100                3
100-110               4
> 110                 5
Table 2. Support vector machine (SVM) experimental results
SVM prediction of users' average low pressure in October 2015
Table 3. Gradient boosting decision tree (GBDT) experimental results
GBDT prediction of users' average low pressure in October 2015
Step S8: the experimental results in Tables 2 and 3 are compared with the fitted-data baseline (Baseline). The baseline directly uses each user's low pressure data from September to fit the October values, as shown in Table 4.
Table 4. Fitted-data baseline (Baseline)
Month      Mean error    Mean error rate    Grading error    Number of samples
October    5.27692       0.0638             0.43691          3012
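One way to read the baseline is as a persistence forecast: each user's September average is used unchanged as the October prediction. The sketch below follows that reading, which is an assumption, since the text does not spell out how the fit is done; the function names, user identifiers, and readings are hypothetical.

```python
from statistics import mean


def baseline_prediction(september_readings):
    """Use a user's September average as the October prediction."""
    return mean(september_readings)


def baseline_mean_error(september_by_user, october_by_user):
    """Average absolute gap between the baseline and the observed October average."""
    errors = [
        abs(baseline_prediction(september_by_user[user]) - mean(october_by_user[user]))
        for user in september_by_user
        if user in october_by_user  # skip users without October data
    ]
    return mean(errors)


# Made-up readings for two hypothetical users, for illustration only.
september_demo = {"user_a": [82.0, 84.0, 86.0], "user_b": [90.0, 94.0]}
october_demo = {"user_a": [80.0, 82.0], "user_b": [95.0, 97.0]}
```

A learned model is only worth deploying if it beats this kind of baseline, which is the comparison the step performs.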
From the experimental results in the tables, it can be concluded that, compared with the fitted-data baseline, the mean error of low pressure is clearly improved: the short-period and long-period predictions of the SVM model improve by 10.37% and 11.14%, respectively, and the short-period and long-period predictions of the GBDT model improve by 10.75% and 11.45%, respectively. In terms of grading error, compared with the baseline, the short-period and long-period predictions of the SVM model improve by 2.85% and 8.43%, respectively, and the short-period and long-period predictions of the GBDT model improve by 8.43% and 10.48%, respectively.
The specific embodiments described above further explain the purpose, technical solution, and effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit the present invention; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (14)

1. A data feature selection and prediction method, the method comprising the steps of:
Step S1: collecting user information and corresponding blood pressure observation data to form a data set, and removing outliers from the data set;
Step S2: extracting user features from the user information in the data set;
Step S3: extracting blood pressure features from the blood pressure observation data in the data set;
Step S4: normalizing the extracted user features and blood pressure features, using the processing results as training samples to form a training set, and inputting the training samples in the training set into a gradient boosting decision tree model, specifically comprising:
extracting from the training set, for the same user, the user features, the single-month average of the blood pressure features, the half-month average of the blood pressure features, and the average of the blood pressure features within a third predetermined acquisition time, and inputting them into the gradient boosting decision tree model, the loss function of the gradient boosting decision tree model being the least-squares loss function;
comparing the output of the gradient boosting decision tree model with the blood pressure features of the same user in the training set within a fourth predetermined acquisition time, and then updating the parameters of the gradient boosting decision tree model, the fourth predetermined acquisition time being later than the third predetermined acquisition time;
iteratively executing the above steps until the parameters of the gradient boosting decision tree converge, to obtain a second prediction model.
2. The method according to claim 1, characterized in that the user features comprise the user's age, gender, and body mass index; and the blood pressure features comprise high pressure, low pressure, heart rate, and medication status.
3. The method according to claim 2, characterized in that the extraction of blood pressure features in step S3 comprises: extracting blood pressure features under different prediction tasks; the different prediction tasks comprise long-period, short-period, coarse-grained, and fine-grained prediction tasks.
4. The method according to claim 1, characterized in that removing outliers from the data set in step S1 comprises:
removing the user information and corresponding blood pressure data of users whose age is not within a predetermined age range;
removing the user information and corresponding blood pressure data of users whose height is not within a predetermined height range;
removing the user information and corresponding blood pressure data of users whose weight is not within a predetermined weight range;
removing the user information and corresponding blood pressure data of users whose blood pressure value is not within a predetermined blood pressure range;
removing the user information and corresponding blood pressure data of users whose heart rate is not within a predetermined heart rate range.
5. A data feature selection and prediction method, the method comprising the steps of:
Step S1: collecting user information and corresponding blood pressure observation data to form a data set, and removing outliers from the data set;
Step S2: extracting user features from the user information in the data set;
Step S3: extracting blood pressure features from the blood pressure observation data in the data set;
Step S4: normalizing the extracted user features and blood pressure features, using the processing results as training samples to form a training set, and inputting the training samples in the training set into a support vector machine model and a gradient boosting decision tree model to train and obtain a prediction model, specifically comprising:
extracting from the training set, for the same user, the user features, the single-month average of the blood pressure features, the half-month average of the blood pressure features, and the average of the blood pressure features within a first predetermined acquisition time, and inputting them into the support vector machine model and the gradient boosting decision tree model, wherein the support vector machine model is a regression model whose kernel function is a linear kernel, and the loss function of the gradient boosting decision tree model is the least-squares loss function;
comparing the outputs of the support vector machine model and the gradient boosting decision tree model respectively with the blood pressure features of the same user in the training set within a second predetermined acquisition time, and then updating the parameters of the support vector machine model and the gradient boosting decision tree model respectively, the second predetermined acquisition time being later than the first predetermined acquisition time;
iteratively executing the above steps until the parameters of the support vector machine model and the gradient boosting decision tree model converge, to obtain a first prediction model.
6. The method according to claim 5, characterized in that the user features comprise the user's age, gender, and body mass index; and the blood pressure features comprise high pressure, low pressure, heart rate, and medication status.
7. The method according to claim 6, characterized in that the extraction of blood pressure features in step S3 comprises: extracting blood pressure features under different prediction tasks; the different prediction tasks comprise long-period, short-period, coarse-grained, and fine-grained prediction tasks.
8. The method according to claim 5, characterized in that removing outliers from the data set in step S1 comprises:
removing the user information and corresponding blood pressure data of users whose age is not within a predetermined age range;
removing the user information and corresponding blood pressure data of users whose height is not within a predetermined height range;
removing the user information and corresponding blood pressure data of users whose weight is not within a predetermined weight range;
removing the user information and corresponding blood pressure data of users whose blood pressure value is not within a predetermined blood pressure range;
removing the user information and corresponding blood pressure data of users whose heart rate is not within a predetermined heart rate range.
9. A data feature selection and prediction device, characterized by comprising:
an acquisition module, configured to collect user information and corresponding blood pressure observation data to form a data set, and to remove outliers from the data set;
a user feature extraction module, configured to extract user features from the user information in the data set;
a blood pressure feature extraction module, configured to extract blood pressure features from the blood pressure observation data in the data set;
a training module, configured to normalize the extracted user features and blood pressure features, use the processing results as training samples to form a training set, and input the training samples in the training set into a gradient boosting decision tree model to train and obtain a prediction model, specifically comprising:
extracting from the training set, for the same user, the user features, the single-month average of the blood pressure features, the half-month average of the blood pressure features, and the average of the blood pressure features within a third predetermined acquisition time, and inputting them into the gradient boosting decision tree model, the loss function of the gradient boosting decision tree model being the least-squares loss function;
comparing the output of the gradient boosting decision tree model with the blood pressure features of the same user in the training set within a fourth predetermined acquisition time, and then updating the parameters of the gradient boosting decision tree model, the fourth predetermined acquisition time being later than the third predetermined acquisition time;
iteratively executing the above steps until the parameters of the gradient boosting decision tree converge, to obtain a second prediction model.
10. The device according to claim 9, characterized in that the user features comprise the user's age, gender, and body mass index; and the blood pressure features comprise high pressure, low pressure, and heart rate.
11. The device according to claim 9, characterized in that the blood pressure feature extraction module comprises:
a blood pressure feature extraction submodule, configured to extract blood pressure features under different prediction tasks; the different prediction tasks comprise long-period, short-period, coarse-grained, and fine-grained prediction tasks.
12. A data feature selection and prediction device, characterized by comprising:
an acquisition module, configured to collect user information and corresponding blood pressure observation data to form a data set, and to remove outliers from the data set;
a user feature extraction module, configured to extract user features from the user information in the data set;
a blood pressure feature extraction module, configured to extract blood pressure features from the blood pressure observation data in the data set;
a training module, configured to normalize the extracted user features and blood pressure features, use the processing results as training samples to form a training set, and input the training samples in the training set into a support vector machine model and a gradient boosting decision tree model to train and obtain a prediction model, specifically comprising:
extracting from the training set, for the same user, the user features, the single-month average of the blood pressure features, the half-month average of the blood pressure features, and the average of the blood pressure features within a first predetermined acquisition time, and inputting them into the support vector machine model and the gradient boosting decision tree model, wherein the support vector machine model is a regression model whose kernel function is a linear kernel, and the loss function of the gradient boosting decision tree model is the least-squares loss function;
comparing the outputs of the support vector machine model and the gradient boosting decision tree model respectively with the blood pressure features of the same user in the training set within a second predetermined acquisition time, and then updating the parameters of the support vector machine model and the gradient boosting decision tree model respectively, the second predetermined acquisition time being later than the first predetermined acquisition time;
iteratively executing the above steps until the parameters of the support vector machine model and the gradient boosting decision tree model converge, to obtain a first prediction model.
13. The device according to claim 12, characterized in that the user features comprise the user's age, gender, and body mass index; and the blood pressure features comprise high pressure, low pressure, and heart rate.
14. The device according to claim 12, characterized in that the blood pressure feature extraction module comprises:
a blood pressure feature extraction submodule, configured to extract blood pressure features under different prediction tasks; the different prediction tasks comprise long-period, short-period, coarse-grained, and fine-grained prediction tasks.
CN201611043691.9A 2016-11-21 2016-11-21 Data feature selection and prediction method and device Active CN106777891B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611043691.9A CN106777891B (en) 2016-11-21 2016-11-21 Data feature selection and prediction method and device

Publications (2)

Publication Number Publication Date
CN106777891A CN106777891A (en) 2017-05-31
CN106777891B true CN106777891B (en) 2019-06-07

Family

ID=58974807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611043691.9A Active CN106777891B (en) 2016-11-21 2016-11-21 Data feature selection and prediction method and device

Country Status (1)

Country Link
CN (1) CN106777891B (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107203700B (en) * 2017-07-14 2020-05-05 清华-伯克利深圳学院筹备办公室 Method and device based on continuous blood glucose monitoring
US11139048B2 (en) 2017-07-18 2021-10-05 Analytics For Life Inc. Discovering novel features to use in machine learning techniques, such as machine learning techniques for diagnosing medical conditions
US11062792B2 (en) 2017-07-18 2021-07-13 Analytics For Life Inc. Discovering genomes to use in machine learning techniques
CN109285075B (en) * 2017-07-19 2022-03-01 腾讯科技(深圳)有限公司 Claims risk assessment method and device and server
CN107688872A (en) * 2017-08-20 2018-02-13 平安科技(深圳)有限公司 Forecast model establishes device, method and computer-readable recording medium
CN107622236B (en) * 2017-09-15 2020-12-04 安徽农业大学 Crop disease diagnosis and early warning method based on swarm and gradient lifting decision tree algorithm
CN107590741A (en) * 2017-09-19 2018-01-16 广东工业大学 A kind of method and system of predicted pictures popularity
CN107908819B (en) * 2017-10-19 2021-05-11 深圳和而泰智能控制股份有限公司 Method and device for predicting user state change
CN109712708B (en) * 2017-10-26 2020-10-30 普天信息技术有限公司 Health condition prediction method and device based on data mining
CN107910066A (en) * 2017-11-13 2018-04-13 医渡云(北京)技术有限公司 Case history appraisal procedure, device, electronic equipment and storage medium
CN109947811A (en) * 2017-11-29 2019-06-28 北京京东金融科技控股有限公司 Generic features library generating method and device, storage medium, electronic equipment
CN108197654A (en) * 2018-01-03 2018-06-22 杭州贝嘟科技有限公司 Stature data predication method, device, storage medium and equipment based on SVM algorithm
CN108511057A (en) * 2018-02-28 2018-09-07 北京和兴创联健康科技有限公司 Transfusion volume model foundation and prediction technique, device, equipment and its storage medium
CN108509761A (en) * 2018-03-26 2018-09-07 中山大学 A kind of drug targets prediction technique promoting decision tree and feature selecting based on gradient
CN109192315B (en) * 2018-06-23 2020-10-20 重庆大学 Comprehensive age detection system based on weighted kernel regression and packaged deviation search
CN109047698B (en) * 2018-09-03 2021-01-15 中冶连铸技术工程有限责任公司 Continuous casting billet fixed weight and fixed length online prediction method
CN109299732B (en) * 2018-09-12 2020-05-05 北京三快在线科技有限公司 Unmanned driving behavior decision and model training method and device and electronic equipment
CN109919196B (en) * 2019-02-01 2023-12-08 华南理工大学 Physique identification method based on feature selection and classification model
TWI693062B (en) * 2019-04-25 2020-05-11 緯創資通股份有限公司 Method and electronic device for predicting sudden drop in blood pressure
CN110558960A (en) * 2019-09-10 2019-12-13 重庆大学 continuous blood pressure non-invasive monitoring method based on PTT and MIV-GA-SVR
CN111428930A (en) * 2020-03-24 2020-07-17 中电药明数据科技(成都)有限公司 GBDT-based medicine patient using number prediction method and system
CN112784492A (en) * 2021-01-26 2021-05-11 上海黑瞳信息技术有限公司 Automatic modeling system for machine learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130080808A1 (en) * 2011-09-28 2013-03-28 The Trustees Of Princeton University Biomedical device for comprehensive and adaptive data-driven patient monitoring
CN103876734A (en) * 2014-03-24 2014-06-25 北京工业大学 Electroencephalogram feature selection approach based on decision-making tree
CN104274164A (en) * 2013-07-05 2015-01-14 广州华久信息科技有限公司 Blood pressure predicting method and mobile phone based on facial image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
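Research and Application of Feature Extraction Methods Based on Support Vector Machines; Jiang Lin; China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology; 2007-06-15 (No. 06); abstract, pp. 17-36 *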

Also Published As

Publication number Publication date
CN106777891A (en) 2017-05-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant