CN106777891B - Data feature selection and prediction method and device - Google Patents
- Publication number
- CN106777891B (application CN201611043691.9A)
- Authority
- CN
- China
- Prior art keywords
- blood pressure
- user
- data
- model
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G06F19/32—
Landscapes
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Measuring Pulse, Heart Rate, Blood Pressure Or Blood Flow (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Abstract
The invention discloses a data feature selection and prediction method and device. The method includes: step S1, collecting user information and the corresponding blood pressure observation data to form a data set, and removing outliers from the data set; step S2, extracting user features from the user information in the data set; step S3, extracting blood pressure features from the blood pressure observation data in the data set; step S4, normalizing the extracted user features and blood pressure features, using the results as training samples to form a training set, and inputting the training samples into a support vector machine (SVM) model and/or a gradient boosting decision tree (GBDT) model to train a prediction model. The invention uses medical knowledge to guide data cleaning and feature engineering, which effectively improves the accuracy of the model.
Description
Technical field
The present invention relates to the fields of machine learning and pattern recognition, and mainly to feature selection in machine learning. It combines gradient boosting decision tree (GBDT) and support vector machine (SVM) models in a method and device for data feature selection and prediction.
Background art
With the development of computer technology, computers can now process many kinds of data and help people complete tasks more efficiently. In artificial intelligence in particular, machine learning is a core technology that has been applied to many specific problems. The support vector machine (SVM) is one of the classic machine learning models; it is efficient and also yields good prediction results. The gradient boosting decision tree (GBDT) is a machine learning method that has become very popular in industry in recent years; it is derived from the classical decision tree model.
Mobile healthcare has been a global market focus in recent years. Cross-domain integration is its basic characteristic, and big-data prediction and its applications have especially bright prospects.
Summary of the invention
In view of the above, the present invention develops and builds a screening model for users' blood pressure data sequences. It aims to give individual users optimized strategies and intuitive quantitative guidance, to help achieve the most effective intervention measures, and to provide users with a personalized feature selection service.
According to one aspect of the present invention, a data feature selection and prediction method is provided, comprising the steps of:
Step S1: collect user information and the corresponding blood pressure observation data to form a data set, and remove outliers from the data set;
Step S2: extract user features from the user information in the data set;
Step S3: extract blood pressure features from the blood pressure observation data in the data set;
Step S4: normalize the extracted user features and blood pressure features, use the results as training samples to form a training set, and input the training samples in the training set into a support vector machine (SVM) model and/or a gradient boosting decision tree (GBDT) model to train a prediction model.
Wherein, the user features include the user's age, gender and body mass index; the blood pressure features include systolic pressure, diastolic pressure, heart rate and medication status.
Wherein, the extraction of blood pressure features in step S3 includes extracting the blood pressure features under different prediction tasks; the different prediction tasks include long-period, short-period, coarse-grained and fine-grained prediction tasks.
Wherein, inputting the training samples in the training set into the SVM model and/or the GBDT model in step S4 to train a prediction model includes:
Extracting from the training set, for the same user, the user features, the single-month averages of the blood pressure features, the half-month averages of the blood pressure features, and the averages of the blood pressure features over a first predetermined acquisition time, and inputting them into the SVM model; the SVM model is a regression model whose kernel function is a linear kernel;
Comparing the output of the SVM model with the blood pressure features of the same user in the training set over a second predetermined acquisition time, and updating the parameters of the SVM model accordingly; the second predetermined acquisition time is later than the first predetermined acquisition time;
Repeating the above steps until the parameters of the SVM model converge, which yields a first prediction model.
Wherein, inputting the training samples in the training set into the SVM model and/or the GBDT model in step S4 to train a prediction model includes:
Extracting from the training set, for the same user, the user features, the single-month averages of the blood pressure features, the half-month averages of the blood pressure features, and the averages of the blood pressure features over a third predetermined acquisition time, and inputting them into the GBDT model; the loss function of the GBDT model is the least-squares error;
Comparing the output of the GBDT model with the blood pressure features of the same user in the training set over a fourth predetermined acquisition time, and updating the parameters of the GBDT model accordingly; the fourth predetermined acquisition time is later than the third predetermined acquisition time;
Repeating the above steps until the parameters of the GBDT model converge, which yields a second prediction model.
Wherein, inputting the training samples in the training set into the SVM model and/or the GBDT model in step S4 to train a prediction model includes:
Extracting from the training set, for the same user, the user features, the single-month averages of the blood pressure features, the half-month averages of the blood pressure features, and the averages of the blood pressure features over a first predetermined acquisition time, and inputting them into both the SVM model and the GBDT model; the SVM model is a regression model whose kernel function is a linear kernel, and the loss function of the GBDT model is the least-squares error;
Comparing the outputs of the SVM model and the GBDT model, respectively, with the blood pressure features of the same user in the training set over a second predetermined acquisition time, and updating the parameters of the SVM model and the GBDT model accordingly; the second predetermined acquisition time is later than the first predetermined acquisition time;
Repeating the above steps until the parameters of the SVM model and the GBDT model converge, which yields the first prediction model.
Wherein, removing outliers from the data set in step S1 includes:
Removing the user information and corresponding blood pressure data of users whose age is outside a predetermined age range;
Removing the user information and corresponding blood pressure data of users whose height is outside a predetermined height range;
Removing the user information and corresponding blood pressure data of users whose weight is outside a predetermined weight range;
Removing the user information and corresponding blood pressure data whose blood pressure value is outside a predetermined blood pressure range;
Removing the user information and corresponding blood pressure data whose heart rate is outside a predetermined heart rate range.
According to a second aspect of the present invention, a data feature selection and prediction device is provided, comprising:
An acquisition module, for collecting user information and the corresponding blood pressure observation data to form a data set, and for removing outliers from the data set;
A user feature extraction module, for extracting user features from the user information in the data set;
A blood pressure feature extraction module, for extracting blood pressure features from the blood pressure observation data in the data set;
A training module, for normalizing the extracted user features and blood pressure features, using the results as training samples to form a training set, and inputting the training samples in the training set into an SVM model and/or a GBDT model to train a prediction model.
Wherein, the blood pressure feature extraction module includes:
A blood pressure feature extraction submodule, for extracting the blood pressure features under different prediction tasks; the different prediction tasks include long-period, short-period, coarse-grained and fine-grained prediction tasks.
The present invention uses medical knowledge to guide data cleaning and feature engineering, which effectively improves the accuracy of the model.
Brief description of the drawings
Fig. 1 is a flow chart of the data feature selection and prediction method proposed by the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawing.
As shown in Fig. 1, the present invention proposes a data feature selection and prediction method, comprising the steps of:
Step S1: collect user information and the corresponding blood pressure observation data to form a data set, and remove outliers from the data set;
Step S2: extract user features from the user information in the data set;
Step S3: extract blood pressure features from the blood pressure observation data in the data set;
Step S4: normalize the extracted user features and blood pressure features, use the results as training samples to form a training set, and input the training samples in the training set into a support vector machine (SVM) model and/or a gradient boosting decision tree (GBDT) model to train a prediction model.
In one embodiment, the user features include the user's age, gender and body mass index; the blood pressure features include systolic pressure, diastolic pressure and heart rate.
The extraction of blood pressure features in step S3 includes extracting the blood pressure features under different prediction tasks; the different prediction tasks include long-period, short-period, coarse-grained and fine-grained prediction tasks.
In one embodiment, the present invention can train the SVM model and the GBDT model together and use both models to predict the user's blood pressure; in another embodiment, the SVM model or the GBDT model can be trained on its own, and the trained SVM model or GBDT model is then used for prediction.
In one embodiment, inputting the training samples in the training set into the SVM model and/or the GBDT model in step S4 to train a prediction model includes:
Extracting from the training set, for the same user, the user features, the single-month averages of the blood pressure features, the half-month averages of the blood pressure features, and the averages of the blood pressure features over a first predetermined acquisition time, and inputting them into the SVM model; the SVM model is a regression model whose kernel function is a linear kernel;
Comparing the output of the SVM model with the blood pressure features of the same user in the training set over a second predetermined acquisition time, and updating the parameters of the SVM model accordingly; the second predetermined acquisition time is later than the first predetermined acquisition time;
Repeating the above steps until the parameters of the SVM model converge, which yields a first prediction model.
In another embodiment, inputting the training samples in the training set into the SVM model and/or the GBDT model in step S4 to train a prediction model includes:
Extracting from the training set, for the same user, the user features, the single-month averages of the blood pressure features, the half-month averages of the blood pressure features, and the averages of the blood pressure features over a third predetermined acquisition time, and inputting them into the GBDT model; the loss function of the GBDT model is the least-squares error;
Comparing the output of the GBDT model with the blood pressure features of the same user in the training set over a fourth predetermined acquisition time, and updating the parameters of the GBDT model accordingly; the fourth predetermined acquisition time is later than the third predetermined acquisition time;
Repeating the above steps until the parameters of the GBDT model converge, which yields a second prediction model.
In other embodiments, inputting the training samples in the training set into the SVM model and/or the GBDT model in step S4 to train a prediction model includes:
Extracting from the training set, for the same user, the user features, the single-month averages of the blood pressure features, the half-month averages of the blood pressure features, and the averages of the blood pressure features over a first predetermined acquisition time, and inputting them into both the SVM model and the GBDT model; the SVM model is a regression model whose kernel function is a linear kernel, and the loss function of the GBDT model is the least-squares error;
Comparing the outputs of the SVM model and the GBDT model, respectively, with the blood pressure features of the same user in the training set over a second predetermined acquisition time, and updating the parameters of the SVM model and the GBDT model accordingly; the second predetermined acquisition time is later than the first predetermined acquisition time;
Repeating the above steps until the parameters of the SVM model and the GBDT model converge, which yields the first prediction model.
In one embodiment, removing outliers from the data set in step S1 includes:
Removing the user information and corresponding blood pressure data of users whose age is outside a predetermined age range;
Removing the user information and corresponding blood pressure data of users whose height is outside a predetermined height range;
Removing the user information and corresponding blood pressure data of users whose weight is outside a predetermined weight range;
Removing the user information and corresponding blood pressure data whose blood pressure value is outside a predetermined blood pressure range;
Removing the user information and corresponding blood pressure data whose heart rate is outside a predetermined heart rate range.
The technical solution of the present invention is described in detail below through specific embodiments.
In one embodiment, the present invention proposes a data feature selection and prediction method, comprising:
Step 101: collect users' personal information data and blood pressure observation data, and import them into a database. The personal information data include the user's age, gender, height, weight, body mass index (BMI), measurement time, and so on; the blood pressure observation data include systolic pressure, diastolic pressure, heart rate, medication status, measurement month, and so on. Clean the data: based on relevant medical knowledge, delete the outliers (that is, abnormal personal information data and blood pressure observation data) so that the data set becomes target data that can be used to train a machine learning model.
The specific outlier screening rules are as follows. Remove personal information data whose age is outside a predetermined age range, for example users older than 110 or younger than 10. Remove data whose height is outside a predetermined height range, for example a height below 120 cm or above 200 cm. Remove data whose weight is outside a predetermined weight range, for example a weight below 20 kg or above 130 kg. Remove data whose blood pressure is outside a predetermined blood pressure range, for example observations whose diastolic pressure is more than 40 below or above the user's historical average, and observations whose systolic pressure is more than 40 below or above the user's historical average. Remove observations whose heart rate is 0.
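The screening rules above can be sketched in a few lines of Python. This is only an illustration: the record layout (dictionaries with keys such as `age`, `height_cm`, `systolic`) and the function names are assumptions, the thresholds are the example values quoted in the text, and the historical average is taken here as the plain mean of all of a user's readings.

```python
from statistics import mean

# Example thresholds quoted in the text; the constant names are hypothetical.
AGE_RANGE = (10, 110)        # years
HEIGHT_RANGE = (120, 200)    # centimetres
WEIGHT_RANGE = (20, 130)     # kilograms
MAX_BP_DEVIATION = 40        # allowed distance from the user's historical mean


def user_is_valid(user):
    """Check the demographic fields of one user record (a dict)."""
    return (AGE_RANGE[0] <= user["age"] <= AGE_RANGE[1]
            and HEIGHT_RANGE[0] <= user["height_cm"] <= HEIGHT_RANGE[1]
            and WEIGHT_RANGE[0] <= user["weight_kg"] <= WEIGHT_RANGE[1])


def clean_readings(readings):
    """Drop readings of one user that look like measurement errors."""
    if not readings:
        return []
    # Assumption: the historical average is the mean over all readings.
    sys_mean = mean(r["systolic"] for r in readings)
    dia_mean = mean(r["diastolic"] for r in readings)
    return [r for r in readings
            if r["heart_rate"] != 0
            and abs(r["systolic"] - sys_mean) <= MAX_BP_DEVIATION
            and abs(r["diastolic"] - dia_mean) <= MAX_BP_DEVIATION]
```

A user who fails `user_is_valid` would be dropped together with all of that user's readings, while `clean_readings` filters individual observations.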
Step 102: select the user's features from the database, including age, gender and body mass index. According to authoritative medical information, the older the user, the higher the blood pressure; men's blood pressure is generally slightly higher than women's; and the higher the body mass index (BMI, roughly a measure of how heavy the build is), the higher the blood pressure. The extracted features are the age and gender from the personal information data (0 for female, 1 for male) and the BMI converted from height and weight (weight / height squared).
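As a small illustration of this step, the sketch below builds the three user features. The field names and the choice of units (kilograms and centimetres, converted to the usual kg/m² form of BMI) are assumptions, since the text only gives the formula weight / height squared.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight divided by height squared (kg / m^2)."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def user_features(user):
    """Return [age, gender code, BMI]; gender is 0 for female, 1 for male."""
    gender_code = 1 if user["gender"] == "male" else 0
    return [user["age"], gender_code, bmi(user["weight_kg"], user["height_cm"])]
```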
Step S3: select blood pressure features from the database, including the blood pressure features under different prediction tasks. The different prediction tasks are tasks of different precision, such as long-period, short-period, coarse-grained and fine-grained prediction. The blood pressure features selected under the different prediction tasks include systolic pressure, diastolic pressure, heart rate and medication status. The blood pressure observation data include the user's systolic pressure, diastolic pressure, heart rate, medication status and measurement month. This step further introduces the different prediction tasks. Long-period and short-period prediction take, respectively, 6 or 3 consecutive months of the user's blood pressure data as input features; a month without measurements is replaced by a missing value. Coarse-grained prediction takes the user's 2-month or 3-month average blood pressure measurements as input features, and fine-grained prediction takes the one-month or half-month averages.
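The period and granularity choices can be illustrated with a short Python sketch. It assumes each user's data is already a mapping from an integer month index to that month's average reading; the `MISSING` placeholder and the function names are hypothetical, because the text does not say how a missing value is represented.

```python
from statistics import mean

MISSING = None  # hypothetical placeholder for a month without measurements


def window(monthly, last_month, length):
    """Return `length` consecutive monthly values ending at `last_month`.

    Use length=6 for the long period and length=3 for the short period.
    """
    months = range(last_month - length + 1, last_month + 1)
    return [monthly.get(m, MISSING) for m in months]


def coarse(values, group):
    """Average each run of `group` months, skipping missing months."""
    out = []
    for start in range(0, len(values), group):
        chunk = [v for v in values[start:start + group] if v is not MISSING]
        out.append(mean(chunk) if chunk else MISSING)
    return out
```

For example, `coarse(window(monthly, 9, 6), 3)` would give two 3-month averages for a coarse-grained, long-period task.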
Step 103: normalize the feature data (the measured systolic pressure, diastolic pressure and heart rate together with the user's BMI, age, gender and so on, that is, the feature data within a predetermined time taken from the training data) and the target data (the blood pressure values for a period later than the predetermined time in the training data), so that the data fall between 0 and 1. The normalization formula is:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x_{\min}$ is the smallest value of the feature in the database and $x_{\max}$ is the largest. Month information is one-hot encoded: the integer month is expanded into a code of 0s and 1s in which the position of the 1 in the sequence expresses the encoded value, so that all 12 months are placed on an equal footing.
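A minimal sketch of the two transformations in this step, min-max scaling and one-hot encoding of the month, is shown below. The function names are illustrative, and mapping a constant feature to 0.0 is an assumption added only to avoid division by zero.

```python
def min_max(values):
    """Scale a list of numbers into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature carries no information, map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_month(month):
    """Encode a month number (1..12) as twelve 0/1 values with a single 1."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return [1 if i == month else 0 for i in range(1, 13)]
```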
Step 104: use the support vector machine (SVM) and the gradient boosting decision tree (GBDT) to perform regression learning on the processed feature data (user features and blood pressure measurement features) and target data, building a prediction model of the user's future blood pressure. The user features, the blood pressure measurement features and the month information corresponding to each blood pressure measurement feature are normalized as training data and fed into the SVM and GBDT models until the model parameters converge; the parameters obtained at that point make the model optimal with respect to the training data. Experiments show that, for the SVM model, choosing a regression model with a linear kernel gives the best results. For the GBDT model, the loss function is the least-squares error, and the predict function is used to output the predicted labels.
To verify the effect of the present invention, experimental results on real data are described next. The specific steps are as follows:
Step 201: because a single blood pressure measurement cannot accurately describe a user's blood pressure, each user's average blood pressure over one month is collected and organized into the data set.
Step 202: first convert the raw data in the data set into features suitable for training the model, then select the users who have observation data in six consecutive months. This guarantees the continuity of each user's measurements and improves prediction accuracy. For example, the data of users with observation records in seven consecutive months (month N-5 to month N+1) are used for training (for example, users who appear in both August and September), with the last month, N+1, as the training target; users with observation records in seven consecutive months (month N-4 to month N+2) are used for testing (for example, users who appear in both September and October), with the last month, N+2, as the test target.
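The sliding training and test windows can be sketched as follows. The data layout (a mapping from user id to a mapping from integer month index to monthly average) and the function names are assumptions made for illustration.

```python
def has_all_months(monthly, first, last):
    """True if the user has a record for every month from first to last."""
    return all(m in monthly for m in range(first, last + 1))


def build_samples(users, first, last):
    """Build (user_id, features, target) tuples.

    Features are the monthly values for first..last-1 and the target is the
    value for the final month `last`. Users with gaps are skipped.
    """
    samples = []
    for user_id, monthly in users.items():
        if has_all_months(monthly, first, last):
            features = [monthly[m] for m in range(first, last)]
            samples.append((user_id, features, monthly[last]))
    return samples
```

With N as the last feature month of the training window, `build_samples(users, N - 5, N + 1)` would give the training samples and `build_samples(users, N - 4, N + 2)` the test samples.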
Step S3: the target of the SVM experiment's training set is the average diastolic pressure of month N+1; the prediction output by the model is compared with the data of month N+1 to update the model parameters. Strategies 1) and 2) below are taken as typical examples of short-period and long-period prediction. The feature extraction rules for the training set are as follows:
1) Months N-2 to N: the BMI (weight / height squared) converted from the user's height and weight, gender and age; the single-month average systolic pressure, diastolic pressure, heart rate and medication status for each of months N-2, N-1 and N; the two-week average systolic pressure, diastolic pressure, heart rate and medication status within months N-2, N-1 and N; the three-month average systolic pressure, diastolic pressure, heart rate and medication status over months N-2, N-1 and N.
2) Months N-5 to N: the BMI (weight / height squared) converted from the user's height and weight, gender and age; the user's single-month average systolic pressure, diastolic pressure, heart rate and medication status for months N-5 to N; the two-week average systolic pressure, diastolic pressure, heart rate and medication status within months N-5 to N; the quarterly (three-month) average systolic pressure, diastolic pressure, heart rate and medication status.
Step S4: the test set extraction rules for the SVM experiment are as follows:
1) Months N-1 to N+1 (corresponding to months N-2 to N of the training set): the BMI (weight / height squared) converted from the user's height and weight, gender and age; the single-month average systolic pressure, diastolic pressure and heart rate for each of months N-1, N and N+1; the two-week average systolic pressure, diastolic pressure and heart rate within months N-1, N and N+1; the three-month average systolic pressure, diastolic pressure and heart rate over months N-1, N and N+1.
2) Months N-4 to N+1 (corresponding to months N-5 to N of the training set): the BMI (weight / height squared) converted from the user's height and weight, gender and age; the user's single-month average systolic pressure, diastolic pressure, heart rate and medication status for months N-4 to N+1; the two-week average systolic pressure, diastolic pressure, heart rate and medication status within months N-4 to N+1; the quarterly (three-month) average systolic pressure, diastolic pressure, heart rate and medication status.
Step S5: input the training set into the LIBSVM model and train until the model converges, optimizing the model parameters. Feeding the features into the trained model then outputs the prediction results, which are compared with the test set targets to obtain the mean error of the diastolic pressure regression.
The SVM model is constructed as follows:
First, define the functional margin of the hyperplane (w, b) with respect to the training data set as:
where x is the feature data and y is the target data;
The objective function of the maximum-margin classifier can therefore be defined as:
which is further rewritten as:
where n is the number of samples, y_i is the target data of the i-th sample, and x_i is the feature data of the i-th sample;
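The formulas themselves do not appear in this text. As an assumption only, the standard textbook maximum-margin formulation, written with the symbols defined here (w, b, x_i, y_i, n), reads as follows; the patent's own expressions may differ, and the regression variant actually used (nu-SVR) adds further terms.

```latex
% Functional margin of the hyperplane (w, b) on the training set
\hat{\gamma} = \min_{i=1,\dots,n} \hat{\gamma}_i ,
\qquad \hat{\gamma}_i = y_i \left( w^{\top} x_i + b \right)

% Maximum-margin objective
\max_{w,\,b} \; \frac{\hat{\gamma}}{\lVert w \rVert}
\quad \text{s.t.} \quad y_i \left( w^{\top} x_i + b \right) \ge \hat{\gamma},
\quad i = 1, \dots, n

% Equivalent form after fixing \hat{\gamma} = 1
\min_{w,\,b} \; \frac{1}{2} \lVert w \rVert^{2}
\quad \text{s.t.} \quad y_i \left( w^{\top} x_i + b \right) \ge 1,
\quad i = 1, \dots, n
```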
The objective function and the constraints can then be combined by the Lagrangian method and rewritten as a general convex optimization problem that is easier to compute. The optimal regression hyperplane is obtained from this objective function, and prediction is performed with this hyperplane.
LIBSVM must be configured accordingly: the appropriate SVM kernel function and training settings are selected through command options. The option -s sets the SVM type, and 4 (nu-SVR, regression) selects the regression model; the option -t selects the kernel function, and 0 (linear kernel) selects the linear kernel. Experiments show that this setting gives the best results.
LIBSVM stores the model parameters obtained from training, and the svm_predict function can be used to predict on the test set and evaluate the model's performance.
Step S6: the GBDT experiment uses the same feature extraction rules as the SVM experiment and repeats steps S3, S4 and S5. The training set features and targets are input into the GBDT model.
GBDT regression is implemented with the GBDT toolkit packaged in the open-source machine learning library scikit-learn. The data only need to be read from file with Python and stored as lists. The data and the labels each correspond to one list, with matching positions.
GBDT model construction:
The core of GBDT is the decision tree. The overall flow of a decision tree is as follows: every node of the tree yields a predicted value, which equals the average of the target values of all samples belonging to that node. The criterion for the best split is minimizing the mean squared error; the most reliable basis for branching is found by minimizing the mean squared error.
The core idea of gradient boosting (Gradient Boosting) is to reach a joint decision by iterating over multiple trees. This gives the training method of GBDT: each tree learns the residual of the sum of the conclusions of all preceding trees, where the residual is the amount that, added to the predicted value, yields the true value. In this way, GBDT combines the predictions of multiple decision trees and obtains a more accurate prediction result.
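The residual-fitting idea can be sketched in plain Python with one-split trees (stumps) as the weak learner. This is a simplified illustration under stated assumptions: the stump learner, the `learning_rate` shrinkage factor, the number of trees and the toy data are all hypothetical choices, not details from the source.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals (least squares)."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, n_trees=50, learning_rate=0.5):
    """Each new stump is trained on what the previous stumps left unexplained."""
    base = sum(ys) / len(ys)      # initial prediction: the mean target
    stumps = []
    preds = [base] * len(ys)
    history = []                  # training MSE before each new tree
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        history.append(sum(r * r for r in residuals) / len(ys))
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict, history

# Hypothetical single-feature training data.
xs = [1, 2, 3, 4, 5, 6]
ys = [80.0, 82.0, 85.0, 90.0, 91.0, 95.0]
predict, mse_history = boost(xs, ys)
```

The recorded `mse_history` shrinks as trees are added, which reflects how the summed predictions of the trees approach the true values.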
The GradientBoostingRegressor function in scikit-learn is called to train the model, with a decision tree depth of 3 and a learning rate of 0.005. Experiments confirmed that this setting performs best. After training, the model parameters are stored; calling the predict function applies the learned parameters to predict on the test set and evaluate the model's performance.
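A minimal sketch of this training call is given below. The tree depth and learning rate follow the text; the synthetic data, the `n_estimators` value and the `random_state` are assumptions made so that the example is self-contained and reproducible.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for the normalized training set (hypothetical data).
rng = np.random.default_rng(0)
X_train = rng.random((200, 4))
y_train = 80.0 + 10.0 * X_train[:, 0] + rng.normal(0.0, 1.0, 200)

# Depth 3 and learning rate 0.005 follow the text; n_estimators is an
# assumed value. The default loss of this estimator is least squares.
model = GradientBoostingRegressor(
    max_depth=3, learning_rate=0.005, n_estimators=500, random_state=0
)
model.fit(X_train, y_train)

# Predict with the learned parameters and compare against a constant guess.
predictions = model.predict(X_train[:5])
train_mse = float(np.mean((model.predict(X_train) - y_train) ** 2))
baseline_mse = float(np.var(y_train))
```

With such a small learning rate, many trees are needed before the training error falls noticeably below that of the constant mean prediction, so the number of trees is a setting worth tuning together with the learning rate.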
Step S7: blood pressure is divided into levels at intervals of 10 to obtain the grading error; the specific grading strategy is shown in Table 1. The experimental results of SVM and GBDT are shown in Table 2 and Table 3, respectively; the target month of the experiment is October.
Evaluation metrics:
Mean error: the average difference between the predicted values and the true values over all data.
Grading error: the average difference between the predicted category levels and the true category levels over all data.
Relative accuracy: mean predicted value / mean true value.
Table 1: Category levels of low (diastolic) blood pressure values
Low pressure value | Category level |
---|---|
< 80 | 1 |
80-90 | 2 |
90-100 | 3 |
100-110 | 4 |
> 110 | 5 |
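The grading strategy of Table 1 and the three evaluation metrics can be sketched as follows. Two points are assumptions rather than statements from the source: boundary values such as 90 are assigned to the higher level, and the differences are taken as absolute values. The sample predictions are hypothetical.

```python
def category_level(low_pressure):
    """Map a low (diastolic) pressure value to a level from Table 1.
    Boundary values are assigned to the higher level (an assumption)."""
    if low_pressure < 80:
        return 1
    return min(5, int((low_pressure - 80) // 10) + 2)

def mean_error(predicted, actual):
    """Average absolute difference between predicted and true values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def grading_error(predicted, actual):
    """Average absolute difference between predicted and true levels."""
    diffs = [abs(category_level(p) - category_level(a))
             for p, a in zip(predicted, actual)]
    return sum(diffs) / len(diffs)

def relative_accuracy(predicted, actual):
    """Mean predicted value divided by mean true value."""
    return (sum(predicted) / len(predicted)) / (sum(actual) / len(actual))

# Hypothetical predictions and true values for three users.
predicted = [78.0, 92.0, 104.0]
actual = [82.0, 95.0, 99.0]
```

Because the grading error only counts level changes, a prediction that is several units off but stays in the same 10-unit band contributes nothing to it, which is why it is reported alongside the mean error.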
Table 2: Support vector machine (SVM) experimental results
SVM prediction of users' average low pressure in October 2015
Table 3: Gradient boosting decision tree (GBDT) experimental results
GBDT prediction of users' average low pressure in October 2015
Step S8: the experimental results in Tables 2 and 3 are compared with the fitted-data baseline (Baseline). The baseline directly uses each user's low pressure data from September as the fitted value for October, as shown in Table 4.
Table 4: Fitted-data baseline (Baseline)
Month | Mean error | Average error rate | Grading error | Sample number |
---|---|---|---|---|
October | 5.27692 | 0.0638 | 0.43691 | 3012 |
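The baseline above can be sketched as a carry-forward prediction. The definition of the error rate used here (mean error divided by the mean true value) is an assumption, since the source does not spell it out, and the per-user values are hypothetical.

```python
def baseline_errors(september, october):
    """Use each user's September mean as the October prediction and return
    (mean absolute error, mean error relative to the mean October value)."""
    n = len(october)
    mean_err = sum(abs(s - o) for s, o in zip(september, october)) / n
    mean_true = sum(october) / n
    return mean_err, mean_err / mean_true

# Hypothetical per-user monthly mean low pressure values.
september = [84.0, 90.0, 76.0, 88.0]
october = [80.0, 92.0, 79.0, 89.0]
mean_err, error_rate = baseline_errors(september, october)
```

A trained model is only useful if its errors fall below those of this carry-forward baseline, which needs no training at all.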
The experimental results in the tables show that, compared with the fitted-data baseline, the mean error of low pressure prediction improves markedly: the short-period and long-period predictions of the SVM model improve by 10.37% and 11.14%, respectively, and the short-period and long-period predictions of the GBDT model improve by 10.75% and 11.45%, respectively. In terms of grading error, compared with the baseline, the short-period and long-period predictions of the SVM model improve by 2.85% and 8.43%, respectively, and the short-period and long-period predictions of the GBDT model improve by 8.43% and 10.48%, respectively.
The specific embodiments described above further explain the purpose, technical solution and effects of the present invention in detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (14)
1. A data feature selection and prediction method, comprising the steps of:
Step S1: collecting user information and corresponding blood pressure observation data to form a data set, and removing outlier points from the data set;
Step S2: extracting user features from the user information in the data set;
Step S3: extracting blood pressure features from the blood pressure observation data in the data set;
Step S4: normalizing the extracted user features and blood pressure features, forming a training set with the processing results as training samples, and inputting the training samples of the training set into a gradient boosting decision tree model, which specifically comprises:
extracting from the training set, for the same user, the user features, the single-month average of the blood pressure features, the half-month average of the blood pressure features, and the average of the blood pressure features within a third predetermined acquisition time, and inputting them into the gradient boosting decision tree model, wherein the loss function of the gradient boosting decision tree model is a least squares loss function;
comparing the output of the gradient boosting decision tree model with the blood pressure features of the same user in the training set within a fourth predetermined acquisition time, and then updating the parameters of the gradient boosting decision tree model, wherein the fourth predetermined acquisition time is later than the third predetermined acquisition time;
iterating the above steps until the parameters of the gradient boosting decision tree model converge, to obtain a second prediction model.
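The feature construction in the steps above (single-month average, half-month average, and an average over a further predetermined window, followed by normalization) might be sketched as below. The dates, the window boundaries and the choice of min-max scaling are assumptions for illustration; the source does not specify the normalization method.

```python
from datetime import date

def mean(values):
    return sum(values) / len(values)

def window_mean(readings, start, end):
    """Mean of readings whose date lies in [start, end]; None if empty."""
    values = [v for d, v in readings if start <= d <= end]
    return mean(values) if values else None

def min_max_normalize(column):
    """Scale one feature column to [0, 1]; a constant column maps to 0."""
    lo, hi = min(column), max(column)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in column]

# Hypothetical low-pressure readings of one user in September 2015.
readings = [
    (date(2015, 9, 2), 84.0),
    (date(2015, 9, 10), 86.0),
    (date(2015, 9, 20), 90.0),
    (date(2015, 9, 28), 88.0),
]

features = [
    window_mean(readings, date(2015, 9, 1), date(2015, 9, 30)),   # single month
    window_mean(readings, date(2015, 9, 16), date(2015, 9, 30)),  # half month
    window_mean(readings, date(2015, 9, 24), date(2015, 9, 30)),  # assumed window
]

# Normalization is applied per feature column across users, e.g. to ages.
normalized_ages = min_max_normalize([40, 50, 60])
```

Averaging over windows of different lengths gives the model both a stable long-term level and a more recent value for the same user.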
2. The method according to claim 1, characterized in that the user features comprise the user's age, gender and body mass index; the blood pressure features comprise high pressure, low pressure, heart rate and medication status.
3. The method according to claim 2, characterized in that extracting the blood pressure features in step S3 comprises: extracting blood pressure features under different prediction tasks; the different prediction tasks comprise long-period, short-period, coarse-grained and fine-grained prediction tasks.
4. The method according to claim 1, characterized in that removing outlier points from the data set in step S1 comprises:
removing the user information and corresponding blood pressure data of users whose age is not within a predetermined age range;
removing the user information and corresponding blood pressure data of users whose height is not within a predetermined height range;
removing the user information and corresponding blood pressure data of users whose weight is not within a predetermined weight range;
removing the user information and corresponding blood pressure data of users whose blood pressure value is not within a predetermined blood pressure range;
removing the user information and corresponding blood pressure data of users whose heart rate is not within a predetermined heart rate range.
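The range checks listed above can be sketched as a simple record filter. The field names and every numeric bound are hypothetical; the claim only states that the ranges are predetermined.

```python
# Hypothetical plausibility ranges; the claim only says "predetermined".
RANGES = {
    "age": (18, 100),
    "height_cm": (120, 220),
    "weight_kg": (30, 200),
    "low_pressure": (40, 130),
    "heart_rate": (30, 200),
}

def is_valid(record):
    """True if every checked field lies inside its predetermined range."""
    return all(lo <= record[key] <= hi for key, (lo, hi) in RANGES.items())

def remove_outliers(records):
    """Drop each record (user information plus blood pressure data)
    that has at least one field outside its range."""
    return [r for r in records if is_valid(r)]

records = [
    {"age": 55, "height_cm": 170, "weight_kg": 68, "low_pressure": 85, "heart_rate": 72},
    {"age": 55, "height_cm": 170, "weight_kg": 68, "low_pressure": 300, "heart_rate": 72},
    {"age": 5, "height_cm": 110, "weight_kg": 20, "low_pressure": 60, "heart_rate": 95},
]
cleaned = remove_outliers(records)
```

Keeping the bounds in one table makes it easy to adjust a range without touching the filtering logic.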
5. A data feature selection and prediction method, comprising the steps of:
Step S1: collecting user information and corresponding blood pressure observation data to form a data set, and removing outlier points from the data set;
Step S2: extracting user features from the user information in the data set;
Step S3: extracting blood pressure features from the blood pressure observation data in the data set;
Step S4: normalizing the extracted user features and blood pressure features, forming a training set with the processing results as training samples, inputting the training samples of the training set into a support vector machine model and a gradient boosting decision tree model, and training to obtain a prediction model, which specifically comprises:
extracting from the training set, for the same user, the user features, the single-month average of the blood pressure features, the half-month average of the blood pressure features, and the average of the blood pressure features within a first predetermined acquisition time, and inputting them into the support vector machine model and the gradient boosting decision tree model, wherein the support vector machine model is a regression model whose kernel function is a linear kernel, and the loss function of the gradient boosting decision tree model is a least squares loss function;
comparing the outputs of the support vector machine model and the gradient boosting decision tree model respectively with the blood pressure features of the same user in the training set within a second predetermined acquisition time, and then updating the parameters of the support vector machine model and the gradient boosting decision tree model respectively, wherein the second predetermined acquisition time is later than the first predetermined acquisition time;
iterating the above steps until the parameters of the support vector machine model and the gradient boosting decision tree model converge, to obtain a first prediction model.
6. The method according to claim 5, characterized in that the user features comprise the user's age, gender and body mass index; the blood pressure features comprise high pressure, low pressure, heart rate and medication status.
7. The method according to claim 6, characterized in that extracting the blood pressure features in step S3 comprises: extracting blood pressure features under different prediction tasks; the different prediction tasks comprise long-period, short-period, coarse-grained and fine-grained prediction tasks.
8. The method according to claim 5, characterized in that removing outlier points from the data set in step S1 comprises:
removing the user information and corresponding blood pressure data of users whose age is not within a predetermined age range;
removing the user information and corresponding blood pressure data of users whose height is not within a predetermined height range;
removing the user information and corresponding blood pressure data of users whose weight is not within a predetermined weight range;
removing the user information and corresponding blood pressure data of users whose blood pressure value is not within a predetermined blood pressure range;
removing the user information and corresponding blood pressure data of users whose heart rate is not within a predetermined heart rate range.
9. A data feature selection and prediction device, characterized by comprising:
an acquisition module, configured to collect user information and corresponding blood pressure observation data to form a data set, and to remove outlier points from the data set;
a user feature extraction module, configured to extract user features from the user information in the data set;
a blood pressure feature extraction module, configured to extract blood pressure features from the blood pressure observation data in the data set;
a training module, configured to normalize the extracted user features and blood pressure features, form a training set with the processing results as training samples, input the training samples of the training set into a gradient boosting decision tree model, and train to obtain a prediction model, which specifically comprises:
extracting from the training set, for the same user, the user features, the single-month average of the blood pressure features, the half-month average of the blood pressure features, and the average of the blood pressure features within a third predetermined acquisition time, and inputting them into the gradient boosting decision tree model, wherein the loss function of the gradient boosting decision tree model is a least squares loss function;
comparing the output of the gradient boosting decision tree model with the blood pressure features of the same user in the training set within a fourth predetermined acquisition time, and then updating the parameters of the gradient boosting decision tree model, wherein the fourth predetermined acquisition time is later than the third predetermined acquisition time;
iterating the above steps until the parameters of the gradient boosting decision tree model converge, to obtain a second prediction model.
10. The device according to claim 9, characterized in that the user features comprise the user's age, gender and body mass index; the blood pressure features comprise high pressure, low pressure and heart rate.
11. The device according to claim 9, characterized in that the blood pressure feature extraction module comprises:
a blood pressure feature extraction submodule, configured to extract blood pressure features under different prediction tasks; the different prediction tasks comprise long-period, short-period, coarse-grained and fine-grained prediction tasks.
12. A data feature selection and prediction device, characterized by comprising:
an acquisition module, configured to collect user information and corresponding blood pressure observation data to form a data set, and to remove outlier points from the data set;
a user feature extraction module, configured to extract user features from the user information in the data set;
a blood pressure feature extraction module, configured to extract blood pressure features from the blood pressure observation data in the data set;
a training module, configured to normalize the extracted user features and blood pressure features, form a training set with the processing results as training samples, input the training samples of the training set into a support vector machine model and a gradient boosting decision tree model, and train to obtain a prediction model, which specifically comprises:
extracting from the training set, for the same user, the user features, the single-month average of the blood pressure features, the half-month average of the blood pressure features, and the average of the blood pressure features within a first predetermined acquisition time, and inputting them into the support vector machine model and the gradient boosting decision tree model, wherein the support vector machine model is a regression model whose kernel function is a linear kernel, and the loss function of the gradient boosting decision tree model is a least squares loss function;
comparing the outputs of the support vector machine model and the gradient boosting decision tree model respectively with the blood pressure features of the same user in the training set within a second predetermined acquisition time, and then updating the parameters of the support vector machine model and the gradient boosting decision tree model respectively, wherein the second predetermined acquisition time is later than the first predetermined acquisition time;
iterating the above steps until the parameters of the support vector machine model and the gradient boosting decision tree model converge, to obtain a first prediction model.
13. The device according to claim 12, characterized in that the user features comprise the user's age, gender and body mass index; the blood pressure features comprise high pressure, low pressure and heart rate.
14. The device according to claim 12, characterized in that the blood pressure feature extraction module comprises:
a blood pressure feature extraction submodule, configured to extract blood pressure features under different prediction tasks; the different prediction tasks comprise long-period, short-period, coarse-grained and fine-grained prediction tasks.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611043691.9A CN106777891B (en) | 2016-11-21 | 2016-11-21 | A kind of selection of data characteristics and prediction technique and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106777891A CN106777891A (en) | 2017-05-31 |
CN106777891B true CN106777891B (en) | 2019-06-07 |
Family
ID=58974807
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611043691.9A Active CN106777891B (en) | 2016-11-21 | 2016-11-21 | A kind of selection of data characteristics and prediction technique and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106777891B (en) |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107203700B (en) * | 2017-07-14 | 2020-05-05 | 清华-伯克利深圳学院筹备办公室 | Method and device based on continuous blood glucose monitoring |
US11139048B2 (en) | 2017-07-18 | 2021-10-05 | Analytics For Life Inc. | Discovering novel features to use in machine learning techniques, such as machine learning techniques for diagnosing medical conditions |
US11062792B2 (en) | 2017-07-18 | 2021-07-13 | Analytics For Life Inc. | Discovering genomes to use in machine learning techniques |
CN109285075B (en) * | 2017-07-19 | 2022-03-01 | 腾讯科技(深圳)有限公司 | Claims risk assessment method and device and server |
CN107688872A (en) * | 2017-08-20 | 2018-02-13 | 平安科技(深圳)有限公司 | Forecast model establishes device, method and computer-readable recording medium |
CN107622236B (en) * | 2017-09-15 | 2020-12-04 | 安徽农业大学 | Crop disease diagnosis and early warning method based on swarm and gradient lifting decision tree algorithm |
CN107590741A (en) * | 2017-09-19 | 2018-01-16 | 广东工业大学 | A kind of method and system of predicted pictures popularity |
CN107908819B (en) * | 2017-10-19 | 2021-05-11 | 深圳和而泰智能控制股份有限公司 | Method and device for predicting user state change |
CN109712708B (en) * | 2017-10-26 | 2020-10-30 | 普天信息技术有限公司 | Health condition prediction method and device based on data mining |
CN107910066A (en) * | 2017-11-13 | 2018-04-13 | 医渡云(北京)技术有限公司 | Case history appraisal procedure, device, electronic equipment and storage medium |
CN109947811A (en) * | 2017-11-29 | 2019-06-28 | 北京京东金融科技控股有限公司 | Generic features library generating method and device, storage medium, electronic equipment |
CN108197654A (en) * | 2018-01-03 | 2018-06-22 | 杭州贝嘟科技有限公司 | Stature data predication method, device, storage medium and equipment based on SVM algorithm |
CN108511057A (en) * | 2018-02-28 | 2018-09-07 | 北京和兴创联健康科技有限公司 | Transfusion volume model foundation and prediction technique, device, equipment and its storage medium |
CN108509761A (en) * | 2018-03-26 | 2018-09-07 | 中山大学 | A kind of drug targets prediction technique promoting decision tree and feature selecting based on gradient |
CN109192315B (en) * | 2018-06-23 | 2020-10-20 | 重庆大学 | Comprehensive age detection system based on weighted kernel regression and packaged deviation search |
CN109047698B (en) * | 2018-09-03 | 2021-01-15 | 中冶连铸技术工程有限责任公司 | Continuous casting billet fixed weight and fixed length online prediction method |
CN109299732B (en) * | 2018-09-12 | 2020-05-05 | 北京三快在线科技有限公司 | Unmanned driving behavior decision and model training method and device and electronic equipment |
CN109919196B (en) * | 2019-02-01 | 2023-12-08 | 华南理工大学 | Physique identification method based on feature selection and classification model |
TWI693062B (en) * | 2019-04-25 | 2020-05-11 | 緯創資通股份有限公司 | Method and electronic device for predicting sudden drop in blood pressure |
CN110558960A (en) * | 2019-09-10 | 2019-12-13 | 重庆大学 | continuous blood pressure non-invasive monitoring method based on PTT and MIV-GA-SVR |
CN111428930A (en) * | 2020-03-24 | 2020-07-17 | 中电药明数据科技(成都)有限公司 | GBDT-based medicine patient using number prediction method and system |
CN112784492A (en) * | 2021-01-26 | 2021-05-11 | 上海黑瞳信息技术有限公司 | Automatic modeling system for machine learning |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130080808A1 (en) * | 2011-09-28 | 2013-03-28 | The Trustees Of Princeton University | Biomedical device for comprehensive and adaptive data-driven patient monitoring |
CN104274164A (en) * | 2013-07-05 | 2015-01-14 | 广州华久信息科技有限公司 | Blood pressure predicting method and mobile phone based on facial image |
CN103876734A (en) * | 2014-03-24 | 2014-06-25 | 北京工业大学 | Electroencephalogram feature selection approach based on decision-making tree |
Non-Patent Citations (1)
Title |
---|
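Jiang Lin (蒋琳), "Research and Application of Feature Extraction Methods Based on Support Vector Machines" (基于支持向量机的特征提取方法研究与应用), China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology series, 2007-06-15 (No. 06), abstract and pp. 17-36 * |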
Also Published As
Publication number | Publication date |
---|---|
CN106777891A (en) | 2017-05-31 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |