CN109344907A - Discrimination method based on a classification algorithm with improved judgment criteria - Google Patents
Discrimination method based on a classification algorithm with improved judgment criteria. Download PDF / Info
- Publication number
- CN109344907A CN109344907A CN201811272036.XA CN201811272036A CN109344907A CN 109344907 A CN109344907 A CN 109344907A CN 201811272036 A CN201811272036 A CN 201811272036A CN 109344907 A CN109344907 A CN 109344907A
- Authority
- CN
- China
- Prior art keywords
- model
- random forest
- data
- forest model
- parameter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
A discrimination method based on a classification algorithm with improved judgment criteria. Taking the random forest algorithm as an example, the method selects the random forest parameters using multiple evaluation indices and up-samples the data to balance the sample distribution, thereby constructing a new random forest model. Comparing the improved random forest model against the original random forest model, a logistic regression model and a support vector machine model shows that the improved random forest model performs better; that is, selecting algorithm parameters by multiple evaluation indices is a feasible scheme. The method addresses a problem in the prior art: classification in actual scenes usually relies on data-mining classification algorithms, but these algorithms typically construct models with a single evaluation index, so the discrimination performance of the model is unsatisfactory.
Description
Technical field
The invention belongs to the application field of data mining technology, and specifically relates to a discrimination method based on a classification algorithm with improved judgment criteria.
Background art
Data mining technology plays an increasingly important role in daily life and production, and is applied in actual scenes such as speech recognition, image recognition and product recommendation. Classification algorithms are one of its important pillars. A perfect classification algorithm could rival human perception of things. However, today's traditional classification algorithms still have various defects and cannot classify things effectively in special scenes, so no existing algorithm deserves to be called perfect. It is therefore necessary to improve the traditional classification algorithms so that they come closer to the ideal.
Summary of the invention
To solve the above problems, the present invention proposes a new method for classifying categories in actual scenes. The idea of the method is described below:
The random forest algorithm was proposed by Breiman in 2001. As an efficient discriminant classification method, it has been applied in many fields. The principle of random forest is to build a forest of decision trees in a random manner, with little or no correlation between the trees. Once the random forest model is built, the class of a test sample can be determined by inputting its features; compared with a single decision tree, the discrimination accuracy is considerably higher.
The present invention is a discrimination method based on a classification algorithm with improved judgment criteria. The steps include:
One, first collecting feature index data as sample data and constructing a random forest model;
Two, then, in the actual classification scene, collecting the feature index data of the person to be classified and using the random forest model obtained in step one to quickly discriminate on these data and determine the class of the person.
The random forest algorithm in step one:
1. The original random forest algorithm
A single decision tree suffers from large error and a risk of over-fitting. To solve these problems, Breiman proposed the random forest algorithm in 2001. Its core idea is:
1) First, draw, with replacement, a sample of the same size as the original data set;
2) Then, extract a certain number of features from the original feature variables to form a feature subset;
3) Finally, construct an unpruned decision tree from the drawn sample and the feature subset.
Repeat these three steps N times to form N decision trees; integrating the trees with the majority-vote criterion completes the construction of the random forest model.
When the feature variables of a new sample are input, the random forest model takes the majority decision of the trees as the final result.
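The construction and majority-vote prediction described above can be sketched with scikit-learn (an illustrative sketch on synthetic data, not the patent's own code; `n_estimators` and `max_features` correspond to the tree count N and the feature-subset size):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# synthetic stand-in data: 300 samples, 6 feature variables, binary label
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# each tree is grown on a bootstrap sample drawn with replacement and
# considers max_features randomly chosen features per split; trees are unpruned
model = RandomForestClassifier(n_estimators=50, max_features=3,
                               oob_score=True, random_state=0)
model.fit(X, y)

# prediction is the majority vote of the 50 trees
pred = model.predict(X[:5])
```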
Random forest can handle high-dimensional data without feature selection and builds models quickly. However, the training process relies only on the out-of-bag estimate, and the model evaluation index is single; selecting parameters with a single evaluation index easily leads to an optimistic estimate of model performance. When the sample data are imbalanced, the model tends to favour the majority class, and the discrimination of the minority class is poor. These three shortcomings need to be overcome, so the invention proposes an improved random forest algorithm.
2. The improved random forest algorithm
To address the problems of the original random forest model, namely reliance on the out-of-bag estimate alone, a single model evaluation index, and a bias toward the majority class on imbalanced samples, the present invention proposes an improved random forest algorithm.
A. Improvement for relying only on the out-of-bag estimate.
The evaluation of the original random forest depends only on the out-of-bag estimate, which easily leads to an optimistic assessment. To overcome this drawback, the invention first divides the data into a training set and a test set, performs cross-validation on the training set, and uses the cross-validation results for a preliminary evaluation of model performance and for parameter determination; the model performance is then assessed on the test set. Evaluating performance with both cross-validation and a test set is better than relying only on the out-of-bag estimate.
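This split-then-cross-validate scheme can be sketched as follows (illustrative synthetic data; the 3-fold count and the 3:1 split ratio are examples consistent with the empirical section, not mandated here):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = (X[:, 0] > 0).astype(int)

# hold out the test set first; parameter decisions use only the training set
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
cv_scores = cross_val_score(clf, X_tr, y_tr, cv=3)   # preliminary evaluation
test_score = clf.fit(X_tr, y_tr).score(X_te, y_te)   # final evaluation on held-out data
```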
B. Improvement for the single model evaluation index.
The model evaluation of the original random forest depends on a single evaluation index, which cannot effectively reflect class imbalance or class importance during evaluation. To overcome this drawback, in the model training stage the invention proceeds as follows:
First, compute the F1 statistic and select the model parameters whose F1 statistic is optimal or within 1.5 standard deviations below the optimum;
Then, compute the classification accuracy for the candidate parameters of the previous step and select those whose accuracy is optimal or within 1.5 standard deviations below the optimum; the corresponding parameter combinations become the new candidate combinations;
Finally, compute the AUC for the remaining candidates and select those whose AUC is optimal or within 1.5 standard deviations below the optimum; the corresponding parameter combinations become the final candidate combinations.
The candidate parameters are then evaluated on the test set; the parameter combination with the best F1 statistic on the test set is taken as the final combination, and its model performance is taken as the final assessment of model performance.
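The three-stage screening rule above can be sketched as follows, assuming "within 1.5 standard deviations" refers to the standard deviation of the scores in the current candidate set (the patent does not spell out which spread is used); all numbers below are illustrative:

```python
import numpy as np

def screen(candidates, scores, n_sd=1.5):
    """Keep the combinations whose score is no more than n_sd standard
    deviations below the best score in the current round."""
    scores = np.asarray(scores, dtype=float)
    cutoff = scores.max() - n_sd * scores.std()
    return [c for c, s in zip(candidates, scores) if s >= cutoff]

# illustrative first round: F1 statistics of four (m, n) parameter combinations
params = [(2, 10), (3, 10), (3, 50), (3, 100)]
f1 = [0.700, 0.690, 0.728, 0.710]
stage1 = screen(params, f1)
# subsequent rounds would apply screen() again with accuracy, then AUC,
# before the final F1 comparison on the test set
```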
C. Improvement for the model's bias toward the majority class under imbalanced data.
Here the data distribution itself is changed; the main strategies are up-sampling and down-sampling. When the distribution is imbalanced and neither class is especially large, an up-sampling strategy is used to enlarge the number of minority-class samples; when the distribution is imbalanced and both classes are large, a down-sampling strategy is used to reduce the number of majority-class samples.
In the prior art: the comprehensive evaluation index F1 is the harmonic mean of the precision P (also called the precision ratio) and the recall R. AUC is the area under the ROC curve.
The method of the invention applies the random forest model in an actual classification scene. In view of the shortcomings of the random forest model and the imbalanced distribution of the original data samples, the existing random forest algorithm is improved: multiple indices are set to search for the optimal parameters, and artificial samples are constructed from the raw sample data to form a new data set. The sample data are then fitted with the optimal parameters to construct a new random forest model. The results show that the performance of the improved random forest model is enhanced, making it suitable for actual classification scenes.
Brief description of the drawings
Fig. 1 is the ROC curve corresponding to the maximal accuracy in the empirical study of the original random forest algorithm;
Figs. 2.1, 2.2 and 2.3 show, in the empirical study of the improved random forest model, the ROC curves corresponding to the AUC values of Table 2.3;
Fig. 2.4 shows the ROC curves of the three test sets in the empirical study of the improved random forest model;
Figs. 3.1, 3.2 and 3.3 show, in the model comparison, the ROC curves and AUC values for the three training/test-set divisions of the three models;
Figs. 3.4, 3.5 and 3.6 show, in the model comparison, the ROC curves and AUC values of the three test sets for the models before and after improvement.
Specific embodiments
The present invention is further described below with reference to the accompanying drawings and specific embodiments.
1. Empirical study of the original random forest algorithm
To demonstrate the improvement effect of the model, a classified sample data set was selected as the data set to be fitted; its positive-to-negative sample ratio is 1:3. The feature variables are feature1, feature2, feature3, feature4, feature5 and feature6, and y is the variable to be classified.
1.1 Data preprocessing
(1) Eliminating multicollinearity
A multicollinearity test was performed on the numeric feature variables feature1, feature2, feature3, feature4 and feature5; the results are shown in Table 1.1:
Table 1.1
As Table 1.1 shows, the absolute values of the correlation coefficients between the numeric feature variables are all below 0.5, indicating weak linear dependence between the feature variables; these feature variables can therefore be substituted into the random forest model.
(2) Skewness correction
A normality test was performed on the numeric feature variables, using the skewness of each variable as the index. The skewness of each variable is shown in Table 1.2:
Table 1.2
Since the skewness of feature1, feature2, feature3 and feature5 is large, these feature variables require a skewness transformation; the Box-Cox transform is used here. The skewness of the transformed data is shown in Table 1.3:
Table 1.3
The transformed feature variables are closer to a normal distribution than the original feature variables.
(3) Standardization
The numeric variables are standardized. The mean and standard deviation of the Box-Cox-transformed data are shown in Table 1.4:
Table 1.4
The mean and standard deviation of the standardized data are shown in Table 1.5:
Table 1.5
Because the categorical variable feature6 has only two states, no one-hot encoding is needed for it.
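The skewness correction and standardization steps above can be sketched as follows (synthetic right-skewed data standing in for the patent's feature variables):

```python
import numpy as np
from scipy.stats import boxcox, skew

# a right-skewed numeric feature, like feature1..feature5 before correction
rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=500)

# Box-Cox requires strictly positive data and returns the fitted lambda
x_bc, lam = boxcox(x)

# standardize to zero mean and unit standard deviation
z = (x_bc - x_bc.mean()) / x_bc.std()
```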
1.2 Construction of the random forest model
The construction of the random forest model proceeds as follows:
(1) The total number of feature variables is 6; the number m of feature variables in the feature subset of a single decision tree can be 2, 3 or 4;
(2) The number n of trees in the forest is set to 10, 50, 100, 150, 200, 300 or 500;
(3) The Cartesian product of the feature-subset sizes and the tree counts gives the parameter combinations (m, n);
(4) A random forest model is fitted for each parameter combination, yielding 3 × 7 = 21 random forest models;
(5) The out-of-bag accuracy of each random forest model is obtained, and the parameter combination with the highest accuracy is chosen as the optimal combination;
(6) A random forest model is fitted with the optimal parameter combination and the full data.
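Steps (1) to (6) amount to a grid search over (m, n) scored by out-of-bag accuracy. A reduced sketch on synthetic data (fewer n values than the patent's 10 to 500, purely to keep it fast):

```python
import itertools
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 6))
y = (X[:, 2] + X[:, 3] > 0).astype(int)

def oob_accuracy(m, n):
    """Out-of-bag accuracy of a forest with feature-subset size m and n trees."""
    rf = RandomForestClassifier(max_features=m, n_estimators=n,
                                oob_score=True, random_state=0)
    return rf.fit(X, y).oob_score_

# Cartesian product of feature-subset sizes and tree counts gives the (m, n) grid
grid = list(itertools.product([2, 3, 4], [50, 100, 150]))
best_m, best_n = max(grid, key=lambda mn: oob_accuracy(*mn))
```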
Table 1.6 gives the out-of-bag accuracy of every random forest model under all parameter combinations:
Table 1.6
Table 1.6 shows that when the feature subset contains 3 feature variables and the forest contains 50 trees, the out-of-bag accuracy of the random forest model reaches its maximum of 78.09%.
The out-of-bag accuracy, precision, recall and F1 statistic corresponding to the maximal accuracy are shown in Table 1.7.
Table 1.7
The ROC curve corresponding to the maximal accuracy is shown in Fig. 1; the AUC is 0.77.
Analysis of the random forest model with parameter combination (3, 50) shows a model accuracy of 78.09%, a precision of 75.36%, a recall of 70.27% and an F1 of 72.73%; the AUC of the model is 0.77. Since the negative samples outnumber the positive samples in the data, this result is to be expected given the imbalance.
Because the model built with the original random forest algorithm cannot effectively discriminate the positive samples, the algorithm needs to be improved so that both the positive and the negative class are discriminated well. The parameter selection should integrate multiple indices rather than determine the model parameters from a single index.
2. Empirical study of the improved random forest model
2.1 Sample balancing
Since the sample distribution of the data is imbalanced and the numbers of positive and negative samples are both fairly small, up-sampling is the suitable remedy. The present invention mainly uses the SMOTE algorithm for up-sampling.
The SMOTE (Synthetic Minority Oversampling Technique) algorithm builds on random over-sampling. Because random over-sampling simply copies minority-class samples, it leads to over-fitting of the model. To counter this drawback, the SMOTE algorithm first analyses the minority-class samples and then synthesizes artificial samples from the analysis results instead of copying. The algorithm flow is as follows:
(1) For each minority-class sample x, compute its Euclidean distance to all minority-class samples and determine its k nearest neighbours;
(2) Compute the imbalance ratio of positive to negative samples and determine the sampling multiple n; randomly select neighbours from the k nearest neighbours, and suppose a selected neighbour is y;
(3) For each randomly selected neighbour y, construct a new sample:
x_new = x + rand(0,1) × |x − y|
Applying the SMOTE algorithm to the data balances it; the ratio of positive to negative samples becomes approximately 1:1.
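The neighbour search and interpolation of steps (1) to (3) can be sketched with NumPy (a minimal single-neighbour version; note that standard SMOTE interpolates as x_new = x + rand(0,1) · (y − x), which keeps the synthetic point on the segment between x and y):

```python
import numpy as np

def smote_sample(x, minority, k, rng):
    """Synthesize one artificial sample for minority point x: pick one of its
    k nearest minority neighbours y (Euclidean distance) and interpolate."""
    d = np.linalg.norm(minority - x, axis=1)
    neighbours = minority[np.argsort(d)[1:k + 1]]   # skip x itself at distance 0
    y = neighbours[rng.integers(len(neighbours))]
    return x + rng.uniform(0.0, 1.0) * (y - x)

rng = np.random.default_rng(4)
minority = rng.normal(size=(20, 2))                 # minority-class points
x_new = smote_sample(minority[0], minority, k=5, rng=rng)
```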
2.2 Training set and test set division
Because random forest provides the out-of-bag estimate, it does not strictly require a division into training and test sets. However, since the out-of-bag estimate may lead to an optimistic estimate of model performance, the data are divided into a training set and a test set in order to obtain a more faithful assessment of generalization performance. The division ratio of training set to test set is 3:1. The division is repeated three times so that the assessment of the model's generalization performance is more reliable.
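The three repeated 3:1 divisions can be sketched as follows (synthetic data; three fixed random seeds stand in for the three repetitions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(120, 6))
y = rng.integers(0, 2, size=120)

# three independent train/test divisions at a 3:1 ratio
splits = [train_test_split(X, y, test_size=0.25, random_state=seed)
          for seed in range(3)]
sizes = [(len(X_tr), len(X_te)) for X_tr, X_te, _, _ in splits]
```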
2.3 Determination of the optimal parameters
(1) Using the F1 statistic on the original data as the evaluation index, a first round of screening is applied to the parameter combinations. Table 2.1 gives the out-of-bag F1 statistic of every random forest model under all parameter combinations.
Table 2.1
The maximum F1 statistic is 72.82% with a standard deviation of 2.5%, so the range within 1.5 standard deviations below the maximum is 68.98%~72.82%; the candidate parameter combinations (2, 10), (3, 10), (3, 50) and (3, 100) proceed to the next round.
(2) Using the accuracy on the original data as the evaluation index, a second round of screening is applied. Table 2.2 gives the out-of-bag accuracy of the random forest models for the second-round candidate parameter combinations.
Table 2.2
The maximum accuracy is 78.09% with a standard deviation of 2.1%, so the range within 1.5 standard deviations below the maximum is 75.00%~78.09%; the candidates (3, 10), (3, 50) and (3, 100) proceed to the next round.
(3) Using the AUC on the original data as the evaluation index, a third round of screening is applied. Table 2.3 gives the out-of-bag AUC of the random forest models for the third-round candidates; Figs. 2.1 to 2.3 show the corresponding ROC curves.
Table 2.3
The maximum AUC is 0.77 with a standard deviation of 0.05, so the range within 1.5 standard deviations below the maximum is 0.75~0.77; the candidates (3, 50) and (3, 100) proceed to the next round.
(4) Using the F1 statistic on the test set as the evaluation index, a fourth round of screening is applied. Table 2.4 gives the F1 statistic of the random forest models on the test set.
Table 2.4
Table 2.4 shows that the best-performing random forest model is obtained when the feature subset contains 3 feature variables and the forest contains 100 trees. Since the optimal parameters were determined on the original data, the imbalance of the positive and negative samples has not yet been addressed; the final random forest model must therefore be constructed, with the optimal parameters fixed, on the up-sampled data set.
2.4 Model fitting
The model fitting process is as follows:
(1) divide the data into training and test sets;
(2) construct artificial samples in the training set with the SMOTE algorithm and add them to the original data, forming a new training set;
(3) fit the random forest model on the new training set with the determined parameters.
Three training/test divisions of the full data are carried out. The out-of-bag accuracy, precision, recall and F1 statistic on the three training sets are shown in Table 2.5, and the prediction results on the three test sets in Table 2.6. The ROC curves of the three test sets are shown in Fig. 2.4.
Table 2.5
Table 2.6
Table 2.5 shows that the improved random forest model's out-of-bag accuracy is around 81%, its precision around 81%, its recall around 80% and its F1 statistic around 80%; the out-of-bag performance of the model is excellent overall. The original random forest model's out-of-bag accuracy is around 78%, its precision around 75%, its recall around 70% and its F1 statistic around 72%. The improved random forest model is therefore better than the original in accuracy, precision, recall and F1 statistic.
Table 2.6 shows that on the test set the improved random forest model's overall accuracy is around 81%, its precision around 80%, its recall around 80% and its F1 statistic around 80%; the test-set performance of the model is excellent overall and consistent with its own out-of-bag estimate.
Fig. 2.4 shows that the area under the test-set ROC curves (AUC) of the improved random forest model is around 0.84, while the best out-of-bag AUC of the original random forest model is around 0.77. The improved random forest model is therefore also better in terms of AUC.
3. Model comparison
3.1 Comparison with logistic regression and support vector machine
Since the performance of different models is being compared, the data must be kept consistent; all models use the up-sampled data. Tables 3.1, 3.2 and 3.3 give the accuracy, precision, recall and F1 of the three models over the three training/test divisions. Figs. 3.1, 3.2 and 3.3 show the corresponding ROC curves and AUC values.
Table 3.1
Table 3.2
Table 3.3
Table 3.1 shows that after up-sampling, the accuracy of the random forest is 81.22%, higher than logistic regression's 72.31% and the support vector machine's 78.27%; its precision is 80.25%, higher than 77.14% and 78.53%; its recall is 81.31%, higher than 71.85% and 78.14%; its F1 is 80.76%, higher than 74.18% and 78.33%.
Table 3.2 shows that after up-sampling, the accuracy of the random forest is 80.76%, higher than logistic regression's 72.52% and the support vector machine's 77.51%; its precision is 80.45%, higher than 77.43% and 78.58%; its recall is 80.83%, higher than 71.15% and 77.19%; its F1 is 80.64%, higher than 74.31% and 77.88%.
Table 3.3 shows that after up-sampling, the accuracy of the random forest is 80.57%, higher than logistic regression's 72.48% and the support vector machine's 79.11%; its precision is 81.11%, higher than 77.21% and 79.08%; its recall is 80.39%, higher than 71.82% and 79.16%; its F1 is 80.75%, higher than 74.36% and 79.12%.
Fig. 3.1 shows that the AUC of the improved random forest model is 0.85, higher than logistic regression's 0.79 and the support vector machine's 0.82.
Fig. 3.2 shows that the AUC of the improved random forest model is 0.83, higher than logistic regression's 0.78 and the support vector machine's 0.80.
Fig. 3.3 shows that the AUC of the improved random forest model is 0.83, higher than logistic regression's 0.78 and the support vector machine's 0.80.
Comparing the accuracy, precision, recall, F1 and AUC of the improved random forest model with those of the logistic regression and support vector machine models shows that the improved random forest model is comprehensively better; on the same data set it outperforms both the logistic regression model and the support vector machine model.
3.2 Comparison with the original random forest model
Since the improved random forest model involves up-sampling the training data, the two models are evaluated on the test set. Three training/test divisions of the original data are carried out; the original random forest model is built on the original training data, while the improved model is built after up-sampling each of the three training sets. Tables 3.4, 3.5 and 3.6 give the accuracy, precision, recall and F1 of the two models on the three test sets; Figs. 3.4, 3.5 and 3.6 show the corresponding ROC curves and AUC values.
Table 3.4
Table 3.5
Table 3.6
Tables 3.4, 3.5 and 3.6 show that the improved random forest model is better than the pre-improvement model in accuracy, precision, recall and F1.
Figs. 3.4, 3.5 and 3.6 show that the AUC of the improved random forest model is about 0.09 higher than that of the pre-improvement model, indicating a considerable performance gain.
Tables 3.4 to 3.6 and Figs. 3.4 to 3.6 show that the improved random forest model comprehensively outperforms the original random forest model, so the improved scheme is practicable.
After comparison with the original random forest model, the logistic regression model and the support vector machine model, the improved random forest model performs best, showing that it can be used to discriminate personnel categories in actual classification scenes.
Claims (1)
1. A discrimination method based on a classification algorithm with improved judgment criteria, characterized in that the steps include:
(1) first collecting data as sample data and constructing a random forest model;
(2) then, in the actual classification scene, collecting the feature index data of the person to be classified and using the random forest model obtained in step (1) to quickly discriminate on the feature index data and determine the class of the person;
the random forest model in step (1) is constructed by first building an original random forest model with the original random forest algorithm and then improving the original model with the improved random forest algorithm to obtain the final random forest model:
the construction steps of the original random forest model include:
1) first, drawing with replacement, from the original data set of the sample data, a sample of the same size; 2) then, extracting a certain number of features from the original feature variables of the sample data to form a feature subset; 3) finally, constructing an unpruned decision tree from the sample obtained in step 1) and the feature subset obtained in step 2); 4) repeating steps 1) to 3) N times to form N decision trees, and integrating the trees with the majority-vote criterion, which completes the construction of the random forest model;
in step (2), the feature variables in the feature index data of the person to be classified are input to the random forest model, which takes the majority decision of the trees as the final result;
the original random forest model is improved as follows:
A. first dividing the raw data set into a training set and a test set, performing cross-validation on the training set, and using the cross-validation results for a preliminary evaluation of model performance and for parameter determination; then assessing model performance on the test set;
B. on the training set, first computing the F1 statistic and selecting as candidate parameters the model parameters whose F1 statistic is optimal or within 1.5 standard deviations below the optimum;
then computing the classification accuracy for the candidate parameters and selecting those whose accuracy is optimal or within 1.5 standard deviations below the optimum, the corresponding parameter combinations becoming the candidate parameter combinations;
next computing the AUC for the candidate parameters and selecting those whose AUC is optimal or within 1.5 standard deviations below the optimum, the corresponding parameter combinations becoming the candidate parameter combinations;
finally, substituting the candidate parameters on the test set; the parameter combination whose F1 statistic performs best on the test set is taken as the final parameter combination, and its model performance is taken as the final assessment of model performance;
C. changing the data distribution with an up-sampling or down-sampling strategy:
when the data distribution is imbalanced and neither class is especially large, using the up-sampling strategy to enlarge the number of minority-class samples;
when the data distribution is imbalanced and both classes are large, using the down-sampling strategy to reduce the number of majority-class samples.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811272036.XA CN109344907A (en) | 2018-10-30 | 2018-10-30 | Based on the method for discrimination for improving judgment criteria sorting algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811272036.XA CN109344907A (en) | 2018-10-30 | 2018-10-30 | Based on the method for discrimination for improving judgment criteria sorting algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109344907A true CN109344907A (en) | 2019-02-15 |
Family
ID=65310923
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811272036.XA Pending CN109344907A (en) | 2018-10-30 | 2018-10-30 | Based on the method for discrimination for improving judgment criteria sorting algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109344907A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105931224A (en) * | 2016-04-14 | 2016-09-07 | 浙江大学 | Pathology identification method for routine scan CT image of liver based on random forests |
CN108038448A (en) * | 2017-12-13 | 2018-05-15 | 河南理工大学 | Semi-supervised random forest Hyperspectral Remote Sensing Imagery Classification method based on weighted entropy |
US20180246112A1 (en) * | 2017-02-28 | 2018-08-30 | University Of Kentucky Research Foundation | Biomarkers of Breast and Lung Cancer |
Non-Patent Citations (2)
Title |
---|
LIU JIHUI: "Analysis of the influence weights of process parameters in tobacco primary processing based on random forest regression", Tobacco Science & Technology * |
XIAO JIAN: "Research on an imbalanced-data classification method based on random forest", China Master's Theses Full-text Database, Information Science and Technology Series * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111222572A (en) * | 2020-01-06 | 2020-06-02 | 紫光云技术有限公司 | Office scene-oriented optical character recognition method |
CN112257336A (en) * | 2020-10-13 | 2021-01-22 | 华北科技学院 | Mine water inrush source distinguishing method based on feature selection and support vector machine model |
CN113283484A (en) * | 2021-05-14 | 2021-08-20 | 中国邮政储蓄银行股份有限公司 | Improved feature selection method, device and storage medium |
CN115512844A (en) * | 2021-06-03 | 2022-12-23 | 四川大学 | Metabolic syndrome risk prediction method based on SMOTE technology and random forest algorithm |
CN115512844B (en) * | 2021-06-03 | 2023-05-23 | 四川大学 | Metabolic syndrome risk prediction method based on SMOTE technology and random forest algorithm |
CN113762712A (en) * | 2021-07-26 | 2021-12-07 | 广西大学 | Small hydropower cleaning rectification evaluation index screening strategy under big data environment |
CN113762712B (en) * | 2021-07-26 | 2024-04-09 | 广西大学 | Small hydropower cleaning rectification evaluation index screening strategy in big data environment |
CN116564409A (en) * | 2023-05-06 | 2023-08-08 | 海南大学 | Machine learning-based identification method for sequencing data of transcriptome of metastatic breast cancer |
CN117092525A (en) * | 2023-10-20 | 2023-11-21 | 广东采日能源科技有限公司 | Training method and device for battery thermal runaway early warning model and electronic equipment |
CN117092525B (en) * | 2023-10-20 | 2024-01-09 | 广东采日能源科技有限公司 | Training method and device for battery thermal runaway early warning model and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109344907A (en) | Based on the method for discrimination for improving judgment criteria sorting algorithm | |
US10606862B2 (en) | Method and apparatus for data processing in data modeling | |
CN107544253B (en) | Large missile equipment retirement safety control method based on improved fuzzy entropy weight method | |
CN105630743B (en) | A kind of system of selection of spectrum wave number | |
CN108897834A (en) | Data processing and method for digging | |
CN110346831B (en) | Intelligent seismic fluid identification method based on random forest algorithm | |
CN105373606A (en) | Unbalanced data sampling method in improved C4.5 decision tree algorithm | |
CN106228389A (en) | Network potential usage mining method and system based on random forests algorithm | |
CN106056136A (en) | Data clustering method for rapidly determining clustering center | |
CN101957913B (en) | Information fusion technology-based fingerprint identification method and device | |
CN110428270A (en) | The potential preference client recognition methods of the channel of logic-based regression algorithm | |
CN109800810A (en) | A kind of few sample learning classifier construction method based on unbalanced data | |
CN107784452A (en) | A kind of objective integrated evaluating method of tobacco style characteristic similarity | |
CN110109902A (en) | A kind of electric business platform recommender system based on integrated learning approach | |
CN110852600A (en) | Method for evaluating dynamic risk of market subject | |
CN107239964A (en) | User is worth methods of marking and system | |
CN110334773A (en) | Model based on machine learning enters the screening technique of modular character | |
CN112396428A (en) | User portrait data-based customer group classification management method and device | |
CN108344701A (en) | Paraffin grade qualitative classification based on hyperspectral technique and quantitative homing method | |
CN113239199B (en) | Credit classification method based on multi-party data set | |
Rofik et al. | The Optimization of Credit Scoring Model Using Stacking Ensemble Learning and Oversampling Techniques | |
CN108776809A (en) | A kind of dual sampling Ensemble classifier model based on Fisher cores | |
CN110222981B (en) | Reservoir classification evaluation method based on parameter secondary selection | |
CN115481494B (en) | Method for generating model line pedigree of Yangtze river all-line passenger ship | |
CN115186776B (en) | Method, device and storage medium for classifying ruby producing areas |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |

Application publication date: 20190215 |