CN108009643A

CN108009643A - Method and system for automatically selecting a machine learning algorithm

Info

Publication number: CN108009643A
Application number: CN201711354616.9A
Authority: CN
Inventors: ***; 龙明盛; 付博; 黄向东
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2017-12-15
Filing date: 2017-12-15
Publication date: 2018-05-08
Anticipated expiration: 2037-12-15
Also published as: CN108009643B

Abstract

The present invention provides a method and system for automatically selecting a machine learning algorithm. The selection method includes: determining a candidate algorithm set; determining, based on multiple historical parameters and multiple preset coefficients, the training-and-testing order of each candidate algorithm in the candidate algorithm set; training the candidate algorithms in the candidate algorithm set one by one in that order on a determined training set to obtain a trained model for each candidate algorithm, and using each trained model to make predictions on a determined test set to obtain multiple comprehensive scoring parameters for each candidate algorithm; obtaining a comprehensive score for each candidate algorithm based on the multiple comprehensive scoring parameters and the multiple preset coefficients; and taking the one or more candidate algorithms with the highest comprehensive score as the machine learning algorithm selection result. The method and system provided by the invention offer strong learning and analysis capability, are very simple to implement, and can produce good results.

Description

Method and system for automatically selecting a machine learning algorithm

Technical field

The present invention relates to the field of computer data processing, and more particularly to a method and system for automatically selecting a machine learning algorithm.

Background

Machine learning has recently made significant progress in many application fields, which has created demand for machine learning methods across all of them. Accordingly, a growing number of commercial enterprises are meeting this demand (for example, BigML.com, Wise.io, SkyTree.com, RapidMiner.com, Dato.com, Prediction.io, DataRobot.com, Microsoft Azure Machine Learning, and Amazon Machine Learning). The core problem that every effective machine learning service must solve is deciding which machine learning algorithm to use on a given dataset, whether and how to preprocess its features, and how to set all hyperparameters.

Selecting a specific algorithm generally requires expert knowledge and a trade-off among different considerations. Several factors influence the choice, including: (1) the size, quality, and nature of the data; (2) the available computing time and space; (3) the urgency of the task; (4) the intended use of the data.

In addition, as machine learning has developed over time, the number of algorithms has kept growing, and each algorithm has its own characteristics, strengths, and weaknesses. For many beginners, how to quickly select a suitable machine learning algorithm has therefore become a problem to be solved.

Summary of the invention

The present invention provides a method and system for automatically selecting a machine learning algorithm that overcome the above problem.

According to one aspect of the present invention, a method for automatically selecting a machine learning algorithm is provided, including: determining a candidate algorithm set by a decision tree selection method based on an algorithm selection knowledge base; determining the training-and-testing order of each candidate algorithm in the candidate algorithm set based on multiple historical parameters and multiple preset coefficients, each corresponding to one of the historical parameters; training the candidate algorithms in the candidate algorithm set one by one in the training-and-testing order on a determined training set to obtain a trained model for each candidate algorithm, and using each trained model to make predictions on a determined test set to obtain, for each candidate algorithm, multiple comprehensive scoring parameters corresponding to the multiple historical parameters; obtaining the comprehensive score of each candidate algorithm in the candidate algorithm set based on the multiple comprehensive scoring parameters and the multiple preset coefficients; and taking the one or more candidate algorithms with the highest comprehensive score as the machine learning algorithm selection result.

Preferably, before determining the candidate algorithm set by the decision tree selection method based on the algorithm selection knowledge base, the method further includes: determining the top-level category to which the candidate algorithms belong, the top-level categories being supervised learning, semi-supervised learning, and unsupervised learning. Accordingly, determining the candidate algorithm set further includes: based on the decision tree in the algorithm selection knowledge base, starting from the determined top-level category, choosing candidate algorithms level by level, and taking the one or more candidate algorithms so chosen as the candidate algorithm set.

Preferably, determining the training-and-testing order of each candidate algorithm in the candidate algorithm set based on the multiple historical parameters and the preset coefficients corresponding to them further includes: obtaining the historical score of any candidate algorithm by the following formula:

F' = aI' + bO' + cS' + dT' + eA';

where F' is the historical score of the candidate algorithm, a is the preset data input resource consumption coefficient, I' is the historical data input resource consumption value, b is the preset data output resource consumption coefficient, O' is the historical data output resource consumption value, c is the preset training-and-prediction memory coefficient, S' is the historical training-and-prediction memory, d is the preset training-and-prediction time coefficient, T' is the historical training-and-prediction time, e is the preset prediction accuracy coefficient, and A' is the historical prediction accuracy. The historical scores of all candidate algorithms are sorted from high to low, and the resulting order of the candidate algorithms is used as their training-and-testing order.
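The historical score and the resulting ordering can be sketched as follows. This is a minimal illustration, not the patented implementation: the coefficient values, the algorithm names, and the historical figures are all hypothetical, and giving the cost terms negative coefficients (so that a higher score is better) is an assumption the text does not state.

```python
from dataclasses import dataclass


@dataclass
class History:
    """Historical parameters of one candidate algorithm (hypothetical relative values)."""
    I: float  # I': historical data input resource consumption value
    O: float  # O': historical data output resource consumption value
    S: float  # S': historical training-and-prediction memory
    T: float  # T': historical training-and-prediction time
    A: float  # A': historical prediction accuracy


# Preset coefficients a..e. ASSUMPTION: costs are weighted negatively and
# accuracy positively; the patent only says the coefficients are preset.
COEFFS = {"a": -0.1, "b": -0.1, "c": -0.2, "d": -0.2, "e": 1.0}


def historical_score(h: History, k: dict = COEFFS) -> float:
    """F' = aI' + bO' + cS' + dT' + eA'."""
    return k["a"] * h.I + k["b"] * h.O + k["c"] * h.S + k["d"] * h.T + k["e"] * h.A


def training_test_order(histories: dict) -> list:
    """Sort candidate names by historical score, highest first."""
    return sorted(histories, key=lambda name: historical_score(histories[name]), reverse=True)


# Hypothetical history for three candidates.
histories = {
    "logistic_regression": History(I=1.0, O=1.0, S=1.0, T=1.0, A=0.80),
    "random_forest": History(I=1.0, O=1.0, S=2.0, T=3.0, A=0.90),
    "svm": History(I=1.0, O=1.0, S=1.5, T=5.0, A=0.85),
}
order = training_test_order(histories)
print(order)
```

With these made-up numbers the cheap candidate is trained first, even though its historical accuracy is the lowest, because the cost terms dominate.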

Preferably, training the candidate algorithms and predicting on the test set to obtain the multiple comprehensive scoring parameters further includes: training the candidate algorithms in the candidate algorithm set one by one in the training-and-testing order on the determined training set to obtain a trained model for each candidate algorithm, and recording for each candidate algorithm the training data input resource consumption value, the training data output resource consumption value, the training time, and the training memory; using each trained model to make predictions on the determined test set, and recording for each candidate algorithm the prediction data input resource consumption value, the prediction data output resource consumption value, the prediction time, the prediction memory, and the prediction accuracy; computing a weighted sum of the training and prediction data input resource consumption values to obtain the data input resource consumption value; computing a weighted sum of the training and prediction data output resource consumption values to obtain the data output resource consumption value; computing a weighted sum of the training time and the prediction time to obtain the training-and-prediction time; computing a weighted sum of the training memory and the prediction memory to obtain the training-and-prediction memory; and taking the data input resource consumption value, the data output resource consumption value, the training-and-prediction time, the training-and-prediction memory, and the prediction accuracy as the multiple comprehensive scoring parameters.

Preferably, the comprehensive score of each candidate algorithm in the candidate algorithm set is obtained from the multiple comprehensive scoring parameters and the multiple preset coefficients by the following formula:

F = aI + bO + cS + dT + eA;

where F is the comprehensive score of the candidate algorithm, a is the preset data input resource consumption coefficient, I is the data input resource consumption value, b is the preset data output resource consumption coefficient, O is the data output resource consumption value, c is the preset training-and-prediction memory coefficient, S is the training-and-prediction memory, d is the preset training-and-prediction time coefficient, T is the training-and-prediction time, e is the preset prediction accuracy coefficient, and A is the prediction accuracy.

Preferably, between determining the candidate algorithm set by the decision tree selection method based on the algorithm selection knowledge base and determining the training-and-testing order of each candidate algorithm based on the multiple historical parameters and the corresponding preset coefficients, the method further includes: performing feature extraction and feature selection on each data item in a determined dataset to obtain the features of each data item; and, based on the features of each data item and the categories of all algorithms, dividing the data in the determined dataset into the determined training set and the determined test set, where all the algorithms come from the algorithm selection knowledge base.

Preferably, after performing feature extraction and feature selection on each data item in the determined dataset to obtain the features of each data item, the method further includes: identifying unsuitable algorithms based on the features of each data item, and deleting the unsuitable algorithms from the candidate algorithm set.

Preferably, before determining the candidate algorithm set by the decision tree selection method based on the algorithm selection knowledge base, the method further includes: assisting the warm start of the machine learning algorithm by means of Bayesian optimization and meta-learning.

Preferably, the prediction accuracy is any one of metrics such as precision, recall, and AUC.

According to another aspect of the present invention, a system for automatically selecting a machine learning algorithm is provided, including: a candidate algorithm set determination module, configured to determine a candidate algorithm set by a decision tree selection method based on an algorithm selection knowledge base; a priority determination module, configured to determine the training-and-testing order of each candidate algorithm in the candidate algorithm set based on multiple historical parameters and multiple preset coefficients, each corresponding to one of the historical parameters; a training and testing module, configured to train the candidate algorithms in the candidate algorithm set one by one in the training-and-testing order on a determined training set to obtain a trained model for each candidate algorithm, and to use each trained model to make predictions on a determined test set to obtain, for each candidate algorithm, multiple comprehensive scoring parameters corresponding to the multiple historical parameters; a comprehensive score acquisition module, configured to obtain the comprehensive score of each candidate algorithm in the candidate algorithm set based on the multiple comprehensive scoring parameters and the multiple preset coefficients; and a selection result acquisition module, configured to take the one or more candidate algorithms with the highest comprehensive score as the machine learning algorithm selection result.

The method and system for automatically selecting a machine learning algorithm provided by the present invention train and predict with the candidate algorithms in the set selected by the decision tree, obtain comprehensive scores, and finally determine the selection result. They offer strong learning and analysis capability, are very simple to implement, and can produce good results. Because the decision tree in the algorithm selection knowledge base is used, the candidate algorithm set can be chosen quickly.

Brief description of the drawings

Fig. 1 is a flow chart of a method for automatically selecting a machine learning algorithm in an embodiment of the present invention;

Fig. 2 is an example diagram of a decision tree in an embodiment of the present invention;

Fig. 3 is a flow block diagram of a method for automatically selecting a machine learning algorithm in an embodiment of the present invention;

Fig. 4 is a module diagram of a system for automatically selecting a machine learning algorithm in an embodiment of the present invention.

Detailed description of the embodiments

The embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following embodiments illustrate the present invention but are not intended to limit its scope.

Fig. 1 is a flow chart of a method for automatically selecting a machine learning algorithm in an embodiment of the present invention. As shown in Fig. 1, the method includes: determining a candidate algorithm set by a decision tree selection method based on an algorithm selection knowledge base; determining the training-and-testing order of each candidate algorithm in the candidate algorithm set based on multiple historical parameters and multiple preset coefficients, each corresponding to one of the historical parameters; training the candidate algorithms in the candidate algorithm set one by one in the training-and-testing order on a determined training set to obtain a trained model for each candidate algorithm, and using each trained model to make predictions on a determined test set to obtain, for each candidate algorithm, multiple comprehensive scoring parameters corresponding to the multiple historical parameters; obtaining the comprehensive score of each candidate algorithm in the candidate algorithm set based on the multiple comprehensive scoring parameters and the multiple preset coefficients; and taking the one or more candidate algorithms with the highest comprehensive score as the machine learning algorithm selection result.

Specifically, the algorithm selection knowledge base contains many algorithms. Fig. 2 is an example diagram of a decision tree in an embodiment of the present invention, and the candidate algorithm set is determined based on the decision tree shown in Fig. 2. The classes of algorithms in the algorithm selection knowledge base contain specific algorithms organized by level, and the selection levels of the decision tree correspond to them. It should be noted that the algorithms in the embodiments of the present invention are machine learning algorithms. Further, the algorithms in a determined candidate algorithm set are comparable in reference value and approach but each has strengths and weaknesses in training speed and accuracy, so all of them can serve as candidate algorithms. Table 1 describes the algorithms contained in some of the nodes.

For example, a task that predicts the quality of a watermelon can be assigned to the "binary classification" node according to the conditions "has labels", "predicts a category", and "two classes", and the algorithms contained under the binary classification node are chosen as candidate algorithms. For projects in which meta-learning has been used to assist algorithm selection, the candidate algorithm set must also include the algorithms obtained in that way.
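The level-by-level selection in the watermelon example can be sketched as a walk over a nested dictionary. The tree below is a hypothetical fragment: its questions, node names, and algorithm lists are illustrative only and are not taken from Fig. 2 or Table 1.

```python
# Hypothetical fragment of the knowledge-base decision tree. Inner nodes ask a
# yes/no question about the task; leaves name a node and list its algorithms.
TREE = {
    "question": "has_labels",
    "yes": {
        "question": "predicts_category",
        "yes": {
            "question": "two_classes",
            "yes": {"node": "binary classification",
                    "algorithms": ["logistic regression", "SVM", "decision tree"]},
            "no": {"node": "multiclass classification",
                   "algorithms": ["random forest", "naive Bayes"]},
        },
        "no": {"node": "regression", "algorithms": ["linear regression"]},
    },
    "no": {"node": "clustering", "algorithms": ["K-means", "spectral clustering"]},
}


def select_candidates(tree: dict, task: dict) -> tuple:
    """Descend the tree one level per question and return (node name, candidate algorithms)."""
    node = tree
    while "question" in node:
        node = node["yes"] if task[node["question"]] else node["no"]
    return node["node"], list(node["algorithms"])


# The watermelon task: has labels, predicts a category, two classes.
watermelon = {"has_labels": True, "predicts_category": True, "two_classes": True}
node_name, candidates = select_candidates(TREE, watermelon)
print(node_name, candidates)
```

Keeping the tree as plain data means the knowledge base can be extended with new nodes or algorithms without changing the traversal code.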

Table 1: Algorithms contained in some of the nodes

The method for automatically selecting a machine learning algorithm provided by the present invention trains and predicts with the candidate algorithms in the set selected by the decision tree, obtains comprehensive scores, and finally determines the selection result. It offers strong learning and analysis capability, is very simple to implement, and can produce good results. Because the decision tree in the algorithm selection knowledge base is used, the candidate algorithm set can be chosen quickly.

Based on the above embodiment, before determining the candidate algorithm set by the decision tree selection method based on the algorithm selection knowledge base, the method further includes: determining the top-level category to which the candidate algorithms belong, the top-level categories being supervised learning, semi-supervised learning, and unsupervised learning. Accordingly, determining the candidate algorithm set further includes: based on the decision tree in the algorithm selection knowledge base, starting from the determined top-level category, choosing candidate algorithms level by level, and taking the one or more candidate algorithms so chosen as the candidate algorithm set.

Specifically, supervised learning algorithms are further described below. A supervised learning algorithm makes predictions based on a set of examples. For example, past sales figures can be used to predict future price trends. In supervised learning there is a set of input variables consisting of labeled training data and a set of output variables to be predicted. An algorithm analyzes the training data to learn a function that maps inputs to outputs. By generalizing from the training data, the inferred function can predict results in unseen situations and thus predict new, unknown examples.

Classification: when the data are used to predict a category, supervised learning handles the problem as a classification task. Labeling a picture as a cat or a dog is such a case. When there are only two class labels, this is binary classification; with more than two, it is multiclass classification.

Regression: when the predicted value is continuous, the task is a regression problem. Forecasting the future from past and present data is a typical case, and its most common application is trend analysis. A typical example is predicting next year's sales from the sales of this year and the year before last.

Anomaly detection: sometimes the goal is simply to identify unusual data points. In fraud detection, for example, any highly unusual credit card spending pattern is suspect. Fraud can take many forms, but training examples are few, so it is not possible to learn what fraudulent activity looks like. Anomaly detection therefore learns only what normal activity looks like (using the history of non-fraudulent transactions) and flags any activity that differs significantly from it.

Further, semi-supervised learning algorithms are described below. The main challenge of supervised learning is that labeling data is expensive and time-consuming. When labels are limited, unlabeled data can be used to improve supervised learning. Because the machine is not fully supervised in this case, the approach is called semi-supervised. Semi-supervised learning uses unlabeled examples together with a small amount of labeled data to improve learning accuracy.

Further, unsupervised learning algorithms are described below. In unsupervised learning the machine works entirely with unlabeled data and is required to discover the intrinsic patterns hidden in the data, such as cluster structure, low-dimensional manifolds, or sparse trees and graphs.

Clustering: a set of data examples is grouped so that the examples within one group (cluster) are more similar to each other, according to some criterion, than to the examples in other groups. Clustering is often used to segment a whole dataset into several groups. Analysis can then be carried out within each group to assist the user.

Dimensionality reduction: reducing the number of variables under consideration. In many applications the raw data have very high-dimensional features, some of which are redundant or irrelevant to the task. Dimensionality reduction helps reveal the true, latent relationships.

Based on the above embodiment, determining the training-and-testing order of each candidate algorithm in the candidate algorithm set based on the multiple historical parameters and the preset coefficients corresponding to them further includes: obtaining the historical score of any candidate algorithm by the following formula:

F' = aI' + bO' + cS' + dT' + eA'

Specifically, any of the coefficients may be set to 0.

By setting preset coefficients and proposing five different scoring dimensions, the method for automatically selecting a machine learning algorithm provided by the present invention makes it easier to arrive at the most suitable algorithm.

Based on the above embodiment, training the candidate algorithms and predicting on the test set to obtain the multiple comprehensive scoring parameters further includes: training the candidate algorithms in the candidate algorithm set one by one in the training-and-testing order on the determined training set to obtain a trained model for each candidate algorithm, and recording for each candidate algorithm the training data input resource consumption value, the training data output resource consumption value, the training time, and the training memory; using each trained model to make predictions on the determined test set, and recording for each candidate algorithm the prediction data input resource consumption value, the prediction data output resource consumption value, the prediction time, the prediction memory, and the prediction accuracy; computing a weighted sum of the training and prediction data input resource consumption values to obtain the data input resource consumption value; computing a weighted sum of the training and prediction data output resource consumption values to obtain the data output resource consumption value; computing a weighted sum of the training time and the prediction time to obtain the training-and-prediction time; computing a weighted sum of the training memory and the prediction memory to obtain the training-and-prediction memory; and taking the data input resource consumption value, the data output resource consumption value, the training-and-prediction time, the training-and-prediction memory, and the prediction accuracy as the multiple comprehensive scoring parameters.
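The weighted-sum step that merges the training and prediction measurements into the five comprehensive scoring parameters can be sketched as follows. The equal weights of 0.5, the dictionary keys, and the sample measurements are assumptions made for illustration; the text does not specify the weights.

```python
def weighted_sum(train_value: float, predict_value: float,
                 w_train: float = 0.5, w_predict: float = 0.5) -> float:
    """Combine a training-phase and a prediction-phase measurement.

    ASSUMPTION: equal weights; the patent only says "weighted sum".
    """
    return w_train * train_value + w_predict * predict_value


def scoring_parameters(train: dict, predict: dict) -> dict:
    """Build the five comprehensive scoring parameters I, O, T, S, A."""
    return {
        "I": weighted_sum(train["input"], predict["input"]),    # data input resource consumption
        "O": weighted_sum(train["output"], predict["output"]),  # data output resource consumption
        "T": weighted_sum(train["time"], predict["time"]),      # training-and-prediction time
        "S": weighted_sum(train["memory"], predict["memory"]),  # training-and-prediction memory
        "A": predict["accuracy"],                               # prediction accuracy, used as measured
    }


# Hypothetical relative measurements for one candidate algorithm.
train = {"input": 2.0, "output": 1.0, "time": 4.0, "memory": 3.0}
predict = {"input": 1.0, "output": 1.0, "time": 2.0, "memory": 1.0, "accuracy": 0.9}
params = scoring_parameters(train, predict)
print(params)
```

Only the four resource quantities are combined; prediction accuracy has no training-phase counterpart in the text and is passed through unchanged.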

Based on the above embodiment, the comprehensive score of each candidate algorithm in the candidate algorithm set is obtained from the multiple comprehensive scoring parameters and the multiple preset coefficients by the following formula:

F = aI + bO + cS + dT + eA

Specifically, the training resource consumption parameters (the training data input resource consumption value, the training data output resource consumption value, the training time, and the training memory) are not the absolute values of the measured quantities; instead, a standard is chosen as a reference and each parameter is expressed as a value relative to it, which simplifies the subsequent calculation. The hyperparameters needed for training may be preset, or another hyperparameter optimization tool may be used; the resource consumption parameter values finally recorded are those obtained with the optimal hyperparameters. The same applies to the prediction resource consumption parameters: the prediction data input resource consumption value, the prediction data output resource consumption value, the prediction time, the prediction memory, and the prediction accuracy.
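Expressing measurements relative to a reference and then scoring and selecting can be sketched as follows. The reference values, the candidate names, the measurements, and the coefficient values are hypothetical; dividing by the reference is one plausible way to obtain a relative value, and leaving accuracy unscaled is likewise an assumption.

```python
# Hypothetical reference standard (for example, a baseline run) for the resource quantities.
REFERENCE = {"I": 10.0, "O": 5.0, "S": 200.0, "T": 2.0}


def relative(measured: dict, reference: dict = REFERENCE) -> dict:
    """Express resource measurements relative to the reference; accuracy A is left as is."""
    scaled = {key: measured[key] / reference[key] for key in reference}
    scaled["A"] = measured["A"]
    return scaled


def comprehensive_score(p: dict, coeffs: tuple) -> float:
    """F = aI + bO + cS + dT + eA."""
    a, b, c, d, e = coeffs
    return a * p["I"] + b * p["O"] + c * p["S"] + d * p["T"] + e * p["A"]


def select_best(scores: dict, top_k: int = 1) -> list:
    """Return the top_k candidate names with the highest comprehensive score."""
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# ASSUMPTION: negative coefficients for costs, positive for accuracy.
coeffs = (-0.1, -0.1, -0.2, -0.2, 1.0)
measured = {
    "knn": {"I": 10.0, "O": 5.0, "S": 400.0, "T": 1.0, "A": 0.8},
    "tree": {"I": 10.0, "O": 5.0, "S": 100.0, "T": 4.0, "A": 0.9},
}
scores = {name: comprehensive_score(relative(m), coeffs) for name, m in measured.items()}
best = select_best(scores)
print(scores, best)
```

Relative values keep quantities with different units (bytes, seconds) on a comparable scale, so a single set of preset coefficients can be applied to all of them.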

Based on the above embodiment, between determining the candidate algorithm set by the decision tree selection method based on the algorithm selection knowledge base and determining the training-and-testing order of each candidate algorithm based on the multiple historical parameters and the corresponding preset coefficients, the method further includes: performing feature extraction and feature selection on each data item in a determined dataset to obtain the features of each data item; and, based on the features of each data item and the categories of all algorithms, dividing the data in the determined dataset into the determined training set and the determined test set, where all the algorithms come from the algorithm selection knowledge base.

Specifically, both feature extraction and feature selection aim to find, from the original features, the most effective ones (consistent across samples of the same class, discriminative between samples of different classes, and robust to noise).

Further, feature extraction converts the original features into a set of features with clear physical meaning (Gabor, geometric features [corner points, invariants], texture [LBP, HOG]), statistical meaning, or kernel-based features.

Feature selection selects, from the feature set, the subset of features with the greatest statistical significance.

Both feature extraction and feature selection reduce data storage and input data bandwidth, reduce redundancy, can uncover more meaningful latent variables, and help produce a deeper understanding of the data.

For images, for example, SIFT (scale-invariant feature transform) is a method for detecting local features. It finds extreme points of an image in scale space and extracts descriptors of their position, scale, and rotation invariants; the resulting features are used for image feature point matching and can detect and describe local features in an image. SIFT is based on local features of an object: it is invariant to rotation, scaling, and brightness changes, and remains stable to a certain degree under viewpoint changes, affine transformations, and noise.

The data are then divided into a training set S and a test set T according to the type of the algorithm and the characteristics of the data. Several methods can be used for this step, such as the hold-out method, cross-validation, and the bootstrap method.
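The three splitting methods named above can be sketched with the Python standard library. The function names, the 30% test ratio, the fold count, and the fixed seeds are illustrative choices, not values from the text.

```python
import random


def hold_out(data: list, test_ratio: float = 0.3, seed: int = 0) -> tuple:
    """Hold-out: shuffle once, then cut into a training set S and a test set T."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold(data: list, k: int = 5):
    """Cross-validation: yield (S, T) pairs in which each fold serves as T exactly once."""
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


def bootstrap(data: list, seed: int = 0) -> tuple:
    """Bootstrap: sample n items with replacement for S; items never drawn form T."""
    rng = random.Random(seed)
    n = len(data)
    picked = [rng.randrange(n) for _ in range(n)]
    chosen = set(picked)
    train = [data[i] for i in picked]
    test = [data[i] for i in range(n) if i not in chosen]
    return train, test


data = list(range(10))
S, T = hold_out(data)
folds = list(k_fold(data, k=5))
S_boot, T_boot = bootstrap(data)
print(len(S), len(T), len(folds), len(S_boot), len(T_boot))
```

Hold-out is the cheapest of the three, cross-validation uses every item for testing once, and bootstrap is commonly preferred when the dataset is small.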

Table 2 lists the dataset characteristics that correspond to common clustering algorithms.

Table 2: Dataset characteristics corresponding to clustering algorithms

For example, if the data for a given task cannot be converted into vectors in N-dimensional Euclidean space and only a similarity matrix between data items is available, algorithms such as K-means must be rejected, and algorithms such as spectral clustering are preferred.
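Removing unsuitable algorithms based on data characteristics can be sketched as a lookup against a requirements table. The table below is hypothetical and stands in for Table 2, which is not reproduced here; the property names are illustrative.

```python
# Hypothetical requirements: the data properties each algorithm needs.
REQUIRES = {
    "K-means": {"euclidean_vectors"},
    "Gaussian mixture": {"euclidean_vectors"},
    "spectral clustering": {"similarity_matrix"},
}


def remove_unsuitable(candidates: list, data_properties: set) -> list:
    """Keep only the candidates whose requirements are all met by the data."""
    return [c for c in candidates if REQUIRES.get(c, set()) <= data_properties]


# Only a similarity matrix is available, so K-means is rejected.
kept = remove_unsuitable(["K-means", "spectral clustering"], {"similarity_matrix"})
print(kept)
```

An algorithm with no entry in the table is treated as having no requirements and is kept, which errs on the side of evaluating it rather than silently dropping it.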

Based on the above embodiment, after performing feature extraction and feature selection on each data item in the determined dataset to obtain the features of each data item, the method further includes: identifying unsuitable algorithms based on the features of each data item, and deleting the unsuitable algorithms from the candidate algorithm set.

Based on the above embodiment, the data in the determined dataset are divided into the determined training set and the determined test set based on the features of each data item and the categories of all algorithms, using any one of the hold-out method, cross-validation, and the bootstrap method, where all the algorithms come from the algorithm selection knowledge base.

Based on the above embodiment, before determining the candidate algorithm set by the decision tree selection method based on the algorithm selection knowledge base, the method further includes: assisting the warm start of the machine learning algorithm by means of Bayesian optimization and meta-learning.

Domain experts derive knowledge from previous tasks: they learn the performance characteristics of machine learning algorithms. Meta-learning imitates this strategy by reasoning about the performance of learning algorithms across datasets. In this work, meta-learning is used to select algorithms that are likely to perform well on a new dataset. More specifically, for a large number of datasets, performance data and a set of meta-features are collected, that is, dataset characteristics that can be computed efficiently and that help determine which algorithm to use on a new dataset.

This meta-learning approach is complementary to Bayesian optimization for optimizing a machine learning framework. Meta-learning can quickly suggest some instantiations of the machine learning framework that are likely to perform quite well, but it cannot provide fine-grained information about performance.
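One common way to realize such a warm start is to look up the earlier datasets whose meta-features are closest to the new dataset and start from the algorithms that did best on them. The sketch below assumes that nearest-neighbour approach; the text does not say how the suggestion is computed, and the meta-features, past tasks, and algorithm names are hypothetical.

```python
import math


def meta_features(n_samples: int, n_features: int, n_classes: int) -> tuple:
    """Cheap dataset descriptors (hypothetical choice of meta-features)."""
    return (math.log10(n_samples), float(n_features), float(n_classes))


# Hypothetical meta-knowledge: (meta-features of an earlier dataset, best algorithm on it).
PAST_TASKS = [
    (meta_features(1_000, 10, 2), "logistic regression"),
    (meta_features(100_000, 300, 10), "random forest"),
    (meta_features(500, 5, 2), "SVM"),
]


def warm_start(new_meta: tuple, k: int = 2) -> list:
    """Suggest the best algorithms of the k most similar earlier datasets, nearest first."""
    ranked = sorted(PAST_TASKS, key=lambda task: math.dist(task[0], new_meta))
    suggestions = []
    for _, algorithm in ranked[:k]:
        if algorithm not in suggestions:
            suggestions.append(algorithm)
    return suggestions


suggestions = warm_start(meta_features(800, 8, 2))
print(suggestions)
```

The suggestions only seed the search; a later optimization step such as Bayesian optimization would still be needed to obtain fine-grained performance information.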

Based on the above embodiment, the prediction accuracy is any one of metrics such as precision, recall, and AUC.
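For a binary task, the three metrics named above can be computed as in the following sketch. The labels, predictions, and scores are made-up sample data, and the AUC is computed with the pairwise-ranking definition (the fraction of positive/negative pairs in which the positive is scored higher, with ties counted as one half).

```python
def precision_recall(y_true: list, y_pred: list) -> tuple:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN); positive class is 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(y_true: list, scores: list) -> float:
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up sample data: true labels, hard predictions, and predicted scores.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
precision, recall = precision_recall(y_true, y_pred)
area = auc(y_true, scores)
print(precision, recall, area)
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for an illustration but not for large test sets.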

As a preferred embodiment, Fig. 3 is a flow block diagram of a method for automatically selecting a machine learning algorithm in an embodiment of the present invention. This embodiment is described with reference to Fig. 3.

First, the residing maximum classification of the algorithm to be selected is determined, the residing maximum classification includes：Supervised learning class, Semi-supervised learning class and unsupervised learning class.

Further, by Bayes's optimization and element study method, the machine learning algorithm thermal starting is aided in.

Further, based on algorithms selection knowledge base, by decision tree back-and-forth method, algorithm set to be selected is determined.

Further, feature extraction and feature selecting are carried out to each data in definite data set, obtains each number According to feature；The classification of feature and all algorithms based on each data, the data in the definite data set are divided into The definite training set and the definite test set, wherein, all algorithms come from the algorithms selection knowledge base.

Further, the feature based on each data, acquisition is not suitable for algorithm, and the algorithm that is not suitable for is treated from described Select in algorithm set and delete.

Further, based on multiple history parameters and with the corresponding multiple default systems of the multiple history parameters Number, determines the training test sequence of each algorithm to be selected in the algorithm set to be selected.

Further, according to the trained test sequence, based on definite training set, successively to the algorithm set to be selected In algorithm to be selected be trained, the corresponding training pattern of each algorithm to be selected is obtained, based on the corresponding instruction of each algorithm to be selected Practice model, definite test set is predicted, obtain the corresponding with the multiple history parameters multiple of each algorithm to be selected Comprehensive grading parameters.

Further, based on the multiple comprehensive grading parameters and the multiple predetermined coefficient, the calculation to be selected is obtained The comprehensive grading of each algorithm to be selected in method set.

Finally, using the highest one or more algorithm to be selected of comprehensive grading as machine learning algorithm selection result.
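The ordering and scoring steps of this flow can be sketched with the weighted-sum score given in the claims, F = aI + bO + cS + dT + eA, applied once to historical parameters (to order training and testing) and once to measured parameters (to pick the result). The sketch below is illustrative only: the coefficient values, the choice of negative signs for the resource-cost terms, the algorithm names, and all parameter values are assumptions, and the measured values would in practice come from actually training and testing each candidate.

```python
# Preset coefficients a..e keyed by parameter name. The values, and the
# negative signs on the cost terms, are assumptions for this example.
COEFFS = {"I": -0.1, "O": -0.1, "S": -0.2, "T": -0.2, "A": 1.0}

# Hypothetical normalised historical parameters for three candidates.
HISTORY = {
    "svm":  {"I": 0.2, "O": 0.1, "S": 0.5, "T": 0.6, "A": 0.90},
    "tree": {"I": 0.2, "O": 0.1, "S": 0.2, "T": 0.1, "A": 0.85},
    "knn":  {"I": 0.2, "O": 0.1, "S": 0.8, "T": 0.3, "A": 0.80},
}

# Hypothetical parameters measured after training and testing.
MEASURED = {
    "svm":  {"I": 0.2, "O": 0.1, "S": 0.5, "T": 0.5, "A": 0.93},
    "tree": {"I": 0.2, "O": 0.1, "S": 0.2, "T": 0.1, "A": 0.84},
    "knn":  {"I": 0.2, "O": 0.1, "S": 0.8, "T": 0.3, "A": 0.78},
}

def score(params, coeffs=COEFFS):
    """Weighted sum a*I + b*O + c*S + d*T + e*A for one algorithm."""
    return sum(coeffs[k] * params[k] for k in ("I", "O", "S", "T", "A"))

def training_order(history):
    """Candidate names sorted by history score, highest first."""
    return sorted(history, key=lambda name: score(history[name]), reverse=True)

def select_best(measured, top_n=1):
    """The top_n candidate names by comprehensive score."""
    ranked = sorted(measured, key=lambda name: score(measured[name]), reverse=True)
    return ranked[:top_n]
```

With these made-up numbers, `training_order(HISTORY)` places `tree` first, so it would be trained and tested before the others, and `select_best(MEASURED)` returns it as the selection result.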

Based on the above embodiment, Fig. 4 is a module diagram of a machine learning algorithm automatic selection system in an embodiment of the present invention. As shown in Fig. 4, the system includes: a candidate algorithm set determining module, configured to determine the candidate algorithm set by the decision tree selection method based on the algorithm selection knowledge base; an order determining module, configured to determine the training and testing order of each candidate algorithm in the candidate algorithm set based on a plurality of historical parameters and a plurality of preset coefficients respectively corresponding to the plurality of historical parameters; a training and testing module, configured to train the candidate algorithms in the candidate algorithm set in turn, according to the training and testing order and based on the determined training set, to obtain the trained model corresponding to each candidate algorithm, and to perform prediction on the determined test set based on the trained model corresponding to each candidate algorithm, to obtain a plurality of comprehensive scoring parameters of each candidate algorithm that correspond to the plurality of historical parameters; a comprehensive score obtaining module, configured to obtain the comprehensive score of each candidate algorithm in the candidate algorithm set based on the plurality of comprehensive scoring parameters and the plurality of preset coefficients; and a selection result obtaining module, configured to take the one or more candidate algorithms with the highest comprehensive score as the machine learning algorithm selection result.

The machine learning algorithm automatic selection method and system provided by the present invention train and predict with the candidate algorithms in the set chosen through the decision tree, and obtain comprehensive scores to determine the final selection result. They have strong learning and analysis capability, are very simple to implement, and can achieve good results. Because the decision tree in the algorithm selection knowledge base is used, the candidate algorithm set can be selected quickly. By setting preset coefficients and proposing five different dimensions, the method is better able to obtain the most suitable algorithm. When the algorithm selection knowledge base and tool provided by the present invention are used to select machine learning algorithms, the selected algorithms are basically consistent with, or similar to, those selected by experts, and the experimental results effectively demonstrate the validity of the selection method provided by the present invention. The selection method provided by the present invention is highly adaptable and can be applied to a variety of machine learning frameworks and systems. It effectively achieves the goal of automatically selecting a suitable machine learning algorithm, and is intuitive, effective, and easy to use.

Finally, the methods described above are merely preferred embodiments and are not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A machine learning algorithm selection method, characterized by comprising:

determining a candidate algorithm set by a decision tree selection method based on an algorithm selection knowledge base;

determining a training and testing order of each candidate algorithm in the candidate algorithm set based on a plurality of historical parameters and a plurality of preset coefficients respectively corresponding to the plurality of historical parameters;

training the candidate algorithms in the candidate algorithm set in turn, according to the training and testing order and based on a determined training set, to obtain a trained model corresponding to each candidate algorithm, and performing prediction on a determined test set based on the trained model corresponding to each candidate algorithm, to obtain a plurality of comprehensive scoring parameters of each candidate algorithm that correspond to the plurality of historical parameters;

obtaining a comprehensive score of each candidate algorithm in the candidate algorithm set based on the plurality of comprehensive scoring parameters and the plurality of preset coefficients;

taking the one or more candidate algorithms with the highest comprehensive score as the machine learning algorithm selection result.
2. The selection method according to claim 1, characterized in that, before determining the candidate algorithm set by the decision tree selection method based on the algorithm selection knowledge base, the method further comprises:

determining the top-level category to which the candidate algorithms belong, the top-level categories including: a supervised learning class, a semi-supervised learning class, and an unsupervised learning class;

correspondingly, determining the candidate algorithm set by the decision tree selection method based on the algorithm selection knowledge base further comprises:

based on the decision tree in the algorithm selection knowledge base, starting from the determined top-level category of the candidate algorithms, selecting candidate algorithms layer by layer, and taking the one or more candidate algorithms selected layer by layer as the candidate algorithm set.
3. The selection method according to claim 1, characterized in that determining the training and testing order of each candidate algorithm in the candidate algorithm set based on the plurality of historical parameters and the plurality of preset coefficients respectively corresponding to the plurality of historical parameters further comprises:

based on the plurality of historical parameters and the plurality of preset coefficients respectively corresponding to the plurality of historical parameters, obtaining the history score of any candidate algorithm by the following formula:

F' = aI' + bO' + cS' + dT' + eA';

wherein F' is the history score of any candidate algorithm, a is the preset data input resource consumption coefficient, I' is the historical data input resource consumption value, b is the preset data output resource consumption coefficient, O' is the historical data output resource consumption value, c is the preset training and prediction memory coefficient, S' is the historical training and prediction memory, d is the preset training and prediction time coefficient, T' is the historical training and prediction time, e is the preset prediction accuracy coefficient, and A' is the historical prediction accuracy;

sorting the history scores of all candidate algorithms from highest to lowest, and taking the resulting order of the candidate algorithms as the training and testing order of the candidate algorithms.
4. The selection method according to claim 3, characterized in that training the candidate algorithms in the candidate algorithm set in turn, according to the training and testing order and based on the determined training set, to obtain the trained model corresponding to each candidate algorithm, and performing prediction on the determined test set based on the trained model corresponding to each candidate algorithm, to obtain the plurality of comprehensive scoring parameters of each candidate algorithm that correspond to the plurality of historical parameters, further comprises:

training the candidate algorithms in the candidate algorithm set in turn, according to the training and testing order and based on the determined training set, to obtain the trained model corresponding to each candidate algorithm, and obtaining the training data input resource consumption value, the training data output resource consumption value, the training time, and the training memory of each candidate algorithm;

performing prediction on the determined test set based on the trained model corresponding to each candidate algorithm, to obtain the prediction data input resource consumption value, the prediction data output resource consumption value, the prediction time, the prediction memory, and the prediction accuracy of each candidate algorithm;

computing a weighted sum of the training data input resource consumption value and the prediction data input resource consumption value to obtain the data input resource consumption value;

computing a weighted sum of the training data output resource consumption value and the prediction data output resource consumption value to obtain the data output resource consumption value;

computing a weighted sum of the training time and the prediction time to obtain the training and prediction time;

computing a weighted sum of the training memory and the prediction memory to obtain the training and prediction memory;

taking the data input resource consumption value, the data output resource consumption value, the training and prediction time, the training and prediction memory, and the prediction accuracy as the plurality of comprehensive scoring parameters.
5. The selection method according to claim 4, characterized in that, based on the plurality of comprehensive scoring parameters and the plurality of preset coefficients, the comprehensive score of each candidate algorithm in the candidate algorithm set is obtained by the following formula:

F = aI + bO + cS + dT + eA;

wherein F is the comprehensive score of any candidate algorithm, a is the preset data input resource consumption coefficient, I is the data input resource consumption value, b is the preset data output resource consumption coefficient, O is the data output resource consumption value, c is the preset training and prediction memory coefficient, S is the training and prediction memory, d is the preset training and prediction time coefficient, T is the training and prediction time, e is the preset prediction accuracy coefficient, and A is the prediction accuracy.
6. The selection method according to claim 1, characterized in that, between determining the candidate algorithm set by the decision tree selection method based on the algorithm selection knowledge base and determining the training and testing order of each candidate algorithm in the candidate algorithm set based on the plurality of historical parameters and the plurality of preset coefficients respectively corresponding to the plurality of historical parameters, the method further comprises:

performing feature extraction and feature selection on each data item in a determined data set to obtain the features of each data item;

dividing the data in the determined data set into the determined training set and the determined test set based on the features of each data item and the categories of all algorithms, wherein all the algorithms come from the algorithm selection knowledge base.
7. The selection method according to claim 6, characterized in that, after performing feature extraction and feature selection on each data item in the determined data set to obtain the features of each data item, the method further comprises:

identifying unsuitable algorithms based on the features of each data item, and deleting the unsuitable algorithms from the candidate algorithm set.
8. The selection method according to claim 1, characterized in that, before determining the candidate algorithm set by the decision tree selection method based on the algorithm selection knowledge base, the method further comprises:

assisting the warm start of the machine learning algorithm by means of Bayesian optimization and meta-learning.
9. The selection method according to claim 5, characterized in that the prediction accuracy is any one of precision, recall, and AUC value.
10. A machine learning algorithm selection system, characterized by comprising:

a candidate algorithm set determining module, configured to determine a candidate algorithm set by a decision tree selection method based on an algorithm selection knowledge base;

an order determining module, configured to determine a training and testing order of each candidate algorithm in the candidate algorithm set based on a plurality of historical parameters and a plurality of preset coefficients respectively corresponding to the plurality of historical parameters;

a training and testing module, configured to train the candidate algorithms in the candidate algorithm set in turn, according to the training and testing order and based on a determined training set, to obtain a trained model corresponding to each candidate algorithm, and to perform prediction on a determined test set based on the trained model corresponding to each candidate algorithm, to obtain a plurality of comprehensive scoring parameters of each candidate algorithm that correspond to the plurality of historical parameters;

a comprehensive score obtaining module, configured to obtain a comprehensive score of each candidate algorithm in the candidate algorithm set based on the plurality of comprehensive scoring parameters and the plurality of preset coefficients;

a selection result obtaining module, configured to take the one or more candidate algorithms with the highest comprehensive score as the machine learning algorithm selection result.