CN108009643A - Method and system for automatically selecting a machine learning algorithm


Info

Publication number
CN108009643A
Authority
CN
China
Prior art keywords
algorithm
training
data
resource consumption
consumption value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711354616.9A
Other languages
Chinese (zh)
Other versions
CN108009643B (en)
Inventor
***
龙明盛
付博
黄向东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a method and system for automatically selecting a machine learning algorithm. The selection method comprises: determining a candidate algorithm set; determining a training and testing order for each candidate algorithm in the candidate algorithm set based on multiple history parameters and multiple preset coefficients; training the candidate algorithms one by one in the training and testing order on a determined training set to obtain a trained model for each candidate algorithm, and predicting on a determined test set with each trained model to obtain multiple comprehensive scoring parameters for each candidate algorithm; obtaining a comprehensive score for each candidate algorithm based on the comprehensive scoring parameters and the preset coefficients; and taking the one or more candidate algorithms with the highest comprehensive score as the machine learning algorithm selection result. The method and system provided by the invention have strong learning and analysis capability, are very simple to implement, and yield good results.

Description

Method and system for automatically selecting a machine learning algorithm
Technical field
The present invention relates to the field of computer data processing, and more particularly to a method and system for automatically selecting a machine learning algorithm.
Background
Machine learning has recently made significant progress in many application areas, which has created demand for machine learning methods in every field. Accordingly, a growing number of commercial enterprises are meeting this demand (for example, BigML.com, Wise.io, SkyTree.com, RapidMiner.com, Dato.com, Prediction.io, DataRobot.com, Microsoft Azure Machine Learning, and Amazon Machine Learning). At its core, every effective machine learning service must decide which machine learning algorithm to use on a given dataset, whether and how to preprocess its features, and how to set all hyperparameters.
Selecting a specific algorithm generally requires expert knowledge and trade-offs across several aspects. Several factors influence the choice, including: (1) the size, quality, and nature of the data; (2) the available computation time and computation space; (3) the urgency of the task; (4) how the data will be used.
In addition, as machine learning has developed over time, the number of algorithms has kept growing, and each algorithm has its own characteristics, strengths, and weaknesses. For many machine learning beginners, how to quickly select a suitable machine learning algorithm has therefore become a problem to be solved.
Summary of the invention
The present invention provides a method and system for automatically selecting a machine learning algorithm that overcome the above problem.
According to one aspect of the present invention, a method for automatically selecting a machine learning algorithm is provided, comprising: determining a candidate algorithm set by a decision-tree selection method based on an algorithm selection knowledge base; determining a training and testing order for each candidate algorithm in the candidate algorithm set based on multiple history parameters and multiple preset coefficients corresponding to the history parameters; training the candidate algorithms in the candidate algorithm set one by one in the training and testing order on a determined training set to obtain a trained model for each candidate algorithm, and predicting on a determined test set with each trained model to obtain, for each candidate algorithm, multiple comprehensive scoring parameters corresponding to the history parameters; obtaining a comprehensive score for each candidate algorithm in the candidate algorithm set based on the comprehensive scoring parameters and the preset coefficients; and taking the one or more candidate algorithms with the highest comprehensive score as the machine learning algorithm selection result.
Preferably, before the candidate algorithm set is determined by the decision-tree selection method based on the algorithm selection knowledge base, the method further comprises: determining the top-level category to which the candidate algorithms belong, the top-level category being one of supervised learning, semi-supervised learning, and unsupervised learning. Correspondingly, determining the candidate algorithm set further comprises: based on the decision tree in the algorithm selection knowledge base and starting from the determined top-level category, selecting candidate algorithms level by level, the one or more candidate algorithms so selected forming the candidate algorithm set.
Preferably, determining the training and testing order of each candidate algorithm in the candidate algorithm set based on the multiple history parameters and the corresponding preset coefficients further comprises: obtaining the history score of any candidate algorithm by the following formula:
F' = aI' + bO' + cS' + dT' + eA';
where F' is the history score of the candidate algorithm; a is the preset coefficient for the data input resource consumption value and I' is the historical data input resource consumption value; b is the preset coefficient for the data output resource consumption value and O' is the historical data output resource consumption value; c is the preset coefficient for training-and-prediction memory and S' is the historical training-and-prediction memory; d is the preset coefficient for training-and-prediction time and T' is the historical training-and-prediction time; e is the preset coefficient for prediction accuracy and A' is the historical prediction accuracy. The history scores of all candidate algorithms are sorted from high to low, and the resulting order of the candidate algorithms is used as their training and testing order.
Preferably, the step of training the candidate algorithms in the training and testing order and predicting on the determined test set further comprises: training the candidate algorithms in the candidate algorithm set one by one in the training and testing order on the determined training set, obtaining a trained model for each candidate algorithm, and obtaining each candidate algorithm's training data input resource consumption value, training data output resource consumption value, training time, and training memory; predicting on the determined test set with each trained model, and obtaining each candidate algorithm's prediction data input resource consumption value, prediction data output resource consumption value, prediction time, prediction memory, and prediction accuracy; computing a weighted sum of the training data input resource consumption value and the prediction data input resource consumption value to obtain the data input resource consumption value; computing a weighted sum of the training data output resource consumption value and the prediction data output resource consumption value to obtain the data output resource consumption value; computing a weighted sum of the training time and the prediction time to obtain the training-and-prediction time; computing a weighted sum of the training memory and the prediction memory to obtain the training-and-prediction memory; and taking the data input resource consumption value, the data output resource consumption value, the training-and-prediction time, the training-and-prediction memory, and the prediction accuracy as the multiple comprehensive scoring parameters.
Preferably, based on the multiple comprehensive scoring parameters and the multiple preset coefficients, the comprehensive score of each candidate algorithm in the candidate algorithm set is obtained by the following formula:
F = aI + bO + cS + dT + eA;
where F is the comprehensive score of the candidate algorithm; a is the preset coefficient for the data input resource consumption value and I is the data input resource consumption value; b is the preset coefficient for the data output resource consumption value and O is the data output resource consumption value; c is the preset coefficient for training-and-prediction memory and S is the training-and-prediction memory; d is the preset coefficient for training-and-prediction time and T is the training-and-prediction time; e is the preset coefficient for prediction accuracy and A is the prediction accuracy.
Preferably, between determining the candidate algorithm set by the decision-tree selection method and determining the training and testing order of each candidate algorithm in the candidate algorithm set, the method further comprises: performing feature extraction and feature selection on each data item in a determined dataset to obtain the features of each data item; and, based on the features of each data item and the categories of all algorithms, dividing the data in the determined dataset into the determined training set and the determined test set, wherein all the algorithms come from the algorithm selection knowledge base.
Preferably, after performing feature extraction and feature selection on each data item in the determined dataset and obtaining the features of each data item, the method further comprises: identifying unsuitable algorithms based on the features of each data item, and deleting the unsuitable algorithms from the candidate algorithm set.
Preferably, before the candidate algorithm set is determined by the decision-tree selection method based on the algorithm selection knowledge base, the method further comprises: assisting the warm start of the machine learning algorithm by means of Bayesian optimization and meta-learning.
Preferably, the prediction accuracy is any one of metrics such as precision, recall, and AUC.
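As an illustration of two of the metrics named above, the following minimal sketch computes precision and recall for one positive class. The function name `precision_recall` and the example labels are hypothetical and are not taken from the patent; AUC is omitted for brevity.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative labels only (assumption, not data from the patent).
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
```

Either value could then be used as the prediction accuracy A in the scoring formula, depending on which metric matters for the task.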
According to another aspect of the present invention, a system for automatically selecting a machine learning algorithm is provided, comprising: a candidate algorithm set determination module, configured to determine a candidate algorithm set by a decision-tree selection method based on an algorithm selection knowledge base; a priority determination module, configured to determine the training and testing order of each candidate algorithm in the candidate algorithm set based on multiple history parameters and multiple preset coefficients corresponding to the history parameters; a training and testing module, configured to train the candidate algorithms in the candidate algorithm set one by one in the training and testing order on a determined training set to obtain a trained model for each candidate algorithm, and to predict on a determined test set with each trained model to obtain, for each candidate algorithm, multiple comprehensive scoring parameters corresponding to the history parameters; a comprehensive score module, configured to obtain the comprehensive score of each candidate algorithm in the candidate algorithm set based on the comprehensive scoring parameters and the preset coefficients; and a selection result module, configured to take the one or more candidate algorithms with the highest comprehensive score as the machine learning algorithm selection result.
In the method and system provided by the present invention, the candidate algorithms in the set selected by the decision tree are trained and used for prediction, and comprehensive scores are obtained to finally determine the selection result. The method and system therefore have strong learning and analysis capability, are very simple to implement, and yield good results. Because the decision tree in the algorithm selection knowledge base is used, the candidate algorithm set can be selected quickly.
Brief description of the drawings
Fig. 1 is a flow chart of a method for automatically selecting a machine learning algorithm in an embodiment of the present invention;
Fig. 2 is an example decision tree in an embodiment of the present invention;
Fig. 3 is a flow block diagram of a method for automatically selecting a machine learning algorithm in an embodiment of the present invention;
Fig. 4 is a module diagram of a system for automatically selecting a machine learning algorithm in an embodiment of the present invention.
Detailed description
The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following embodiments illustrate the present invention but do not limit its scope.
Fig. 1 is a flow chart of a method for automatically selecting a machine learning algorithm in an embodiment of the present invention. As shown in Fig. 1, the method comprises: determining a candidate algorithm set by a decision-tree selection method based on an algorithm selection knowledge base; determining a training and testing order for each candidate algorithm in the candidate algorithm set based on multiple history parameters and multiple preset coefficients corresponding to the history parameters; training the candidate algorithms in the candidate algorithm set one by one in the training and testing order on a determined training set to obtain a trained model for each candidate algorithm, and predicting on a determined test set with each trained model to obtain, for each candidate algorithm, multiple comprehensive scoring parameters corresponding to the history parameters; obtaining a comprehensive score for each candidate algorithm in the candidate algorithm set based on the comprehensive scoring parameters and the preset coefficients; and taking the one or more candidate algorithms with the highest comprehensive score as the machine learning algorithm selection result.
Specifically, the algorithm selection knowledge base contains many algorithms. Fig. 2 is an example decision tree in an embodiment of the present invention, and the candidate algorithm set is determined based on the decision tree shown in Fig. 2. In the algorithm selection knowledge base, each class of algorithms contains specific algorithms arranged in levels, and the selection levels of the decision tree correspond to these levels. It should be noted that the algorithms in the embodiments of the present invention are machine learning algorithms. Further, the algorithms in the determined candidate algorithm set have the same reference value and approach but differ in training speed and accuracy, so all of them can serve as candidate algorithms. Table 1 describes the algorithms contained in some of the nodes.
For example, for a task that predicts watermelon quality, the conditions "has labels", "predicts a category", and "two classes" determine that the task belongs to the "binary classification" node, and the algorithms under the binary classification node are chosen as candidate algorithms. For projects that partly use meta-learning to assist algorithm selection, the candidate algorithm set needs to include the algorithms so selected.
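The watermelon example can be sketched as a walk down a small decision tree. This is a minimal sketch under stated assumptions: the tree layout, the question keys, and the algorithm names in each leaf are hypothetical, since the contents of Fig. 2 and Table 1 are not reproduced in this text.

```python
# Hypothetical decision tree: inner nodes ask a yes/no question,
# leaves hold the candidate algorithms of that node.
DECISION_TREE = {
    "question": "has_labels",
    True: {
        "question": "predicts_category",
        True: {
            "question": "two_classes",
            True: ["logistic_regression", "svm", "decision_tree"],   # binary classification
            False: ["multiclass_logistic_regression", "random_forest"],
        },
        False: ["linear_regression"],                                 # regression
    },
    False: ["k_means", "spectral_clustering"],                        # unsupervised
}


def select_candidates(node, task):
    """Descend level by level until a leaf (a list of algorithms) is reached."""
    while isinstance(node, dict):
        node = node[task[node["question"]]]
    return list(node)


# The watermelon-quality task: labeled data, predicts a category, two classes.
watermelon_task = {"has_labels": True, "predicts_category": True, "two_classes": True}
candidates = select_candidates(DECISION_TREE, watermelon_task)
```

A nested dictionary is used here only because it keeps the level-by-level descent visible; a knowledge base with many nodes would more likely store the tree in a database or configuration file.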
Table 1. Description of the algorithms contained in some of the nodes
In the method provided by the present invention, the candidate algorithms in the set selected by the decision tree are trained and used for prediction, and comprehensive scores are obtained to finally determine the selection result. The method therefore has strong learning and analysis capability, is very simple to implement, and yields good results. Because the decision tree in the algorithm selection knowledge base is used, the candidate algorithm set can be selected quickly.
Based on the above embodiment, before the candidate algorithm set is determined by the decision-tree selection method based on the algorithm selection knowledge base, the method further comprises: determining the top-level category to which the candidate algorithms belong, the top-level category being one of supervised learning, semi-supervised learning, and unsupervised learning. Correspondingly, determining the candidate algorithm set further comprises: based on the decision tree in the algorithm selection knowledge base and starting from the determined top-level category, selecting candidate algorithms level by level, the one or more candidate algorithms so selected forming the candidate algorithm set.
Specifically, supervised learning algorithms are further described below. A supervised learning algorithm makes predictions based on a set of samples. For example, past sales performance can be used to predict future price trends. In supervised learning, there is a set of input variables consisting of labeled training data and a set of output variables to be predicted. An algorithm analyzes the training data to learn a function that maps inputs to outputs. By generalizing from the training data, the inferred function can predict results in unseen situations and thus predict unknown new examples.
Classification: when data are used to predict a category, supervised learning handles this as a classification task. Labeling a picture as a cat or a dog is such a case. When there are only two class labels, this is binary classification; with more than two, it is multi-class classification.
Regression: when the predicted value is continuous, this is a regression problem. Regression is the process of predicting the future from past and present data, and its main application is trend analysis. A typical example is predicting next year's sales from the sales of this year and the year before last.
Anomaly detection: sometimes the goal is to identify data points that are simply unusual. In fraud detection, for example, any highly unusual credit card purchase pattern is suspicious. Fraud can vary in many ways, but training examples are few, so it is not possible to learn what fraudulent activity looks like. Anomaly detection instead learns only what normal activity looks like (using the historical record of non-fraudulent transactions) and identifies any activity that differs significantly from it.
Further, semi-supervised learning algorithms are described below. The main challenge of supervised learning is that labeled data are expensive and very time-consuming to obtain. If labels are limited, unlabeled data can be used to improve supervised learning. Because the machine is not fully supervised in this case, this is called semi-supervised learning. Semi-supervised learning can use unlabeled examples together with only a small amount of labeled data to improve learning accuracy.
Further, unsupervised learning algorithms are described below. In unsupervised learning, the machine uses only unlabeled data and is required to discover the inherent patterns hidden in the data, such as cluster structures, low-dimensional manifolds, or sparse trees and graphs.
Clustering: a set of data instances is grouped so that the instances in one group (cluster) are more similar to each other (according to some criteria) than to the instances in other groups. Clustering is often used to segment an entire dataset into several groups. Analysis can then be carried out within each group to help the user.
Dimensionality reduction: the number of variables under consideration is reduced. In many applications, the raw data have a very high feature dimension, and some features are redundant or irrelevant to the task. Dimensionality reduction helps to find the true, latent relationships.
Based on the above embodiment, determining the training and testing order of each candidate algorithm in the candidate algorithm set based on the multiple history parameters and the corresponding preset coefficients further comprises: obtaining the history score of any candidate algorithm by the following formula:
F' = aI' + bO' + cS' + dT' + eA';
where F' is the history score of the candidate algorithm; a is the preset coefficient for the data input resource consumption value and I' is the historical data input resource consumption value; b is the preset coefficient for the data output resource consumption value and O' is the historical data output resource consumption value; c is the preset coefficient for training-and-prediction memory and S' is the historical training-and-prediction memory; d is the preset coefficient for training-and-prediction time and T' is the historical training-and-prediction time; e is the preset coefficient for prediction accuracy and A' is the historical prediction accuracy. The history scores of all candidate algorithms are sorted from high to low, and the resulting order of the candidate algorithms is used as their training and testing order.
Specifically, any of the coefficients may be set to 0.
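The ordering step can be sketched as follows. This is a minimal sketch, not the patented implementation: the algorithm names, history values, and coefficient values are hypothetical, and the use of negative coefficients for the resource terms (so that higher consumption lowers the score) is an assumption that the text does not state. Two coefficients are set to 0 to show that a dimension can be ignored.

```python
from dataclasses import dataclass


@dataclass
class History:
    name: str
    I: float  # historical data input resource consumption value (relative)
    O: float  # historical data output resource consumption value (relative)
    S: float  # historical training-and-prediction memory (relative)
    T: float  # historical training-and-prediction time (relative)
    A: float  # historical prediction accuracy


def history_score(h, coeffs):
    """F' = aI' + bO' + cS' + dT' + eA'."""
    a, b, c, d, e = coeffs
    return a * h.I + b * h.O + c * h.S + d * h.T + e * h.A


def training_test_order(histories, coeffs):
    """Sort candidate algorithms by history score, highest first."""
    ranked = sorted(histories, key=lambda h: history_score(h, coeffs), reverse=True)
    return [h.name for h in ranked]


# Hypothetical coefficients: a = b = 0 ignores data I/O; negative c and d
# penalize memory and time (assumption); e rewards accuracy.
COEFFS = (0.0, 0.0, -0.2, -0.2, 1.0)
histories = [
    History("svm", 1.0, 1.0, 1.5, 4.0, 0.85),
    History("random_forest", 1.0, 1.0, 2.0, 3.0, 0.90),
    History("logistic_regression", 1.0, 1.0, 1.0, 1.0, 0.80),
]
order = training_test_order(histories, COEFFS)
```

With these illustrative numbers the cheap, reasonably accurate algorithm is trained and tested first, which is the effect the ordering is meant to achieve.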
By setting preset coefficients and scoring along five different dimensions, the method provided by the present invention is better able to obtain the most suitable algorithm.
Based on the above embodiment, the step of training the candidate algorithms in the training and testing order and predicting on the determined test set further comprises: training the candidate algorithms in the candidate algorithm set one by one in the training and testing order on the determined training set, obtaining a trained model for each candidate algorithm, and obtaining each candidate algorithm's training data input resource consumption value, training data output resource consumption value, training time, and training memory; predicting on the determined test set with each trained model, and obtaining each candidate algorithm's prediction data input resource consumption value, prediction data output resource consumption value, prediction time, prediction memory, and prediction accuracy; computing a weighted sum of the training data input resource consumption value and the prediction data input resource consumption value to obtain the data input resource consumption value; computing a weighted sum of the training data output resource consumption value and the prediction data output resource consumption value to obtain the data output resource consumption value; computing a weighted sum of the training time and the prediction time to obtain the training-and-prediction time; computing a weighted sum of the training memory and the prediction memory to obtain the training-and-prediction memory; and taking the data input resource consumption value, the data output resource consumption value, the training-and-prediction time, the training-and-prediction memory, and the prediction accuracy as the multiple comprehensive scoring parameters.
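The weighted sums described above can be sketched as follows. The dictionary keys, the function names, and the equal default weights are assumptions made for illustration; the text does not specify the weights.

```python
def weighted_sum(train_value, predict_value, w_train=0.5, w_predict=0.5):
    """Combine a training-phase and a prediction-phase measurement."""
    return w_train * train_value + w_predict * predict_value


def scoring_parameters(train, predict, accuracy, w_train=0.5, w_predict=0.5):
    """Build the five comprehensive scoring parameters I, O, S, T, A."""
    def combine(key):
        return weighted_sum(train[key], predict[key], w_train, w_predict)

    return {
        "I": combine("input"),   # data input resource consumption value
        "O": combine("output"),  # data output resource consumption value
        "S": combine("memory"),  # training-and-prediction memory
        "T": combine("time"),    # training-and-prediction time
        "A": accuracy,           # prediction accuracy, taken from the test set
    }


# Illustrative relative measurements (assumption, not data from the patent).
train_measured = {"input": 2.0, "output": 1.0, "time": 4.0, "memory": 2.0}
predict_measured = {"input": 1.0, "output": 1.0, "time": 2.0, "memory": 1.0}
params = scoring_parameters(train_measured, predict_measured, accuracy=0.9)
```

Unequal weights would let a deployment that predicts far more often than it trains emphasize the prediction-phase cost.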
Based on the above embodiment, the comprehensive score of each candidate algorithm in the candidate algorithm set is obtained from the multiple comprehensive scoring parameters and the multiple preset coefficients by the following formula:
F = aI + bO + cS + dT + eA;
where F is the comprehensive score of the candidate algorithm; a is the preset coefficient for the data input resource consumption value and I is the data input resource consumption value; b is the preset coefficient for the data output resource consumption value and O is the data output resource consumption value; c is the preset coefficient for training-and-prediction memory and S is the training-and-prediction memory; d is the preset coefficient for training-and-prediction time and T is the training-and-prediction time; e is the preset coefficient for prediction accuracy and A is the prediction accuracy.
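The final scoring and selection can be sketched as below. All names and numbers are hypothetical, and the negative coefficients on the resource terms are an assumption (the text does not fix the signs). Ties at the top score are all returned, matching "one or more candidate algorithms with the highest comprehensive score".

```python
def comprehensive_score(params, coeffs):
    """F = aI + bO + cS + dT + eA, with coeffs keyed like the parameters."""
    return sum(coeffs[key] * params[key] for key in ("I", "O", "S", "T", "A"))


def select_best(measured, coeffs):
    """Return the top-scoring algorithm(s) and the score of every candidate."""
    scores = {name: comprehensive_score(p, coeffs) for name, p in measured.items()}
    best = max(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    return winners, scores


# Hypothetical preset coefficients and measured scoring parameters.
COEFFS = {"I": -0.05, "O": -0.05, "S": -0.1, "T": -0.1, "A": 1.0}
measured = {
    "logistic_regression": {"I": 1.0, "O": 1.0, "S": 1.0, "T": 1.0, "A": 0.80},
    "random_forest": {"I": 1.0, "O": 1.0, "S": 1.2, "T": 1.5, "A": 0.95},
}
winners, scores = select_best(measured, COEFFS)
```

Because the same coefficients are used for the history score and the comprehensive score, a user who changes a coefficient changes both the order in which algorithms are tried and the final ranking.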
Specifically, the training resource consumption parameters, namely the training data input resource consumption value, training data output resource consumption value, training time, and training memory, are not the absolute values of the specific quantities. Instead, a standard is chosen as a reference and each parameter is given as a value relative to it, which simplifies the subsequent calculation. The hyperparameters needed for training an algorithm may be preset, or other hyperparameter optimization tools may be used; the final training resource consumption parameter values are those obtained under the optimal hyperparameters. The same applies to the prediction resource consumption parameters corresponding to the prediction data input resource consumption value, prediction data output resource consumption value, prediction time, prediction memory, and prediction accuracy.
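One simple way to obtain such relative values is to divide each measurement by the corresponding measurement of a reference run. This is only a sketch under that assumption: the text says a standard is chosen as a reference but does not say how the relative value is computed, and the key names and numbers below are hypothetical.

```python
def to_relative(measured, reference):
    """Express each measured quantity as a ratio to the reference run."""
    return {key: measured[key] / reference[key] for key in measured}


# Hypothetical absolute measurements: a baseline run and one candidate algorithm.
reference_run = {"time_s": 10.0, "memory_mb": 256.0}
candidate_run = {"time_s": 30.0, "memory_mb": 512.0}
relative = to_relative(candidate_run, reference_run)
```

Ratios remove the units, so time and memory can be combined in one weighted score without one of them dominating purely because of its scale.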
Based on the above embodiment, between determining the candidate algorithm set by the decision-tree selection method and determining the training and testing order of each candidate algorithm in the candidate algorithm set, the method further comprises: performing feature extraction and feature selection on each data item in a determined dataset to obtain the features of each data item; and, based on the features of each data item and the categories of all algorithms, dividing the data in the determined dataset into the determined training set and the determined test set, wherein all the algorithms come from the algorithm selection knowledge base.
Specifically, feature extraction and feature selecting be all found out from primitive character it is most effective (consistency of similar sample, The distinctive of different samples, the robustness to noise) feature.
Further, feature extraction:Primitive character is converted to one group has obvious physical significance (Gabor, geometric properties [angle point, invariant], texture [LBP HOG]) or statistical significance or the feature of core.
Feature selecting:The feature of one group of most statistical significance is selected from characteristic set.
Both feature extraction and feature selecting can reduce data storage and input data bandwidth, reduce redundancy, can send out Existing more meaningful potential variable, help to produce data deeper into understanding.
Such as image, SIFT (Scale-invariant feature transform) is that a kind of detection is local The method of feature, it finds extreme point in space scale to a width figure, and extracts its position, scale, rotational invariants etc. Description, obtains feature and carries out Image Feature Point Matching, can be used to detect and the locality characteristic in description image.It is base In some local features on object, it maintains the invariance rotation, scaling, brightness change, to visual angle change, affine change Change, noise also keeps a degree of stability.
The data are then divided into a training set S and a test set T according to the algorithm type and the data features. Several methods can be used for this step, such as the hold-out method, cross-validation, and the bootstrap method.
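The three splitting methods named above can be sketched on index lists as follows. This is a minimal illustration using only the Python standard library; the function names, the default ratio, and the seeds are assumptions, and a real system would more likely rely on an existing library.

```python
import random

def holdout_split(n, test_ratio=0.3, seed=0):
    """Hold-out: shuffle the indices once and cut them into train and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_ratio))
    return idx[n_test:], idx[:n_test]

def kfold_splits(n, k=5, seed=0):
    """Cross-validation: yield (train, test) index lists, one pair per fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def bootstrap_split(n, seed=0):
    """Bootstrap: draw n indices with replacement; indices never drawn form the test set."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    test = [i for i in range(n) if i not in drawn]
    return train, test

train_idx, test_idx = holdout_split(10)
```

Hold-out is the cheapest option, cross-validation uses every item for testing exactly once, and the bootstrap keeps the training set at full size at the cost of repeated items.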
Table 2 lists the data set features that correspond to common clustering algorithms.
Table 2: Data set features corresponding to clustering algorithms
For example, if the data for a given task cannot be converted into vectors in an N-dimensional Euclidean space and only a similarity matrix between data items is available, algorithms such as K-means must be rejected and algorithms such as spectral clustering preferred.
Based on the above embodiment, after performing feature extraction and feature selection on each data item in the determined data set to obtain the features of each data item, the method further includes: identifying unsuitable algorithms based on the features of each data item, and deleting the unsuitable algorithms from the candidate algorithm set.
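Deleting unsuitable algorithms can be sketched as a lookup against a requirement table. The table `REQUIREMENTS`, the property labels, and the function name `prune_unsuitable` are hypothetical and stand in for whatever the knowledge base actually records; the example reproduces the K-means versus spectral clustering case described above.

```python
# Hypothetical requirement table: the data properties each algorithm needs.
REQUIREMENTS = {
    "k-means": {"euclidean_vectors"},
    "spectral-clustering": {"similarity_matrix"},
}

def prune_unsuitable(candidates, data_properties):
    """Split candidates into (kept, removed) by whether the data meets each algorithm's needs."""
    kept, removed = [], []
    for name in candidates:
        needs = REQUIREMENTS.get(name, set())
        # An algorithm is suitable only if every property it needs is available.
        (kept if needs <= data_properties else removed).append(name)
    return kept, removed

# Only pairwise similarities are available; the data cannot be embedded as vectors.
kept, removed = prune_unsuitable(
    ["k-means", "spectral-clustering"],
    {"similarity_matrix"},
)
# kept == ["spectral-clustering"], removed == ["k-means"]
```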
Based on the above embodiment, the data in the determined data set are divided into the determined training set and the determined test set, based on the features of each data item and the categories of all algorithms, by any one of the hold-out method, cross-validation, and the bootstrap method, wherein all algorithms come from the algorithm selection knowledge base.
Based on the above embodiment, before determining the candidate algorithm set from the algorithm selection knowledge base by the decision-tree selection method, the method further includes: assisting a warm start of the machine learning algorithm by Bayesian optimization and meta-learning.
Domain experts draw on knowledge from earlier tasks: they learn the performance characteristics of machine learning algorithms. Meta-learning imitates this strategy by reasoning about the performance of learning algorithms across data sets. In this work, meta-learning is used to select algorithms that are likely to perform well on a new data set. More specifically, for a large number of data sets, performance data and a set of meta-features are collected; meta-features are characteristics of a data set that can be computed efficiently and that help determine which algorithm to use on a new data set.
This meta-learning method complements Bayesian optimization in optimizing a machine learning framework. Meta-learning can quickly propose some instantiations of the machine learning framework that are likely to perform fairly well, but it cannot provide fine-grained information about performance.
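One common way to realize such a warm start is a nearest-neighbour lookup in meta-feature space: the algorithms that performed best on the most similar earlier data sets become the first candidates handed to Bayesian optimization. The sketch below assumes this approach; the `HISTORY` records, the choice of meta-features (sample count, feature count, class count), and the log-scaled Euclidean distance are all hypothetical and are not specified by the text.

```python
import math

# Hypothetical records from earlier tasks: meta-features of each data set
# (n_samples, n_features, n_classes) and the algorithm that performed best on it.
HISTORY = [
    {"meta": (1000, 20, 2), "best": "random-forest"},
    {"meta": (1200, 25, 2), "best": "gradient-boosting"},
    {"meta": (50000, 300, 10), "best": "linear-svm"},
]

def distance(a, b):
    """Euclidean distance between log-scaled meta-feature vectors."""
    return math.sqrt(sum((math.log1p(x) - math.log1p(y)) ** 2 for x, y in zip(a, b)))

def warm_start(meta, k=2):
    """Return the best algorithms of the k most similar earlier data sets, nearest first."""
    ranked = sorted(HISTORY, key=lambda rec: distance(rec["meta"], meta))
    seeds = []
    for rec in ranked[:k]:
        if rec["best"] not in seeds:
            seeds.append(rec["best"])
    return seeds

seeds = warm_start((1100, 22, 2))
# seeds == ["random-forest", "gradient-boosting"]
```

The lookup is fast but coarse, which matches the division of labour described above: meta-learning proposes starting points, and Bayesian optimization then refines them with fine-grained performance information.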
Based on the above embodiment, the prediction accuracy is any one of metrics such as precision, recall, and AUC value.
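For binary labels, these three metrics can be computed as in the following sketch. The function names are chosen for this example; AUC is computed here through its rank interpretation (the probability that a randomly chosen positive item receives a higher score than a randomly chosen negative one, with ties counted as one half), which is simple but quadratic in the number of items.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative item")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 0, 1, 1, 0]
precision, recall = precision_recall(y_true, [1, 1, 1, 0, 0])  # 2/3 and 2/3
area = auc(y_true, [0.9, 0.8, 0.7, 0.4, 0.2])                  # 4/6
```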
As a preferred embodiment, Fig. 3 is a flow chart of a method for automatically selecting a machine learning algorithm in an embodiment of the present invention. This embodiment is described with reference to Fig. 3.
First, the top-level category of the candidate algorithm is determined; the top-level categories include the supervised learning class, the semi-supervised learning class, and the unsupervised learning class.
Next, a warm start of the machine learning algorithm is assisted by Bayesian optimization and meta-learning.
Next, the candidate algorithm set is determined from the algorithm selection knowledge base by the decision-tree selection method.
Next, feature extraction and feature selection are performed on each data item in the determined data set to obtain the features of each data item; based on the features of each data item and the categories of all algorithms, the data in the determined data set are divided into the determined training set and the determined test set, wherein all algorithms come from the algorithm selection knowledge base.
Next, unsuitable algorithms are identified based on the features of each data item and deleted from the candidate algorithm set.
Next, the training-and-testing order of each candidate algorithm in the candidate algorithm set is determined based on multiple history parameters and the preset coefficients corresponding to those history parameters.
Next, following the training-and-testing order, the candidate algorithms in the candidate algorithm set are trained one by one on the determined training set to obtain the trained model corresponding to each candidate algorithm; each trained model is then used to predict on the determined test set, yielding, for each candidate algorithm, multiple comprehensive scoring parameters corresponding to the multiple history parameters.
Next, the comprehensive score of each candidate algorithm in the candidate algorithm set is obtained based on the multiple comprehensive scoring parameters and the multiple preset coefficients.
Finally, the one or more candidate algorithms with the highest comprehensive score are taken as the machine learning algorithm selection result.
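The candidate-selection and ordering steps of this flow can be sketched as follows: a knowledge-base decision tree is walked layer by layer from the top-level category, and the resulting candidates are sorted by the history score F' = aI' + bO' + cS' + dT' + eA' given in claim 3. The nested-dictionary layout of `KNOWLEDGE_BASE`, the algorithm names, the history figures, and the coefficient values are all hypothetical. In particular, the negative coefficients on the four cost terms are an assumption made so that a higher score is better; this section does not state the sign convention.

```python
# Hypothetical knowledge base: inner nodes map a branch label to a subtree,
# leaves are lists of algorithm names.
KNOWLEDGE_BASE = {
    "supervised": {
        "classification": ["logistic-regression", "random-forest"],
        "regression": ["linear-regression"],
    },
    "semi-supervised": {"classification": ["label-propagation"]},
    "unsupervised": {"clustering": ["k-means", "spectral-clustering"]},
}

def select_candidates(tree, path):
    """Descend one layer per path element, then collect every algorithm below that node."""
    node = tree
    for branch in path:
        node = node[branch]
    stack, found = [node], []
    while stack:
        current = stack.pop()
        if isinstance(current, dict):
            # Reverse so that subtrees are visited in their stored order.
            stack.extend(reversed(list(current.values())))
        else:
            found.extend(current)
    return found

def history_score(h, coef):
    """F' = aI' + bO' + cS' + dT' + eA' over the five history parameters."""
    return sum(coef[key] * h[key] for key in ("I", "O", "S", "T", "A"))

def training_order(candidates, history, coef):
    """Sort candidates from the highest history score to the lowest."""
    return sorted(candidates, key=lambda name: history_score(history[name], coef), reverse=True)

# Assumed coefficients: costs count against an algorithm, accuracy counts for it.
COEF = {"I": -1.0, "O": -1.0, "S": -1.0, "T": -1.0, "A": 10.0}
HISTORY = {
    "logistic-regression": {"I": 0.1, "O": 0.1, "S": 0.2, "T": 0.1, "A": 0.80},
    "random-forest": {"I": 0.1, "O": 0.1, "S": 0.5, "T": 0.4, "A": 0.88},
}

candidates = select_candidates(KNOWLEDGE_BASE, ["supervised", "classification"])
order = training_order(candidates, HISTORY, COEF)
# order == ["random-forest", "logistic-regression"]
```

Training the historically stronger candidates first means that a good result is available early if the run is cut short.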
Based on the above embodiment, Fig. 4 is a block diagram of a system for automatically selecting a machine learning algorithm in an embodiment of the present invention. As shown in Fig. 4, the system includes: a candidate algorithm set determination module, for determining the candidate algorithm set from the algorithm selection knowledge base by the decision-tree selection method; a priority determination module, for determining the training-and-testing order of each candidate algorithm in the candidate algorithm set based on multiple history parameters and the preset coefficients corresponding to those history parameters; a training and testing module, for training the candidate algorithms in the candidate algorithm set one by one on the determined training set according to the training-and-testing order to obtain the trained model corresponding to each candidate algorithm, and for predicting on the determined test set with each trained model to obtain, for each candidate algorithm, multiple comprehensive scoring parameters corresponding to the multiple history parameters; a comprehensive score acquisition module, for obtaining the comprehensive score of each candidate algorithm in the candidate algorithm set based on the multiple comprehensive scoring parameters and the multiple preset coefficients; and a selection result acquisition module, for taking the one or more candidate algorithms with the highest comprehensive score as the machine learning algorithm selection result.
The method and system for automatically selecting a machine learning algorithm provided by the present invention set up a selection decision tree, train and predict with the candidate algorithms in the resulting set, and obtain comprehensive scores to determine the final selection result. The approach has strong learning and analysis capability, is very simple to implement, and achieves good results. Because a decision tree in the algorithm selection knowledge base is used, the candidate algorithm set can be chosen quickly. Setting preset coefficients over five different dimensions makes it easier to arrive at the most suitable algorithm. When the algorithm selection knowledge base and tool provided by the present invention are used to select machine learning algorithms, the selected algorithm is essentially the same as, or close to, the one chosen by experts, and the experimental results demonstrate the effectiveness of the selection method. The selection method is highly adaptable and can be applied to a variety of machine learning frameworks and systems. It effectively achieves the goal of automatically selecting a suitable machine learning algorithm, and is intuitive, effective, and easy to use.
Finally, the methods described above are only preferred embodiments and are not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

  1. A machine learning algorithm selection method, characterized by comprising:
    determining a candidate algorithm set from an algorithm selection knowledge base by a decision-tree selection method;
    determining the training-and-testing order of each candidate algorithm in the candidate algorithm set based on multiple history parameters and multiple preset coefficients corresponding to the multiple history parameters;
    training the candidate algorithms in the candidate algorithm set one by one according to the training-and-testing order and based on a determined training set to obtain the trained model corresponding to each candidate algorithm, and predicting on a determined test set based on the trained model corresponding to each candidate algorithm to obtain, for each candidate algorithm, multiple comprehensive scoring parameters corresponding to the multiple history parameters;
    obtaining the comprehensive score of each candidate algorithm in the candidate algorithm set based on the multiple comprehensive scoring parameters and the multiple preset coefficients;
    taking the one or more candidate algorithms with the highest comprehensive score as the machine learning algorithm selection result.
  2. The selection method according to claim 1, characterized in that, before the determining of the candidate algorithm set from the algorithm selection knowledge base by the decision-tree selection method, the method further comprises:
    determining the top-level category of the candidate algorithm, the top-level categories including the supervised learning class, the semi-supervised learning class, and the unsupervised learning class;
    correspondingly, the determining of the candidate algorithm set from the algorithm selection knowledge base by the decision-tree selection method further comprises:
    based on the decision tree in the algorithm selection knowledge base and starting from the determined top-level category of the candidate algorithm, choosing candidate algorithms layer by layer, the one or more candidate algorithms so chosen forming the candidate algorithm set.
  3. The selection method according to claim 1, characterized in that the determining of the training-and-testing order of each candidate algorithm in the candidate algorithm set based on multiple history parameters and multiple preset coefficients corresponding to the multiple history parameters further comprises:
    obtaining the history score of any candidate algorithm, based on the multiple history parameters and the multiple preset coefficients corresponding to the multiple history parameters, by the following formula:
    F' = aI' + bO' + cS' + dT' + eA';
    wherein F' is the history score of the candidate algorithm, a is the preset data input resource consumption coefficient, I' is the historical data input resource consumption value, b is the preset data output resource consumption coefficient, O' is the historical data output resource consumption value, c is the preset training-and-prediction memory coefficient, S' is the historical training-and-prediction memory, d is the preset training-and-prediction time coefficient, T' is the historical training-and-prediction time, e is the preset prediction accuracy coefficient, and A' is the historical prediction accuracy;
    sorting the history scores of all candidate algorithms from high to low, and taking the resulting order of the candidate algorithms as the training-and-testing order of the candidate algorithms.
  4. The selection method according to claim 3, characterized in that the training of the candidate algorithms in the candidate algorithm set one by one according to the training-and-testing order and based on the determined training set to obtain the trained model corresponding to each candidate algorithm, and the predicting on the determined test set based on the trained model corresponding to each candidate algorithm to obtain, for each candidate algorithm, multiple comprehensive scoring parameters corresponding to the multiple history parameters, further comprise:
    training the candidate algorithms in the candidate algorithm set one by one according to the training-and-testing order and based on the determined training set to obtain the trained model corresponding to each candidate algorithm, and obtaining the training data input resource consumption value, the training data output resource consumption value, the training time, and the training memory of each candidate algorithm;
    predicting on the determined test set based on the trained model corresponding to each candidate algorithm, and obtaining the prediction data input resource consumption value, the prediction data output resource consumption value, the prediction time, the prediction memory, and the prediction accuracy of each candidate algorithm;
    computing a weighted sum of the training data input resource consumption value and the prediction data input resource consumption value to obtain the data input resource consumption value;
    computing a weighted sum of the training data output resource consumption value and the prediction data output resource consumption value to obtain the data output resource consumption value;
    computing a weighted sum of the training time and the prediction time to obtain the training-and-prediction time;
    computing a weighted sum of the training memory and the prediction memory to obtain the training-and-prediction memory;
    taking the data input resource consumption value, the data output resource consumption value, the training-and-prediction time, the training-and-prediction memory, and the prediction accuracy as the multiple comprehensive scoring parameters.
  5. The selection method according to claim 4, characterized in that the comprehensive score of each candidate algorithm in the candidate algorithm set is obtained, based on the multiple comprehensive scoring parameters and the multiple preset coefficients, by the following formula:
    F = aI + bO + cS + dT + eA;
    wherein F is the comprehensive score of any candidate algorithm, a is the preset data input resource consumption coefficient, I is the data input resource consumption value, b is the preset data output resource consumption coefficient, O is the data output resource consumption value, c is the preset training-and-prediction memory coefficient, S is the training-and-prediction memory, d is the preset training-and-prediction time coefficient, T is the training-and-prediction time, e is the preset prediction accuracy coefficient, and A is the prediction accuracy.
  6. The selection method according to claim 1, characterized in that, between the determining of the candidate algorithm set from the algorithm selection knowledge base by the decision-tree selection method and the determining of the training-and-testing order of each candidate algorithm in the candidate algorithm set based on multiple history parameters and multiple preset coefficients corresponding to the multiple history parameters, the method further comprises:
    performing feature extraction and feature selection on each data item in a determined data set to obtain the features of each data item;
    based on the features of each data item and the categories of all algorithms, dividing the data in the determined data set into the determined training set and the determined test set, wherein all algorithms come from the algorithm selection knowledge base.
  7. The selection method according to claim 6, characterized in that, after the performing of feature extraction and feature selection on each data item in the determined data set to obtain the features of each data item, the method further comprises:
    identifying unsuitable algorithms based on the features of each data item, and deleting the unsuitable algorithms from the candidate algorithm set.
  8. The selection method according to claim 1, characterized in that, before the determining of the candidate algorithm set from the algorithm selection knowledge base by the decision-tree selection method, the method further comprises:
    assisting a warm start of the machine learning algorithm by Bayesian optimization and meta-learning.
  9. The selection method according to claim 5, characterized in that the prediction accuracy is any one of precision, recall, and AUC value.
  10. A machine learning algorithm selection system, characterized by comprising:
    a candidate algorithm set determination module, for determining a candidate algorithm set from an algorithm selection knowledge base by a decision-tree selection method;
    a priority determination module, for determining the training-and-testing order of each candidate algorithm in the candidate algorithm set based on multiple history parameters and multiple preset coefficients corresponding to the multiple history parameters;
    a training and testing module, for training the candidate algorithms in the candidate algorithm set one by one according to the training-and-testing order and based on a determined training set to obtain the trained model corresponding to each candidate algorithm, and for predicting on a determined test set based on the trained model corresponding to each candidate algorithm to obtain, for each candidate algorithm, multiple comprehensive scoring parameters corresponding to the multiple history parameters;
    a comprehensive score acquisition module, for obtaining the comprehensive score of each candidate algorithm in the candidate algorithm set based on the multiple comprehensive scoring parameters and the multiple preset coefficients;
    a selection result acquisition module, for taking the one or more candidate algorithms with the highest comprehensive score as the machine learning algorithm selection result.
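The scoring described in claims 4 and 5 can be sketched as follows: training and prediction measurements are merged by weighted sums into the parameters I, O, S, and T, the prediction accuracy A is added, and the comprehensive score F = aI + bO + cS + dT + eA decides the selection. The `Measurement` class, the equal weights of 0.5, the measurement figures, and the coefficient values are hypothetical. As before, negative coefficients on the cost terms are an assumption; the claims do not fix the weights or the signs.

```python
from dataclasses import dataclass

@dataclass
class Measurement:
    """Resource use of one phase (training or prediction) of one algorithm."""
    data_in: float
    data_out: float
    memory: float
    time: float

def combine(train, predict, w_train=0.5, w_predict=0.5):
    """Weighted sums of the training and prediction measurements (claim 4)."""
    return {
        "I": w_train * train.data_in + w_predict * predict.data_in,
        "O": w_train * train.data_out + w_predict * predict.data_out,
        "S": w_train * train.memory + w_predict * predict.memory,
        "T": w_train * train.time + w_predict * predict.time,
    }

def comprehensive_score(params, coef):
    """F = aI + bO + cS + dT + eA (claim 5)."""
    return sum(coef[key] * params[key] for key in ("I", "O", "S", "T", "A"))

def select_best(results, coef, top_n=1):
    """Return the names of the top_n algorithms by comprehensive score, best first."""
    scored = {name: comprehensive_score(p, coef) for name, p in results.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_n]

# Assumed coefficients: costs count against an algorithm, accuracy counts for it.
COEF = {"I": -0.5, "O": -0.5, "S": -0.5, "T": -0.5, "A": 10.0}

results = {
    "random-forest": {
        **combine(Measurement(2, 1, 4, 6), Measurement(1, 1, 2, 2)),
        "A": 0.9,
    },
    "logistic-regression": {
        **combine(Measurement(2, 1, 1, 1), Measurement(1, 1, 1, 1)),
        "A": 0.8,
    },
}
best = select_best(results, COEF)
# best == ["logistic-regression"]
```

With these figures the slightly less accurate but much cheaper algorithm wins, which shows how the preset coefficients trade accuracy against resource consumption.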
CN201711354616.9A 2017-12-15 2017-12-15 A kind of machine learning algorithm automatic selecting method and system Active CN108009643B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711354616.9A CN108009643B (en) 2017-12-15 2017-12-15 A kind of machine learning algorithm automatic selecting method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711354616.9A CN108009643B (en) 2017-12-15 2017-12-15 A kind of machine learning algorithm automatic selecting method and system

Publications (2)

Publication Number Publication Date
CN108009643A true CN108009643A (en) 2018-05-08
CN108009643B CN108009643B (en) 2018-10-30

Family

ID=62059505

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711354616.9A Active CN108009643B (en) 2017-12-15 2017-12-15 A kind of machine learning algorithm automatic selecting method and system

Country Status (1)

Country Link
CN (1) CN108009643B (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376419A (en) * 2018-10-16 2019-02-22 北京字节跳动网络技术有限公司 A kind of method, apparatus of data modeling, electronic equipment and readable medium
CN109933834A (en) * 2018-12-26 2019-06-25 阿里巴巴集团控股有限公司 A kind of model creation method and device of time series data prediction
CN109992866A (en) * 2019-03-25 2019-07-09 新奥数能科技有限公司 Training method, device, readable medium and the electronic equipment of load forecasting model
CN110008121A (en) * 2019-03-19 2019-07-12 合肥中科类脑智能技术有限公司 A kind of personalization test macro and its test method
CN110263982A (en) * 2019-05-30 2019-09-20 百度在线网络技术(北京)有限公司 The optimization method and device of ad click rate prediction model
CN110298032A (en) * 2019-05-29 2019-10-01 西南电子技术研究所(中国电子科技集团公司第十研究所) Text classification corpus labeling training system
WO2020011068A1 (en) * 2018-07-10 2020-01-16 第四范式(北京)技术有限公司 Method and system for executing machine learning process
CN111210023A (en) * 2020-01-13 2020-05-29 哈尔滨工业大学 Automatic selection system and method for data set classification learning algorithm
TWI712981B (en) * 2018-12-13 2020-12-11 開曼群島商創新先進技術有限公司 Risk identification model training method, device and server
CN112988384A (en) * 2021-03-19 2021-06-18 深圳前海黑顿科技有限公司 Scene-based algorithm resource automatic integration calling method
US20210342998A1 (en) * 2020-05-01 2021-11-04 Samsung Electronics Co., Ltd. Systems and methods for quantitative evaluation of optical map quality and for data augmentation automation
CN113626331A (en) * 2021-08-12 2021-11-09 曙光信息产业(北京)有限公司 Communication algorithm selection method and device, computer equipment and storage medium
CN114492214A (en) * 2022-04-18 2022-05-13 支付宝(杭州)信息技术有限公司 Method and device for determining selection operator and optimizing strategy combination by using machine learning
WO2022218633A1 (en) * 2021-04-13 2022-10-20 British Telecommunications Public Limited Company Algorithm selection for processor-controlled device
CN115658371A (en) * 2022-12-14 2023-01-31 北京航空航天大学 Diagnosis algorithm quantitative recommendation method based on case learning and diagnosability analysis
US11645572B2 (en) 2020-01-17 2023-05-09 Nec Corporation Meta-automated machine learning with improved multi-armed bandit algorithm for selecting and tuning a machine learning algorithm
US11687795B2 (en) 2019-02-19 2023-06-27 International Business Machines Corporation Machine learning engineering through hybrid knowledge representation
CN116701652A (en) * 2023-06-13 2023-09-05 上海沄熹科技有限公司 Machine learning-based database intelligent operation and maintenance system and method
CN116862643A (en) * 2023-06-25 2023-10-10 福建润楼数字科技有限公司 Automatic wind control feature screening method for multi-channel fund integration credit business

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101782976A (en) * 2010-01-15 2010-07-21 南京邮电大学 Automatic selection method for machine learning in cloud computing environment
CN104182770A (en) * 2013-05-24 2014-12-03 塔塔咨询服务有限公司 Method and system for automatic selection of one or more image processing algorithm
CN104376366A (en) * 2013-08-14 2015-02-25 华为技术有限公司 Method and device for selecting optimal network maximum flow algorithm
CN106250986A (en) * 2015-06-04 2016-12-21 波音公司 Advanced analysis base frame for machine learning
US20170286839A1 (en) * 2016-04-05 2017-10-05 BigML, Inc. Selection of machine learning algorithms

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MATTHEW C. SIMPSON ET AL.: "Automatic Algorithm Selection in Computational", 《2016 15TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS》 *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020011068A1 (en) * 2018-07-10 2020-01-16 第四范式(北京)技术有限公司 Method and system for executing machine learning process
CN109376419A (en) * 2018-10-16 2019-02-22 北京字节跳动网络技术有限公司 A kind of method, apparatus of data modeling, electronic equipment and readable medium
CN109376419B (en) * 2018-10-16 2023-12-22 北京字节跳动网络技术有限公司 Data model generation method and device, electronic equipment and readable medium
TWI712981B (en) * 2018-12-13 2020-12-11 開曼群島商創新先進技術有限公司 Risk identification model training method, device and server
CN109933834B (en) * 2018-12-26 2023-06-27 创新先进技术有限公司 Model creation method and device for time sequence data prediction
CN109933834A (en) * 2018-12-26 2019-06-25 阿里巴巴集团控股有限公司 A kind of model creation method and device of time series data prediction
US11687795B2 (en) 2019-02-19 2023-06-27 International Business Machines Corporation Machine learning engineering through hybrid knowledge representation
CN110008121A (en) * 2019-03-19 2019-07-12 合肥中科类脑智能技术有限公司 A kind of personalization test macro and its test method
CN110008121B (en) * 2019-03-19 2022-07-12 合肥中科类脑智能技术有限公司 Personalized test system and test method thereof
CN109992866A (en) * 2019-03-25 2019-07-09 新奥数能科技有限公司 Training method, device, readable medium and the electronic equipment of load forecasting model
CN109992866B (en) * 2019-03-25 2022-11-29 新奥数能科技有限公司 Training method and device of load prediction model, readable medium and electronic equipment
CN110298032A (en) * 2019-05-29 2019-10-01 西南电子技术研究所(中国电子科技集团公司第十研究所) Text classification corpus labeling training system
CN110298032B (en) * 2019-05-29 2022-06-14 西南电子技术研究所(中国电子科技集团公司第十研究所) Text classification corpus labeling training system
CN110263982A (en) * 2019-05-30 2019-09-20 百度在线网络技术(北京)有限公司 The optimization method and device of ad click rate prediction model
CN111210023B (en) * 2020-01-13 2023-04-11 哈尔滨工业大学 Automatic selection system and method for data set classification learning algorithm
CN111210023A (en) * 2020-01-13 2020-05-29 哈尔滨工业大学 Automatic selection system and method for data set classification learning algorithm
US11645572B2 (en) 2020-01-17 2023-05-09 Nec Corporation Meta-automated machine learning with improved multi-armed bandit algorithm for selecting and tuning a machine learning algorithm
US20210342998A1 (en) * 2020-05-01 2021-11-04 Samsung Electronics Co., Ltd. Systems and methods for quantitative evaluation of optical map quality and for data augmentation automation
US11847771B2 (en) * 2020-05-01 2023-12-19 Samsung Electronics Co., Ltd. Systems and methods for quantitative evaluation of optical map quality and for data augmentation automation
CN112988384A (en) * 2021-03-19 2021-06-18 深圳前海黑顿科技有限公司 Scene-based algorithm resource automatic integration calling method
WO2022218633A1 (en) * 2021-04-13 2022-10-20 British Telecommunications Public Limited Company Algorithm selection for processor-controlled device
CN113626331A (en) * 2021-08-12 2021-11-09 曙光信息产业(北京)有限公司 Communication algorithm selection method and device, computer equipment and storage medium
CN114492214A (en) * 2022-04-18 2022-05-13 支付宝(杭州)信息技术有限公司 Method and device for determining selection operator and optimizing strategy combination by using machine learning
CN115658371A (en) * 2022-12-14 2023-01-31 北京航空航天大学 Diagnosis algorithm quantitative recommendation method based on case learning and diagnosability analysis
CN116701652A (en) * 2023-06-13 2023-09-05 上海沄熹科技有限公司 Machine learning-based database intelligent operation and maintenance system and method
CN116862643A (en) * 2023-06-25 2023-10-10 福建润楼数字科技有限公司 Automatic wind control feature screening method for multi-channel fund integration credit business

Also Published As

Publication number Publication date
CN108009643B (en) 2018-10-30

Similar Documents

Publication Publication Date Title
CN108009643B (en) A kind of machine learning algorithm automatic selecting method and system
Ando et al. Deep over-sampling framework for classifying imbalanced data
Wang et al. Kernelized subspace ranking for saliency detection
Wang et al. Transferring deep object and scene representations for event recognition in still images
WO2017133188A1 (en) Method and device for determining feature set
Malik et al. Applied unsupervised learning with R: Uncover hidden relationships and patterns with k-means clustering, hierarchical clustering, and PCA
Bonner et al. Exploring the semantic content of unsupervised graph embeddings: An empirical study
CN110880007A (en) Automatic selection method and system for machine learning algorithm
Garreta et al. Scikit-learn: machine learning simplified: implement scikit-learn into every step of the data science pipeline
Schultheiss et al. Finding the unknown: Novelty detection with extreme value signatures of deep neural activations
Wang et al. Fabric identification using convolutional neural network
CN113486983A (en) Big data office information analysis method and system for anti-fraud processing
Schuh et al. A comparative evaluation of automated solar filament detection
Fischer et al. REPPlab: An R package for detecting clusters and outliers using exploratory projection pursuit
Sunitha et al. Novel content based medical image retrieval based on BoVW classification method
Lin et al. Deep convolutional neural network for automatic discrimination between Fragaria× Ananassa flowers and other similar white wild flowers in fields
Shen et al. On image classification: Correlation vs causality
CN110163280A (en) A kind of clustering method and device
CN116861226A (en) Data processing method and related device
CN115907954A (en) Account identification method and device, computer equipment and storage medium
Shubh et al. Handwriting recognition using deep learning
Gallego et al. Multi-label logo classification using convolutional neural networks
Kumar et al. Image classification in python using Keras
Yuan et al. Multiple-instance learning via multiple-point concept based instance selection
Patil et al. Efficient processing of decision tree using ID3 & improved C4. 5 algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant