CN107169574A - Method and system for performing prediction using a nested machine learning model - Google Patents

Method and system for performing prediction using a nested machine learning model

Info

Publication number
CN107169574A
CN107169574A (application CN201710311867.2A)
Authority
CN
China
Prior art keywords
model
machine learning
feature
training
learning model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710311867.2A
Other languages
Chinese (zh)
Inventor
陈雨强 (Chen Yuqiang)
戴文渊 (Dai Wenyuan)
杨强 (Yang Qiang)
郭夏玮 (Guo Xiawei)
涂威威 (Tu Weiwei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 4Paradigm Beijing Technology Co Ltd filed Critical 4Paradigm Beijing Technology Co Ltd
Priority to CN201710311867.2A priority Critical patent/CN107169574A/en
Priority to CN202110772622.6A priority patent/CN113610240A/en
Publication of CN107169574A publication Critical patent/CN107169574A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Provided are a method and a system for performing prediction using a nested machine learning model, wherein the nested machine learning model includes an upper-layer model and a lower-layer model trained according to a hierarchical nesting framework. The method includes: (A) acquiring a prediction data record; (B) generating, based on attribute information of the prediction data record, multiple feature subsets of a prediction sample corresponding to the prediction data record; (C) respectively providing the multiple feature subsets of the prediction sample to the upper-layer model and the lower-layer model included in the nested machine learning model, so as to obtain a prediction result of the nested machine learning model for the prediction sample. According to the system and method, sub-models of multiple types can be effectively fused to work together, and the advantages of each sub-model can be brought into full play, so as to obtain a better overall machine learning effect.

Description

Method and system for performing prediction using a nested machine learning model
Technical field
Exemplary embodiments of the present invention relate generally to the field of artificial intelligence, and more particularly to a method and system for performing prediction using a nested machine learning model, and to a method and system for training a nested machine learning model.
Background art
With the emergence of massive data, artificial intelligence technology has developed rapidly. In order to mine value from the massive data, it is necessary to generate, based on data records, training and/or prediction samples suitable for machine learning, so as to help train a machine learning model and/or perform estimation using the trained machine learning model.
Here, each data record can be regarded as a description of an event or object, corresponding to an example or sample. A data record includes items that reflect the performance or properties of the event or object in certain aspects, and these items may be referred to as "attributes". By applying processing such as feature engineering to the attribute information of the data records, machine learning samples including various features can be generated.
In practical machine learning applications, the attribute information of data records differs in form or meaning, and correspondingly the generated features also differ in form or meaning, so that a single machine learning sample often contains features of quite different kinds.
However, since a scenario in which machine learning technology is applied inevitably faces objective problems such as limited computing resources, insufficient sample data and feature processing that is detached from the application scenario, it is difficult in practice to find a single machine learning model that performs appropriately on all kinds of features. For example, in the prior art, there is a scheme that fuses a linear model and a neural network model to jointly train breadth and depth (see the Google paper "Wide & Deep Learning for Recommender Systems"), but such a scheme suffers from drawbacks such as high training complexity, high computational complexity and difficult parameter tuning, so its industrial application is rather limited.
Summary of the invention
Exemplary embodiments of the present invention aim to overcome the defect that a single machine learning model cannot be well applied to all types of features.
According to an exemplary embodiment of the present invention, there is provided a method for performing prediction using a nested machine learning model, wherein the nested machine learning model includes an upper-layer model and a lower-layer model trained according to a hierarchical nesting framework, and the method includes: (A) acquiring a prediction data record; (B) generating, based on attribute information of the prediction data record, multiple feature subsets of a prediction sample corresponding to the prediction data record; (C) respectively providing the multiple feature subsets of the prediction sample to the upper-layer model and the lower-layer model included in the nested machine learning model, so as to obtain a prediction result of the nested machine learning model for the prediction sample.
Optionally, in the method, the upper-layer model includes a decision-tree sub-model, and the lower-layer model includes multiple linear sub-models, wherein each linear sub-model corresponds to one leaf node of the decision-tree sub-model.
Optionally, in the method, in step (B), features of the prediction sample are generated based on the attribute information of the prediction data record, and an upper-layer feature subset of the prediction sample and a lower-layer feature subset of the prediction sample are generated according to the value continuity and/or the value-space scale of the features.
Optionally, in the method, the upper-layer feature subset covers all features whose values are continuous, and the lower-layer feature subset covers all features whose values are discrete; or, the upper-layer feature subset covers all features whose values are continuous together with at least a part of the features whose values are discrete, and the lower-layer feature subset covers the remaining features whose values are discrete.
Optionally, in the method, in step (B), features of the prediction sample are generated based on the attribute information of the prediction data record, and an upper-layer feature subset of the prediction sample and a lower-layer feature subset of the prediction sample are generated according to the missingness of the features, wherein the missingness of a feature indicates whether the feature is generated based on attribute information that is missing in the prediction data record relative to the training data records.
Optionally, in the method, the upper-layer feature subset covers all non-missing features, and the lower-layer feature subset covers all missing features and all non-missing features.
According to another exemplary embodiment of the present invention, there is provided a system for performing prediction using a nested machine learning model, wherein the nested machine learning model includes an upper-layer model and a lower-layer model trained according to a hierarchical nesting framework, and the system includes: a prediction data record acquisition device for acquiring a prediction data record; a prediction feature subset generation device for generating, based on attribute information of the prediction data record, multiple feature subsets of a prediction sample corresponding to the prediction data record; and a prediction device for respectively providing the multiple feature subsets of the prediction sample to the upper-layer model and the lower-layer model included in the nested machine learning model, so as to obtain a prediction result of the nested machine learning model for the prediction sample.
Optionally, in the system, the upper-layer model includes a decision-tree sub-model, and the lower-layer model includes multiple linear sub-models, wherein each linear sub-model corresponds to one leaf node of the decision-tree sub-model.
Optionally, in the system, the prediction feature subset generation device generates features of the prediction sample based on the attribute information of the prediction data record, and generates an upper-layer feature subset of the prediction sample and a lower-layer feature subset of the prediction sample according to the value continuity and/or the value-space scale of the features.
Optionally, in the system, the upper-layer feature subset covers all features whose values are continuous, and the lower-layer feature subset covers all features whose values are discrete; or, the upper-layer feature subset covers all features whose values are continuous together with at least a part of the features whose values are discrete, and the lower-layer feature subset covers the remaining features whose values are discrete.
Optionally, in the system, the prediction feature subset generation device generates features of the prediction sample based on the attribute information of the prediction data record, and generates an upper-layer feature subset of the prediction sample and a lower-layer feature subset of the prediction sample according to the missingness of the features, wherein the missingness of a feature indicates whether the feature is generated based on attribute information that is missing in the prediction data record relative to the training data records.
Optionally, in the system, the upper-layer feature subset covers all non-missing features, and the lower-layer feature subset covers all missing features and all non-missing features.
According to another exemplary embodiment of the present invention, there is provided a computer-readable medium for performing prediction using a nested machine learning model, wherein the nested machine learning model includes an upper-layer model and a lower-layer model trained according to a hierarchical nesting framework, and a computer program for executing the following steps is recorded on the computer-readable medium: (A) acquiring a prediction data record; (B) generating, based on attribute information of the prediction data record, multiple feature subsets of a prediction sample corresponding to the prediction data record; (C) respectively providing the multiple feature subsets of the prediction sample to the upper-layer model and the lower-layer model included in the nested machine learning model, so as to obtain a prediction result of the nested machine learning model for the prediction sample.
Optionally, in the computer-readable medium, the upper-layer model includes a decision-tree sub-model, and the lower-layer model includes multiple linear sub-models, wherein each linear sub-model corresponds to one leaf node of the decision-tree sub-model.
Optionally, in the computer-readable medium, in step (B), features of the prediction sample are generated based on the attribute information of the prediction data record, and an upper-layer feature subset of the prediction sample and a lower-layer feature subset of the prediction sample are generated according to the value continuity and/or the value-space scale of the features.
Optionally, in the computer-readable medium, the upper-layer feature subset covers all features whose values are continuous, and the lower-layer feature subset covers all features whose values are discrete; or, the upper-layer feature subset covers all features whose values are continuous together with at least a part of the features whose values are discrete, and the lower-layer feature subset covers the remaining features whose values are discrete.
Optionally, in the computer-readable medium, in step (B), features of the prediction sample are generated based on the attribute information of the prediction data record, and an upper-layer feature subset of the prediction sample and a lower-layer feature subset of the prediction sample are generated according to the missingness of the features, wherein the missingness of a feature indicates whether the feature is generated based on attribute information that is missing in the prediction data record relative to the training data records.
Optionally, in the computer-readable medium, the upper-layer feature subset covers all non-missing features, and the lower-layer feature subset covers all missing features and all non-missing features.
According to another exemplary embodiment of the present invention, there is provided a computing device for performing prediction using a nested machine learning model, wherein the nested machine learning model includes an upper-layer model and a lower-layer model trained according to a hierarchical nesting framework, and the computing device includes a storage component and a processor, the storage component storing a set of computer-executable instructions which, when executed by the processor, perform the following steps: (A) acquiring a prediction data record; (B) generating, based on attribute information of the prediction data record, multiple feature subsets of a prediction sample corresponding to the prediction data record; (C) respectively providing the multiple feature subsets of the prediction sample to the upper-layer model and the lower-layer model included in the nested machine learning model, so as to obtain a prediction result of the nested machine learning model for the prediction sample.
Optionally, in the computing device, the upper-layer model includes a decision-tree sub-model, and the lower-layer model includes multiple linear sub-models, wherein each linear sub-model corresponds to one leaf node of the decision-tree sub-model.
Optionally, in the computing device, in step (B), features of the prediction sample are generated based on the attribute information of the prediction data record, and an upper-layer feature subset of the prediction sample and a lower-layer feature subset of the prediction sample are generated according to the value continuity and/or the value-space scale of the features.
Optionally, in the computing device, the upper-layer feature subset covers all features whose values are continuous, and the lower-layer feature subset covers all features whose values are discrete; or, the upper-layer feature subset covers all features whose values are continuous together with at least a part of the features whose values are discrete, and the lower-layer feature subset covers the remaining features whose values are discrete.
Optionally, in the computing device, in step (B), features of the prediction sample are generated based on the attribute information of the prediction data record, and an upper-layer feature subset of the prediction sample and a lower-layer feature subset of the prediction sample are generated according to the missingness of the features, wherein the missingness of a feature indicates whether the feature is generated based on attribute information that is missing in the prediction data record relative to the training data records.
Optionally, in the computing device, the upper-layer feature subset covers all non-missing features, and the lower-layer feature subset covers all missing features and all non-missing features.
According to another exemplary embodiment of the present invention, there is provided a method for training a nested machine learning model, wherein the nested machine learning model includes an upper-layer model and a lower-layer model trained according to a hierarchical nesting framework, and the method includes: (a) acquiring a training data record; (b) generating, based on attribute information of the training data record, multiple feature subsets of a training sample corresponding to the training data record; and (c) training the upper-layer model and the lower-layer model included in the nested machine learning model according to the hierarchical nesting framework, wherein each of the upper-layer model and the lower-layer model is trained based on its respective feature subset.
Optionally, in the method, the upper-layer model includes a decision-tree sub-model, and the lower-layer model includes multiple linear sub-models, wherein each linear sub-model corresponds to one leaf node of the decision-tree sub-model.
Optionally, in the method, in step (b), features of the training sample are generated based on the attribute information of the training data record, and an upper-layer feature subset of the training sample and a lower-layer feature subset of the training sample are generated according to the value continuity and/or the value-space scale of the features.
Optionally, in the method, the upper-layer feature subset covers all features whose values are continuous, and the lower-layer feature subset covers all features whose values are discrete; or, the upper-layer feature subset covers all features whose values are continuous together with at least a part of the features whose values are discrete, and the lower-layer feature subset covers the remaining features whose values are discrete.
Optionally, in the method, in step (b), features of the training sample are generated based on the attribute information of the training data record, and an upper-layer feature subset of the training sample and a lower-layer feature subset of the training sample are generated according to the missingness of the features, wherein the missingness of a feature indicates whether the feature is generated based on attribute information that is missing in the prediction data record relative to the training data records.
Optionally, in the method, the upper-layer feature subset covers all non-missing features, and the lower-layer feature subset covers all missing features and all non-missing features.
Optionally, in the method, in step (c), the parameters of the nested machine learning model, the parameters of the linear sub-models and/or the parameters of the decision-tree sub-model are set to change gradually.
According to another exemplary embodiment of the present invention, there is provided a system for training a nested machine learning model, wherein the nested machine learning model includes an upper-layer model and a lower-layer model trained according to a hierarchical nesting framework, and the system includes: a training data record acquisition device for acquiring a training data record; a training feature subset generation device for generating, based on attribute information of the training data record, multiple feature subsets of a training sample corresponding to the training data record; and a training device for training the upper-layer model and the lower-layer model included in the nested machine learning model according to the hierarchical nesting framework, wherein each of the upper-layer model and the lower-layer model is trained based on its respective feature subset.
Optionally, in the system, the upper-layer model includes a decision-tree sub-model, and the lower-layer model includes multiple linear sub-models, wherein each linear sub-model corresponds to one leaf node of the decision-tree sub-model.
Optionally, in the system, the training feature subset generation device generates features of the training sample based on the attribute information of the training data record, and generates an upper-layer feature subset of the training sample and a lower-layer feature subset of the training sample according to the value continuity and/or the value-space scale of the features.
Optionally, in the system, the upper-layer feature subset covers all features whose values are continuous, and the lower-layer feature subset covers all features whose values are discrete; or, the upper-layer feature subset covers all features whose values are continuous together with at least a part of the features whose values are discrete, and the lower-layer feature subset covers the remaining features whose values are discrete.
Optionally, in the system, the training feature subset generation device generates features of the training sample based on the attribute information of the training data record, and generates an upper-layer feature subset of the training sample and a lower-layer feature subset of the training sample according to the missingness of the features, wherein the missingness of a feature indicates whether the feature is generated based on attribute information that is missing in the prediction data record relative to the training data records.
Optionally, in the system, the upper-layer feature subset covers all non-missing features, and the lower-layer feature subset covers all missing features and all non-missing features.
Optionally, in the system, the parameters of the nested machine learning model, the parameters of the linear sub-models and/or the parameters of the decision-tree sub-model are set to change gradually.
According to another exemplary embodiment of the present invention, there is provided a computer-readable medium for training a nested machine learning model, wherein the nested machine learning model includes an upper-layer model and a lower-layer model trained according to a hierarchical nesting framework, and a computer program for executing the following steps is recorded on the computer-readable medium: (a) acquiring a training data record; (b) generating, based on attribute information of the training data record, multiple feature subsets of a training sample corresponding to the training data record; and (c) training the upper-layer model and the lower-layer model included in the nested machine learning model according to the hierarchical nesting framework, wherein each of the upper-layer model and the lower-layer model is trained based on its respective feature subset.
Optionally, in the computer-readable medium, the upper-layer model includes a decision-tree sub-model, and the lower-layer model includes multiple linear sub-models, wherein each linear sub-model corresponds to one leaf node of the decision-tree sub-model.
Optionally, in the computer-readable medium, in step (b), features of the training sample are generated based on the attribute information of the training data record, and an upper-layer feature subset of the training sample and a lower-layer feature subset of the training sample are generated according to the value continuity and/or the value-space scale of the features.
Optionally, in the computer-readable medium, the upper-layer feature subset covers all features whose values are continuous, and the lower-layer feature subset covers all features whose values are discrete; or, the upper-layer feature subset covers all features whose values are continuous together with at least a part of the features whose values are discrete, and the lower-layer feature subset covers the remaining features whose values are discrete.
Optionally, in the computer-readable medium, in step (b), features of the training sample are generated based on the attribute information of the training data record, and an upper-layer feature subset of the training sample and a lower-layer feature subset of the training sample are generated according to the missingness of the features, wherein the missingness of a feature indicates whether the feature is generated based on attribute information that is missing in the prediction data record relative to the training data records.
Optionally, in the computer-readable medium, the upper-layer feature subset covers all non-missing features, and the lower-layer feature subset covers all missing features and all non-missing features.
Optionally, in the computer-readable medium, in step (c), the parameters of the nested machine learning model, the parameters of the linear sub-models and/or the parameters of the decision-tree sub-model are set to change gradually.
According to another exemplary embodiment of the present invention, there is provided a computing device for training a nested machine learning model, wherein the nested machine learning model includes an upper-layer model and a lower-layer model trained according to a hierarchical nesting framework, and the computing device includes a storage component and a processor, the storage component storing a set of computer-executable instructions which, when executed by the processor, perform the following steps: (a) acquiring a training data record; (b) generating, based on attribute information of the training data record, multiple feature subsets of a training sample corresponding to the training data record; and (c) training the upper-layer model and the lower-layer model included in the nested machine learning model according to the hierarchical nesting framework, wherein each of the upper-layer model and the lower-layer model is trained based on its respective feature subset.
Optionally, in the computing device, the upper-layer model includes a decision-tree sub-model, and the lower-layer model includes multiple linear sub-models, wherein each linear sub-model corresponds to one leaf node of the decision-tree sub-model.
Optionally, in the computing device, in step (b), features of the training sample are generated based on the attribute information of the training data record, and an upper-layer feature subset of the training sample and a lower-layer feature subset of the training sample are generated according to the value continuity and/or the value-space scale of the features.
Optionally, in the computing device, the upper-layer feature subset covers all features whose values are continuous, and the lower-layer feature subset covers all features whose values are discrete; or, the upper-layer feature subset covers all features whose values are continuous together with at least a part of the features whose values are discrete, and the lower-layer feature subset covers the remaining features whose values are discrete.
Optionally, in the computing device, in step (b), features of the training sample are generated based on the attribute information of the training data record, and an upper-layer feature subset of the training sample and a lower-layer feature subset of the training sample are generated according to the missingness of the features, wherein the missingness of a feature indicates whether the feature is generated based on attribute information that is missing in the prediction data record relative to the training data records.
Optionally, in the computing device, in step (c), the parameters of the nested machine learning model, the parameters of the linear sub-models and/or the parameters of the decision-tree sub-model are set to change gradually.
In the method and system for performing prediction using a nested machine learning model and the method and system for training a nested machine learning model according to exemplary embodiments of the present invention, the upper-layer model and the lower-layer model of the nested machine learning model are trained and formed according to the nesting framework, and each upper-layer model or lower-layer model acts on its respective subset of sample features. In this way, sub-models of multiple types can be effectively fused to work together, and the advantages of each sub-model can be brought into full play, so as to obtain a better overall machine learning effect.
Brief description of the drawings
These and/or other aspects and advantages of the present invention will become clearer and easier to understand from the following detailed description of embodiments of the present invention in conjunction with the accompanying drawings, in which:
Fig. 1 shows a block diagram of a system for performing prediction using a nested machine learning model according to an exemplary embodiment of the present invention;
Fig. 2 shows a flowchart of a method for performing prediction using a nested machine learning model according to an exemplary embodiment of the present invention;
Fig. 3 shows a block diagram of a system for training a nested machine learning model according to an exemplary embodiment of the present invention;
Fig. 4 shows a flowchart of a method for training a nested machine learning model according to an exemplary embodiment of the present invention;
Fig. 5A shows an example of a decision-tree model in the prior art; and
Fig. 5B shows an example of a nested machine learning model according to an exemplary embodiment of the present invention.
Detailed description of embodiments
In order that those skilled in the art may better understand the present invention, exemplary embodiments of the present invention are described in further detail below with reference to the accompanying drawings and specific embodiments.
Machine learning is an inevitable product of the development of artificial intelligence research to a certain stage; it is devoted to improving the performance of a system itself by means of computation and experience. In a computer system, "experience" usually exists in the form of "data", and a "model" can be generated from data by a machine learning algorithm; that is to say, when empirical data are supplied to a machine learning algorithm, a model can be generated based on these empirical data, and when a new situation is encountered, the model provides a corresponding judgment, i.e., a prediction result. Machine learning may be implemented in the form of "supervised learning", "unsupervised learning" or "semi-supervised learning"; it should be noted that the exemplary embodiments of the present invention impose no specific limitation on the particular machine learning algorithms that may be applied to the hierarchical nesting framework. In addition, it should also be noted that, in the process of training and applying the nested machine learning model, statistical algorithms, business rules and/or expert knowledge and the like may also be used to further improve the effect of machine learning.
In particular, the exemplary embodiments of the present invention relate to the training and estimation of a nested machine learning model, wherein the nested machine learning model includes an upper-layer model and a lower-layer model trained according to a hierarchical nesting framework, and each upper-layer model or lower-layer model has its respective feature subset and is formed by training according to the hierarchical nesting framework. Correspondingly, the exemplary embodiments of the present invention need to perform feature-subset division processing in a specific manner on the data records, and apply the divided feature subsets to the corresponding upper-layer model or lower-layer model.
Fig. 1 shows a block diagram of a system for performing prediction using a nested machine learning model according to an exemplary embodiment of the present invention. In particular, the prediction system may be used to provide, for a prediction sample and by using the nested machine learning model, a prediction result regarding a specific business problem (that is, a prediction target), wherein the nested machine learning model includes an upper-layer model and a lower-layer model trained according to a hierarchical nesting framework.
Here, the upper-layer model and the lower-layer model of the nested machine learning model are not restricted in type; any machine learning model that can be trained into a nested-structure model according to the hierarchical nesting framework can serve as the upper-layer model or lower-layer model of an exemplary embodiment of the present invention. In the training process of the nested machine learning model, the training of each upper-layer model and lower-layer model can be completed by setting corresponding configuration items; as an example, the parameters of the nested machine learning model and/or the way in which the parameters change may be configured in the training process.
As an example, the upper-layer model may include a single decision-tree sub-model, and the lower-layer model may include multiple linear sub-models, wherein each linear sub-model may correspond to one leaf node of the decision-tree sub-model. Here, the linear sub-model is of the linear-model type, and the decision-tree sub-model is of the decision-tree-model type. In particular, a linear model is simple and fast to train, and can accommodate large data sets with high dimensionality and a large number of samples; however, as a linear classifier, such a model cannot capture nonlinear relationships between features, and its relatively low model complexity often prevents it from achieving a good effect when facing, for example, continuous features. On the other hand, a decision-tree model is strongly nonlinear and can more easily capture the interaction between features; when decision-tree models are combined using an ensemble framework (for example, Gradient Boosting Decision Tree), the flexibility is even greater, and a better classification effect can often be obtained after proper parameter tuning. However, the memory footprint of such a model is large and its speed is slow, so that it is difficult to run on big-data (high-dimensional, many-sample) training and test sets; in particular, for data sets in which one-hot encoding has been applied extensively to discrete features, the dimensionality is often extremely high, making training and parameter tuning very difficult, so that the result may even be inferior to that of a linear model.
By training the linear sub-models and the decision-tree sub-model into a nested machine learning model according to the hierarchical nesting framework, not only can the advantages of both kinds of models be brought into full play, but also, compared with the approach used in industry of jointly training a linear model and a neural network, significant improvements are obtained in terms of parameter-tuning difficulty and training speed. However, it should be noted that the upper-layer model and the lower-layer model according to the exemplary embodiments of the present invention are not limited to the above two kinds.
The system shown in Fig. 1 may be implemented entirely by a computer program in software, may also be implemented by a dedicated hardware device, and may also be implemented by a combination of software and hardware. Correspondingly, each device constituting the system shown in Fig. 1 may be a virtual module that relies only on a computer program to realize the corresponding function, may be a general-purpose or dedicated device that realizes the function through a hardware structure, or may be a processor running a corresponding computer program, and so on.
As shown in Fig. 1, the prediction data record acquisition device 100 is used to acquire prediction data records. These prediction data records may be generated by any party in any way; for example, they may be data manually filled in by customers, data submitted online by customers, pre-stored or generated data, or data received from the outside. The attribute information of these data may relate to information about the customer itself, for example, identity, educational background, occupation, assets, contact details and the like; or the attribute information of these data may relate to information about business-related items, for example, information about the transaction amount of a contract, the parties to the transaction, the subject matter, the place of delivery and the like. It should be noted that the attributes of the data mentioned in the exemplary embodiments of the present invention may relate to the performance or properties of any object or matter in a certain aspect, and are not limited to defining or describing individuals, objects, organizations, units, institutions, projects, events and the like. In fact, any information data on which machine learning can be performed may be applied to the exemplary embodiments of the present invention.
The prediction data record acquisition device 100 can acquire structured or unstructured data, for example text data or numerical data, from different sources (for example, data from a data provider, data from the Internet (for example, social networking sites), data from a mobile operator, data from an APP operator, data from an express delivery company, data from a credit institution, and so on). These data may be input to the prediction data record acquisition device 100 via an input device, may be automatically generated by the prediction data record acquisition device 100 from existing data, or may be acquired by the prediction data record acquisition device 100 from a network (for example, a storage medium (for example, a data warehouse) on the network); in addition, an intermediate data exchange device such as a server may help the prediction data record acquisition device 100 acquire corresponding data from an external data source. Here, the acquired data may be converted by a data conversion module such as a text analysis module in the prediction data record acquisition device 100 into a format that is easy to process. It should be noted that the prediction data record acquisition device 100 may be configured as individual modules composed of software, hardware and/or firmware, some or all of which may be integrated into one or cooperate with one another to accomplish a specific function.
The prediction feature subset generation device 200 is used to generate, based on the attribute information of the prediction data record, multiple feature subsets of the prediction sample corresponding to the prediction data record. In particular, the prediction feature subset generation device 200 may obtain multiple features by screening, grouping or further additionally processing the attribute information of the prediction data record, and divide the multiple features in various ways to obtain multiple groups of features (wherein each feature may be divided into one or more groups); each group of features may serve as one feature subset of the prediction sample. Here, the prediction sample corresponds to the prediction data record and is usually embodied as the direct input of the machine learning model. It should be noted that each of the above feature subsets may include a part of the features, may include all of the features, or may include no feature at all. According to an exemplary embodiment of the present invention, the prediction feature subset generation device 200 may generate the feature subsets in any appropriate manner, for example, in consideration of factors such as the content, meaning, value continuity, value range, value-space scale, missingness and importance of the attribute information, or in combination with the characteristics of the upper-layer or lower-layer model in the nested machine learning model, and so on.
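As a purely illustrative aid (not part of the patent text), the following minimal Python sketch shows one way such a division by value continuity could look in code: continuous features go to the upper-layer subset and the remaining (discrete) features go to the lower-layer subset. The function name and the example features are invented.

```python
# A minimal, hypothetical sketch of splitting prediction-sample features into an
# upper-layer subset (continuous values) and a lower-layer subset (discrete values).
# The function and feature names are illustrative only.

def split_by_continuity(features, continuous_names):
    """features: dict of feature name -> value; continuous_names: set of names
    whose values are continuous. Returns (upper_subset, lower_subset)."""
    upper = {name: v for name, v in features.items() if name in continuous_names}
    lower = {name: v for name, v in features.items() if name not in continuous_names}
    return upper, lower

if __name__ == "__main__":
    sample = {"age": 37.0, "income": 8200.5, "gender": "F", "city_id": "021"}
    upper, lower = split_by_continuity(sample, continuous_names={"age", "income"})
    print(upper)   # {'age': 37.0, 'income': 8200.5}   -> fed to the decision-tree sub-model
    print(lower)   # {'gender': 'F', 'city_id': '021'} -> fed to the linear sub-models
```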
The prediction device 300 is used to respectively provide the multiple feature subsets of the prediction sample to the upper-layer model and the lower-layer model included in the nested machine learning model, so as to obtain the prediction result of the nested machine learning model for the prediction sample.
In particular, the prediction device 300 may provide, in a differentiated manner, one or more feature subsets to each upper-layer model or lower-layer model serving as a sub-model. Here, the feature subsets obtained by all the sub-models are not completely identical, and any two sub-models (whether they belong to the same layer or to different layers) may be provided with identical, partly identical or entirely different feature subsets. That is to say, each upper-layer model or lower-layer model of the nested machine learning model performs its estimation with respect to the feature subset provided to it; correspondingly, the estimation results of all the sub-models can be integrated to obtain the prediction result of the nested machine learning model for the prediction sample as a whole.
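Purely as an illustration of this routing idea, and under the assumption (introduced above only as an example) that the upper-layer model is a decision tree whose leaves each hold a linear sub-model, the sketch below hand-codes a tiny tree and invented weights: the upper-layer subset is used only to select a leaf, and that leaf's linear sub-model scores the lower-layer subset. All names and numbers are hypothetical.

```python
# Hypothetical routing sketch: the upper-layer feature subset selects a leaf of the
# decision-tree sub-model; the selected leaf's linear sub-model scores the
# lower-layer feature subset. The tree, weights and feature names are invented.

def route_to_leaf(upper_features):
    # A hand-written two-level tree over continuous features (illustrative only).
    if upper_features["age"] < 30.0:
        return "leaf_0"
    return "leaf_1" if upper_features["income"] < 5000.0 else "leaf_2"

LEAF_LINEAR_MODELS = {            # one linear sub-model per leaf: (weights, bias)
    "leaf_0": ({"gender=F": 0.4, "city_id=021": -0.1}, 0.2),
    "leaf_1": ({"gender=F": -0.2, "city_id=021": 0.3}, 0.0),
    "leaf_2": ({"gender=F": 0.1, "city_id=021": 0.5}, -0.3),
}

def nested_predict(upper_features, lower_features):
    leaf = route_to_leaf(upper_features)
    weights, bias = LEAF_LINEAR_MODELS[leaf]
    # Lower-layer subset is one-hot style: each present feature contributes weight * value.
    return bias + sum(w * lower_features.get(name, 0.0) for name, w in weights.items())

if __name__ == "__main__":
    upper = {"age": 37.0, "income": 8200.5}
    lower = {"gender=F": 1, "city_id=021": 1}
    print(nested_predict(upper, lower))   # score produced by leaf_2's linear sub-model
```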
A flowchart of a method for performing prediction using a nested machine learning model according to an exemplary embodiment of the present invention is described below with reference to Fig. 2. Here, as an example, the method shown in Fig. 2 may be executed by the prediction system shown in Fig. 1, may also be implemented entirely by a computer program in software, or may be executed by a computing device of a particular configuration.
For convenience of description, it is assumed that the method shown in Fig. 2 is executed by the prediction system shown in Fig. 1. As shown in the figure, in step S100, a prediction data record is acquired by the prediction data record acquisition device 100.
Here, as an example, each prediction data record may correspond to one item to be predicted (for example, an event or object) with respect to a particular prediction problem; correspondingly, the prediction data record may include various attribute information (that is, attributes) reflecting the performance or properties of the event or object in certain aspects. By screening, grouping or correspondingly processing the attribute information, sample features for machine learning can be further obtained. Here, the prediction data record acquisition device 100 may collect data in a manual, semi-automatic or fully automatic manner; as an example, the prediction data record acquisition device 100 may collect data in bulk.
The prediction data record acquisition device 100 may receive prediction data records manually input by a user via an input device (for example, a workstation). In addition, the prediction data record acquisition device 100 may extract prediction data records from a data source system in a fully automatic manner, for example, by means of a timer mechanism implemented in software, firmware, hardware or a combination thereof, which systematically requests the data source and obtains the requested data from the response. The data source may include one or more databases or other servers. The fully automatic acquisition of data may be realized via an internal network and/or an external network, which may include transmitting encrypted data over the Internet. Where servers, databases, networks and the like are configured to communicate with one another, data collection can be carried out automatically without manual intervention; it should be noted that certain user input operations may still exist in this manner. The semi-automatic manner lies between the manual manner and the fully automatic manner; it differs from the fully automatic manner in that a trigger mechanism activated by the user replaces the timer mechanism, so that a request for extracting data is generated only when a specific user input is received. Each time data are acquired, the captured data are preferably stored in a non-volatile memory; as an example, a data warehouse may be used to store the data collected during acquisition. Alternatively, the collected data may be stored and/or subsequently processed by a hardware cluster (such as a Hadoop cluster), for example, for storage, sorting and other offline operations; in addition, the collected data may also be subjected to online stream processing.
As an example, the prediction data record acquisition device 100 may include a data conversion module such as a text analysis module, for converting unstructured data such as text into structured data that is easier to further process or reference. Text-based data may include e-mails, documents, web pages, graphics, spreadsheets, call-center logs, suspicious transaction reports, and the like.
Next, in step S200, multiple feature subsets of the prediction sample corresponding to the prediction data record are generated by the prediction feature subset generation device 200 based on the attribute information of the prediction data record.
Here, in the process of converting the prediction data record into the corresponding prediction sample that can be directly input into the model, each feature of the prediction sample may be generated based on the individual pieces of attribute information. According to an exemplary embodiment of the present invention, the prediction sample may have multiple feature subsets, so that each upper-layer model or lower-layer model can have its respective feature subset.
The prediction feature subset generation device 200 may, in any suitable manner, generate the corresponding features of the prediction sample based on the attribute information of the prediction data record, and combine these features into the individual feature subsets in a specific way.
For example, the prediction feature subset generation device 200 may generate the features of the prediction sample based on the attribute information of the prediction data record, and generate the upper-layer feature subset of the prediction sample and the lower-layer feature subset of the prediction sample according to the value continuity and/or the value-space scale of the features.
In particular, after the features of the prediction sample have been generated, it may correspondingly be determined whether the value of each feature is continuous or discrete, or the value-space scale of each feature may correspondingly be determined (for example, a gender feature may correspond to a two-dimensional feature space), and so on. On this basis, the individual feature subsets can be produced according to a specific division manner.
As an example, for the upper-layer model, its feature subset may include only at least a part of the continuous-value features; for example, the feature subset of the single decision-tree sub-model serving as the upper-layer model may cover part or all of the features whose values are continuous.
In addition, for the above single decision-tree sub-model, its feature subset may, besides including at least a part of the continuous-value features, also include a part of the discrete-value features; in this case, factors such as the value-space scale of the discrete-value features and the total number of features in the feature subset may be considered in determining which discrete-value features are to be included in the decision-tree feature subset.
As an example, for the lower-layer model, its feature subset may include only at least a part of the discrete-value features; for example, the feature subsets of the individual linear sub-models serving as the lower-layer model may each include identical, partly identical or entirely different discrete-value features; as an example, the feature subsets of the linear sub-models taken as a whole may cover part or all of the features whose values are discrete.
In addition, for the above linear sub-models, their feature subsets may, besides including at least a part of the discrete-value features, also include a part of the continuous-value features. That is, the feature subset of each linear sub-model may include identical, partly identical or entirely different continuous-value or discrete-value features; as an example, the feature subsets of the linear sub-models taken as a whole may cover at least a part of the discrete-value features together with a part of the continuous-value features.
Here, the upper-layer feature subset and the lower-layer feature subset can be generated in coordination with each other. According to an exemplary embodiment of the present invention, the upper-layer feature subset and the lower-layer feature subset may cover entirely different features, or may cover at least a part of identical features. As an example, the upper-layer feature subset may cover all features whose values are continuous, and correspondingly the lower-layer feature subset may cover all features whose values are discrete, or the lower-layer feature subset may cover all features of the prediction sample; as another example, the upper-layer feature subset may cover all features whose values are continuous together with at least a part of the features whose values are discrete, and correspondingly the lower-layer feature subset may cover the remaining features whose values are discrete, or the lower-layer feature subset may cover all features of the prediction sample.
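The following hypothetical sketch illustrates the second division scheme just described: all continuous features, plus discrete features whose value space is small enough, are placed in the upper-layer subset, and the remaining discrete features form the lower-layer subset. The cardinality threshold and feature names are assumptions made only for illustration.

```python
# Hypothetical sketch of the alternative division scheme: continuous features plus
# low-cardinality discrete features go to the upper-layer subset; the remaining
# discrete features go to the lower-layer subset. The threshold is an illustrative
# assumption, not something fixed by the patent text.

def split_with_scale(features, continuous_names, cardinality, max_upper_cardinality=8):
    upper, lower = {}, {}
    for name, value in features.items():
        if name in continuous_names or cardinality.get(name, float("inf")) <= max_upper_cardinality:
            upper[name] = value
        else:
            lower[name] = value
    return upper, lower

if __name__ == "__main__":
    sample = {"age": 37.0, "gender": "F", "city_id": "021", "device_id": "a91f"}
    card = {"gender": 2, "city_id": 400, "device_id": 10**6}  # value-space scales
    upper, lower = split_with_scale(sample, {"age"}, card)
    print(upper)   # {'age': 37.0, 'gender': 'F'}             -> decision-tree sub-model
    print(lower)   # {'city_id': '021', 'device_id': 'a91f'}  -> linear sub-models
```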
In addition, the prediction feature subset generation device 200 may generate the features of the prediction sample based on the attribute information of the prediction data record, and generate the upper-layer feature subset of the prediction sample and the lower-layer feature subset of the prediction sample according to the missingness of the features, wherein the missingness of a feature indicates whether the feature is generated based on attribute information that is missing in the prediction data record relative to the training data records.
In particular, in practical application scenarios of a machine learning model, some attribute information of the training data records often does not appear in the prediction data records. A feature generated in the prediction sample based on such missing attribute information is a missing feature (where the missing attribute information may be set to a null value); conversely, a feature that is not based on missing attribute information is a non-missing feature. It should be understood that the missingness of features can cause a bias in the prediction result, and according to exemplary embodiments of the present invention, the above bias can be effectively eliminated by dividing the missing features and the non-missing features into appropriate feature subsets.
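A minimal sketch of such a missingness-based division follows, with invented names: attributes that existed in the training data records but are absent from the prediction data record yield missing features (set to a null value), the upper-layer subset keeps only the non-missing features, and the lower-layer subset keeps all features.

```python
# Hypothetical sketch of a missingness-based division: features whose source
# attribute existed in the training data records but is absent from the prediction
# data record are "missing" (set to None); the upper-layer subset keeps only the
# non-missing features, while the lower-layer subset keeps all features.

def split_by_missingness(prediction_record, training_attributes):
    features = {attr: prediction_record.get(attr) for attr in training_attributes}
    upper = {name: v for name, v in features.items() if v is not None}   # non-missing only
    lower = dict(features)                                               # missing + non-missing
    return upper, lower

if __name__ == "__main__":
    training_attributes = ["age", "income", "gender", "referrer"]
    prediction_record = {"age": 37.0, "gender": "F"}          # 'income', 'referrer' missing
    upper, lower = split_by_missingness(prediction_record, training_attributes)
    print(upper)   # {'age': 37.0, 'gender': 'F'}
    print(lower)   # {'age': 37.0, 'income': None, 'gender': 'F', 'referrer': None}
```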
As an example, for the upper-layer model, its feature subset may include only at least a part of the non-missing features; for example, the feature subset of the single decision-tree sub-model serving as the upper-layer model may cover a part of the non-missing features or all of the non-missing features.
As an example, for the lower-layer model, its feature subset may include at least a part of the missing features; for example, the feature subset of each linear sub-model serving as the lower-layer model may include identical, partly identical or entirely different missing features; as an example, the feature subsets of the linear sub-models taken as a whole may cover part of the missing features or all of the missing features.
In addition, for the above linear sub-models, their feature subsets may, besides including at least a part of the missing features, also include at least a part of the non-missing features; in this case, the feature subsets of the linear sub-models taken as a whole may cover a part of the features or all of the features of the prediction sample.
Here, the upper-layer feature subset and the lower-layer feature subset can be generated in coordination with each other. According to an exemplary embodiment of the present invention, the upper-layer feature subset and the lower-layer feature subset may cover entirely different features, or may cover at least a part of identical features. As an example, the upper-layer feature subset may cover all non-missing features, and correspondingly the lower-layer feature subset may cover all missing features and all non-missing features.
It should be noted that, when generating the feature subsets, the prediction feature subset generation device 200 may take into account any factor related to the attribute information, the sub-models or the data; the exemplary embodiments of the present invention do not limit the specific way in which the feature subsets are generated.
In addition, in the process of generating features, the attribute information may not only be screened or grouped; the attribute information obtained by screening or grouping may also be further processed. That is, alternatively, the prediction feature subset generation device 200 may perform feature engineering processing on the acquired prediction data record; for example, the prediction feature subset generation device 200 may apply various kinds of feature engineering processing such as discretization, field combination, extraction of part of a field value and rounding to the original attribute information of the prediction data record, and combine the processed features into the individual feature subsets according to specific rules.
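Purely for illustration of the kinds of feature engineering listed above (discretization, field combination, extracting part of a field value, rounding), here is a small sketch with invented attribute names, bin edges and rules.

```python
# Illustrative feature-engineering helpers of the kinds mentioned above; the
# attribute names, bin edges and rules are invented examples.

import bisect

def discretize(value, edges):
    """Map a continuous value to a bin index (discretization)."""
    return bisect.bisect_right(edges, value)

def combine_fields(record, a, b):
    """Cross two fields into one combined feature (field combination)."""
    return f"{a}={record[a]}&{b}={record[b]}"

def extract_prefix(value, n):
    """Keep only part of a field value, e.g. the first n characters."""
    return str(value)[:n]

if __name__ == "__main__":
    record = {"age": 37.4, "gender": "F", "city_id": "021300", "amount": 1234.56}
    features = {
        "age_bin": discretize(record["age"], edges=[18, 30, 45, 60]),    # -> 2
        "gender_x_city": combine_fields(record, "gender", "city_id"),
        "city_prefix": extract_prefix(record["city_id"], 3),             # -> '021'
        "amount_rounded": round(record["amount"]),                       # -> 1235
    }
    print(features)
```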
In step S300, the multiple feature subsets of the prediction sample are respectively provided by the prediction device 300 to the upper-layer model and the lower-layer model included in the nested machine learning model, so as to obtain the prediction result of the nested machine learning model for the prediction sample.
Here, the nested machine learning model may be stored within the system shown in Fig. 1, or the nested machine learning model may be stored outside the system shown in Fig. 1; as an example, the nested machine learning model may be read by the prediction device 300 or another device, so that the feature subsets can be provided by the prediction device 300 directly to the nested machine learning model that has been read in.
In addition, the nested machine learning model may also always be located outside the system shown in Fig. 1, in which case the prediction device 300 provides the feature subsets, directly or via other devices, to the externally located nested machine learning model. In this case, the prediction device 300 may also receive the prediction result from the external nested machine learning model.
A system for training a nested machine learning model according to an exemplary embodiment of the present invention and its training method are described below in conjunction with Fig. 3, Fig. 4, Fig. 5A and Fig. 5B.
The nested machine learning model described here may include two layers of models, namely the upper-layer model and the lower-layer model trained according to the hierarchical nesting framework. As described above, each layer may contain one or more models, and the sub-models may have identical, partly identical or entirely different feature subsets.
Here, the upper-layer model and the lower-layer model of the nested machine learning model can be designed in consideration of the models, the samples, the features, the prediction problem and the like; for example, the upper-layer model may include a single decision-tree sub-model, and the lower-layer model may include multiple linear sub-models, wherein each linear sub-model may correspond to one leaf node of the decision-tree sub-model.
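As a sketch only (names invented, not the patent's implementation), such a nested structure could be represented in memory as an upper-layer decision-tree sub-model plus a mapping from each of its leaf nodes to one lower-layer linear sub-model, together with the column lists of the two feature subsets:

```python
# A hypothetical in-memory representation of the nested structure: an upper-layer
# decision-tree sub-model plus one lower-layer linear sub-model per leaf node,
# together with the column names of the two feature subsets. Names are illustrative.

from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class NestedModel:
    upper_tree: Any                                                    # decision-tree sub-model (upper layer)
    leaf_linear_models: Dict[int, Any] = field(default_factory=dict)   # leaf id -> linear sub-model
    upper_columns: List[str] = field(default_factory=list)             # upper-layer feature subset
    lower_columns: List[str] = field(default_factory=list)             # lower-layer feature subset
```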
In particular, Fig. 3 shows a block diagram of a system for training a nested machine learning model according to an exemplary embodiment of the present invention, wherein the nested machine learning model includes an upper-layer model and a lower-layer model trained according to a hierarchical nesting framework. The training system shown in Fig. 3 may be implemented entirely by a computer program in software, may also be implemented by a dedicated hardware device, and may also be implemented by a combination of software and hardware. Correspondingly, each device constituting the system shown in Fig. 3 may be a virtual module that relies only on a computer program to realize the corresponding function, may be a general-purpose or dedicated device that realizes the function through a hardware structure, or may be a processor running a corresponding computer program, and so on.
As shown in Fig. 3, the training data record acquisition device 1000 is used to acquire training data records. Here, the training data record acquisition device 1000 may acquire training data records offline or online in various appropriate ways. According to an exemplary embodiment of the present invention, the training data record acquisition device 1000 may operate in a manner similar to the prediction data record acquisition device 100, the only difference being the specific data acquired by the two; it is therefore not described in further detail here. The training data records acquired by the training data record acquisition device 1000 include, in addition to various attribute information, the label of each data record with respect to the prediction problem.
The training feature subset generation device 2000 is used to generate, based on the attribute information of the training data record, multiple feature subsets of the training sample corresponding to the training data record. Here, the training feature subset generation device 2000 may generate the feature subsets in any appropriate manner, for example, in consideration of factors such as the content, meaning, value continuity, value range, value-space scale, missingness and importance of the attribute information, or in combination with the characteristics of the upper-layer or lower-layer model in the nested machine learning model, and so on. According to an exemplary embodiment of the present invention, the training feature subset generation device 2000 may generate the individual features of the training sample in a manner corresponding to the prediction feature subset generation device 200; that is, the training sample and the prediction sample correspond to each other in terms of features and feature subsets. It should be understood that, in practice, a prediction data record may lack some attribute information relative to a training data record; therefore, when the prediction feature subset generation device 200 generates a feature related to missing attribute information, the corresponding attribute information missing from the prediction data record may be set to a null value.
The training device 3000 is used to train the upper-layer model and the lower-layer model included in the nested machine learning model according to the hierarchical nesting framework, wherein each of the upper-layer model and the lower-layer model is trained based on its respective feature subset. Here, the training device 3000 may first train the upper-layer model in an appropriate manner, and then further obtain each lower-layer model connected to the upper-layer model. In particular, the training device 3000 may perform initialization processing according to configured parameters, and determine the types of the upper-layer model and the lower-layer model as well as the corresponding feature subset division. Suppose the upper-layer model is a single decision-tree model and the lower-layer model consists of multiple linear models. In the traditional decision-tree model shown in Fig. 5A, the output values of all samples falling in the same leaf node are identical; if this constant output value is replaced by a linear model part, a hierarchically nested machine learning model as shown in Fig. 5B is obtained. The trained nested machine learning model may be stored in the system of Fig. 3 for subsequent use, or may be provided to an external system or device.
The side of the nested machine learning model of training according to the exemplary embodiment of the present invention is described hereinafter with reference to Fig. 4 The flow chart of method.Here, as an example, the method shown in Fig. 4 can be as shown in Figure 3 training system perform, also can lead to completely Cross computer program to realize with software mode, the method shown in Fig. 4 can be also performed by the computing device of particular configuration.
For convenience, it is assumed that the training system of method as shown in Figure 3 shown in Fig. 4 is performed, as illustrated, in step In rapid S1000, training data record is obtained by training data record acquisition device 1000.Here, can according to step S100 classes As mode perform step S1000, the specific data only obtained in the two steps are different, for example, training data Record is in addition to including various attribute informations, in addition to data record is relative to the mark (label) of forecasting problem.
Next, in step S2000, the attribute recorded by training characteristics subset generation device 2000 based on training data Information records multiple character subsets of corresponding training sample to generate with training data.
For example, in step S2000, the attribute letter that training characteristics subset generation device 2000 can be recorded based on training data Cease to generate the feature of training sample, and training sample is generated according to the value continuity and/or valued space scale of feature Upper strata character subset and training sample lower floor's character subset.
Correspondingly, upper strata character subset can cover whole features that value is successive value, also, lower floor's character subset can be contained Lid value is whole features of discrete value, or, lower floor's character subset can cover all features of forecast sample;Or, on Layer character subset can cover whole features that value is successive value together with the feature that at least a portion value is discrete value, and And, lower floor's character subset can cover the feature that remaining value is discrete value, or, lower floor's character subset can cover forecast sample All features.
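Purely as an illustrative sketch (the helper name, the cardinality threshold, and the use of pandas are assumptions for the example, not part of the described method), a value-continuity-based split of the features into an upper-layer subset and a lower-layer subset could look like this:

```python
import pandas as pd

def split_by_continuity(samples: pd.DataFrame, max_discrete_cardinality: int = 32):
    """Assign continuous-valued features to the upper-layer subset and
    discrete-valued features to the lower-layer subset (one possible scheme)."""
    upper_cols, lower_cols = [], []
    for col in samples.columns:
        is_numeric = pd.api.types.is_numeric_dtype(samples[col])
        # Low-cardinality or non-numeric columns are treated here as discrete values.
        if is_numeric and samples[col].nunique() > max_discrete_cardinality:
            upper_cols.append(col)
        else:
            lower_cols.append(col)
    return samples[upper_cols], samples[lower_cols]
```

Under the variant described above in which the lower-layer feature subset covers all of the sample's features, the second return value would simply be the full feature table.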
As another example, in step S2000, the training feature subset generation device 2000 may generate the features of the training sample based on the attribute information of the training data record, and then generate the upper-layer feature subset of the training sample and the lower-layer feature subset of the training sample according to the missingness of the features, wherein the missingness of a feature indicates whether the feature is generated based on attribute information that is missing in the prediction data record relative to the training data record.
Here, it should be understood that "missing" means that certain attribute information exists in the training data record but is absent from the prediction data record; therefore, the missing attribute information is set to a null value in the prediction data record, while it may have an actual value in the training data record. Correspondingly, the terms "missing feature" and "non-missing feature" of a training sample merely follow the same usage as the missing and non-missing features of the prediction sample, and do not mean that these features of the training sample themselves lack attribute information.
Correspondingly, the upper-layer feature subset may cover all of the non-missing features, while the lower-layer feature subset covers all of the missing features together with all of the non-missing features.
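Again only as a hedged sketch (the set of attribute names that are missing at prediction time is assumed to be supplied from outside), the missingness-based partition could be expressed as follows, with the lower-layer subset covering every feature and the upper-layer subset covering only the non-missing ones:

```python
import pandas as pd

def split_by_missingness(samples: pd.DataFrame, attrs_missing_at_prediction: set):
    """Upper layer: only features available (non-missing) at prediction time;
    lower layer: all features, missing and non-missing alike."""
    non_missing = [c for c in samples.columns if c not in attrs_missing_at_prediction]
    return samples[non_missing], samples
```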
It should be understood that step S2000 may be performed in a manner corresponding to step S200; repeated content and details are not elaborated again here.
In step S3000, the training device 3000 may train, according to the hierarchical nesting framework, the upper-layer model and the lower-layer model included in the nested machine learning model, wherein each of the upper-layer model and the lower-layer model is trained based on its respective feature subset.
Particularly, the training device 3000 may configure at least one of the following items of the nested machine learning model: the total number of upper-layer models, the total number of lower-layer models, the type of each upper-layer model, the type of each lower-layer model, the parameters of each upper-layer model, the parameters of each lower-layer model, the parameter variation manner of each upper-layer model, and the parameter variation manner of each lower-layer model. The resulting model training configuration may be used to guide each subsequent round of training for each sub-model. In particular, in this step, the parameters of the nested machine learning model, the parameters of the upper-layer model (for example, the decision tree sub-model parameters) and/or the parameters of the lower-layer model may be set to change gradually. Through such parameter adaptation, the overall model parameters (such as the learning rate) and the sub-model parameters (such as the number of iteration rounds of the linear models, the regularization coefficients, and the depth of the decision tree) can be varied gradually.
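The following fragment is merely an illustrative sketch of what such a gradually changing configuration might look like across training rounds; the particular parameter names, schedules, and decay factors are assumptions made for the example and are not values specified by the present disclosure:

```python
def round_config(round_idx: int, num_rounds: int) -> dict:
    """One possible parameter adaptation scheme: hyperparameters drift gradually per round."""
    progress = round_idx / max(num_rounds - 1, 1)
    return {
        "learning_rate": 0.1 * (0.5 ** round_idx),      # overall model parameter, decays each round
        "tree_max_depth": 3 + int(progress * 3),         # upper-layer (decision tree) parameter, deepens
        "linear_iterations": 50 + int(progress * 150),   # lower-layer (linear sub-model) parameter
        "regularization": 1.0 * (1.0 - 0.5 * progress),  # regularization coefficient, relaxed over time
    }

schedule = [round_config(r, num_rounds=5) for r in range(5)]
```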
For example, when the upper-layer model of the nested machine learning model is a decision tree sub-model and the lower-layer model consists of at least one linear sub-model, it can be understood as follows: in the decision tree sub-model, the samples falling on the same leaf node originally all have exactly the same output value, and if this constant output value is replaced by a linear sub-model, a hierarchically nested machine learning model is obtained. Specifically, denoting an input training sample as x, the nested machine learning model can be expressed as:
f(x) = Σ_j b_j(Φ_t(x)) · v_j^T Φ_l(x)
In the above formula, v_j is the linear weight vector on the j-th leaf node, and b_j(·) is an indicator function that outputs 1 only when the sample falls on the j-th leaf node and outputs 0 otherwise. The training method of this nested machine learning model is divided into two steps: the first step is to generate a decision tree model, and the second step is to solve for the corresponding weight vectors on the basis of the generated model. According to an exemplary embodiment of the present invention, the upper-layer model and the lower-layer model may correspond to different feature subsets (i.e., different feature transformation results); accordingly, Φ_t may represent the feature transformation fed into the decision tree sub-model, and Φ_l the feature transformation fed into the linear sub-models.
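As a hedged illustration of the formula above (the function and argument names are invented for the sketch, and a fitted scikit-learn tree stands in for the decision tree sub-model), the output for a single sample can be evaluated by selecting the one active leaf and applying its weight vector:

```python
import numpy as np

def nested_predict_one(phi_t_x, phi_l_x, tree, leaf_weights):
    """f(x) = sum_j b_j(phi_t(x)) * v_j^T phi_l(x): b_j selects exactly one leaf,
    so only that leaf's weight vector v_j contributes to the output."""
    leaf = tree.apply(np.asarray(phi_t_x).reshape(1, -1))[0]
    return float(np.dot(leaf_weights[leaf], phi_l_x))
```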
For convenience, assume that the nested machine learning model is simplified to:
f(x) = Σ_j b_j(x) · v_j^T x
First, a decision tree model Σ_j α_j·b_j(x) can be obtained using an appropriate decision tree training method, where α_j is the weight on the j-th leaf node. Then, assuming a training sample set D = {(x_i, y_i) | i = 1, 2, ..., N} composed of N training samples (N being an integer greater than 1), where x_i denotes the i-th training sample and y_i is the label of x_i, the optimal weight vectors can be computed by minimizing a regularized objective of the following form:
min over {v_j}:  Σ_{i=1}^{N} l( Σ_j b_j(x_i)·v_j^T x_i , y_i ) + Σ_j ( λ_tl·||v_j||_1 + β_tl·||v_j||_2^2 )
Here, λ_tl and β_tl are regularization coefficients, and l(·) is the corresponding loss function. This optimization problem can be solved using FTRL-Proximal.
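The following Python sketch illustrates the two-step procedure under stated assumptions: scikit-learn's DecisionTreeRegressor stands in for an appropriate decision tree training method, and a per-leaf ElasticNet fit stands in for solving the regularized objective (the disclosure names FTRL-Proximal as the solver, which is not reproduced here); the sketch is an illustration rather than the patented algorithm itself.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import ElasticNet

def train_nested(upper_X, lower_X, y, max_depth=3, l1=0.01, l2=0.01):
    """Two-step sketch: (1) fit the decision tree on the upper-layer feature subset;
    (2) per leaf, fit a regularized linear model on the lower-layer feature subset."""
    upper_X, lower_X, y = np.asarray(upper_X), np.asarray(lower_X), np.asarray(y)
    tree = DecisionTreeRegressor(max_depth=max_depth).fit(upper_X, y)   # step 1
    leaves = tree.apply(upper_X)
    leaf_models = {}
    for leaf in np.unique(leaves):                                      # step 2
        mask = leaves == leaf
        # ElasticNet combines L1 and L2 penalties, mirroring the two regularization terms above.
        model = ElasticNet(alpha=l1 + l2, l1_ratio=l1 / (l1 + l2), max_iter=5000)
        leaf_models[leaf] = model.fit(lower_X[mask], y[mask])
    return tree, leaf_models

def predict_nested(tree, leaf_models, upper_X, lower_X):
    upper_X, lower_X = np.asarray(upper_X), np.asarray(lower_X)
    leaves = tree.apply(upper_X)
    return np.array([leaf_models[leaf].predict(lower_X[i:i + 1])[0]
                     for i, leaf in enumerate(leaves)])
```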
The above merely enumerates one training method for the nested machine learning model; however, it should be understood that exemplary embodiments of the present invention are not limited to the above example.
It should be understood that the devices illustrated in Fig. 1 and Fig. 3 may each be configured as software, hardware, firmware, or any combination thereof that performs a specific function. For example, these devices may correspond to dedicated integrated circuits, to pure software code, or to units or modules in which software is combined with hardware. In addition, one or more of the functions realized by these devices may also be performed uniformly by components in a physical entity device (for example, a processor, a client, or a server).
The system and method for performing prediction using a nested machine learning model according to exemplary embodiments of the present invention have been described above with reference to Fig. 1 and Fig. 2. It should be understood that the above prediction method may be realized by a program recorded on a computer-readable medium. Accordingly, according to an exemplary embodiment of the present invention, there may be provided a computer-readable medium for performing prediction using a nested machine learning model, wherein the nested machine learning model includes an upper-layer model and a lower-layer model trained according to the hierarchical nesting framework, and the computer-readable medium has recorded thereon a computer program for performing the following method steps: (A) obtaining a prediction data record; (B) generating, based on the attribute information of the prediction data record, multiple feature subsets of the prediction sample corresponding to the prediction data record; and (C) supplying the multiple feature subsets of the prediction sample respectively to the upper-layer model and the lower-layer model included in the nested machine learning model, so as to obtain the prediction result of the nested machine learning model for the prediction sample.
The system and method for training a nested machine learning model according to exemplary embodiments of the present invention have been described above with reference to Fig. 3 and Fig. 4. It should be understood that the above training method may be realized by a program recorded on a computer-readable medium. Accordingly, according to an exemplary embodiment of the present invention, there may be provided a computer-readable medium for training a nested machine learning model, wherein the nested machine learning model includes an upper-layer model and a lower-layer model trained according to the hierarchical nesting framework, and the computer-readable medium has recorded thereon a computer program for performing the following method steps: (a) obtaining a training data record; (b) generating, based on the attribute information of the training data record, multiple feature subsets of the training sample corresponding to the training data record; and (c) training, according to the hierarchical nesting framework, the upper-layer model and the lower-layer model included in the nested machine learning model, wherein each of the upper-layer model and the lower-layer model is trained based on its respective feature subset.
The computer program in the above computer-readable medium may run in an environment deployed on computer equipment such as clients, hosts, agent devices, and servers. It should be noted that the computer program may also be used to perform additional steps beyond the above steps, or to perform more specific processing when performing those steps; these additional steps and the further processing have been described with reference to Fig. 1 to Fig. 4 and, to avoid repetition, are not repeated here.
It should be noted that the prediction system or training system according to exemplary embodiments of the present invention may rely entirely on the running of computer programs to realize the corresponding functions, i.e., each device corresponds to a step in the functional architecture of the computer program, so that the whole system is invoked through a dedicated software package (for example, a lib library) to realize the corresponding prediction function.
On the other hand, each device shown in Fig. 1 or Fig. 3 may also be realized by hardware, software, firmware, middleware, microcode, or any combination thereof. When realized in software, firmware, middleware, or microcode, the program code or code segments for performing the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that a processor can perform the corresponding operations by reading and running the corresponding program code or code segments.
Here, an exemplary embodiment of the present invention may also be implemented as a computing device that includes a storage component and a processor, the storage component storing a set of computer-executable instructions which, when executed by the processor, performs the method of performing prediction using a nested machine learning model and/or the method of training the nested machine learning model.
Particularly, the computing device may be deployed in a server or a client, or on a node device in a distributed network environment. In addition, the computing device may be a PC, a tablet device, a personal digital assistant, a smartphone, a web application, or any other device capable of executing the above instruction set.
Here, the computing device need not be a single computing device; it may also be any device or aggregate of circuits capable of executing the above instructions (or instruction set) alone or in combination. The computing device may also be part of an integrated control system or a system manager, or may be configured as a portable electronic device interconnected locally or remotely (for example, via wireless transmission) through an interface.
In the computing device, the processor may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor. By way of example and not limitation, the processor may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, and so on.
Some of the operations described in the prediction method and the training method according to exemplary embodiments of the present invention may be realized in software, some in hardware, and these operations may also be realized by a combination of software and hardware.
The processor may run instructions or code stored in one of the storage components, and the storage component may also store data. Instructions and data may also be sent and received over a network via a network interface device, which may employ any known transmission protocol.
The storage component may be integrated with the processor, for example, with RAM or flash memory arranged within an integrated circuit microprocessor. In addition, the storage component may include an independent device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The storage component and the processor may be operatively coupled, or may communicate with each other, for example, through an I/O port or a network connection, so that the processor can read files stored in the storage component.
In addition, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, a mouse, or a touch input device). All components of the computing device may be connected to each other via a bus and/or a network.
The operations involved in the prediction method and/or the training method according to exemplary embodiments of the present invention may be described as various interconnected or coupled functional blocks or functional diagrams. However, these functional blocks or functional diagrams may equally be integrated into a single logic device or operated according to boundaries that are not precisely defined.
Particularly, as described above, a computing device for performing prediction using a nested machine learning model according to an exemplary embodiment of the present invention may include a storage component and a processor, wherein the nested machine learning model includes an upper-layer model and a lower-layer model trained according to the hierarchical nesting framework, the storage component stores a set of computer-executable instructions, and, when the set of computer-executable instructions is executed by the processor, the following steps are performed: (A) obtaining a prediction data record; (B) generating, based on the attribute information of the prediction data record, multiple feature subsets of the prediction sample corresponding to the prediction data record; and (C) supplying the multiple feature subsets of the prediction sample respectively to the upper-layer model and the lower-layer model included in the nested machine learning model, so as to obtain the prediction result of the nested machine learning model for the prediction sample.
It should be noted that the processing details of performing prediction using a nested machine learning model according to exemplary embodiments of the present invention have been described above in conjunction with Fig. 1 and Fig. 2; the processing details of each step performed by the computing device are therefore not elaborated here.
In addition, a computing device for training a nested machine learning model according to an exemplary embodiment of the present invention may include a storage component and a processor, wherein the nested machine learning model includes an upper-layer model and a lower-layer model trained according to the hierarchical nesting framework, the storage component stores a set of computer-executable instructions, and, when the set of computer-executable instructions is executed by the processor, the following steps are performed: (a) obtaining a training data record; (b) generating, based on the attribute information of the training data record, multiple feature subsets of the training sample corresponding to the training data record; and (c) training, according to the hierarchical nesting framework, the upper-layer model and the lower-layer model included in the nested machine learning model, wherein each of the upper-layer model and the lower-layer model is trained based on its respective feature subset.
It should be noted that the processing details of training a nested machine learning model according to exemplary embodiments of the present invention have been described above in conjunction with Fig. 3 and Fig. 4; the processing details of each step performed by the computing device are therefore not elaborated here.
Each exemplary embodiment of the present invention has been described above. It should be understood that the foregoing description is merely exemplary and not exhaustive, and the present invention is not limited to the disclosed exemplary embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the present invention. Therefore, the protection scope of the present invention should be defined by the scope of the claims.

Claims (10)

1. A method of performing prediction using a nested machine learning model, wherein the nested machine learning model includes an upper-layer model and a lower-layer model trained according to a hierarchical nesting framework, the method comprising:
(A) obtaining a prediction data record;
(B) generating, based on attribute information of the prediction data record, multiple feature subsets of a prediction sample corresponding to the prediction data record;
(C) supplying the multiple feature subsets of the prediction sample respectively to the upper-layer model and the lower-layer model included in the nested machine learning model, so as to obtain a prediction result of the nested machine learning model for the prediction sample.
2. The method of claim 1, wherein the upper-layer model includes a decision tree sub-model and the lower-layer model includes multiple linear sub-models,
wherein each linear sub-model corresponds to one leaf node of the decision tree sub-model.
3. The method of claim 1 or 2, wherein, in step (B), features of the prediction sample are generated based on the attribute information of the prediction data record, and an upper-layer feature subset of the prediction sample and a lower-layer feature subset of the prediction sample are generated according to the value continuity and/or the value-space scale of the features.
4. The method of claim 3, wherein the upper-layer feature subset covers all features whose values are continuous, and the lower-layer feature subset covers all features whose values are discrete;
or the upper-layer feature subset covers all features whose values are continuous together with at least a portion of the features whose values are discrete, and the lower-layer feature subset covers the remaining features whose values are discrete.
5. The method of claim 1 or 2, wherein, in step (B), features of the prediction sample are generated based on the attribute information of the prediction data record, and an upper-layer feature subset of the prediction sample and a lower-layer feature subset of the prediction sample are generated according to the missingness of the features, wherein the missingness of a feature indicates whether the feature is generated based on attribute information that is missing in the prediction data record relative to a training data record.
6. The method of claim 5, wherein the upper-layer feature subset covers all non-missing features, and the lower-layer feature subset covers all missing features together with all non-missing features.
7. A system for performing prediction using a nested machine learning model, wherein the nested machine learning model includes an upper-layer model and a lower-layer model trained according to a hierarchical nesting framework, the system comprising:
a prediction data record acquisition device for obtaining a prediction data record;
a prediction feature subset generation device for generating, based on attribute information of the prediction data record, multiple feature subsets of a prediction sample corresponding to the prediction data record;
a prediction device for supplying the multiple feature subsets of the prediction sample respectively to the upper-layer model and the lower-layer model included in the nested machine learning model, so as to obtain a prediction result of the nested machine learning model for the prediction sample.
8. The system of claim 7, wherein the upper-layer model includes a decision tree sub-model and the lower-layer model includes multiple linear sub-models,
wherein each linear sub-model corresponds to one leaf node of the decision tree sub-model.
9. A method of training a nested machine learning model, wherein the nested machine learning model includes an upper-layer model and a lower-layer model trained according to a hierarchical nesting framework, the method comprising:
(a) obtaining a training data record;
(b) generating, based on attribute information of the training data record, multiple feature subsets of a training sample corresponding to the training data record; and
(c) training, according to the hierarchical nesting framework, the upper-layer model and the lower-layer model included in the nested machine learning model, wherein each of the upper-layer model and the lower-layer model is trained based on its respective feature subset.
10. A system for training a nested machine learning model, wherein the nested machine learning model includes an upper-layer model and a lower-layer model trained according to a hierarchical nesting framework, the system comprising:
a training data record acquisition device for obtaining a training data record;
a training feature subset generation device for generating, based on attribute information of the training data record, multiple feature subsets of a training sample corresponding to the training data record; and
a training device for training, according to the hierarchical nesting framework, the upper-layer model and the lower-layer model included in the nested machine learning model, wherein each of the upper-layer model and the lower-layer model is trained based on its respective feature subset.
Priority and Family Applications

Application CN201710311867.2A (priority date 2017-05-05, filing date 2017-05-05) was published as CN107169574A on 2017-09-15; legal status: pending.
Divisional child application CN202110772622.6A (priority date 2017-05-05) was published as CN113610240A on 2021-11-05 under the title "Method and system for performing predictions using nested machine learning models"; legal status: pending.
Family ID: 59814065. Country: CN (2 family applications).



Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2017-09-15)