CN103886203B - Automatic modeling system and method based on index prediction - Google Patents


Publication number
CN103886203B
Authority
CN
China
Prior art keywords
module
algorithm
integration
data
configuration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410109141.7A
Other languages
Chinese (zh)
Other versions
CN103886203A (en)
Inventor
李攀登
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bull Information Systems (beijing) Co Ltd
Original Assignee
Bull Information Systems (beijing) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bull Information Systems (beijing) Co Ltd filed Critical Bull Information Systems (beijing) Co Ltd
Priority to CN201410109141.7A priority Critical patent/CN103886203B/en
Publication of CN103886203A publication Critical patent/CN103886203A/en
Application granted granted Critical
Publication of CN103886203B publication Critical patent/CN103886203B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an automatic modeling system based on index prediction. The system comprises a data loading and storing module, a core algorithm module, a model evaluation/integration module, an enterprise application module and a configuration module. The data loading and storing module loads data and stores the result data generated once the subsequent processes are completed. The core algorithm module has an algorithm library; it runs the scripts of the algorithm families in the library and obtains the optimal parameters of each family. The model evaluation/integration module obtains an optimal algorithm or an integrated algorithm from the optimal parameters obtained by the core algorithm module. The enterprise application module runs the optimal or integrated algorithm obtained by the model evaluation/integration module, standardizes the running result and outputs the result data. The configuration module controls and drives the data loading and storing module, the core algorithm module, the model evaluation/integration module and the enterprise application module. The invention further provides an automatic modeling and deployment method based on index prediction.

Description

Automatic modeling system and method based on index prediction
Technical field:
The invention relates to applications of data mining technology, and in particular to an automatic modeling system and method based on index prediction.
Background art:
As business volumes grow, markets saturate, competition heats up and market coverage expands, enterprises need a comprehensive view of market changes. As competition intensifies, markets change faster, which calls for forward-looking analysis. As the factors influencing a business become more complex, quantitatively judging abnormal market development becomes difficult. Enterprises therefore need comprehensive analysis of market indexes, forward-looking prediction and early warning of abnormal development. Data mining currently provides the theoretical methods for this, the most common being time series analysis, and database technology provides the foundation for running these methods in production.
Index predictive modeling and its application have accumulated some experience across industries. Many enterprises employ modelers who build predictive models and put them online, and the industry has gained experience in developing and deploying such models. The main approach in current use is as follows: data is exported from a database to a local machine; a modeler trains a model with a third-party modeling tool according to the business requirements and debugs it manually and repeatedly to obtain model parameters or rules; the parameters or rules are converted into SQL and fixed in the production environment; and, based on the model output, the causes of market abnormalities, or the market impact of newly released tariffs, are analyzed manually.
This development and deployment approach meets the needs of ordinary time series services well, but it and the subsequent business processing have major drawbacks. First, when there are many indexes, for example hundreds or thousands, a modeler must manually complete data extraction, model training, script development and fixing, online testing and so on, which consumes a great deal of labor. Second, when the current data distribution no longer matches the previously trained parameters, or when the business requirement changes (for example, from predicting one period ahead to predicting several periods ahead), the modeler must redevelop the original modeling process for each index separately, which wastes considerable human resources. In addition, the business problems revealed by the predictions must be judged manually, which inevitably involves a great deal of subjective judgment.
Summary of the invention:
To solve the above technical problems, the present invention provides an automatic modeling system based on index prediction, including: a data loading and storing module for loading data and storing the result data generated after the subsequent processes are completed; a core algorithm module having an algorithm library, which runs the scripts of each algorithm family in the library to obtain the optimal parameters of each family; a model evaluation/integration module for obtaining an optimal algorithm or an integrated algorithm according to the optimal parameters obtained by the core algorithm module; an enterprise application module which runs the optimal or integrated algorithm obtained by the model evaluation/integration module, standardizes the running result and outputs the result data; and a configuration module for controlling and driving the data loading and storing module, the core algorithm module, the model evaluation/integration module and the enterprise application module.
Preferably, the data loading and storing module performs first preprocessing on the loaded data; and the core algorithm module performs second preprocessing, sample preparation, model training and testing on the data subjected to the first preprocessing, and outputs model training parameters, residual errors, prediction results and configuration files.
Preferably, the first preprocessing comprises serialization processing and multi-index merging; the core algorithm module stores the acquired optimal parameters in each algorithm family in the configuration file.
Preferably, the model evaluation/integration module evaluates the optimal parameters obtained from the core algorithm module, obtains the optimal algorithm according to the evaluation result, or integrates the corresponding algorithms according to the evaluation result to obtain the integration algorithm.
Preferably, the configuration module comprises a data loading configuration unit, a model evaluation configuration unit, an enterprise application configuration unit and a main function configuration unit; the main function configuration unit can drive the data loading configuration unit, the model evaluation configuration unit, the enterprise application configuration unit and the core algorithm module so as to drive the whole process, wherein the data loading configuration unit is used for driving the data loading and storing module, the model evaluation configuration unit is used for driving the model evaluation/integration module, and the enterprise application configuration unit is used for driving the enterprise application module.
Preferably, the system further includes an extensible resource module having an extensible resource library; the extensible resource module runs the scripts of different algorithm families in the extensible resource library to obtain the optimal parameters of those algorithm families.
Preferably, when the configuration module cannot search the configuration of the core algorithm module, the operation of the extensible resource module is driven.
Preferably, the model evaluation/integration module evaluates the optimal parameters obtained from the extensible resource module, and obtains the optimal algorithm according to the evaluation result, or integrates the corresponding algorithms according to the evaluation result to obtain the integrated algorithm.
Preferably, the configuration module has an enterprise application configuration unit and a main function configuration unit; when the main function configuration unit cannot find a configuration for the core algorithm module, it drives the enterprise application configuration unit, which in turn drives the extensible resource module.
Preferably, the system further includes a presentation module configured to present the result data.
Preferably, when the configuration module searches the configuration of the presentation module, the presentation module is driven to present the result data.
Preferably, the configuration module has an enterprise application configuration unit and a main function configuration unit; when the main function configuration unit finds a configuration for the presentation module, it drives the enterprise application configuration unit, which in turn drives the presentation module.
Preferably, the model evaluation/integration module first evaluates whether the optimal algorithm meets requirements, if so, the enterprise application module operates the optimal algorithm and standardizes operation results and then outputs the result data, if not, the model evaluation/integration module integrates corresponding algorithms according to evaluation results to obtain the integration algorithm, and then the enterprise application module operates the integration algorithm and standardizes operation results and then outputs the result data.
On the other hand, the invention also provides an automatic modeling method based on index prediction, which comprises the following steps: a data loading step, namely loading data required by a subsequent process; running a core algorithm, namely running scripts of each algorithm family in an algorithm library to obtain optimal parameters in each algorithm family; a model evaluation/integration step, wherein an optimal algorithm or an integration algorithm is obtained according to the optimal parameters obtained in the core algorithm operation step; an enterprise application step of operating the optimal algorithm or the integrated algorithm obtained in the model evaluation/integration step, standardizing an operation result and outputting result data; and a control step of controlling and driving the data loading step, the core algorithm running step, the model evaluation/integration step and the enterprise application step.
Preferably, in the data loading step, the loaded data is subjected to a first preprocessing; and in the core algorithm operation step, performing second preprocessing, sample preparation, model training and testing on the data subjected to the first preprocessing, and outputting model training parameters, residual errors, prediction results and configuration files.
Preferably, the first preprocessing comprises serialization processing and multi-index merging; in the core algorithm running step, storing the acquired optimal parameters in each algorithm family in the configuration file.
Preferably, in the model evaluation/integration step, the optimum parameters obtained in the core algorithm operation step are evaluated, and the optimum algorithm is obtained according to an evaluation result, or the corresponding algorithms are integrated according to the evaluation result to obtain the integrated algorithm.
Preferably, the method further includes a storing step of storing the result data obtained in the enterprise application step.
Preferably, the method further includes an extensible resource step, which runs the scripts of different algorithm families in an extensible resource library to obtain the optimal parameters of those algorithm families.
Preferably, the extensible resource step is driven when no configuration for the core algorithm running step is found in the control step.
Preferably, in the model evaluation/integration step, the optimal parameters obtained in the extensible resource step are evaluated, and the optimal algorithm is obtained according to the evaluation result, or the corresponding algorithms are integrated according to the evaluation result to obtain the integrated algorithm.
Preferably, the method further includes a presentation step of presenting the result data.
Preferably, the presentation step is driven to present the result data when a configuration for the presentation step is found in the control step.
Preferably, in the model evaluation/integration step, it is first evaluated whether the optimal algorithm meets requirements, if yes, the optimal algorithm is operated and the operation result is normalized in the enterprise application step and then the result data is output, if not, the corresponding algorithm is integrated according to the evaluation result in the model evaluation/integration step to obtain the integration algorithm, and then the integration algorithm is operated and the operation result is normalized in the enterprise application step and then the result data is output.
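The evaluate-then-integrate decision described above can be sketched as follows. This is only an illustrative sketch in Python (the embodiment itself is R-based); the function names, the error threshold used as the "requirement", and the plain averaging used as the integration are all assumptions, not details given in the patent.

```python
from statistics import mean

def choose_model(errors, threshold):
    """Decide between the optimal algorithm and an integrated algorithm.

    errors: dict mapping a (hypothetical) algorithm name to its average
    prediction error. threshold: assumed stand-in for "meets requirements".
    """
    best = min(errors, key=errors.get)
    if errors[best] <= threshold:
        # The single best algorithm is good enough: run it alone.
        return "optimal", [best]
    # Otherwise integrate the candidates, ordered from best to worst.
    return "integrated", sorted(errors, key=errors.get)

def run_and_standardize(mode, members, forecasts):
    """Run the chosen members and standardize the output.

    forecasts: dict name -> list of predicted values (same horizon).
    The integration here is a plain average, an assumed placeholder.
    """
    horizon = len(forecasts[members[0]])
    combined = [mean(forecasts[m][t] for m in members) for t in range(horizon)]
    return {"mode": mode, "members": members,
            "forecast": [round(v, 2) for v in combined]}
```

With a single member the average reduces to that member's own forecast, so the same output format serves both branches.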
Starting from actual enterprise applications, the invention innovates on the supporting technology: it packages and standardizes traditional prediction techniques with a view to automation and intelligence, and opens a path from manual to automatic modeling and deployment. It is particularly suited to applications that require many models, such as the development and real-time use of hundreds of index prediction models. It helps enterprises establish an accurate, timely and comprehensive prediction and monitoring platform, and provides a timely and reliable means for applications such as enterprise strategic management, business management and control, and data quality management. Implementing the invention yields the following benefits:
1. The whole process of data loading, model development, model selection, model deployment and so on is packaged into a configurable automated flow, which greatly reduces manual intervention and greatly improves efficiency. In addition, the model relearns at a set period, which overcomes the prior-art problem that the data distribution stops matching fixed training parameters. A shareable, extensible model library is built, so that algorithm selection is more flexible than in the traditional approach and the technical barrier caused by modelers' limited knowledge is lowered.
2. The method places low demands on system configuration, is methodologically complete, and is highly extensible, automated and self-learning. Its greatest difference from the traditional approach lies in model selection.
3. The packaged business application library is applied to fields such as enterprise strategic management, business management and control, and data quality management, which saves considerable labor in the application process. In data quality monitoring, the correct recognition rate is greatly improved. In business management and control, the ability to identify and locate business problems is greatly improved, and the system provides a tariff simulation preview function that can reduce losses for the enterprises using it.
Description of the drawings:
FIG. 1 is a block diagram of an automated modeling system according to an embodiment of the present invention;
FIG. 2 is a flow diagram of automated modeling according to an embodiment of the present invention;
FIG. 3 is a flow chart of the operation of the model evaluation/integration module according to an embodiment of the present invention.
Detailed description of the embodiments:
The automatic modeling system based on index prediction provided by the embodiment of the invention is based on the R language. It packages functions such as construction, management and sharing of the model library, data loading and storage, centralized decomposition and development, optimal algorithm search, reintegration of decomposed algorithms, knowledge migration, automatic deployment of model results, enterprise application of models and application presentation into separate executable modules, and automates the whole modeling and application process, including the connection of extension modules, through the settings of the configuration module.
The invention relates to an automatic modeling system based on index prediction, and more particularly to a method and system in which business requirements automatically trigger an engine for automatic model development and deployment. The model engine includes several intelligent processing modules for configuration, data loading, self-learning, transfer learning and so on, which together implement the business requirements. The search settings and the model judgment and evaluation mechanism configured in this embodiment determine the current stage of model development and deployment and trigger the corresponding script. A detailed description follows with reference to the accompanying drawings.
FIG. 1 is a block diagram showing the configuration of an automatic modeling system based on index prediction according to an embodiment of the present invention. As shown in FIG. 1, the system includes a configuration module 1 and a module packaging part 2. The configuration module 1 includes a data loading configuration unit 11, a main function configuration unit 12, a model evaluation configuration unit 13 and an enterprise application configuration unit 14. The module packaging part 2 includes a core and base module 21 of the modeling package, a presentation module 22, an extensible resource module 23 and an enterprise application module 24. The core and base module 21 of the R-add modeling package comprises a data loading and storing module 211, a core algorithm module 212 and a model evaluation/integration module 213.
The configuration module 1 drives the other six large modules, namely the data loading and storing module 211, the core algorithm module 212, the model evaluation/integration module 213, the presentation module 22, the extensible resource module 23 and the enterprise application module 24. It is responsible for configuring their parameters, packaging the process flow, driving automatic operation and so on, and acts as the control center of the automatic modeling system. The data loading configuration unit 11, the main function configuration unit 12, the model evaluation configuration unit 13 and the enterprise application configuration unit 14 of the configuration module 1 control and drive these six large modules as follows:
1) the data loading and storing module 211 packages the reading of different data sources and data formats, and the reading mode (batch or single), into a parameterized UDF (user-defined function); the actual parameter values are entered in the data loading configuration unit 11 of the configuration module 1, and the corresponding script in the main function configuration unit 12 reads these values to drive and control the loading of data;
2) the model evaluation/integration module 213 packages the core algorithm into a parameterized UDF; the corresponding parameters or text are entered in the model evaluation configuration unit 13 of the configuration module 1, and the main function configuration unit 12 reads the configuration file of the model evaluation configuration unit 13 and passes it to the model evaluation UDF, thereby driving and controlling the module;
3) the remaining modules implement their core algorithms within the module itself; the only difference is that their configuration files are generated in the enterprise application configuration unit 14, and they are still scheduled and driven by the main function configuration unit 12.
The corresponding configuration files in the configuration module 1 are placed in a configuration library (not shown) at the bottom of the configuration module 1.
Specifically, the configuration module 1 is used for configuring and driving different modules, and stores driving information such as initialization parameters, service selection, data selection, and the like of each large module in a configuration file, and different parameters and other driving information can be configured according to different applications. The configuration module 1 configures and stores various parameters required by the operation of the automatic modeling system, and is responsible for driving the operation of various modules of the automatic modeling system.
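The way a stored configuration could drive the modules can be sketched as below. This is a hedged illustration in Python rather than the R used by the embodiment; the section names (`data_loading`, `core_algorithm`, and so on), the callable signatures and the returned trace are all hypothetical. The fallback to the extensible resource module when no core algorithm configuration is found, and the optional presentation step, follow the preferred variants described earlier in the text.

```python
def run_pipeline(config, modules):
    """Drive the packaged modules from a configuration mapping.

    config: dict of configuration sections (hypothetical layout).
    modules: dict mapping a section name to the callable it drives.
    Returns the result data and the list of modules that were run.
    """
    trace = []
    data = modules["data_loading"](config["data_loading"])
    trace.append("data_loading")

    if "core_algorithm" in config:
        params = modules["core_algorithm"](data, config["core_algorithm"])
        trace.append("core_algorithm")
    else:
        # No core algorithm configuration found: use the extensible resources.
        params = modules["extensible_resource"](
            data, config.get("extensible_resource", {}))
        trace.append("extensible_resource")

    model = modules["model_evaluation"](params, config.get("model_evaluation", {}))
    trace.append("model_evaluation")

    result = modules["enterprise_application"](model, data)
    trace.append("enterprise_application")

    if "presentation" in config:
        # Presentation is driven only when it has been configured.
        modules["presentation"](result)
        trace.append("presentation")
    return result, trace
```

Keeping the module callables in a mapping means a different business application only needs a different configuration, not a different driver.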
The data loading configuration unit 11 of the configuration module 1 is configured to configure data types, data sources, data intervals, and schedules. As shown in table 1, different data types need to be encapsulated in the data loading and storing module 211 in a corresponding reading manner, and configured and driven in the data loading configuration unit 11 of the configuration module 1 according to the service application.
Table 1:
For example, historical data of existing KPI indicators is stored in an enterprise database. In this embodiment, the data loading and storing module 211 can package the retrieval of this historical data from the enterprise database via RODBC, normalize it into the data structure required by the model, and parameterize the data source selection, the data table name, the data capture period and so on. The data loading configuration unit 11 of the configuration module 1 then passes the actual parameters, according to the actual business requirements, to the corresponding positions of the functions packaged in the data loading and storing module 211, and the main function configuration unit 12 connects and drives them in a unified way.
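A parameterized loading function of this kind might look like the sketch below. It is an assumption-laden illustration: it uses Python's standard `sqlite3` module as a stand-in for the enterprise database connection, and the table layout (`indicator`, `period`, `value` columns) and the function name `load_indicator` are hypothetical.

```python
import sqlite3

def load_indicator(conn, table, indicator, start, end):
    """Load one indicator's history for a capture period.

    conn: an open database connection (sqlite3 here, as a stand-in).
    table: data table name; it cannot be bound as a SQL parameter,
    so it is validated as a plain identifier before being interpolated.
    """
    if not table.isidentifier():
        raise ValueError("invalid table name: %r" % table)
    sql = (
        f"SELECT period, value FROM {table} "
        "WHERE indicator = ? AND period BETWEEN ? AND ? "
        "ORDER BY period"
    )
    return conn.execute(sql, (indicator, start, end)).fetchall()
```

The configuration unit would then only supply the actual values for `table`, `indicator`, `start` and `end`.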
The main function configuration unit 12 of the configuration module 1 drives the whole process. It packages the data loading and storing module 211, the core algorithm module 212, the model evaluation/integration module 213, the presentation module 22, the extensible resource module 23, the enterprise application module 24 and the other modules of the module packaging part 2 into one parameterized overall UDF; different algorithm types, such as classification or time series, may form different main function UDFs. When the main function configuration unit 12 passes actual parameters according to the actual business definition, it drives the data loading configuration unit 11, the model evaluation configuration unit 13 and the enterprise application configuration unit 14 of the configuration module 1, and thereby drives the large modules of the module packaging part 2.
The model evaluation configuration unit 13 of the configuration module 1 drives the model evaluation/integration module 213 of the module packaging part 2. For example, it may pass actual parameters such as bagging, vote or boosting to the model evaluation/integration module 213, as the application requires, to drive the model evaluation and integration procedure.
The enterprise application configuration unit 14 of the configuration module 1 drives the enterprise application module 24 and triggers the main function configuration unit 12. The enterprise application configuration unit 14 is the trigger entry of the entire automatic modeling system; it defines the system's actual parameters and its inputs and outputs.
The module packaging part 2 is the center of the core modules for model development and enterprise application. It handles model processing and the processing of enterprise application requirements, and is driven selectively according to current requirements under the control of the configuration module 1; that is, different business applications drive the corresponding modules. The standardized operating relationship and logic of the modules are as follows:
1) first, data is loaded to the core algorithm module 212;
2) the core algorithm module 212 performs operations such as preprocessing, sample preparation, model training, testing and the like on the loaded data, and outputs model training parameters, residual errors, prediction results, configuration files and the like;
3) filtering, processing and integrating the processing results of the core algorithm module 212 by the model evaluation/integration module 213;
4) the results output by the model evaluation/integration module 213 are provided through an interface to the enterprise application module 24;
5) the presentation module 22 and the extensible resource module 23 are selectively driven by the enterprise application configuration unit 14 in the configuration module 1.
The core and base module 21 of the modeling package handles the input, processing and output of the core models of the automatic modeling system. It is packaged as the data loading and storing module 211, the core algorithm module 212 and the model evaluation/integration module 213, which yields three corresponding parameterized UDFs; the flow association and calling of these three UDFs are completed in the main function configuration unit 12 of the configuration module 1.
The data loading and storing module 211 loads data and stores the result data generated after the subsequent processes are completed. Both the core algorithm computation and the applications of the automatic modeling system are based on this data. The specific processing is as follows:
The data loading and storing module 211 packages parameterized reading scripts for different types of data sources, and the corresponding loading settings are configured in the configuration module 1. After the data is loaded, the source data is preprocessed, for example by serialization processing and multi-index merging, and converted according to the conditions and interface types that the configuration module 1 sets for the current requirements. The other modules then process the preprocessed data, and the processed data is finally fed back to the data loading and storing module 211 for storage.
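The first preprocessing named above can be illustrated with a small sketch. The patent does not define "serialization processing" precisely; the sketch assumes it means ordering raw records into a time sequence, and it assumes multi-index merging means aligning several indicator series on their common period key. The function names and the record layout are hypothetical, and Python is used for illustration only.

```python
def to_series(records):
    """Order raw (period, value) records into a time sequence."""
    return sorted(records, key=lambda r: r[0])

def merge_indicators(series_by_name):
    """Merge several indicator series into one row per period.

    series_by_name: dict mapping an indicator name to a list of
    (period, value) pairs. Periods missing for an indicator yield None.
    """
    periods = sorted({p for s in series_by_name.values() for p, _ in s})
    names = sorted(series_by_name)
    lookup = {n: dict(series_by_name[n]) for n in names}
    return [
        {"period": p, **{n: lookup[n].get(p) for n in names}}
        for p in periods
    ]
```

Leaving gaps as `None` rather than dropping rows keeps the decision about missing values with the later (second) preprocessing stage.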
The data loading configuration unit 11 in the configuration module 1 can realize batch automatic loading.
The core algorithm module 212 is preferably an SRC core algorithm module. It is the computational core of the whole system: based on the index prediction algorithm families and the intelligent processing flow packaged by R-add, it generates the data required by enterprise applications. The specific processing is as follows:
1) structural changes (mutation structures) in the data preprocessed by the data loading and storing module 211 are identified automatically. Let the number of samples be n and the change point (mutation point) be k. Given the historical samples and the sample distribution parameters, the posterior probability of each candidate change point is calculated first:
(Equation 1; formula not reproduced in the source text)
(Equation 2; formula not reproduced in the source text)
(Equation 3; formula not reproduced in the source text)
where the auxiliary quantity is given by Equation 4 (formula not reproduced in the source text). The parameters of the posterior probability and the change point k are then found approximately using a Gibbs sampler;
2) the sequence of length n-k from the previous step is taken as the total sample, and CV (cross-validation) samples are generated automatically as the input of model training, as follows: let f and Q be positive integers with n > f × Q. For each q = 1, …, Q, a sub-sample of length n − q × f is taken as the training input and the remaining q × f observations are taken as the validation sample. This yields Q CV samples, which serve as the training input of the model;
3) with the CV samples as input, each algorithm in an algorithm family is designed and packaged as a Base Learner. The selectable algorithm families include the exponential smoothing family, the ARIMA family, the cointegration family, the ARCH family, the state-space family, the nonparametric family and so on. For each algorithm family and each sample, the optimal parameters within the family are selected according to criteria such as AIC and SBC, LR, RMSE and MAE, and are stored in a configuration file for use in model evaluation, integration and forecast deployment.
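Steps 2) and 3) can be sketched together as below. This is a minimal illustration in Python, not the R implementation of the embodiment. The CV split follows the description above (training prefix of length n − q × f, remaining q × f observations as validation). The Base Learner is a deliberately simple one, simple exponential smoothing with a small grid of smoothing factors, and RMSE is the only selection criterion used; the patent also lists AIC, SBC, LR and MAE, which are not implemented here. All function names are hypothetical.

```python
import math

def make_cv_samples(series, f, Q):
    """Generate Q (train, validation) pairs; sample q keeps the first
    n - q*f observations for training and the last q*f for validation."""
    n = len(series)
    if f <= 0 or Q <= 0 or n <= f * Q:
        raise ValueError("need positive f and Q with n > f * Q")
    samples = []
    for q in range(1, Q + 1):
        cut = n - q * f
        samples.append((series[:cut], series[cut:]))
    return samples

def ses_forecast(train, alpha, horizon):
    """Simple exponential smoothing: flat forecast at the final level."""
    level = train[0]
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def best_alpha(cv_samples, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Pick the smoothing factor with the lowest mean RMSE over the CV samples."""
    scores = {}
    for alpha in grid:
        errs = [rmse(valid, ses_forecast(train, alpha, len(valid)))
                for train, valid in cv_samples]
        scores[alpha] = sum(errs) / len(errs)
    best = min(scores, key=scores.get)
    return best, scores[best]
```

The selected parameter and its score are the kind of values that step 3) stores in the configuration file for later evaluation and integration.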
The model evaluation/integration module 213 evaluates and integrates the results generated by the core algorithm module 212 and the extensible resource module 23. It implements the following algorithm:
1) calculate the average prediction error of each Base Learner in the core algorithm module over the Q samples:
(Equation 5; formula not reproduced in the source text)
2) calculate the average prediction error for the full sample: (Equation 6; formula not reproduced in the source text);
3) select as the optimal Base Learner the one for which this error is minimal;
4) model integration: the model integration procedure is driven according to the actual configuration in the model evaluation configuration unit 13 in the configuration module 1, and the model evaluation/integration module 213 packages the corresponding integration algorithm, such as bagging, vote, boosting, etc., based on the Base Learner.
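The selection and integration steps above can be illustrated as follows. Because the formulas of Equations 5 and 6 are not available in this text, the sketch assumes a plain arithmetic mean of the per-sample errors. The integration shown is a simple per-period mean or median of the Base Learners' forecasts; it is a stand-in chosen for brevity and is not an implementation of bagging, vote or boosting as named in the patent. All names are hypothetical, and Python is used for illustration only.

```python
from statistics import mean, median

def average_errors(errors_by_learner):
    """Average each Base Learner's prediction errors over the Q samples."""
    return {name: mean(errs) for name, errs in errors_by_learner.items()}

def select_best(errors_by_learner):
    """Return the name of the Base Learner with the smallest average error."""
    avg = average_errors(errors_by_learner)
    return min(avg, key=avg.get)

def integrate(forecasts, method="mean"):
    """Combine Base Learner forecasts period by period.

    forecasts: dict name -> list of predicted values (same horizon).
    method: "mean" or "median" (assumed placeholder combiners).
    """
    combiners = {"mean": mean, "median": median}
    if method not in combiners:
        raise ValueError("unknown integration method: %r" % method)
    combine = combiners[method]
    names = sorted(forecasts)
    horizon = len(forecasts[names[0]])
    return [combine([forecasts[n][t] for n in names]) for t in range(horizon)]
```

In the described system the choice of combiner would come from the model evaluation configuration unit 13 rather than from a function argument.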
The display module 22 and the extensible resource module 23 perform display and extension, respectively, under the driving of the configuration module 1. The display module 22 visualizes model training results and prediction results. It is driven by the enterprise application configuration unit 14 of the configuration module 1, takes as input the model results generated by the core algorithm module 212 (such as model estimates and predictions), packages visualization scripts according to the standards for the different types of model results, and has its generation order set in the main function configuration unit 12.
The extensible resource module 23 mainly provides program interfaces for tools that are not core algorithms. For example, the core algorithm of a text classification model is a classification algorithm, but its processing flow also requires word segmentation, vectorization, and other processing of the data. Word segmentation and vectorization main programs can therefore be developed in the extensible resource module 23 and then connected into the flow by the main function configuration unit 12 of the configuration module 1, finally producing intelligent processing and application.
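The text classification example can be pictured as two small helper programs chained ahead of the classifier. The Python sketch below is only illustrative: whitespace splitting stands in for a real word segmenter, term counts stand in for whatever vectorization is actually used, and the names `segment`, `build_vocabulary`, and `vectorize` are hypothetical.

```python
from collections import Counter
from typing import List


def segment(text: str) -> List[str]:
    """Stand-in word segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def build_vocabulary(documents: List[str]) -> List[str]:
    """Collect the sorted set of tokens seen in the documents."""
    return sorted({token for doc in documents for token in segment(doc)})


def vectorize(text: str, vocabulary: List[str]) -> List[int]:
    """Turn a text into a term-count vector over the vocabulary."""
    counts = Counter(segment(text))
    return [counts[word] for word in vocabulary]


if __name__ == "__main__":
    docs = ["network fault alarm", "billing fault"]
    vocab = build_vocabulary(docs)
    print(vocab)
    print(vectorize("fault fault network", vocab))
```

The resulting vectors would then be passed to the classification algorithm held in the core algorithm library.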
The enterprise application module 24 processes and applies the output of the core algorithm module 212. For example, the enterprise application module 24 encapsulates applications driven by enterprise requirements, and can turn the knowledge produced by the other modules into accumulated intelligence for solving enterprise problems. The specific processing of the enterprise application module 24 is as follows:
In the enterprise application module 24, different data requirement interfaces and processing modes are defined for different applications. For example, a data quality management application needs to output a fault or anomaly alarm for a given index over a given future period, and the alarm condition and the future predicted values are processed and output by the core algorithm module 212 and the extensible resource module 23 packaged by R-add. Before that, the early warning requirements in the enterprise application configuration unit 14 must first be converted into statistical and mining algorithms and placed in the core algorithm library or the extensible resource library packaged by R-add, and a statistical threshold and input/output standards must be configured in the main function configuration unit 12 of the configuration module 1. Then, when the data quality management program in the enterprise application configuration unit 14 is driven, the automatic modeling system automatically runs the configured main function configuration unit 12, which drives the core algorithm module 212 and the extensible resource module 23 and automatically outputs data results that are fed back to the data loading and storing module 211. Finally, the main function configuration unit 12 drives the display module 22 and the enterprise application module 24 to display the corresponding results at the front end and to feed the data with quality problems back to the relevant service personnel.
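The alarm judgment in the data quality example can be sketched as a comparison of predicted values against a configured threshold. The text only mentions "a statistical threshold", so the fixed lower and upper bounds used here, as well as the function name `find_alarms`, are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def find_alarms(
    forecast: Sequence[float], lower: float, upper: float
) -> List[Tuple[int, float]]:
    """Return (period index, predicted value) pairs outside [lower, upper].

    Assumption: the configured statistical threshold is a fixed band.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    return [
        (period, value)
        for period, value in enumerate(forecast)
        if value < lower or value > upper
    ]


if __name__ == "__main__":
    predicted = [10.0, 12.0, 30.0, 9.0, 2.0]
    print(find_alarms(predicted, lower=5.0, upper=20.0))
```

Each returned pair identifies a future period whose predicted value would trigger an alarm to the relevant service personnel.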
Fig. 2 is a flowchart of the automatic modeling process according to an embodiment of the present invention. The automatic modeling process of this embodiment is described below with reference to Fig. 2.
First, the automatic modeling system is started, information such as the requirement, the data source, and the business problem type is generated from the actual parameters received by the enterprise application configuration unit 14, and the main function configuration unit 12 is triggered (step S1). Once the main function configuration unit 12 is running, it drives the data loading configuration unit 11, which in turn drives the data loading and storing module 211 to load or store the relevant data; this data includes local data resources, database data resources and modules, and results fed back by the enterprise application module 24 (step S2). Specifically, after the data loading and storing module 211 is driven, the data sources referenced in the configuration module 1 are loaded automatically, and multiple sets of data can be loaded simultaneously for the development of different types of models and for further processing and application. The format and source of the loaded data are shown in Table 1 above.
After the data loading and storing module 211 loads the relevant data, the main function configuration unit 12 searches the configuration text in the configuration module 1 to determine whether a configuration of the core algorithm module 212 exists (step S3). If a configuration of the core algorithm module 212 exists (step S3: yes), the main function configuration unit 12 drives the core algorithm module 212. According to the relevant configuration in the configuration module 1, the core algorithm module 212 runs the script of each algorithm family in its algorithm library (preferably, the SRC library), outputs index data such as the prediction index, statistical index, and weight index of each algorithm family, and outputs the optimal parameters in each algorithm family based on that index data (step S4).
If no configuration of the core algorithm module 212 exists (step S3: no), the current application is a newly added application. In this case the main function configuration unit 12 in the configuration module 1 automatically identifies the relevant configuration of the extensible resource module 23 and drives the enterprise application configuration unit 14, which in turn drives the extensible resource module 23. According to the configuration identified by the main function configuration unit 12, the extensible resource module 23 obtains the different algorithm families from the extensible resource library, runs their scripts, outputs index data such as the prediction indexes, statistical indexes, and weight indexes of the different algorithm families, and outputs the optimal parameters in the different algorithm families based on that index data (step S5).
After the core algorithm module 212 or the extensible resource module 23 outputs the index data, the main function configuration unit 12 drives the model evaluation configuration unit 13, which drives the model evaluation/integration module 213 (step S6). In step S6, the model evaluation/integration module 213 looks up the evaluation function corresponding to the algorithm or application in the algorithm library of the core algorithm module 212 or the extensible resource library of the extensible resource module 23, and outputs an optimal algorithm or an integration algorithm. For example, Table 2 below illustrates the results of the model evaluation/integration module 213 for different algorithms or applications in the algorithm library of the core algorithm module 212, and Table 3 below illustrates its results for different algorithms or applications in the extensible resource library of the extensible resource module 23.
Table 2:
Table 3:
User ID | Transaction problem identified by R-add | Transaction probability | Identification type
3767193 | Off-grid sudden increase | 0.85 | A priori identification
4571653 | Product unsubscribing beyond normal range | 0.92 | Post-incident pre-warning
After the model evaluation/integration module 213 outputs the optimal algorithm or the integration algorithm, the main function configuration unit 12 searches for the optimal algorithm or the integration algorithm (step S7). If the optimal algorithm or the integration algorithm exists (step S7: yes), the main function configuration unit 12 drives the enterprise application configuration unit 14, which drives the enterprise application module 24. The enterprise application module 24 integrates the optimal algorithm or the integration algorithm that the model evaluation/integration module 213 obtained for the different algorithms or applications in the algorithm library of the core algorithm module 212 or the extensible resource library of the extensible resource module 23, and outputs the relevant application feedback (result data) after standardization (step S8). For example, the respective feedback results are output according to application functions such as quality early warning in data quality management, performance evaluation in a strategy module, and advance transaction identification in business management. If the optimal algorithm or the integration algorithm does not exist (step S7: no), the system enters a wait state until the optimal algorithm or the integration algorithm is output from the model evaluation/integration module 213.
After the enterprise application module 24 has run, the main function configuration unit 12 automatically searches the configuration library of the configuration module 1 to determine whether a configuration of the display module 22 exists (step S9). If a configuration of the display module 22 exists, that is, display is required (step S9: yes), the main function configuration unit 12 drives the enterprise application configuration unit 14, which drives the display module 22. The display module 22 outputs the result data in the form of displayed results, short message prompts, and related data, and feeds the result data back to the data loading and storing module 211 for storage (step S10). For example, if a future trend needs to be predicted, the display module 22 generates a predicted trend graph, automatically saves it into a display library, and outputs it. If no configuration of the display module 22 exists, that is, the application does not need to be displayed (for example, a transaction warning, for which warning information only needs to be sent to the relevant service personnel) (step S9: no), the main function configuration unit 12 feeds the result data back to the data loading and storing module 211 for storage, so that it is available for subsequent processing.
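The branching of steps S1 to S10 can be summarized in a small Python sketch. It only lists the steps that would run for a given configuration and does not execute any modeling; the function name `plan_steps` and the configuration keys `core_algorithm` and `display` are hypothetical, and the wait state of step S7 is left out by assuming that the optimal or integration algorithm is already available.

```python
from typing import Dict, List


def plan_steps(config: Dict[str, bool]) -> List[str]:
    """Return the ordered steps of Fig. 2 for a given configuration.

    config is a hypothetical stand-in for the configuration module:
    "core_algorithm" and "display" say whether those modules are
    configured. Missing keys are treated as "not configured".
    """
    steps = [
        "S1: start and trigger the main function configuration unit",
        "S2: load data",
        "S3: look up the core algorithm configuration",
    ]
    if config.get("core_algorithm", False):
        steps.append("S4: run the algorithm library of the core algorithm module")
    else:
        # No core algorithm configuration: newly added application.
        steps.append("S5: run the extensible resource library")
    steps += [
        "S6: evaluate or integrate models",
        "S7: look up the optimal or integration algorithm",
        "S8: run the enterprise application and standardize results",
        "S9: look up the display configuration",
    ]
    if config.get("display", False):
        steps.append("S10: display results")
    # In both branches the result data ends up in storage.
    steps.append("store result data")
    return steps


if __name__ == "__main__":
    for step in plan_steps({"core_algorithm": False, "display": True}):
        print(step)
```

Keeping the decision points in configuration rather than in code mirrors the role the text gives to the configuration module, which drives every other module.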
Fig. 3 is a flowchart illustrating the operation of the model evaluation/integration module according to the embodiment of the present invention. The processing of the model evaluation/integration module of this embodiment is described below with reference to Fig. 3.
First, the main function configuration unit 12 drives the model evaluation configuration unit 13 to read the relevant configuration of the model evaluation/integration module 213 from the configuration module 1 and select the corresponding model evaluation criterion (step S61). The data source results are then processed according to the optimal model of each model family output by the core algorithm module 212 (step S62), and the optimal model of the whole model library is selected according to the model evaluation criterion (step S63). It is then determined whether the optimal model meets the requirements (step S64). If so (step S64: yes), the optimal model is output (step S65). If not (step S64: no), the several models closest to the requirements are integrated using the method configured in the configuration module 1, such as bagging, voting, or boosting, and the integrated model that meets the requirements is output (step S66).
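Steps S63 to S66 can be sketched in Python as follows. The sketch is an illustration under stated assumptions: the requirement is modeled as a maximum acceptable error, and simple averaging of the closest models' predictions stands in for the configured integration method (bagging, voting, boosting, and so on). The function name `evaluate_or_integrate` and its parameters are hypothetical, and the sketch does not re-check that the combined model meets the requirement.

```python
from typing import Dict, List, Tuple


def evaluate_or_integrate(
    errors: Dict[str, float],
    predictions: Dict[str, List[float]],
    max_error: float,
    top_k: int = 2,
) -> Tuple[str, List[float]]:
    """Output the best model, or an averaged ensemble if it falls short.

    errors maps a model name to its evaluation error (smaller is better);
    predictions maps a model name to its forecast values.
    """
    # S63: select the optimal model of the whole model library.
    best = min(errors, key=errors.get)
    # S64 / S65: output it if it meets the requirement.
    if errors[best] <= max_error:
        return best, list(predictions[best])
    # S66: integrate the top_k models closest to the requirement.
    # Assumption: plain averaging replaces the configured method.
    closest = sorted(errors, key=errors.get)[:top_k]
    horizon = len(predictions[closest[0]])
    combined = [
        sum(predictions[name][t] for name in closest) / len(closest)
        for t in range(horizon)
    ]
    return "+".join(closest), combined


if __name__ == "__main__":
    errs = {"a": 0.2, "b": 0.5, "c": 0.4}
    preds = {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [2.0, 4.0]}
    print(evaluate_or_integrate(errs, preds, max_error=0.3))
    print(evaluate_or_integrate(errs, preds, max_error=0.1))
```

With a threshold of 0.3 the single model `a` is returned; with a stricter threshold of 0.1 the two closest models `a` and `c` are averaged instead.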
It should be understood that the detailed description and examples, while indicating the scope of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention. Various equivalent modifications of the invention, which fall within the scope of the appended claims of this application, will occur to persons of ordinary skill in the art upon reading this disclosure.

Claims (24)

1. An automated modeling system based on index prediction, comprising:
the data loading and storing module is used for loading data required by the subsequent flow;
the core algorithm module is provided with an algorithm library, and the core algorithm module runs scripts of all algorithm families in the algorithm library to acquire optimal parameters in all the algorithm families;
the model evaluation/integration module is used for acquiring an optimal algorithm or an integration algorithm according to the optimal parameters acquired in the core algorithm module;
the enterprise application module runs the optimal algorithm or the integrated algorithm obtained by the model evaluation/integration module, standardizes the running result and outputs result data;
and the configuration module is used for controlling and driving the data loading and storing module, the core algorithm module, the model evaluation/integration module and the enterprise application module to run.
2. The automated modeling system of claim 1, wherein:
the data loading and storing module carries out first preprocessing on the loaded data;
the core algorithm module performs second preprocessing, sample preparation, model training and testing on the data subjected to the first preprocessing, and outputs model training parameters, residual errors, prediction results and configuration files;
the data loading and storage module is further capable of storing the result data generated by the enterprise application module.
3. The automated modeling system of claim 2, wherein:
the first preprocessing comprises serialization processing and multi-index combination;
the core algorithm module stores the acquired optimal parameters in each algorithm family in the configuration file.
4. The automated modeling system of claim 3, wherein:
the model evaluation/integration module evaluates the optimal parameters acquired in the core algorithm module, acquires the optimal algorithm according to the evaluation result, or integrates the corresponding algorithm according to the evaluation result to acquire the integration algorithm.
5. The automated modeling system of claim 1, wherein:
the configuration module comprises a data loading configuration unit, a model evaluation configuration unit, an enterprise application configuration unit and a main function configuration unit;
the main function configuration unit can drive the data loading configuration unit, the model evaluation configuration unit, the enterprise application configuration unit and the core algorithm module so as to drive the whole process, wherein the data loading configuration unit is used for driving the data loading and storing module, the model evaluation configuration unit is used for driving the model evaluation/integration module, and the enterprise application configuration unit is used for driving the enterprise application module.
6. The automated modeling system of claim 1, further comprising:
and the extensible resource module is provided with an extensible resource library, and the extensible resource module runs scripts of different algorithm families in the extensible resource library to acquire optimal parameters in the different algorithm families.
7. The automated modeling system of claim 6, wherein:
and when the configuration module cannot search the configuration of the core algorithm module, driving the operation of the extensible resource module.
8. The automated modeling system of claim 7, wherein:
the model evaluation/integration module evaluates the optimal parameters acquired from the extensible resource module, acquires the optimal algorithm according to the evaluation result, or integrates the corresponding algorithms according to the evaluation result to acquire the integration algorithm.
9. The automated modeling system of claim 8, wherein:
the configuration module is provided with an enterprise application configuration unit and a main function configuration unit;
and when the main function configuration unit cannot search the configuration of the core algorithm module, driving the enterprise application configuration unit, and driving the operation of the extensible resource module by the enterprise application configuration unit.
10. The automated modeling system of claim 1, further comprising:
and the display module is used for displaying the result data.
11. The automated modeling system of claim 10, wherein:
and when the configuration module searches the configuration of the display module, driving the display module to display the result data.
12. The automated modeling system of claim 11, wherein:
the configuration module is provided with an enterprise application configuration unit and a main function configuration unit;
and when the main function configuration unit searches the configuration of the display module, driving the enterprise application configuration unit, and driving the display module to run by the enterprise application configuration unit.
13. The automated modeling system of any of claims 1-12, wherein:
the model evaluation/integration module firstly evaluates whether the optimal algorithm meets requirements, if so, the enterprise application module operates the optimal algorithm and standardizes operation results and then outputs result data, if not, the model evaluation/integration module integrates corresponding algorithms according to evaluation results to obtain the integration algorithm, and then the enterprise application module operates the integration algorithm and standardizes operation results and then outputs the result data.
14. An automatic modeling method based on index prediction, comprising:
a data loading step, namely loading data required by a subsequent process;
running a core algorithm, namely running scripts of each algorithm family in an algorithm library to obtain optimal parameters in each algorithm family;
a model evaluation/integration step, wherein an optimal algorithm or an integration algorithm is obtained according to the optimal parameters obtained in the core algorithm operation step;
an enterprise application step of operating the optimal algorithm or the integrated algorithm obtained in the model evaluation/integration step, standardizing an operation result and outputting result data;
and a control step of controlling and driving the data loading step, the core algorithm running step, the model evaluation/integration step and the enterprise application step.
15. The automated modeling method of claim 14, wherein:
in the data loading step, carrying out first preprocessing on the loaded data;
and in the core algorithm operation step, performing second preprocessing, sample preparation, model training and testing on the data subjected to the first preprocessing, and outputting model training parameters, residual errors, prediction results and configuration files.
16. The automated modeling method of claim 15, wherein:
the first preprocessing comprises serialization processing and multi-index combination;
in the core algorithm running step, storing the acquired optimal parameters in each algorithm family in the configuration file.
17. The automated modeling method of claim 16, wherein:
in the model evaluation/integration step, the optimum parameters obtained in the core algorithm operation step are evaluated, and the optimum algorithm is obtained according to the evaluation result, or the corresponding algorithms are integrated according to the evaluation result to obtain the integration algorithm.
18. The automated modeling method of claim 14, further comprising:
a storage step of storing the result data obtained in the enterprise application step.
19. The automated modeling method of claim 14, further comprising:
and a resource expanding step, namely running scripts of different algorithm families in an expandable resource library to obtain optimal parameters in the different algorithm families.
20. The automated modeling method of claim 19, wherein:
and driving the operation of the resource expanding step when the configuration of the core algorithm operation step is not searched in the control step.
21. The automated modeling method of claim 20, wherein:
in the model evaluation/integration step, the optimal parameters obtained in the resource expansion step are evaluated, and the optimal algorithm is obtained according to the evaluation result, or the corresponding algorithm is integrated according to the evaluation result to obtain the integration algorithm.
22. The automated modeling method of claim 14, further comprising:
and a display step of displaying the result data.
23. The automated modeling method of claim 22, wherein:
when the configuration of the presentation step is searched in the control step, the presentation step is driven to present the result data.
24. The automated modeling method of any of claims 14-23, wherein:
in the model evaluation/integration step, whether the optimal algorithm meets requirements is evaluated firstly, if yes, the optimal algorithm is operated and operation results are standardized in the enterprise application step, and then the result data is output, if not, corresponding algorithms are integrated according to the evaluation results in the model evaluation/integration step to obtain the integration algorithm, and then the integration algorithm is operated and operation results are standardized in the enterprise application step, and then the result data is output.
CN201410109141.7A 2014-03-24 2014-03-24 Automatic modeling system and method based on index prediction Expired - Fee Related CN103886203B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410109141.7A CN103886203B (en) 2014-03-24 2014-03-24 Automatic modeling system and method based on index prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410109141.7A CN103886203B (en) 2014-03-24 2014-03-24 Automatic modeling system and method based on index prediction

Publications (2)

Publication Number Publication Date
CN103886203A CN103886203A (en) 2014-06-25
CN103886203B true CN103886203B (en) 2017-01-11

Family

ID=50955093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410109141.7A Expired - Fee Related CN103886203B (en) 2014-03-24 2014-03-24 Automatic modeling system and method based on index prediction

Country Status (1)

Country Link
CN (1) CN103886203B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101763105A (en) * 2010-01-07 2010-06-30 冶金自动化研究设计院 Self-adaptation selectable constrained gas optimizing dispatching system and method for steel enterprises
CN202159334U (en) * 2011-03-14 2012-03-07 李盼池 Polymer flooding development index predicting system
CN103020448A (en) * 2012-12-11 2013-04-03 南京航空航天大学 Method and system for predicting instantaneous value of airport noise based on time series analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
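Data mining applications of R software (R软件的数据挖掘应用); Chen Rongxin (陈荣鑫); Journal of Chongqing Technology and Business University (Natural Science Edition); 2011-12-31; Vol. 28, No. 6; pp. 602-607 *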

Also Published As

Publication number Publication date
CN103886203A (en) 2014-06-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170111

Termination date: 20180324